[gpaw-users] Restarting an optimization run

Jens Jørgen Mortensen jensj at fysik.dtu.dk
Fri Jan 18 12:54:13 CET 2013


On 18-01-2013 12:46, Ask Hjorth Larsen wrote:
> Hi
>
> Could someone summarize what exactly the problem with trajectory
> restarts was?  Isn't it just a question of fixing the optimizers so
> they use all the data available?

If you restart like this:

atoms = read('a1.traj')
atoms.set_calculator(GPAW(...))
opt = Optimizer(atoms, trajectory='a2.traj')
opt.run(fmax=0.05)


then after the first line, atoms will have a SinglePointCalculator 
object as its calculator and this object knows about the forces from the 
last image in the trajectory.

Second line:  the SinglePointCalculator is replaced by a fresh GPAW 
calculator which doesn't know anything.

Line 4:  The forces are calculated again :-(
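The sequence above can be mimicked with a toy model. This is only a sketch: StoredResults, FreshCalculator and Structure are hypothetical stand-ins written for this illustration, not ASE or GPAW classes, and the force values are dummies.

```python
class StoredResults:
    """Stand-in for a calculator that only holds results read from a file."""

    def __init__(self, forces):
        self.forces = forces

    def get_forces(self):
        return self.forces  # no computation, just the stored numbers


class FreshCalculator:
    """Stand-in for a newly created, expensive calculator."""

    def __init__(self):
        self.ncalls = 0  # counts how often forces are actually computed

    def get_forces(self):
        self.ncalls += 1
        return [[0.0, 0.0, 0.01]]  # dummy value


class Structure:
    """Stand-in for an atoms object that delegates to its calculator."""

    def __init__(self, calc):
        self.calc = calc

    def get_forces(self):
        return self.calc.get_forces()


# "Line 1": reading the trajectory gives stored forces for the last image.
structure = Structure(StoredResults(forces=[[0.0, 0.0, 0.01]]))

# "Line 2": attaching a fresh calculator throws the stored forces away.
fresh = FreshCalculator()
structure.calc = fresh

# "Line 4": the optimizer's first force request triggers a new calculation.
structure.get_forces()
print(fresh.ncalls)
```

The counter ends at one even though the same forces were available a moment earlier, which is the wasted step described above.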

Jens Jørgen

> Regards
> Ask
>
> 2013/1/18 Jussi Enkovaara <jussi.enkovaara at aalto.fi>:
>> On 2013-01-18 11:44, Jens Jørgen Mortensen wrote:
>>> On 17-01-2013 17:31, Nichols A. Romero wrote:
>>>> JJ,
>>>>
>>>> I have a different script for restarting.
>>>>
>>>> Note that this one doesn't have the shutdown observer
>>>> http://en.pastebin.ca/2303938
>>>>
>>>> But I basically do as Jussi, I restart from HDF5.
>>>>
>>>> Which we highly encourage people to use, because it works very well. Even on 100,000 cores :)
>>>>
>>>> I agree with what you say: most people just submit a structural optimization for the maximum walltime and let GPAW run.
>>>> Then it's just interrupted either in the middle of an SCF or between ionic steps.
>>>>
>>>> I think all these tricks should get documented somewhere, because the most common example of a GPAW calculation is a structural optimization.
>>> So, one needs to write a gpw or hdf file after every step in order to
>>> make this work!  Hmm ... these files are huge and contain a lot of
>>> stuff that you normally don't need and they also increase network
>>> traffic.  I wish there was a simple way to restart from a trajectory
>>> file without losing the last step.
>> I do not think restart files are that huge if you do not save the
>> wavefunctions (they are in any case definitely larger than trajectory
>> files). Also, I think that at least in some cases it is worthwhile
>> to save restart files during the SCF cycles. However, I agree that
>> it would be useful to be able to restart also from a trajectory.
>>
>>> We could work on making it possible to restart from GPAW's text output.
>>> We would need to write all the digits for positions and unit cell in
>>> order not to lose accuracy.  Would this be a good idea?
>> At least I am not very fond of writing all the digits to a text file...
>> One possibility might be to copy the forces and energy which are read
>> from the trajectory to the GPAW calculator, and indicate to GPAW that the
>> calculation is already converged.
>>
>> With BFGS, one can do a single step directly after reading the image
>> without calculator attached (just with the forces read from the image)
>> e.g.
>>
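>> import os
>> from ase.io import read, PickleTrajectory
>> from ase.optimize import BFGS
>>
>> if os.path.isfile('opt.traj'):
>>     # Last image, with the forces stored in the trajectory
>>     atoms = read('opt.traj')
>>     # Append to the existing trajectory instead of overwriting it
>>     traj = PickleTrajectory('opt.traj', 'a', atoms=atoms)
>>     opt = BFGS(atoms, trajectory=traj, logfile='qn.log')
>>     opt.run(steps=1)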
>>
>> but that does not work with optimizers that perform a line search.
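The other idea mentioned above, handing the forces read from the trajectory to the new calculator so that it treats the last image as already converged, could look roughly like the sketch below. Everything in it is an assumption made for illustration: SeededCalculator and CountingCalculator are hypothetical classes, not part of ASE or GPAW, and a real version would also have to compare the unit cell and the calculator parameters, and would seed the energy as well.

```python
class CountingCalculator:
    """Hypothetical stand-in for an expensive calculator."""

    def __init__(self):
        self.ncalls = 0  # number of real force calculations

    def get_forces(self, positions):
        self.ncalls += 1
        # Dummy harmonic force pulling every atom toward the origin.
        return [[-x for x in pos] for pos in positions]


class SeededCalculator:
    """Wraps a calculator and answers from seeded results when possible."""

    def __init__(self, calc, positions, forces, tol=1e-12):
        self.calc = calc
        self.seed_positions = [list(p) for p in positions]
        self.seed_forces = [list(f) for f in forces]
        self.tol = tol

    def _matches_seed(self, positions):
        if len(positions) != len(self.seed_positions):
            return False
        return all(abs(a - b) <= self.tol
                   for p, q in zip(positions, self.seed_positions)
                   for a, b in zip(p, q))

    def get_forces(self, positions):
        if self._matches_seed(positions):
            return self.seed_forces  # reuse what the trajectory stored
        return self.calc.get_forces(positions)  # geometry changed


# Positions and forces as they would come from the last trajectory image.
last_positions = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.1]]
last_forces = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.1]]

expensive = CountingCalculator()
calc = SeededCalculator(expensive, last_positions, last_forces)

calc.get_forces(last_positions)         # answered from the seed
calls_after_restart = expensive.ncalls
calc.get_forces([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # atoms have moved
calls_after_step = expensive.ncalls
```

Because the wrapper only compares geometries, it does not depend on which optimizer is used, so in principle it would not share the line-search limitation of the single-step approach.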
>>
>> Best regards,
>> Jussi
>>
>>> Jens Jørgen
>>>
>>>> ----- Original Message -----
>>>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>> To: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>>>> Sent: Thursday, January 17, 2013 7:00:08 AM
>>>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>>>> On 16-01-2013 17:55, Nichols A. Romero wrote:
>>>>>> I should add that this method is clearly not fault tolerant, is that
>>>>>> what you are thinking of? For example, some node has an error in
>>>>>> the middle of a force evaluation which brings down the whole code.
>>>>> No, I wasn't thinking about such cases.
>>>>>
>>>>> I think most people will just let their jobs run until they get killed by
>>>>> the queuing system. If you do this:
>>>>>
>>>>> atoms = ...
>>>>> atoms.set_calculator(GPAW(...))
>>>>> opt = Optimizer(atoms, trajectory='a1.traj')
>>>>> opt.run(fmax=0.05)
>>>>>
>>>>> Let's say GPAW is stopped in the middle of calculating the forces for
>>>>> image 8. Then the last image in a1.traj will be image 7 and
>>>>> corresponding forces. If you then do:
>>>>>
>>>>> atoms = read('a1.traj')
>>>>> atoms.set_calculator(GPAW(...))
>>>>> opt = Optimizer(atoms, trajectory='a2.traj')
>>>>> opt.run(fmax=0.05)
>>>>>
>>>>> GPAW will recalculate the forces for image 7 ...
>>>>>
>>>>> How does one solve this problem? Read atoms and calculator from a gpw
>>>>> file?
>>>>>
>>>>> Jens Jørgen
>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>>>>>> To: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>>>>>> Sent: Wednesday, January 16, 2013 10:49:38 AM
>>>>>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>>>>>> JJ,
>>>>>>>
>>>>>>> I deal with it by not allowing it to happen.
>>>>>>> http://en.pastebin.ca/2303447
>>>>>>>
>>>>>>> Basically, I use the requested scheduler time (PBS, LSF, or whatever)
>>>>>>> and use that as a parameter to an observer.
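A minimal sketch of such a time-budget observer follows. It is not the pastebin script, whose contents are not reproduced here; WalltimeGuard, OutOfTime and all parameter names are hypothetical. The guard is meant to be called once after every completed step and stops the run when the remaining time is shorter than the longest step seen so far. With ASE, a callable like this could presumably be registered through the optimizer's attach() method.

```python
import time


class OutOfTime(Exception):
    """Raised when another step is unlikely to fit in the time budget."""


class WalltimeGuard:
    """Decides whether another optimization step fits in the walltime."""

    def __init__(self, walltime, margin=0.0, clock=time.monotonic):
        self.walltime = walltime  # seconds requested from the scheduler
        self.margin = margin      # safety margin, e.g. for writing files
        self.clock = clock
        self.start = clock()
        self.last = self.start
        self.longest_step = 0.0

    def __call__(self):
        """Call after every completed step; raises when time runs short."""
        now = self.clock()
        self.longest_step = max(self.longest_step, now - self.last)
        self.last = now
        remaining = self.walltime - (now - self.start)
        if remaining < self.longest_step + self.margin:
            raise OutOfTime('not enough time left for another step')


# Demonstration with a fake clock: every step takes 10 s, the budget is 35 s.
ticks = iter([0.0, 10.0, 20.0, 30.0])
guard = WalltimeGuard(walltime=35.0, clock=lambda: next(ticks))

completed = 0
try:
    for _ in range(3):
        guard()          # would be called by the optimizer after each step
        completed += 1
except OutOfTime:
    pass                 # stop cleanly; the trajectory holds the last step
```

In the demonstration the third call finds only 5 s left against a 10 s step and stops, so the job ends between ionic steps instead of being killed in the middle of a force calculation.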
>>>>>>>
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>>>>> To: gpaw-users at listserv.fysik.dtu.dk
>>>>>>>> Sent: Wednesday, January 16, 2013 10:34:27 AM
>>>>>>>> Subject: [gpaw-users] Restarting an optimization run
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I'd like to know how people continue optimization runs with GPAW that
>>>>>>>> are killed in the middle of a force-calculation.
>>>>>>>>
>>>>>>>> Do you have some if-else magic in your script to handle both the first
>>>>>>>> run and a continuation run, or do you just edit the first script to start
>>>>>>>> from the last image in the trajectory file from the previous run?
>>>>>>>>
>>>>>>>> Do you worry about not repeating the force-calculation for the last
>>>>>>>> image in the trajectory file you continue from? If yes, how?
>>>>>>>>
>>>>>>>> Jens Jørgen
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> gpaw-users mailing list
>>>>>>>> gpaw-users at listserv.fysik.dtu.dk
>>>>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>>>>>> --
>>>>>>> Nichols A. Romero, Ph.D.
>>>>>>> Argonne Leadership Computing Facility
>>>>>>> Argonne National Laboratory
>>>>>>> Building 240 Room 2-127
>>>>>>> 9700 South Cass Avenue
>>>>>>> Argonne, IL 60490
>>>>>>> (630) 252-3441
>>>>>>>
>>>>>>>


