[gpaw-users] Restarting an optimization run

Ask Hjorth Larsen asklarsen at gmail.com
Fri Jan 18 13:14:03 CET 2013


Hi

Oh, right...

So there are two questions: One is how it should/can be done when
already aware of the issue, and the other is whether something can be
done so it is easy for a user who isn't aware of the issue at all.

One could use replay_trajectory or an optimizer restart file (the
latter was already mentioned somewhere, but the only thing guaranteed
to exist is the previous geometry file).  With respect to the former,
it should be possible to verify that the current image and the last
replayed one are identical, and thus reuse the forces - if
replay_trajectory is implemented that way, which could require another
line of code or two.  I think that would solve the first of the above
questions.  For the second one it would all have to work automagically
without requiring any action from the user.  That doesn't seem
possible unless we refrain from throwing away forces when setting a
different calculator, and the only way to make such a decision
sensibly is to know enough about the calculator to say that they are
identical, and that's a bit on the involved side.
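
To make the first point concrete, here is a minimal sketch in plain
Python of the check I have in mind: compare the current geometry with the
last stored image and only call the expensive force routine if they
differ.  The names same_geometry, forces_for_restart and expensive_forces
are made up for illustration, and the dict standing in for a trajectory
image is an assumption; none of this is ASE or GPAW API, and I am not
claiming replay_trajectory works this way.

```python
# Hypothetical helpers for illustration only; not ASE or GPAW API.

def same_geometry(positions_a, positions_b, tol=1e-12):
    """Return True if two lists of (x, y, z) positions agree within tol."""
    if len(positions_a) != len(positions_b):
        return False
    return all(abs(a - b) <= tol
               for pa, pb in zip(positions_a, positions_b)
               for a, b in zip(pa, pb))


def forces_for_restart(current_positions, last_image, compute_forces):
    """Reuse stored forces if the geometry is unchanged, else recompute.

    last_image is a dict with 'positions' and 'forces' taken from the
    last trajectory frame (an assumed stand-in for a real image object).
    compute_forces stands in for the expensive calculator call.
    Returns a (forces, reused) tuple.
    """
    if same_geometry(current_positions, last_image['positions']):
        return last_image['forces'], True
    return compute_forces(current_positions), False


# Toy usage with made-up numbers.
last_image = {'positions': [(0.0, 0.0, 0.0), (0.0, 0.0, 0.74)],
              'forces': [(0.0, 0.0, 0.1), (0.0, 0.0, -0.1)]}
calls = []  # records every time the "expensive" routine is invoked


def expensive_forces(positions):
    calls.append(positions)
    return [(0.0, 0.0, 0.0) for _ in positions]


forces, reused = forces_for_restart(last_image['positions'], last_image,
                                    expensive_forces)
```

This only covers the geometry.  It says nothing about whether the new
calculator has the same parameters as the one that produced the stored
forces, which is exactly the hard part of the second question.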

Regards
Ask

2013/1/18 Jens Jørgen Mortensen <jensj at fysik.dtu.dk>:
> On 18-01-2013 12:46, Ask Hjorth Larsen wrote:
>
>> Hi
>>
>> Could someone summarize what exactly the problem with trajectory
>> restarts was?  Isn't it just a question of fixing the optimizers so
>> they use all the data available?
>
>
> If you restart like this:
>
>
> atoms = read('a1.traj')
> atoms.set_calculator(GPAW(...))
> opt = Optimizer(atoms, trajectory='a2.traj')
> opt.run(fmax=0.05)
>
>
> then after the first line, atoms will have a SinglePointCalculator object as
> its calculator and this object knows about the forces from the last image in
> the trajectory.
>
> Second line:  the SinglePointCalculator is replaced by a fresh GPAW
> calculator which doesn't know anything.
>
> Line 4:  The forces are calculated again :-(
>
> Jens Jørgen
>
>
>> Regards
>> Ask
>>
>> 2013/1/18 Jussi Enkovaara <jussi.enkovaara at aalto.fi>:
>>>
>>> On 2013-01-18 11:44, Jens Jørgen Mortensen wrote:
>>>>
>>>> On 17-01-2013 17:31, Nichols A. Romero wrote:
>>>>>
>>>>> JJ,
>>>>>
>>>>> I have a different script for restarting.
>>>>>
>>>>> Note that this one doesn't have the shutdown observer
>>>>> http://en.pastebin.ca/2303938
>>>>>
>>>>> But I basically do as Jussi, I restart from HDF5.
>>>>>
>>>>> Which we highly encourage people to use, because it works very well.
>>>>> Even on 100,000 cores :)
>>>>>
>>>>> I agree with what you say, most people just submit structural
>>>>> optimization for the maximum walltime and just let GPAW run.
>>>>> Then it's just interrupted either in the middle of an SCF or between
>>>>> ionic steps.
>>>>>
>>>>> I think all these tricks should get documented somewhere. Because the
>>>>> most common example of a GPAW calculation is a structural optimization.
>>>>
>>>> So, one needs to write a gpw or hdf file after every step in order to
>>>> make this work!  Hmm ... these files are huge and contain a lot of
>>>> stuff that you normally don't need and they also increase network
>>>> traffic.  I wish there was a simple way to restart from a trajectory
>>>> file without losing the last step.
>>>
>>> I do not think restart files are that huge if you do not save the
>>> wavefunctions (they are in any case definitely larger than trajectory
>>> files). Also, I think that at least in some cases it is worthwhile
>>> to save restart files during the SCF cycles. However, I agree that
>>> it would be useful to be able to restart also from a trajectory.
>>>
>>>> We could work on making it possible to restart from GPAW's text output.
>>>> We would need to write all the digits for positions and unit cell in
>>>> order not to lose accuracy.  Would this be a good idea?
>>>
>>> At least I am not very fond of writing all the digits to a text file...
>>> One possibility might be to copy the forces and energy which are read
>>> from the trajectory to the GPAW calculator, and tell GPAW that the
>>> calculation is already converged.
>>>
>>> With BFGS, one can do a single step directly after reading the image
>>> without calculator attached (just with the forces read from the image)
>>> e.g.
>>>
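>>> import os
>>> from ase.io import read, PickleTrajectory
>>> from ase.optimize import BFGS
>>>
>>> if os.path.isfile('opt.traj'):
>>>     atoms = read('opt.traj')  # last image, with its stored forces
>>>     traj = PickleTrajectory('opt.traj', 'a', atoms=atoms)  # append mode
>>>     opt = BFGS(atoms, trajectory=traj, logfile='qn.log')
>>>     opt.run(steps=1)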
>>>
>>> but that does not work with optimizers that perform a line search.
>>>
>>> Best regards,
>>> Jussi
>>>
>>>> Jens Jørgen
>>>>
>>>>> ----- Original Message -----
>>>>>>
>>>>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>>> To: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>>>>> Sent: Thursday, January 17, 2013 7:00:08 AM
>>>>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>>>>> On 16-01-2013 17:55, Nichols A. Romero wrote:
>>>>>>>
>>>>>>> I should add that this method is clearly not fault tolerant, is that
>>>>>>> what you are thinking of? For example, some node has an error in
>>>>>>> the middle of a force evaluation which brings down the whole code.
>>>>>>
>>>>>> No, I wasn't thinking about such cases.
>>>>>>
>>>>>> I think most people will just let their jobs run until they get killed
>>>>>> by the queuing system. If you do this:
>>>>>>
>>>>>> atoms = ...
>>>>>> atoms.set_calculator(GPAW(...))
>>>>>> opt = Optimizer(atoms, trajectory='a1.traj')
>>>>>> opt.run(fmax=0.05)
>>>>>>
>>>>>> Let's say GPAW is stopped in the middle of calculating the forces for
>>>>>> image 8. Then the last image in a1.traj will be image 7 and
>>>>>> corresponding forces. If you then do:
>>>>>>
>>>>>> atoms = read('a1.traj')
>>>>>> atoms.set_calculator(GPAW(...))
>>>>>> opt = Optimizer(atoms, trajectory='a2.traj')
>>>>>> opt.run(fmax=0.05)
>>>>>>
>>>>>> GPAW will recalculate the forces for image 7 ...
>>>>>>
>>>>>> How does one solve this problem? Read atoms and calculator from a gpw
>>>>>> file?
>>>>>>
>>>>>> Jens Jørgen
>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>>
>>>>>>>> From: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>>>>>>> To: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>>>>>>> Sent: Wednesday, January 16, 2013 10:49:38 AM
>>>>>>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>>>>>>> JJ,
>>>>>>>>
>>>>>>>> I deal with it by not allowing it to happen.
>>>>>>>> http://en.pastebin.ca/2303447
>>>>>>>>
>>>>>>>> Basically, I use the requested scheduler time (PBS, LSF, or whatever)
>>>>>>>> and use that as a parameter to an observer.
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>>
>>>>>>>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>>>>>> To: gpaw-users at listserv.fysik.dtu.dk
>>>>>>>>> Sent: Wednesday, January 16, 2013 10:34:27 AM
>>>>>>>>> Subject: [gpaw-users] Restarting an optimization run
>>>>>>>>> Hi!
>>>>>>>>>
>>>>>>>>> I'd like to know how people continue optimization runs with GPAW that
>>>>>>>>> are killed in the middle of a force calculation.
>>>>>>>>>
>>>>>>>>> Do you have some if-else magic in your script to handle both the first
>>>>>>>>> run and a continuation run, or do you just edit the first script to
>>>>>>>>> start from the last image in the trajectory file from the previous run?
>>>>>>>>>
>>>>>>>>> Do you worry about not repeating the force calculation for the last
>>>>>>>>> image in the trajectory file you continue from? If yes, how?
>>>>>>>>>
>>>>>>>>> Jens Jørgen
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> gpaw-users mailing list
>>>>>>>>> gpaw-users at listserv.fysik.dtu.dk
>>>>>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>>>>>>>
>>>>>>>> --
>>>>>>>> Nichols A. Romero, Ph.D.
>>>>>>>> Argonne Leadership Computing Facility
>>>>>>>> Argonne National Laboratory
>>>>>>>> Building 240 Room 2-127
>>>>>>>> 9700 South Cass Avenue
>>>>>>>> Argonne, IL 60490
>>>>>>>> (630) 252-3441
>>>>>>>>
