[gpaw-users] Restarting an optimization run

Ask Hjorth Larsen asklarsen at gmail.com
Fri Jan 18 12:46:07 CET 2013


Hi

Could someone summarize what exactly the problem with trajectory
restarts was?  Isn't it just a question of fixing the optimizers so
they use all the data available?

Regards
Ask

2013/1/18 Jussi Enkovaara <jussi.enkovaara at aalto.fi>:
> On 2013-01-18 11:44, Jens Jørgen Mortensen wrote:
>> On 17-01-2013 17:31, Nichols A. Romero wrote:
>>> JJ,
>>>
>>> I have a different script for restarting.
>>>
>>> Note that this one doesn't have the shutdown observer:
>>> http://en.pastebin.ca/2303938
>>>
>>> But basically I do as Jussi does: I restart from HDF5,
>>> which we highly encourage people to use because it works very well, even on 100,000 cores :)
>>>
>>> I agree with what you say: most people just submit a structural optimization for the maximum wall time and let GPAW run.
>>> The run is then simply interrupted, either in the middle of an SCF cycle or between ionic steps.
>>>
>>> I think all these tricks should be documented somewhere, because the most common example of a GPAW calculation is a structural optimization.
>>
>> So, one needs to write a gpw or hdf file after every step in order to
>> make this work!  Hmm ... these files are huge and contain a lot of
>> stuff that you normally don't need, and they also increase network
>> traffic.  I wish there was a simple way to restart from a trajectory
>> file without losing the last step.
>
> I do not think restart files are that huge if you do not save the
> wavefunctions (they are in any case definitely larger than trajectory
> files). Also, I think that at least in some cases it is worthwhile
> to save restart files during the SCF cycles. However, I agree that
> it would be useful to be able to restart also from a trajectory.
>
>> We could work on making it possible to restart from GPAW's text output.
>> We would need to write all the digits for the positions and the unit cell
>> in order not to lose accuracy.  Would this be a good idea?
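To see why the number of digits matters, here is a small stdlib-only Python sketch (not GPAW code; the coordinates are made up for illustration). Writing 17 significant digits always reproduces an IEEE-754 double exactly, whereas a fixed six-decimal format, like a typical log line, does not.

```python
def to_text(values, fmt):
    """Write floats as one whitespace-separated line, as a text log might."""
    return " ".join(format(v, fmt) for v in values)


def from_text(line):
    """Read such a line back into floats."""
    return [float(tok) for tok in line.split()]


# Made-up Cartesian coordinates of one atom (Angstrom).
position = [1.2345678901234567, 0.1, 3.0 / 7.0]

six_decimals = from_text(to_text(position, ".6f"))
all_digits = from_text(to_text(position, ".17g"))

print(six_decimals == position)  # False: digits were lost
print(all_digits == position)    # True: exact round trip
```

Python's `repr()` of a float also round-trips, using the shortest string that does.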
>
> At least I am not very fond of writing all the digits to a text file...
> One possibility might be to copy the forces and energy read from the
> trajectory to the GPAW calculator, and tell GPAW that the calculation
> is already converged.
>
> With BFGS, one can do a single step directly after reading the image,
> without a calculator attached (just with the forces read from the image),
> e.g.
>
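> import os
> from ase.io import read
> from ase.io.trajectory import PickleTrajectory  # 2013-era ASE API
> from ase.optimize import BFGS
>
> if os.path.isfile('opt.traj'):
>     atoms = read('opt.traj')  # last image, with its stored forces
>     traj = PickleTrajectory('opt.traj', 'a', atoms=atoms)  # append mode
>     opt = BFGS(atoms, trajectory=traj, logfile='qn.log')
>     opt.run(steps=1)  # one step from the stored forces only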
>
> but that does not work with optimizers that perform linesearch.
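Jussi's idea of feeding the stored energy and forces back in can be sketched generically. The class below is hypothetical (not an ASE or GPAW API): it answers from a cached record when asked about exactly the geometry saved in the trajectory, and otherwise calls the real force evaluation. A toy harmonic potential stands in for GPAW. Since an optimizer simply requests forces, this approach should also suit line-search optimizers, but it only saves the one repeated evaluation; it does not restore optimizer history such as the BFGS Hessian.

```python
class ReplayFirstPoint:
    """Hypothetical wrapper, not an ASE or GPAW class.

    Returns a cached (energy, forces) pair when asked about exactly the
    geometry it was built from (e.g. the last image of a trajectory) and
    delegates every other geometry to the real, expensive evaluation.
    """

    def __init__(self, positions, energy, forces, compute):
        self._key = self._as_key(positions)
        self._energy = energy
        self._forces = forces
        self._compute = compute  # callable: positions -> (energy, forces)
        self.calls = 0           # number of real evaluations performed

    @staticmethod
    def _as_key(positions):
        # Exact comparison on purpose: the cache is only valid for the
        # very same coordinates that were stored.
        return tuple(tuple(p) for p in positions)

    def evaluate(self, positions):
        if self._as_key(positions) == self._key:
            return self._energy, self._forces
        self.calls += 1
        return self._compute(positions)


def harmonic(positions):
    """Toy stand-in for a DFT force calculation: E = sum x^2, F = -2x."""
    energy = sum(x * x for p in positions for x in p)
    forces = [[-2.0 * x for x in p] for p in positions]
    return energy, forces


# Pretend these values were read from the last image of a trajectory.
last_positions = [[0.5, 0.0, 0.0]]
last_energy, last_forces = harmonic(last_positions)

calc = ReplayFirstPoint(last_positions, last_energy, last_forces, harmonic)
calc.evaluate([[0.5, 0.0, 0.0]])  # served from the cache
calc.evaluate([[0.4, 0.0, 0.0]])  # new geometry: real evaluation
```

After the two calls, `calc.calls` is 1: only the genuinely new geometry triggered a computation.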
>
> Best regards,
> Jussi
>
>>
>> Jens Jørgen
>>
>>> ----- Original Message -----
>>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>> To: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>>> Sent: Thursday, January 17, 2013 7:00:08 AM
>>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>>> On 16-01-2013 17:55, Nichols A. Romero wrote:
>>>>> I should add that this method is clearly not fault tolerant. Is that
>>>>> what you are thinking of? For example, a node has an error in the
>>>>> middle of a force evaluation, which brings down the whole code.
>>>> No, I wasn't thinking about such cases.
>>>>
>>>> I think most people will just let their jobs run until they get killed
>>>> by the queuing system. If you do this:
>>>>
>>>> atoms = ...
>>>> atoms.set_calculator(GPAW(...))
>>>> opt = Optimizer(atoms, trajectory='a1.traj')
>>>> opt.run(fmax=0.05)
>>>>
>>>> Let's say GPAW is stopped in the middle of calculating the forces for
>>>> image 8. Then the last image in a1.traj will be image 7, together with
>>>> its forces. If you then do:
>>>>
>>>> atoms = read('a1.traj')
>>>> atoms.set_calculator(GPAW(...))
>>>> opt = Optimizer(atoms, trajectory='a2.traj')
>>>> opt.run(fmax=0.05)
>>>>
>>>> GPAW will recalculate the forces for image 7 ...
>>>>
>>>> How does one solve this problem? Read atoms and calculator from a gpw
>>>> file?
>>>>
>>>> Jens Jørgen
>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>>>>> To: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>>>>> Sent: Wednesday, January 16, 2013 10:49:38 AM
>>>>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>>>>> JJ,
>>>>>>
>>>>>> I deal with it by not allowing it to happen.
>>>>>> http://en.pastebin.ca/2303447
>>>>>>
>>>>>> Basically, I take the wall time requested from the scheduler (PBS, LSF,
>>>>>> or whatever) and pass it as a parameter to an observer.
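The pastebin script is not reproduced here; what follows is only a minimal sketch of the general idea, assuming the wall time is passed in by hand (for example copied from the PBS or LSF request). `WalltimeGuard` and `WalltimeExceeded` are hypothetical names. The guard is meant to be attached as an optimizer observer, called after every ionic step, and raises an exception when the next step, estimated as the slowest step so far, would not finish before the deadline minus a safety margin. A fake clock replaces real time in the demonstration.

```python
import time


class WalltimeExceeded(RuntimeError):
    """Raised to stop an optimization cleanly between ionic steps."""


class WalltimeGuard:
    """Hypothetical observer that stops a run before the wall-time limit.

    Assumes it is called once after every ionic step and that the next
    step will take about as long as the slowest step seen so far.
    """

    def __init__(self, walltime_seconds, margin_seconds=300.0,
                 clock=time.monotonic):
        self._clock = clock
        start = clock()
        self._deadline = start + walltime_seconds - margin_seconds
        self._last = start
        self._slowest = 0.0

    def __call__(self):
        now = self._clock()
        # Track the longest step so far as a conservative estimate.
        self._slowest = max(self._slowest, now - self._last)
        self._last = now
        if now + self._slowest > self._deadline:
            raise WalltimeExceeded(
                f"next step (~{self._slowest:.0f} s) would pass the deadline")


class FakeClock:
    """Manually advanced clock for the demonstration below."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t


clock = FakeClock()
guard = WalltimeGuard(3600.0, margin_seconds=300.0, clock=clock)
stopped_at = None
for t in (1000.0, 2000.0, 3000.0):
    clock.t = t  # pretend each ionic step took 1000 s
    try:
        guard()
    except WalltimeExceeded:
        stopped_at = t
        break
```

With a real ASE optimizer one would call `opt.attach(guard)` and wrap `opt.run(fmax=0.05)` in `try`/`except WalltimeExceeded`, writing a restart file in the handler. In the demonstration the run stops after the step ending at 3000 s, since another 1000 s step would pass the 3300 s deadline.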
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>>>> To: gpaw-users at listserv.fysik.dtu.dk
>>>>>>> Sent: Wednesday, January 16, 2013 10:34:27 AM
>>>>>>> Subject: [gpaw-users] Restarting an optimization run
>>>>>>> Hi!
>>>>>>>
>>>>>>> I'd like to know how people continue optimization runs with GPAW that
>>>>>>> are killed in the middle of a force calculation.
>>>>>>>
>>>>>>> Do you have some if-else magic in your script to handle both the first
>>>>>>> run and a continuation run, or do you just edit the first script to start
>>>>>>> from the last image in the trajectory file from the previous run?
>>>>>>>
>>>>>>> Do you take care to avoid repeating the force calculation for the last
>>>>>>> image in the trajectory file you continue from? If yes, how?
>>>>>>>
>>>>>>> Jens Jørgen
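One way to do the "if-else magic" is a small helper that decides, at job start, where a (re)submitted run should begin. This is a hypothetical sketch: the function name, the file names, and the preference order (restart file, then trajectory, then a fresh start) are assumptions drawn from the options discussed in this thread, and the actual GPAW/ASE reading calls are left out.

```python
import os


def choose_start(gpw_file, traj_file):
    """Hypothetical helper deciding how a (re)submitted job should start.

    Preference order, following the options discussed in this thread:
    1. a restart file (gpw/HDF5): positions plus saved calculator state,
       so the SCF for that image can start from converged data;
    2. the last image of the trajectory: positions only, so that image's
       forces will be computed again;
    3. neither: a fresh start from the initial structure.
    """
    if os.path.isfile(gpw_file):
        return "restart-file"
    # An empty trajectory (job killed before the first image) is useless.
    if os.path.isfile(traj_file) and os.path.getsize(traj_file) > 0:
        return "trajectory"
    return "fresh"
```

The calling script then branches on the returned string, for instance using GPAW's `restart()` for a restart file or `ase.io.read()` for the trajectory.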
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> gpaw-users mailing list
>>>>>>> gpaw-users at listserv.fysik.dtu.dk
>>>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>>>>> --
>>>>>> Nichols A. Romero, Ph.D.
>>>>>> Argonne Leadership Computing Facility
>>>>>> Argonne National Laboratory
>>>>>> Building 240 Room 2-127
>>>>>> 9700 South Cass Avenue
>>>>>> Argonne, IL 60490
>>>>>> (630) 252-3441
>>>>>>
>>>>>>
>>
>>
>


