[gpaw-users] Restarting an optimization run
Jussi Enkovaara
jussi.enkovaara at aalto.fi
Fri Jan 18 11:59:51 CET 2013
On 2013-01-18 11:44, Jens Jørgen Mortensen wrote:
> On 17-01-2013 17:31, Nichols A. Romero wrote:
>> JJ,
>>
>> I have a different script for restarting.
>>
>> Note that this one doesn't have the shutdown observer
>> http://en.pastebin.ca/2303938
>>
>> But I basically do as Jussi, I restart from HDF5.
>>
>> Which we highly encourage people to use, because it works very well. Even on 100,000 cores :)
>>
>> I agree with what you say, most people just submit a structural optimization for the maximum walltime and just let GPAW run.
>> Then it's just interrupted either in the middle of an SCF or between ionic steps.
>>
>> I think all these tricks should get documented somewhere. Because the most common example of a GPAW calculation is a structural optimization.
>
> So, one needs to write a gpw or hdf file after every step in order to
> make this work! Hmm ... these files are huge and contain a lot of
> stuff that you normally don't need and they also increase network
> traffic. I wish there was a simple way to restart from a trajectory
> file without losing the last step.
I do not think restart files are that huge if you do not save the
wavefunctions (they are in any case definitely larger than trajectory
files). Also, I think that at least in some cases it is worthwhile
to save restart files during the SCF cycles. However, I agree that
it would be useful to be able to restart also from a trajectory.
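A minimal sketch of the time-limit idea from the quoted message (pass the requested scheduler walltime to an observer), so that the run stops between steps instead of being killed in the middle of a force calculation. The names WalltimeGuard, margin and run_steps are hypothetical and not part of ASE or GPAW; the step function stands in for one optimizer step.

```python
import time


class WalltimeGuard:
    """Report when the remaining walltime drops below a safety margin.

    walltime and margin are in seconds. clock is injectable so the
    logic can be exercised without waiting for real time to pass.
    """

    def __init__(self, walltime, margin, clock=time.time):
        self.clock = clock
        # Stop `margin` seconds before the scheduler would kill the job.
        self.deadline = clock() + walltime - margin

    def time_left(self):
        return self.deadline - self.clock()

    def should_stop(self):
        return self.time_left() <= 0.0


def run_steps(step, guard, max_steps):
    """Call step() until max_steps is reached or the guard says stop.

    Returns the number of steps actually taken.
    """
    for n in range(max_steps):
        if guard.should_stop():
            return n
        step()
    return max_steps
```

In an ASE script the same check could presumably be called from a function registered with opt.attach(...), writing a restart file before exiting. The margin has to be longer than one force evaluation, otherwise the job can still be killed mid-step.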
> We could work on making it possible to restart from GPAW's text output.
> We would need to write all the digits for positions and unit cell in
> order not to lose accuracy. Would this be a good idea?
At least I am not very fond of writing all the digits to a text file...
One possibility might be to copy the forces and energy which are read
from the trajectory to the GPAW calculator, and indicate to GPAW that the
calculation is already converged.
With BFGS, one can do a single step directly after reading the image
without a calculator attached (just with the forces read from the image),
e.g.
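import os
from ase.io import read, PickleTrajectory
from ase.optimize import BFGS

if os.path.isfile('opt.traj'):
    # Last image of the interrupted run, with the forces stored in the file
    atoms = read('opt.traj')
    traj = PickleTrajectory('opt.traj', 'a', atoms=atoms)
    opt = BFGS(atoms, trajectory=traj, logfile='qn.log')
    opt.run(steps=1)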
but that does not work with optimizers that perform linesearch.
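To let the same script serve both the first run and a continuation, the file check above can be wrapped in a small helper. This is only a sketch using the standard library; start_mode and trajectory_open_mode are hypothetical names, and treating an empty file as a fresh start is an assumption about what a job killed before its first image leaves behind.

```python
import os


def start_mode(traj_path):
    """Return 'restart' if a non-empty trajectory file exists, else 'fresh'."""
    # A zero-byte file may be left behind if the job dies before the
    # first image is written; there is nothing to restart from then.
    if os.path.isfile(traj_path) and os.path.getsize(traj_path) > 0:
        return 'restart'
    return 'fresh'


def trajectory_open_mode(traj_path):
    """File mode for the trajectory: append on restart, write otherwise."""
    return 'a' if start_mode(traj_path) == 'restart' else 'w'
```

On 'restart' one would read the last image and append to the same file as in the snippet above; on 'fresh' one would build the initial structure and open the trajectory for writing.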
Best regards,
Jussi
>
> Jens Jørgen
>
>> ----- Original Message -----
>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>> To: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>> Sent: Thursday, January 17, 2013 7:00:08 AM
>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>> On 16-01-2013 17:55, Nichols A. Romero wrote:
>>>> I should add that this method is clearly not fault tolerant, is that
>>>> what you are thinking of? For example, some node has an error in
>>>> the middle of a force evaluation which brings down the whole code.
>>> No, I wasn't thinking about such cases.
>>>
>>> I think most people will just let their jobs run until they get killed
>>> by the queuing system. If you do this:
>>>
>>> atoms = ...
>>> atoms.set_calculator(GPAW(...))
>>> opt = Optimizer(atoms, trajectory='a1.traj')
>>> opt.run(fmax=0.05)
>>>
>>> Let's say GPAW is stopped in the middle of calculating the forces for
>>> image 8. Then the last image in a1.traj will be image 7 and
>>> corresponding forces. If you then do:
>>>
>>> atoms = read('a1.traj')
>>> atoms.set_calculator(GPAW(...))
>>> opt = Optimizer(atoms, trajectory='a2.traj')
>>> opt.run(fmax=0.05)
>>>
>>> GPAW will recalculate the forces for image 7 ...
>>>
>>> How does one solve this problem? Read atoms and calculator from a gpw
>>> file?
>>>
>>> Jens Jørgen
>>>
>>>> ----- Original Message -----
>>>>> From: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>>>> To: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>>>> Sent: Wednesday, January 16, 2013 10:49:38 AM
>>>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>>>> JJ,
>>>>>
>>>>> I deal with it by not allowing it to happen.
>>>>> http://en.pastebin.ca/2303447
>>>>>
>>>>> Basically, I use the requested scheduler time (PBS, LSF, or whatever)
>>>>> and use that as a parameter to an observer.
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>>> To: gpaw-users at listserv.fysik.dtu.dk
>>>>>> Sent: Wednesday, January 16, 2013 10:34:27 AM
>>>>>> Subject: [gpaw-users] Restarting an optimization run
>>>>>> Hi!
>>>>>>
>>>>>> I'd like to know how people continue optimization runs with GPAW that
>>>>>> are killed in the middle of a force-calculation.
>>>>>>
>>>>>> Do you have some if-else magic in your script to handle both the first
>>>>>> run and a continuation run, or do you just edit the first script to start
>>>>>> from the last image in the trajectory file from the previous run?
>>>>>>
>>>>>> Do you worry about not repeating the force-calculation for the last
>>>>>> image in the trajectory file you continue from? If yes, how?
>>>>>>
>>>>>> Jens Jørgen
>>>>>>
>>>>>> _______________________________________________
>>>>>> gpaw-users mailing list
>>>>>> gpaw-users at listserv.fysik.dtu.dk
>>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>>>> --
>>>>> Nichols A. Romero, Ph.D.
>>>>> Argonne Leadership Computing Facility
>>>>> Argonne National Laboratory
>>>>> Building 240 Room 2-127
>>>>> 9700 South Cass Avenue
>>>>> Argonne, IL 60490
>>>>> (630) 252-3441
>>>>>
>>>>>