[gpaw-users] Restarting an optimization run
Jens Jørgen Mortensen
jensj at fysik.dtu.dk
Fri Jan 18 10:44:20 CET 2013
On 17-01-2013 17:31, Nichols A. Romero wrote:
> JJ,
>
> I have a different script for restarting.
>
> Note that this one doesn't have the shutdown observer:
> http://en.pastebin.ca/2303938
>
> But basically I do as Jussi does: I restart from HDF5.
>
> We highly encourage people to use it because it works very well, even on 100,000 cores :)
>
> I agree with what you say: most people just submit a structural optimization with the maximum walltime and let GPAW run.
> It then gets interrupted either in the middle of an SCF cycle or between ionic steps.
>
> I think all these tricks should be documented somewhere, because the most common example of a GPAW calculation is a structural optimization.
So, one needs to write a gpw or hdf file after every step in order to
make this work! Hmm ... these files are huge, contain a lot of stuff
that you normally don't need, and also increase network traffic. I wish
there was a simple way to restart from a trajectory file without losing
the last step.
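A lightweight alternative to a full gpw/HDF5 file would be to checkpoint only what an optimizer needs to continue: positions, cell, energy and forces. The sketch below is stdlib-only and hypothetical (the file name `checkpoint.json` and the helper names are illustrative, not part of GPAW or ASE). It writes to a temporary file and then calls `os.replace`, which should be atomic on POSIX file systems, so a job killed mid-write leaves the previous checkpoint intact; `json` writes floats with their round-trip `repr`, so no digits are lost.

```python
import json
import os
import tempfile


def write_checkpoint(path, positions, cell, energy, forces):
    """Atomically write a small restart file (hypothetical format)."""
    data = {
        "positions": positions,  # list of [x, y, z]
        "cell": cell,            # 3x3 list
        "energy": energy,
        "forces": forces,        # list of [fx, fy, fz]
    }
    # Write to a temporary file in the same directory first, then rename it
    # over the old checkpoint so readers never see a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.replace(tmp, path)


def read_checkpoint(path):
    """Return the stored dict, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)
```

In an ASE script the arrays could come from calls such as `atoms.get_positions().tolist()`, and the writer could be registered with the optimizer's `attach` method so that it runs every step; that wiring is an assumption, not something tested here.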
We could work on making it possible to restart from GPAW's text output.
We would need to write all the digits for positions and unit cell in
order not to lose accuracy. Would this be a good idea?
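To illustrate why all the digits matter, the snippet below writes a coordinate as text with a fixed number of decimals and with Python's shortest round-trip representation (`repr`), then reads it back. It only demonstrates the formatting issue; it is not GPAW's text-output code, and `format_position` is a hypothetical helper.

```python
def format_position(x, digits=None):
    """Format one coordinate; digits=None uses the round-trip repr."""
    if digits is None:
        return repr(x)
    return f"{x:.{digits}f}"


def round_trip_error(x, digits=None):
    """Absolute error after writing x as text and reading it back."""
    return abs(float(format_position(x, digits)) - x)


x = 2.0 / 3.0  # a coordinate that cannot be written exactly with 6 decimals
print(format_position(x, 6), round_trip_error(x, 6))  # small but nonzero error
print(format_position(x), round_trip_error(x))        # repr reads back exactly
```

A restart whose positions differ from the last trajectory image by about 1e-7 would no longer be the same geometry, so any cached forces for that image could not be reused safely.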
Jens Jørgen
> ----- Original Message -----
>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>> To: "Nichols A. Romero" <naromero at alcf.anl.gov>
>> Cc: gpaw-users at listserv.fysik.dtu.dk
>> Sent: Thursday, January 17, 2013 7:00:08 AM
>> Subject: Re: [gpaw-users] Restarting an optimization run
>> On 16-01-2013 17:55, Nichols A. Romero wrote:
>>> I should add that this method is clearly not fault tolerant; is that
>>> what you are thinking of? For example, some node has an error in
>>> the middle of a force evaluation, which brings down the whole code.
>> No, I wasn't thinking about such cases.
>>
>> I think most people will just let their jobs run until they get killed
>> by the queuing system. If you do this:
>>
>> from ase.optimize import BFGS as Optimizer  # any ASE optimizer
>> from gpaw import GPAW
>>
>> atoms = ...
>> atoms.set_calculator(GPAW(...))
>> opt = Optimizer(atoms, trajectory='a1.traj')
>> opt.run(fmax=0.05)
>>
>> Let's say GPAW is stopped in the middle of calculating the forces for
>> image 8. Then the last image in a1.traj will be image 7 and
>> corresponding forces. If you then do:
>>
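>> from ase.io import read
>> from ase.optimize import BFGS as Optimizer  # any ASE optimizer
>> from gpaw import GPAW
>>
>> atoms = read('a1.traj')  # last image in the file
>> atoms.set_calculator(GPAW(...))
>> opt = Optimizer(atoms, trajectory='a2.traj')
>> opt.run(fmax=0.05)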
>>
>> GPAW will recalculate the forces for image 7 ...
>>
>> How does one solve this problem? Read atoms and calculator from a gpw
>> file?
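One way to avoid paying twice for image 7 is to seed the new run with the energy and forces already stored for that image, and to call the expensive calculator only when the positions actually change. The stdlib sketch below shows that caching idea with a hypothetical `CachedForces` wrapper around any `compute(positions) -> (energy, forces)` function; it is not a GPAW or ASE calculator. In real ASE code the seed values would come from the trajectory (reading a `.traj` image normally attaches a single-point calculator holding its energy and forces), which is worth confirming for your ASE version.

```python
class CachedForces:
    """Reuse known results for one geometry; compute otherwise (hypothetical)."""

    def __init__(self, compute, positions=None, energy=None, forces=None,
                 tol=1e-10):
        self.compute = compute  # expensive: positions -> (energy, forces)
        self.tol = tol          # max coordinate difference treated as "same"
        self.cache = None
        if positions is not None and forces is not None:
            # Seed with the last image of the previous run's trajectory.
            self.cache = (self._copy(positions), energy, forces)

    @staticmethod
    def _copy(positions):
        return [list(map(float, p)) for p in positions]

    def _same(self, a, b):
        return len(a) == len(b) and all(
            abs(x - y) <= self.tol
            for pa, pb in zip(a, b)
            for x, y in zip(pa, pb)
        )

    def get(self, positions):
        """Return (energy, forces), calling compute only for new geometries."""
        if self.cache is not None and self._same(self.cache[0], positions):
            return self.cache[1], self.cache[2]
        energy, forces = self.compute(positions)
        self.cache = (self._copy(positions), energy, forces)
        return energy, forces
```

The tolerance is the weak point: if the restart path rounds positions (for example through limited-precision text output), the comparison fails and the forces are recomputed anyway.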
>>
>> Jens Jørgen
>>
>>> ----- Original Message -----
>>>> From: "Nichols A. Romero" <naromero at alcf.anl.gov>
>>>> To: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
>>>> Sent: Wednesday, January 16, 2013 10:49:38 AM
>>>> Subject: Re: [gpaw-users] Restarting an optimization run
>>>> JJ,
>>>>
>>>> I deal with it by not allowing it to happen.
>>>> http://en.pastebin.ca/2303447
>>>>
>>>> Basically, I take the walltime requested from the scheduler (PBS,
>>>> LSF, or whatever) and pass it as a parameter to an observer.
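The pastebin script is not reproduced here, but the idea might be sketched as follows: an observer, called once per optimizer iteration, compares the elapsed time plus the slowest iteration so far against the requested walltime and stops cleanly if another iteration would probably not fit. The class name, the safety margin and the `StopOptimization` exception are hypothetical; in ASE such a callable would be registered with the optimizer's `attach` method (e.g. `opt.attach(observer, interval=1)`).

```python
import time


class StopOptimization(Exception):
    """Raised to leave opt.run() before the scheduler kills the job."""


class WalltimeObserver:
    """Stop when another step would likely exceed the walltime (hypothetical)."""

    def __init__(self, walltime, margin=60.0, clock=time.monotonic):
        self.walltime = walltime  # seconds requested from PBS/LSF/...
        self.margin = margin      # seconds reserved for writing restart files
        self.clock = clock
        self.start = clock()      # create the observer early in the script
        self.last = self.start
        self.longest_step = 0.0

    def __call__(self):
        now = self.clock()
        self.longest_step = max(self.longest_step, now - self.last)
        self.last = now
        elapsed = now - self.start
        # Assume the next step takes no longer than the slowest one so far.
        if elapsed + self.longest_step + self.margin > self.walltime:
            raise StopOptimization(f"stopping after {elapsed:.0f} s")
```

Wrapping `opt.run(...)` in `try/except StopOptimization` would then let the script write its restart file before the queue kills the job.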
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
>>>>> To: gpaw-users at listserv.fysik.dtu.dk
>>>>> Sent: Wednesday, January 16, 2013 10:34:27 AM
>>>>> Subject: [gpaw-users] Restarting an optimization run
>>>>> Hi!
>>>>>
>>>>> I'd like to know how people continue optimization runs with GPAW
>>>>> that are killed in the middle of a force calculation.
>>>>>
>>>>> Do you have some if-else magic in your script to handle both the
>>>>> first run and a continuation run, or do you just edit the first
>>>>> script to start from the last image in the trajectory file from
>>>>> the previous run?
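The if-else part could be kept small by numbering the trajectory files as in the example above (`a1.traj`, `a2.traj`, ...): look for the highest-numbered file, restart from it if one exists, and otherwise start from scratch. The helper below is a hypothetical stdlib sketch of only the file-selection logic; the actual reading would be done with something like `ase.io.read(previous)`.

```python
import os
import re


def next_trajectory(directory=".", prefix="a", suffix=".traj"):
    """Return (previous, new): the latest existing trajectory name (or None)
    and the name the next run should write to."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+){re.escape(suffix)}$")
    numbers = [
        int(m.group(1))
        for name in os.listdir(directory)
        if (m := pattern.match(name))
    ]
    if not numbers:
        return None, f"{prefix}1{suffix}"  # first run
    last = max(numbers)
    return f"{prefix}{last}{suffix}", f"{prefix}{last + 1}{suffix}"
```

A script would then do `previous, traj = next_trajectory()` and build the initial structure only when `previous` is `None`, so the same file serves for both the first run and every continuation.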
>>>>>
>>>>> Do you try to avoid repeating the force calculation for the last
>>>>> image in the trajectory file you continue from? If yes, how?
>>>>>
>>>>> Jens Jørgen
>>>>>
>>>>> _______________________________________________
>>>>> gpaw-users mailing list
>>>>> gpaw-users at listserv.fysik.dtu.dk
>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>>> --
>>>> Nichols A. Romero, Ph.D.
>>>> Argonne Leadership Computing Facility
>>>> Argonne National Laboratory
>>>> Building 240 Room 2-127
>>>> 9700 South Cass Avenue
>>>> Argonne, IL 60490
>>>> (630) 252-3441
>>>>
>>>>