[gpaw-users] Restarting an optimization run
Nichols A. Romero
naromero at alcf.anl.gov
Fri Jan 18 16:49:45 CET 2013
Calculating the forces is not a big deal for a small system, but we (ANL people) have done a geometry relaxation of a Pt 961 cluster with the FD code. Recalculating anything was quite painful.
Are the forces stored in either the .traj or .pckl file? If so, the thing to do is to have the calculator not throw away the forces in restart mode, and to add a new restart mode that does not require a .gpw or .hdf5 file. We can call it restart 'lite'.
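A minimal sketch of the restart 'lite' idea, in plain Python with no ASE or GPAW imports. Everything named here (CachedForces, toy_forces, the sample positions and forces) is hypothetical and only illustrates the logic: keep the forces read from the last image, and redo the expensive calculation only once the positions actually change.

```python
import math


class CachedForces:
    """Hypothetical wrapper: reuse stored forces while positions are unchanged."""

    def __init__(self, compute, positions, forces, tol=1e-12):
        self.compute = compute          # expensive force function (assumed)
        self.positions = [tuple(p) for p in positions]
        self.forces = [tuple(f) for f in forces]
        self.tol = tol
        self.hits = 0                   # number of calls answered from the cache

    def _unchanged(self, positions):
        if len(positions) != len(self.positions):
            return False
        return all(
            math.isclose(a, b, rel_tol=0.0, abs_tol=self.tol)
            for p, q in zip(positions, self.positions)
            for a, b in zip(p, q)
        )

    def get_forces(self, positions):
        if self._unchanged(positions):
            self.hits += 1
            return self.forces
        # Positions moved: fall through to the real calculation and re-cache.
        self.forces = [tuple(f) for f in self.compute(positions)]
        self.positions = [tuple(p) for p in positions]
        return self.forces


calls = []


def toy_forces(positions):
    """Stand-in for an expensive calculation: harmonic pull towards the origin."""
    calls.append(1)
    return [tuple(-x for x in p) for p in positions]


# Pretend these were read from the last image of a trajectory file.
last_positions = [(0.0, 0.0, 0.1), (0.0, 0.0, 1.2)]
last_forces = [(0.0, 0.0, -0.1), (0.0, 0.0, -1.2)]

calc = CachedForces(toy_forces, last_positions, last_forces)
first = calc.get_forces(last_positions)                        # served from cache
moved = calc.get_forces([(0.0, 0.0, 0.05), (0.0, 0.0, 1.1)])   # recomputed
```

The first call is answered from the stored forces without touching toy_forces; only the second call, with displaced atoms, triggers a new calculation. A real version would also have to carry the energy and to check the unit cell and calculator parameters, which this sketch leaves out.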
Usually I don't think too much about the space wasted by .gpw or .hdf5 files, but if you have a Linux cluster where everyone is running GPAW, I could see it using up a lot of space quite quickly. Hopefully at CAMd people are using GPAW and not VASP :)
----- Original Message -----
> From: "Michael Walter" <Michael.Walter at fmf.uni-freiburg.de>
> To: "Jens Jørgen Mortensen" <jensj at fysik.dtu.dk>
> Cc: gpaw-users at listserv.fysik.dtu.dk
> Sent: Friday, January 18, 2013 7:17:13 AM
> Subject: Re: [gpaw-users] Restarting an optimization run
> 2013/1/18 Jens Jørgen Mortensen < jensj at fysik.dtu.dk >
>
>
> Den 18-01-2013 12:46, Ask Hjorth Larsen skrev:
>
> > Hi
> >
> > Could someone summarize what exactly the problem with trajectory
> > restarts was? Isn't it just a question of fixing the optimizers so
> > they use all the data available?
>
> If you restart like this:
>
>
> atoms = read('a1.traj')
> atoms.set_calculator(GPAW(...))
> opt = Optimizer(atoms, trajectory='a2.traj')
> opt.run(fmax=0.05)
>
>
> then after the first line, atoms will have a SinglePointCalculator
> object as its calculator and this object knows about the forces from
> the last image in the trajectory.
>
> Second line: the SinglePointCalculator is replaced by a fresh GPAW
> calculator which doesn't know anything.
>
> Line 4: The forces are calculated again :-(
>
>
>
> Is it that bad? The electronic structure, i.e. the Kohn-Sham states,
> has to be calculated again anyway. It should cost little to
> calculate the forces again.
>
>
> In particular, in an optimization run that can have up to hundreds of
> steps (my structure guesses are bad sometimes ;) this should be a
> vanishingly low computational effort.
>
>
> Or did I overlook something? I always restart from the trajectory
> file in the way of the "bad" example above.
>
>
> Best,
> Michael
>
>
>
> Jens Jørgen
>
>
>
> > Regards
> > Ask
> >
> > 2013/1/18 Jussi Enkovaara < jussi.enkovaara at aalto.fi >:
> >> On 2013-01-18 11:44, Jens Jørgen Mortensen wrote:
> >>> Den 17-01-2013 17:31, Nichols A. Romero skrev:
> >>>> JJ,
> >>>>
> >>>> I have a different script for restarting.
> >>>>
> >>>> Note that this one doesn't have the shutdown observer
> >>>> http://en.pastebin.ca/2303938
> >>>>
> >>>> But I basically do as Jussi, I restart from HDF5.
> >>>>
> >>>> Which we highly encourage people to use, because it works very
> >>>> well. Even on 100,000 cores :)
> >>>>
> >>>> I agree with what you say, most people just submit structural
> >>>> optimization for the maximum walltime and just let GPAW run.
> >>>> Then it's just interrupted either in the middle of an SCF or
> >>>> between ionic steps.
> >>>>
> >>>> I think all these tricks should get documented somewhere. Because
> >>>> the most common example of a GPAW calculation is a structural
> >>>> optimization.
> >>> So, one needs to write a gpw or hdf file after every step in order
> >>> to make this work! Hmm ... these files are huge and contain a lot
> >>> of stuff that you normally don't need, and they also increase
> >>> network traffic. I wish there was a simple way to restart from a
> >>> trajectory file without losing the last step.
> >> I do not think restart files are that huge if you do not save the
> >> wavefunctions (they are in any case definitely larger than
> >> trajectory
> >> files). Also, I think that at least in some cases it is worthwhile
> >> to save restart files during the SCF cycles. However, I agree that
> >> it would be useful to be able to restart also from a trajectory.
> >>
> >>> We could work on making it possible to restart from GPAW's text
> >>> output. We would need to write all the digits for positions and
> >>> unit cell in order not to lose accuracy. Would this be a good idea?
> >> At least I am not very fond of writing all the digits to a text
> >> file...
> >> One possibility might be to copy the forces and energy which are
> >> read from the trajectory to the GPAW calculator, and indicate to
> >> GPAW that the calculation is already converged.
> >>
> >> With BFGS, one can do a single step directly after reading the
> >> image without a calculator attached (just with the forces read from
> >> the image), e.g.
> >>
> >> if os.path.isfile('opt.traj'):
> >>     atoms = read('opt.traj')
> >>     traj = PickleTrajectory('opt' + '.traj', 'a', atoms=atoms)
> >>     opt = BFGS(atoms, trajectory=traj, logfile='qn.log')
> >>     opt.run(steps=1)
> >>
> >> but that does not work with optimizers that perform a line search.
> >>
> >> Best regards,
> >> Jussi
> >>
> >>> Jens Jørgen
> >>>
> >>>> ----- Original Message -----
> >>>>> From: "Jens Jørgen Mortensen" < jensj at fysik.dtu.dk >
> >>>>> To: "Nichols A. Romero" < naromero at alcf.anl.gov >
> >>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
> >>>>> Sent: Thursday, January 17, 2013 7:00:08 AM
> >>>>> Subject: Re: [gpaw-users] Restarting an optimization run
> >>>>> Den 16-01-2013 17:55, Nichols A. Romero skrev:
> >>>>>> I should add that this method is clearly not fault tolerant. Is
> >>>>>> that what you are thinking of? For example, some node has an
> >>>>>> error in the middle of a force evaluation which brings down the
> >>>>>> whole code.
> >>>>> No, I wasn't thinking about such cases.
> >>>>>
> >>>>> I think most people will just let their jobs run until they get
> >>>>> killed by the queuing system. If you do this:
> >>>>>
> >>>>> atoms = ...
> >>>>> atoms.set_calculator(GPAW(...))
> >>>>> opt = Optimizer(atoms, trajectory='a1.traj')
> >>>>> opt.run(fmax=0.05)
> >>>>>
> >>>>> Let's say GPAW is stopped in the middle of calculating the
> >>>>> forces for image 8. Then the last image in a1.traj will be image
> >>>>> 7 and the corresponding forces. If you then do:
> >>>>>
> >>>>> atoms = read('a1.traj')
> >>>>> atoms.set_calculator(GPAW(...))
> >>>>> opt = Optimizer(atoms, trajectory='a2.traj')
> >>>>> opt.run(fmax=0.05)
> >>>>>
> >>>>> GPAW will recalculate the forces for image 7 ...
> >>>>>
> >>>>> How does one solve this problem? Read atoms and calculator from
> >>>>> a gpw file?
> >>>>>
> >>>>> Jens Jørgen
> >>>>>
> >>>>>> ----- Original Message -----
> >>>>>>> From: "Nichols A. Romero" < naromero at alcf.anl.gov >
> >>>>>>> To: "Jens Jørgen Mortensen" < jensj at fysik.dtu.dk >
> >>>>>>> Cc: gpaw-users at listserv.fysik.dtu.dk
> >>>>>>> Sent: Wednesday, January 16, 2013 10:49:38 AM
> >>>>>>> Subject: Re: [gpaw-users] Restarting an optimization run
> >>>>>>> JJ,
> >>>>>>>
> >>>>>>> I deal with it by not allowing it to happen.
> >>>>>>> http://en.pastebin.ca/2303447
> >>>>>>>
> >>>>>>> Basically, I use the requested scheduler time (PBS, LSF, or
> >>>>>>> whatever) and use that as a parameter to an observer.
> >>>>>>>
> >>>>>>>
> >>>>>>> ----- Original Message -----
> >>>>>>>> From: "Jens Jørgen Mortensen" < jensj at fysik.dtu.dk >
> >>>>>>>> To: gpaw-users at listserv.fysik.dtu.dk
> >>>>>>>> Sent: Wednesday, January 16, 2013 10:34:27 AM
> >>>>>>>> Subject: [gpaw-users] Restarting an optimization run
> >>>>>>>> Hi!
> >>>>>>>>
> >>>>>>>> I'd like to know how people continue optimization runs with
> >>>>>>>> GPAW that are killed in the middle of a force calculation.
> >>>>>>>>
> >>>>>>>> Do you have some if-else magic in your script to handle both
> >>>>>>>> the first run and a continuation run, or do you just edit the
> >>>>>>>> first script to start from the last image in the trajectory
> >>>>>>>> file from the previous run?
> >>>>>>>>
> >>>>>>>> Do you worry about not repeating the force calculation for
> >>>>>>>> the last image in the trajectory file you continue from? If
> >>>>>>>> yes, how?
> >>>>>>>>
> >>>>>>>> Jens Jørgen
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> gpaw-users mailing list
> >>>>>>>> gpaw-users at listserv.fysik.dtu.dk
> >>>>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
> >>>>>>> --
> >>>>>>> Nichols A. Romero, Ph.D.
> >>>>>>> Argonne Leadership Computing Facility
> >>>>>>> Argonne National Laboratory
> >>>>>>> Building 240 Room 2-127
> >>>>>>> 9700 South Cass Avenue
> >>>>>>> Argonne, IL 60490
> >>>>>>> (630) 252-3441
> >>>>>>>
> >>>>>>>
>
>
>
> --
> ------------------------------------------
> PD Dr Michael Walter
> Address: Freiburger Materialforschungszentrum
> Stefan-Meier-Straße 21
> D-79104 Freiburg i. Br.
> Germany
> Tel.: +49 761 203 4758 and +49 761 203 7695
> Fax: +49 761 203 4701
> email: Michael.Walter at fmf.uni-freiburg.de
> www: http://omnibus.uni-freiburg.de/~mw767
>
> publications:
> http://scholar.google.com/citations?user=vlmryKEAAAAJ&hl=en
--
Nichols A. Romero, Ph.D.
Argonne Leadership Computing Facility
Argonne National Laboratory
Building 240 Room 2-127
9700 South Cass Avenue
Argonne, IL 60490
(630) 252-3441