[gpaw-users] Regarding 'RuntimeError: Atoms objects on different processors are not identical!'
Ask Hjorth Larsen
asklarsen at gmail.com
Mon Mar 16 12:59:43 CET 2015
Hello
Yes. But the implementation is meant to avoid such things.
Could you please check with one of the common functionals if you get the
error, e.g. pbe? If you don't, then we know that the problem is within the
vdw-beef implementation which helps a lot.
Best regards
Ask
El 16/03/2015 11:31, "Yuriy Elesin" <yuel at topsoe.dk> escribió:
> Hi Ask,
>
> After updating to the latest trunk version of gpaw and ase the script
> works, but now instead of failing after 4-5 iterations, it fails after
> 20-70 iterations. It worked once out of about 8 runs. During that
> successful run it managed to converge after 74 iterations. The rest of the
> runs (running on 24-32 cores) it was reaching the tolerance of 1e-8 after
> some random 20-70 number of iterations.
>
> It appears that the error in atom positions on different cores increases
> over time and does not diffuse out. I do not know about Gpaw internal
> structure, but for MPI codes it might happen if part of the calculations
> are repeated on different MPI processes, and it is expected that the result
> of such duplicate calculation would be the same. If it is not the same (the
> reason might be different order of instruction execution) then it 1) might
> diffuse itself out if the problem if of diffusive type (or it is introduced
> artificially) or 2) accumulate the error if there is no natural physical
> diffusion in the problem and it is not introduced artificially.
>
> Again I do not know the internal structure of gpaw, but if the source of
> error is not clear, a solution might be is to introduce some artificial
> diffusion. E.g. all atom positions on different MPI ranks are averaged
> every now and then.
>
> Here is the input script:
> # ===========================================
> #!/usr/bin/env python
> from ase.io import read
> from ase.optimize import BFGS
> # from ase.data.molecules import molecule
> from ase import Atom,Atoms
> import numpy as np
> from ase.constraints import FixAtoms
> from gpaw import GPAW,FermiDirac, MixerSum
> from ase.visualize import view
> import ase.utils.geometry as geometry
>
> #read unit cell and positions
> atoms = read('dyn1.traj')
>
> # set and start calculator and optimizer
> calc =
> GPAW(xc='BEEF-vdW',kpts=[1,1,1],h=0.20,mixer=MixerSum(),spinpol=True,
> maxiter=300,occupations=FermiDirac(width=0.1),txt='gpaw.txt')
> atoms.set_calculator(calc)
>
> atoms.write('test.traj')
>
> dyn =BFGS(atoms,trajectory='dyn2.traj' ,logfile = 'dyn2.log')
> dyn.run(fmax=0.03)
> calc.write('gpaw.gpw')
> # ===========================================
> And the input file:
>
>
> Kind regards
> Yuriy
>
>
> From: "Yuriy Elesin" <yuel at topsoe.dk> To: Ask Hjorth Larsen <
> asklarsen at gmail.com>, Cc: gpaw-users <gpaw-users at listserv.fysik.dtu.dk>
> Date: 03/04/2015 02:53 PM Subject: Re: [gpaw-users] Regarding
> 'RuntimeError: Atoms objects on different processors are not identical!' Sent
> by: gpaw-users-bounces at listserv.fysik.dtu.dk
> ------------------------------
>
>
>
> Hi Ask
>
> The version we have is 0.10.0.11364. I will try the trunk and let you know
> the results
>
> Kind regards
> Yuriy
>
> From: Ask Hjorth Larsen <asklarsen at gmail.com> To: Yuriy Elesin <
> yuel at topsoe.dk>, Cc: Morten Niklas Gjerding <mogje at fysik.dtu.dk>,
> gpaw-users <gpaw-users at listserv.fysik.dtu.dk>, Hanne Falsig <
> hafa at topsoe.dk> Date: 03/03/2015 08:16 PM Subject: Re: [gpaw-users]
> Regarding 'RuntimeError: Atoms objects on different processors are not
> identical!'
>
> ------------------------------
>
>
>
> I recently changed something having to do with that part. You could try
> with trunk. Which version are you using?
>
> After the change, when the code runs the check it will allow for an error
> of 1e-8 (approx. square root of machine precision), then simply use the
> positions from the master process if it doesn't raise the error.
>
> Best regards
> Ask
>
> 2015-03-03 14:26 GMT+01:00 Yuriy Elesin <*yuel at topsoe.dk* <yuel at topsoe.dk>>:
>
> Thank you Morten and Ask!
>
> It has generated the output as a bunch of
> compare_atoms_r00**_positions.pickle files. Plus it generated one
> compare_atoms_fps_positions.pickle file. When I compare all the r00** files
> it indicates that positions on most of the cores are identical, except a
> few cores where the difference between positions generates one line with
> 4.33680869e-19 as difference:
> ....
> [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]
> [ 0.00000000e+00, 4.33680869e-19, 0.00000000e+00]
> [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]
> ....
>
> It is always the same value, when the positions are different. But again
> out of 24 cores there are only 8 (I think) cores with different then the
> rest coordinates.
>
> The compare_atoms_fps_positions.pickle file has the following content:
>
> array([-8145271529702367328, 3281356470061976748, 3281356470061976748,
> 3281356470061976748, 3281356470061976748, 3281356470061976748,
> 3281356470061976748, 3281356470061976748, 3281356470061976748,
> -8145271529702367328, 3281356470061976748, 3281356470061976748,
> 3281356470061976748, 3281356470061976748, -8145271529702367328,
> -8145271529702367328, 3281356470061976748, 3281356470061976748,
> 3281356470061976748, 3281356470061976748, -8145271529702367328,
> 3281356470061976748, 3281356470061976748, 3281356470061976748])
>
> Does it tell you anything? If it is just minor inaccuracies then how do we
> prevent the following error that seem to stop the calculation:
>
> RuntimeError: Atoms objects on different processors are not identical!
> GPAW CLEANUP (node 23): <type 'exceptions.RuntimeError'> occurred.
> Calling MPI_Abort!
>
>
> Kind regards
> Yuriy
> From: Morten Niklas Gjerding <*mogje at fysik.dtu.dk* <mogje at fysik.dtu.dk>>
> To: Yuriy Elesin <*yuel at topsoe.dk* <yuel at topsoe.dk>>, Ask Hjorth Larsen <
> *asklarsen at gmail.com* <asklarsen at gmail.com>>, Cc: gpaw-users <
> *gpaw-users at listserv.fysik.dtu.dk* <gpaw-users at listserv.fysik.dtu.dk>>
> Date: 03/03/2015 11:15 AM Subject: RE: [gpaw-users] Regarding
> 'RuntimeError: Atoms objects on different processors are not identical!'
>
>
> ------------------------------
>
>
>
> Hi Yury.
>
> I think that:
>
> gpaw-python script.py --debug
>
> is what you want.
>
> KR. Morten.
> ------------------------------
>
> * From:* *gpaw-users-bounces at listserv.fysik.dtu.dk*
> <gpaw-users-bounces at listserv.fysik.dtu.dk> [
> *gpaw-users-bounces at listserv.fysik.dtu.dk*
> <gpaw-users-bounces at listserv.fysik.dtu.dk>] on behalf of Yuriy Elesin [
> *yuel at topsoe.dk* <yuel at topsoe.dk>]
> * Sent:* Tuesday, March 03, 2015 9:56 AM
> * To:* Ask Hjorth Larsen
> * Cc:* gpaw-users
> * Subject:* Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on
> different processors are not identical!'
>
> Hi Ask
>
> How do we enable the debug mode in GPAW? I tried gpaw-python with -d
> option but it says it is only for the parser, so no files with faulty
> positions were generated.
>
> Kind regards
> Yuiry From: Ask Hjorth Larsen <*asklarsen at gmail.com*
> <asklarsen at gmail.com>> To: Hanne Falsig <*hafa at topsoe.dk*
> <hafa at topsoe.dk>>, Cc: gpaw-users <*gpaw-users at listserv.fysik.dtu.dk*
> <gpaw-users at listserv.fysik.dtu.dk>>, *yuel at topsoe.dk* <yuel at topsoe.dk>
> Date: 03/02/2015 04:03 PM Subject: Re: [gpaw-users] Regarding
> 'RuntimeError: Atoms objects on different processors are not identical!'
>
>
>
> ------------------------------
>
>
>
> Hi Hanne
>
> In trunk (and possibly in the old version; this I don't remember) the
> faulty positions will be dumped to files if you use debug mode. Then you
> can check whether they completely disagree (something is wrong with the
> parallelization, probably of the XC functional) or whether there's just
> some numerical noise (some calculation is duplicated in a way that allows
> numerical inaccuracies across different cores).
>
> Best regards
> Ask
>
> 2015-03-02 11:30 GMT+01:00 Hanne Falsig <*hafa at topsoe.dk* <hafa at topsoe.dk>>:
>
> Dear all,
>
> I get the following error message:
> RuntimeError: Atoms objects on different processors are not identical!
>
> I can see there have been a number of questions about this. Did someone
> find a solution to this problem?
>
> My input is:
> calc =
> GPAW(xc='BEEF-vdW',kpts=[1,1,1],h=0.20,mixer=MixerSum(),spinpol=True,
> maxiter=300,occupations=FermiDirac(width=0.1),txt='gpaw.txt')
>
> We use gpaw version 0.10.0.11364
>
> Best regards,
> Hanne
> ------------------------------
> *Hanne Falsig *
> Research Scientist | Physical Analytical, R&D
>
> * Haldor Topsoe A/S*
> Nymøllevej 55, DK-2800 Kgs. Lyngby
> Phone (direct): *+45 2275 4776* <%2B45%202275%204776>
> Switchboard: *+45 4527 2000* <%2B45%204527%202000>
>
> * Read more at **www.topsoe.com* <http://www.topsoe.com/>
> ------------------------------
>
> Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
> attachments, if any) is confidential and may be privileged. It is intended
> only for the addressee.
> Any unauthorised distribution or disclosure is prohibited. Disclosure to
> anyone other than the intended recipient does not constitute waiver of
> privilege.
> If you have received this email in error, please notify the sender by
> email and delete it and any attachments from your computer system and
> records.
> HALDOR TOPSOE (*www.topsoe.com* <http://www.topsoe.com/>)
>
>
>
>
>
> _______________________________________________
> gpaw-users mailing list
> *gpaw-users at listserv.fysik.dtu.dk* <gpaw-users at listserv.fysik.dtu.dk>
> *https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users*
> <https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users>
> ------------------------------
>
> Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
> attachments, if any) is confidential and may be privileged. It is intended
> only for the addressee.
> Any unauthorised distribution or disclosure is prohibited. Disclosure to
> anyone other than the intended recipient does not constitute waiver of
> privilege.
> If you have received this email in error, please notify the sender by
> email and delete it and any attachments from your computer system and
> records.
> HALDOR TOPSOE (*www.topsoe.com* <http://www.topsoe.com/>)
>
>
>
> ------------------------------
>
> Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
> attachments, if any) is confidential and may be privileged. It is intended
> only for the addressee.
> Any unauthorised distribution or disclosure is prohibited. Disclosure to
> anyone other than the intended recipient does not constitute waiver of
> privilege.
> If you have received this email in error, please notify the sender by
> email and delete it and any attachments from your computer system and
> records.
> HALDOR TOPSOE (*www.topsoe.com* <http://www.topsoe.com/>)
>
>
>
>
> ------------------------------
>
> Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
> attachments, if any) is confidential and may be privileged. It is intended
> only for the addressee.
> Any unauthorised distribution or disclosure is prohibited. Disclosure to
> anyone other than the intended recipient does not constitute waiver of
> privilege.
> If you have received this email in error, please notify the sender by
> email and delete it and any attachments from your computer system and
> records.
> HALDOR TOPSOE (*www.topsoe.com* <http://www.topsoe.com/>)
> _______________________________________________
> gpaw-users mailing list
> gpaw-users at listserv.fysik.dtu.dk
> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>
>
> ------------------------------
>
> Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
> attachments, if any) is confidential and may be privileged. It is intended
> only for the addressee.
> Any unauthorised distribution or disclosure is prohibited. Disclosure to
> anyone other than the intended recipient does not constitute waiver of
> privilege.
> If you have received this email in error, please notify the sender by
> email and delete it and any attachments from your computer system and
> records.
> HALDOR TOPSOE (www.topsoe.com)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listserv.fysik.dtu.dk/pipermail/gpaw-users/attachments/20150316/6bc0033c/attachment-0001.html>
More information about the gpaw-users
mailing list