[gpaw-users] Regarding 'RuntimeError: Atoms objects on different processors are not identical!'
Yuriy Elesin
yuel at topsoe.dk
Mon Mar 16 11:31:18 CET 2015
Hi Ask,
After updating to the latest trunk version of gpaw and ase the script
works, but now instead of failing after 4-5 iterations, it fails after
20-70 iterations. It worked once out of about 8 runs. During that
successful run it managed to converge after 74 iterations. The rest of the
runs (running on 24-32 cores) it was reaching the tolerance of 1e-8 after
some random 20-70 number of iterations.
It appears that the error in atom positions on different cores increases
over time and does not diffuse out. I do not know about Gpaw internal
structure, but for MPI codes it might happen if part of the calculations
are repeated on different MPI processes, and it is expected that the
result of such duplicate calculation would be the same. If it is not the
same (the reason might be different order of instruction execution) then
it 1) might diffuse itself out if the problem if of diffusive type (or it
is introduced artificially) or 2) accumulate the error if there is no
natural physical diffusion in the problem and it is not introduced
artificially.
Again I do not know the internal structure of gpaw, but if the source of
error is not clear, a solution might be is to introduce some artificial
diffusion. E.g. all atom positions on different MPI ranks are averaged
every now and then.
Here is the input script:
# ===========================================
#!/usr/bin/env python
from ase.io import read
from ase.optimize import BFGS
# from ase.data.molecules import molecule
from ase import Atom,Atoms
import numpy as np
from ase.constraints import FixAtoms
from gpaw import GPAW,FermiDirac, MixerSum
from ase.visualize import view
import ase.utils.geometry as geometry
#read unit cell and positions
atoms = read('dyn1.traj')
# set and start calculator and optimizer
calc =
GPAW(xc='BEEF-vdW',kpts=[1,1,1],h=0.20,mixer=MixerSum(),spinpol=True,
maxiter=300,occupations=FermiDirac(width=0.1),txt='gpaw.txt')
atoms.set_calculator(calc)
atoms.write('test.traj')
dyn =BFGS(atoms,trajectory='dyn2.traj' ,logfile = 'dyn2.log')
dyn.run(fmax=0.03)
calc.write('gpaw.gpw')
# ===========================================
And the input file:
Kind regards
Yuriy
From:
"Yuriy Elesin" <yuel at topsoe.dk>
To:
Ask Hjorth Larsen <asklarsen at gmail.com>,
Cc:
gpaw-users <gpaw-users at listserv.fysik.dtu.dk>
Date:
03/04/2015 02:53 PM
Subject:
Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on different
processors are not identical!'
Sent by:
gpaw-users-bounces at listserv.fysik.dtu.dk
Hi Ask
The version we have is 0.10.0.11364. I will try the trunk and let you know
the results
Kind regards
Yuriy
From:
Ask Hjorth Larsen <asklarsen at gmail.com>
To:
Yuriy Elesin <yuel at topsoe.dk>,
Cc:
Morten Niklas Gjerding <mogje at fysik.dtu.dk>, gpaw-users
<gpaw-users at listserv.fysik.dtu.dk>, Hanne Falsig <hafa at topsoe.dk>
Date:
03/03/2015 08:16 PM
Subject:
Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on different
processors are not identical!'
I recently changed something having to do with that part. You could try
with trunk. Which version are you using?
After the change, when the code runs the check it will allow for an error
of 1e-8 (approx. square root of machine precision), then simply use the
positions from the master process if it doesn't raise the error.
Best regards
Ask
2015-03-03 14:26 GMT+01:00 Yuriy Elesin <yuel at topsoe.dk>:
Thank you Morten and Ask!
It has generated the output as a bunch of
compare_atoms_r00**_positions.pickle files. Plus it generated one
compare_atoms_fps_positions.pickle file. When I compare all the r00**
files it indicates that positions on most of the cores are identical,
except a few cores where the difference between positions generates one
line with 4.33680869e-19 as difference:
....
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]
[ 0.00000000e+00, 4.33680869e-19, 0.00000000e+00]
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]
....
It is always the same value, when the positions are different. But again
out of 24 cores there are only 8 (I think) cores with different then the
rest coordinates.
The compare_atoms_fps_positions.pickle file has the following content:
array([-8145271529702367328, 3281356470061976748, 3281356470061976748,
3281356470061976748, 3281356470061976748, 3281356470061976748,
3281356470061976748, 3281356470061976748, 3281356470061976748,
-8145271529702367328, 3281356470061976748, 3281356470061976748,
3281356470061976748, 3281356470061976748, -8145271529702367328,
-8145271529702367328, 3281356470061976748, 3281356470061976748,
3281356470061976748, 3281356470061976748, -8145271529702367328,
3281356470061976748, 3281356470061976748, 3281356470061976748])
Does it tell you anything? If it is just minor inaccuracies then how do we
prevent the following error that seem to stop the calculation:
RuntimeError: Atoms objects on different processors are not identical!
GPAW CLEANUP (node 23): <type 'exceptions.RuntimeError'> occurred. Calling
MPI_Abort!
Kind regards
Yuriy
From:
Morten Niklas Gjerding <mogje at fysik.dtu.dk>
To:
Yuriy Elesin <yuel at topsoe.dk>, Ask Hjorth Larsen <asklarsen at gmail.com>,
Cc:
gpaw-users <gpaw-users at listserv.fysik.dtu.dk>
Date:
03/03/2015 11:15 AM
Subject:
RE: [gpaw-users] Regarding 'RuntimeError: Atoms objects on different
processors are not identical!'
Hi Yury.
I think that:
gpaw-python script.py --debug
is what you want.
KR. Morten.
From: gpaw-users-bounces at listserv.fysik.dtu.dk [
gpaw-users-bounces at listserv.fysik.dtu.dk] on behalf of Yuriy Elesin [
yuel at topsoe.dk]
Sent: Tuesday, March 03, 2015 9:56 AM
To: Ask Hjorth Larsen
Cc: gpaw-users
Subject: Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on
different processors are not identical!'
Hi Ask
How do we enable the debug mode in GPAW? I tried gpaw-python with -d
option but it says it is only for the parser, so no files with faulty
positions were generated.
Kind regards
Yuiry
From:
Ask Hjorth Larsen <asklarsen at gmail.com>
To:
Hanne Falsig <hafa at topsoe.dk>,
Cc:
gpaw-users <gpaw-users at listserv.fysik.dtu.dk>, yuel at topsoe.dk
Date:
03/02/2015 04:03 PM
Subject:
Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on different
processors are not identical!'
Hi Hanne
In trunk (and possibly in the old version; this I don't remember) the
faulty positions will be dumped to files if you use debug mode. Then you
can check whether they completely disagree (something is wrong with the
parallelization, probably of the XC functional) or whether there's just
some numerical noise (some calculation is duplicated in a way that allows
numerical inaccuracies across different cores).
Best regards
Ask
2015-03-02 11:30 GMT+01:00 Hanne Falsig <hafa at topsoe.dk>:
Dear all,
I get the following error message:
RuntimeError: Atoms objects on different processors are not identical!
I can see there have been a number of questions about this. Did someone
find a solution to this problem?
My input is:
calc =
GPAW(xc='BEEF-vdW',kpts=[1,1,1],h=0.20,mixer=MixerSum(),spinpol=True,
maxiter=300,occupations=FermiDirac(width=0.1),txt='gpaw.txt')
We use gpaw version 0.10.0.11364
Best regards,
Hanne
Hanne Falsig
Research Scientist | Physical Analytical, R&D
Haldor Topsoe A/S
Nymøllevej 55, DK-2800 Kgs. Lyngby
Phone (direct): +45 2275 4776
Switchboard: +45 4527 2000
Read more at www.topsoe.com
Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
attachments, if any) is confidential and may be privileged. It is intended
only for the addressee.
Any unauthorised distribution or disclosure is prohibited. Disclosure to
anyone other than the intended recipient does not constitute waiver of
privilege.
If you have received this email in error, please notify the sender by
email and delete it and any attachments from your computer system and
records.
HALDOR TOPSOE (www.topsoe.com)
_______________________________________________
gpaw-users mailing list
gpaw-users at listserv.fysik.dtu.dk
https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
attachments, if any) is confidential and may be privileged. It is intended
only for the addressee.
Any unauthorised distribution or disclosure is prohibited. Disclosure to
anyone other than the intended recipient does not constitute waiver of
privilege.
If you have received this email in error, please notify the sender by
email and delete it and any attachments from your computer system and
records.
HALDOR TOPSOE (www.topsoe.com)
Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
attachments, if any) is confidential and may be privileged. It is intended
only for the addressee.
Any unauthorised distribution or disclosure is prohibited. Disclosure to
anyone other than the intended recipient does not constitute waiver of
privilege.
If you have received this email in error, please notify the sender by
email and delete it and any attachments from your computer system and
records.
HALDOR TOPSOE (www.topsoe.com)
Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including
attachments, if any) is confidential and may be privileged. It is intended
only for the addressee.
Any unauthorised distribution or disclosure is prohibited. Disclosure to
anyone other than the intended recipient does not constitute waiver of
privilege.
If you have received this email in error, please notify the sender by
email and delete it and any attachments from your computer system and
records.
HALDOR TOPSOE (www.topsoe.com)
_______________________________________________
gpaw-users mailing list
gpaw-users at listserv.fysik.dtu.dk
https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
Haldor Topsøe A/S: CVR: 41853816. This e-mail message (including attachments, if any) is confidential and may be privileged. It is intended only for the addressee.Any unauthorised distribution or disclosure is prohibited. Disclosure to anyone other than the intended recipient does not constitute waiver of privilege.If you have received this email in error, please notify the sender by email and delete it and any attachments from your computer system and records.HALDOR TOPSOE (www.topsoe.com)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listserv.fysik.dtu.dk/pipermail/gpaw-users/attachments/20150316/3aeec8e7/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dyn1.traj
Type: application/octet-stream
Size: 230956 bytes
Desc: not available
URL: <http://listserv.fysik.dtu.dk/pipermail/gpaw-users/attachments/20150316/3aeec8e7/attachment-0001.obj>
More information about the gpaw-users
mailing list