[gpaw-users] Regarding 'RuntimeError: Atoms objects on different processors are not identical!'

Yuriy Elesin yuel at topsoe.dk
Mon Mar 16 11:31:18 CET 2015


Hi Ask,

After updating to the latest trunk version of gpaw and ase the script 
works, but now instead of failing after 4-5 iterations, it fails after 
20-70 iterations. It worked once out of about 8 runs. During that 
successful run it managed to converge after 74 iterations. The rest of the 
runs (running on 24-32 cores) it was reaching the tolerance of 1e-8 after 
some random 20-70 number of iterations.

It appears that the error in atom positions on different cores increases 
over time and does not diffuse out. I do not know about Gpaw internal 
structure, but for MPI codes it might happen if part of the calculations 
are repeated on different MPI processes, and it is expected that the 
result of such duplicate calculation would be the same. If it is not the 
same (the reason might be different order of instruction execution) then 
it 1) might diffuse itself out if the problem if of diffusive type (or it 
is introduced artificially) or 2) accumulate the error if there is no 
natural physical diffusion in the problem and it is not introduced 
artificially. 

Again I do not know the internal structure of gpaw, but if the source of 
error is not clear, a solution might be is to introduce some artificial 
diffusion. E.g. all atom positions on different MPI ranks are averaged 
every now and then.

Here is the input script:
# ===========================================
#!/usr/bin/env python
from ase.io import read
from ase.optimize import BFGS
# from ase.data.molecules import molecule
from ase import Atom,Atoms
import numpy as np
from ase.constraints import FixAtoms 
from gpaw import GPAW,FermiDirac, MixerSum
from ase.visualize import view
import ase.utils.geometry as geometry

#read unit cell and positions
atoms = read('dyn1.traj')

# set and start calculator and optimizer
calc = 
GPAW(xc='BEEF-vdW',kpts=[1,1,1],h=0.20,mixer=MixerSum(),spinpol=True, 
maxiter=300,occupations=FermiDirac(width=0.1),txt='gpaw.txt')
atoms.set_calculator(calc)

atoms.write('test.traj')

dyn =BFGS(atoms,trajectory='dyn2.traj' ,logfile = 'dyn2.log')
dyn.run(fmax=0.03)
calc.write('gpaw.gpw')
# ===========================================
And the input file:


Kind regards
Yuriy



From:
"Yuriy Elesin" <yuel at topsoe.dk>
To:
Ask Hjorth Larsen <asklarsen at gmail.com>, 
Cc:
gpaw-users <gpaw-users at listserv.fysik.dtu.dk>
Date:
03/04/2015 02:53 PM
Subject:
Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on different 
processors are not identical!'
Sent by:
gpaw-users-bounces at listserv.fysik.dtu.dk



Hi Ask 

The version we have is 0.10.0.11364. I will try the trunk and let you know 
the results 

Kind regards 
Yuriy 


From: 
Ask Hjorth Larsen <asklarsen at gmail.com> 
To: 
Yuriy Elesin <yuel at topsoe.dk>, 
Cc: 
Morten Niklas Gjerding <mogje at fysik.dtu.dk>, gpaw-users 
<gpaw-users at listserv.fysik.dtu.dk>, Hanne Falsig <hafa at topsoe.dk> 
Date: 
03/03/2015 08:16 PM 
Subject: 
Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on different 
processors are not identical!'




I recently changed something having to do with that part.  You could try 
with trunk.  Which version are you using?

After the change, when the code runs the check it will allow for an error 
of 1e-8 (approx. square root of machine precision), then simply use the 
positions from the master process if it doesn't raise the error. 

Best regards 
Ask 

2015-03-03 14:26 GMT+01:00 Yuriy Elesin <yuel at topsoe.dk>: 
Thank you Morten and Ask! 

It has generated the output as a bunch of 
compare_atoms_r00**_positions.pickle files. Plus it generated one 
compare_atoms_fps_positions.pickle file. When I compare all the r00** 
files it indicates that positions on most of the cores are identical, 
except a few cores where the difference between positions generates one 
line with 4.33680869e-19 as difference: 
.... 
[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00] 
[  0.00000000e+00,   4.33680869e-19,   0.00000000e+00] 
[  0.00000000e+00,   0.00000000e+00,   0.00000000e+00] 
.... 

It is always the same value, when the positions are different. But again 
out of 24 cores there are only 8 (I think) cores with different then the 
rest coordinates. 

The compare_atoms_fps_positions.pickle file has the following content: 

array([-8145271529702367328,  3281356470061976748,  3281356470061976748, 
        3281356470061976748,  3281356470061976748,  3281356470061976748, 
        3281356470061976748,  3281356470061976748,  3281356470061976748, 
       -8145271529702367328,  3281356470061976748,  3281356470061976748, 
        3281356470061976748,  3281356470061976748, -8145271529702367328, 
       -8145271529702367328,  3281356470061976748,  3281356470061976748, 
        3281356470061976748,  3281356470061976748, -8145271529702367328, 
        3281356470061976748,  3281356470061976748,  3281356470061976748]) 

Does it tell you anything? If it is just minor inaccuracies then how do we 
prevent the following error that seem to stop the calculation: 

RuntimeError: Atoms objects on different processors are not identical! 
GPAW CLEANUP (node 23): <type 'exceptions.RuntimeError'> occurred. Calling 
MPI_Abort! 


Kind regards 
Yuriy 

From: 
Morten Niklas Gjerding <mogje at fysik.dtu.dk> 
To: 
Yuriy Elesin <yuel at topsoe.dk>, Ask Hjorth Larsen <asklarsen at gmail.com>, 
Cc: 
gpaw-users <gpaw-users at listserv.fysik.dtu.dk> 
Date: 
03/03/2015 11:15 AM 
Subject: 
RE: [gpaw-users] Regarding 'RuntimeError: Atoms objects on different 
processors are not identical!'





Hi Yury.

I think that:

gpaw-python script.py --debug

is what you want.

KR. Morten. 

From: gpaw-users-bounces at listserv.fysik.dtu.dk [
gpaw-users-bounces at listserv.fysik.dtu.dk] on behalf of Yuriy Elesin [
yuel at topsoe.dk]
Sent: Tuesday, March 03, 2015 9:56 AM
To: Ask Hjorth Larsen
Cc: gpaw-users
Subject: Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on 
different processors are not identical!'

Hi Ask 

How do we enable the debug mode in GPAW? I tried gpaw-python with -d 
option but it says it is only for the parser, so no files with faulty 
positions were generated. 

Kind regards 
Yuiry 
From: 
Ask Hjorth Larsen <asklarsen at gmail.com> 
To: 
Hanne Falsig <hafa at topsoe.dk>, 
Cc: 
gpaw-users <gpaw-users at listserv.fysik.dtu.dk>, yuel at topsoe.dk 
Date: 
03/02/2015 04:03 PM 
Subject: 
Re: [gpaw-users] Regarding 'RuntimeError: Atoms objects on different 
processors are not identical!'






Hi Hanne

In trunk (and possibly in the old version; this I don't remember) the 
faulty positions will be dumped to files if you use debug mode.  Then you 
can check whether they completely disagree (something is wrong with the 
parallelization, probably of the XC functional) or whether there's just 
some numerical noise (some calculation is duplicated in a way that allows 
numerical inaccuracies across different cores).

Best regards 
Ask 

2015-03-02 11:30 GMT+01:00 Hanne Falsig <hafa at topsoe.dk>: 
Dear all, 

I get the following error message: 
RuntimeError: Atoms objects on different processors are not identical! 

I can see there have been a number of questions about this. Did someone 
find a solution  to this problem? 

My input is: 
calc = 
GPAW(xc='BEEF-vdW',kpts=[1,1,1],h=0.20,mixer=MixerSum(),spinpol=True, 
maxiter=300,occupations=FermiDirac(width=0.1),txt='gpaw.txt') 

We use gpaw version 0.10.0.11364 

Best regards, 
Hanne 
Hanne Falsig 
Research Scientist | Physical Analytical, R&D 

Haldor Topsoe A/S
Nymøllevej 55, DK-2800 Kgs. Lyngby
Phone (direct): +45 2275 4776 
Switchboard: +45 4527 2000 

Read more at www.topsoe.com 


Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including 
attachments, if any) is confidential and may be privileged. It is intended 
only for the addressee. 
Any unauthorised distribution or disclosure is prohibited. Disclosure to 
anyone other than the intended recipient does not constitute waiver of 
privilege. 
If you have received this email in error, please notify the sender by 
email and delete it and any attachments from your computer system and 
records. 
HALDOR TOPSOE (www.topsoe.com) 






_______________________________________________
gpaw-users mailing list
gpaw-users at listserv.fysik.dtu.dk
https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users 


Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including 
attachments, if any) is confidential and may be privileged. It is intended 
only for the addressee. 
Any unauthorised distribution or disclosure is prohibited. Disclosure to 
anyone other than the intended recipient does not constitute waiver of 
privilege. 
If you have received this email in error, please notify the sender by 
email and delete it and any attachments from your computer system and 
records. 
HALDOR TOPSOE (www.topsoe.com) 






Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including 
attachments, if any) is confidential and may be privileged. It is intended 
only for the addressee. 
Any unauthorised distribution or disclosure is prohibited. Disclosure to 
anyone other than the intended recipient does not constitute waiver of 
privilege. 
If you have received this email in error, please notify the sender by 
email and delete it and any attachments from your computer system and 
records. 
HALDOR TOPSOE (www.topsoe.com) 







Haldor Topsøe A/S: CVR: 41853816 This e-mail message (including 
attachments, if any) is confidential and may be privileged. It is intended 
only for the addressee. 
Any unauthorised distribution or disclosure is prohibited. Disclosure to 
anyone other than the intended recipient does not constitute waiver of 
privilege. 
If you have received this email in error, please notify the sender by 
email and delete it and any attachments from your computer system and 
records. 
HALDOR TOPSOE (www.topsoe.com) 

_______________________________________________
gpaw-users mailing list
gpaw-users at listserv.fysik.dtu.dk
https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users

Haldor Topsøe A/S: CVR: 41853816. This e-mail message (including attachments, if any) is confidential and may be privileged. It is intended only for the addressee.Any unauthorised distribution or disclosure is prohibited. Disclosure to anyone other than the intended recipient does not constitute waiver of privilege.If you have received this email in error, please notify the sender by email and delete it and any attachments from your computer system and records.HALDOR TOPSOE (www.topsoe.com)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listserv.fysik.dtu.dk/pipermail/gpaw-users/attachments/20150316/3aeec8e7/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dyn1.traj
Type: application/octet-stream
Size: 230956 bytes
Desc: not available
URL: <http://listserv.fysik.dtu.dk/pipermail/gpaw-users/attachments/20150316/3aeec8e7/attachment-0001.obj>


More information about the gpaw-users mailing list