[gpaw-users] Error when relaxing atoms
jingzhe
jingzhe.chen at gmail.com
Fri Feb 6 05:41:09 CET 2015
Dear all,
I ran again in debug mode. The atomic positions on different ranks
can differ by on the order of 0.01 Å, and the forces on different
ranks even differ by on the order of 1 eV/Å, although each time only
one rank behaves oddly. I have now exchanged the two lines (broadcast
and symmetry correction) in the force calculator to see what happens.
Best.
Jingzhe
On 2015-02-05 15:53, Jens Jørgen Mortensen wrote:
> On 02/04/2015 05:12 PM, Ask Hjorth Larsen wrote:
>> I committed something in r12401 which should make the check more
>> reliable. It does not use hashing because the atoms object is sent
>> anyway.
>
> Thanks a lot for fixing this! Should there also be some tolerance for
> the unit cell?
>
> Jens Jørgen
>
>> Best regards
>> Ask
>>
>> 2015-02-04 14:47 GMT+01:00 Ask Hjorth Larsen <asklarsen at gmail.com>:
>>> Well, to clarify a bit.
>>>
>>> The hashing is useful if we don't want to send stuff around.
>>>
>>> If we are actually sending the positions now (by broadcast; I am only
>>> strictly aware that the forces are broadcast), then each core can
>>> compare locally without the need for hashing, to see if it wants to
>>> raise an error. (Raising errors on some cores but not all is
>>> sometimes annoying though.)
>>>
>>> Best regards
>>> Ask
>>>
>>> 2015-02-04 12:57 GMT+01:00 Ask Hjorth Larsen <asklarsen at gmail.com>:
>>>> Hello
>>>>
>>>> 2015-02-04 10:21 GMT+01:00 Torsten Hahn <torstenhahn at fastmail.fm>:
>>>>> We could probably do this, but my feeling is that it would only
>>>>> cure the symptoms, not the real origin of this annoying bug.
>>>>>
>>>>>
>>>>> In fact there is code in
>>>>>
>>>>> mpi/__init__.py
>>>>>
>>>>> that says:
>>>>>
>>>>> # Construct fingerprint:
>>>>> # ASE may return slightly different atomic positions (e.g. due
>>>>> # to MKL) so compare only first 8 decimals of positions
>>>>>
>>>>>
>>>>> The comment says that only the first 8 decimals of the positions
>>>>> are used to generate the atomic "fingerprints". This code relies
>>>>> on NumPy and therefore on LAPACK/BLAS functions. However, I have
>>>>> no idea what that md5_array etc. stuff really does. But there is
>>>>> some debug code which should at least tell you which atom(s)
>>>>> cause the problems.
>>>> md5_array calculates the md5 sum of the data of an array. It is a
>>>> kind of checksum.
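As an illustration of the idea (this is not GPAW's actual md5_array implementation, just a minimal stand-in built on hashlib), such a checksum can be computed from the raw bytes of the array:

```python
import hashlib

import numpy as np


def md5_of_array(a):
    """MD5 hex digest of an array's raw bytes (sensitive to every bit)."""
    return hashlib.md5(np.ascontiguousarray(a).tobytes()).hexdigest()


positions = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
# Identical data gives identical digests on every rank; changing even the
# last bit of one element gives a completely different digest.
print(md5_of_array(positions))
```

Each rank can then send this short digest instead of the full position array and compare.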
>>>>
>>>> Rounding unfortunately does not solve the problem. For any
>>>> epsilon, however small, there exist numbers that differ by less
>>>> than epsilon but round to different values. So the check will not
>>>> work the way it is implemented at the moment: positions that are
>>>> "close enough" can still trigger an error. In other words, if you
>>>> get this error, maybe there was no problem at all. Given the vast
>>>> number of DFT calculations that are run, this may not be so
>>>> unlikely.
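A tiny example of why no rounding scheme can be airtight (the specific numbers are made up for illustration): two values that agree to roughly 12 decimals can still round to different 8-decimal values when they straddle a rounding boundary.

```python
# Two positions that agree to about 12 decimals...
a = 0.123456784999
b = 0.123456785001  # just across the rounding boundary at the 8th decimal

# ...nevertheless round to different 8-decimal values, so a checksum of
# the rounded arrays would differ even though the positions are
# "close enough" by any physical standard.
print(round(a, 8), round(b, 8))
```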
>>>>
>>>>> However, that error is *very* strange because mpi.broadcast(...)
>>>>> should result in *exactly* the same objects on all cores. I have
>>>>> no idea why there should be any difference at all, or what the
>>>>> intention was behind the fancy fingerprint-generation stuff in
>>>>> the compare_atoms(atoms, comm=world) method.
>>>> The check was introduced because there were (infrequent)
>>>> situations where different cores had different positions, due
>>>> e.g. to the finicky numerics discussed elsewhere. Later, I guess,
>>>> we accepted the numerical issues and relaxed the check so that it
>>>> is no longer exact, preferring instead to broadcast. Evidently
>>>> something else is happening aside from the broadcast, which allows
>>>> things to go wrong. Perhaps it is the flaw in the rounding scheme
>>>> mentioned above.
>>>>
>>>> To explain the hashing: we want to check that numbers on two
>>>> different CPUs are equal. Either we send all the numbers, or we
>>>> hash them and send only the hash, which is much cheaper. But maybe
>>>> it would be better to hash them with a continuous function, for
>>>> example by adding all the numbers with different (pseudorandom?)
>>>> complex phase factors. Then one can compare the complex hashes and
>>>> see whether they are close enough to each other. There are
>>>> probably better ways.
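A minimal sketch of that idea (the name soft_hash and the fixed-seed construction are my own choices for illustration, not GPAW code): sum the entries with fixed pseudorandom complex phase factors, so the "hash" is a continuous function of the positions and can be compared with a tolerance.

```python
import numpy as np


def soft_hash(x, seed=0):
    """Continuous 'hash': sum entries with fixed pseudorandom phases.

    Nearly-equal arrays give nearly-equal hashes, so two ranks can
    compare hashes with a tolerance instead of requiring bitwise
    equality of the underlying data.
    """
    x = np.asarray(x, dtype=float).ravel()
    rng = np.random.default_rng(seed)  # same seed -> same phases on all ranks
    phases = np.exp(2j * np.pi * rng.random(x.size))
    return np.dot(phases, x)


a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
b = a + 1e-12  # tiny numerical noise, e.g. from a different BLAS
print(abs(soft_hash(a) - soft_hash(b)) < 1e-8)  # True: hashes agree to tolerance
```

One caveat: unlike an MD5 digest, such a hash can in principle collide for genuinely different inputs, but for detecting small numerical drift between ranks that trade-off seems acceptable.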
>>>>
>>>> Best regards
>>>> Ask
>>>>
>>>>> Best,
>>>>> Torsten.
>>>>>
>>>>>> Am 04.02.2015 um 10:00 schrieb jingzhe <jingzhe.chen at gmail.com>:
>>>>>>
>>>>>> Hi Torsten,
>>>>>>
>>>>>> Thanks for the quick reply, but I use gcc and LAPACK/BLAS. What
>>>>>> I mean is: if the positions of the atoms are slightly different
>>>>>> on different ranks because of compiler/library issues, can we
>>>>>> just set a tolerance in check_atoms and avoid the error?
>>>>>>
>>>>>> Best.
>>>>>>
>>>>>> Jingzhe
>>>>>>
>>>>>>> On 2015-02-04 14:32, Torsten Hahn wrote:
>>>>>>> Dear Jingzhe,
>>>>>>>
>>>>>>> we have often seen this error when using GPAW together with
>>>>>>> Intel MKL <= 11.x on Intel CPUs. I never tracked down the cause
>>>>>>> because it disappeared after a compiler/library upgrade.
>>>>>>>
>>>>>>> Best,
>>>>>>> Torsten.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Torsten Hahn
>>>>>>> torstenhahn at fastmail.fm
>>>>>>>
>>>>>>>> Am 04.02.2015 um 07:27 schrieb jingzhe Chen
>>>>>>>> <jingzhe.chen at gmail.com>:
>>>>>>>>
>>>>>>>> Dear GPAW guys,
>>>>>>>>
>>>>>>>> I used the latest GPAW to run a relaxation job and got the
>>>>>>>> error message below:
>>>>>>>>
>>>>>>>> RuntimeError: Atoms objects on different processors are
>>>>>>>> not identical!
>>>>>>>>
>>>>>>>> I found the line 'wfs.world.broadcast(self.F_av, 0)' in the
>>>>>>>> force calculator, so the forces on all ranks should be the
>>>>>>>> same. This confuses me; I cannot think of any other reason
>>>>>>>> that could lead to this error.
>>>>>>>>
>>>>>>>> Could anyone take a look at it?
>>>>>>>>
>>>>>>>> I attached the structure file and running script here,
>>>>>>>> I used 24 cores.
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> Jingzhe
>>>>>>>>
>>>>>>>> <main.py><model.traj>
>>>>>>>> _______________________________________________
>>>>>>>>
>>>>>>>> gpaw-users mailing list
>>>>>>>> gpaw-users at listserv.fysik.dtu.dk
>>>>>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users