[gpaw-users] Error when relaxing atoms

Ask Hjorth Larsen asklarsen at gmail.com
Thu Feb 5 16:31:43 CET 2015


Let's not worry about that until we have to. :)

Best regards
Ask

2015-02-05 8:53 GMT+01:00 Jens Jørgen Mortensen <jensj at fysik.dtu.dk>:
> On 02/04/2015 05:12 PM, Ask Hjorth Larsen wrote:
>>
>> I committed something in r12401 which should make the check more
>> reliable.  It does not use hashing because the atoms object is sent
>> anyway.
>
>
> Thanks a lot for fixing this!  Should there also be some tolerance for the
> unit cell?
>
> Jens Jørgen
>
>
>> Best regards
>> Ask
>>
>> 2015-02-04 14:47 GMT+01:00 Ask Hjorth Larsen <asklarsen at gmail.com>:
>>>
>>> Well, to clarify a bit.
>>>
>>> The hashing is useful if we don't want to send stuff around.
>>>
>>> If we are actually sending the positions now (by broadcast; I only
>>> know for certain that the forces are broadcast), then each core can
>>> compare locally, without any hashing, to decide whether to raise an
>>> error.  (Raising errors on some cores but not all is sometimes
>>> annoying, though.)
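>>>
>>> A minimal sketch of what I mean, using mpi4py for illustration (the
>>> helper name and the tolerance are made up, not actual GPAW code):
>>>
>>> import numpy as np
>>> from mpi4py import MPI
>>>
>>> def positions_agree(atoms, comm=MPI.COMM_WORLD, tol=1e-10):
>>>     # Everyone receives rank 0's positions, then compares its own
>>>     # copy locally, within a tolerance.
>>>     ref = atoms.get_positions()
>>>     comm.Bcast(ref, root=0)
>>>     ok = np.allclose(atoms.get_positions(), ref, atol=tol)
>>>     # Agree globally, so that either all cores raise or none do:
>>>     return bool(comm.allreduce(int(ok), op=MPI.MIN))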
>>>
>>> Best regards
>>> Ask
>>>
>>> 2015-02-04 12:57 GMT+01:00 Ask Hjorth Larsen <asklarsen at gmail.com>:
>>>>
>>>> Hello
>>>>
>>>> 2015-02-04 10:21 GMT+01:00 Torsten Hahn <torstenhahn at fastmail.fm>:
>>>>>
>>>>> Probably we could do this, but my feeling is that this would only
>>>>> cure the symptoms, not the real origin of this annoying bug.
>>>>>
>>>>>
>>>>> In fact there is code in
>>>>>
>>>>> mpi/__init__.py
>>>>>
>>>>> that says:
>>>>>
>>>>> # Construct fingerprint:
>>>>> # ASE may return slightly different atomic positions (e.g. due
>>>>> # to MKL) so compare only first 8 decimals of positions
>>>>>
>>>>>
>>>>> The comment says that only the first 8 decimals of the positions are
>>>>> used to generate the atomic „fingerprints“.  This code relies on
>>>>> numpy and therefore on lapack/blas functions.  However, I have no
>>>>> idea what that md5_array etc. stuff really does.  But there is some
>>>>> debug code which should at least tell you which Atom(s) cause the
>>>>> problems.
>>>>
>>>> md5_array calculates the md5 sum of the data of an array.  It is a
>>>> kind of checksum.
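>>>>
>>>> Roughly like this (a sketch of the idea, not the actual GPAW code):
>>>>
>>>> import hashlib
>>>> import numpy as np
>>>>
>>>> def md5_array(a):
>>>>     # Checksum over the raw bytes of the array: two arrays get the
>>>>     # same digest only if their data agree bit for bit.
>>>>     return hashlib.md5(np.ascontiguousarray(a).tobytes()).hexdigest()
>>>>
>>>> So if even a single bit of a single position differs between cores,
>>>> the fingerprints differ.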
>>>>
>>>> Rounding unfortunately does not solve the problem.  For any epsilon,
>>>> however small, there exist numbers that differ by less than epsilon
>>>> but still round to different values.  So the check does not work the
>>>> way it is implemented at the moment: positions that are "close
>>>> enough" can currently trigger the error.  In other words, if you get
>>>> this error, maybe there was no problem at all.  Given the many
>>>> thousands of DFT calculations that are run, this may not be so
>>>> unlikely.
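>>>>
>>>> For example (made-up numbers, just to show the boundary problem):
>>>>
>>>> import numpy as np
>>>>
>>>> a = 1.000000005000001   # just above the 8th-decimal rounding boundary
>>>> b = 1.000000004999999   # just below it; |a - b| is about 2e-15
>>>> print(np.round(a, 8), np.round(b, 8))  # -> 1.00000001 1.0
>>>>
>>>> The two values agree to about 15 digits, yet their rounded
>>>> "fingerprints" differ, so the comparison reports a mismatch.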
>>>>
>>>>> However, that error is *very* strange, because mpi.broadcast(...)
>>>>> should result in *exactly* the same objects on all cores.  I have no
>>>>> idea why there should be any difference at all, or what the intention
>>>>> was behind the fancy fingerprint-generation stuff in the
>>>>> compare_atoms(atoms, comm=world) method.
>>>>
>>>> The check was introduced because there were (infrequent) situations
>>>> where different cores had different positions, due e.g. to the
>>>> finicky numerics discussed elsewhere.  Later, I guess, we accepted
>>>> the numerical issues and relaxed the check so it is no longer exact,
>>>> preferring instead to broadcast.  Evidently something else is
>>>> happening aside from the broadcast, which allows things to go wrong;
>>>> perhaps the flaw in the rounding scheme mentioned above.
>>>>
>>>> To explain the hashing: we want to check that numbers on two
>>>> different CPUs are equal.  Either we send all the numbers, or we
>>>> hash them and send only the hash, so hashing is much cheaper.  But
>>>> maybe it would be better to hash them with a continuous function,
>>>> for example by adding all the numbers with different (pseudorandom?)
>>>> complex phase factors.  Then one can compare the complex hashes and
>>>> see if they are close enough to each other.  There are probably
>>>> better ways.
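>>>>
>>>> Something along these lines (just a sketch of the idea):
>>>>
>>>> import numpy as np
>>>>
>>>> def continuous_hash(x, seed=42):
>>>>     # Project the data onto fixed pseudorandom complex phases.
>>>>     # Nearby inputs give nearby hashes, so a tolerance works.
>>>>     x = np.asarray(x, dtype=float).ravel()
>>>>     rng = np.random.RandomState(seed)  # same phases on every core
>>>>     phases = np.exp(2j * np.pi * rng.random_sample(x.size))
>>>>     return np.dot(x, phases)
>>>>
>>>> Each core reduces its positions to one complex number, and the
>>>> hashes can then be compared with np.abs(h1 - h2) < tol instead of
>>>> bit for bit.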
>>>>
>>>> Best regards
>>>> Ask
>>>>
>>>>> Best,
>>>>> Torsten.
>>>>>
>>>>>> On 04.02.2015 at 10:00, jingzhe <jingzhe.chen at gmail.com> wrote:
>>>>>>
>>>>>> Hi Torsten,
>>>>>>
>>>>>>               Thanks for the quick reply, but I use gcc and
>>>>>> lapack/blas.  I mean, if the positions of the atoms are slightly
>>>>>> different on different ranks because of compiler/library stuff,
>>>>>> can we just set a tolerance in check_atoms and skip the error?
>>>>>>
>>>>>>               Best.
>>>>>>
>>>>>>               Jingzhe
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 04.02.2015 at 14:32, Torsten Hahn wrote:
>>>>>>>
>>>>>>> Dear Jingzhe,
>>>>>>>
>>>>>>> we often saw this error when using GPAW together with Intel MKL
>>>>>>> <= 11.x on Intel CPUs.  I never tracked down the error, because it
>>>>>>> was gone after a compiler/library upgrade.
>>>>>>>
>>>>>>> Best,
>>>>>>> Torsten.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Dr. Torsten Hahn
>>>>>>> torstenhahn at fastmail.fm
>>>>>>>
>>>>>>>> On 04.02.2015 at 07:27, jingzhe Chen
>>>>>>>> <jingzhe.chen at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Dear GPAW guys,
>>>>>>>>
>>>>>>>>          I used the latest GPAW to run a relaxation job and got
>>>>>>>> the error message below:
>>>>>>>>
>>>>>>>>       RuntimeError: Atoms objects on different processors are not
>>>>>>>> identical!
>>>>>>>>
>>>>>>>>          I found a line in the force calculator,
>>>>>>>> 'wfs.world.broadcast(self.F_av, 0)', so the forces on all ranks
>>>>>>>> should be the same, which confuses me; I cannot think of any
>>>>>>>> other reason that could lead to this error.
>>>>>>>>
>>>>>>>>         Could anyone take a look at it?
>>>>>>>>
>>>>>>>>         I attached the structure file and the running script
>>>>>>>> here; I used 24 cores.
>>>>>>>>
>>>>>>>>         Thanks in advance.
>>>>>>>>
>>>>>>>>           Jingzhe
>>>>>>>>
>>>>>>>> <main.py><model.traj>


