[gpaw-users] Error when relaxing atoms

Marcin Dulak Marcin.Dulak at fysik.dtu.dk
Wed Feb 4 11:40:45 CET 2015


On 02/04/2015 10:54 AM, Tristan Maxson wrote:
> This same problem is being discussed on gpaw-developers. The problem 
> arises because the geometry optimization is replicated on all of the 
> cores, and small differences appear due to optimizations that certain 
> compilers make.  It seems that it should be possible to turn on a 
> debug variable to dump the mismatches to a file for debugging.  You 
> could always manually edit the file to lower the required precision 
> and try again, but 8 decimal places already sounds like quite an 
> allowance for variance.
>
> Is it out of the question to try a different compiler and does this 
> occur with all systems you try?
The first thing to do when investigating any tricky problem is to mention the 
GPAW/ASE versions used,
and whether gpaw-test passed in parallel: 
https://wiki.fysik.dtu.dk/gpaw/install/installationguide.html#run-the-tests
It is true that these kinds of problems disappear after changing 
compiler/libraries, but then they often
appear for other systems. Let me add that the most reliable combination 
for GPAW I have found over the years is gcc/acml,
also on Intel processors.

Best regards,

Marcin
>
> Thank you,
> Tristan Maxson
>
> On Wed, Feb 4, 2015 at 4:21 AM, Torsten Hahn <torstenhahn at fastmail.fm 
> <mailto:torstenhahn at fastmail.fm>> wrote:
>
>     We could probably do this, but my feeling is that this would only
>     cure the symptoms, not the real origin of this annoying bug.
>
>
>     In fact there is code in
>
>     mpi/__init__.py
>
>     that says:
>
>     # Construct fingerprint:
>     # ASE may return slightly different atomic positions (e.g. due
>     # to MKL) so compare only first 8 decimals of positions
>
>
>     The code says that only the first 8 decimal places of the positions
>     are used to generate the atomic "fingerprints". This code relies on
>     numpy and therefore on lapack/blas functions. However, I have no idea
>     what that md5_array etc. stuff really does. But there is some
>     debug code which should at least tell you which atom(s) cause the
>     problems.
>
>     However, that error is *very* strange because mpi.broadcast(...)
>     should result in *exactly* the same objects on all cores. No idea
>     why there should be any difference at all and what was the
>     intention behind the fancy fingerprint-generation stuff in the
>     compare_atoms(atoms, comm=world) method.
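
To make the fingerprint idea above concrete, here is a minimal sketch. It is
not GPAW's actual code: the name fingerprint() and the use of plain tuples
are assumptions for illustration, and only the "round to 8 decimals, then
md5" idea comes from the comment quoted above. It shows one general property
of rounding-based fingerprints that may be relevant here: rounding is not the
same as a tolerance, so two positions that differ by far less than 1e-8 can
still get different fingerprints if they fall on opposite sides of a rounding
boundary.

```python
import hashlib
import struct


def fingerprint(positions, decimals=8):
    """Hash positions after rounding each coordinate to `decimals` places.

    Hypothetical helper for illustration only, not GPAW's implementation.
    """
    h = hashlib.md5()
    for xyz in positions:
        for c in xyz:
            # Adding 0.0 turns -0.0 into 0.0 so both hash identically.
            h.update(struct.pack('<d', round(c, decimals) + 0.0))
    return h.hexdigest()


# Two "ranks" whose coordinates differ only in the 9th decimal:
rank0 = [(0.0, 0.0, 1.234567891)]
rank1 = [(0.0, 0.0, 1.234567892)]
same = fingerprint(rank0) == fingerprint(rank1)

# A difference of about 2e-10 that straddles a rounding boundary:
rank0_edge = [(0.0, 0.0, 1.2345678949)]
rank1_edge = [(0.0, 0.0, 1.2345678951)]
edge_same = fingerprint(rank0_edge) == fingerprint(rank1_edge)

print(same, edge_same)
```

In the first pair both values round to 1.23456789, so the fingerprints agree.
In the second pair one value rounds down and the other rounds up, so the
fingerprints differ even though the positions agree to better than 1e-9.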
>
>     Best,
>     Torsten.
>
>     > On 4 Feb 2015 at 10:00, jingzhe <jingzhe.chen at gmail.com
>     > <mailto:jingzhe.chen at gmail.com>> wrote:
>     >
>     > Hi Torsten,
>     >
>     >              Thanks for the quick reply, but I use gcc and
>     > lapack/blas. I mean, if the positions of the atoms are slightly
>     > different on different ranks because of compiler/library issues,
>     > can we just set a tolerance in the check_atoms and skip the
>     > error?
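
The tolerance idea asked about above could look roughly like the sketch
below. It is an illustration only, not a patch for GPAW: mismatched_atoms()
is a hypothetical helper, the positions are plain tuples, and in a real
parallel run the positions from each rank would first have to be collected
on one rank (not shown). As noted elsewhere in this thread, this would only
treat the symptom, but the list of offending atoms can help with debugging.

```python
def mismatched_atoms(reference, other, tol=1e-8):
    """Return (index, max_abs_diff) for atoms differing by more than tol.

    Hypothetical helper: `reference` and `other` are sequences of (x, y, z)
    positions as seen on two different ranks.
    """
    if len(reference) != len(other):
        raise ValueError("different number of atoms")
    bad = []
    for i, (a, b) in enumerate(zip(reference, other)):
        # Largest absolute coordinate difference for this atom.
        diff = max(abs(x - y) for x, y in zip(a, b))
        if diff > tol:
            bad.append((i, diff))
    return bad


rank0 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.2345678949)]
rank1 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.2345678951)]  # differs by about 2e-10
rank2 = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.2346)]        # differs by about 3e-5

ok = mismatched_atoms(rank0, rank1)   # nothing exceeds the tolerance
bad = mismatched_atoms(rank0, rank2)  # atom 1 is reported

print(ok)
print(bad)
```

Unlike a comparison of rounded values, an absolute tolerance accepts the
rank0/rank1 pair regardless of where the values sit relative to a rounding
boundary, and still flags the genuinely different rank2.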
>     >
>     >              Best.
>     >
>     >              Jingzhe
>     >
>     >
>     >
>     >
>     >
>     > On 4 Feb 2015 at 14:32, Torsten Hahn wrote:
>     >> Dear Jingzhe,
>     >>
>     >> we have often seen this error when using GPAW together with
>     >> Intel MKL <= 11.x on Intel CPUs. I never tracked down the error
>     >> because it was gone after a compiler/library upgrade.
>     >>
>     >> Best,
>     >> Torsten.
>     >>
>     >>
>     >> --
>     >> Dr. Torsten Hahn
>     >> torstenhahn at fastmail.fm <mailto:torstenhahn at fastmail.fm>
>     >>
>     >>> On 4 Feb 2015 at 07:27, jingzhe Chen
>     >>> <jingzhe.chen at gmail.com <mailto:jingzhe.chen at gmail.com>> wrote:
>     >>>
>     >>> Dear GPAW guys,
>     >>>
>     >>>         I used the latest gpaw to run a relaxation job, and
>     >>> got the error message below.
>     >>>
>     >>>      RuntimeError: Atoms objects on different processors are not identical!
>     >>>
>     >>>         I found the line 'wfs.world.broadcast(self.F_av, 0)' in
>     >>> the force calculator, so the forces on all ranks should be the
>     >>> same. This confuses me, and I cannot think of any other reason
>     >>> that could lead to this error.
>     >>>
>     >>>        Could anyone take a look at it?
>     >>>
>     >>>        I have attached the structure file and the run script
>     >>> here; I used 24 cores.
>     >>>
>     >>>        Thanks in advance.
>     >>>
>     >>>          Jingzhe
>     >>>
>     >>>
>     >>> <main.py><model.traj>
>     >>> _______________________________________________
>     >>> gpaw-users mailing list
>     >>> gpaw-users at listserv.fysik.dtu.dk
>     <mailto:gpaw-users at listserv.fysik.dtu.dk>
>     >>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>     >>
>     >
>
>

