[gpaw-users] Error when relaxing atoms

Tristan Maxson tgmaxson at gmail.com
Wed Feb 4 10:54:56 CET 2015


This same problem is being discussed on gpaw-developers. It arises because
the geometry optimization is replicated on all of the cores, and the
optimizations that certain compilers make introduce small differences
between them. It seems that it should be possible to turn on a debug
variable to dump the mismatches to a file for debugging. You could always
manually edit the file to lower the required precision and try again, but
8 decimal places already sounds like quite an allowance for variance.
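
To illustrate why comparing "the first 8 decimals" can still fail, here is
a minimal sketch in plain Python. It is not GPAW's actual code: the helper
names fingerprint and mismatched_atoms and the sample positions are my own
assumptions, chosen only to show the idea. Two positions that differ by
2e-10 can fall on opposite sides of a rounding boundary, so their rounded
fingerprints differ even though the difference is far below 1e-8.

```python
import hashlib


def fingerprint(positions, decimals=8):
    """Hash positions after rounding each coordinate (hypothetical helper)."""
    parts = []
    for atom in positions:
        for x in atom:
            # "+ 0.0" maps -0.0 to 0.0 so both produce the same text.
            parts.append(f"{round(x, decimals) + 0.0:.{decimals}f}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()


def mismatched_atoms(pos_a, pos_b, tol=1e-8):
    """Indices of atoms with any coordinate differing by more than tol."""
    return [i for i, (a, b) in enumerate(zip(pos_a, pos_b))
            if max(abs(x - y) for x, y in zip(a, b)) > tol]


# Made-up positions standing in for what two ranks might hold.
rank0 = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.1234567849]]
rank1 = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.1234567851]]

same_hash = fingerprint(rank0) == fingerprint(rank1)   # rounded hashes
over_tol = mismatched_atoms(rank0, rank1)               # absolute tolerance
print(same_hash, over_tol)
```

In this sketch the hashes disagree while the tolerance check reports no
atoms, which is one way a rounding-based comparison can be stricter than
it looks. Calling mismatched_atoms with a smaller tol would list the atoms
that actually differ.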

Is it out of the question to try a different compiler? And does this occur
with all systems you try?
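
The reason a different compiler or library build can matter is that
floating-point addition is not associative, so a build that reorders a sum
may change the last bits of the result. A small illustration in plain
Python, independent of GPAW (the variable names are only for this example):

```python
# The same three numbers summed in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# The results agree to about 16 digits but are not bit-for-bit equal.
print(left == right)
print(abs(left - right))
```

Differences of this size are harmless for the physics, but they can
accumulate over many optimization steps if each core repeats the
calculation independently.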

Thank you,
Tristan Maxson

On Wed, Feb 4, 2015 at 4:21 AM, Torsten Hahn <torstenhahn at fastmail.fm>
wrote:

> Probably we could do this, but my feeling is that this would only cure the
> symptoms, not the real origin of this annoying bug.
>
>
> In fact there is code in
>
> mpi/__init__.py
>
> that says:
>
> # Construct fingerprint:
> # ASE may return slightly different atomic positions (e.g. due
> # to MKL) so compare only first 8 decimals of positions
>
>
> The code says that only 8 decimal places are used for the generation of
> atomic "fingerprints". This code relies on numpy and therefore on lapack/blas
> functions. However, I have no idea what that md5_array etc. stuff really
> does. But there is some debug code which should at least tell you which
> atom(s) cause the problem.
>
> However, that error is *very* strange because mpi.broadcast(...) should
> result in *exactly* the same objects on all cores. No idea why there should
> be any difference at all and what was the intention behind the fancy
> fingerprint-generation stuff in the compare_atoms(atoms, comm=world) method.
>
> Best,
> Torsten.
>
> > On 04.02.2015 at 10:00, jingzhe <jingzhe.chen at gmail.com> wrote:
> >
> > Hi Torsten,
> >
> >              Thanks for the quick reply, but I use gcc and lapack/blas.
> > I mean, if the positions of the atoms are slightly different on different
> > ranks because of compiler/lib stuff, can we just set a tolerance in
> > check_atoms and skip the error?
> >
> >              Best.
> >
> >              Jingzhe
> >
> >
> >
> >
> >
> > On 2015-02-04 at 14:32, Torsten Hahn wrote:
> >> Dear Jingzhe,
> >>
> >> we often saw this error when using GPAW together with Intel MKL <= 11.x
> >> on Intel CPUs. I never tracked down the error because it was gone after a
> >> compiler/library upgrade.
> >>
> >> Best,
> >> Torsten.
> >>
> >>
> >> --
> >> Dr. Torsten Hahn
> >> torstenhahn at fastmail.fm
> >>
> >>> On 04.02.2015 at 07:27, jingzhe Chen <jingzhe.chen at gmail.com> wrote:
> >>>
> >>> Dear GPAW guys,
> >>>
> >>>         I used the latest gpaw to run a relaxation job and got the
> >>> error message below.
> >>>
> >>>      RuntimeError: Atoms objects on different processors are not identical!
> >>>
> >>>         I found a line in the force calculator,
> >>> 'wfs.world.broadcast(self.F_av, 0)', so the forces on all ranks should
> >>> be the same. This confuses me; I cannot think of any other reason that
> >>> could lead to this error.
> >>>
> >>>        Could anyone take a look at it?
> >>>
> >>>        I attached the structure file and running script here. I used
> >>> 24 cores.
> >>>
> >>>        Thanks in advance.
> >>>
> >>>          Jingzhe
> >>>
> >>> <main.py><model.traj>
> >>> _______________________________________________
> >>> gpaw-users mailing list
> >>> gpaw-users at listserv.fysik.dtu.dk
> >>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
> >>
> >
>