[ase-users] Problems with dftd3 and multiple nodes.
Eric Hermes
ehermes at chem.wisc.edu
Tue Oct 31 15:59:02 CET 2017
On Tue, 2017-10-31 at 10:29 +0100, Sascha Thinius wrote:
> Hi Eric,
>
> I've installed the ase-version including your changes. But it did not
> change anything.
> The ase_dftd3.xyz file was written, while the ase_dftd3.out file is
> still empty.
>
> Running the calculation on one node, the following (not empty) files
> were created:
> .EDISP
> dftd3_gradient
> ase_dftd3.xyz
> ase_dftd3.out
>
> Thanks for help so far. I appreciate more suggestions to fix the
> problem.
Please keep replies on-list.
Can you confirm that the calculation that works and the one that
doesn't differ only in the number of nodes used, with no other changes
between the two runs? Does this calculation work without the DFTD3
calculator?
I don't really understand the output in the files you attached: I
don't know where the .EDISP file is coming from, and I don't know where
the "d3-else" messages in your output are coming from. The DFTD3
calculator produces neither of those. Have you made any modifications
to GPAW, the dftd3 program itself, or the DFTD3 calculator in ASE that
could produce this output? The DFTD3 calculator only works with the
unmodified reference implementation of dftd3 from Grimme's website.
Keep in mind that the dftd3 executable is serial; it is neither MPI-
nor OpenMP-parallelized. That means that for very large systems with
three-body contributions it can take a very long time. But your system
is not that big, and you are not using three-body corrections; I can
run the dispersion calculation on my laptop in well under a second.
However, if you are testing larger systems with three-body corrections,
it may look like the calculation is hanging simply because it takes a
long time.
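For reference, this is roughly the shape of script I would expect to
work here. It is only a minimal sketch, not your actual setup: the
structure file name and GPAW settings are placeholders, and I am
assuming the three-body switch is the abc keyword (named after the
-abc flag of the dftd3 binary), which defaults to off.

    from ase.io import read
    from ase.calculators.dftd3 import DFTD3
    from gpaw import GPAW

    atoms = read('structure.xyz')          # placeholder structure file
    dft = GPAW(xc='PBE', txt='gpaw.txt')   # placeholder GPAW settings
    # Wrap the DFT calculator with DFTD3.  Leaving abc=False keeps the
    # (expensive, serial) three-body terms disabled; 'abc' is my
    # assumption for the keyword name, mirroring dftd3's -abc flag.
    atoms.set_calculator(DFTD3(dft=dft, xc='pbe', abc=False))
    energy = atoms.get_potential_energy()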
Also, GPAW is only MPI-parallelized, not OpenMP, so you should probably
disable OpenMP for this calculation anyway by setting
OMP_NUM_THREADS=1 in your environment. The crash is coming from libpoe,
which I take to be some sort of IBM parallel execution library, but I am
not at all familiar with IBM hardware or software. It may be related to
OpenMP, but I genuinely have no idea how to even start diagnosing that
stack trace. It is certainly not crashing in the dftd3 executable
itself, though.
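Exporting OMP_NUM_THREADS=1 in your submission script before
gpaw-python is launched is the simplest way to do that. If you would
rather handle it from the Python side, a minimal sketch (assuming the
environment is set before the parallel libraries are imported):

    import os
    # Make sure any OpenMP runtime sees a single thread per process;
    # set this before the libraries that use OpenMP are loaded.
    os.environ['OMP_NUM_THREADS'] = '1'

    from gpaw import GPAW  # imported after the environment is set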
Eric
>
> Sascha.
>
>
> On Thu, 26 Oct 2017 15:48:21 +0000
> Eric Hermes via ase-users <ase-users at listserv.fysik.dtu.dk> wrote:
> > On Thu, 2017-10-26 at 11:11 +0200, Sascha Thinius via ase-users
> > wrote:
> > > Good morning,
> > >
> > > I am happy that dftd3 is available in ASE as of version
> > > 3.15.0b1.
> > > Using a single node, everything works fine for me.
> > > Using two or more nodes, the code gets stuck.
> > > Attached you will find the out file, err file, structure file,
> > > Python script, and the submission script. Ignore the bad settings
> > > in the Python script.
> > > The code gets stuck in the calculate() function at line 228 (in
> > > the "if world.rank == 0" statement).
> > > ase_dftd3.xyz is written; ase_dftd3.out is written but empty.
> > >
> > > Thanks for any advice.
> >
> > Hm, it's hard to tell what's going wrong based on the files you
> > shared. I am the one who wrote this module, but I never tested it
> > with gpaw-python or mpi4py, so it's not terribly surprising that
> > it's not working across multiple hosts. It looks like Alexander
> > Tygesen did some work on the code to make it more compatible with
> > parallel calculations, so he might have some insight into what's
> > going wrong.
> >
> > I've committed some additional changes to the code that reorganize
> > some of the parallel logic and get rid of the assumption that the
> > dftd3 files are readable by all MPI processes (which fails, for
> > example, if you are running the calculation on local storage across
> > multiple hosts). There's a chance this will fix it for you, so just
> > pull from the git head and try again. If this doesn't solve your
> > issue, please share any files created by the DFTD3 calculator (i.e.
> > ase_dftd3.{out,POSCAR,xyz}, dftd3_cellgradient, dftd3_gradient,
> > .dftd3par.local).
> >
> > Eric
> >
> > >
> > > All the best,
> > > Sascha.
>
>