[gpaw-users] "Failed to orthogonalize" when domain size changed.
Marcin Dulak
Marcin.Dulak at fysik.dtu.dk
Wed Apr 27 11:36:55 CEST 2011
Hi,
we have had a similar error reported before:
https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2010-November/000467.html
After a long discussion it turned out that the cell was set incorrectly;
I don't think that is the case here.
What about:
mpirun -np 8 gpaw-python `which gpaw-test` 2>&1 | tee test.log
Some tests may fail, as in:
https://listserv.fysik.dtu.dk/pipermail/gpaw-developers/2010-September/001094.html
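By the way, the "1" in "Failed to orthogonalize: 1" is the LAPACK info
code from the Cholesky factorization of the overlap matrix S_nn: a
positive info = k means the leading minor of order k is not positive
definite, which is often a symptom of NaNs or numerically linearly
dependent wave functions. A toy illustration of that info convention
(plain Python, not GPAW's actual code):

```python
import math

def cholesky_info(S):
    """Toy Cholesky factorization with a LAPACK-style 'info' return.

    Returns (L, info) for S = L L^T.  info == 0 means success;
    info == k > 0 means the leading minor of order k is not positive
    definite -- the same convention behind GPAW's
    "Failed to orthogonalize: %d" message.
    """
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = S[i][i] - s
                if d <= 0.0:  # leading minor of order i+1 not positive definite
                    return None, i + 1
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L, 0

# A healthy overlap matrix factorizes cleanly:
print(cholesky_info([[1.0, 0.1],
                     [0.1, 1.0]])[1])  # info = 0

# Two numerically identical states make the overlap singular:
print(cholesky_info([[1.0, 1.0],
                     [1.0, 1.0]])[1])  # info = 2
```

So the message usually points at a numerical problem upstream (cell,
grid, or a parallelization-dependent bug) rather than at LAPACK itself,
which is another reason to run the parallel tests above.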
Best regards,
Marcin
Chris Willmore wrote:
> Hi Marcin,
>
> Please find the error trace below. The exception occurred after one
> hour of execution, on the first iteration over dl (value 1.66).
> I have successfully run other scripts utilizing 8 cores.
> This job was run on an Amazon EC2 type c1.xlarge
> (http://aws.amazon.com/ec2/instance-types/)
>
> Thanks and regards,
> Chris
>
> Traceback (most recent call last):
>   File "dl-k-2nodes.py", line 34, in <module>
>     e = energy(k, h, dl, da, vac, 'dl', dl)
>   File "dl-k-2nodes.py", line 21, in energy
>     e = slab.get_potential_energy()
>   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 494, in get_potential_energy
>     return self.calc.get_potential_energy(self)
>   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 32, in get_potential_energy
>     self.calculate(atoms, converge=True)
>   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
>     self.occupations):
>   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
>     wfs.eigensolver.iterate(hamiltonian, wfs)
>   File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
>     wfs.orthonormalize()
>   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
>     self.overlap.orthonormalize(self, kpt)
>   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
>     self.ksl.inverse_cholesky(S_nn)
>   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
>     raise RuntimeError('Failed to orthogonalize: %d' % info)
> RuntimeError: Failed to orthogonalize: 1
> GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred.
> Calling MPI_Abort!
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 42.
>
> ------------------------------------------------------------------------
> From: Marcin Dulak <Marcin.Dulak at fysik.dtu.dk>
> To: Chris Willmore <chris.willmore at yahoo.com>
> Cc: gpaw-users at listserv.fysik.dtu.dk
> Sent: Wednesday, April 27, 2011 10:04 AM
> Subject: Re: [gpaw-users] "Failed to orthogonalize" when domain size
> changed.
>
> Hi,
>
> I'm unable to reproduce the problem on 8 cores on our cluster
> (https://wiki.fysik.dtu.dk/niflheim/Hardware), with gpaw/0.7.2.6974 and
> ase/3.4.1.1765.
> Which of the jobs fails? Can you post the output, especially the part
> that concerns the parallelization? For example, for Bi111k-2.txt I get:
> ------------------------
> Total number of cores used: 8
> Domain Decomposition: 1 x 2 x 4
> Diagonalizer layout: Serial LAPACK
> Orthonormalizer layout: Serial LAPACK
>
> Symmetries present: 2
> 2 k-points in the Irreducible Part of the Brillouin Zone (total: 4)
> ------------------------
> Please also run tests in parallel (see
> https://wiki.fysik.dtu.dk/gpaw/install/installationguide.html#run-the-tests),
> assuming bash:
>
> mpirun -np 8 gpaw-python `which gpaw-test` 2>&1 | tee test.log
>
> Best regards,
>
> Marcin
>
> Chris Willmore wrote:
> > Hi All,
> >
> > I was given a script to run on some spare hardware; it had a
> hard-coded domain size of 2. Since I had 8 CPUs, I modified the script
> to use the variable gpaw.mpi.world.size. When the script runs with only
> 2 nodes it works fine (albeit slower than desired), but when I run with
> 8 nodes, it crashes with a "Failed to orthogonalize" error. The script
> is attached. Any suggestions?
> >
> > Thanks,
> > Chris
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > gpaw-users mailing list
> > gpaw-users at listserv.fysik.dtu.dk
> > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>
> --
> ***********************************
>
> Marcin Dulak
> Technical University of Denmark
> Department of Physics
> Building 307, Room 229
> DK-2800 Kongens Lyngby
> Denmark
> Tel.: (+45) 4525 3157
> Fax.: (+45) 4593 2399
> email: Marcin.Dulak at fysik.dtu.dk
>
> ***********************************
>