[gpaw-users] "Failed to orthogonalize" when domain size changed.

Jens Jørgen Mortensen jensj at fysik.dtu.dk
Fri Apr 29 08:33:57 CEST 2011


On Wed, 2011-04-27 at 11:07 +0200, Chris Willmore wrote:
> Hi Marcin,
> 
> 
> Please find the error trace below. The exception occurred after one
> hour of execution, on the first iteration over dl (value 1.66).

Could you send us the text output from this one?  Does it work if you
parallelize over the 8 k-points in the IBZ (default behavior)?
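
[Editor's note: a minimal sketch of what "default behavior" means here, assuming GPAW's `parallel` calculator keyword; the calculator parameters are illustrative, not taken from Chris's script:]

```python
from gpaw import GPAW

# Forcing the domain decomposition to use every core, e.g.
#     calc = GPAW(..., parallel={'domain': world.size})
# leaves no cores for k-point parallelization.  Leaving 'domain'
# unset lets GPAW distribute the 8 cores over the 8 k-points in
# the IBZ first (the default behavior):
calc = GPAW(h=0.18, kpts=(4, 4, 1))  # illustrative parameters
```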

Jens Jørgen

> I have successfully run other scripts utilizing 8 cores.
> This job was run on an Amazon EC2 type c1.xlarge
> (http://aws.amazon.com/ec2/instance-types/)
> 
> 
> Thanks and regards,
> Chris
> 
> 
> 
> Traceback (most recent call last):
>   File "dl-k-2nodes.py", line 34, in <module>
>     e = energy(k, h, dl, da, vac, 'dl', dl)
>   File "dl-k-2nodes.py", line 21, in energy
>     e = slab.get_potential_energy()
>   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 494, in get_potential_energy
>     return self.calc.get_potential_energy(self)
>   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 32, in get_potential_energy
>     self.calculate(atoms, converge=True)
>   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
>     self.occupations):
>   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
>     wfs.eigensolver.iterate(hamiltonian, wfs)
>   File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
>     wfs.orthonormalize()
>   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
>     self.overlap.orthonormalize(self, kpt)
>   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
>     self.ksl.inverse_cholesky(S_nn)
>   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
>     raise RuntimeError('Failed to orthogonalize: %d' % info)
> RuntimeError: Failed to orthogonalize: 1
> GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred.
> Calling MPI_Abort!
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
> with errorcode 42.
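
[Editor's note: the integer in "Failed to orthogonalize: 1" follows the LAPACK Cholesky convention: a positive `info` of i means the leading minor of order i is not positive definite, which happens when orbitals become (numerically) linearly dependent. A pure-Python sketch of that convention, not GPAW's actual BLACS code path:]

```python
def cholesky_info(a):
    """Cholesky-factorize a symmetric matrix (list of lists) in place.

    Returns 0 on success, or i + 1 (LAPACK-style) if the leading minor
    of order i + 1 is not positive definite -- the number reported in
    GPAW's 'Failed to orthogonalize: %d' message.
    """
    n = len(a)
    for i in range(n):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:  # overlap matrix not positive definite
                    return i + 1
                a[i][i] = d ** 0.5
            else:
                a[i][j] = (a[i][j] - s) / a[j][j]
    return 0

# A well-conditioned overlap matrix factorizes fine:
print(cholesky_info([[2.0, 1.0], [1.0, 2.0]]))  # 0

# Linearly dependent states give a singular overlap matrix:
print(cholesky_info([[1.0, 1.0], [1.0, 1.0]]))  # 2
```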
> 
> ______________________________________________________________________
> From: Marcin Dulak <Marcin.Dulak at fysik.dtu.dk>
> To: Chris Willmore <chris.willmore at yahoo.com>
> Cc: "gpaw-users at listserv.fysik.dtu.dk"
> <gpaw-users at listserv.fysik.dtu.dk>
> Sent: Wednesday, April 27, 2011 10:04 AM
> Subject: Re: [gpaw-users] "Failed to orthogonalize" when domain size
> changed.
> 
> Hi,
> 
> I'm unable to reproduce the problem on 8 cores on our cluster
> (https://wiki.fysik.dtu.dk/niflheim/Hardware), gpaw/0.7.2.6974,
> ase/3.4.1.1765.
> Which of the jobs fails? Can you post the output, especially the
> part that concerns the parallelization?
> For example, for Bi111k-2.txt I get:
> ------------------------
> Total number of cores used: 8
> Domain Decomposition: 1 x 2 x 4
> Diagonalizer layout: Serial LAPACK
> Orthonormalizer layout: Serial LAPACK
> 
> Symmetries present: 2
> 2 k-points in the Irreducible Part of the Brillouin Zone (total: 4)
> ------------------------
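
[Editor's note: the three factors in a "Domain Decomposition: 1 x 2 x 4" line always multiply to the number of domain-parallel cores. A small sketch of one way such a factorization could be chosen; this is an illustrative heuristic that prefers near-cubic domains, not GPAW's actual algorithm, which also accounts for the grid shape:]

```python
def decompose(ncores):
    """Split ncores into (nx, ny, nz) with nx * ny * nz == ncores,
    keeping the largest factor as small as possible (illustrative)."""
    best = (1, 1, ncores)
    for nx in range(1, ncores + 1):
        if ncores % nx:
            continue
        rest = ncores // nx
        for ny in range(1, rest + 1):
            if rest % ny:
                continue
            nz = rest // ny
            if max(nx, ny, nz) < max(best):
                best = (nx, ny, nz)
    return best

print(decompose(8))  # (2, 2, 2)
```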
> Please also run tests in parallel (see
> https://wiki.fysik.dtu.dk/gpaw/install/installationguide.html#run-the-tests), assuming bash:
> 
> mpirun -np 8 gpaw-python `which gpaw-test`  2>&1 | tee test.log
> 
> Best regards,
> 
> Marcin
> 
> Chris Willmore wrote:
> > Hi All,
> > 
> > I was given a script to run on some spare hardware, which had a
> hard-coded domain of 2. I had 8 CPUs, so I modified the script to use
> the variable gpaw.mpi.world.size. When the script runs with only 2
> nodes it works fine (albeit slower than desired), but when I run with
> 8 nodes it crashes with a "Failed to orthogonalize" error. The script
> is attached. Any suggestions?
> > 
> > Thanks,
> > Chris
> >
> ------------------------------------------------------------------------
> > 
> > _______________________________________________
> > gpaw-users mailing list
> > gpaw-users at listserv.fysik.dtu.dk
> > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
> 
> -- ***********************************
> 
> Marcin Dulak
> Technical University of Denmark
> Department of Physics
> Building 307, Room 229
> DK-2800 Kongens Lyngby
> Denmark
> Tel.: (+45) 4525 3157
> Fax.: (+45) 4593 2399
> email: Marcin.Dulak at fysik.dtu.dk
> 
> ***********************************
> 




More information about the gpaw-users mailing list