[gpaw-users] "Failed to orthogonalize" when domain size changed.

Marcin Dulak Marcin.Dulak at fysik.dtu.dk
Wed Apr 27 11:36:55 CEST 2011


Hi,

We have had this error reported before:
https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2010-November/000467.html
After a long discussion it appeared that the cell had been set incorrectly; I
don't think that is the case here.
Could you run the test suite in parallel:
mpirun -np 8 gpaw-python `which gpaw-test`  2>&1 | tee test.log
Some tests may fail, as they did here:
https://listserv.fysik.dtu.dk/pipermail/gpaw-developers/2010-September/001094.html
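
For context on the message itself: the traceback shows the error being raised in `inverse_cholesky(S_nn)`, that is, while factorizing the overlap matrix of the wave functions during orthonormalization. Assuming that routine follows the LAPACK `potrf` convention (an assumption; the GPAW source is not quoted here), a positive `info` of k means the leading k x k block of the overlap matrix is not positive definite. So `info = 1` would mean the very first diagonal element was already not positive (for example zero or NaN), whereas two exactly linearly dependent states would typically fail at a later pivot. The pure-Python sketch below illustrates only that convention; `cholesky_info` and `orthonormalize_check` are hypothetical names, not GPAW functions.

```python
import math


def cholesky_info(S):
    """Lower-triangular Cholesky factor of a symmetric matrix S.

    Hypothetical helper. Returns (L, info) following the LAPACK potrf
    convention: info == 0 on success, info == k (1-based) if the leading
    minor of order k is not positive definite.
    """
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal pivot: S_jj minus the squares already placed in row j.
        d = S[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if not d > 0.0:  # zero, negative, or NaN pivot
            return L, j + 1
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L, 0


def orthonormalize_check(S):
    """Hypothetical stand-in for the check that raised in the traceback."""
    L, info = cholesky_info(S)
    if info != 0:
        raise RuntimeError('Failed to orthogonalize: %d' % info)
    return L


if __name__ == '__main__':
    good = [[1.0, 0.2], [0.2, 1.0]]
    dependent = [[1.0, 1.0], [1.0, 1.0]]  # two identical states
    broken = [[float('nan'), 0.0], [0.0, 1.0]]  # corrupted first norm
    print(cholesky_info(good)[1])       # 0
    print(cholesky_info(dependent)[1])  # 2
    print(cholesky_info(broken)[1])     # 1
```

If the real routine behaves this way, an `info` of 1 suggests that the overlap data itself was already bad (for example NaNs from an earlier step), which is one reason running the parallel test suite is a sensible first check.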

Best regards,

Marcin

Chris Willmore wrote:
> Hi Marcin,
>
> Please find the error trace below. The exception occurred after one 
> hour of execution, on the first iteration over dl (value 1.66).
> I have successfully run other scripts utilizing 8 cores.
> This job was run on an Amazon EC2 instance of type c1.xlarge
> (http://aws.amazon.com/ec2/instance-types/).
>
> Thanks and regards,
> Chris
>
> Traceback (most recent call last):
>   File "dl-k-2nodes.py", line 34, in <module>
>     e = energy(k, h, dl, da, vac, 'dl', dl)
>   File "dl-k-2nodes.py", line 21, in energy
>     e = slab.get_potential_energy()
>   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 494, in get_potential_energy
>     return self.calc.get_potential_energy(self)
>   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 32, in get_potential_energy
>     self.calculate(atoms, converge=True)
>   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
>     self.occupations):
>   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
>     wfs.eigensolver.iterate(hamiltonian, wfs)
>   File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
>     wfs.orthonormalize()
>   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
>     self.overlap.orthonormalize(self, kpt)
>   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
>     self.ksl.inverse_cholesky(S_nn)
>   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
>     raise RuntimeError('Failed to orthogonalize: %d' % info)
> RuntimeError: Failed to orthogonalize: 1
> GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred. Calling MPI_Abort!
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 42.
>
> ------------------------------------------------------------------------
> *From:* Marcin Dulak <Marcin.Dulak at fysik.dtu.dk>
> *To:* Chris Willmore <chris.willmore at yahoo.com>
> *Cc:* "gpaw-users at listserv.fysik.dtu.dk" <gpaw-users at listserv.fysik.dtu.dk>
> *Sent:* Wednesday, April 27, 2011 10:04 AM
> *Subject:* Re: [gpaw-users] "Failed to orthogonalize" when domain size changed.
>
> Hi,
>
> I'm unable to reproduce the problem on 8 cores on our cluster
> (https://wiki.fysik.dtu.dk/niflheim/Hardware) with gpaw/0.7.2.6974 and
> ase/3.4.1.1765.
> Which of the jobs fail? Could you post the output, especially the part
> concerning the parallelization? For example, for Bi111k-2.txt I get:
> ------------------------
> Total number of cores used: 8
> Domain Decomposition: 1 x 2 x 4
> Diagonalizer layout: Serial LAPACK
> Orthonormalizer layout: Serial LAPACK
>
> Symmetries present: 2
> 2 k-points in the Irreducible Part of the Brillouin Zone (total: 4)
> ------------------------
> Please also run the tests in parallel (see
> https://wiki.fysik.dtu.dk/gpaw/install/installationguide.html#run-the-tests);
> assuming bash:
>
> mpirun -np 8 gpaw-python `which gpaw-test`  2>&1 | tee test.log
>
> Best regards,
>
> Marcin
>
> Chris Willmore wrote:
> > Hi All,
> >
> > I was given a script to run on some spare hardware, which had a
> > hard-coded domain size of 2. I had 8 CPUs, so I modified the script to
> > use the variable gpaw.mpi.world.size instead. When the script runs on
> > only 2 nodes it works fine (albeit more slowly than desired), but when I
> > run it on 8 nodes, it crashes with a "Failed to orthogonalize" error.
> > The script is attached. Any suggestions?
> >
> > Thanks,
> > Chris
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > gpaw-users mailing list
> > gpaw-users at listserv.fysik.dtu.dk
> > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>
> -- ***********************************
>
> Marcin Dulak
> Technical University of Denmark
> Department of Physics
> Building 307, Room 229
> DK-2800 Kongens Lyngby
> Denmark
> Tel.: (+45) 4525 3157
> Fax.: (+45) 4593 2399
> email: Marcin.Dulak at fysik.dtu.dk
>
> ***********************************
