[gpaw-users] "Failed to orthogonalize" when domain size changed.

Chris Willmore chris.willmore at yahoo.com
Wed Apr 27 11:07:50 CEST 2011


Hi Marcin,

Please find the error trace below. The exception occurred after one hour of execution, on the first iteration over dl (value 1.66).
I have successfully run other scripts utilizing 8 cores.
This job was run on an Amazon EC2 c1.xlarge instance (http://aws.amazon.com/ec2/instance-types/).

Thanks and regards,
Chris


Traceback (most recent call last):
  File "dl-k-2nodes.py", line 34, in <module>
    e = energy(k, h, dl, da, vac, 'dl', dl)
  File "dl-k-2nodes.py", line 21, in energy
    e = slab.get_potential_energy()
  File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 494, in get_potential_energy
    return self.calc.get_potential_energy(self)
  File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 32, in get_potential_energy
    self.calculate(atoms, converge=True)
  File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
    self.occupations):
  File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
    wfs.eigensolver.iterate(hamiltonian, wfs)
  File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
    wfs.orthonormalize()
  File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
    self.overlap.orthonormalize(self, kpt)
  File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
    self.ksl.inverse_cholesky(S_nn)
  File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
    raise RuntimeError('Failed to orthogonalize: %d' % info)
RuntimeError: Failed to orthogonalize: 1
GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred.  Calling MPI_Abort!
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 42.
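
For reference, the trailing "1" is the LAPACK info code from the Cholesky
factorization of the overlap matrix S_nn; a positive info means the matrix is
not positive definite, which usually indicates that the wavefunctions have
drifted into (near) linear dependence. A minimal NumPy sketch of the failing
operation (an illustration only, not GPAW's actual code):

import numpy as np

def inverse_cholesky_demo(S_nn):
    # The orthonormalization step factorizes the overlap S_nn = L L^T and
    # inverts L; LAPACK reports info > 0 when S_nn is not positive
    # definite, which GPAW re-raises as 'Failed to orthogonalize: <info>'.
    try:
        L_nn = np.linalg.cholesky(S_nn)
    except np.linalg.LinAlgError:
        raise RuntimeError('Failed to orthogonalize')
    return np.linalg.inv(L_nn)

inverse_cholesky_demo(np.eye(3))        # fine: a healthy overlap is close to identity
inverse_cholesky_demo(np.ones((3, 3)))  # raises: singular, not positive definite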

________________________________
From: Marcin Dulak <Marcin.Dulak at fysik.dtu.dk>
To: Chris Willmore <chris.willmore at yahoo.com>
Cc: "gpaw-users at listserv.fysik.dtu.dk" <gpaw-users at listserv.fysik.dtu.dk>
Sent: Wednesday, April 27, 2011 10:04 AM
Subject: Re: [gpaw-users] "Failed to orthogonalize" when domain size changed.

Hi,

I'm unable to reproduce the problem on 8 cores on our cluster (https://wiki.fysik.dtu.dk/niflheim/Hardware), with gpaw/0.7.2.6974 and ase/3.4.1.1765.
Which of the jobs fails? Can you post the output, especially the part that
concerns the parallelization? For example, for Bi111k-2.txt I get:
------------------------
Total number of cores used: 8
Domain Decomposition: 1 x 2 x 4
Diagonalizer layout: Serial LAPACK
Orthonormalizer layout: Serial LAPACK

Symmetries present: 2
2 k-points in the Irreducible Part of the Brillouin Zone (total: 4)
------------------------
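
If the failure depends on how the grid is split among cores, one experiment is
to pin the domain decomposition explicitly instead of letting GPAW derive it
from the core count. A sketch, assuming a GPAW version that accepts the
parallel keyword (h and kpts are placeholders, not values from your script):

from gpaw import GPAW

# Force the 1 x 2 x 4 layout shown above on all 8 cores; 'h' and 'kpts'
# are placeholder values, not taken from the attached script.
calc = GPAW(h=0.2,
            kpts=(4, 4, 1),
            parallel={'domain': (1, 2, 4)})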
Please also run the tests in parallel (see https://wiki.fysik.dtu.dk/gpaw/install/installationguide.html#run-the-tests); assuming bash:

mpirun -np 8 gpaw-python `which gpaw-test`  2>&1 | tee test.log

Best regards,

Marcin

Chris Willmore wrote:
> Hi All,
> 
> I was given a script to run on some spare hardware, and it had a hard-coded domain size of 2. Since I had 8 CPUs, I modified the script to use the variable gpaw.mpi.world.size (a sketch of this change appears below). When the script runs with only 2 nodes it works fine (albeit slower than desired), but when I run with 8 nodes it crashes with a "Failed to orthogonalize" error. Attached is the script. Any suggestions?
> 
> Thanks,
> Chris
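
The change described above amounts to something like the following sketch (the
attached script is not reproduced here, so the call is illustrative):

import gpaw.mpi
from gpaw import GPAW

# Before: domain decomposition hard-coded for 2 cores, e.g.
#   calc = GPAW(parallel={'domain': 2})
# After: use however many cores the MPI world provides:
calc = GPAW(parallel={'domain': gpaw.mpi.world.size})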

-- 
***********************************

Marcin Dulak
Technical University of Denmark
Department of Physics
Building 307, Room 229
DK-2800 Kongens Lyngby
Denmark
Tel.: (+45) 4525 3157
Fax.: (+45) 4593 2399
email: Marcin.Dulak at fysik.dtu.dk

***********************************