[gpaw-users] "Random" orthogonalization exceptions

Jens Jørgen Mortensen jensj at fysik.dtu.dk
Wed Nov 2 12:26:36 CET 2011


On 01-11-2011 14:59, Chris Willmore wrote:
> Hi All,
>
> We have a script performing a simple structure optimization. This
> script was run successfully before at Niflheim.
> We have run the script numerous times on our hardware with mixed
> results. Sometimes the script will complete as expected, and other
> times the script will fail with a "Failed to orthogonalize" exception.
> We are changing the number of cores used between executions.
> Here is a list of failures and successes.
> # Cores  |  Succeed / Fail
> 2        |  fail
> 4        |  succeed
> 6        |  fail
> 8        |  succeed
>
> We have also run the script on a mixture of host types (using Amazon
> EC2 and a private cloud) with mixed results.
>
> Our question is: should the number of hosts/cores or other
> environmental factors trigger an "orthogonalize" exception?

No.

It works OK for me.  Could you send us the text output from your 
calculations on 2 and 4 cores?

Try also to run the test-suite on 1, 2, 4 and 8 cores, and let us know 
how that goes.

   mpirun -np 2 gpaw-python /path/to/gpaw/tools/gpaw-test
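
If it is easier, a small driver along these lines could run all four core counts and keep the output of each run in its own file. This is only a sketch: the helper names and the log file names are made up for illustration, and the path is the same placeholder as in the command above. With dry_run=True it only prints the commands.

```python
import subprocess

# Placeholder path, as in the mpirun command above; adjust to your install.
GPAW_TEST = "/path/to/gpaw/tools/gpaw-test"


def build_command(ncores, test_script=GPAW_TEST):
    """Return the mpirun command line for a given core count."""
    return ["mpirun", "-np", str(ncores), "gpaw-python", test_script]


def run_all(core_counts=(1, 2, 4, 8), dry_run=True):
    """Run (or only print, if dry_run) the test suite once per core count.

    Returns a dict mapping core count to the exit code of the run,
    or to None when dry_run is set.
    """
    results = {}
    for n in core_counts:
        cmd = build_command(n)
        if dry_run:
            print(" ".join(cmd))
            results[n] = None
            continue
        log_name = "gpaw-test-np%d.log" % n  # hypothetical log file name
        with open(log_name, "w") as log:
            results[n] = subprocess.call(
                cmd, stdout=log, stderr=subprocess.STDOUT)
    return results


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling run_all(dry_run=False) would actually start the runs, one after the other, and the resulting log files are what we would like to see.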

Jens Jørgen

> Attached is the script and below is an example exception trace.
>
> Regards,
> Chris and Vlad
> University of Tartu
>
> Traceback (most recent call last):
>   File "22.py", line 32, in <module>
>     qn.run(fmax=0.05)
>   File "/usr/lib/python2.6/dist-packages/ase/optimize/optimize.py", line 114, in run
>     f = self.atoms.get_forces()
>   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 536, in get_forces
>     forces = self.calc.get_forces(self)
>   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 61, in get_forces
>     force_call_to_set_positions=force_call_to_set_positions)
>   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
>     self.occupations):
>   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
>     wfs.eigensolver.iterate(hamiltonian, wfs)
>   File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
>     wfs.orthonormalize()
>   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
>     self.overlap.orthonormalize(self, kpt)
>   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
>     self.ksl.inverse_cholesky(S_nn)
>   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
>     raise RuntimeError('Failed to orthogonalize: %d' % info)
> RuntimeError: Failed to orthogonalize: 1
> GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred.  Calling MPI_Abort!
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 42.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
>
>
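
For reference, the number after "Failed to orthogonalize" looks like a LAPACK-style info code. Assuming it follows the potrf convention (info = i > 0 means that the leading minor of order i is not positive definite), a value of 1 would mean that already the first diagonal element of the overlap matrix S_nn was not a positive number, which would also be the case if it were NaN. The pure-Python sketch below only imitates that convention on tiny matrices. It is not GPAW's inverse_cholesky, and the function name cholesky_info is made up for illustration.

```python
import math


def cholesky_info(S):
    """Lower Cholesky factor of S with a potrf-style info code.

    Returns (L, 0) on success, or (None, j) when the leading minor of
    order j (counted from 1) is not positive definite.
    """
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot: diagonal element minus the part already factored.
        d = S[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if not d > 0.0:  # also catches d being NaN
            return None, j + 1
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L, 0


# Overlap of two orthonormal states: the factorization succeeds.
_, info_ok = cholesky_info([[1.0, 0.0], [0.0, 1.0]])

# Two identical (linearly dependent) states: the second pivot is zero.
_, info_dep = cholesky_info([[1.0, 1.0], [1.0, 1.0]])

# A NaN in the first diagonal element fails at the first pivot.
_, info_nan = cholesky_info([[float("nan"), 0.0], [0.0, 1.0]])

print(info_ok, info_dep, info_nan)
```

This prints 0 2 1. If the assumption about the info code holds, the third case is the one that resembles the error in the trace, so it may be worth checking the text output for NaN values before the failure.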
