[gpaw-users] "Random" orthogonalization exceptions
Jens Jørgen Mortensen
jensj at fysik.dtu.dk
Wed Nov 2 12:26:36 CET 2011
On 01-11-2011 14:59, Chris Willmore wrote:
> Hi All,
>
> We have a script performing a simple structure optimization. This
> script was run successfully before at Niflheim.
> We have run the script numerous times on our hardware with mixed
> results. Sometimes the script will complete as expected, and other
> times the script will fail with a "Failed to orthogonalize" exception.
> We are changing the number of cores used between executions.
> Here is a list of failures and successes.
> # Cores | Succeed / Fail
> 2       | fail
> 4       | succeed
> 6       | fail
> 8       | succeed
>
> We have also run the script on a mixture of host types (using Amazon
> EC2 and a private cloud) with mixed results.
>
> Our question is: should the number of hosts/cores or other
> environmental factors trigger an "orthogonalize" exception?
No.
It works OK for me. Could you send us the text output from your
calculations on 2 and 4 cores?
Try also to run the test-suite on 1, 2, 4 and 8 cores, and let us know
how that goes.
mpirun -np 2 gpaw-python /path/to/gpaw/tools/gpaw-test
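To cover all four core counts in one go, something along these lines could be used. It is only a sketch: it assumes Python 3 and that mpirun and gpaw-python are on the PATH, the helper names build_command and run_all are made up for illustration, and the path is the same placeholder as in the command above.

```python
import subprocess

# Placeholder path, copied from the command above; adjust to your installation.
GPAW_TEST = "/path/to/gpaw/tools/gpaw-test"


def build_command(ncores, test_script=GPAW_TEST):
    """Return the mpirun command line for one core count (hypothetical helper)."""
    return ["mpirun", "-np", str(ncores), "gpaw-python", test_script]


def run_all(core_counts=(1, 2, 4, 8)):
    """Run the test suite once per core count and keep each run's output.

    Output goes to gpaw-test-<n>.log; returns {ncores: exit code}.
    """
    results = {}
    for n in core_counts:
        with open("gpaw-test-%d.log" % n, "w") as log:
            proc = subprocess.run(build_command(n),
                                  stdout=log, stderr=subprocess.STDOUT)
        results[n] = proc.returncode
    return results

# Usage (not executed here):
#     for n, code in sorted(run_all().items()):
#         print("%d cores: exit code %d" % (n, code))
```

The log files keep the text output of each run separate, so the 2-core and 4-core results can be sent to the list as they are.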
Jens Jørgen
> Attached is the script and below is an example exception trace.
>
> Regards,
> Chris and Vlad
> University of Tartu
>
> Traceback (most recent call last):
>   File "22.py", line 32, in <module>
>     qn.run(fmax=0.05)
>   File "/usr/lib/python2.6/dist-packages/ase/optimize/optimize.py", line 114, in run
>     f = self.atoms.get_forces()
>   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 536, in get_forces
>     forces = self.calc.get_forces(self)
>   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 61, in get_forces
>     force_call_to_set_positions=force_call_to_set_positions)
>   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
>     self.occupations):
>   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
>     wfs.eigensolver.iterate(hamiltonian, wfs)
>   File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
>     wfs.orthonormalize()
>   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
>     self.overlap.orthonormalize(self, kpt)
>   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
>     self.ksl.inverse_cholesky(S_nn)
>   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
>     raise RuntimeError('Failed to orthogonalize: %d' % info)
> RuntimeError: Failed to orthogonalize: 1
> GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred.
> Calling MPI_Abort!
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
> with errorcode 42.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
>
>
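For readers unfamiliar with the step that fails in the traceback above, here is a minimal sketch of orthonormalization via an inverse Cholesky factor. It uses plain NumPy on a small dense array; it is not GPAW's implementation (no PAW overlap operator, no BLACS/ScaLAPACK), and the function name orthonormalize and the array names are chosen here only for illustration.

```python
import numpy as np


def orthonormalize(psi_nG):
    """Orthonormalize the rows of psi_nG with the inverse Cholesky factor.

    Illustration only: plain NumPy, no PAW corrections, no parallelization.
    """
    S_nn = psi_nG @ psi_nG.conj().T        # overlap matrix of the states
    # S = L L^H; NumPy raises LinAlgError if S is not positive definite.
    L_nn = np.linalg.cholesky(S_nn)
    return np.linalg.solve(L_nn, psi_nG)   # L^-1 psi has identity overlap


# Two linearly independent states: the factorization succeeds.
good = np.array([[1.0, 1.0, 0.0],
                 [0.0, 1.0, 1.0]])
ortho = orthonormalize(good)
assert np.allclose(ortho @ ortho.conj().T, np.eye(2))

# Two identical states: the overlap matrix is singular, so the
# Cholesky factorization fails.
bad = np.array([[1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
try:
    orthonormalize(bad)
except np.linalg.LinAlgError as err:
    print("Failed to orthogonalize:", err)
```

This does not diagnose the failure reported above. It only shows the kind of input (an overlap matrix that is not positive definite) that makes a Cholesky-based orthonormalization fail.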