[gpaw-users] "Random" orthogonalization exceptions
Chris Willmore
chris.willmore at yahoo.com
Tue Nov 1 14:59:17 CET 2011
Hi All,
We have a script that performs a simple structure optimization. This script has previously run successfully on Niflheim.
We have run the script numerous times on our hardware with mixed results. Sometimes the script will complete as expected, and other times the script will fail with a "Failed to orthogonalize" exception. We are changing the number of cores used between executions.
Here is a list of failures and successes:
# Cores | Succeed / Fail
2       | fail
4       | succeed
6       | fail
8       | succeed
We have also run the script on a mixture of host types (using Amazon EC2 and a private cloud), again with mixed results.
Our question is: should the number of hosts/cores, or other environmental factors, be able to trigger a "Failed to orthogonalize" exception?
Attached is the script and below is an example exception trace.
Regards,
Chris and Vlad
University of Tartu
Traceback (most recent call last):
  File "22.py", line 32, in <module>
    qn.run(fmax=0.05)
  File "/usr/lib/python2.6/dist-packages/ase/optimize/optimize.py", line 114, in run
    f = self.atoms.get_forces()
  File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 536, in get_forces
    forces = self.calc.get_forces(self)
  File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 61, in get_forces
    force_call_to_set_positions=force_call_to_set_positions)
  File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
    self.occupations):
  File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
    wfs.eigensolver.iterate(hamiltonian, wfs)
  File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
    wfs.orthonormalize()
  File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
    self.overlap.orthonormalize(self, kpt)
  File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
    self.ksl.inverse_cholesky(S_nn)
  File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
    raise RuntimeError('Failed to orthogonalize: %d' % info)
RuntimeError: Failed to orthogonalize: 1
GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred. Calling MPI_Abort!
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 42.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
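
A note on reading the trailing number in "Failed to orthogonalize: 1". The trace shows the error is raised from an inverse Cholesky step on the overlap matrix S_nn. The sketch below is not GPAW's code. It is a small pure-Python Cholesky written only for illustration, and it assumes that `info` follows the LAPACK `potrf` convention, where `info = k > 0` means the leading minor of order k is not positive definite. The function name `cholesky_info` and the test matrices are hypothetical. Under that assumption, `info = 1` would point at the first diagonal element of S_nn being non-positive or NaN.

```python
import math


def cholesky_info(S):
    """Cholesky-factorize a real symmetric matrix given as a list of lists.

    Returns (L, info). info == 0 means success. info == k > 0 means the
    leading minor of order k is not positive definite (LAPACK-style
    convention, assumed here for illustration only).
    """
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal pivot after removing contributions of earlier columns.
        d = S[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if not d > 0.0:  # written this way so that NaN also fails
            return L, j + 1
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (S[i][j] - s) / L[j][j]
    return L, 0


# Hypothetical 2x2 "overlap matrices" for two states.
ok = [[1.0, 0.1], [0.1, 1.0]]                   # well-behaved overlaps
nan_first = [[float("nan"), 0.0], [0.0, 1.0]]   # first state has a NaN norm
dependent = [[1.0, 1.0], [1.0, 1.0]]            # two identical states

print(cholesky_info(ok)[1])         # 0
print(cholesky_info(nan_first)[1])  # 1
print(cholesky_info(dependent)[1])  # 2
```

If the assumption holds, a failure at index 1 suggests the wavefunctions were already invalid (for example zero or NaN) before orthogonalization, rather than merely close to linearly dependent.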
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 22.py
Type: application/octet-stream
Size: 1046 bytes
Desc: not available
Url : http://listserv.fysik.dtu.dk/pipermail/gpaw-users/attachments/20111101/019f03e1/attachment.obj