[gpaw-users] "Random" orthoginalization exceptions
Jakob Blomquist
jakob.blomqvist at mah.se
Wed Nov 2 13:49:30 CET 2011
As I wrote here
https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2011-May/000860.html
there seem to be some issues with Netlib's LAPACK and BLAS on Ubuntu
with respect to GPAW in parallel mode.
I have been using ACML successfully since then; a rough sketch of the
build configuration is below.
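
In case it is useful, a minimal customize.py sketch for building GPAW
against ACML (this is only an illustration, not my exact file; the ACML
version and install path are assumptions, so adjust them to your
installation):

# customize.py fragment: link GPAW against ACML instead of Netlib BLAS/LAPACK.
# Example paths only; they assume ACML lives under /opt/acml4.4.0/gfortran64.
libraries = ['acml', 'gfortran']
library_dirs = ['/opt/acml4.4.0/gfortran64/lib']
extra_link_args = ['-Wl,-rpath=/opt/acml4.4.0/gfortran64/lib']

Afterwards rebuild the extension (e.g. python setup.py build_ext) so
gpaw-python actually picks up the new libraries.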
Best,
Jakob Blomquist
Associate Professor
Dep. of Material Science
IMP, School of Technology
Malmo University
SWEDEN
+46(0)40 6657751
jakob.blomqvist at mah.se
On 11/02/2011 12:26 PM, Jens Jørgen Mortensen wrote:
> On 01-11-2011 14:59, Chris Willmore wrote:
>> Hi All,
>>
>> We have a script performing a simple structure optimization. This
>> script has run successfully on Niflheim before.
>> We have run the script numerous times on our own hardware with mixed
>> results. Sometimes the script completes as expected, and other
>> times it fails with a "Failed to orthogonalize" exception. We vary
>> the number of cores used between executions.
>> Here is a list of failures and successes.
>> # Cores | Succeed / Fail
>> 2       | fail
>> 4       | succeed
>> 6       | fail
>> 8       | succeed
>>
>> We have also run the script on a mixture of host types (using Amazon
>> EC2 and a private cloud), again with mixed results.
>>
>> Our question is: should the number of hosts/cores or other
>> environmental factors trigger an "orthogonalize" exception?
>
> No.
>
> It works OK for me. Could you send us the text output from your
> calculations on 2 and 4 cores?
>
> Also try running the test suite on 1, 2, 4 and 8 cores, and let us know
> how that goes.
>
> mpirun -np 2 gpaw-python /path/to/gpaw/tools/gpaw-test
>
> Jens Jørgen
>
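
In case it helps with the bookkeeping, a small sketch for running that
over several core counts in one go (assuming mpirun and gpaw-python are
on the PATH; adjust the gpaw-test path to your installation):

import subprocess

# Run the GPAW test suite at several core counts and report the outcome.
for ncores in (1, 2, 4, 8):
    cmd = ['mpirun', '-np', str(ncores), 'gpaw-python',
           '/path/to/gpaw/tools/gpaw-test']
    ret = subprocess.call(cmd)
    status = 'succeeded' if ret == 0 else 'failed (exit code %d)' % ret
    print('%d cores: %s' % (ncores, status))
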
>> Attached is the script and below is an example exception trace.
>>
>> Regards,
>> Chris and Vlad
>> University of Tartu
>>
>> Traceback (most recent call last):
>>   File "22.py", line 32, in <module>
>>     qn.run(fmax=0.05)
>>   File "/usr/lib/python2.6/dist-packages/ase/optimize/optimize.py", line 114, in run
>>     f = self.atoms.get_forces()
>>   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 536, in get_forces
>>     forces = self.calc.get_forces(self)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 61, in get_forces
>>     force_call_to_set_positions=force_call_to_set_positions)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
>>     self.occupations):
>>   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
>>     wfs.eigensolver.iterate(hamiltonian, wfs)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
>>     wfs.orthonormalize()
>>   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
>>     self.overlap.orthonormalize(self, kpt)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
>>     self.ksl.inverse_cholesky(S_nn)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
>>     raise RuntimeError('Failed to orthogonalize: %d' % info)
>> RuntimeError: Failed to orthogonalize: 1
>> GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred.
>> Calling MPI_Abort!
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>> with errorcode 42.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>>
>>
>
>
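
For what it is worth: the number in "Failed to orthogonalize: 1" is, as
far as I can tell, the 'info' return code from the Cholesky
factorization of the overlap matrix in gpaw/blacs.py. For LAPACK's
potrf, info > 0 means the matrix was not numerically positive definite,
which is exactly the kind of thing a broken BLAS/LAPACK can produce. A
small illustration in plain NumPy (not GPAW code):

import numpy as np

# A symmetric 2x2 matrix that is not positive definite (eigenvalues 3
# and -1), standing in for a corrupted overlap matrix S_nn.
S = np.array([[1.0, 2.0],
              [2.0, 1.0]])
try:
    np.linalg.cholesky(S)
except np.linalg.LinAlgError as err:
    print('Cholesky failed: %s' % err)

In other words, the exception only reports that the overlap matrix went
bad, not why; in my case the underlying cause was the Netlib libraries.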