[gpaw-users] "Random" orthogonalization exceptions

Jakob Blomquist jakob.blomqvist at mah.se
Wed Nov 2 13:49:30 CET 2011


As I wrote here 
https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2011-May/000860.html
there seem to be some issues with Netlib's LAPACK and BLAS on Ubuntu 
vis-à-vis GPAW in parallel mode.
I have been using ACML successfully since then.
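For reference, the switch mostly came down to pointing GPAW's customize.py 
at ACML before rebuilding. The snippet below is only a sketch: the ACML 
version, install prefix and gfortran64 variant are assumptions you will 
have to adjust to your own installation.

  # customize.py -- link GPAW against ACML instead of Netlib's reference
  # BLAS/LAPACK.  The version number and install prefix are examples only.
  acml_base = '/opt/acml4.4.0/gfortran64'
  libraries = ['acml', 'gfortran']
  library_dirs = [acml_base + '/lib']
  extra_link_args = ['-Wl,-rpath=' + acml_base + '/lib']

After editing customize.py, rebuild (python setup.py build_ext) and rerun 
a parallel test job to confirm the new libraries are actually picked up.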

Best,

Jakob Blomquist
Associate Professor
Dep. of Material Science
IMP, School of Technology
Malmo University
SWEDEN
+46(0)40 6657751
jakob.blomqvist at mah.se


On 11/02/2011 12:26 PM, Jens Jørgen Mortensen wrote:
> On 01-11-2011 14:59, Chris Willmore wrote:
>> Hi All,
>>
>> We have a script performing a simple structure optimization. This 
>> script was previously run successfully on Niflheim.
>> We have run the script numerous times on our hardware with mixed 
>> results. Sometimes the script completes as expected, and other 
>> times it fails with a "Failed to orthogonalize" 
>> exception. We are changing the number of cores used between executions.
>> Here is a list of failures and successes:
>> # Cores | Result
>>  2      | fail
>>  4      | succeed
>>  6      | fail
>>  8      | succeed
>> We have also run the script on a mixture of host types (using Amazon 
>> EC2 and a private cloud) with mixed results.
>>
>> Our question is: should the number of hosts/cores or other 
>> environmental factors trigger an "orthogonalize" exception?
>
> No.
>
> It works OK for me.  Could you send us the text output from your 
> calculations on 2 and 4 cores?
>
> Try also to run the test-suite on 1, 2, 4 and 8 cores, and let us know 
> how that goes.
>
>   mpirun -np 2 gpaw-python /path/to/gpaw/tools/gpaw-test
>
> Jens Jørgen
>
>> Attached is the script and below is an example exception trace.
>>
>> Regards,
>> Chris and Vlad
>> University of Tartu
>>
>> Traceback (most recent call last):
>>   File "22.py", line 32, in <module>
>>     qn.run(fmax=0.05)
>>   File "/usr/lib/python2.6/dist-packages/ase/optimize/optimize.py", 
>> line 114, in run
>>     f = self.atoms.get_forces()
>>   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 536, in 
>> get_forces
>>     forces = self.calc.get_forces(self)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 
>> 61, in get_forces
>>     force_call_to_set_positions=force_call_to_set_positions)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in 
>> calculate
>>     self.occupations):
>>   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
>>     wfs.eigensolver.iterate(hamiltonian, wfs)
>>   File 
>> "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", 
>> line 71, in iterate
>>     wfs.orthonormalize()
>>   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", 
>> line 190, in orthonormalize
>>     self.overlap.orthonormalize(self, kpt)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, 
>> in orthonormalize
>>     self.ksl.inverse_cholesky(S_nn)
>>   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in 
>> inverse_cholesky
>>     raise RuntimeError('Failed to orthogonalize: %d' % info)
>> RuntimeError: Failed to orthogonalize: 1
>> GPAW CLEANUP (node 0): <type 'exceptions.RuntimeError'> occurred.  
>> Calling MPI_Abort!
>> --------------------------------------------------------------------------
>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
>> with errorcode 42.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>>
>>
>
>
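The test-suite run suggested above is easy to repeat over several core 
counts with a small wrapper; this is just a convenience sketch, and the 
gpaw-test path is the same placeholder as in the command quoted above:

  # run_tests.py -- run the GPAW test suite on 1, 2, 4 and 8 cores in turn.
  # Convenience sketch only; adjust the gpaw-test path to your installation.
  import subprocess

  for ncores in (1, 2, 4, 8):
      cmd = ['mpirun', '-np', str(ncores),
             'gpaw-python', '/path/to/gpaw/tools/gpaw-test']
      print('Running the test suite on %d core(s)' % ncores)
      subprocess.call(cmd)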
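About the traceback itself: inverse_cholesky() reports "Failed to 
orthogonalize" when the underlying LAPACK Cholesky factorization of the 
overlap matrix returns a non-zero info, and a positive value means the 
matrix was not found to be positive definite -- exactly the kind of 
corruption a broken BLAS/LAPACK can introduce. A small NumPy illustration 
(not GPAW code) of the difference:

  # cholesky_demo.py -- why a corrupted overlap matrix breaks Cholesky-based
  # orthonormalization.  Analogy only, not GPAW code.
  import numpy as np

  def try_cholesky(S):
      try:
          np.linalg.cholesky(S)
          return 'positive definite: Cholesky succeeds'
      except np.linalg.LinAlgError:
          return 'not positive definite: Cholesky fails'

  # A well-behaved overlap matrix: symmetric and positive definite.
  good = np.array([[1.0, 0.1],
                   [0.1, 1.0]])

  # A corrupted one: still symmetric, but with a negative eigenvalue.
  bad = np.array([[1.0, 2.0],
                  [2.0, 1.0]])

  print(try_cholesky(good))
  print(try_cholesky(bad))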