[gpaw-users] "Failed to orthogonalize" error when running on 4 processors

Jakob Blomquist jakob.blomqvist at mah.se
Mon May 2 14:54:39 CEST 2011


Ok, after A LOT of hassle with the inverse Cholesky error appearing in 
random ways, but always at the initial SCF step (sometimes it would 
pass, sometimes it would not; sometimes it seemed related to the domain 
decomposition; sometimes to the number of cores used; and sometimes it 
failed just because it felt like it!), I updated gpaw to trunk and ase 
to the latest stable version. I also tried the new gpaw-setups (we are 
calculating on Zr). The only difference was that the error message 
changed from the strange one seen here: 
https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2011-March/000745.html
(which involved blacs.py even though gpaw wasn't compiled with 
BLACS/ScaLAPACK) to a similar error, but now involving lapack.py (gpaw 
trunk).

So, what to do? I finally managed to compile gpaw-trunk with ACML 4.4.0 
using the following customize.py:
****
libraries = ['acml']
library_dirs = ['/opt/acml4.4.0/gfortran64/lib']
extra_compile_args = ['-O3', '-std=c99', '-funroll-all-loops', '-fPIC']
****
and set the environment variable LD_LIBRARY_PATH=/opt/acml4.4.0/gfortran64/lib.
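
For anyone hitting the same issue, a quick sanity check is to confirm 
that the rebuilt gpaw-python really links against ACML rather than the 
distro BLAS/LAPACK. This is just a sketch; it assumes the ACML install 
path above and that gpaw-python is on your PATH:

```shell
# Make sure the ACML shared libraries can be found at run time
export LD_LIBRARY_PATH=/opt/acml4.4.0/gfortran64/lib:$LD_LIBRARY_PATH

# List the shared libraries gpaw-python was linked against;
# libacml.so should appear in the output if customize.py took effect
ldd "$(which gpaw-python)" | grep -i acml
```

If grep prints nothing, the interpreter is still picking up the netlib 
blas/lapack and a rebuild is probably needed.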

SUCCESS!!

I can only surmise that on my system (AMD64 Opteron, 16 cores, Ubuntu 
10.10), the netlib lapack/blas packages shipped with the distro did not 
cut it.
Now I can only hope that I will not get any more surprises.

/Jakob

On 03/15/2011 03:06 PM, Nichols A. Romero wrote:
> Jakob,
>
> Can you re-run this calculation using the SVN version of GPAW?
>
> Line 620 in blacs.py doesn't exist anymore.
>
> ----- Original Message -----
>> Jakob Blomqvist wrote:
>>> Hmm... Odd. When you replied to this, I tried to run it using:
>>> $ mpirun -np 4 gpaw-python gamma.py
>>> and then it runs fine. But when I go through Torque it doesn't (for 4
>>> processors).
>>> I did have some issues with installing Torque (2.4.3) though.
>> I think we should focus on the error message
>> https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2011-March/000745.html
>> which suggests that inverse_cholesky is run with scalapack (or maybe
>> I'm wrong about it).
>>
>> Marcin
>>> /Jakob
>>>
>>>
>>>>>> Ask Hjorth Larsen <askhl at fysik.dtu.dk> 03/15/11 1:45 PM >>>
>>> Hi
>>>
>>> On Tue, 15 Mar 2011, Marcin Dulak wrote:
>>>
>>>>
>>>> Ask Hjorth Larsen wrote:
>>>>> Hi
>>>>>
>>>>> On Tue, 15 Mar 2011, Marcin Dulak wrote:
>>>>>
>>>>>> Ask - can you run Jakob's example on ubuntu?
>>>>>>
>>>>>> Marcin
>>>>> Not on four CPUs I'm afraid. I'll give it a try though and get
>>>>> back to
>>>>> you.
>>>> I think you can still oversubscribe the CPUs, as long as you have
>>>> enough memory.
>>>> On Jakob's ubuntu the job crashes at the first SCF.
>>>>
>>>> Marcin
>>>>> Regards
>>>>> Ask
>>>
>>> Yeah, I expected to run out of memory but it was small enough.
>>>
>>> It runs successfully for at least four iterations on my computer
>>> using
>>> the ubuntu package and standard lapack/blas/mpi.
>>>
>>> Regards
>>> Ask
>>> _______________________________________________
>>> gpaw-users mailing list
>>> gpaw-users at listserv.fysik.dtu.dk
>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>> --
>> ***********************************
>>
>> Marcin Dulak
>> Technical University of Denmark
>> Department of Physics
>> Building 307, Room 229
>> DK-2800 Kongens Lyngby
>> Denmark
>> Tel.: (+45) 4525 3157
>> Fax.: (+45) 4593 2399
>> email: Marcin.Dulak at fysik.dtu.dk
>>
>> ***********************************
>>


