[gpaw-users] "Failed to orthogonalize" error when running on 4 processors

Tue Mar 15 11:06:05 CET 2011

Hi,

Jakob Blomqvist wrote:
> Example of input that fails and error output.
> I have noticed that, when it happens, it happens on 4 processors, and
> that the grid-point parameter as well as kpoints and nbands seem to matter.
> E.g. increasing nbands to equal the number of valence electrons makes it work.
>
> /Jakob
>
>
>
> Dr. Jakob Blomquist
> IMP, School of Technology
> Malmo University
> SWEDEN
> +46(0)40 6657626
> jakob.blomqvist at mah.se
>
> >>> Marcin Dulak <Marcin.Dulak at fysik.dtu.dk> 03/11/11 5:52 PM >>>
> Hi,
>
> I think nobody has answered yet because we need a full input file first.
> If you use any special custom modules, please cut the example down to a
> minimal, single Python script.
>
> Marcin
>
> Jakob Blomqvist wrote:
> > I noticed others had brought up this as well, but why does it happen
> > for 4 processors and not 1, 2, 6, 8, 12 etc?
> >
> > /Jakob
> >
> > >>> "Jakob Blomqvist" 03/09/11 5:06 PM >>>
> > It seems to be a problem only on 4 processors (as far as I can
> > tell) and with h<=0.20; in this case with kpts=(4,4,4), at other
> > times with kpts=(6,6,6).
> >
> > Anyone?
> >
> > The GPAW text output for h=0.2, kpts=(4,4,4) and 4 processors shows:
> > *********************
> > Unit Cell:
> >            Periodic     X           Y           Z      Points  Spacing
> >   --------------------------------------------------------------------
> >   1. axis:    yes    4.600000    0.000000    0.000000    24     0.1917
> >   2. axis:    yes    0.000000    4.600000    0.000000    24     0.1917
> >   3. axis:    yes    0.000000    0.000000    4.970000    24     0.2071
> > ...
> > ...
> > Total number of cores used: 4
> > Parallelization over k-points: 4
> > Diagonalizer layout: Serial LAPACK
> > Orthonormalizer layout: Serial LAPACK
> >
> > Symmetries present: 8
> > 12 k-points in the Irreducible Part of the Brillouin Zone (total: 64)
> > **********************
> >
> > error-file shows:
> > **********************
> > Traceback (most recent call last):
> >   File "convergenceTestGamma.py", line 39, in <module>
> >     E[i][j]=gamma_Hydride.get_potential_energy()
> >   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 494, in get_potential_energy
> >     return self.calc.get_potential_energy(self)
> >   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 32, in get_potential_energy
> >     self.calculate(atoms, converge=True)
> >   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
> >     self.occupations):
> >   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
> >     wfs.eigensolver.iterate(hamiltonian, wfs)
> >   File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
> >     wfs.orthonormalize()
> >   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
> >     self.overlap.orthonormalize(self, kpt)
> >   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
> >     self.ksl.inverse_cholesky(S_nn)
> >   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
> >     raise RuntimeError('Failed to orthogonalize: %d' % info)
> > RuntimeError: Failed to orthogonalize: 1
> > ***********************
I don't get this error on our Opteron nodes. What is surprising here is
that the error from line 620 of blacs.py in gpaw-0.7.2.6974 (I think
this is the version you use) suggests that you are running with
ScaLAPACK, yet the text output quoted above reports
"Orthonormalizer layout: Serial LAPACK".
Can you attach the whole output?

Marcin
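
A note on reading the error code, offered as a sketch and not as a
diagnosis of this run: the traceback shows that the number in
"Failed to orthogonalize: 1" is the info value raised from
inverse_cholesky. Assuming (this is not confirmed in the thread) that it
follows the LAPACK potrf convention, a positive info = i means the
leading minor of order i of the overlap matrix S_nn is not positive
definite, so info = 1 would point at a non-positive S[0, 0]. The NumPy
snippet below mimics that convention; the helper name and the small
matrices are hypothetical and are not taken from the failing calculation.

```python
import numpy as np


def first_bad_leading_minor(S):
    """Return 0 if S is positive definite, otherwise the order of the
    first leading minor that is not (mimics the LAPACK potrf 'info')."""
    S = np.asarray(S, dtype=float)
    for order in range(1, S.shape[0] + 1):
        try:
            # Cholesky succeeds only for positive definite matrices.
            np.linalg.cholesky(S[:order, :order])
        except np.linalg.LinAlgError:
            return order
    return 0


# Hypothetical 3-band overlap matrices, for illustration only.
S_good = np.array([[1.0, 0.1, 0.0],
                   [0.1, 1.0, 0.2],
                   [0.0, 0.2, 1.0]])

S_zero_norm = S_good.copy()
S_zero_norm[0, 0] = 0.0        # first band has zero norm

S_dependent = np.ones((3, 3))  # all three bands identical

print(first_bad_leading_minor(S_good))       # 0: factorization succeeds
print(first_bad_leading_minor(S_zero_norm))  # 1: fails at the first band
print(first_bad_leading_minor(S_dependent))  # 2: second band depends on the first
```

Under that assumption, a value of 1 would mean the factorization failed
at the very first band, before any linear dependence between bands
could come into play.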