[gpaw-users] "Failed to orthogonalize" error when running on 4 processors
Marcin Dulak
Marcin.Dulak at fysik.dtu.dk
Tue Mar 15 12:57:19 CET 2011
Ask, can you run Jakob's example on Ubuntu?
Marcin
Jakob Blomqvist wrote:
> The output file is attached.
> Again, it seems to be quite sensitive, but so far I have only
> experienced it on 4 processors (16-processor AMD machine with Ubuntu
> 10.10; GPAW and ASE installed from the Debian repositories).
>
> /Jakob
>
>
> Dr. Jakob Blomquist
> IMP, School of Technology
> Malmo University
> SWEDEN
> +46(0)40 6657626
> jakob.blomqvist at mah.se
>
> >>> Marcin Dulak <Marcin.Dulak at fysik.dtu.dk> 03/15/11 11:06 AM >>>
> Hi,
>
> Jakob Blomqvist wrote:
> > Attached are an example input that fails and the error output.
> > I have noticed that, when it happens, it happens on 4 processors, and
> > that the grid-point parameter, as well as the k-points and nbands,
> > seem to matter. E.g., increasing nbands to equal the number of
> > valence electrons makes it work.
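The sensitivity described above (grid parameter, k-points, nbands) can be mapped out systematically instead of by trial and error. A minimal sketch, assuming the actual GPAW calculation is wrapped in a user-supplied function; `scan_parameters` and `run_point` are hypothetical names, not GPAW API. RuntimeErrors such as "Failed to orthogonalize" are recorded per parameter combination instead of aborting the whole scan.

```python
from itertools import product


def scan_parameters(run_point, h_values, kpts_values):
    """Call run_point(h, kpts) for every combination of h and kpts.

    run_point is a hypothetical user-supplied callable that sets up the
    calculation (e.g. a GPAW calculator with the given h and kpts) and
    returns the potential energy. RuntimeErrors are collected rather than
    stopping the scan, so all failing combinations can be reported together.
    """
    energies = {}
    failures = {}
    for h, kpts in product(h_values, kpts_values):
        try:
            energies[(h, kpts)] = run_point(h, kpts)
        except RuntimeError as err:
            failures[(h, kpts)] = str(err)
    return energies, failures


if __name__ == "__main__":
    # Stand-in for a real calculation: pretend h <= 0.20 with 4x4x4 k-points fails.
    def fake_run_point(h, kpts):
        if h <= 0.20 and kpts == (4, 4, 4):
            raise RuntimeError("Failed to orthogonalize: 1")
        return -1.0 * h  # dummy energy

    energies, failures = scan_parameters(
        fake_run_point, [0.18, 0.20, 0.22], [(4, 4, 4), (6, 6, 6)])
    for key, msg in sorted(failures.items()):
        print(key, msg)
```

The fake `run_point` only exercises the bookkeeping; in a real scan it would build the Atoms object and calculator for each combination.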
> >
> > /Jakob
> >
> >
> >
> >
> > >>> Marcin Dulak <Marcin.Dulak at fysik.dtu.dk> 03/11/11 5:52 PM >>>
> > Hi,
> >
> > I think nobody has answered yet: we need a full input file first.
> > If you use any special custom modules, please cut the example down to
> > a minimal, single Python script.
> >
> > Marcin
> >
> > Jakob Blomqvist wrote:
> > > I noticed that others have brought this up as well, but why does it
> > > happen on 4 processors and not on 1, 2, 6, 8, 12, etc.?
> > >
> > > /Jakob
> > >
> > > >>> "Jakob Blomqvist" 03/09/11 5:06 PM >>>
> > > It seems to be a problem only on 4 processors (as far as I can
> > > tell) and with h <= 0.20; in this case with kpts=(4,4,4), other
> > > times with kpts=(6,6,6).
> > >
> > > Anyone?
> > >
> > > The GPAW text output for h=0.2, kpts=(4,4,4) and 4 processors shows:
> > > *********************
> > > Unit Cell:
> > >            Periodic  X         Y         Z         Points  Spacing
> > > --------------------------------------------------------------------
> > > 1. axis:   yes       4.600000  0.000000  0.000000  24      0.1917
> > > 2. axis:   yes       0.000000  4.600000  0.000000  24      0.1917
> > > 3. axis:   yes       0.000000  0.000000  4.970000  24      0.2071
> > > ...
> > > ...
> > > Total number of cores used: 4
> > > Parallelization over k-points: 4
> > > Diagonalizer layout: Serial LAPACK
> > > Orthonormalizer layout: Serial LAPACK
> > >
> > > Symmetries present: 8
> > > 12 k-points in the Irreducible Part of the Brillouin Zone (total: 64)
> > > **********************
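The Points and Spacing columns in the output above can be checked by hand. A minimal sketch, assuming that GPAW chooses the number of grid points per axis by rounding L/h to the nearest multiple of 4; that rounding rule is an assumption here, although it does reproduce the 24/24/24 points and the 0.1917/0.2071 spacings for h = 0.2. The last lines assume the 12 irreducible k-points are split evenly over the 4 k-point groups reported as "Parallelization over k-points: 4". `grid_points` is a hypothetical helper, not GPAW API.

```python
def grid_points(length, h):
    # Assumed rule: round length/h to the nearest multiple of 4 (minimum 4).
    return max(4, int(length / h / 4 + 0.5) * 4)


cell = [4.60, 4.60, 4.97]  # axis lengths in Angstrom, from the output above
h = 0.20
for i, length in enumerate(cell, start=1):
    n = grid_points(length, h)
    print(f"{i}. axis: {n} points, spacing {length / n:.4f}")

# Assumed even split of the irreducible k-points over the k-point groups.
ibz_kpts, kpt_groups = 12, 4
print("k-points per group:", ibz_kpts // kpt_groups)
```

With these numbers each of the 4 cores handles 3 k-points on the full 24 x 24 x 24 grid.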
> > >
> > > The error file shows:
> > > **********************
> > > Traceback (most recent call last):
> > >   File "convergenceTestGamma.py", line 39, in <module>
> > >     E[i][j]=gamma_Hydride.get_potential_energy()
> > >   File "/usr/lib/python2.6/dist-packages/ase/atoms.py", line 494, in get_potential_energy
> > >     return self.calc.get_potential_energy(self)
> > >   File "/usr/lib/python2.6/dist-packages/gpaw/aseinterface.py", line 32, in get_potential_energy
> > >     self.calculate(atoms, converge=True)
> > >   File "/usr/lib/python2.6/dist-packages/gpaw/paw.py", line 265, in calculate
> > >     self.occupations):
> > >   File "/usr/lib/python2.6/dist-packages/gpaw/scf.py", line 46, in run
> > >     wfs.eigensolver.iterate(hamiltonian, wfs)
> > >   File "/usr/lib/python2.6/dist-packages/gpaw/eigensolvers/eigensolver.py", line 71, in iterate
> > >     wfs.orthonormalize()
> > >   File "/usr/lib/python2.6/dist-packages/gpaw/wavefunctions/fdpw.py", line 190, in orthonormalize
> > >     self.overlap.orthonormalize(self, kpt)
> > >   File "/usr/lib/python2.6/dist-packages/gpaw/overlap.py", line 76, in orthonormalize
> > >     self.ksl.inverse_cholesky(S_nn)
> > >   File "/usr/lib/python2.6/dist-packages/gpaw/blacs.py", line 620, in inverse_cholesky
> > >     raise RuntimeError('Failed to orthogonalize: %d' % info)
> > > RuntimeError: Failed to orthogonalize: 1
> > > ***********************
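For context on what fails at the bottom of the traceback above: orthonormalization through an inverse Cholesky factor only works if the overlap matrix S_nn is positive definite. The sketch below is a plain NumPy illustration of that technique, not GPAW's implementation (which works on real-space grids with PAW overlap corrections and goes through its own LAPACK/ScaLAPACK wrappers). In LAPACK's Cholesky routine (?potrf), a positive info value i means the leading minor of order i is not positive definite; whether GPAW's "Failed to orthogonalize: 1" passes that value through unchanged is an assumption here. A band that has collapsed (here: set to zero) makes S_nn singular and the factorization fails.

```python
import numpy as np


def orthonormalize(psi):
    """Orthonormalize the rows of psi (bands x grid points) via inverse Cholesky.

    S = psi psi^H is the overlap matrix; with S = L L^H, the rows of
    L^{-1} psi are orthonormal. Raises numpy.linalg.LinAlgError if S is
    not positive definite.
    """
    S = psi @ psi.conj().T
    L = np.linalg.cholesky(S)       # lower-triangular factor, S = L L^H
    return np.linalg.solve(L, psi)  # L^{-1} psi without forming the inverse


rng = np.random.default_rng(0)
psi = rng.standard_normal((4, 50))  # 4 "bands" on 50 "grid points"
phi = orthonormalize(psi)
print(np.allclose(phi @ phi.T, np.eye(4)))  # True

psi_bad = psi.copy()
psi_bad[0] = 0.0  # first band collapsed: S[0, 0] == 0, first leading minor fails
try:
    orthonormalize(psi_bad)
except np.linalg.LinAlgError as err:
    print("Cholesky failed:", err)
```

Solving with the triangular factor instead of explicitly inverting it is the usual choice; both give L^{-1} psi.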
> I don't get this error on our Opteron nodes. What is surprising here is
> that the error comes from line 620 of blacs.py in gpaw-0.7.2.6974 (I
> think you are using this version), which suggests that you are running
> with ScaLAPACK. Yet the text output reported above says
> "Orthonormalizer layout: Serial LAPACK".
> Can you attach the whole output?
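The question above hinges on which linear-algebra layout was actually in use. A small sketch that pulls the parallelization and layout lines out of a GPAW text output so they can be compared against the traceback; the labels are copied from the output quoted above, `read_layouts` is a hypothetical helper (not part of GPAW), and other GPAW versions may word these lines differently.

```python
import re


def read_layouts(text):
    """Return the parallelization/layout lines of a GPAW text output as a dict."""
    keys = ("Total number of cores used",
            "Parallelization over k-points",
            "Diagonalizer layout",
            "Orthonormalizer layout")
    found = {}
    for line in text.splitlines():
        for key in keys:
            match = re.match(r"\s*" + re.escape(key) + r":\s*(.+?)\s*$", line)
            if match:
                found[key] = match.group(1)
    return found


sample = """Total number of cores used: 4
Parallelization over k-points: 4
Diagonalizer layout: Serial LAPACK
Orthonormalizer layout: Serial LAPACK
"""
print(read_layouts(sample)["Orthonormalizer layout"])  # Serial LAPACK
```

For a real run, pass the contents of the calculator's txt file, e.g. `read_layouts(open("output.txt").read())`, where the file name is hypothetical.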
>
> Marcin
> > >
> > >
> ------------------------------------------------------------------------
> > >
> > > _______________________________________________
> > > gpaw-users mailing list
> > > gpaw-users at listserv.fysik.dtu.dk
> > > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
> >
>
--
***********************************
Marcin Dulak
Technical University of Denmark
Department of Physics
Building 307, Room 229
DK-2800 Kongens Lyngby
Denmark
Tel.: (+45) 4525 3157
Fax.: (+45) 4593 2399
email: Marcin.Dulak at fysik.dtu.dk
***********************************