[gpaw-users] Strange inverse_cholesky errors

Morten Bjørn Bakkedal mbjba at mek.dtu.dk
Tue Nov 20 15:32:27 CET 2012


I keep getting some strange inverse_cholesky errors. In the sample below I'm running a calculation on spin-polarized bulk Ni in a very small supercell with two Ni atoms.

I'm using the plane-wave mode, but I've also encountered this problem using the real-space grid mode. It's running on DTU's HPC cluster. I've tested with the version from the CAMD 2012 summer school, the latest stable version compiled by myself, and the latest development version. All three give the same error.

The really strange thing is that the error for this particular code (see below) only seems to be reproduced in a parallel run with 4 cores using OpenMPI. When running it with 1 core or with 8 cores, it runs just fine. However, I've tested other bulk systems as well, and I remember seeing this error with only one core too.
 
Does anybody have a clue what to do? The Python version installed on the cluster is 2.6.

The following simple code reproduces the error systematically with "mpirun -np 4" after a few seconds:

from ase import Atoms
from gpaw import GPAW
from gpaw import PW

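ecut = 400  # plane-wave cutoff energy in eV
k = 20  # number of k-points along the two short cell axes
a = 3.519004126457587  # fcc lattice constant in Angstrom
b = a / 2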
atoms = Atoms('Ni2',
              cell = [[0, 2 * b, 2 * b],
                      [b, 0, b],
                      [b, b, 0]],
              positions = [[0, 0, 0], [0, b, b]],
              pbc = True)
atoms.set_initial_magnetic_moments([0.5941, 0.5941])
calc = GPAW(mode = PW(ecut),
            xc = 'PBE',
            kpts = (k // 2, k, k),
            parallel = {'domain' : None})
atoms.set_calculator(calc)
e = atoms.get_potential_energy()


This is the console output, somewhat mixed up due to the parallel run.

$ mpirun -np 4 gpaw-python main.py

  ___ ___ ___ _ _ _  
 |   |   |_  | | | | 
 | | | | | . | | | | 
 |__ |  _|___|_____|  0.9.1.9737
 |___|_|             

User:  mbjba at n-62-24-13
Date:  Tue Nov 20 15:12:13 2012
Arch:  x86_64
Pid:   8311
Dir:   /zhome/e2/c/74231/gpaw/gpaw/gpaw
ase:   /zhome/e2/c/74231/gpaw/ase/ase (version 3.6.1)
numpy: /usr/lib64/python2.6/site-packages/numpy (version 1.3.0)
units: Angstrom and eV
cores: 4

Memory estimate
---------------
Process memory now: 54.90 MiB
Calculator  105.98 MiB
    Density  0.71 MiB
        Arrays  0.43 MiB
        Localized functions  0.20 MiB
        Mixer  0.07 MiB
    Hamiltonian  0.33 MiB
        Arrays  0.32 MiB
        XC  0.00 MiB
        Poisson  0.00 MiB
        vbar  0.00 MiB
    Wavefunctions  104.95 MiB
        Arrays psit_nG  78.44 MiB
        Eigensolver  0.18 MiB
        Projectors  23.73 MiB
        Overlap op  0.16 MiB
        PW-descriptor  2.44 MiB
Traceback (most recent call last):
  File "main.py", line 28, in <module>
    e = atoms.get_potential_energy()
  File "/zhome/e2/c/74231/gpaw/ase/ase/atoms.py", line 627, in get_potential_energy
    return self._calc.get_potential_energy(self)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/aseinterface.py", line 38, in get_potential_energy
    self.calculate(atoms, converge=True)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/paw.py", line 269, in calculate
    self.occupations):
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/scf.py", line 46, in run
    wfs.eigensolver.iterate(hamiltonian, wfs)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/eigensolvers/eigensolver.py", line 62, in iterate
    wfs.overlap.orthonormalize(wfs, kpt)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/overlap.py", line 92, in orthonormalize
    self.ksl.inverse_cholesky(S_nn)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/kohnsham_layouts.py", line 150, in inverse_cholesky
    self._inverse_cholesky(S_NN)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/kohnsham_layouts.py", line 157, in _inverse_cholesky
    inverse_cholesky(S_NN)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/utilities/lapack.py", line 101, in inverse_cholesky
    raise RuntimeError('inverse_cholesky error: %d' % info)
RuntimeError: inverse_cholesky error: 1
GPAW CLEANUP (node 2): <type 'exceptions.RuntimeError'> occurred.  Calling MPI_Abort!

Positions:
  0 Ni    0.0000    0.0000    0.0000
  1 Ni    0.0000    1.7595    1.7595

                  
                  
                  
                  
         Ni       
                  
                  
       Ni         
                  
                  
                  
                  

Unit Cell:
           Periodic     X           Y           Z      Points  Spacing
  --------------------------------------------------------------------
  1. axis:    yes    0.000000    3.519004    3.519004    20     0.2032
  2. axis:    yes    1.759502    0.000000    1.759502     9     0.2257
  3. axis:    yes    1.759502    1.759502    0.000000     9     0.2257

Ni-setup:
  name   : Nickel
  id     : 0d9f38a9d6e76a2886f07bb4381f212b
  Z      : 28
  valence: 16
  core   : 12
  charge : 0.0
  file   : /zhome/e2/c/74231/gpaw/gpaw-setups/Ni.PBE.gz
  cutoffs: 1.15(comp), 2.14(filt), 1.98(core), lmax=2
  valence states:
            energy   radius
    4s(2)   -5.642   1.164
    3p(6)  -71.394   1.207
    4p(0)   -1.226   1.207
    3d(8)   -8.875   1.138
    *s      21.570   1.164
    *d      18.337   1.138

Using partial waves for Ni as LCAO basis

Using the PBE Exchange-Correlation Functional.
Spin-Polarized Calculation.
Magnetic Moment:  (0.000000, 0.000000, 1.188200)
Total Charge:      0.000000
Fermi Temperature: 0.100000
Wave functions: Plane wave expansion
      Cutoff energy: 400.000 eV
      Number of coefficients (min, max): 373, 408
      Using FFTW library
Eigensolver:       rmm-diis
XC and Coulomb potentials evaluated on a 40*18*18 grid
Interpolation: FFT
Poisson solver: FFT
Reference Energy:  -82736.026178

Total number of cores used: 4
Parallelization over k-points and spin: 4

Symmetries present: 4
4000 k-points: 10 x 20 x 20 Monkhorst-Pack grid
1050 k-points in the Irreducible Part of the Brillouin Zone
Linear Mixing Parameter:           0.1
Pulay Mixing with 3 Old Densities
Damping of Long Wave Oscillations: 50

Convergence Criteria:
Total Energy Change:           0.0005 eV / electron
Integral of Absolute Density Change:    0.0001 electrons
Integral of Absolute Eigenstate Change: 4e-08 eV^2
Number of Atoms: 2
Number of Atomic Orbitals: 24
Number of Bands in Calculation:         24
Bands to Converge:                      Occupied States Only
Number of Valence Electrons:            32
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD 
with errorcode 42.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 8313 on
node n-62-24-13 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
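
For context on the traceback above, here is a minimal NumPy sketch of orthonormalization via an inverse Cholesky factor, and of how the factorization fails when the overlap matrix is not positive definite. It is not GPAW's implementation. The random "wave functions", the array sizes, and the helper name is_positive_definite are made up for illustration. It also assumes that the info value in the error follows the usual LAPACK convention, where a positive info from a Cholesky factorization gives the order of the first leading minor that is not positive definite; the post does not confirm this.

```python
import numpy as np


def is_positive_definite(S):
    """Return True if a Cholesky factorization of S succeeds (hypothetical helper)."""
    try:
        np.linalg.cholesky(S)
        return True
    except np.linalg.LinAlgError:
        return False


# Made-up data: 4 "bands" with 50 coefficients each (not taken from the Ni run).
rng = np.random.RandomState(0)
psi = rng.rand(4, 50)

# Overlap matrix S = psi psi^T; positive definite when the bands are
# linearly independent.
S = np.dot(psi, psi.T)
ok = is_positive_definite(S)

# Orthonormalize: with S = L L^T, the rotated bands L^-1 psi have unit overlap.
L = np.linalg.cholesky(S)
psi_on = np.linalg.solve(L, psi)
S_on = np.dot(psi_on, psi_on.T)

# A band that is identically zero makes S[0, 0] == 0, so the very first
# leading minor is not positive definite and the factorization fails.
psi_bad = psi.copy()
psi_bad[0] = 0.0
S_bad = np.dot(psi_bad, psi_bad.T)
bad = is_positive_definite(S_bad)
```

Under that assumption, "inverse_cholesky error: 1" would point at the first diagonal element of the overlap matrix, which could be consistent with a zero or otherwise invalid first band on one rank. This is only one possible reading of the error.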


