[gpaw-users] Strange inverse_cholesky errors
Morten Bjørn Bakkedal
mbjba at mek.dtu.dk
Tue Nov 20 15:32:27 CET 2012
I keep getting some strange inverse_cholesky errors. In the sample below I'm running a calculation on a spin-polarized Ni bulk material in a very small supercell with two Ni atoms.
I'm using the plane-wave mode, but I've also encountered this problem with the real-space grid. It's running on DTU's HPC cluster. I've tested the version from the CAMD 2012 summer school, the latest stable version compiled by myself, and the latest development version; it makes no difference.
The really strange thing is that the error for this particular code (see below) only seems to be reproducible in a parallel run with 4 cores using OpenMPI. When running with 1 core or with 8 cores, it runs just fine. However, I've tested other bulk systems as well, and I remember seeing this error with a single core too.
Does anybody have any clues about what to do? The Python version installed on the cluster is 2.6.
The following simple code reproduces the error systematically with "mpirun -np 4" after a few seconds:
from ase import Atoms
from gpaw import GPAW
from gpaw import PW

ecut = 400
k = 20

a = 3.519004126457587
b = a / 2

atoms = Atoms('Ni2',
              cell=[[0, 2 * b, 2 * b],
                    [b, 0, b],
                    [b, b, 0]],
              positions=[[0, 0, 0], [0, b, b]],
              pbc=True)
atoms.set_initial_magnetic_moments([0.5941, 0.5941])

calc = GPAW(mode=PW(ecut),
            xc='PBE',
            kpts=(k / 2, k, k),
            parallel={'domain': None})
atoms.set_calculator(calc)

e = atoms.get_potential_energy()
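For reference, my reading of the traceback below (an assumption on my part, not something I have verified in the GPAW sources) is that the overlap matrix S_nn of the wave functions fails to Cholesky-factorize: LAPACK reports info > 0 when a matrix is not positive definite, and GPAW turns that into the RuntimeError shown in the output. A tiny, hypothetical NumPy sketch of that failure mode, completely outside GPAW:

import numpy as np

# Hypothetical illustration only -- plain NumPy, not GPAW's internal
# lapack.inverse_cholesky.  A Cholesky factorization requires a positive
# definite matrix, so a (numerically) singular overlap matrix makes it fail.

# A well-conditioned overlap matrix factorizes without problems.
S_good = np.eye(3) + 0.01 * np.ones((3, 3))
np.linalg.cholesky(S_good)

# Two identical rows/columns mimic linearly dependent states; the
# factorization then fails, analogous to LAPACK returning info > 0.
S_bad = np.array([[1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
try:
    np.linalg.cholesky(S_bad)
except np.linalg.LinAlgError as err:
    print 'Cholesky failed:', err

If that reading is correct, something about this particular parallel layout apparently makes the overlap matrix (nearly) singular, but I have no idea why.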
This is the console output; it is somewhat mixed up due to the parallel run:
$ mpirun -np 4 gpaw-python main.py
  ___ ___ ___ _ _ _
 |   |   |_  | | | |
 | | | | | . | | | |
 |__ |  _|___|_____|  0.9.1.9737
 |___|_|
User: mbjba at n-62-24-13
Date: Tue Nov 20 15:12:13 2012
Arch: x86_64
Pid: 8311
Dir: /zhome/e2/c/74231/gpaw/gpaw/gpaw
ase: /zhome/e2/c/74231/gpaw/ase/ase (version 3.6.1)
numpy: /usr/lib64/python2.6/site-packages/numpy (version 1.3.0)
units: Angstrom and eV
cores: 4
Memory estimate
---------------
Process memory now: 54.90 MiB
Calculator 105.98 MiB
    Density  0.71 MiB
        Arrays  0.43 MiB
        Localized functions  0.20 MiB
        Mixer  0.07 MiB
    Hamiltonian  0.33 MiB
        Arrays  0.32 MiB
        XC  0.00 MiB
        Poisson  0.00 MiB
        vbar  0.00 MiB
    Wavefunctions  104.95 MiB
        Arrays psit_nG  78.44 MiB
        Eigensolver  0.18 MiB
        Projectors  23.73 MiB
        Overlap op  0.16 MiB
        PW-descriptor  2.44 MiB
Traceback (most recent call last):
  File "main.py", line 28, in <module>
    e = atoms.get_potential_energy()
  File "/zhome/e2/c/74231/gpaw/ase/ase/atoms.py", line 627, in get_potential_energy
    return self._calc.get_potential_energy(self)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/aseinterface.py", line 38, in get_potential_energy
    self.calculate(atoms, converge=True)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/paw.py", line 269, in calculate
    self.occupations):
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/scf.py", line 46, in run
    wfs.eigensolver.iterate(hamiltonian, wfs)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/eigensolvers/eigensolver.py", line 62, in iterate
    wfs.overlap.orthonormalize(wfs, kpt)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/overlap.py", line 92, in orthonormalize
    self.ksl.inverse_cholesky(S_nn)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/kohnsham_layouts.py", line 150, in inverse_cholesky
    self._inverse_cholesky(S_NN)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/kohnsham_layouts.py", line 157, in _inverse_cholesky
    inverse_cholesky(S_NN)
  File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/utilities/lapack.py", line 101, in inverse_cholesky
    raise RuntimeError('inverse_cholesky error: %d' % info)
RuntimeError: inverse_cholesky error: 1
GPAW CLEANUP (node 2): <type 'exceptions.RuntimeError'> occurred. Calling MPI_Abort!
Positions:
0 Ni 0.0000 0.0000 0.0000
1 Ni 0.0000 1.7595 1.7595
Unit Cell:
           Periodic     X           Y           Z      Points  Spacing
  --------------------------------------------------------------------
  1. axis:    yes    0.000000    3.519004    3.519004    20     0.2032
  2. axis:    yes    1.759502    0.000000    1.759502     9     0.2257
  3. axis:    yes    1.759502    1.759502    0.000000     9     0.2257
Ni-setup:
  name   : Nickel
  id     : 0d9f38a9d6e76a2886f07bb4381f212b
  Z      : 28
  valence: 16
  core   : 12
  charge : 0.0
  file   : /zhome/e2/c/74231/gpaw/gpaw-setups/Ni.PBE.gz
  cutoffs: 1.15(comp), 2.14(filt), 1.98(core), lmax=2
  valence states:
            energy   radius
    4s(2)   -5.642   1.164
    3p(6)  -71.394   1.207
    4p(0)   -1.226   1.207
    3d(8)   -8.875   1.138
    *s      21.570   1.164
    *d      18.337   1.138
Using partial waves for Ni as LCAO basis
Using the PBE Exchange-Correlation Functional.
Spin-Polarized Calculation.
Magnetic Moment: (0.000000, 0.000000, 1.188200)
Total Charge: 0.000000
Fermi Temperature: 0.100000
Wave functions: Plane wave expansion
Cutoff energy: 400.000 eV
Number of coefficients (min, max): 373, 408
Using FFTW library
Eigensolver: rmm-diis
XC and Coulomb potentials evaluated on a 40*18*18 grid
Interpolation: FFT
Poisson solver: FFT
Reference Energy: -82736.026178
Total number of cores used: 4
Parallelization over k-points and spin: 4
Symmetries present: 4
4000 k-points: 10 x 20 x 20 Monkhorst-Pack grid
1050 k-points in the Irreducible Part of the Brillouin Zone
Linear Mixing Parameter: 0.1
Pulay Mixing with 3 Old Densities
Damping of Long Wave Oscillations: 50
Convergence Criteria:
Total Energy Change: 0.0005 eV / electron
Integral of Absolute Density Change: 0.0001 electrons
Integral of Absolute Eigenstate Change: 4e-08 eV^2
Number of Atoms: 2
Number of Atomic Orbitals: 24
Number of Bands in Calculation: 24
Bands to Converge: Occupied States Only
Number of Valence Electrons: 32
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 42.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 8313 on
node n-62-24-13 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------