[gpaw-users] Strange inverse_cholesky errors
Jakob Blomqvist
jakob.blomqvist at mah.se
Tue Nov 20 16:42:48 CET 2012
Not sure if it is related, but it sounds like it...
https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2011-November/001108.html
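
For what it's worth, the "inverse_cholesky error: 1" in the traceback below comes from the orthonormalization step: GPAW Cholesky-factorizes the band-overlap matrix S_nn, and (as far as I understand the LAPACK convention) info = 1 means the factorization already failed at the first element, i.e. the matrix is not positive definite there. In practice that usually means the wave functions contain NaNs or some bands have become numerically linearly dependent. Here is a minimal NumPy sketch of that check; S_nn is a made-up stand-in array, not GPAW's actual overlap matrix:

import numpy as np

def diagnose_overlap(S_nn):
    # S_nn: hypothetical (nbands x nbands) band-overlap matrix, used here
    # purely for illustration -- not GPAW's own array.
    if not np.isfinite(S_nn).all():
        return 'overlap matrix contains NaN/inf'
    try:
        # numpy's cholesky fails in the same situations in which LAPACK's
        # potrf (which, as far as I can tell, is what GPAW's
        # inverse_cholesky uses) returns info > 0: the matrix is not
        # positive definite.
        np.linalg.cholesky(S_nn)
    except np.linalg.LinAlgError:
        return 'overlap matrix is not positive definite'
    return 'overlap matrix looks fine'

print(diagnose_overlap(np.eye(2)))                              # fine
print(diagnose_overlap(np.array([[1.0, 2.0], [2.0, 1.0]])))     # not pos. def.
print(diagnose_overlap(np.array([[np.nan, 0.0], [0.0, 1.0]])))  # NaN

If it really is NaNs, the Cholesky step is just where the divergence finally shows up, so I would look at the SCF/eigensolver iterations rather than at the LAPACK call itself.
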
best
Jakob Blomqvist
Associate Professor
Dep. of Material Science
IMP, School of Technology
Malmo University
SWEDEN
+46(0)40 6657751
jakob.blomqvist at mah.se
On 11/20/2012 03:32 PM, Morten Bjørn Bakkedal wrote:
> I keep getting some strange inverse_cholesky errors. In the sample below I'm running a calculation on a spin-polarized Ni bulk material in a very small supercell with two Ni atoms.
>
> I'm using the plane-wave mode, but I've also encountered this problem with the real-space grid mode. It's running on DTU's HPC cluster. I've tested the version from the CAMD 2012 summer school, the latest stable version compiled by myself, and the latest developer version. No difference there.
>
> The really strange thing is that the error for this particular code (see below) only seems to be reproduced in a parallel run with 4 cores using OpenMPI. When running it with 1 core or with 8 cores, it runs just fine. However, I've tested other bulk systems as well, and I remember seeing this error with only one core too.
>
> Does anybody have any clues on what to do? The Python version installed on the cluster is 2.6.
>
> The following simple code reproduces the error systematically with "mpirun -np 4" after a few seconds:
>
> from ase import Atoms
> from gpaw import GPAW
> from gpaw import PW
>
> ecut = 400
> k = 20
> a = 3.519004126457587
> b = a / 2
> atoms = Atoms('Ni2',
>               cell = [[0, 2 * b, 2 * b],
>                       [b, 0, b],
>                       [b, b, 0]],
>               positions = [[0, 0, 0], [0, b, b]],
>               pbc = True)
> atoms.set_initial_magnetic_moments([0.5941, 0.5941])
> calc = GPAW(mode = PW(ecut),
>             xc = 'PBE',
>             kpts = (k / 2, k, k),
>             parallel = {'domain' : None})
> atoms.set_calculator(calc)
> e = atoms.get_potential_energy()
>
>
> This is the console output; it's somewhat mixed up due to the parallel run.
>
> $ mpirun -np 4 gpaw-python main.py
>
> [GPAW ASCII-art banner]  version 0.9.1.9737
>
> User: mbjba at n-62-24-13
> Date: Tue Nov 20 15:12:13 2012
> Arch: x86_64
> Pid: 8311
> Dir: /zhome/e2/c/74231/gpaw/gpaw/gpaw
> ase: /zhome/e2/c/74231/gpaw/ase/ase (version 3.6.1)
> numpy: /usr/lib64/python2.6/site-packages/numpy (version 1.3.0)
> units: Angstrom and eV
> cores: 4
>
> Memory estimate
> ---------------
> Process memory now: 54.90 MiB
>   Calculator             105.98 MiB
>     Density                 0.71 MiB
>       Arrays                  0.43 MiB
>       Localized functions     0.20 MiB
>       Mixer                   0.07 MiB
>     Hamiltonian             0.33 MiB
>       Arrays                  0.32 MiB
>       XC                      0.00 MiB
>       Poisson                 0.00 MiB
>       vbar                    0.00 MiB
>     Wavefunctions         104.95 MiB
>       Arrays psit_nG         78.44 MiB
>       Eigensolver             0.18 MiB
>       Projectors             23.73 MiB
>       Overlap op              0.16 MiB
>       PW-descriptor           2.44 MiB
> Traceback (most recent call last):
>   File "main.py", line 28, in <module>
>     e = atoms.get_potential_energy()
>   File "/zhome/e2/c/74231/gpaw/ase/ase/atoms.py", line 627, in get_potential_energy
>     return self._calc.get_potential_energy(self)
>   File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/aseinterface.py", line 38, in get_potential_energy
>     self.calculate(atoms, converge=True)
>   File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/paw.py", line 269, in calculate
>     self.occupations):
>   File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/scf.py", line 46, in run
>     wfs.eigensolver.iterate(hamiltonian, wfs)
>   File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/eigensolvers/eigensolver.py", line 62, in iterate
>     wfs.overlap.orthonormalize(wfs, kpt)
>   File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/overlap.py", line 92, in orthonormalize
>     self.ksl.inverse_cholesky(S_nn)
>   File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/kohnsham_layouts.py", line 150, in inverse_cholesky
>     self._inverse_cholesky(S_NN)
>   File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/kohnsham_layouts.py", line 157, in _inverse_cholesky
>     inverse_cholesky(S_NN)
>   File "/zhome/e2/c/74231/gpaw/gpaw/gpaw/utilities/lapack.py", line 101, in inverse_cholesky
>     raise RuntimeError('inverse_cholesky error: %d' % info)
> RuntimeError: inverse_cholesky error: 1
> GPAW CLEANUP (node 2): <type 'exceptions.RuntimeError'> occurred. Calling MPI_Abort!
>
> Positions:
> 0 Ni 0.0000 0.0000 0.0000
> 1 Ni 0.0000 1.7595 1.7595
>
> [ASCII sketch of the unit cell with the two Ni atoms omitted]
>
> Unit Cell:
>            Periodic     X           Y           Z      Points  Spacing
>   --------------------------------------------------------------------
>   1. axis:    yes    0.000000    3.519004    3.519004    20     0.2032
>   2. axis:    yes    1.759502    0.000000    1.759502     9     0.2257
>   3. axis:    yes    1.759502    1.759502    0.000000     9     0.2257
>
> Ni-setup:
> name : Nickel
> id : 0d9f38a9d6e76a2886f07bb4381f212b
> Z : 28
> valence: 16
> core : 12
> charge : 0.0
> file : /zhome/e2/c/74231/gpaw/gpaw-setups/Ni.PBE.gz
> cutoffs: 1.15(comp), 2.14(filt), 1.98(core), lmax=2
>   valence states:
>              energy   radius
>     4s(2)    -5.642   1.164
>     3p(6)   -71.394   1.207
>     4p(0)    -1.226   1.207
>     3d(8)    -8.875   1.138
>     *s       21.570   1.164
>     *d       18.337   1.138
>
> Using partial waves for Ni as LCAO basis
>
> Using the PBE Exchange-Correlation Functional.
> Spin-Polarized Calculation.
> Magnetic Moment: (0.000000, 0.000000, 1.188200)
> Total Charge: 0.000000
> Fermi Temperature: 0.100000
> Wave functions: Plane wave expansion
> Cutoff energy: 400.000 eV
> Number of coefficients (min, max): 373, 408
> Using FFTW library
> Eigensolver: rmm-diis
> XC and Coulomb potentials evaluated on a 40*18*18 grid
> Interpolation: FFT
> Poisson solver: FFT
> Reference Energy: -82736.026178
>
> Total number of cores used: 4
> Parallelization over k-points and spin: 4
>
> Symmetries present: 4
> 4000 k-points: 10 x 20 x 20 Monkhorst-Pack grid
> 1050 k-points in the Irreducible Part of the Brillouin Zone
> Linear Mixing Parameter: 0.1
> Pulay Mixing with 3 Old Densities
> Damping of Long Wave Oscillations: 50
>
> Convergence Criteria:
> Total Energy Change: 0.0005 eV / electron
> Integral of Absolute Density Change: 0.0001 electrons
> Integral of Absolute Eigenstate Change: 4e-08 eV^2
> Number of Atoms: 2
> Number of Atomic Orbitals: 24
> Number of Bands in Calculation: 24
> Bands to Converge: Occupied States Only
> Number of Valence Electrons: 32
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
> with errorcode 42.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 2 with PID 8313 on
> node n-62-24-13 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>