[gpaw-users] diagonalize_full_hamiltonian: Assertion error assert bd.comm.size == nprow * npcol

Ask Hjorth Larsen asklarsen at gmail.com
Tue Nov 18 17:20:13 CET 2014


Apparently bad numbers are sometimes passed when the system is too
small.  We never solved this because it has never affected a system of
significant size, and nobody likes (or has time) to mess with that code
when it isn't breaking something they need.  Hmmm.  This doesn't answer
any questions, but someone could look into the error and try to fix it
(not me though, sorry).

Best regards
Ask

2014-11-18 13:15 GMT+01:00 Marcin Dulak <Marcin.Dulak at fysik.dtu.dk>:
> On 11/18/2014 10:46 AM, Jens Jørgen Mortensen wrote:
>>
>> On 11/17/2014 08:33 PM, Thomas Stenbæk Jauho wrote:
>>>
>>> Hi Trevor,
>>>
>>> To me it seems like an issue with scalapack and the parallelization you
>>> are trying.
>>>
>>> As I understand it, the code will try to parallelize over bands when using
>>> 32 cores (it cannot distribute the 5x5x1 = 25 k-points over 32 cores).
>>>
>>> It therefore switches to band parallelization, and for this to work you
>>> have to set the scalapack parameter accordingly.
>>>
>>> As an example, I use the following to parallelize the diagonalization over
>>> 16 cores:
>>>
>>> calc.diagonalize_full_hamiltonian(nbands=nbands, scalapack=(4,4,32))
>>>
>>> This means nprow=4 and npcol=4; their product should equal the number of
>>> cores you are using, unless spin parallelization is included, in which case
>>> the product needs to be half that number.
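>>>
>>> For reference, the failing assertion "bd.comm.size == nprow * npcol" just
>>> checks that the grid tiles the band communicator exactly. A minimal sketch
>>> of picking such a grid for a given number of cores (suggest_grid is a
>>> hypothetical helper, not part of GPAW):
>>>
>>> def suggest_grid(ncores):
>>>     # Most square nprow x npcol factorization with nprow * npcol == ncores.
>>>     nprow = int(ncores ** 0.5)
>>>     while ncores % nprow:
>>>         nprow -= 1
>>>     return nprow, ncores // nprow
>>>
>>> print(suggest_grid(16))  # (4, 4)  -> e.g. scalapack=(4, 4, 32)
>>> print(suggest_grid(32))  # (4, 8)  -> e.g. scalapack=(4, 8, 32)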
>>
>>
>> With recent versions of GPAW, one can just use "scalapack=True" and it
>> will figure out what ScaLapack parameters to use.  With the latest
>> development version (from today), you don't have to set "scalapack" at all -
>> it will use ScaLapack if there is parallelization over bands.
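>>
>> In Trevor's script below, that would reduce to something like:
>>
>>     calc.diagonalize_full_hamiltonian(nbands=60, scalapack=True)
>>
>> with the ScaLapack grid then chosen automatically (nbands=60 is just the
>> value from the script).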
>
> Is this a good idea? We had problems in the past where ScaLapack was giving
> wrong total energies:
> https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2013-January/001972.html
>
> Best regards,
>
> Marcin
>
>>
>> I normally split such calculations into three parts: 1) ground state, 2)
>> full diagonalization, and finally 3) calculation of the response function.
>> Then I can freely choose the optimal number of processes for each step.
>>
>> There is a helper function:
>>
>> https://wiki.fysik.dtu.dk/gpaw/epydoc/gpaw.fulldiag-module.html
>>
>> that will do step 2 if you have a gpw-file containing the ground-state
>> density.
>>
>> If you don't want to use ScaLapack then you can force GPAW to parallelize
>> over spins and k-points (possibly with a bit of load imbalance) like this:
>>
>>     calc = GPAW('gs.gpw', parallel={'band': 1})
>>     calc.diagonalize_full_hamiltonian(60)
>>     calc.write('gs-all.gpw', 'all')
>>
>> or equivalently:
>>
>>     fulldiag('gs.gpw', 60)
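>>
>> Step 3 can then run as a separate job on yet another number of processes.
>> A rough sketch, reading the file written in step 2 and reusing the response
>> parameters from Trevor's script below (file names here are just examples):
>>
>>     from gpaw.response.df import DielectricFunction
>>
>>     # Build the response function from the fully diagonalized gpw file.
>>     df = DielectricFunction(calc='gs-all.gpw',
>>                             domega0=0.01,
>>                             eta=0.3,
>>                             ecut=20,
>>                             txt='eels.txt')
>>     df.get_eels_spectrum(q_c=[0.0, 0.0, 0.0], filename='eels.csv')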
>>
>> Jens Jørgen
>>
>>>
>>> Best,
>>> Thomas
>>>
>>>
>>>
>>>
>>> On 2014-11-17 11:16, Trevor Hardcastle wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Brand new GPAW user here. I'm trying to get a basic EEL spectrum for
>>>> N-doped graphene using gpaw/0.11.0.12151. The calculation works fine
>>>> when using 16 cores, but fails when I try with 32 cores. The exact
>>>> same .py script was used in both cases:
>>>>
>>>>
>>>> *******************************************************************************************************
>>>>
>>>> from __future__ import print_function
>>>>
>>>> import numpy as np
>>>> from ase import Atoms
>>>> from ase.parallel import paropen
>>>> from gpaw import GPAW, FermiDirac, PW
>>>> from gpaw.response.df import DielectricFunction
>>>>
>>>> graphene = Atoms('C7N',
>>>>                  scaled_positions=[
>>>>                      (0.8333333333333334, 0.1666666666666666, 0.5000000000000000),
>>>>                      (0.8333333333333334, 0.6666666666666669, 0.5000000000000000),
>>>>                      (0.3333333333333332, 0.1666666666666666, 0.5000000000000000),
>>>>                      (0.1666666666666663, 0.3333333333333329, 0.5000000000000000),
>>>>                      (0.6666666666666672, 0.8333333333333337, 0.5000000000000000),
>>>>                      (0.1666666666666664, 0.8333333333333339, 0.5000000000000000),
>>>>                      (0.3333333333333334, 0.6666666666666667, 0.5000000000000000),
>>>>                      (0.6666666666666667, 0.3333333333333334, 0.5000000000000000)],
>>>>                  pbc=(1, 1, 1),
>>>>                  cell=[(4.260844986619441, -2.460000000000003, 0.000000000000000),
>>>>                        (-0.000000000000000, 4.920000000000004, 0.000000000000000),
>>>>                        (0.000000000000000, 0.000000000000000, 10.000000000000000)])
>>>>
>>>> calc = GPAW(mode=PW(100),
>>>>             nbands=60,
>>>>             basis='dzp',
>>>>             eigensolver='rmm-diis',
>>>>             xc='LDA',
>>>>             kpts=(5, 5, 1),
>>>>             txt='graphene_1.txt')
>>>>
>>>> graphene.set_calculator(calc)
>>>> graphene.get_potential_energy()
>>>>
>>>> calc.set(kpts=(20, 20, 1), fixdensity=True)
>>>> calc.set(fixdensity=True)
>>>> graphene.get_potential_energy()
>>>>
>>>> calc.diagonalize_full_hamiltonian(nbands=60)
>>>> calc.write('graphene.gpw', 'all')
>>>>
>>>> df = DielectricFunction(calc='graphene.gpw',
>>>>                         domega0=0.01,
>>>>                         eta=0.3,
>>>>                         ecut=20,
>>>>                         txt='G_to_G.txt')
>>>>
>>>> q_c = [0.0, 0.0, 0.0]
>>>> df.get_eels_spectrum(q_c=q_c, filename='G_to_G_EELS.csv')
>>>>
>>>> *******************************************************************************************************
>>>>
>>>> When I run this on 32 cores, I get these assertion errors:
>>>>
>>>> · AssertionError in calc.diagonalize_full_hamiltonian(nbands=60):
>>>>   "assert bd.comm.size == nprow * npcol" on line 746 of
>>>>   $HOME/gpaw/0.11.0.12151/lib/python2.7/site-packages/gpaw/wavefunctions/pw.py
>>>>
>>>> · GPAW CLEANUP (node 23): <type 'exceptions.AssertionError'> occurred.
>>>>   Calling MPI_Abort!
>>>>   File "/home/ufaserv1_g/pretha/soft/gpaw/0.11.0.12151/lib/python2.7/site-packages/gpaw/paw.py",
>>>>   line 1000, in diagonalize_full_hamiltonian
>>>>     nbands, scalapack)
>>>>
>>>> What exactly are the quantities "bd.comm.size", "nprow" and "npcol",
>>>> and how can I avoid encountering these errors? I'm using the ARC2
>>>> machine at Leeds University:
>>>> https://hec.wiki.leeds.ac.uk/bin/view/Documentation/WebHome
>>>>
>>>> Many thanks in advance for any tips.
>>>>
>>>> Trevor
>>>>
>>>> ************************
>>>>
>>>> Dr. Trevor P. Hardcastle
>>>>
>>>> EPSRC Doctoral Prize Fellow
>>>>
>>>> Room B20.C
>>>>
>>>> Engineering Building
>>>>
>>>> Institute for Materials Research
>>>>
>>>> University of Leeds
>>>>
>>>> LS2 9JT, United Kingdom
>>>>
>>>> t.p.hardcastle at leeds.ac.uk
>>>>
>>>> ************************