[gpaw-users] Erratic Behaviour from Scalapack 2.0.2 on BG/Q

Nichols A. Romero naromero at alcf.anl.gov
Tue Sep 10 17:38:13 CEST 2013


This is a ScaLAPACK bug. The best you can do is report it to the ScaLAPACK mailing list. Not all ScaLAPACK parameter combinations are guaranteed to work, especially if the matrix is small compared to the processor grid.
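(For scale: with 64 x 64 blocks, the 888-band subspace matrix spans only
ceil(888/64) = 14 block rows/columns, and even a 1323 x 1323 matrix over the
atomic orbitals spans only 21, while the auto-chosen 16 x 32 grid has 32
process columns -- so many processes own no piece of the matrix at all.)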

----- Original Message -----
> From: "Ask Hjorth Larsen" <asklarsen at gmail.com>
> To: "Andrew Logsdail" <a.logsdail at ucl.ac.uk>
> Cc: "gpaw-users" <gpaw-users at listserv.fysik.dtu.dk>
> Sent: Monday, July 29, 2013 3:54:12 PM
> Subject: Re: [gpaw-users] Erratic Behaviour from Scalapack 2.0.2 on BG/Q
> 
> Hello Andrew
> 
> sl_auto uses as many CPUs as it can, which might be reasonable for
> "normal" supercomputers.  Certainly the code has no excuse for not
> working, even if the layout is much too large, as it may well be
> here.  But the last thing we concluded was that ScaLAPACK was
> probably to blame for such problems (not that I would be so sure!).
> Anyway, on BG it is unwise to rely on the automatic parallelization,
> as it has no way of being aware of the BG partition layout.  Thus
> you should specify your own settings, both for ScaLAPACK and the
> rest, unless you want to risk a very large performance hit.
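A minimal sketch of what such explicit settings might look like via the
GPAW parallel keyword (the grid, block size and domain values are only
illustrative, assuming 512 cores as in the runs quoted further down; they
should be chosen to match the actual partition):

    from gpaw import GPAW

    # Explicit layout instead of 'sl_auto': True.
    parallel = {'domain': (8, 8, 8),        # real-space domain decomposition
                'sl_default': (6, 6, 64)}   # 6 x 6 BLACS grid, 64 x 64 blocks

    calc = GPAW(h=0.18, xc='PBE', parallel=parallel)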
> 
> Best regards
> Ask
> 
> 2013/7/29 Andrew Logsdail <a.logsdail at ucl.ac.uk>:
> > I can answer my own problem - using sl_auto was not appropriate in
> > this case.
> >
> > The default BLACS grids (sl_auto) were much too large compared to
> > the criteria outlined on the GPAW webpages for parallel runs
> > (https://wiki.fysik.dtu.dk/gpaw/documentation/parallel_runs/parallel_runs.html)
> > and when manually reduced to 6 x 6 by including sl_default=6,6,64 I
> > was able to reproduce the LAPACK results.
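In the calculator assignment quoted further down, that fix would presumably
amount to changing one entry of the parallel dictionary (assuming sl_default
is passed there as a tuple):

    from gpaw.mpi import world   # assuming 'world' comes from gpaw.mpi

    # was: parallel={'sl_auto': True, 'domain': world.size}
    parallel = {'sl_default': (6, 6, 64), 'domain': world.size}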
> >
> > I will play with the settings further, but consider this a closed case!
> >
> > All the best,
> >
> > Andy
> >
> > On 07/29/2013 05:17 PM, Andrew Logsdail wrote:
> >> Hi all,
> >>
> >> I'm just trying to run some test calculations on the BG/Q to
> >> benchmark SCF cycles, and am getting some wild behaviour from
> >> Scalapack.  Examples of the output are given below, compared to
> >> an equivalent calculation using bog-standard LAPACK.  My
> >> calculator assignment is:
> >>
> >>       #Assign calculator
> >>       calc = GPAW(h=0.18,
> >>                   xc = 'PBE',
> >>                   spinpol=spinpol,
> >>                   charge=0.0,
> >>                   nbands=-80,
> >>                   convergence={'energy':0.0001,'density':1.0e-5,'eigenstates':1.0e-8},
> >>                   mixer=mixer,
> >>                   poissonsolver=PoissonSolver(eps=1e-12),
> >>                   eigensolver='rmm-diis',
> >>                   occupations=FermiDirac(0.1),
> >>                   txt=file+'.txt',
> >>                   maxiter=200,
> >>                   parallel={'sl_auto':True,'domain': world.size}
> >>                   )
> >>
> >> where, in this case, spinpol=False and mixer=Mixer(0.05,5).
> >> Obviously, sl_auto is only enabled with the scalapack run.  Is
> >> this trustworthy, or should I be manually defining sl_default?
> >>
> >> Examples of outputs I get are below:
> >>
> >> LAPACK:
> >> Total number of cores used: 512
> >> Domain Decomposition: 8 x 8 x 8
> >> MatrixOperator buffer_size: default value or
> >>                               see value of nblock in input file
> >> Diagonalizer layout: Serial LAPACK
> >> Orthonormalizer layout: Serial LAPACK
> >>
> >> Symmetries present: 1
> >> 1 k-point (Gamma)
> >> 1 k-point in the Irreducible Part of the Brillouin Zone
> >> Linear Mixing Parameter:           0.05
> >> Pulay Mixing with 5 Old Densities
> >> Damping of Long Wave Oscillations: 50
> >>
> >> Convergence Criteria:
> >> Total Energy Change:           0.0001 eV / electron
> >> Integral of Absolute Density Change:    1e-05 electrons
> >> Integral of Absolute Eigenstate Change: 1e-08 eV^2
> >> Number of Atoms: 147
> >> Number of Atomic Orbitals: 1323
> >> Number of Bands in Calculation:         888
> >> Bands to Converge:                      Occupied States Only
> >> Number of Valence Electrons:            1617
> >>                        log10-error:    Total        Iterations:
> >>              Time      WFS    Density  Energy       Fermi  Poisson
> >> iter:   1  10:02:49  +0.6            -397.616378  4      297
> >> iter:   2  10:05:50  -0.5            -441.060730  4
> >> iter:   3  10:08:51  -1.0            -449.513668  3
> >> ..etc...
> >>
> >> SCALAPACK:
> >> Total number of cores used: 512
> >> Domain Decomposition: 8 x 8 x 8
> >> MatrixOperator buffer_size: default value or
> >>                               see value of nblock in input file
> >> Diagonalizer layout: BLACS 16 x 32 grid with 64 x 64 blocksize
> >> Orthonormalizer layout: BLACS 16 x 32 grid with 64 x 64 blocksize
> >>
> >> Symmetries present: 1
> >> 1 k-point (Gamma)
> >> 1 k-point in the Irreducible Part of the Brillouin Zone
> >> Linear Mixing Parameter:           0.05
> >> Pulay Mixing with 5 Old Densities
> >> Damping of Long Wave Oscillations: 50
> >>
> >> Convergence Criteria:
> >> Total Energy Change:           0.0001 eV / electron
> >> Integral of Absolute Density Change:    1e-05 electrons
> >> Integral of Absolute Eigenstate Change: 1e-08 eV^2
> >> Number of Atoms: 147
> >> Number of Atomic Orbitals: 1323
> >> Number of Bands in Calculation:         888
> >> Bands to Converge:                      Occupied States Only
> >> Number of Valence Electrons:            1617
> >>                        log10-error:    Total        Iterations:
> >>              Time      WFS    Density  Energy       Fermi  Poisson
> >> iter:   1  10:47:57  +2.8            4614.671085  3      297
> >> iter:   2  10:49:22  +3.0            5336.707667  80
> >> iter:   3  10:50:47  +3.0            4587.931757  76
> >> iter:   4  10:52:42  +3.1   -0.7     4399.383453  36     241
> >> iter:   5  10:54:36  +3.2   -0.8     4286.566568  16     246
> >> iter:   6  10:56:17  +3.4   -0.8     5419.473054  125    101
> >> iter:   7  10:58:08  +3.5   -0.9     5711.671954  116    203
> >> iter:   8  10:59:53  +3.6   -0.9     5490.337403  117    159
> >> ..etc..
> >>
> >> Does anyone have any idea what might be causing these problems?
> >> Perhaps the BLACS layouts are too big?
> >>
> >> I see many printouts in the stdout for the latter calculation
> >> reading:
> >> {   10,   11}:  On entry to PDORMTR parameter number   16 had an
> >> illegal value
> >>
> >> and I note that previously this has been seen on the mailing list
> >> (https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2013-January/001972.html),
> >> however on that occasion the problem was attributed to a bad Mixer
> >> setup for an imperfect starting geometry.
> >>
> >> In my case I've preconverged the geometry on another machine, so
> >> it should be pretty close to the correct structure.
> >>
> >> Thanks in advance for any suggestions.
> >>
> >> All the best,
> >>
> >> Andy
> >>
> >
> >
> _______________________________________________
> gpaw-users mailing list
> gpaw-users at listserv.fysik.dtu.dk
> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
> 

-- 
Nichols A. Romero, Ph.D.
Argonne Leadership Computing Facility
Argonne National Laboratory
Building 240 Room 2-127
9700 South Cass Avenue
Argonne, IL 60490
(630) 252-3441


