[gpaw-users] Computational cost of vdW-DF

Wed Apr 13 09:00:58 CEST 2011

On Tue, 2011-04-12 at 17:21 -0400, Duy Le wrote:
> On Tue, Apr 12, 2011 at 12:52 PM,  <andreasm at fysik.dtu.dk> wrote:
> > I guess you can change the number of kernels to be consistent with the
> > number of cpus
> > For example using 16 cpus:
> > vdw=FFTVDWFunctional(name='vdW-DF2',Nalpha=16)
> > Then everything above 16 cpus will only help the GGA part of the
> > calculations.
> 
> By default, it is 20. I don't see how it helps if we reduce it to 16.

The calculation is parallelized over Nalpha. So for Nrank (the number of
MPI processes) < Nalpha, load balancing is better if Nalpha is evenly
divisible by Nrank. OTOH, for Nrank > Nalpha, Nrank == N * Nalpha
(where N is an integer) might help with memory usage per node, as the vdW
part uses quite a lot of memory (this helps up to the point where you
have Nalpha nodes, obviously).
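To make the load-balancing argument concrete, here is a minimal sketch that assumes the Nalpha kernels are handed out to ranks in contiguous blocks as evenly as possible. This is an illustrative model, not GPAW's actual distribution code, and `kernel_counts` and `efficiency` are hypothetical helpers. Wall time is taken to be set by the busiest rank, and the model ignores the GGA part, which extra ranks can still speed up.

```python
from typing import List


def kernel_counts(nalpha: int, nrank: int) -> List[int]:
    """Kernels handled by each rank under an assumed even block distribution."""
    base, extra = divmod(nalpha, nrank)
    # The first `extra` ranks get one extra kernel each.
    return [base + 1 if r < extra else base for r in range(nrank)]


def efficiency(nalpha: int, nrank: int) -> float:
    """Fraction of total rank-time spent on kernel work.

    The vdW step finishes only when the busiest rank is done, so the
    cost is max(counts) * nrank while the useful work is nalpha.
    """
    counts = kernel_counts(nalpha, nrank)
    return sum(counts) / (max(counts) * nrank)


if __name__ == "__main__":
    for nrank in (8, 10, 16, 20):
        print(nrank, round(efficiency(20, nrank), 3))
    # 8 -> 0.833, 10 -> 1.0, 16 -> 0.625, 20 -> 1.0
```

With the default Nalpha=20, 16 ranks leave four ranks doing two kernels while twelve do one, so the vdW step runs at 62.5% efficiency. Setting Nalpha=16 (or using 10 or 20 ranks) removes that imbalance in this model.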

> > I have a question in this respect. What would it take to change the
> > implementation such that you could use more cpus? It would be very helpful
> > if one could use 32 or even 64...

One obvious thing would be to distribute the FFT with MPI. While the
idea sounds obvious, the implementation might not be. Whoever wants it
the most gets to write the patch, I suppose. ;)
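For anyone weighing the patch, the usual way to distribute a 3D FFT is a slab decomposition. Each rank owns a block of x-planes and does 2D FFTs over (y, z). An all-to-all transpose then re-slabs the data along y, and each rank finishes with 1D FFTs along x. The sketch below only simulates the ranks inside one NumPy process to show the data movement. It is not GPAW code, and `slab_fft3d` is a hypothetical name. In a real MPI code, step 2 would be an `MPI_Alltoall(v)` call, or the work could be delegated to a library such as FFTW's MPI interface.

```python
import numpy as np


def slab_fft3d(grid: np.ndarray, nrank: int) -> np.ndarray:
    """Simulate a slab-decomposed 3D FFT over `nrank` pretend ranks.

    Assumes nrank <= each grid dimension so no rank gets an empty slab.
    """
    # Step 1: rank r owns an x-slab (split along axis 0) and does local
    # 2D FFTs over the y and z axes.
    x_slabs = [np.fft.fftn(s, axes=(1, 2))
               for s in np.array_split(grid, nrank, axis=0)]

    # Step 2: all-to-all transpose. Each source rank cuts its slab into
    # y-ranges; destination rank d collects piece d from every source and
    # stacks them along x, so it now owns all x for its y-range.
    pieces = [np.array_split(s, nrank, axis=1) for s in x_slabs]
    y_slabs = [np.concatenate([pieces[src][dst] for src in range(nrank)], axis=0)
               for dst in range(nrank)]

    # Step 3: the x axis is now fully local, so finish with 1D FFTs along it.
    y_slabs = [np.fft.fft(s, axis=0) for s in y_slabs]

    # Gather only to compare with a serial FFT; a real code keeps it distributed.
    return np.concatenate(y_slabs, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((12, 10, 8))
    print(np.allclose(slab_fft3d(a, 4), np.fft.fftn(a)))
```

Parallelism over the FFT grid is independent of the Nalpha distribution, so in principle the two could be combined. The catch is the all-to-all step, whose communication cost grows with the number of ranks, and a slab layout caps the rank count at the number of grid planes (pencil decompositions go further).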

-- 
Janne Blomqvist