[gpaw-users] Memory Usage for NEB Calculations
Ask Hjorth Larsen
askhl at fysik.dtu.dk
Tue Dec 14 01:25:48 CET 2010
Hi Hongliang
I'm not an NEB expert, but I think the below is correct.
On Mon, 13 Dec 2010, Hongliang Xin wrote:
> Dear All,
>
> My NEB calculation is crashing the computer node due to oversubscribing the
> memory. I checked the output text file, and found that Initial Overhead
> memory estimate for consecutive images keeps increasing, and eventually
> crashes the node.
Normally I'd say the initial overhead is wrong, which is frequently the
case, but looking at the script I think it's actually correct in this
case. So let's read on...
> Can anyone tell what the problem is? I pasted some relevant code of my
> calculation below.
>
> initial = read('Initial/initial.gpw',index=-1,format='gpw')
> io.write('InitialGeom.xyz',initial,format='xyz')
>
> final = read('Final/final.gpw',index=-1,format='gpw')
> io.write('FinalGeom.xyz',final,format='xyz')
>
> # Construct a list of images:
> images = [initial]
> for i in range(4):
>     images.append(initial.copy())
> for i in range(4):
>     images.append(final.copy())
> images.append(final)
>
> # Make a mask of zeros and ones that select fixed atoms (the
> # bottom layers):
> mask = []
> for i in range(len(initial)):
>     pos = initial.get_positions()[i]
>     if pos[2] > 6.0:
>         mask.append(0)
>     else:
>         mask.append(1)
> constraint = constraints.FixAtoms(mask=mask)
>
> for i, image in enumerate(images):
>     # Attach a GPAW calculator to each image:
>     name = 'image_' + str(i)
>     calc = GPAW(mode='fd',
>                 basis='dzp',
>                 nbands=-25,
>                 xc='RPBE',
>                 kpts=(6, 3, 1),
>                 occupations=FermiDirac(0.1),
>                 eigensolver='rmm-diis',
>                 convergence={'energy': 1.0e-3,
>                              'density': 1.0e-4,
>                              'eigenstates': 1.0e-9,
>                              'bands': 'occupied'},
>                 spinpol=True,
>                 maxiter=200,
>                 usesymm=False,
>                 h=0.18,
>                 txt=name + '.txt',
>                 verbose=False)
>     image.set_calculator(calc)
>     image.set_constraint(constraint)
>     image.set_initial_magnetic_moments([1.0 for a in image])
>     print image.get_initial_magnetic_moments()
>
> # Create a Nudged Elastic Band:
> neb = NEB(images, parallel=False)
>
> # Make a starting guess for the minimum energy path (a straight line
> # from the initial to the final state):
> neb.interpolate()
>
> # Relax the NEB path:
> qn = optimize.QuasiNewton(neb, logfile='qn.log', trajectory='neb.traj')
> qn.run(fmax=0.1)
>
> write('NEB.traj', images)
>
>
>
> Thanks,
>
> Hongliang
Every image has its own calculator, which probably requires a very large
amount of memory. If I'm not mistaken, the first NEB iteration fully
allocates each calculator without freeing it, so memory grows steadily
as it loops over the images; that is most likely where the calculation
crashed. On the second NEB iteration no extra memory should be required.
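To illustrate why this blows up, here is a back-of-envelope sketch of how peak memory grows with the number of images when each interior image keeps its own calculator. The per-calculator figure and the base overhead are hypothetical placeholders, not real GPAW estimates:

```python
# Sketch: per-image calculators make peak memory scale linearly with
# the number of NEB images.  The 1.5 GiB per-calculator figure and the
# 0.5 GiB base overhead are made-up placeholders, not GPAW numbers.

def neb_peak_memory_gib(n_images, per_calc_gib, base_gib=0.5):
    """Rough peak memory once every interior image's calculator has
    been allocated (the two endpoints are normally not recomputed)."""
    interior = n_images - 2
    return base_gib + interior * per_calc_gib

# With 10 images (as in the script above: initial + 8 copies + final),
# 8 calculators end up allocated at once:
for n in (4, 6, 10):
    print('%2d images -> %5.1f GiB' % (n, neb_peak_memory_gib(n, 1.5)))
```

The point is only the linear scaling: every extra interior image adds one more fully allocated calculator to the node's footprint.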
The simplest way to solve the problem is to allocate more CPUs. Another
is to use the SingleCalculatorNEB, so there is only one calculator in
total - much less memory is required, but it is less efficient, since
wavefunctions and densities cannot be reused between iterations.
I would suggest parallelizing over the images, i.e. using parallel=True
and assigning more CPUs. I think that requires the number of CPUs to be
a multiple of the number of images. It also requires using the
communicator keyword on each calculator to assign it a set of ranks,
which I guess is rather bothersome to write up, and I notice now that it
isn't documented in the manual yet for some reason.
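The rank bookkeeping itself is straightforward, if tedious: each image gets a contiguous slice of the world ranks to pass as its communicator. A minimal sketch of just the bookkeeping follows; building the actual communicator objects would go through GPAW's MPI layer, which is assumed here rather than shown:

```python
# Sketch of splitting world ranks among NEB images for parallel=True.
# Only the rank arithmetic is shown; constructing real communicators
# would use GPAW's MPI interface, which is not demonstrated here.

def ranks_for_image(image_index, n_images, world_size):
    """Return the contiguous slice of ranks owning one image."""
    assert world_size % n_images == 0, 'CPUs must be a multiple of images'
    per_image = world_size // n_images
    start = image_index * per_image
    return list(range(start, start + per_image))

# 8 interior images on 16 cores gives 2 ranks per image:
print(ranks_for_image(0, 8, 16))  # ranks for the first image
print(ranks_for_image(7, 8, 16))  # ranks for the last image
```

Each calculator would then be constructed with its own rank slice via the communicator keyword, so the images relax concurrently instead of sequentially on one node.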
To summarize, I would recommend the SingleCalculatorNEB.
Can any NEB-people reveal how they usually parallelize over images?
Regards
Ask