[gpaw-users] small test HDF5 test case is running out of memory

Nichols A. Romero naromero at alcf.anl.gov
Tue Apr 19 16:30:22 CEST 2011


For anyone following the HDF5 work: even though I have fixed HDF5 for
the largest test case (Pt 1415 cluster), this small test case is still
failing at 64 MPI tasks.

I wonder if this is a special case, because 64 nodes is one Pset
(there is only a single I/O node on a 64-node partition of BG/P).

----- Original Message -----
> Hi,
> 
> Here is my small HDF5 test case:
> http://en.pastebin.ca/2045654
> 
> I should have plenty of memory at 64 nodes, but instead
> I run out of memory on the restart. Here is the traceback:
> 
> GPAW CLEANUP (node 0): <type 'exceptions.MemoryError'> occurred.
> Calling MPI_Abort!
> Traceback (most recent call last):
> File "Au_bulk3x3x3_restartwfs.py", line 61, in <module>
> calc.initialize_positions()
> File "./gpaw/paw.py", line 285, in initialize_positions
> File "./gpaw/density.py", line 104, in set_positions
> File "./gpaw/lfc.py", line 265, in set_positions
> File "./gpaw/lfc.py", line 365, in _update
> File "./gpaw/lfc.py", line 177, in normalize
> MemoryError
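One way to narrow down where the memory goes is to print the peak resident set size around each suspect step of the restart. The following is a minimal sketch using only the Python standard library; the helper names `peak_rss` and `report` are hypothetical and not part of GPAW, the list allocation is a stand-in for the real step, and whether `resource` reports useful numbers on BG/P compute nodes is an assumption that would need checking.

```python
import resource


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def report(label, rank=0):
    """Print the peak so far, tagged with a step label and a rank number."""
    value = peak_rss()
    print("rank %d, %s: peak RSS = %d" % (rank, label, value))
    return value


before = report("before allocation")
block = [0.0] * 1000000   # stand-in for the step under suspicion
after = report("after allocation")
```

Because the value is a running peak, it never decreases; a jump between two reports points at the step in between.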
> 
> If the restart is done properly, should the _update method be called
> by the set_positions method in NewLFC?
> https://trac.fysik.dtu.dk/projects/gpaw/browser/branches/gpaw_custom_hdf5/gpaw/lfc.py#L265
> 
> My current hypothesis is that we may not be reading D_asp or dH_asp to
> the correct MPI task.
> https://trac.fysik.dtu.dk/projects/gpaw/browser/branches/gpaw_custom_hdf5/gpaw/io/__init__.py#L567
> https://trac.fysik.dtu.dk/projects/gpaw/browser/branches/gpaw_custom_hdf5/gpaw/io/__init__.py#L602
> 
> Which MPI tasks should get these arrays?
> 
> For D_asp, we seem to read from domain_comm.rank == 0, followed by
> initialize_direct_arrays(nt_sG, D_asp)
> 
> but only the domain_comm master has D_asp. Does this look correct?
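To make the D_asp question concrete, here is a small pure-Python simulation of the two situations: the master keeps everything it read, versus the master forwarding each atom's block to the rank that owns it. No MPI library is used, the function names are hypothetical, and the assumption that ownership is given by a per-atom list `rank_a` is mine for illustration, not a statement about what the branch does.

```python
# Pure-Python simulation; no MPI library is used. All names are hypothetical.

def distribute_by_owner(D_asp_master, rank_a, size):
    """Split a master-held {atom: data} dict into one dict per rank.

    rank_a[a] is the rank assumed to own atom a.
    """
    per_rank = [{} for _ in range(size)]
    for a, D_sp in D_asp_master.items():
        per_rank[rank_a[a]][a] = D_sp
    return per_rank


def missing_atoms(per_rank, rank_a):
    """Return {rank: [atoms the rank owns but has no data for]}."""
    missing = {}
    for a, owner in enumerate(rank_a):
        if a not in per_rank[owner]:
            missing.setdefault(owner, []).append(a)
    return missing


# Four atoms spread over two domain ranks (made-up numbers).
rank_a = [0, 0, 1, 1]
D_asp_master = {a: [[float(a)]] for a in range(4)}

# Case 1: only the master keeps what it read.
master_only = [dict(D_asp_master), {}]
# Case 2: the master forwards each atom's block to its owner.
distributed = distribute_by_owner(D_asp_master, rank_a, size=2)

print(missing_atoms(master_only, rank_a))   # {1: [2, 3]}
print(missing_atoms(distributed, rank_a))   # {}
```

In the first case rank 1 owns atoms 2 and 3 but has nothing for them, while the master also holds blocks for atoms it does not own.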
> 
> --
> Nichols A. Romero, Ph.D.
> Argonne Leadership Computing Facility
> Argonne National Laboratory
> Building 240 Room 2-127
> 9700 South Cass Avenue
> Argonne, IL 60490
> (630) 252-3441
> 



