[gpaw-users] general comment on memory leaks.

Ask Hjorth Larsen asklarsen at gmail.com
Wed Jan 20 22:55:36 CET 2016


Very strange!

It is very difficult for me to see what could be the cause without access
to the supercomputer (and it is difficult to guess in general).

Maybe MPI is broken somehow.  Do other MPI jobs work?  Can MPI "hello
world"-style programmes consistently be executed across multiple
nodes?  Or simple MPI operations (e.g., a loop of repeated
MPI_Allreduce calls) on large arrays?
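
Such an Allreduce stress loop can be sketched as follows. This is only a sketch: mpi4py is an assumption (gpaw-python's own `gpaw.mpi.world` could be used instead), and the buffer sizes and function name are invented for illustration, not taken from this thread.

```python
# Sketch of an MPI stress test: repeatedly Allreduce a large buffer.
# Hangs or segfaults here across nodes (but not on one node) point at
# the interconnect or MPI stack rather than at GPAW itself.
from array import array

def allreduce_stress(comm, n_mib=64, iterations=10):
    """Allreduce an n_mib-MiB array of ones `iterations` times.

    `comm` needs only an Allreduce(sendbuf, recvbuf) method, so any
    mpi4py communicator works.  For a sum-reduction of ones, the first
    element of the result equals the communicator size.
    """
    n = n_mib * 1024 * 1024 // 8          # number of float64 elements
    send = array('d', [1.0]) * n
    recv = array('d', [0.0]) * n
    for _ in range(iterations):
        comm.Allreduce(send, recv)        # mpi4py's default op is SUM
    return recv[0]

# Run across nodes with e.g. (assuming mpi4py is installed):
#   mpirun -np 24 python -c "from mpi4py import MPI; \
#       from stress import allreduce_stress; \
#       print(allreduce_stress(MPI.COMM_WORLD))"
```

If this loop already crashes between nodes, the problem is below GPAW and worth reporting to the cluster administrators.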

What about smaller GPAW calculations, like H2O in a big box
parallelized over several nodes?  It sounds like the crashing
calculations are still quite large.
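
A minimal multi-node test along those lines might look like the script below. It is a sketch only: the vacuum, grid spacing, and filenames are illustrative guesses, not values from this thread (and older ASE versions keep `molecule` in `ase.structure` rather than `ase.build`).

```python
# Sketch of a small GPAW test job: one water molecule in a large box,
# intended to be run across several nodes.  All parameters here are
# illustrative, not tuned values.
from ase.build import molecule   # ase.structure in older ASE versions
from gpaw import GPAW

atoms = molecule('H2O')
atoms.center(vacuum=6.0)         # "big box": 6 A of vacuum on all sides

calc = GPAW(h=0.2,               # grid spacing; default FD (grid) mode
            txt='h2o_test.txt')
atoms.set_calculator(calc)
print(atoms.get_potential_energy())
```

Run it with e.g. `mpirun -np 24 gpaw-python h2o_test.py` and compare single-node against multi-node behaviour; if even this small job crashes across nodes, the problem is not memory pressure.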

Best regards
Ask

2016-01-20 11:20 GMT+01:00 abhishek khetan <askhetan at gmail.com>:
> and by EVERY SINGLE TIME, I mean I have run the exact same jobs twice or
> thrice to check whether they crash or run, for all the cases mentioned above.
>
> On Wed, Jan 20, 2016 at 11:19 AM, abhishek khetan <askhetan at gmail.com>
> wrote:
>>
>> I think I have figured out exactly where the problem lies, but not what is
>> causing it.
>>
>> First, just to give you what our two clusters here are like (in case they
>> may be of help):
>>
>> Cluster1:
>> Chassis: 14x Dell PowerEdge C6100 (Means 14 nodes on this chassis/cluster)
>> Processor/Node: 2x Intel Xeon X5670 (6-core) (Means a total of 2x6=12
>> processors per node)
>> Memory/Node: 48 GByte (12x 4 GByte, 1333 MHz) (Means at least 3.5 GB of
>> actual resident memory available per core)
>> Interconnect:Infiniband QDR Dual Port 40Gb/s (non-blocking)
>> File System: lustre file system
>> Operating System: Scientific Linux 6.4
>>
>> Cluster2:
>> Blades: 6x Dell PowerEdge M620 (Means 6 nodes per chassis/cluster)
>> Processor/Blade: 2x Intel Xeon E5-2660v2 (10-core) (Means a total of
>> 2x10=20 processors per node)
>> Memory/Blade: 256 GByte (Means at least 12.5 GB of actual resident memory
>> available per core)
>> Interconnect: Infiniband FDR-10
>> File System: lustre file system
>> Operating System: Scientific Linux 6.4
>>
>> As an experiment, I ran some low-memory jobs (kpts=1x1x1) on 12 and 24
>> processors on Cluster1, and some higher-memory jobs (kpts=1x1x2) on 20 and
>> 40 processors on Cluster2.
>>
>> In both cases, when the jobs did not span more than one node (i.e., 12
>> procs on Cluster1 for the low-mem jobs and 20 procs on Cluster2 for the
>> high-mem jobs), they ran perfectly well EVERY SINGLE TIME.
>>
>> However, when I increased the number of processors from 12 (1 node) to 24
>> (2 nodes) for the low-mem jobs on Cluster1, and from 20 (1 node) to 40 (2
>> nodes) for the higher-mem jobs on Cluster2, the behaviour became totally
>> erratic. Sometimes the jobs start; other times they give the same segfault
>> error I have described previously in this post. Another interesting feature
>> was that the more processors (and therefore nodes) I run the jobs on, the
>> harder it is to get the jobs to start. Put simply, the number of crashes
>> appears to grow roughly exponentially with the number of nodes involved. As
>> pseudo-scientific as this sounds, it is actually what is happening. I have
>> no clue why.
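
For what it is worth, crash frequency rising steeply with node count is exactly what one would expect if each node (or link) independently fails to initialise with some small probability. A toy model, purely for illustration (the per-node probability here is invented, not measured):

```python
# Toy model: if each node independently fails to start with probability p,
# the chance that at least one node fails grows quickly with node count.
# p = 0.2 is an invented number, chosen only to illustrate the trend.

def crash_probability(p_node, n_nodes):
    """P(at least one of n_nodes fails), assuming independent failures."""
    return 1.0 - (1.0 - p_node) ** n_nodes

for n in (1, 2, 3, 4):
    print(n, round(crash_probability(0.2, n), 4))
# -> 0.2, 0.36, 0.488, 0.5904
```

This does not identify the faulty component, but it is consistent with a per-node or per-link startup failure rather than a problem in GPAW itself.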
>>
>> This clearly indicates a problem with the inter-node communication on the
>> cluster, because on a single node there is no problem at all. I have
>> provided the exact technical details above so that you can perhaps tell me
>> whether this is a known problem with InfiniBand FDR or QDR interconnects.
>> Could there be a problem in my compilation? It seems not, because even on 3
>> or 4 nodes the jobs do sometimes start, if I am lucky.
>>
>> Any help is greatly appreciated.
>>
>>
>> On Mon, Jan 18, 2016 at 7:35 PM, abhishek khetan <askhetan at gmail.com>
>> wrote:
>>>
>>> You're right, "memory leak" is the wrong description. I made the mistake
>>> of invariably associating it with the segfault error, which is what it
>>> actually is. I will make these tests and get back.
>>>
>>>
>>> On Mon, Jan 18, 2016 at 6:32 PM, Ask Hjorth Larsen <asklarsen at gmail.com>
>>> wrote:
>>>>
>>>> Why are you so sure that there are memory leaks?  So far we have only
>>>> seen indications that a lot of memory is allocated.
>>>>
>>>> You could for example lower the grid spacing until it runs, then check
>>>> if memory usage increases linearly with subsequent identical
>>>> calculations.  That would indicate a memory leak.  If you do not
>>>> observe this behaviour, then I don't know what you are seeing, but it
>>>> is certainly not a memory leak!
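
The "repeat the identical calculation and watch the resident memory" test can be scripted generically. The sketch below uses only the Python standard library; the `leaky`/`clean` demo workloads are invented for illustration, and in practice `fn` would rebuild and run one full GPAW calculation each time.

```python
# Sketch: detect a leak by running the same workload repeatedly and
# recording the peak resident set size after each run.  Growth across
# identical runs indicates a leak; a flat curve does not.
import gc
import resource

def rss_after_runs(fn, repeats=5):
    """Run fn() `repeats` times, returning peak RSS after each run.

    ru_maxrss is in KiB on Linux and bytes on macOS; only the trend
    matters here, not the absolute numbers.
    """
    peaks = []
    for _ in range(repeats):
        fn()
        gc.collect()
        peaks.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return peaks

# Invented demo workloads:
_retained = []                     # simulates state that is never freed

def leaky():
    _retained.append(bytearray(10 * 1024 * 1024))   # keeps 10 MiB per run

def clean():
    scratch = bytearray(10 * 1024 * 1024)           # freed on return
    del scratch
```

With a real GPAW job as `fn`, a leak shows up as the recorded peaks climbing run after run, while the behaviour in this thread (a segfault at startup) would not.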
>>>>
>>>> 2016-01-18 13:26 GMT+01:00 abhishek khetan <askhetan at gmail.com>:
>>>> > I tried using the cluster interactively, and it gives me the output
>>>> > below. I couldn't make the r_memusage function work, but it is easily
>>>> > visible that the memory requirements are quite modest. I do not know why
>>>> > there is a segfault when I allocate it on the regular cluster for
>>>> > production jobs.
>>>> >
>>>> >   ___ ___ ___ _ _ _
>>>> >  |   |   |_  | | | |
>>>> >  | | | | | . | | | |
>>>> >  |__ |  _|___|_____|  0.12.0.13279
>>>> >  |___|_|
>>>> >
>>>> > User:   ak498084 at linuxbmc0002.rz.RWTH-Aachen.DE
>>>> > Date:   Mon Jan 18 13:22:24 2016
>>>> > Arch:   x86_64
>>>> > Pid:    20443
>>>> > Python: 2.7.9
>>>> > gpaw:   /home/ak498084/Utility/GPAW/gpaw_devel/gpaw-0.12/gpaw
>>>> > _gpaw:
>>>> >
>>>> > /home/ak498084/Utility/GPAW/gpaw_devel/gpaw-0.12/build/bin.linux-x86_64-2.7/gpaw-python
>>>> > ase:    /home/ak498084/Utility/GPAW/gpaw_devel/ase/ase (version
>>>> > 3.10.0)
>>>> > numpy:
>>>> >
>>>> > /usr/local_rwth/sw/python/2.7.9/x86_64/lib/python2.7/site-packages/numpy
>>>> > (version 1.9.1)
>>>> > scipy:
>>>> >
>>>> > /usr/local_rwth/sw/python/2.7.9/x86_64/lib/python2.7/site-packages/scipy
>>>> > (version 0.15.1)
>>>> > units:  Angstrom and eV
>>>> > cores:  32
>>>> >
>>>> > Memory estimate
>>>> > ---------------
>>>> > Process memory now: 75.02 MiB
>>>> > Calculator  1145.24 MiB
>>>> >     Density  56.04 MiB
>>>> >         Arrays  15.91 MiB
>>>> >         Localized functions  35.58 MiB
>>>> >         Mixer  4.55 MiB
>>>> >     Hamiltonian  23.19 MiB
>>>> >         Arrays  11.82 MiB
>>>> >         XC  0.00 MiB
>>>> >         Poisson  8.81 MiB
>>>> >         vbar  2.56 MiB
>>>> >     Wavefunctions  1066.01 MiB
>>>> >         Arrays psit_nG  523.69 MiB
>>>> >         Eigensolver  2.29 MiB
>>>> >         Projections  2.06 MiB
>>>> >         Projectors  4.17 MiB
>>>> >         Overlap op  533.81 MiB
>>>> >
>>>> >
>>>> > On Mon, Jan 18, 2016 at 1:01 PM, abhishek khetan <askhetan at gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> Dear Marcin, and Ask,
>>>> >>
>>>> >> I am indeed on this cluster, and I have already used both these tools.
>>>> >> When I use r_memusage (to check the peak physical memory), the peak
>>>> >> physical memory is on the order of a few MB, and the process gets
>>>> >> killed right at the beginning with the output only as:
>>>> >>
>>>> >>  |   |   |_  | | | |
>>>> >>  | | | | | . | | | |
>>>> >>  |__ |  _|___|_____|  0.12.0.13279
>>>> >>  |___|_|
>>>> >>
>>>> >>
>>>> >> The same is not the case when I take a pre-converged system and run
>>>> >> the r_memusage script: it shows a good 2.5 GB (and rising) before I
>>>> >> kill the process, as I can see it is running fine. This is what I mean
>>>> >> by saying that the allocation doesn't even start for these unconverged
>>>> >> cases. Using eigensolver=RMM_DIIS(keep_htpsit=False) has the exact
>>>> >> same problems. Is there a way I can trick GPAW into requesting much
>>>> >> less memory from the cluster? I want to try this because, as I have
>>>> >> mentioned, at peak my jobs don't need more than 2 GB per core, and I
>>>> >> usually provide 8 GB (albeit to no use).
>>>> >>
>>>> >> Best,
>>>> >>
>>>> >>
>>>> >> On Sat, Jan 16, 2016 at 1:10 PM, Marcin Dulak <mdul at dtu.dk> wrote:
>>>> >>>
>>>> >>> Hi,
>>>> >>>
>>>> >>> are you on this cluster?
>>>> >>> https://doc.itc.rwth-aachen.de/display/CC/r_memusage
>>>> >>>
>>>> >>>
>>>> >>> https://doc.itc.rwth-aachen.de/display/CC/Resource+limitations+on+dialog+systems
>>>> >>> It may be that the batch system (LSF) kills jobs that exceed the
>>>> >>> given resident memory limit; the two links above may help you
>>>> >>> diagnose that.
>>>> >>> I recall that GPAW's memory estimate is not very accurate (~20%) for
>>>> >>> standard ground-state PW or grid-mode jobs, and may be very
>>>> >>> inaccurate (an order of magnitude) for vdW or LCAO jobs (Ask, correct
>>>> >>> me if this is not the case anymore).
>>>> >>>
>>>> >>> Best regards,
>>>> >>>
>>>> >>> Marcin
>>>> >>> _______________________________________________
>>>> >>> gpaw-users mailing list
>>>> >>> gpaw-users at listserv.fysik.dtu.dk
>>>> >>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> || radhe radhe ||
>>>> >>
>>>> >> abhishek
>>>> >
>>>
>>
>

