[gpaw-users] general comment on memory leaks.
Ask Hjorth Larsen
asklarsen at gmail.com
Thu Jan 21 16:38:36 CET 2016
Okay, very nice. So the fast InfiniBand interconnect is disabled in
favour of the standard network connection. A better solution would
fix things so that they work with InfiniBand. I know that on Niflheim,
InfiniBand is used with GPAW without similar problems. Maybe GPAW can
be installed with different libraries, or some other change could achieve
stable InfiniBand calculations. I suggest asking the admin once more
about this (maybe compare with other installed codes that work with
InfiniBand).
Best regards
Ask
2016-01-21 16:19 GMT+01:00 abhishek khetan <askhetan at gmail.com>:
> Problem Solved!!
>
> The only thing that could have gone wrong was the inter-node communication. I
> got this tip from the cluster admin:
> _______________________________________________________________________________
> -try out to disable the InfiniBand transport and fallback to IP_over_IB, for
> Open MPI
> export OMPI_MCA_btl="^openib"
> export OMPI_MCA_btl_tcp_if_exclude="ib0,lo"
> No failures any more? Strange, let me know. Fewer failures, or maybe more
> failures? A 'race condition' is very likely!
> _______________________________________________________________________________
>
> After exporting these before calling the binary, the jobs do not crash anymore
> on 12/24/36/72/84 processors, not one single time. I am still too much of a
> layman to tell how this resolved the problem, but it works. The calculations
> are definitely a bit slower, though; I will have to verify that properly.
> Thanks a lot for your suggestion of running the jobs interactively on the
> backend. That's what resolved this. The first thing for me to do now is to
> resurrect the old GW method jobs. If they work fine, that would mean this was
> the problem all along.
>
>
>
> On Thu, Jan 21, 2016 at 9:48 AM, abhishek khetan <askhetan at gmail.com> wrote:
>>
>> No doubt, this kind of crashing happens more for bigger-memory jobs (which
>> are actually still well within the provided resources), like this system
>> we're discussing, and also while diagonalizing the full Hamiltonian and
>> doing GW calculations. I've described the same to my cluster admin;
>> hopefully they're able to throw some light on it. I have also used some
>> methods from my own compilation of VASP which require at least 15-20 GB per
>> core, and they run successfully, so it's indeed very strange why this should
>> happen only with GPAW. I have used the default gcc/ATLAS library for the
>> GPAW compilation, though. Maybe I should try different combinations of
>> Intel/MKL, etc. If I am able to determine the problem, I'll get back.
>>
>> On Wed, Jan 20, 2016 at 10:55 PM, Ask Hjorth Larsen <asklarsen at gmail.com>
>> wrote:
>>>
>>> Very strange!
>>>
>>> It is very difficult for me to see what could be the cause, not having
>>> access to the supercomputer (and in general).
>>>
>>> Maybe MPI is broken somehow. Do other MPI jobs work? Can MPI "hello
>>> world"-style programmes consistently be executed across multiple
>>> nodes? Or simple MPI operations (e.g., a loop of repeated
>>> MPI_Allreduce calls) on large arrays?
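>>>
>>> If it helps, here is a minimal (untested) sketch of such a test using
>>> gpaw.mpi, so it can be run directly with gpaw-python; the array size and
>>> iteration count are arbitrary:
>>>
>>>   import numpy as np
>>>   from gpaw.mpi import world
>>>
>>>   # Repeated in-place allreduce on a largish array; run it across several
>>>   # nodes, e.g.: mpirun -np 24 gpaw-python allreduce_test.py
>>>   a = np.empty(10**7)  # about 80 MB per rank
>>>   for i in range(100):
>>>       a[:] = world.rank + 1.0
>>>       world.sum(a)  # allreduce over all ranks
>>>       if world.rank == 0:
>>>           print('iteration %d done, a[0] = %.0f' % (i, a[0]))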
>>>
>>> What about smaller GPAW calculations, like H2O in a big box
>>> parallelized over several nodes? It sounds like the crashing
>>> calculations are still quite large.
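>>>
>>> For instance, something as small as this (a rough sketch; with your ASE
>>> 3.10 the molecule builder lives in ase.structure, in newer versions it is
>>> in ase.build), run over two nodes, would already be informative:
>>>
>>>   from ase.structure import molecule
>>>   from gpaw import GPAW
>>>
>>>   atoms = molecule('H2O')
>>>   atoms.center(vacuum=6.0)  # big box, but still a tiny calculation
>>>   calc = GPAW(h=0.2, xc='PBE', txt='h2o_multinode_test.txt')
>>>   atoms.set_calculator(calc)
>>>   atoms.get_potential_energy()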
>>>
>>> Best regards
>>> Ask
>>>
>>> 2016-01-20 11:20 GMT+01:00 abhishek khetan <askhetan at gmail.com>:
>>> > And by EVERY SINGLE TIME, I mean I have run the exact same jobs twice or
>>> > thrice to check whether they crash or run, for all the cases mentioned
>>> > above.
>>> >
>>> > On Wed, Jan 20, 2016 at 11:19 AM, abhishek khetan <askhetan at gmail.com>
>>> > wrote:
>>> >>
>>> >> I think I have figured out exactly where the problem lies, but not
>>> >> what is
>>> >> causing it.
>>> >>
>>> >> First, just to give you an idea of what our two clusters here are like
>>> >> (in case it helps):
>>> >>
>>> >> Cluster1:
>>> >> Chassis: 14x Dell PowerEdge C6100 (Means 14 nodes on this
>>> >> chassis/cluster)
>>> >> Processor/Node: 2x Intel Xeon X5670 (6-core) (Means a total of 2x6=12
>>> >> processors per node)
>>> >> Memory/Node: 48 GByte (12x 4 GByte, 1333MHz) (Means at least 3.5 GB of
>>> >> actual resident memory available per core)
>>> >> Interconnect:Infiniband QDR Dual Port 40Gb/s (non-blocking)
>>> >> File System: lustre file system
>>> >> Operating System: Scientific Linux 6.4
>>> >>
>>> >> Cluster2:
>>> >> Blades: 6x Dell PowerEdge M620 (Means 6 nodes per chassis/cluster)
>>> >> Processor/Blade: 2x Intel Xeon E5-2660v2 (10-core) (Means a total of
>>> >> 2x10=20 processors per node)
>>> >> Memory/Blade: 256 GByte (Means at least 12.5 GB of actual resident
>>> >> memory available per core)
>>> >> Interconnect: Infiniband FDR-10
>>> >> File System: lustre file system
>>> >> Operating System: Scientific Linux 6.4
>>> >>
>>> >> I did an experiment: I ran some low-memory jobs (with kpts=1x1x1) on 12
>>> >> and 24 processors on Cluster1, and some higher-memory jobs (with
>>> >> kpts=1x1x2) on 20 and 40 processors on Cluster2.
>>> >>
>>> >> In both cases, when the jobs did not span more than one node, which
>>> >> means 12 procs on Cluster1 for the low-memory jobs and 20 procs on
>>> >> Cluster2 for the high-memory jobs, they ran perfectly well EVERY SINGLE
>>> >> TIME.
>>> >>
>>> >> However, as I increased the number of processors from 12 (1 node) to
>>> >> 24 (2 nodes) for the low-memory jobs on Cluster1, and from 20 (1 node)
>>> >> to 40 (2 nodes) for the higher-memory jobs on Cluster2, the behaviour
>>> >> became totally erratic. Sometimes they start; other times they give the
>>> >> same segfault error, which I have described previously in this thread.
>>> >> Another interesting feature was that the more processors (and therefore
>>> >> nodes) I run the jobs on, the more difficult it is to get the jobs to
>>> >> start. Put simply, the number of times the jobs crashed was found to be
>>> >> an exponentially increasing function of the number of nodes involved.
>>> >> As pseudo-scientific as this sounds, it's actually what is happening.
>>> >> I have no clue why.
>>> >>
>>> >> This clearly indicates a problem with the inter-node communication here
>>> >> on the cluster, because on single nodes there is no problem at all. I
>>> >> have provided you with the exact technical details so that maybe you can
>>> >> let me know whether this is a known problem on InfiniBand FDR or QDR
>>> >> interconnects. Could there be a problem in my compilation? It seems not,
>>> >> because even on 3 or 4 nodes the jobs do start sometimes, if I am lucky.
>>> >>
>>> >> Any help is greatly appreciated.
>>> >>
>>> >>
>>> >> On Mon, Jan 18, 2016 at 7:35 PM, abhishek khetan <askhetan at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> You're right, 'memory leak' is the wrong description. I made the
>>> >>> mistake of invariably associating it with the segfault error, which is
>>> >>> what it actually is. I will make these tests and get back.
>>> >>>
>>> >>>
>>> >>> On Mon, Jan 18, 2016 at 6:32 PM, Ask Hjorth Larsen
>>> >>> <asklarsen at gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Why are you so sure that there are memory leaks? So far we have
>>> >>>> only
>>> >>>> seen indications that a lot of memory is allocated.
>>> >>>>
>>> >>>> You could for example lower the grid spacing until it runs, then
>>> >>>> check
>>> >>>> if memory usage increases linearly with subsequent identical
>>> >>>> calculations. That would indicate a memory leak. If you do not
>>> >>>> observe this behaviour, then I don't know what you are seeing, but
>>> >>>> it
>>> >>>> is certainly not a memory leak!
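>>> >>>>
>>> >>>> A rough sketch of such a test (the system and parameters here are just
>>> >>>> placeholders): run the same small calculation several times in one
>>> >>>> process and watch whether the peak resident memory keeps growing from
>>> >>>> run to run:
>>> >>>>
>>> >>>>   import resource
>>> >>>>   from ase.structure import molecule
>>> >>>>   from gpaw import GPAW
>>> >>>>   from gpaw.mpi import world
>>> >>>>
>>> >>>>   atoms = molecule('H2O')
>>> >>>>   atoms.center(vacuum=4.0)
>>> >>>>   for i in range(5):
>>> >>>>       calc = GPAW(h=0.25, txt=None)
>>> >>>>       atoms.set_calculator(calc)
>>> >>>>       atoms.get_potential_energy()
>>> >>>>       # ru_maxrss is reported in KiB on Linux
>>> >>>>       peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
>>> >>>>       if world.rank == 0:
>>> >>>>           print('run %d: peak RSS %.1f MiB' % (i, peak))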
>>> >>>>
>>> >>>> 2016-01-18 13:26 GMT+01:00 abhishek khetan <askhetan at gmail.com>:
>>> >>>> > I tried using the cluster interactively, and it gives me the output
>>> >>>> > below. I couldn't make the r_memusage function work, but it's easily
>>> >>>> > visible that the memory requirements are quite modest. I do not know
>>> >>>> > why there is a segfault when I submit it to the regular cluster for
>>> >>>> > production jobs.
>>> >>>> >
>>> >>>> >   ___ ___ ___ _ _ _
>>> >>>> >  |   |   |_  | | | |
>>> >>>> >  | | | | | . | | | |
>>> >>>> >  |__ |  _|___|_____|  0.12.0.13279
>>> >>>> >  |___|_|
>>> >>>> >
>>> >>>> > User: ak498084 at linuxbmc0002.rz.RWTH-Aachen.DE
>>> >>>> > Date: Mon Jan 18 13:22:24 2016
>>> >>>> > Arch: x86_64
>>> >>>> > Pid: 20443
>>> >>>> > Python: 2.7.9
>>> >>>> > gpaw: /home/ak498084/Utility/GPAW/gpaw_devel/gpaw-0.12/gpaw
>>> >>>> > _gpaw:  /home/ak498084/Utility/GPAW/gpaw_devel/gpaw-0.12/build/bin.linux-x86_64-2.7/gpaw-python
>>> >>>> > ase:    /home/ak498084/Utility/GPAW/gpaw_devel/ase/ase (version 3.10.0)
>>> >>>> > numpy:  /usr/local_rwth/sw/python/2.7.9/x86_64/lib/python2.7/site-packages/numpy (version 1.9.1)
>>> >>>> > scipy:  /usr/local_rwth/sw/python/2.7.9/x86_64/lib/python2.7/site-packages/scipy (version 0.15.1)
>>> >>>> > units: Angstrom and eV
>>> >>>> > cores: 32
>>> >>>> >
>>> >>>> > Memory estimate
>>> >>>> > ---------------
>>> >>>> > Process memory now: 75.02 MiB
>>> >>>> > Calculator  1145.24 MiB
>>> >>>> >     Density  56.04 MiB
>>> >>>> >         Arrays  15.91 MiB
>>> >>>> >         Localized functions  35.58 MiB
>>> >>>> >         Mixer  4.55 MiB
>>> >>>> >     Hamiltonian  23.19 MiB
>>> >>>> >         Arrays  11.82 MiB
>>> >>>> >         XC  0.00 MiB
>>> >>>> >         Poisson  8.81 MiB
>>> >>>> >         vbar  2.56 MiB
>>> >>>> >     Wavefunctions  1066.01 MiB
>>> >>>> >         Arrays psit_nG  523.69 MiB
>>> >>>> >         Eigensolver  2.29 MiB
>>> >>>> >         Projections  2.06 MiB
>>> >>>> >         Projectors  4.17 MiB
>>> >>>> >         Overlap op  533.81 MiB
>>> >>>> >
>>> >>>> >
>>> >>>> > On Mon, Jan 18, 2016 at 1:01 PM, abhishek khetan
>>> >>>> > <askhetan at gmail.com>
>>> >>>> > wrote:
>>> >>>> >>
>>> >>>> >> Dear Marcin, and Ask,
>>> >>>> >>
>>> >>>> >> I am indeed on this cluster, and I have already used both of these
>>> >>>> >> tools. When I use r_memusage (to check the peak physical memory), the
>>> >>>> >> peak physical memory is of the order of a few MB, and the process gets
>>> >>>> >> killed right at the beginning with the output only as:
>>> >>>> >>
>>> >>>> >>  |   |   |_  | | | |
>>> >>>> >>  | | | | | . | | | |
>>> >>>> >>  |__ |  _|___|_____|  0.12.0.13279
>>> >>>> >>  |___|_|
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> The same is not the case when I take a pre-converged system and run
>>> >>>> >> the r_memusage script. It shows me a good 2.5 GB (and rising) before I
>>> >>>> >> kill the process, and I can see it's running fine. This is what I mean
>>> >>>> >> by saying that the allocation doesn't even start for these unconverged
>>> >>>> >> cases. Using eigensolver=RMM_DIIS(keep_htpsit=False) gives the exact
>>> >>>> >> same problems. Is there a way I can trick GPAW into asking the cluster
>>> >>>> >> for much less memory? I want to try this because, as I have mentioned,
>>> >>>> >> at the peak my jobs don't need more than 2 GB per core, and I'm usually
>>> >>>> >> providing 8 GB (albeit to no use).
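>>> >>>> >>
>>> >>>> >> For reference, the option is passed roughly like this (the import
>>> >>>> >> path below is a guess and may differ between GPAW versions):
>>> >>>> >>
>>> >>>> >>   from gpaw import GPAW
>>> >>>> >>   from gpaw.eigensolvers import RMM_DIIS  # import path may vary
>>> >>>> >>
>>> >>>> >>   # keep_htpsit=False avoids storing an extra copy of H|psit>,
>>> >>>> >>   # which should lower the memory needed per core.
>>> >>>> >>   calc = GPAW(eigensolver=RMM_DIIS(keep_htpsit=False), txt='out.txt')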
>>> >>>> >>
>>> >>>> >> Best,
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> On Sat, Jan 16, 2016 at 1:10 PM, Marcin Dulak <mdul at dtu.dk>
>>> >>>> >> wrote:
>>> >>>> >>>
>>> >>>> >>> Hi,
>>> >>>> >>>
>>> >>>> >>> are you on this cluster?
>>> >>>> >>> https://doc.itc.rwth-aachen.de/display/CC/r_memusage
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>> https://doc.itc.rwth-aachen.de/display/CC/Resource+limitations+on+dialog+systems
>>> >>>> >>> It may be that the batch system (LSF) kills jobs that exceed the
>>> >>>> >>> given resident memory. The two links above may help you diagnose
>>> >>>> >>> that. I recall that GPAW's memory estimate is not very accurate for
>>> >>>> >>> standard ground-state PW or grid-mode jobs (~20%) and may be very
>>> >>>> >>> inaccurate (an order of magnitude) for vdW or LCAO jobs (Ask, correct
>>> >>>> >>> me if this is not the case anymore).
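>>> >>>> >>>
>>> >>>> >>> As a quick sanity check, one can also print the resource limits the
>>> >>>> >>> batch job actually gets, using only the Python standard library:
>>> >>>> >>>
>>> >>>> >>>   import resource
>>> >>>> >>>
>>> >>>> >>>   # Soft/hard limits of this process; -1 means unlimited.
>>> >>>> >>>   for name in ('RLIMIT_AS', 'RLIMIT_RSS', 'RLIMIT_DATA', 'RLIMIT_STACK'):
>>> >>>> >>>       soft, hard = resource.getrlimit(getattr(resource, name))
>>> >>>> >>>       print('%s: soft=%s hard=%s' % (name, soft, hard))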
>>> >>>> >>>
>>> >>>> >>> Best regards,
>>> >>>> >>>
>>> >>>> >>> Marcin
>>> >>>> >>> _______________________________________________
>>> >>>> >>> gpaw-users mailing list
>>> >>>> >>> gpaw-users at listserv.fysik.dtu.dk
>>> >>>> >>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> --
>>> >>>> >> || radhe radhe ||
>>> >>>> >>
>>> >>>> >> abhishek
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> >
>>> >>>> > --
>>> >>>> > || radhe radhe ||
>>> >>>> >
>>> >>>> > abhishek
>>> >>>> >
>>> >>>> > _______________________________________________
>>> >>>> > gpaw-users mailing list
>>> >>>> > gpaw-users at listserv.fysik.dtu.dk
>>> >>>> > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> || radhe radhe ||
>>> >>>
>>> >>> abhishek
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> || radhe radhe ||
>>> >>
>>> >> abhishek
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > || radhe radhe ||
>>> >
>>> > abhishek
>>> >
>>> > _______________________________________________
>>> > gpaw-users mailing list
>>> > gpaw-users at listserv.fysik.dtu.dk
>>> > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>
>>
>>
>>
>> --
>> || radhe radhe ||
>>
>> abhishek
>
>
>
>
> --
> || radhe radhe ||
>
> abhishek
>
> _______________________________________________
> gpaw-users mailing list
> gpaw-users at listserv.fysik.dtu.dk
> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users