[gpaw-users] general comment on memory leaks.

abhishek khetan askhetan at gmail.com
Thu Jan 21 16:19:52 CET 2016


Problem Solved!!

The only thing that could have gone wrong was the inter-node communication.
I got this tip from the cluster admin:
_______________________________________________________________________________
- try disabling the InfiniBand transport and falling back to IP over IB;
for Open MPI:
export OMPI_MCA_btl="^openib"
export OMPI_MCA_btl_tcp_if_exclude="ib0,lo"
No failures any more? Strange, let me know. Fewer failures, or maybe more
failures? A 'race condition' is very likely!
_______________________________________________________________________________
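In case it helps someone, the same MCA parameters can in principle also be set from inside a Python script rather than the shell (a sketch with exactly the admin's values; note that with the gpaw-python binary MPI is initialized at interpreter startup, so there the shell exports are the right place, and the Python variant only makes sense for interpreters that initialize MPI lazily, e.g. via mpi4py):

```python
import os

# Open MPI reads its MCA parameters from OMPI_MCA_* environment variables,
# so they must be set before MPI is initialized, i.e. before importing
# any MPI-using module.
os.environ["OMPI_MCA_btl"] = "^openib"                # disable the openib transport
os.environ["OMPI_MCA_btl_tcp_if_exclude"] = "ib0,lo"  # exclude these interfaces from the TCP transport

# only now import MPI-dependent code, e.g.:
# from mpi4py import MPI
```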

After setting these before calling the binary, the jobs no longer crash on
12/24/36/72/84 processors, not a single time. I am still too much of a
layman to say how this resolved the problem, but it works. The calculations
are definitely a tad slower, though; I will have to verify that properly.
Thanks a lot for your suggestion of running the jobs interactively on the
backend; that is what resolved this. The first thing for me to do now is to
resurrect the old GW-method jobs. If they run fine, that would mean this was
the problem all along.



On Thu, Jan 21, 2016 at 9:48 AM, abhishek khetan <askhetan at gmail.com> wrote:

> No doubt, this kind of crash happens more often for bigger-memory jobs
> (which are actually still well within the provided resources), like the
> system we're discussing, and also while diagonalizing the full Hamiltonian
> and doing GW calculations. I've described the same to my cluster admin;
> hopefully they can throw some light on it. I have also used some methods
> from my own compilation of VASP which require at least 15-20 GB per core,
> and they run successfully, so it's indeed very strange that this should
> happen only with GPAW. I have used the default gcc/ATLAS libraries for the
> GPAW compilation, though; maybe I should try different combinations of
> Intel/MKL, etc. If I am able to determine the problem, I'll get back.
>
> On Wed, Jan 20, 2016 at 10:55 PM, Ask Hjorth Larsen <asklarsen at gmail.com>
> wrote:
>
>> Very strange!
>>
>> It is very difficult for me to see what could be the cause, not having
>> access to the supercomputer (and in general).
>>
>> Maybe MPI is broken somehow.  Do other MPI jobs work?  Can MPI "hello
>> world"-style programmes consistently be executed across multiple
>> nodes?  Or simple MPI operations (e.g., a loop of repeated
>> MPI_Allreduce calls) on large arrays?
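A loop of repeated Allreduce calls like the one suggested here can be sketched with mpi4py (my assumption; GPAW itself ships its own bindings). The script falls back to a serial pass when mpi4py is missing, so it can at least be smoke-tested on a login node; run it across nodes with e.g. `mpirun -np 24 python allreduce_loop.py` (the file name is arbitrary):

```python
import array

try:
    from mpi4py import MPI  # assumes mpi4py is built against the cluster's Open MPI
    comm = MPI.COMM_WORLD
except ImportError:
    comm = None  # serial fallback so the script still runs without MPI

N = 1_000_000                       # ~8 MB of doubles per buffer
send = array.array('d', [1.0]) * N
recv = array.array('d', [0.0]) * N

for _ in range(10):                 # repeated collectives: a flaky interconnect tends to hang or crash here
    if comm is not None:
        comm.Allreduce(send, recv)  # element-wise sum over all ranks
    else:
        recv = send[:]              # trivial one-process "reduction"

# each element should equal the number of participating ranks (1.0 serially)
if comm is None or comm.rank == 0:
    print("allreduce ok, per-element sum =", recv[0])
```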
>>
>> What about smaller GPAW calculations, like H2O in a big box
>> parallelized over several nodes?  It sounds like the crashing
>> calculations are still quite large.
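The small test calculation suggested above might look like this (a sketch using the ASE/GPAW Python API; the vacuum and grid spacing are arbitrary choices of mine, and the import guard is only there so the snippet does not crash where GPAW is absent):

```python
# Small GPAW sanity check: H2O in a big box, meant to be run across
# several nodes, e.g.: mpirun -np 24 gpaw-python h2o_test.py
try:
    from ase.build import molecule
    from gpaw import GPAW
    HAVE_GPAW = True
except ImportError:
    HAVE_GPAW = False  # sketch only; ASE/GPAW not installed here

if HAVE_GPAW:
    atoms = molecule('H2O')
    atoms.center(vacuum=6.0)                      # "big box": 6 A of vacuum on each side
    atoms.calc = GPAW(h=0.2, txt='h2o_test.txt')  # default grid mode
    energy = atoms.get_potential_energy()
    print('E =', energy, 'eV')
```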
>>
>> Best regards
>> Ask
>>
>> 2016-01-20 11:20 GMT+01:00 abhishek khetan <askhetan at gmail.com>:
>> > And by EVERY SINGLE TIME, I mean I have run the exact same jobs twice
>> > or thrice to check whether they crash or run, for all the cases
>> > mentioned above.
>> >
>> > On Wed, Jan 20, 2016 at 11:19 AM, abhishek khetan <askhetan at gmail.com>
>> > wrote:
>> >>
>> >> I think I have figured out exactly where the problem lies, but not
>> >> what is causing it.
>> >>
>> >> First, just to give you what our two clusters here are like (in case
>> >> it may be of help):
>> >>
>> >> Cluster1:
>> >> Chassis: 14x Dell PowerEdge C6100 (i.e., 14 nodes on this
>> >> chassis/cluster)
>> >> Processors/Node: 2x Intel Xeon X5670 (6-core) (i.e., a total of
>> >> 2x6=12 cores per node)
>> >> Memory/Node: 48 GByte (12x 4 GByte, 1333 MHz) (i.e., at least 3.5 GB
>> >> of actual resident memory available per core)
>> >> Interconnect: InfiniBand QDR Dual Port 40 Gb/s (non-blocking)
>> >> File System: Lustre
>> >> Operating System: Scientific Linux 6.4
>> >>
>> >> Cluster2:
>> >> Blades: 6x Dell PowerEdge M620 (i.e., 6 nodes per chassis/cluster)
>> >> Processors/Blade: 2x Intel Xeon E5-2660v2 (10-core) (i.e., a total of
>> >> 2x10=20 cores per node)
>> >> Memory/Blade: 256 GByte (i.e., at least 12.5 GB of actual resident
>> >> memory available per core)
>> >> Interconnect: InfiniBand FDR-10
>> >> File System: Lustre
>> >> Operating System: Scientific Linux 6.4
>> >>
>> >> I did an experiment: I ran some low-memory jobs (kpts=1x1x1) on 12
>> >> and 24 processors on Cluster1, and some higher-memory jobs
>> >> (kpts=1x1x2) on 20 and 40 processors on Cluster2.
>> >>
>> >> In both cases, when the jobs did not span more than one node (12
>> >> procs on Cluster1 for the low-mem jobs, 20 procs on Cluster2 for the
>> >> high-mem jobs), they ran perfectly well EVERY SINGLE TIME.
>> >>
>> >> However, as I increased the number of processors from 12 (1 node) to
>> >> 24 (2 nodes) for the low-mem jobs on Cluster1, and from 20 (1 node)
>> >> to 40 (2 nodes) for the higher-mem jobs on Cluster2, the behaviour
>> >> became totally erratic. Sometimes the jobs start; other times they
>> >> give the same segfault error I have described previously in this
>> >> thread. Another interesting feature was that the more processors (and
>> >> therefore nodes) I run the jobs on, the harder it is to get the jobs
>> >> to start. Put simply, the number of times the jobs crashed grew
>> >> steeply, seemingly exponentially, with the number of nodes involved.
>> >> As pseudo-scientific as this sounds, it is actually what is
>> >> happening, and I have no clue why.
>> >>
>> >> This clearly indicates a problem with the inter-node communication on
>> >> the cluster, because on single nodes there is no problem at all. I
>> >> have provided the exact technical details above so that you can
>> >> perhaps tell me whether this is a known problem on InfiniBand FDR or
>> >> QDR interconnects. Could there be a problem in my compilation? It
>> >> seems not, because even on 3 or 4 nodes the jobs do sometimes start,
>> >> if I am lucky.
>> >>
>> >> Any help is greatly appreciated.
>> >>
>> >>
>> >> On Mon, Jan 18, 2016 at 7:35 PM, abhishek khetan <askhetan at gmail.com>
>> >> wrote:
>> >>>
>> >>> You're right, 'memory leak' is the wrong description. I made the
>> >>> mistake of invariably associating it with the segfault error, which
>> >>> is what it actually is. I will run these tests and get back.
>> >>>
>> >>>
>> >>> On Mon, Jan 18, 2016 at 6:32 PM, Ask Hjorth Larsen <
>> asklarsen at gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Why are you so sure that there are memory leaks?  So far we have only
>> >>>> seen indications that a lot of memory is allocated.
>> >>>>
>> >>>> You could for example lower the grid spacing until it runs, then
>> check
>> >>>> if memory usage increases linearly with subsequent identical
>> >>>> calculations.  That would indicate a memory leak.  If you do not
>> >>>> observe this behaviour, then I don't know what you are seeing, but it
>> >>>> is certainly not a memory leak!
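The check described above can be illustrated generically with Python's tracemalloc (`run_calculation` is a stand-in of mine for one identical GPAW run; the deliberate leak is only there to show what linear growth across repeated runs looks like):

```python
import tracemalloc

# Stand-in for one identical calculation; the leaky variant appends to a
# module-level list so some allocations survive between runs.
_leak = []

def run_calculation(leaky=False):
    data = [0.0] * 100_000                 # transient work array, freed on return
    if leaky:
        _leak.append([0.0] * 100_000)      # allocation that is never released

tracemalloc.start()
usage = []
for _ in range(5):
    run_calculation(leaky=True)
    current, peak = tracemalloc.get_traced_memory()
    usage.append(current)                  # traced bytes after each identical run
tracemalloc.stop()

# A genuine leak shows memory rising roughly linearly across identical runs;
# flat usage would point elsewhere (e.g. a one-off large allocation).
growth = [b - a for a, b in zip(usage, usage[1:])]
print("monotonic growth:", all(g > 0 for g in growth))
```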
>> >>>>
>> >>>> 2016-01-18 13:26 GMT+01:00 abhishek khetan <askhetan at gmail.com>:
>> >>>> > I tried using the cluster interactively, and it gives me the
>> >>>> > output below. I couldn't make the r_memusage function work, but
>> >>>> > it is easily visible that the memory requirements are quite
>> >>>> > modest. I do not know why there is a segfault when I submit the
>> >>>> > same job to the regular cluster for production runs.
>> >>>> >
>> >>>> >   ___ ___ ___ _ _ _
>> >>>> >  |   |   |_  | | | |
>> >>>> >  | | | | | . | | | |
>> >>>> >  |__ |  _|___|_____|  0.12.0.13279
>> >>>> >  |___|_|
>> >>>> >
>> >>>> > User:   ak498084 at linuxbmc0002.rz.RWTH-Aachen.DE
>> >>>> > Date:   Mon Jan 18 13:22:24 2016
>> >>>> > Arch:   x86_64
>> >>>> > Pid:    20443
>> >>>> > Python: 2.7.9
>> >>>> > gpaw:   /home/ak498084/Utility/GPAW/gpaw_devel/gpaw-0.12/gpaw
>> >>>> > _gpaw:  /home/ak498084/Utility/GPAW/gpaw_devel/gpaw-0.12/build/bin.linux-x86_64-2.7/gpaw-python
>> >>>> > ase:    /home/ak498084/Utility/GPAW/gpaw_devel/ase/ase (version 3.10.0)
>> >>>> > numpy:  /usr/local_rwth/sw/python/2.7.9/x86_64/lib/python2.7/site-packages/numpy (version 1.9.1)
>> >>>> > scipy:  /usr/local_rwth/sw/python/2.7.9/x86_64/lib/python2.7/site-packages/scipy (version 0.15.1)
>> >>>> > units:  Angstrom and eV
>> >>>> > cores:  32
>> >>>> >
>> >>>> > Memory estimate
>> >>>> > ---------------
>> >>>> > Process memory now: 75.02 MiB
>> >>>> > Calculator  1145.24 MiB
>> >>>> >     Density  56.04 MiB
>> >>>> >         Arrays  15.91 MiB
>> >>>> >         Localized functions  35.58 MiB
>> >>>> >         Mixer  4.55 MiB
>> >>>> >     Hamiltonian  23.19 MiB
>> >>>> >         Arrays  11.82 MiB
>> >>>> >         XC  0.00 MiB
>> >>>> >         Poisson  8.81 MiB
>> >>>> >         vbar  2.56 MiB
>> >>>> >     Wavefunctions  1066.01 MiB
>> >>>> >         Arrays psit_nG  523.69 MiB
>> >>>> >         Eigensolver  2.29 MiB
>> >>>> >         Projections  2.06 MiB
>> >>>> >         Projectors  4.17 MiB
>> >>>> >         Overlap op  533.81 MiB
>> >>>> >
>> >>>> >
>> >>>> > On Mon, Jan 18, 2016 at 1:01 PM, abhishek khetan <
>> askhetan at gmail.com>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> Dear Marcin, and Ask,
>> >>>> >>
>> >>>> >> I am indeed on this cluster, and I have already used both these
>> >>>> >> tools. When I use r_memusage (to check the peak physical memory)
>> >>>> >> on these jobs, the peak physical memory is on the order of a few
>> >>>> >> MB, and the process gets killed right at the beginning, with the
>> >>>> >> only output being:
>> >>>> >>
>> >>>> >>  |   |   |_  | | | |
>> >>>> >>  | | | | | . | | | |
>> >>>> >>  |__ |  _|___|_____|  0.12.0.13279
>> >>>> >>  |___|_|
>> >>>> >>
>> >>>> >>
>> >>>> >> The same is not the case when I take a pre-converged system and
>> >>>> >> run the r_memusage script: it shows me a good 2.5 GB (and
>> >>>> >> rising) before I kill the process, and I can see it is running
>> >>>> >> fine. This is what I mean when I say that the allocation does
>> >>>> >> not even start for the unconverged cases. Using
>> >>>> >> eigensolver=RMM_DIIS(keep_htpsit=False) has exactly the same
>> >>>> >> problems. Is there a way I can trick GPAW into requesting much
>> >>>> >> less memory from the cluster? I want to try this because, as I
>> >>>> >> have mentioned, at peak my jobs don't need more than 2 GB per
>> >>>> >> core, and I usually provide 8 GB (albeit to no avail).
>> >>>> >>
>> >>>> >> Best,
>> >>>> >>
>> >>>> >>
>> >>>> >> On Sat, Jan 16, 2016 at 1:10 PM, Marcin Dulak <mdul at dtu.dk>
>> wrote:
>> >>>> >>>
>> >>>> >>> Hi,
>> >>>> >>>
>> >>>> >>> are you on this cluster?
>> >>>> >>> https://doc.itc.rwth-aachen.de/display/CC/r_memusage
>> >>>> >>>
>> >>>> >>>
>> >>>> >>>
>> https://doc.itc.rwth-aachen.de/display/CC/Resource+limitations+on+dialog+systems
>> >>>> >>> It may be that the batch system (LSF) kills jobs that exceed a
>> >>>> >>> given resident-memory limit. The two links above may help you
>> >>>> >>> diagnose that. I recall that GPAW's memory estimate is not very
>> >>>> >>> accurate for standard ground-state PW or grid-mode jobs (~20%)
>> >>>> >>> and may be very inaccurate (an order of magnitude) for vdW or
>> >>>> >>> LCAO jobs (Ask, correct me if this is no longer the case).
>> >>>> >>>
>> >>>> >>> Best regards,
>> >>>> >>>
>> >>>> >>> Marcin
>> >>>> >>> _______________________________________________
>> >>>> >>> gpaw-users mailing list
>> >>>> >>> gpaw-users at listserv.fysik.dtu.dk
>> >>>> >>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> --
>> >>>> >> || radhe radhe ||
>> >>>> >>
>> >>>> >> abhishek
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > || radhe radhe ||
>> >>>> >
>> >>>> > abhishek
>> >>>> >
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> || radhe radhe ||
>> >>>
>> >>> abhishek
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> || radhe radhe ||
>> >>
>> >> abhishek
>> >
>> >
>> >
>> >
>> > --
>> > || radhe radhe ||
>> >
>> > abhishek
>> >
>>
>
>
>
> --
> || radhe radhe ||
>
> abhishek
>



-- 
|| radhe radhe ||

abhishek
