[gpaw-users] general comment on memory leaks.
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Jan 22 12:07:12 CET 2016
abhishek khetan askhetan at gmail.com wrote:
> The only thing that could've gone wrong was the inter-node communication.
> I got this tip from the cluster admin:
> _______________________________________________________________________________
> -try out to disable the InfiniBand transport and fallback to IP_over_IB,
> for Open MPI
> export OMPI_MCA_btl="^openib"
> export OMPI_MCA_btl_tcp_if_exclude="ib0,lo"
> No failures any more? Strange, let me know. Fewer failures, or maybe more
> failures? A 'race condition' is very likely!
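For reference, the same MCA parameters can also be passed directly on the mpirun command line instead of via environment variables. A minimal sketch (the application name is a placeholder):

```shell
# Command-line form (placeholder application name):
#   mpirun --mca btl ^openib --mca btl_tcp_if_exclude ib0,lo ./my_gpaw_job
# Environment-variable form, as in the admin's tip:
export OMPI_MCA_btl="^openib"
export OMPI_MCA_btl_tcp_if_exclude="ib0,lo"
# Confirm the parameters are visible to Open MPI:
env | grep '^OMPI_MCA_'
```

Open MPI reads any variable named `OMPI_MCA_<param>` at startup, so the two forms are equivalent.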
It's really bad to disable InfiniBand entirely, because it is *much*
faster than Ethernet. IP over InfiniBand should run at nearly full
speed, though still less than optimal.
I would ask which version of OpenMPI you are using. If it's quite old,
your problems could be due to OpenMPI bugs that were fixed a long time
ago. FYI, the latest version of OpenMPI is 1.10.2.
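A quick way to check the installed version (assuming a standard Open MPI installation with mpirun in $PATH):

```shell
# Print the Open MPI version string, if mpirun is available.
if command -v mpirun >/dev/null 2>&1; then
    mpirun --version | head -n 1   # e.g. "mpirun (Open MPI) 1.10.2"
else
    echo "mpirun not found in PATH"
fi
```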
Problems can also arise if your InfiniBand adapters use old drivers or
have old firmware installed. Please ask your sysadmin which hardware,
firmware, and drivers are installed.
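Commonly available diagnostics for this (the tool names assume the infiniband-diags and OFED packages are installed, which is typical on InfiniBand clusters):

```shell
# Query adapter firmware and driver-stack information, where the tools exist:
#   ibstat      - port state and firmware version per adapter
#   ibv_devinfo - verbs-level device info, including fw_ver
#   ofed_info   - version of the installed OFED driver stack
for tool in ibstat ibv_devinfo ofed_info; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $tool =="
        "$tool" 2>/dev/null | head -n 20
    else
        echo "$tool not found in PATH"
    fi
done
```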
/Ole
--
Ole Holm Nielsen
PhD, Manager of IT services
Department of Physics, Technical University of Denmark