[gpaw-users] Problem with compiling parallel GPAW

Michalsky Ronald michalskyr at ethz.ch
Fri Jul 25 21:56:08 CEST 2014


Yes, “bse_MoS2_cut.py” hangs; it does not yield an actual error.

I’m not sure how to test the OpenMPI installation on its own (a few basic checks are sketched after the edits below), but I’ve tried using the OpenMPI provided by the cluster:

Edits in .bashrc:

module load open_mpi/1.6.2
#export MPIDIR=/cluster/home03/mavt/ronaldm/openmpi-1.8.1.gfortran
export MPIDIR=/cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2
#export PATH=/cluster/home03/mavt/ronaldm/openmpi-1.8.1.gfortran/bin:$PATH
export PATH=/cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/bin:$PATH # the bin directory, not the mpicc binary itself, belongs on PATH
export LD_LIBRARY_PATH=/cluster/apps/gcc/gcc472/lib64:$LD_LIBRARY_PATH # for libquadmath.so.0
export LD_LIBRARY_PATH=/cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ess_lsf:$LD_LIBRARY_PATH # added because of the libbat.so warning below, which it did not prevent
export LD_LIBRARY_PATH=/cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_plm_lsf:$LD_LIBRARY_PATH # added because of the libbat.so warning below, which it did not prevent
export LD_LIBRARY_PATH=/cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ras_lsf:$LD_LIBRARY_PATH # added because of the libbat.so warning below, which it did not prevent
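
As far as I can tell, the libbat.so that these warnings refer to is an LSF batch-system library, and the mca_* entries above are Open MPI plugins rather than directories, so adding them to LD_LIBRARY_PATH probably cannot help; since Open MPI marks the warnings “(ignored)”, they are most likely harmless outside an LSF job anyway. A quick way to see which libraries the LSF components actually expect (a sketch, assuming the plugin file is named mca_ess_lsf.so):

  ldd /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ess_lsf.so | grep "not found"   # lists missing libraries such as libbat.so (plugin file name is an assumption)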

Edits in .bash_profile:

module load python/2.7.2 netcdf/4.1.3 #######
#module load python/2.7.2 netcdf/4.3.0 #######
module load gcc/4.7.2
module load open_mpi/1.6.2
export NETCDF=/cluster/apps/netcdf/4.1.3/x86_64/serial/gcc_4.7.2/lib #######
#export NETCDF=/cluster/apps/netcdf/4.3.0/x86_64/gcc_4.7.2/openmpi_1.6.2/lib #######
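
For a quick check that the cluster OpenMPI works on its own, independent of GPAW, something along these lines should do (a sketch, using only the tools installed by the open_mpi module):

  which mpicc mpirun     # both should point into /cluster/apps/openmpi/1.6.2/...
  ompi_info | head       # prints the Open MPI version and build details
  mpirun -np 2 hostname  # trivial two-process launch; should print the node name twice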

>>> Still: gpaw-python `which gpaw-test` 2>&1 | tee test.log
>>> hangs at “bse_MoS2_cut.py”:

[brutus3.ethz.ch:35995] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ess_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
[brutus3.ethz.ch:36084] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ess_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
[brutus3.ethz.ch:36084] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_plm_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
[brutus3.ethz.ch:36084] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ras_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:

  Local host:    brutus3.ethz.ch
  OMPI source:   btl_openib.c:190
  Function:      ibv_create_cq()
  Device:        mlx4_0
  Memlock limit: 2097152

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          brutus3.ethz.ch (PID 35995)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
python 2.7.2 GCC 4.4.6 20110731 (Red Hat 4.4.6-3) 64bit ELF on Linux x86_64 centos 6.5 Final
Running tests in /tmp/gpaw-test-Bh4RnJ
Jobs: 1, Cores: 1, debug-mode: False
=============================================================================
gemm_complex.py                  0.018  OK
[…]
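
Regarding the openib memlock warning above: the current limit can at least be checked from a shell or job script (a sketch; raising it to “unlimited” normally requires /etc/security/limits.conf or batch-system changes, i.e. the cluster admins):

  ulimit -l            # locked-memory limit in kB; the 2097152 in the message above is in bytes (2 MB)
  ulimit -l unlimited  # usually refused for normal users; if so, this is one for the admins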

>>> and: mpirun -np 2 gpaw-python -c "import gpaw.mpi as mpi; print mpi.rank"
>>> yields:

[brutus3.ethz.ch:35216] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ess_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
[brutus3.ethz.ch:35216] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_plm_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
[brutus3.ethz.ch:35216] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ras_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:

  Local host:    brutus3.ethz.ch
  OMPI source:   btl_openib.c:190
  Function:      ibv_create_cq()
  Device:        mlx4_0
  Memlock limit: 2097152

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
0
1
[brutus3.ethz.ch:35216] 1 more process has sent help message help-mpi-btl-openib.txt / init-fail-no-mem
[brutus3.ethz.ch:35216] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
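
The fork() warning from the first run presumably comes from the test suite spawning subprocesses; per the message it can be silenced, and the help aggregation can be switched off to see every message (this only cleans up the output, it is not a fix for the hang):

  mpirun --mca mpi_warn_on_fork 0 --mca orte_base_help_aggregate 0 -np 2 gpaw-python ...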

>>> while: mpiexec -np 4 gpaw-python `which gpaw-test` 2>&1 | tee test4.log
>>> yields:

[brutus2.ethz.ch:48094] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ess_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
[brutus2.ethz.ch:48094] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_plm_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
[brutus2.ethz.ch:48094] mca: base: component_find: unable to open /cluster/apps/openmpi/1.6.2/x86_64/gcc_4.7.2/lib/openmpi/mca_ras_lsf: libbat.so: cannot open shared object file: No such file or directory (ignored)
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:

  Local host:    brutus2.ethz.ch
  OMPI source:   btl_openib.c:190
  Function:      ibv_create_cq()
  Device:        mlx4_0
  Memlock limit: 2097152

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          brutus2.ethz.ch (PID 48100)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
python 2.7.2 GCC 4.4.6 20110731 (Red Hat 4.4.6-3) 64bit ELF on Linux x86_64 centos 6.5 Final
Running tests in /tmp/gpaw-test-0Z2zMG
Jobs: 1, Cores: 4, debug-mode: False
=============================================================================
gemm_complex.py                         0.034  OK
mpicomm.py                              0.032  OK
ase3k_version.py                        0.029  OK
numpy_core_multiarray_dot.py            0.029  OK
eigh.py                                 0.044  OK
lapack.py                               0.026  OK
dot.py                                  0.022  OK
lxc_fxc.py                              0.029  OK
blas.py                                 0.032  OK
erf.py                                  0.020  OK
gp2.py                                  0.022  OK
kptpar.py                          [brutus2.ethz.ch:48094] 3 more processes have sent help message help-mpi-btl-openib.txt / init-fail-no-mem
[brutus2.ethz.ch:48094] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[brutus2.ethz.ch:48094] 3 more processes have sent help message help-mpi-runtime.txt / mpi_init:warn-fork
     6.769  OK
non_periodic.py                         0.026  OK
parallel/blacsdist.py                   0.030  OK
gradient.py                             0.038  OK
cg2.py                                  0.045  OK
kpt.py                                  0.024  OK
lf.py                                   0.022  OK
gd.py                                   0.017  OK
parallel/compare.py                     0.242  OK
pbe_pw91.py                             0.031  OK
fsbt.py                                 0.022  OK
derivatives.py                          0.035  OK
Gauss.py                                0.053  OK
second_derivative.py                    0.041  OK
integral4.py                            0.046  OK
parallel/ut_parallel.py                 1.170  OK
transformations.py                      0.046  OK
parallel/parallel_eigh.py               0.036  OK
spectrum.py                             0.303  OK
xc.py                                   0.099  OK
zher.py                                 0.063  OK
pbc.py                                  0.061  OK
lebedev.py                              0.051  OK
parallel/ut_hsblacs.py                  0.346  OK
occupations.py                          0.093  OK
dump_chi0.py                            0.098  OK
cluster.py                              0.271  OK
pw/interpol.py                          0.032  OK
poisson.py                              0.078  OK
pw/lfc.py                               0.291  OK
pw/reallfc.py                           0.405  OK
XC2.py                                  0.105  OK
multipoletest.py                        0.406  OK
nabla.py                                0.132  OK
noncollinear/xccorr.py                  0.600  OK
gauss_wave.py                           0.612  OK
harmonic.py                             0.214  OK
atoms_too_close.py                      0.256  OK
screened_poisson.py                     0.206  OK
yukawa_radial.py                        0.050  OK
noncollinear/xcgrid3d.py                0.197  OK
vdwradii.py                             2.275  OK
lcao_restart.py                         0.500  OK
ase3k.py                                0.841  OK
parallel/ut_kptops.py                   2.128  OK
fileio/idiotproof_setup.py              0.682  OK
fileio/hdf5_simple.py                   0.037  SKIPPED
fileio/hdf5_noncontiguous.py            0.032  SKIPPED
fileio/parallel.py                      6.487  OK
timing.py                               1.019  OK
coulomb.py                              0.572  OK
xcatom.py                               1.585  OK
proton.py                               0.492  OK
keep_htpsit.py                          2.049  OK
pw/stresstest.py                        2.313  OK
aeatom.py                               4.282  OK
numpy_zdotc_graphite.py                 1.822  OK
lcao_density.py                         0.845  OK
parallel/overlap.py                     0.978  OK
restart.py                              1.973  OK
gemv.py                                 3.267  OK
ylexpand.py                             1.659  OK
wfs_io.py                               1.775  OK
fixocc.py                               4.275  OK
nonselfconsistentLDA.py                 2.343  OK
gga_atom.py                             3.069  OK
ds_beta.py                              2.794  OK
gauss_func.py                           0.894  OK
noncollinear/h.py                       1.997  OK
symmetry.py                             3.726  OK
usesymm.py                              2.058  OK
broydenmixer.py                         4.086  OK
mixer.py                                4.097  OK
wfs_auto.py                             1.659  OK
ewald.py                                5.067  OK
refine.py                               1.503  OK
revPBE.py                               2.956  OK
nonselfconsistent.py                    3.267  OK
hydrogen.py                             2.043  OK
fileio/file_reference.py                4.062  OK
fixdensity.py                           3.996  OK
bee1.py                                 2.605  OK
spinFe3plus.py                          5.160  OK
pw/h.py                                 6.014  OK
stdout.py                               3.986  OK
parallel/lcao_complicated.py           10.440  OK
pw/slab.py                              9.166  OK
spinpol.py                              4.213  OK
plt.py                                  3.010  OK
eed.py                                  2.512  OK
lrtddft2.py                             1.868  OK
parallel/hamiltonian.py                 2.207  OK
ah.py                                   3.870  OK
laplace.py                              0.034  OK
pw/mgo_hybrids.py                      12.123  OK
lcao_largecellforce.py                  3.160  OK
restart2.py                             3.530  OK
Cl_minus.py                             7.130  OK
fileio/restart_density.py              10.020  OK
external_potential.py                   2.266  OK
pw/bulk.py                              8.433  OK
pw/fftmixer.py                          1.472  OK
mgga_restart.py                         3.698  OK
vdw/quick.py                           11.911  OK
partitioning.py                         8.147  OK
bulk.py                                12.549  OK
elf.py                                  6.431  OK
aluminum_EELS.py                        5.776  OK
H_force.py                              5.329  OK
parallel/lcao_hamiltonian.py            8.331  OK
fermisplit.py                           5.849  OK
parallel/ut_redist.py                  15.330  OK
lcao_h2o.py                             2.746  OK
cmrtest/cmr_test2.py                    5.257  OK
h2o_xas.py                              5.350  OK
ne_gllb.py                             10.771  OK
exx_acdf.py                             6.887  OK
ut_rsh.py                               2.145  OK
ut_csh.py                               2.205  OK
spin_contamination.py                   6.899  OK
davidson.py                             9.190  OK
cg.py                                   7.258  OK
gllbatomic.py                          22.057  OK
lcao_force.py                          12.172  OK
fermilevel.py                          12.780  OK
h2o_xas_recursion.py                    7.836  OK
excited_state.py                   /cluster/home03/mavt/ronaldm/gpaw/gpaw/lrtddft/excited_state.py:176: RuntimeWarning: divide by zero encountered in remainder
  if (i % ncalcs) == icalc:
/cluster/home03/mavt/ronaldm/gpaw/gpaw/lrtddft/excited_state.py:176: RuntimeWarning: divide by zero encountered in remainder
  if (i % ncalcs) == icalc:
/cluster/home03/mavt/ronaldm/gpaw/gpaw/lrtddft/excited_state.py:176: RuntimeWarning: divide by zero encountered in remainder
  if (i % ncalcs) == icalc:
/cluster/home03/mavt/ronaldm/gpaw/gpaw/lrtddft/excited_state.py:176: RuntimeWarning: divide by zero encountered in remainder
  if (i % ncalcs) == icalc:
     2.423  FAILED! (rank 0,1,2,3)
#############################################################################
RANK 0,1,2,3:
Traceback (most recent call last):
  File "/cluster/home03/mavt/ronaldm/gpaw/gpaw/test/__init__.py", line 514, in run_one
    execfile(filename, loc)
  File "/cluster/home03/mavt/ronaldm/gpaw/gpaw/test/excited_state.py", line 39, in <module>
    forces = exst.get_forces(H2)
  File "/cluster/home03/mavt/ronaldm/gpaw/gpaw/lrtddft/excited_state.py", line 176, in get_forces
    if (i % ncalcs) == icalc:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
#############################################################################
gemm.py                                18.840  OK
rpa_energy_Ni.py                       15.612  OK
LDA_unstable.py                        20.611  OK
si.py                                   7.939  OK
blocked_rmm_diis.py                     5.108  OK
lxc_xcatom.py                          22.004  OK
gw_planewave.py                         9.671  OK
degeneracy.py                           9.305  OK
apmb.py                                 9.745  OK
vdw/potential.py                        0.041  OK
al_chain.py                            13.172  OK
relax.py                               12.406  OK
fixmom.py                              10.716  OK
CH4.py                                 12.639  OK
diamond_absorption.py                  12.034  OK
simple_stm.py                          22.820  OK
gw_method.py                           18.060  OK
lcao_bulk.py                           14.978  OK
constant_electric_field.py             15.148  OK
parallel/ut_invops.py                  14.622  OK
parallel/lcao_projections.py            7.506  OK
guc_force.py                           24.147  OK
test_ibzqpt.py                         16.916  OK
aedensity.py                           16.705  OK
fd2lcao_restart.py                     23.504  OK
lcao_bsse.py                            9.762  OK
pplda.py                               33.445  OK
revPBE_Li.py                           35.922  OK
si_primitive.py                        17.008  OK
complex.py                             10.706  OK
Hubbard_U.py                           30.096  OK
ldos.py                                38.696  OK
parallel/ut_hsops.py                   58.648  OK
pw/hyb.py                              35.228  OK
hgh_h2o.py                             13.953  OK
vdw/quick_spin.py                      52.983  OK
scfsic_h2.py                           16.912  OK
lrtddft.py                             31.377  OK
dscf_lcao.py                           24.880  OK
IP_oxygen.py                           36.677  OK
Al2_lrtddft.py                         22.078  OK
rpa_energy_Si.py                       32.897  OK
2Al.py                                 29.983  OK
jstm.py                                19.824  OK
tpss.py                                42.180  OK
be_nltd_ip.py                          17.267  OK
si_xas.py                              30.602  OK
atomize.py                             37.679  OK
chi0.py                               162.184  OK
Cu.py                                  48.385  OK
restart_band_structure.py              41.443  OK
ne_disc.py                             36.857  OK
exx_coarse.py                          32.540  OK
exx_unocc.py                            8.875  OK
Hubbard_U_Zn.py                        41.355  OK
diamond_gllb.py                        73.924  OK
h2o_dks.py                             70.194  OK
aluminum_EELS_lcao.py                  26.350  OK
gw_ppa.py                              21.388  OK
gw_static.py                           10.340  OK
exx.py                                 30.104  OK
pygga.py                               89.949  OK
dipole.py                              39.430  OK
nsc_MGGA.py                            48.030  OK
mgga_sc.py                             39.634  OK
MgO_exx_fd_vs_pw.py                    79.017  OK
lb94.py                               170.687  OK
8Si.py                                 42.790  OK
td_na2.py                              33.487  OK
ehrenfest_nacl.py                      19.010  OK
rpa_energy_N2.py                      149.873  OK
beefvdw.py                            114.234  OK
nonlocalset.py                        108.888  OK
wannierk.py                            68.999  OK
rpa_energy_Na.py                       79.209  OK
pw/si_stress.py                       139.633  OK
ut_tddft.py                            69.103  OK
transport.py                          163.823  OK
vdw/ar2.py                            120.665  OK
aluminum_testcell.py                  162.975  OK
au02_absorption.py                    231.149  OK
lrtddft3.py                           212.349  OK
scfsic_n2.py                          146.630  OK
parallel/lcao_parallel.py               8.294  OK
parallel/fd_parallel.py                 6.368  OK
bse_aluminum.py                        16.704  OK
bse_diamond.py                         47.535  OK
bse_vs_lrtddft.py                     116.312  OK
parallel/pblas.py                       0.051  OK
parallel/scalapack.py                   0.041  OK
parallel/scalapack_diag_simple.py       0.055  OK
parallel/scalapack_mpirecv_crash.py    11.156  OK
parallel/realspace_blacs.py             0.050  OK
AA_exx_enthalpy.py                    303.890  OK
cmrtest/cmr_test.py                     0.383  SKIPPED
cmrtest/cmr_test3.py                    0.025  SKIPPED
cmrtest/cmr_test4.py                    0.031  SKIPPED
cmrtest/cmr_append.py                   0.016  SKIPPED
cmrtest/Li2_atomize.py                  0.017  SKIPPED
=============================================================================
Ran 230 tests out of 237 in 4885.5 seconds
Tests skipped: 7
Tests failed: 1
=============================================================================
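
As an aside, the single failure (excited_state.py) does not look MPI-related: the divide-by-zero warning presumably means ncalcs ends up as zero or as an array, and the comparison in “if (i % ncalcs) == icalc:” then yields a numpy array, whose truth value is ambiguous. A minimal illustration of the same error class (not GPAW code):

  python -c "import numpy as np; i = np.arange(4); print bool((i % 2) == 1)"   # raises: ValueError: The truth value of an array with more than one element is ambiguous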

>>> Trying additionally, in .bashrc & .bash_profile:

#module load python/2.7.2 netcdf/4.1.3 #######
module load python/2.7.2 netcdf/4.3.0 #######
#export NETCDF=/cluster/apps/netcdf/4.1.3/x86_64/serial/gcc_4.7.2/lib #######
export NETCDF=/cluster/apps/netcdf/4.3.0/x86_64/gcc_4.7.2/openmpi_1.6.2/lib #######

Serial: the same hang at “bse_MoS2_cut.py”.
Parallel: the same error messages as above.
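
To narrow the hang down further, it might help to run just that one test directly (judging from the traceback paths above, the test scripts are plain Python files under gpaw/gpaw/test/), e.g.:

  gpaw-python /cluster/home03/mavt/ronaldm/gpaw/gpaw/test/bse_MoS2_cut.py
  mpirun -np 2 gpaw-python /cluster/home03/mavt/ronaldm/gpaw/gpaw/test/bse_MoS2_cut.py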

