[gpaw-users] Test failure

Jay Wai jaywai412 at gmail.com
Wed Aug 14 12:34:47 CEST 2019


Hi,

Thank you.

The problem is resolved and the serial test suite now runs fine.
It was not caused by gpaw itself.
Since then I have been running the parallel tests, but have not
succeeded yet.

When running the tests with 68 cores, the following error message shows
up:
linalg/zher.py 0.468 OK
fd_ops/gd.py 0.185 FAILED! (rank
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67)
gpaw.grid_descriptor.BadGridError: Grid 48x48x48 too small for 1x1x68 cores!

Running with 4 or 8 cores, the tests seem to proceed normally, showing OK
messages, but at some point the whole run stops, as shown below:

parallel/fd_parallel.py 84.036 OK
solvation/poisson.py 18.443 OK
solvation/water_water.py 22.588 OK
xc/pygga.py 72.644 OK
pseudopotential/atompaw.py (here, no message is printed and the run hangs
without ever exiting)


The latter case (4 or 8 cores) looks like a system-related issue, but for
the former case I hope gpaw users can give me a clue.


Actually, my simple Python script runs successfully on 68 cores, but the
calculation time dropped only by about half compared to running on my PC
with 4 cores.

Should I adjust the parallelization options when creating the GPAW
calculator in order to use such a large number of cores efficiently?
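
For clarity, this is the kind of setting I mean (a minimal sketch based on
the parallel keyword described in the GPAW documentation; the mode, k-point
sampling, and values below are illustrative, not tuned for my system):

    from gpaw import GPAW, PW

    calc = GPAW(mode=PW(400),              # plane-wave mode, example cutoff
                kpts=(4, 4, 4),            # example k-point sampling
                parallel={'domain': 4,     # ranks for domain decomposition
                          'kpt': None,     # let GPAW distribute k-points
                          'sl_auto': True})  # choose ScaLAPACK layouts automatically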

Or, does this also seem to be related to the issues mentioned above?


Best,

Jay

On Mon, Aug 12, 2019 at 9:11 PM Ask Hjorth Larsen <asklarsen at gmail.com> wrote:

> Hi,
>
> On Fri, Aug 9, 2019 at 20:28, Jay Wai via gpaw-users
> <gpaw-users at listserv.fysik.dtu.dk> wrote:
> >
> > Hello all,
> >
> > I’ve just installed gpaw-19.8.1 with scalapack and fftw on a CentOS
> > machine.
> > There were no error or warning messages during the installation, but I
> > am having difficulty figuring out what causes the following problems in
> > the post-installation steps:
> >
> > 1. The ‘gpaw test’ run stops in the lcao/lcao_projections.py part,
> > showing a strange ‘Killed’ message:
> > pw/fulldiagk.py                               3.026  OK
> > ext_potential/external.py                     2.956  OK
> > ext_potential/external_pw.py                  4.194  OK
> > lcao/atomic_corrections.py                    0.000  SKIPPED
> > vdw/libvdwxc_h2.py                            0.000  SKIPPED
> > generic/mixer.py                              2.478  OK
> > lcao/lcao_projections.py                 Killed
> >
> > Is there any case in which gpaw internally kills the test process? Or
> > should I ask the system manager?
>
> GPAW won't kill it, so this must be some other program.
>
> Try reproducing the error using only that file: find it with
>
>   gpaw test --list | grep lcao_proj
>
> and run it manually with
>
>   mpirun -np N gpaw-python thefile.py
>
> Then see if it fails with or without MPI, and with 1, 2, 4 processes.
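> For example (assuming the file turns out to be lcao_projections.py in the
> current directory; adjust the name and path to what the listing shows):
>
>   python lcao_projections.py                    # serial, no MPI
>   mpirun -np 1 gpaw-python lcao_projections.py
>   mpirun -np 2 gpaw-python lcao_projections.py
>   mpirun -np 4 gpaw-python lcao_projections.py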
>
> Best regards
> Ask
>
> >
> > 2. ‘gpaw -P 4 test’ stops right away, showing the following error
> > messages:
> > gpaw-python: symbol lookup error:
> > /apps/compiler/gcc/7.2.0/lib/libmca_common_verbs.so.7: undefined symbol:
> > ompi_common_verbs_usnic_register_fake_drivers
> > Primary job terminated normally, but 1 process returned a non-zero
> > exit code. Per user-direction, the job has been aborted.
> > mpiexec detected that one or more processes exited with non-zero status,
> > thus causing the job to be terminated. The first process to do so was:
> >   Process name: [[13827,1],0]
> >   Exit code:    127
> >
> > OpenMPI 3.x is installed on the system. Does that message have something
> > to do with how OpenMPI was compiled?
> >
> > I have struggled with this problem for a few days.
> > I would be grateful if someone could help me with this.
> > -Jay
>