[gpaw-users] Test failure
Ask Hjorth Larsen
asklarsen at gmail.com
Wed Aug 14 13:42:46 CEST 2019
Hi Jay,
On Wed, Aug 14, 2019 at 12:35 PM, Jay Wai <jaywai412 at gmail.com> wrote:
>
> Hi,
>
> Thank you.
>
> The problem is resolved and the serial code test now runs successfully.
> It was not caused by gpaw itself.
> After that, I have been running the parallel code test, but have not succeeded yet.
>
> When running the test code with 68 cores, the following error message shows up:
> linalg/zher.py 0.468 OK
> fd_ops/gd.py 0.185 FAILED! (rank 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67)
> gpaw.grid_descriptor.BadGridError: Grid 48x48x48 too small for 1x1x68 cores!
>
> Running with 4 or 8 cores, the test seems to proceed normally, showing OK messages, but at some point the whole test run stops, as shown below:
>
The test suite is meant for 1, 2, 4, or 8 cores.
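Something along these lines should stay within the supported core
counts (a sketch built only from the commands already used in this
thread; adjust the MPI launcher and paths to your installation):

  gpaw -P 8 test                            # whole suite on 8 cores
  gpaw test --list | grep gd.py             # locate the file behind fd_ops/gd.py
  mpirun -np 8 gpaw-python /path/to/gd.py   # rerun just that test in parallel
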
> parallel/fd_parallel.py 84.036 OK
>
> solvation/poisson.py 18.443 OK
>
> solvation/water_water.py 22.588 OK
>
> xc/pygga.py 72.644 OK
>
> pseudopotential/atompaw.py (Here, no message is printed and the test hangs without exiting)
>
>
> The latter case (4 or 8 cores) seems to be a system-related issue, but for the former case I hope gpaw users can give me a clue.
>
>
> Actually, my simple Python script ran successfully on 68 cores, but the calculation time was only halved compared to the run on my PC with 4 cores.
>
> Should I adjust the parallelization options when setting up the GPAW calculator in order to use such a large number of cores efficiently?
>
> Or, does this also seem to be related to the issues mentioned above?
Your system is probably very small. Adding many cores only pays off
once the system reaches a certain size.
The documentation on parallelization might help you.
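For reference, here is a minimal sketch of how the parallel keyword can
be passed to the calculator; the system, the numbers and the file names
below are only placeholders, not a recommendation for your case:

  from ase.build import molecule
  from gpaw import GPAW

  # Placeholder system; replace with your own structure.
  atoms = molecule('H2O')
  atoms.center(vacuum=4.0)

  # Placeholder split: 8 domains x 2 band groups = 16 cores.
  # The 'domain', 'band' and 'kpt' entries control how the cores
  # are divided; leave them out to let GPAW choose automatically.
  calc = GPAW(xc='PBE',
              parallel={'domain': 8, 'band': 2},
              txt='h2o.txt')
  atoms.calc = calc
  atoms.get_potential_energy()

Run it through the parallel interpreter, e.g.
mpirun -np 16 gpaw-python script.py.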
Best regards
Ask
>
>
> Best,
>
> Jay
>
>
> On Mon, Aug 12, 2019 at 9:11 PM, Ask Hjorth Larsen <asklarsen at gmail.com> wrote:
>>
>> Hi,
>>
>> On Fri, Aug 9, 2019 at 8:28 PM, Jay Wai via gpaw-users
>> <gpaw-users at listserv.fysik.dtu.dk> wrote:
>> >
>> > Hello all,
>> >
>> > I’ve just installed gpaw-19.8.1 with scalapack and fftw into a CentOS machine.
>> > There were no error or warning messages during the installation, but I am having difficulty figuring out what causes the following problems in the post-installation steps:
>> >
>> > 1. The ‘gpaw test’ run stops at the lcao/lcao_projections.py part, showing a strange ‘Killed’ message, as follows:
>> > pw/fulldiagk.py 3.026 OK
>> > ext_potential/external.py 2.956 OK
>> > ext_potential/external_pw.py 4.194 OK
>> > lcao/atomic_corrections.py 0.000 SKIPPED
>> > vdw/libvdwxc_h2.py 0.000 SKIPPED
>> > generic/mixer.py 2.478 OK
>> > lcao/lcao_projections.py Killed
>> >
>> > Is there any case in which gpaw internally kills the test process? Or should I ask the system administrator?
>>
>> GPAW won't kill it, so this must be some other program.
>>
>> Try reproducing the error using only that file, by finding it (gpaw
>> test --list | grep lcao_proj) and running it manually with mpirun -np
>> N gpaw-python thefile.py.
>>
>> Then see if it fails with or without MPI, and with 1, 2, 4 processes.
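>> For example (a sketch; the exact path printed by "gpaw test --list"
>> depends on your installation):
>>
>>   FILE=$(gpaw test --list | grep lcao_proj)
>>   gpaw-python "$FILE"                # single process, no mpirun
>>   mpirun -np 2 gpaw-python "$FILE"   # 2 MPI processes
>>   mpirun -np 4 gpaw-python "$FILE"   # 4 MPI processes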
>>
>> Best regards
>> Ask
>>
>> >
>> > 2. ‘gpaw -P 4 test’ stops right away, showing the following error messages:
>> > gpaw-python: symbol lookup error: /apps/compiler/gcc/7.2.0/lib/libmca_common_verbs.so.7: undefined symbol: ompi_common_verbs_usnic_register_fake_drivers
>> > Primary job terminated normally. Per user-direction, the job has been aborted.
>> > mpiexec detected that one or more processes exited with non-zero status, thus causing
>> > the job to be terminated. The first process to do so was:
>> > Process name: [[13827,1],0]
>> > Exit code: 127
>> >
>> > OpenMPI 3.x is installed on the system. Does that message have something to do with how OpenMPI was compiled?
>> >
>> > I have struggled with this problem for a few days.
>> > I would be grateful if someone could help me with this.
>> > -Jay
>> > _______________________________________________
>> > gpaw-users mailing list
>> > gpaw-users at listserv.fysik.dtu.dk
>> > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users