[gpaw-users] mpi-problem
Torsten Hahn
der.hahn.torsten at googlemail.com
Thu Jan 20 10:09:06 CET 2011
Dear Jussi,
I have attached a minimal example that reproduces the problem. In most cases the error occurs right after the first SCF cycle has finished during a structure optimization.
I use:
gpaw: svn Revision: 7592
ase: svn Revision: 1953
MPI versions tested:
- Open MPI 1.4.1 / 64-bit / Intel compiler(s)
- Intel MPI 3.2.2
- MPICH2 (a recent version; I don't know exactly which)
All of them show more or less the same errors.
Best regards,
Torsten.
On 20.01.2011 at 08:50, Jussi Enkovaara wrote:
> On 2011-01-20 09:36, Torsten Hahn wrote:
>> Dear all,
>>
>> Running "small" GPAW jobs in parallel works fine, but running "heavy" jobs always causes the following error:
>>
>> =========
>> [node123:1404] *** An error occurred in MPI_Wait
>> [node123:1404] *** on communicator MPI COMMUNICATOR 3 CREATE FROM 0
>> [node123:1404] *** MPI_ERR_TRUNCATE: message truncated
>> [node123:1404] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 24 with PID 1395 on
>> node node123.cm.cluster exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>> =========
>>
>> There is always an MPI_ERR_TRUNCATE event. I tried Intel MPI as well as Open MPI. Does anybody know where this kind of error might come from?
>
> Dear Torsten,
> In most cases I have seen, errors like the one above point to a problem in the MPI library. However, the fact that you get the same error with two different MPI implementations indicates that in this case the problem might actually be in GPAW. Could you provide the input that generates the above error, so that we can investigate the problem further?
>
> Best regards,
> Jussi
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CoPc.cif
Type: application/octet-stream
Size: 4410 bytes
Desc: not available
Url : http://listserv.fysik.dtu.dk/pipermail/gpaw-users/attachments/20110120/35ffd78b/attachment-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pc_gs_rl.py
Type: text/x-python-script
Size: 1736 bytes
Desc: not available
Url : http://listserv.fysik.dtu.dk/pipermail/gpaw-users/attachments/20110120/35ffd78b/attachment-0001.bin