[gpaw-users] mpi-problem
Jens Jørgen Mortensen
jensj at fysik.dtu.dk
Thu Jan 20 11:59:54 CET 2011
On Thu, 2011-01-20 at 10:09 +0100, Torsten Hahn wrote:
> Dear Jussi,
>
> I attached a minimal example which shows the problem. In most cases the error occurs right after the first SCF cycle finishes during a structure optimization.
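A structure optimization of this kind runs repeated SCF cycles inside an ASE optimizer loop, so a crash right after the first converged SCF cycle lands roughly where forces are first evaluated and communicated. A purely illustrative stand-in for such a job (not Torsten's actual attached input) has roughly this shape:

    from ase import Atoms
    from ase.optimize import BFGS
    from gpaw import GPAW

    # Illustrative stand-in only -- not the attached input.
    atoms = Atoms('H2O', positions=[(0.76, 0.59, 0.00),   # H
                                    (-0.76, 0.59, 0.00),  # H
                                    (0.00, 0.00, 0.00)])  # O
    atoms.center(vacuum=4.0)  # put the molecule in a box with 4 A of vacuum
    atoms.set_calculator(GPAW(h=0.2, txt='relax.txt'))
    # Each optimizer step triggers a full SCF cycle plus a force evaluation.
    BFGS(atoms, trajectory='relax.traj').run(fmax=0.05)

Run in parallel with something like "mpirun -np N gpaw-python relax.py".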
On how many cores do you run this calculation?
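If it is not obvious from the submit script, the number of MPI tasks can be double-checked from inside the calculation by printing the size of GPAW's world communicator, for example:

    from gpaw.mpi import world

    # Report the MPI layout once, from rank 0 only.
    if world.rank == 0:
        print('Running on %d MPI tasks' % world.size)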
Jens Jørgen
> I use:
>
> gpaw: svn Revision: 7592
> ase: svn Revision: 1953
>
> MPI versions tested:
>
> - Open MPI 1.4.1 / 64-bit / Intel compiler(s)
> - Intel MPI 3.2.2
> - MPICH2 (some recent version, I don't know exactly which)
>
> All show more or less the same errors ...
>
> Best regards,
> Torsten.
>
>
> On 20.01.2011 at 08:50, Jussi Enkovaara wrote:
>
> > On 2011-01-20 09:36, Torsten Hahn wrote:
> >> Dear all,
> >>
> >> using GPAW with "small" jobs in parallel works fine, but running "heavy" jobs always causes the following error:
> >>
> >> =========
> >> [node123:1404] *** An error occurred in MPI_Wait
> >> [node123:1404] *** on communicator MPI COMMUNICATOR 3 CREATE FROM 0
> >> [node123:1404] *** MPI_ERR_TRUNCATE: message truncated
> >> [node123:1404] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> >> --------------------------------------------------------------------------
> >> mpirun has exited due to process rank 24 with PID 1395 on
> >> node node123.cm.cluster exiting without calling "finalize". This may
> >> have caused other processes in the application to be
> >> terminated by signals sent by mpirun (as reported here).
> >> --------------------------------------------------------------------------
> >> =========
> >>
> >> There is always an MPI_ERR_TRUNCATE event. I tried with Intel MPI as well as Open MPI. Does anybody know where this kind of error might come from?
> >
> > Dear Torsten,
> > in most cases I have seen, errors like the one above point to a problem in the MPI library. However, the fact that you get the same error with two different MPI implementations indicates that in this case the problem might actually be in GPAW. Could you provide the input data which generates the above error, so that we can try to investigate the problem further?
> >
> > Best regards,
> > Jussi
>
>
> _______________________________________________
> gpaw-users mailing list
> gpaw-users at listserv.fysik.dtu.dk
> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
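For context on the error itself: MPI_ERR_TRUNCATE means that a posted receive buffer is smaller than the message that matches it. A minimal two-rank sketch using mpi4py (illustrative only; GPAW talks to MPI through its own C extension, not mpi4py) that triggers the same failure inside MPI_Wait:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    if comm.rank == 0:
        # Rank 0 sends 100 doubles to rank 1.
        comm.Send([np.zeros(100), MPI.DOUBLE], dest=1, tag=7)
    elif comm.rank == 1:
        # Rank 1 posts a non-blocking receive for only 10 doubles; the
        # matching message is larger than the buffer, so the completion
        # call reports MPI_ERR_TRUNCATE ("message truncated").
        buf = np.empty(10)
        req = comm.Irecv([buf, MPI.DOUBLE], source=0, tag=7)
        req.Wait()

Run with something like "mpirun -np 2 python truncate.py"; depending on the MPI error handler this shows up either as an abort like the one in the quoted log or as a raised exception. In a real code the same symptom usually means two ranks disagree about how much data is being exchanged, which fits Jussi's point that the bug could be on the GPAW side even though the message comes from the MPI library.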