[gpaw-users] MPI Error, Fatal Error in PMPI_Comm_dup
Jens Jørgen Mortensen
jjmo at dtu.dk
Mon Aug 5 12:23:18 CEST 2019
On 8/1/19 10:38 AM, Ali Malik via gpaw-users wrote:
> Dear gpaw-users,
>
> I have been doing calculations on slabs of many systems including
> Cr2AlC, Cr2GaC etc. Recently, I have been facing an MPI error that
> seems to occur randomly during the execution of the job. Sometimes the
> jobs complete, but most of the time I get this MPI error, which
> terminates the job. I have been unable to identify the root cause of
> this error.
>
> I am running calculations on an HPC cluster, using gpaw-1.5.2,
> intelmpi-2018.4, python-3.7.2, scalapack-2.0.2. The HPC support desk
> told me the error is probably due to a bug in gpaw.logger
> (gpaw/io/logger.py, lines 32-46). Their response:
>
> "when you call calc.set(txt = "...") the old logfile is not closed
> properly, only a new one is created. I suspect that you reach the limit
> of concurrently open files".
Does it work OK if you comment out the "calc.set(txt=...)" line?
Jens Jørgen
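
One way to test the support desk's open-files hypothesis is to watch the number of open file descriptors while the job runs. The following is only a minimal sketch, assuming a Linux node where /proc/self/fd is available; the helper names open_fd_count and fd_limit are made up for illustration and are not part of GPAW or ASE.

```python
import os
import resource


def open_fd_count():
    """Return the number of open file descriptors of this process.

    Assumes Linux (/proc/self/fd); returns None where that is unavailable.
    The directory handle used for listing is included in the count, so
    compare successive values rather than reading them as absolute numbers.
    """
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


def fd_limit():
    """Return the soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Printing open_fd_count() next to fd_limit() before and after each calc.set(txt=...) call in the script would show whether the count grows steadily toward the limit. If it stays flat, the open-files explanation becomes less likely.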
>
> The input script and error output file are attached. If you need anything
> else, please feel free to ask. Any help debugging the issue would be
> highly appreciated. Or should I report this in the bug tracker? Thanks.
>
> Here is the function gpaw_optimize used in the input script, which is
> just a wrapper:
>
> def gpaw_optimize(atoms, calc, relax='', fmax=0.01, relaxalgorithm="BFGS",
>                   mask=None, attach=False, gpawwrite="", verbose=True,
>                   **alargs):
>     """
>     Wrapper function for relaxation.
>
>     :param atoms: ase atom object
>     :param calc: Calculator object
>     :param relax: string ('full', 'cell' or 'ions'), type of relaxation
>     :param fmax: number, force criterion
>     :param relaxalgorithm: relax algorithm
>     :param attach: bool, default False
>     :param verbose: bool, default True
>     :return: atoms object
>     """
>     if not attach:
>         atoms.set_calculator(calc)
>         if verbose:
>             parprint("attaching the calculator", flush=True)
>
>     if atoms.get_calculator() is None:  # recheck
>         if verbose:
>             parprint("The Calculator is not attached", flush=True)
>         atoms.set_calculator(calc)
>         if verbose:
>             parprint("It has been attached", flush=True)
>         attach = True
>
>     # relaxation algorithms
>     optimizer_algorithms = {"QuasiNewton": QuasiNewton, "BFGS": BFGS,
>                             "CG": CG, "ScBFGS": ScBFGS, "BFGSLS": BFGSLS}
>
>     if relaxalgorithm not in optimizer_algorithms:
>         # list the valid names (the dict keys), not the optimizer classes
>         raise KeyError("The %s is invalid or not found.\n The "
>                        "available algorithms are: %s"
>                        % (relaxalgorithm, ", ".join(optimizer_algorithms)))
>
>     # TODO: single relax statement outside if.
>     if relax == 'full':
>         uf = UnitCellFilter(atoms, mask=mask)
>         relax = optimizer_algorithms[relaxalgorithm](
>             uf, logfile="rel-all.log", **alargs)
>         if verbose:
>             parprint("Full relaxation", flush=True)
>
>     elif relax == 'cell':
>         cf = StrainFilter(atoms, mask=mask)
>         relax = optimizer_algorithms[relaxalgorithm](
>             cf, logfile="rel-cell.log", **alargs)
>         if verbose:
>             parprint("Cell relaxation only", flush=True)
>
>     elif relax == 'ions':  # ionic relaxation
>         relax = optimizer_algorithms[relaxalgorithm](
>             atoms, logfile="rel-ionic.log", **alargs)
>         if verbose:
>             parprint("Ions relaxation only", flush=True)
>
>     else:
>         raise RelaxationTypeException(
>             "The entered relaxation string is incorrect")
>
>     relax.run(fmax=fmax)
>
>     if gpawwrite:  # last state only
>         calc.write(gpawwrite, mode="all")
>
>     return atoms
>
>
> Best Regards,
>
> Ali Muhammad Malik