[gpaw-users] MPI Error, Fatal Error in PMPI_Comm_dup

Ali Malik malik at mm.tu-darmstadt.de
Mon Aug 5 13:23:47 CEST 2019


Hi,

Nope, it runs into the error before reaching the "calc.set()" statement.
That's why I am not sure of the cause.
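The open-files hypothesis can be tested directly, independently of where calc.set() is called, by logging how many file descriptors the process holds compared with its limit. The sketch below is a minimal, standard-library-only example. It assumes Linux, because it counts the entries in /proc/self/fd. open_fd_report is a hypothetical helper name, not part of GPAW or ASE. In an MPI run it would need to be called on every rank, because each rank is a separate process with its own descriptor table.

```python
import os
import resource


def open_fd_report():
    """Return (open file descriptors, soft limit, hard limit) for this process.

    Hypothetical helper; assumes Linux, where /proc/self/fd lists the
    descriptors currently open in the calling process.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Listing the directory itself briefly uses one descriptor, so the count
    # is consistently one higher than the descriptors held by the program.
    n_open = len(os.listdir("/proc/self/fd"))
    return n_open, soft, hard


if __name__ == "__main__":
    n_open, soft, hard = open_fd_report()
    print("open fds: %d (soft limit %d, hard limit %d)" % (n_open, soft, hard))
```

If the count printed shortly before the crash is far below the soft limit, running out of file descriptors is probably not the explanation.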

Best Regards,

Ali Muhammad Malik

On 05.08.19 12:23, Jens Jørgen Mortensen wrote:
> On 8/1/19 10:38 AM, Ali Malik via gpaw-users wrote:
>> Dear gpaw-users,
>>
>> I have been doing calculations on slabs of many systems, including
>> Cr2AlC, Cr2GaC, etc. Recently, I have been facing an MPI error that
>> seems to occur randomly during the execution of the job. Sometimes
>> the jobs complete, but most of the time I get this MPI error, which
>> terminates the job. I have been unable to identify the root cause of
>> this error.
>>
>> I am running calculations on an HPC cluster using gpaw-1.5.2,
>> intelmpi-2018.4, python-3.7.2 and scalapack-2.0.2. The HPC support desk
>> told me the error is probably due to a bug in gpaw.logger
>> (gpaw/io/logger.py, lines 32-46). Their response:
>>
>>    "when you call calc.set(txt = "...") the old logfile is not closed 
>> properly, only a new one is created. I suspect that you reach the 
>> limit of concurrently open files".
>
> Does it work OK if you comment out the "calc.set(txt=...)" line?
>
> Jens Jørgen
>
>>
>> The input script and error output file are attached. If you need
>> anything else, please feel free to ask. Any help debugging the issue
>> would be highly appreciated. Or should I report this in the bug tracker?
>> Thanks
>>
>> Here is the function *gpaw_optimize* used in the input script, which
>> is just a wrapper:
>>
>> from ase.constraints import StrainFilter, UnitCellFilter
>> from ase.optimize import BFGS, QuasiNewton
>> # The next three aliases are assumed; the original imports were not shown.
>> from ase.optimize import BFGSLineSearch as BFGSLS
>> from ase.optimize.sciopt import SciPyFminBFGS as ScBFGS
>> from ase.optimize.sciopt import SciPyFminCG as CG
>> from ase.parallel import parprint
>>
>>
>> class RelaxationTypeException(ValueError):
>>     """Raised when an unknown relaxation type is requested."""
>>
>>
>> def gpaw_optimize(atoms, calc, relax='', fmax=0.01, relaxalgorithm="BFGS",
>>                   mask=None, attach=False, gpawwrite="", verbose=True,
>>                   **alargs):
>>     """
>>     Wrapper function for relaxation.
>>
>>     :param atoms: ASE Atoms object
>>     :param calc: calculator object
>>     :param relax: str ('full', 'cell' or 'ions'), type of relaxation
>>     :param fmax: float, force convergence criterion
>>     :param relaxalgorithm: str, key of the relaxation algorithm
>>     :param mask: mask passed to UnitCellFilter/StrainFilter
>>     :param attach: bool, default False; True if calc is already attached
>>     :param gpawwrite: str, file for writing the final state ("" = skip)
>>     :param verbose: bool, default True
>>     :return: Atoms object
>>     """
>>     if not attach:
>>         atoms.set_calculator(calc)
>>         if verbose:
>>             parprint("attaching the calculator", flush=True)
>>
>>     if atoms.get_calculator() is None:  # recheck
>>         if verbose:
>>             parprint("The Calculator is not attached", flush=True)
>>         atoms.set_calculator(calc)
>>         if verbose:
>>             parprint("It has been attached", flush=True)
>>
>>     # relaxation algorithms
>>     optimizer_algorithms = {"QuasiNewton": QuasiNewton, "BFGS": BFGS,
>>                             "CG": CG, "ScBFGS": ScBFGS, "BFGSLS": BFGSLS}
>>
>>     if relaxalgorithm not in optimizer_algorithms:
>>         # list the valid names (keys), not the optimizer classes
>>         raise KeyError("%s is invalid or not found.\nThe available "
>>                        "algorithms are: %s"
>>                        % (relaxalgorithm, list(optimizer_algorithms)))
>>
>>     optimizer = optimizer_algorithms[relaxalgorithm]
>>
>>     # TODO: single relax statement outside if.
>>     if relax == 'full':
>>         uf = UnitCellFilter(atoms, mask=mask)
>>         opt = optimizer(uf, logfile="rel-all.log", **alargs)
>>         if verbose:
>>             parprint("Full relaxation", flush=True)
>>     elif relax == 'cell':
>>         cf = StrainFilter(atoms, mask=mask)
>>         opt = optimizer(cf, logfile="rel-cell.log", **alargs)
>>         if verbose:
>>             parprint("Cell relaxation only", flush=True)
>>     elif relax == 'ions':  # ionic relaxation
>>         opt = optimizer(atoms, logfile="rel-ionic.log", **alargs)
>>         if verbose:
>>             parprint("Ions relaxation only", flush=True)
>>     else:
>>         raise RelaxationTypeException(
>>             "The entered relaxation string is incorrect")
>>
>>     opt.run(fmax=fmax)
>>
>>     if gpawwrite:  # last state only
>>         calc.write(gpawwrite, mode="all")
>>
>>     return atoms
>>
>>
>> Best Regards,
>>
>> Ali Muhammad Malik
>>
>> _______________________________________________
>> gpaw-users mailing list
>> gpaw-users at listserv.fysik.dtu.dk
>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>>
>
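The support desk's hypothesis quoted above is that each calc.set(txt=...) call opens a new log file without closing the previous one. The usual way to avoid that kind of accumulation is to close the old handle before opening the new one. The class below is a hypothetical illustration of that pattern only. It is not GPAW's gpaw.io.logger API, and the thread does not confirm how GPAW 1.5.2 actually handles the old file. Note also that in CPython an unreferenced file object is closed when it is garbage-collected, so a real leak would require the old handle to remain referenced somewhere.

```python
import os
import tempfile


class ClosingLog:
    """Hypothetical log holder (not GPAW's API).

    Each set_txt() call closes the previously opened file before opening
    the new one, so repeated calls never accumulate open handles.
    """

    def __init__(self):
        self.fd = None

    def set_txt(self, path):
        self.close()  # release the old handle first
        self.fd = open(path, "w")
        return self.fd

    def close(self):
        # Safe to call repeatedly; closing an already closed file is a no-op.
        if self.fd is not None and not self.fd.closed:
            self.fd.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = ClosingLog()
        handles = [log.set_txt(os.path.join(tmp, "step%d.txt" % i))
                   for i in range(3)]
        print([h.closed for h in handles])  # [True, True, False]
        log.close()
```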

