[gpaw-users] Calculation freezes after memory estimate
Juho Arjoranta
juho.arjoranta at helsinki.fi
Wed Jul 4 08:37:15 CEST 2012
Lainaus "Marcin Dulak" <Marcin.Dulak at fysik.dtu.dk>:
> On 07/03/12 12:17, Juho Arjoranta wrote:
>> Quoting "Marcin Dulak" <Marcin.Dulak at fysik.dtu.dk>:
>>
>>> On 06/26/12 08:41, Juho Arjoranta wrote:
>>>> Hello,
>>>>
>>>> I have been converging the vacuum around a copper surface with the
>>>> two bottom layers fixed, and with 4 and 6 layers of copper everything
>>>> went fine. Now I have an 8-layer surface, and the calculation freezes
>>>> after the memory-usage estimate is printed. The surface calculation
>>>> was restarted from a bulk calculation that set the initial state for
>>>> the surface. The bulk was relaxed to make sure that the structure is
>>>> really correct.
>>> Could the freezing be related to scalapack?
>>> I remember similar problems with scalapack on, for example
>>> this hangs:
>>> https://trac.fysik.dtu.dk/projects/gpaw/browser/trunk/gpaw/test/parallel/scalapack_pdlasrt_hang.py
>>>
>>> Marcin
>> It would seem that the freezing was related to scalapack. I tried to
>> run the same calculations with the default eigensolver and without
>> scalapack and it seems to be working now. To get the density and wave
>> functions to converge I also made some modifications to the calculator:
> Did you get different energies with and without scalapack?
> Do you know which version of scalapack you use? (On our systems,
> scalapack version 2.0.1 seems to behave better than 1.8.0.)
> Can you run, on 4 cores:
> https://trac.fysik.dtu.dk/projects/gpaw/browser/trunk/gpaw/test/parallel/scalapack_mpirecv_crash.py
> https://trac.fysik.dtu.dk/projects/gpaw/browser/trunk/gpaw/test/parallel/scalapack_pdlasrt_hang.py
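> For example (the exact launcher depends on the system; gpaw-python is
> GPAW's MPI-enabled Python interpreter):
>
> mpirun -np 4 gpaw-python scalapack_mpirecv_crash.py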
>
> Marcin
>
I will run some tests to see whether the energies are the same with and
without scalapack. I'm not sure which version of scalapack is in use,
but I will find out.
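For the energy comparison I have something like this minimal sketch in
mind (the restart file name, the txt names and the sl_default grid
below are just placeholders from my runs, not recommendations):

from ase.parallel import parprint
from gpaw import restart

# Identical restarts, one without and one with scalapack.
for tag, parallel in [('no-scalapack', {'domain': (1, 1, 4)}),
                      ('scalapack', {'domain': (1, 1, 4),
                                     'sl_default': (2, 2, 32)})]:
    surface, calc = restart('bulk-8-initial.gpw',
                            txt='sl-check-%s.txt' % tag,
                            parallel=parallel,
                            kpts=(8, 8, 1))
    # get_potential_energy() triggers the SCF cycle
    parprint(tag, surface.get_potential_energy())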
I ran scalapack_mpirecv_crash.py with 4 cores, and it crashed just
before the memory estimate with a good old segmentation fault:
_pmii_daemon(SIGCHLD): [NID 01792] [c7-0c0s0n0] [Tue Jul 3 16:36:39
2012] PE 2 exit signal Segmentation fault
[NID 01792] 2012-07-03 16:36:39 Apid 4555952: initiated application
termination
Application 4555952 exit codes: 139
Application 4555952 resources: utime 0, stime 0
The other one, scalapack_pdlasrt_hang.py, has been running overnight
(17 hours now), and I'm not sure whether it is frozen. How long should
it take to finish with 4 cores?
Juho
>
>>
>> surface, calc = restart('bulk-8-initial.gpw',
>>                         txt=name + '.txt',
>>                         xc='PBE',
>>                         basis='szp(dzp)',
>>                         mixer=Mixer(beta=0.1, nmaxold=2, weight=100.0),
>>                         poissonsolver=PoissonSolver(nn=3, relax='GS',
>>                                                     eps=1e-12),
>>                         parallel={'domain': (1, 2, 7)},
>>                         kpts=(8, 8, 1),
>>                         h=0.12)
>>
>> And for a different (but similar) system I also increased the maximum
>> number of iterations for the Poisson solver, as explained in this
>> thread:
>> https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2010-October/000408.html
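>> Roughly, the idea was this (whether maxiter is a constructor argument
>> or a plain attribute depends on the GPAW version, so treat the name
>> as an assumption based on that thread):
>>
>> poisson = PoissonSolver(nn=3, relax='GS', eps=1e-12)
>> poisson.maxiter = 2000  # assumed attribute; the default was too low
>> # ... then pass poissonsolver=poisson to restart()/GPAW()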
>>
>> Juho
>>
>>>> Initially, when I used the same calculator options as for the 6-layer
>>>> surface, the calculation for 8 layers froze. After that I have tried
>>>> less aggressive mixing, different Poisson solvers, extracting the
>>>> single-zeta polarized basis set from the double-zeta polarized basis
>>>> set, and eventually even a different eigensolver.
>>>>
>>>> These calculations were run on Louhi, and the last one I tried was this:
>>>>
>>>> from ase.parallel import paropen
>>>> from gpaw import restart, Mixer
>>>> from gpaw.poisson import PoissonSolver
>>>> from ase.constraints import FixAtoms
>>>> from ase.optimize import QuasiNewton
>>>> from numpy import zeros
>>>>
>>>> resultfile = paropen('8-layer-cg-results.txt', 'w')
>>>>
>>>> for vac in range(5, 15):
>>>>
>>>>     name = '8-layer-cg-%.i-vacuum' % vac
>>>>
>>>>     k = 8
>>>>     a = 3.643  # lattice constant from the convergence calculations
>>>>
>>>>     surface, calc = restart('bulk-8-initial.gpw',
>>>>                             txt=name + '.txt',
>>>>                             eigensolver='cg',
>>>>                             mixer=Mixer(beta=0.1, nmaxold=2, weight=100.0),
>>>>                             poissonsolver=PoissonSolver(nn=3, relax='J'),
>>>>                             basis='szp(dzp)',
>>>>                             parallel={'sl_default': (5, 1, 64),
>>>>                                       'domain': (1, 1, 5)},
>>>>                             kpts=(k, k, 1))
>>>>
>>>>     # Tag the atoms in the two bottom layers and fix them
>>>>
>>>>     fix = 2
>>>>
>>>>     number = len(surface)  # number of atoms
>>>>
>>>>     array = zeros(number)
>>>>     for i in range(number):
>>>>         if surface.positions[i][2] < fix * a / 2:
>>>>             array[i] = 1
>>>>     surface.set_tags(array)
>>>>     c = FixAtoms(mask=[atom.tag == 1 for atom in surface])
>>>>     surface.set_constraint(c)
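>>>>     # (Aside: the tags are not needed for the constraint itself; an
>>>>     # equivalent sketch using a plain NumPy boolean mask would be
>>>>     #     FixAtoms(mask=surface.positions[:, 2] < fix * a / 2)
>>>>     # with the usual ASE/NumPy behaviour assumed.)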
>>>>
>>>>     surface.pbc = (1, 1, 0)  # no periodicity in the z-direction
>>>>     surface.center(axis=2, vacuum=vac)  # center in z and add the vacuum
>>>>
>>>>     surface.set_calculator(calc)
>>>>
>>>>     # Relax the surface
>>>>
>>>>     relax = QuasiNewton(surface, trajectory=name + '.traj')
>>>>     relax.run(fmax=0.05)
>>>>
>>>> Equivalent code worked just fine with 4 and 6 layers; then, of
>>>> course, there was no need for a different eigensolver, basis, etc.
>>>> For some reason the only error message I get is this one, which
>>>> Louhi sends to my email:
>>>>
>>>> PBS Job Id: 1098436.sdb
>>>> Job Name: 8-layer-cg
>>>> Post job file processing error; job 1098436.sdb on host nid00143
>>>>
>>>>
>>>> Unable to copy file /var/spool/PBS/spool/1098436.sdb.OU to
>>>> nid00139://wrk/arjoran/8-layer-cg.o1098436
>>>> error from copy
>>>> nid00139: Connection refused
>>>> end error output
>>>> Output retained on that host in: /var/spool/PBS/undelivered/1098436.sdb.OU
>>>>
>>>> Any ideas how to get that running?
>>>>
>>>> Juho Arjoranta
>>>>
>>>>
>>>
>>>
>>
>
>
> --
> ***********************************
>
> Marcin Dulak
> Technical University of Denmark
> Department of Physics
> Building 307, Room 229
> DK-2800 Kongens Lyngby
> Denmark
> Tel.: (+45) 4525 3157
> Fax.: (+45) 4593 2399
> email: Marcin.Dulak at fysik.dtu.dk
>
> ***********************************
>
>