[gpaw-users] Calculation freezes after memory estimate
Marcin Dulak
Marcin.Dulak at fysik.dtu.dk
Wed Jul 4 09:12:56 CEST 2012
On 07/04/12 08:37, Juho Arjoranta wrote:
> Quoting "Marcin Dulak"<Marcin.Dulak at fysik.dtu.dk>:
>
>> On 07/03/12 12:17, Juho Arjoranta wrote:
>>> Quoting "Marcin Dulak"<Marcin.Dulak at fysik.dtu.dk>:
>>>
>>>> On 06/26/12 08:41, Juho Arjoranta wrote:
>>>>> Hello,
>>>>>
>>>>> I have been converging the vacuum around a copper surface with two
>>>>> bottom layers fixed and with 4 and 6 layers of copper everything went
>>>>> fine. Now I have an 8 layer surface and the calculation freezes after
>>>>> the evaluation of memory usage is made. The calculation for the
>>>>> surface was restarted from a bulk calculation where the initial state
>>>>> for the surface was set. The bulk was relaxed to make sure that the
>>>>> structure is really correct.
>>>> Could the freezing be related to scalapack?
>>>> I remember similar problems with scalapack; for example, this hangs:
>>>> https://trac.fysik.dtu.dk/projects/gpaw/browser/trunk/gpaw/test/parallel/scalapack_pdlasrt_hang.py
>>>>
>>>> Marcin
>>> It would seem that the freezing was related to scalapack. I tried to
>>> run the same calculations with the default eigensolver and without
>>> scalapack, and it seems to be working now. To get the density and wave
>>> functions to converge, I also made some modifications to the calculator:
>> Did you get different energies with and without scalapack?
>> Do you know which version of scalapack you use (on our systems
>> scalapack 2.0.1 seems to behave better than 1.8.0)?
>> Can you run these on 4 cores:
>> https://trac.fysik.dtu.dk/projects/gpaw/browser/trunk/gpaw/test/parallel/scalapack_mpirecv_crash.py
>> https://trac.fysik.dtu.dk/projects/gpaw/browser/trunk/gpaw/test/parallel/scalapack_pdlasrt_hang.py
>>
>> Marcin
>>
> I will run some tests to see whether the energies are the same with and
> without scalapack. I'm not sure what version of scalapack is in use,
> but I will find out.
>
> I ran scalapack_mpirecv_crash.py with 4 cores and it crashed just before
> the memory estimate with a good old segmentation fault:
>
> _pmii_daemon(SIGCHLD): [NID 01792] [c7-0c0s0n0] [Tue Jul 3 16:36:39
> 2012] PE 2 exit signal Segmentation fault
> [NID 01792] 2012-07-03 16:36:39 Apid 4555952: initiated application
> termination
> Application 4555952 exit codes: 139
> Application 4555952 resources: utime 0, stime 0
This shows that you use an older scalapack, and most likely some
results with and without scalapack will differ.
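A quick way to compare is to run the same system twice and take the
difference of the total energies. A minimal sketch (the test slab and
the scalapack process grid below are placeholders, not your 8-layer
surface; adjust the fcc111 import to your ASE version):

    from ase.build import fcc111
    from ase.parallel import parprint
    from gpaw import GPAW

    # small placeholder slab, just for the comparison
    slab = fcc111('Cu', size=(1, 1, 4), vacuum=6.0)

    energies = {}
    for label, parallel in [('lapack', {}),
                            ('scalapack', {'sl_default': (2, 2, 32)})]:
        slab.set_calculator(GPAW(kpts=(4, 4, 1), txt=label + '.txt',
                                 parallel=parallel))
        energies[label] = slab.get_potential_energy()

    parprint('difference: %.6f eV'
             % abs(energies['lapack'] - energies['scalapack']))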
>
> The other one, scalapack_pdlasrt_hang.py, has been running overnight
> (17 hours now) and I'm not sure whether it is frozen. How long should
> it take to finish with 4 cores?
This hangs with scalapack 2.0.1; I haven't tried version 2.0.2 yet.
There is a ticket open about it now:
https://trac.fysik.dtu.dk/projects/gpaw/ticket/230
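On 4 cores the test should finish within minutes, so after 17 hours it
is safe to call it hung. If you prefer an automatic verdict over
watching the queue, wrap the run in a wall-clock timeout; a sketch
(needs Python 3.5+ for subprocess.run; the aprun command line is a
guess for a Cray like Louhi, substitute your launcher):

    import subprocess

    cmd = ['aprun', '-n', '4', 'gpaw-python', 'scalapack_pdlasrt_hang.py']
    try:
        # give up after 10 minutes of wall-clock time
        subprocess.run(cmd, timeout=600, check=True)
        print('test finished')
    except subprocess.TimeoutExpired:
        print('test hung: no result within 600 s')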
Marcin
>
> Juho
>
>>> surface, calc = restart('bulk-8-initial.gpw',
>>>                         txt=name + '.txt',
>>>                         xc='PBE',
>>>                         basis='szp(dzp)',
>>>                         mixer=Mixer(beta=0.1, nmaxold=2, weight=100.0),
>>>                         poissonsolver=PoissonSolver(nn=3, relax='GS',
>>>                                                     eps=1e-12),
>>>                         parallel={'domain': (1, 2, 7)},
>>>                         kpts=(8, 8, 1),
>>>                         h=0.12)
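One thing worth double-checking with parallel={'domain': (1, 2, 7)}:
the product of the domain tuple (14 here) must divide the total number
of cores, with the remaining factor going to k-point/band
parallelization. A sanity-check sketch (ncores is a hypothetical PBS
request, not taken from your job):

    # does the domain decomposition fit the core count?
    domain = (1, 2, 7)
    ncores = 56  # hypothetical: whatever you request from PBS
    ndomains = domain[0] * domain[1] * domain[2]
    assert ncores % ndomains == 0, 'cores not divisible by domain count'
    print('factor left for k-point/band parallelization:',
          ncores // ndomains)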
>>>
>>> And for a different (but similar) system I also increased the maximum
>>> number of iterations for the Poisson solver, as explained in this
>>> thread:
>>> https://listserv.fysik.dtu.dk/pipermail/gpaw-users/2010-October/000408.html
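For the archives: the change from that thread amounts to raising the
Poisson solver's iteration limit. A sketch; whether maxiter is a
constructor keyword or has to be set as an attribute differs between
GPAW versions, so treat the keyword as an assumption and check yours:

    from gpaw.poisson import PoissonSolver

    # assumption: 'maxiter' keyword; on older GPAW versions the limit
    # may have to be set as an attribute after construction instead
    poisson = PoissonSolver(nn=3, relax='GS', eps=1e-12, maxiter=2000)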
>>>
>>> Juho
>>>
>>>>> Initially, when I used the same calculator options as for the 6-layer
>>>>> surface, the calculation for 8 layers froze. After that I tried less
>>>>> aggressive mixing, different Poisson solvers, extracting the
>>>>> single-zeta polarized basis set from the double-zeta polarized basis
>>>>> set, and eventually even a different eigensolver.
>>>>>
>>>>> These calculations were run on Louhi, and the last one I tried was this:
>>>>>
>>>>> from ase.parallel import paropen
>>>>> from gpaw import restart, Mixer
>>>>> from gpaw.poisson import PoissonSolver
>>>>> from ase.constraints import FixAtoms
>>>>> from ase.optimize import QuasiNewton
>>>>> from numpy import zeros
>>>>>
>>>>> resultfile = paropen('8-layer-cg-results.txt', 'w')
>>>>>
>>>>> for vac in range(5, 15):
>>>>>
>>>>>     name = '8-layer-cg-%i-vacuum' % vac
>>>>>
>>>>>     k = 8
>>>>>     a = 3.643  # lattice constant from the convergence calculations
>>>>>
>>>>>     surface, calc = restart('bulk-8-initial.gpw',
>>>>>                             txt=name + '.txt',
>>>>>                             eigensolver='cg',
>>>>>                             mixer=Mixer(beta=0.1, nmaxold=2,
>>>>>                                         weight=100.0),
>>>>>                             poissonsolver=PoissonSolver(nn=3, relax='J'),
>>>>>                             basis='szp(dzp)',
>>>>>                             parallel={'sl_default': (5, 1, 64),
>>>>>                                       'domain': (1, 1, 5)},
>>>>>                             kpts=(k, k, 1))
>>>>>
>>>>>     # Fixing of the two bottom layers
>>>>>     fix = 2
>>>>>
>>>>>     positions = surface.get_positions()
>>>>>     number = len(positions)  # number of atoms
>>>>>
>>>>>     array = zeros([number])
>>>>>     for i in range(number):
>>>>>         if surface.positions[i][2] < fix * a / 2:
>>>>>             array[i] = 1
>>>>>     surface.set_tags(array)
>>>>>     c = FixAtoms(mask=[atom.tag == 1 for atom in surface])
>>>>>     surface.set_constraint(c)
>>>>>
>>>>>     surface.pbc = (1, 1, 0)             # set PBC false in z-direction
>>>>>     surface.center(axis=2, vacuum=vac)  # center in z, add vacuum
>>>>>
>>>>>     surface.set_calculator(calc)
>>>>>
>>>>>     # Relaxation of the surface
>>>>>     relax = QuasiNewton(surface, trajectory=name + '.traj')
>>>>>     relax.run(fmax=0.05)
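A side note on the script: the tagging loop can be replaced by a single
boolean mask over the z coordinates; a sketch with the same effect
(untested here, plain numpy/ASE):

    from ase.constraints import FixAtoms

    # True for atoms in the two bottom layers (z below fix * a / 2)
    mask = surface.positions[:, 2] < fix * a / 2
    surface.set_constraint(FixAtoms(mask=mask))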
>>>>>
>>>>> Equivalent code worked just fine with 4 and 6 layers; then, of course,
>>>>> there was no need for a different eigensolver, basis, etc. For some
>>>>> reason the only error message I get is this one, sent by Louhi to my
>>>>> email:
>>>>>
>>>>> PBS Job Id: 1098436.sdb
>>>>> Job Name: 8-layer-cg
>>>>> Post job file processing error; job 1098436.sdb on host nid00143
>>>>>
>>>>>
>>>>> Unable to copy file /var/spool/PBS/spool/1098436.sdb.OU to
>>>>> nid00139://wrk/arjoran/8-layer-cg.o1098436
>>>>> error from copy
>>>>> nid00139: Connection refused
>>>>> end error output
>>>>> Output retained on that host in: /var/spool/PBS/undelivered/1098436.sdb.OU
>>>>>
>>>>> Any ideas how to get that running?
>>>>>
>>>>> Juho Arjoranta
>>>>>
>>>>>
>>>>
>>
>
--
***********************************
Marcin Dulak
Technical University of Denmark
Department of Physics
Building 307, Room 229
DK-2800 Kongens Lyngby
Denmark
Tel.: (+45) 4525 3157
Fax.: (+45) 4593 2399
email: Marcin.Dulak at fysik.dtu.dk
***********************************