[gpaw-users] BSSE parallel issue

Ask Hjorth Larsen asklarsen at gmail.com
Thu Sep 12 15:55:43 CEST 2013


Hello Glen

Yes, I can reproduce the problem on 24 cores.  However, I doubt it is
fundamentally caused by the ghost setups, as they are just special
cases of ordinary setups.  More likely there is some issue with array
shapes changing unexpectedly, which could be triggered by any change
of setups.

On another note, it runs successfully on 8 cores, so you should be
able to run the calculations that way.  8 cores is also a more
reasonable parallelization for a system of this size, although of
course this kind of error should never occur at any core count.
Please report if you see the problem again with other parameters, and
thank you for reporting this.  I will do a bit more testing, but I may
not have time to get to the bottom of it unless I can figure out a
faster way to test it.
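
For instance, something along these lines might already trigger the
hang on many cores (an untested sketch; the smaller slab, 2x2x1
k-point grid and single-point calculations are just guesses to make it
fast):

    from ase.atoms import Atoms
    from ase.lattice.surface import fcc111, add_adsorbate
    from gpaw import GPAW, FermiDirac

    # Small Rh(111) slab with CO on top, LCAO single points only.
    slab = fcc111('Rh', (2, 2, 2), vacuum=6.0)
    co = Atoms('CO', positions=[(0, 0, 0), (0, 0, 1.14)])
    add_adsorbate(slab, co, 1.8, position='ontop')

    calc = GPAW(mode='lcao', basis='dzp', kpts=(2, 2, 1),
                occupations=FermiDirac(0.1), xc='PBE',
                txt='test_full.txt')
    slab.set_calculator(calc)
    slab.get_potential_energy()

    # Switch the metal atoms to ghost setups and recompute; this is
    # the step that appears to hang on more than one node.
    calc.set(setups={'Rh': 'ghost', 'C': 'paw', 'O': 'paw'},
             txt='test_ghost.txt')
    slab.get_potential_energy()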

Regards
Ask

2013/9/11 Glen Jenness <glenjenness at gmail.com>:
> Ask,
> If you take a look at rhodium.py, I first optimize with LBFGS, and that goes
> fine.  Once I hit the part with:
>
> calc.set(setups={'Rh': 'ghost', 'C': 'paw', 'O': 'paw'})
> rhodium.set_calculator(calc)
> e_co = rhodium.get_potential_energy()
> parprint('e_co = %s' % e_co)
>
> That's where I hit trouble (as shown in 3nodes.out).  From 3nodes.txt we see
> that it starts to do a memory estimate... and then nothing.  No ghost atoms
> are set up, nothing.  It will hang there until I kill it; I once had it run
> overnight and it stayed at that point for ~20 hours.
>
> I did what you suggested, and for the ghost atoms I shunted the output to a
> different .txt file.  Here is a tarball of everything.
>
>
> On Sun, Sep 8, 2013 at 6:43 PM, Ask Hjorth Larsen <asklarsen at gmail.com>
> wrote:
>>
>> Hello Glen
>>
>> None of the calculations in 3nodes.txt contain any ghost atoms (try
>> grepping the output files for 'Ghost').  Are you quite sure that the
>> crash happens after the first relaxation is done?  You can try setting
>> a new txt after the relaxation so that the ghost-atom calculation
>> writes its output to a different file.
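>>
>> For example, right before the ghost-atom calculation (a sketch; the
>> filename is arbitrary):
>>
>>     calc.set(txt='rhodium_ghost.txt',
>>              setups={'Rh': 'ghost', 'C': 'paw', 'O': 'paw'})
>>     rhodium.set_calculator(calc)
>>     e_co = rhodium.get_potential_energy()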
>>
>> While the segfault is nasty, I think we should solve one problem at a
>> time, so let's forget ScaLAPACK for now and concentrate on the other
>> issue.
>>
>> Is the problem reproducible across multiple identical runs?
>>
>> Also: Parallelizing over 3 nodes with 5 k-points and nothing more than
>> ~20 atoms is very inefficient.  For a system of this size you should
>> not be using more than one node.  A single node should get you well
>> beyond 200 atoms on most computers, even with 5 k-points (although at
>> 200 atoms you could probably make effective use of approximately as
>> many nodes as there are irreducible-BZ k-points).  But we should keep
>> using 3 nodes for now in order to figure out what the problem is, of
>> course.
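>>
>> As a rough check of how far k-point parallelization can take you, you
>> can ask the calculator for the number of irreducible k-points once a
>> calculation has run (a sketch):
>>
>>     from ase.parallel import parprint
>>     parprint('number of IBZ k-points:', len(calc.get_ibz_k_points()))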
>>
>> Regards
>> Ask
>>
>> 2013/9/8 Glen Jenness <glenjenness at gmail.com>:
>> > Ask,
>> > Sorry it's a bit late (I moved from WI to DE in the past week), but here
>> > is the information you requested.  rhodium.py is the actual script; it's
>> > just CO on a Rh(111) surface with 4 layers.  For 2nodes and 3nodes, I had
>> > PL = dict(), and then did a run with PL = {'sl_auto': True}.  2nodes was a
>> > successful run; 3nodes stalled.  Once it got to that point I let it run
>> > for ~5 hours, and it didn't move.
>> >
>> > Rhodium.out gives the full errors from having sl_auto set to True.
>> >
>> > Thanks!
>> > Glen
>> >
>> >
>> > On Sun, Sep 1, 2013 at 8:39 AM, Ask Hjorth Larsen <asklarsen at gmail.com>
>> > wrote:
>> >>
>> >> Also: please attach full scripts (written so as to demonstrate the
>> >> error) and logfiles so I don't have to guess which parameters to
>> >> change.  For example, I don't know how many CPUs you were using.
>> >>
>> >> Regards
>> >> Ask
>> >>
>> >> 2013/9/1 Ask Hjorth Larsen <asklarsen at gmail.com>:
>> >> > Hello
>> >> >
>> >> > It works for me.
>> >> >
>> >> > Note that 17 atoms is not enough for ScaLAPACK to be a good idea.
>> >> >
>> >> > The first parameter in your mixer should be 0.04, not 0.4.
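>> >> > That is, something like this (only the first parameter changed):
>> >> >
>> >> >     mix = Mixer(beta=0.04, nmaxold=45, weight=50.0)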
>> >> >
>> >> > Best regards
>> >> > Ask
>> >> >
>> >> >
>> >> > 2013/9/1 Glen Jenness <glenjenness at gmail.com>:
>> >> >> Hi GPAW users!
>> >> >> I ran into a curious problem when running GPAW in parallel while
>> >> >> specifying ghost centers for a BSSE correction.
>> >> >>
>> >> >> I can run my dimer system (in this case a CO molecule on a Rh(111)
>> >> >> surface), but when I then specify calc.set(setups={'Rh': 'ghost', ...}),
>> >> >> it enters the memory-estimate part and then freezes if I run on more
>> >> >> than 1 node.
>> >> >>
>> >> >> A colleague suggested setting the parallel option sl_auto to True, but
>> >> >> doing so gives:
>> >> >>
>> >> >> ] [27] gpaw-python(PyObject_Call+0x5d) [0x49128d]
>> >> >> [compute-0-6:29024] [28] gpaw-python(PyEval_EvalFrameEx+0x399d) [0x50dbfd]
>> >> >> [compute-0-6:29024] [29] gpaw-python(PyEval_EvalCodeEx+0x89b) [0x511ffb]
>> >> >> [compute-0-6:29024] *** End of error message ***
>> >> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>> >> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 572
>> >> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> >> >> mpirun noticed that job rank 0 with PID 17481 on node compute-0-9 exited on signal 11 (Segmentation fault).
>> >> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>> >> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 603
>> >> >>
>> >> >>
>> >> >> Any idea what could cause either issue?
>> >> >>
>> >> >> Thanks!
>> >> >>
>> >> >> My Python input is:
>> >> >>
>> >> >> from ase.atoms import Atoms
>> >> >> from ase.lattice.surface import fcc111, add_adsorbate
>> >> >> from ase.constraints import FixAtoms
>> >> >> from ase.optimize.lbfgs import LBFGS
>> >> >> from ase.parallel import parprint
>> >> >>
>> >> >> from gpaw import GPAW, Mixer, FermiDirac
>> >> >>
>> >> >> PL = {'sl_auto': True}
>> >> >>
>> >> >> verb = False
>> >> >> mix = Mixer(beta=0.40, nmaxold=45, weight=50.0)
>> >> >> occ = FermiDirac(0.1)
>> >> >>
>> >> >> calc = GPAW(mode='lcao', basis='dzp', txt='rhodium.txt', kpts=(5,5,1),
>> >> >>             occupations=occ, xc='PBE', verbose=verb, mixer=mix,
>> >> >>             parallel=PL)
>> >> >>
>> >> >> rhodium = fcc111('Rh', (1,1,4), vacuum=8.0)
>> >> >> constraint = FixAtoms([0, 1])
>> >> >> rhodium.set_constraint(constraint)
>> >> >> rhodium *= (2,2,1)
>> >> >>
>> >> >> co = Atoms('CO', positions=[(0,0,0), (0,0,1.14)])
>> >> >> add_adsorbate(rhodium, co, 1.8, position='ontop')
>> >> >>
>> >> >> rhodium.set_calculator(calc)
>> >> >>
>> >> >> opt = LBFGS(rhodium, trajectory='co-rhodium.traj')
>> >> >> opt.run(fmax=0.01)
>> >> >> e_ads = rhodium.get_potential_energy()
>> >> >> parprint('e_ads = %f' % e_ads)
>> >> >>
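>> >> >> # CO energy in the full basis: the Rh atoms keep their basis
>> >> >> # functions but contribute no electrons (ghost setups).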
>> >> >> calc.set(setups={'Rh': 'ghost', 'C': 'paw', 'O': 'paw'})
>> >> >> rhodium.set_calculator(calc)
>> >> >> e_co = rhodium.get_potential_energy()
>> >> >> parprint('e_co = %s' % e_co)
>> >> >>
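>> >> >> # Surface energy in the full basis: now C and O are the ghosts.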
>> >> >> calc.set(setups={'Rh': 'paw', 'C': 'ghost', 'O': 'ghost'})
>> >> >> rhodium.set_calculator(calc)
>> >> >> e_surf = rhodium.get_potential_energy()
>> >> >> parprint('e_surf = %s' % e_surf)
>> >> >>
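>> >> >> # Counterpoise-corrected binding energy: all three energies above
>> >> >> # were evaluated in the same (full) basis set.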
>> >> >> parprint('E_BE = %f' % ( e_ads - e_co - e_surf))
>> >> >>
>> >> >> --
>> >> >> Dr. Glen Jenness
>> >> >> Schmidt Group/Morgan Group
>> >> >> Department of Chemistry/Materials Science and Engineering (MSAE)
>> >> >> University of Wisconsin - Madison
>> >> >>
>> >> >> _______________________________________________
>> >> >> gpaw-users mailing list
>> >> >> gpaw-users at listserv.fysik.dtu.dk
>> >> >> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>> >
>> >
>> >
>> >
>> > --
>> > Dr. Glen Jenness
>> > Schmidt Group/Morgan Group
>> > Department of Chemistry/Materials Science and Engineering (MSAE)
>> > University of Wisconsin - Madison
>
>
>
>
> --
> Dr. Glen Jenness
> Schmidt Group/Morgan Group
> Department of Chemistry/Materials Science and Engineering (MSAE)
> University of Wisconsin - Madison

