[gpaw-users] BSSE parallel issue
Ask Hjorth Larsen
asklarsen at gmail.com
Mon Sep 9 01:43:07 CEST 2013
Hello Glen
None of the calculations in 3nodes.txt contain any ghost atoms (try
grepping the output files for 'Ghost'). Are you quite sure that the
crash happens after the first relaxation is done? You can try setting a
new txt file once the relaxation finishes, so that any subsequent output
goes into a separate file.
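For example, something along these lines at the end of your script (just
a sketch based on rhodium.py; the new filename is arbitrary):

opt = LBFGS(rhodium, trajectory='co-rhodium.traj')
opt.run(fmax=0.01)
e_ads = rhodium.get_potential_energy()

# Send everything after the relaxation to a separate log file, so it is
# easy to see whether the run gets past this point.
calc.set(txt='rhodium-ghost.txt')

calc.set(setups={'Rh': 'ghost', 'C': 'paw', 'O': 'paw'})
rhodium.set_calculator(calc)
e_co = rhodium.get_potential_energy()
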
While the segfault is nasty, I think we should solve one problem at a
time, so let's forget ScaLAPACK for now and concentrate on the other
issue.
Is the problem reproducible across multiple identical runs?
Also: Parallelizing over 3 nodes with 5 k-points and nothing more than
~20 atoms is very inefficient. For a system of this size you should
not be using more than one node. A single node should get you well
beyond 200 atoms on most computers, even with 5 k-points (although at
200 atoms you could probably make effective use of approximately as
many nodes as there are irreducible-BZ k-points). But we should keep
using 3 nodes for now in order to figure out what the problem is, of
course.
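When you do go back to one node, something like this should be plenty (a
rough sketch only; I am assuming an 8-core node here, and simply leaving
the parallel keyword out so GPAW picks its own defaults is also fine):

# Plain k-point parallelization on a single node; no ScaLAPACK needed at
# this system size.  'kpt' is the number of k-point groups, and the total
# number of cores should be divisible by it (8 cores assumed here).
PL = {'kpt': 8, 'sl_auto': False}

calc = GPAW(mode='lcao', basis='dzp', txt='rhodium.txt', kpts=(5,5,1),
            occupations=occ, xc='PBE', verbose=verb, mixer=mix, parallel=PL)
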
Regards
Ask
2013/9/8 Glen Jenness <glenjenness at gmail.com>:
> Ask,
> Sorry it's a bit late (I moved from Wi to De in the past week), but here is
> the information you requested. rhodium.py is the actual script --- it's
> just CO on a Rh (111) surface with 4 layers. For 2nodes and 3nodes, I had
> PL = dict(), and then did a run with PL = {'sl_auto': True}. 2nodes was a
> successful run; 3nodes stalled --- once it got to that point, I let it run
> for ~5 hours and it didn't move.
>
> Rhodium.out gives the full errors from having sl_auto set to True.
>
> Thanks!
> Glen
>
>
> On Sun, Sep 1, 2013 at 8:39 AM, Ask Hjorth Larsen <asklarsen at gmail.com>
> wrote:
>>
>> Also: Please attach full scripts (written so as to demonstrate the
>> error) and logfiles so I don't have to guess which parameters to
>> change. For example, I don't know how many CPUs you were using.
>>
>> Regards
>> Ask
>>
>> 2013/9/1 Ask Hjorth Larsen <asklarsen at gmail.com>:
>> > Hello
>> >
>> > It works for me.
>> >
>> > Note that 17 atoms is not enough for ScaLAPACK to be a good idea.
>> >
>> > The first parameter in your mixer should be 0.04, not 0.4.
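>> > That is, with the other mixer settings unchanged:
>> >
>> > mix = Mixer(beta=0.04, nmaxold=45, weight=50.0)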
>> >
>> > Best regards
>> > Ask
>> >
>> >
>> > 2013/9/1 Glen Jenness <glenjenness at gmail.com>:
>> >> Hi GPAW users!
>> >> I ran into a curious problem when running GPAW in parallel while
>> >> specifying
>> >> ghost centers for a BSSE correction.
>> >>
>> >> I am able to run my dimer system (in this case a CO molecule on a
>> >> Rh (111) surface), but when I then specify calc.set(setups={'Rh': 'ghost'},
>> >> etc.), it enters the memory estimate part and then freezes if I run
>> >> on more than 1 node.
>> >>
>> >> A colleague suggested setting the parallel option sl_auto to True, but
>> >> doing
>> >> so gives:
>> >>
>> >> ] [27] gpaw-python(PyObject_Call+0x5d) [0x49128d]
>> >> [compute-0-6:29024] [28] gpaw-python(PyEval_EvalFrameEx+0x399d) [0x50dbfd]
>> >> [compute-0-6:29024] [29] gpaw-python(PyEval_EvalCodeEx+0x89b) [0x511ffb]
>> >> [compute-0-6:29024] *** End of error message ***
>> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
>> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 572
>> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
>> >> mpirun noticed that job rank 0 with PID 17481 on node compute-0-9 exited on signal 11 (Segmentation fault).
>> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
>> >> [compute-0-9.local:17479] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_tm_module.c at line 603
>> >>
>> >>
>> >> Any idea what could cause either issue?
>> >>
>> >> Thanks!
>> >>
>> >> My python input is:
>> >>
>> >> from ase.atoms import Atoms
>> >> from ase.lattice.surface import fcc111, add_adsorbate
>> >> from ase.constraints import FixAtoms
>> >> from ase.optimize.lbfgs import LBFGS
>> >> from ase.parallel import parprint
>> >>
>> >> from gpaw import GPAW, Mixer, FermiDirac
>> >>
>> >> PL = {'sl_auto': True}
>> >>
>> >> verb = False
>> >> mix = Mixer(beta=0.40, nmaxold=45, weight=50.0)
>> >> occ = FermiDirac(0.1)
>> >>
>> >> calc = GPAW(mode='lcao', basis='dzp', txt='rhodium.txt', kpts=(5,5,1),
>> >>             occupations=occ, xc='PBE', verbose=verb, mixer=mix, parallel=PL)
>> >>
>> >> rhodium = fcc111('Rh', (1,1,4), vacuum=8.0)
>> >> constraint = FixAtoms([0, 1])
>> >> rhodium.set_constraint(constraint)
>> >> rhodium *= (2,2,1)
>> >>
>> >> co = Atoms('CO', positions=[(0,0,0), (0,0,1.14)])
>> >> add_adsorbate(rhodium, co, 1.8, position='ontop')
>> >>
>> >> rhodium.set_calculator(calc)
>> >>
>> >> opt = LBFGS(rhodium, trajectory='co-rhodium.traj')
>> >> opt.run(fmax=0.01)
>> >> e_ads = rhodium.get_potential_energy()
>> >> parprint('e_ads = %f' % e_ads)
>> >>
>> >> calc.set(setups={'Rh': 'ghost', 'C': 'paw', 'O': 'paw'})
>> >> rhodium.set_calculator(calc)
>> >> e_co = rhodium.get_potential_energy()
>> >> parprint('e_co = %s' % e_co)
>> >>
>> >> calc.set(setups={'Rh': 'paw', 'C': 'ghost', 'O': 'ghost'})
>> >> rhodium.set_calculator(calc)
>> >> e_surf = rhodium.get_potential_energy()
>> >> parprint('e_surf = %s' % e_surf)
>> >>
>> >> parprint('E_BE = %f' % ( e_ads - e_co - e_surf))
>> >>
>> >> --
>> >> Dr. Glen Jenness
>> >> Schmidt Group/Morgan Group
>> >> Department of Chemistry/Materials Science and Engineering (MSAE)
>> >> University of Wisconsin - Madison
>> >>
>> >> _______________________________________________
>> >> gpaw-users mailing list
>> >> gpaw-users at listserv.fysik.dtu.dk
>> >> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>
>
>
>
> --
> Dr. Glen Jenness
> Schmidt Group/Morgan Group
> Department of Chemistry/Materials Science and Engineering (MSAE)
> University of Wisconsin - Madison