[gpaw-users] Different MPI worlds ASE vs. GPAW: small fix, big fix or migrating to mpi4py?
Gaël Donval
G.Donval at bath.ac.uk
Thu Aug 9 18:01:48 CEST 2018
On Thu, 2018-08-09 at 10:06 -0500, Ask Hjorth Larsen wrote:
> Hello,
>
> 2018-08-09 5:31 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
> > Hi both,
> > > On 08/09/2018 05:46 AM, Ask Hjorth Larsen via gpaw-users wrote:
> > > > Hello,
> > > >
> > > > 2018-08-08 13:27 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
>
> (....)
> > > I think switching to mpi4py would be difficult and, as Ask
> > > mentioned, not so nice for users. And our C-extension still
> > > needs to call MPI functions.
> >
> > Let's forget about the switch then: I get your point.
> >
> > Ask, I also get that you don't see any compelling reasons to
> > separate those things, so let's put that on hold for now.
> >
> > Let's assume I'm restricting myself to having a fully working
> > parallel _gpaw.so version. Nothing more (I don't plan to touch
> > `gpaw-python` at all). For that I need a single point of entry to
> > MPI from Python. Why? Because it is simpler for both users and
> > developers (i.e. single way of doing things, single place to look
> > at, single place to update) AND because that would allow us to
> > provide the same guarantees as `gpaw-python`.
> >
>
> +1, I hope it is not too difficult
So do I...
>
> > That point of entry could check whether MPI is already initialized
> > and raise a suitable exception if that's the case: that way, if no
> > exception is raised, we know we are in control, just like in
> > `gpaw-python`. The user could still load mpi4py after the fact and
> > meddle with MPI, but they can do that within `gpaw-python` too...
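(Expanding on my own point quoted above, here is roughly what I have
in mind. Only a sketch: the real check would be MPI_Initialized()
inside _gpaw.so; mpi4py is used here purely to illustrate the logic,
and `acquire_world` is a made-up name.)

    # Sketch: a single entry point to MPI that fails loudly if
    # someone else initialized MPI first (mpi4py for illustration).
    import mpi4py
    mpi4py.rc.initialize = False  # don't initialize MPI at import

    from mpi4py import MPI

    def acquire_world():
        if MPI.Is_initialized():
            # We are no longer in control, so we cannot provide the
            # gpaw-python guarantees: refuse to continue.
            raise RuntimeError('MPI was already initialized by '
                               'another module.')
        MPI.Init()
        return MPI.COMM_WORLD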
>
> What do you think about having a Python-level 'gpaw' subcommand in
> which we manage our parallelization?
Something along the lines of the following?

    mpiexec python -m gpaw blah.py

A "compatibility" `gpaw-python` script doing just that could also be
provided. If so, I don't see why not.
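Roughly, the subcommand could be implemented like this (a sketch
only; apart from `gpaw.mpi.world`, none of these details are
settled):

    # Hypothetical gpaw/__main__.py: freeze the parallel world
    # before any line of user code runs, then run the user's script.
    import sys
    import runpy

    def main():
        import gpaw.mpi  # importing sets up and freezes the world
        assert gpaw.mpi.world is not None
        script, sys.argv = sys.argv[1], sys.argv[1:]
        runpy.run_path(script, run_name='__main__')

    if __name__ == '__main__':
        main()

The compatibility `gpaw-python` script would then be a one-liner,
`exec python -m gpaw "$@"`, with mpiexec left to the user as before.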
>
> What matters is that we control the preemptive imports and code
> initialization. When we don't, the user may do things in any order
> and it becomes very difficult.
>
> >
> >
> > I get that ASE needs to know whether it's running in parallel but
> > does not know what program it's going to use. There are 3 obvious
> > solutions to that:
> >   * make a separate MPI communicator subproject that implements
> >     the required interface (that would be reimplementing mpi4py)
> >     or alternatively, migrate mpi.c to ase since ase needs to know
> >     about MPI! (I know this is not really a solution but this is
> >     what makes sense)
> >   * try to load a working communicator implementation from
> >     well-known compiled modules such as gpaw, asap, etc. (as long
> >     as the interface is identical, it wouldn't change anything...)
>
> I prefer/suggest that the program (gpaw, asap) knows what it wants
> and tells ASE what it knows about the runtime. It is more explicit.
I agree. How would you handle scripts using both GPAW and ASAP?
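For what it's worth, this is the failure mode I would want any
registration mechanism to catch. A made-up sketch (none of these
names exist in ase.parallel today):

    # Each compiled package hands ASE its world once; a second,
    # different registration is an error instead of silent confusion.
    _world = None

    def register_world(world):
        global _world
        if _world is None:
            _world = world
        elif _world is not world:
            raise RuntimeError('Conflicting MPI worlds registered, '
                               'e.g. GPAW and ASAP both claiming '
                               'ase.parallel.world.')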
>
> >   * Make a modifiable ase.parallel.world and add a registration
> >     mechanism for gpaw to declare the existence of its MPI
> >     implementation (Jens Jørgen's suggestion).
> >
> >
> > Assuming I follow the last route, what exactly would pose a
> > problem in ASE?
> >
> > The static rank numbers could become a Rank object instead, with
> > is_master() and is_slave() methods: that would seem to solve ~95%
> > of the use cases in ASE (from a quick grep).
> >
> > There doesn't seem to be any static construct whose construction
> > we can't postpone until just before the calculation. Actually,
> > ase.parallel.world could itself be a smart object so that local
> > `self.world` copies stay up to date.
>
> We can capture all we need at a single point: startup. This is
> fundamentally simpler than needing things to initialize correctly at
> different points in the code. The user can always somehow resolve a
> rank into an integer or boolean, which could become out-of-sync if
> parallel initialization happens afterwards.
I agree: it could be done in gpaw.__main__ then, guaranteeing a frozen
world from the very start. It's also very explicit and versatile: if
you want to do something else, then don't use `python -m gpaw`... I
quite like that approach.
I need to think about how to implement it. If you also have ideas
about how to get rid of ranks altogether, I'm all for it.
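For reference, the Rank object and the "smart" world from my quoted
text above could be as small as this (a sketch, nothing settled):

    # A Rank object lets code ask questions instead of comparing
    # integers; a proxy world keeps stale `self.world` copies in
    # sync by forwarding to whatever world is currently registered.
    _current_world = None  # set once, at startup, by the entry point

    class Rank:
        def __init__(self, world):
            self._world = world

        def is_master(self):
            return self._world.rank == 0

        def is_slave(self):
            return not self.is_master()

    class WorldProxy:
        def __getattr__(self, name):
            if _current_world is None:
                raise RuntimeError('Parallel world not set up yet.')
            return getattr(_current_world, name)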
>
> I have fought circular communicator imports and speculative attempts
> at initializing things on a few different occasions. That's why I am
> so much in favour of initializing at startup level and ensuring a
> layer (gpaw-python or gpaw subcommand) that *we* control.
I'll work with that in mind.
Thanks both for your input.
Gaël
>
> Best regards
> Ask
>
> >
> > Gaël
> >
> > >
> > > Jens Jørgen
> > >
> > > > So I'd say the (MPI-based) parallelism must be completely
> > > > determined when the program starts, and definitely before any
> > > > line written by the user is executed.
> > > >
> > > > Best regards
> > > > Ask
> > > >
> > > > > Gaël
> > > > >
> > > > > > Best regards
> > > > > > Ask
> > > > > >
> > > > > > > Gaël
> > > > > > >