[gpaw-users] Different MPI worlds ASE vs. GPAW: small fix, big fix or migrating to mpi4py?

Gaël Donval G.Donval at bath.ac.uk
Thu Aug 9 12:31:58 CEST 2018


Hi both,
> On 08/09/2018 05:46 AM, Ask Hjorth Larsen via gpaw-users wrote:
> > Hello,
> > 
> > 2018-08-08 13:27 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
> > > Hi,
> > > > Hi,
> > > > 
> > > > 2018-08-08 9:59 GMT-05:00 Gaël Donval via gpaw-users
> > > > <gpaw-users at listserv.fysik.dtu.dk>:
> > > > > Hi,
> > > > > 
> > > > > The test `generic/hydrogen.py` hangs with the pristine python
> > > > > interpreter with PARALLEL support provided in this MR:
> > > > > 
> > > > >     https://gitlab.com/gpaw/gpaw/merge_requests/403
> > > > > 
> > > > > This is caused by the ASE DB access part of the test:
> > > > > 
> > > > >   * gpaw.mpi.rank gives the right rank.
> > > > >   * ase.parallel.rank always returns 0 ("DummyMPI()" is used
> > > > >     because _gpaw is then not built-in).
> > > > > 
> > > > > I can submit an MR in ASE to fix that last point but I'm
> > > > > really starting to wonder if we shouldn't:
> > > > > 
> > > > >   * at the very least separate mpi.c from _gpaw.c to avoid
> > > > >     having MPI coupling all over the place;
> > > > >   * migrate to mpi4py and keep it well-separated as well.
> > > > 
> > > > What is the motivation for separating MPI support from _gpaw?
> > > 
> > > Maintainability, separation of concerns, making MPI
> > > optional/switchable with other implementations or something else
> > > completely.
> > > 
> > > Maintainability: more in Python, less in C, at negligible runtime
> > > cost.
> > 
> > Only less C if we remove the C stuff though.  Having what we have
> > now *plus* mpi4py means more complexity, even if there are also
> > advantages.  What this means is that the advantages must be
> > absolutely clear, and significant.  Most of this is rather abstract
> > to me though.  It would be nice to see some very concrete
> > advantages relating to typical use.
> > 
> > > Separation of concerns: currently _gpaw is a lib, a python module
> > > and an executable, all working slightly differently from each
> > > other. The "lib" part aggregates the content of hardcoded
> > > dependencies and code written in C. The "module" part makes those
> > > dependencies visible to Python. The "executable" part is an
> > > interpreter that leverages nothing from the "module" part of the
> > > file. Change something in mpi.c and you HAVE to grep things in
> > > the whole folder to adapt the rest of the code to the changes.
> > 
> > Yes, things could be made more beautiful.  I think it is not GPAW's
> > greatest problem though, because most of the things in _gpaw do not
> > interfere with each other.
> > 
> > > Making MPI optional: well, if it is optional, it is simpler to
> > > distribute and use (for instance, Python's concurrent module
> > > could be used to implement a multithreaded _Communicator class).
> > > Not very useful on an HPC cluster (though now with 32+ core
> > > nodes...) but it would be a boon to include GPAW in Linux distros
> > > and still be able to use some degree of parallelism for
> > > post-processing.
> > 
> > Which distros?  In debian it is already parallel, surely?
> > 
> > > Making MPI switchable: multithreading could be used, as
> > > mentioned, but not having to know your MPI implementation at
> > > compile time would let you switch to whatever you want at
> > > runtime. Instead of compiling "gpaw 1.4.0 with openmpi 2" then
> > > "gpaw 1.4.0 with openmpi 3", you compile "gpaw 1.4.0" and plug in
> > > the MPI provider of your choice, which you can swap however you
> > > want.
> > > 
> > > (The same thing could be said and done with fft and blas)
> > 
> > Having flexible threading is nice, but is it all that important?
> > For computations people will use one MPI or the other, and
> > presumably not threading.  Unless we have some bold plans for
> > improving threading within GPAW, in which case there could be some
> > MPI/threading combination.
> > 
> > > 
> > > 
> > > > > From Python's perspective, all the MPI stuff in `_gpaw` does
> > > > > not exist: it is only ever used in `gpaw.mpi`, precisely to
> > > > > provide MPI.  Yet it is everywhere: I'd really like to get
> > > > > rid of it.
> > > > > 
> > > > > What do you think about it?
> > > > 
> > > > Which is the exact thing that you would like to get rid of?
> > > 
> > > The coupling between _gpaw and MPI things.
> > > 
> > > MPI (or any other parallelism provider) can be selected in Python
> > > instead.
> > 
> > If the user does "from ase.parallel import rank" on line 1, then
> > the damage has happened: the rank is what it was and can never be
> > changed.
> > 
> > > > When you start gpaw-python it will immediately initialize all
> > > > the MPI stuff, guaranteeing that there can be no problem - but
> > > > we can only do that because we control the startup sequence in
> > > > gpaw-python.
> > > 
> > > Couldn't you do that when GPAW is loaded instead? I mean, if
> > > mpi.c is converted to a proper module, moduleinit could handle
> > > the same things, couldn't it? I don't know why mpi.c should be
> > > aware of anything in _gpaw.c to initialise itself correctly and
> > > if it doesn't depend on anything in _gpaw, it's better if it
> > > doesn't appear there.
> > > 
> > > > Probably it is a good idea to migrate to mpi4py in the long run
> > > > from a development perspective, but we often get very unpopular
> > > > for adding dependencies, so it is not something that I would
> > > > push for unless there are very good reasons to ditch the old
> > > > framework.
> > > 
> > > I understand that.
> > > 
> > > There is no need to ditch the code at all (though I must admit,
> > > that was totally the plan)! mpi.c-based Python module, mpi4py,
> > > etc. could be considered as mere communicator providers. If
> > > people know what MPI implementation they want to use at compile
> > > time, mpi.c would still work as it is now. But if people/distros
> > > don't want to hardcode such a relationship, it is hard today, not
> > > because of mpi.c but because it is explicitly used in _gpaw.c.
> > > 
> > > 
> > > My present approach is to define a proper Communicator interface
> > > in ASE (completely based on GPAW's one). When ASE is loaded, it
> > > would look for suitable Communicator implementations (mpi4py,
> > > mpi.c, etc) and use whichever is specified or works. By default,
> > > everything should be exactly the same, except that the MPI
> > > initialization occurs later.
> > > 
> > > Does that make sense?
> > 
> > When ASE is imported, it does not know whether the user is later
> > going to import GPAW or ASAP or mpi4py, and hence whether to get a
> > communicator from there.  But a ton of mechanisms need to know
> > whether things will occur in parallel, and the way they are
> > written, they cannot change dynamically.  They could be rewritten
> > to change dynamically, but I think it would be very prone to
> > regressions.
> 
> It would be a bit of work, but doable and worth it in my opinion.
> In the long run, I'd like to get rid of gpaw-python and just have
> _gpaw.so.
> 
> We could start by getting rid of all "from ase.parallel import rank"
> and "from gpaw.mpi import rank" and maybe add a deprecation warning
> (don't know how that can be done).  Then we could create a new
> ase.parallel.world object that can be modified at run time.
> 
> I think switching to mpi4py would be difficult and, as Ask mentioned,
> not so nice for users.  And our C-extension still needs to call MPI
> functions.

Let's forget about the switch then: I get your point.

Ask, I also get that you don't see any compelling reasons to separate
those things, so let's put that on hold for now.

Let's assume I'm restricting myself to having a fully working parallel
_gpaw.so version. Nothing more (I don't plan to touch `gpaw-python` at
all). For that I need a single point of entry to MPI from Python. Why?
Because it is simpler for both users and developers (i.e. a single way
of doing things, a single place to look at, a single place to update)
AND because it would allow us to provide the same guarantees as
`gpaw-python`.

That point of entry could check whether MPI is already initialized and
raise a suitable exception if that's the case: that way, if no
exception is raised, we know we are in control, just like in
`gpaw-python`. The user could still load mpi4py after the fact and
meddle with MPI, but they can do that within `gpaw-python` too...


I get that ASE needs to know whether it's running in parallel even
though it cannot know which MPI provider (GPAW, ASAP, mpi4py, ...)
will be used. There are 3 obvious solutions to that:
 * make a separate MPI communicator subproject that implements the
   required interface (that would be reimplementing mpi4py) or
   alternatively, migrate mpi.c to ase since ase needs to know about
   MPI! (I know this is not really a solution but this is what makes
   sense)
 * try to load a working communicator implementation from well-known
   compiled modules such as gpaw, asap, etc. (as long as the interface
   is identical, it wouldn't change anything...)
 * make a modifiable ase.parallel.world and add a registration
   mechanism for gpaw to declare the existence of its MPI
   implementation (Jens Jørgen's suggestion); see the sketch below.
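
To make that last option concrete, the registration could be as simple
as the following (again just a sketch: none of these names exist in
ASE today, apart from DummyMPI):

    # ase/parallel.py (sketch)
    class _World:
        """Forwards everything to whatever communicator is registered."""
        def __init__(self):
            self._comm = DummyMPI()  # serial fallback: rank 0, size 1

        def register(self, comm):
            self._comm = comm

        def __getattr__(self, name):
            return getattr(self._comm, name)

    world = _World()

    # gpaw/mpi.py (sketch), at import time:
    #     import ase.parallel
    #     ase.parallel.world.register(gpaw_world)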


Assuming I follow the last route, what exactly would pose a problem in
ASE?

The static rank numbers could become a Rank object instead, with
is_master() and is_slave() methods: that would seem to cover ~95% of
the use cases in ASE (from a quick grep).
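
Something along these lines (sketch only; Rank and its methods are not
existing ASE API, and `world` is the forwarding object sketched above):

    class Rank:
        """Late-binding rank: nothing is resolved at import time."""
        def __int__(self):
            return world.rank

        def is_master(self):
            return world.rank == 0

        def is_slave(self):
            return world.rank != 0

    rank = Rank()

    # code like "if rank == 0:" would become "if rank.is_master():"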

There doesn't seem to be any static construct whose construction we
can't postpone until just before the calculation. Actually,
ase.parallel.world could itself be a smart object so that local
`self.world` copies stay up to date.
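
That is what the forwarding object sketched above would give us: a
local copy keeps working because nothing is resolved until an
attribute is actually accessed (hypothetical example):

    class SomeDynamics:
        def __init__(self):
            self.world = world   # stored before GPAW is ever imported

    dyn = SomeDynamics()
    # ... later, importing gpaw registers its communicator ...
    # dyn.world.rank now reflects the real MPI rank, because the lookup
    # goes through the registered communicator at access time.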

Gaël

> 
> Jens Jørgen
> 
> > So I'd say the (MPI-based) parallelism must be completely
> > determined when the program starts, and definitely before any line
> > written by the user is executed.
> > 
> > Best regards
> > Ask
> > 
> > > Gaël
> > > 
> > > > Best regards
> > > > Ask
> > > > 
> > > > > Gaël
> > > > > 
> > 
> 
> 
