[gpaw-users] Different MPI worlds ASE vs. GPAW: small fix, big fix or migrating to mpi4py?
Ask Hjorth Larsen
asklarsen at gmail.com
Thu Aug 9 05:46:17 CEST 2018
Hello,
2018-08-08 13:27 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
> Hi,
>> Hi,
>>
>> 2018-08-08 9:59 GMT-05:00 Gaël Donval via gpaw-users
>> <gpaw-users at listserv.fysik.dtu.dk>:
>> > Hi,
>> >
>> > The test `generic/hydrogen.py` hangs with the pristine python
>> > interpreter with PARALLEL support provided in this MR:
>> >
>> > https://gitlab.com/gpaw/gpaw/merge_requests/403
>> >
>> > This is caused by the ASE DB access part of the test:
>> >
>> > * gpaw.mpi.rank gives the right rank.
>> > * ase.parallel.rank always returns 0 ("DummyMPI()" is used because
>> > _gpaw is then not built-in).
>> >
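The mismatch described above can be reproduced in miniature. The sketch below is illustrative, not ASE's actual source: the key point is that a module-level try/except falls back to a serial dummy communicator whenever the `_gpaw` extension is not importable, so `rank` silently reads 0 (the `Communicator()` constructor call is a hypothetical placeholder).

```python
# Illustrative sketch of the fallback that causes the mismatch:
# try the _gpaw extension, and silently use a serial dummy otherwise.
class DummyMPI:
    """Serial stand-in: always reports rank 0, size 1."""
    rank = 0
    size = 1

try:
    import _gpaw                   # only available in parallel builds
    world = _gpaw.Communicator()   # hypothetical constructor name
except ImportError:
    world = DummyMPI()             # plain python3: rank is always 0

print(world.rank)
```

Under a pristine interpreter without the extension, every process takes the `ImportError` branch, which is exactly why `ase.parallel.rank` disagrees with `gpaw.mpi.rank` in the hanging test.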
>> > I can submit an MR in ASE to fix that last point but I'm really
>> > starting to wonder if we shouldn't:
>> >
>> > * at the very least separate mpi.c from _gpaw.c to avoid having
>> > MPI coupling all over the place;
>> > * migrate to mpi4py and keep it well-separated as well.
>>
>> What is the motivation for separating MPI support from _gpaw?
>
> Maintainability, separation of concerns, making MPI optional/switchable with other implementations, or something else entirely.
>
> Maintainability: more in Python, less in C, at negligible runtime cost.
There is only less C if we actually remove the C code, though. Keeping
what we have now *plus* mpi4py means more complexity, even if there
are also advantages. So the advantages must be absolutely clear, and
significant. Most of this is rather abstract to me, though; it would
be nice to see some very concrete advantages relating to typical use.
>
> Separation of concerns: currently _gpaw is a lib, a Python module and an executable, all working slightly differently from each other. The "lib" part aggregates the content of hardcoded dependencies and code written in C. The "module" part makes those dependencies visible to Python. The "executable" part is an interpreter that leverages nothing from the "module" part of the file. Change something in mpi.c and you HAVE to grep through the whole source tree to adapt the rest of the code to the change.
Yes, things could be made more beautiful. I think it is not GPAW's
greatest problem though, because most of the things in _gpaw do not
interfere with each other.
>
> Making MPI optional: well, if it is optional, it is simpler to distribute and use (for instance, Python's concurrent module could be used to implement a multithreaded _Communicator class). Not very useful on an HPC cluster (though now with 32+ core nodes...) but it would be a boon to include GPAW in Linux distros and still be able to use some degree of parallelism for post-processing.
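A thread-backed communicator of the kind suggested above could be sketched with nothing but the standard library. `ThreadComm` and its `sum()` method are hypothetical names for illustration; GPAW's real communicators expose a richer interface, and this one-shot collective skips the barrier-reset bookkeeping a reusable version would need.

```python
# Sketch: a communicator whose "ranks" are threads in one process.
import threading

class ThreadComm:
    def __init__(self, size):
        self.size = size
        self._barrier = threading.Barrier(size)
        self._lock = threading.Lock()
        self._acc = 0.0

    def sum(self, value):
        """Allreduce-style sum: every rank contributes its value and
        every rank gets the total back (single-use, for brevity)."""
        with self._lock:
            self._acc += value
        self._barrier.wait()   # all ranks have contributed
        return self._acc       # safe: no one writes after the barrier

comm = ThreadComm(4)
results = [None] * 4

def work(rank):
    results[rank] = comm.sum(rank + 1)   # ranks contribute 1, 2, 3, 4

threads = [threading.Thread(target=work, args=(r,)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)   # every rank sees the total: [10.0, 10.0, 10.0, 10.0]
```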
Which distros? In debian it is already parallel, surely?
>
> Making MPI switchable: multithreading could be used, as mentioned, but not having to know your MPI implementation at compile time would let you switch to whatever you want at runtime. Instead of compiling "gpaw 1.4.0 with openmpi 2" then "gpaw 1.4.0 with openmpi 3", you compile "gpaw 1.4.0" and plug in the MPI provider of your choice, swappable however you want.
>
> (The same thing could be said and done with fft and blas)
Having flexible threading is nice, but is it all that important? For
computations people will use one MPI or the other, and presumably not
threading. Unless we have some bold plans for improving threading
within GPAW, in which case there could be some MPI/threading
combination.
>
>
>
>>
>> >
>> > From Python's perspective, all the MPI stuff in `_gpaw` does not
>> > exist:
>> > it is only ever used in `gpaw.mpi` precisely to provide MPI. Yet it
>> > is
>> > everywhere: I'd really like to get rid of it.
>> >
>> > What do you think about it?
>>
>> Which is the exact thing that you would like to get rid of?
>
> The coupling between _gpaw and MPI things.
>
> MPI (or any other parallelism provider) can be selected in Python instead.
If the user does "from ase.parallel import rank" in line 1, then the
damage is already done: the rank is whatever it was and can never change.
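This freezing effect is ordinary Python semantics: `from module import name` copies the binding once, so later updates to the module attribute are invisible to the copy. A self-contained demonstration, using a made-up module name so it runs anywhere:

```python
# Demonstration of why an early "from ... import rank" freezes the value.
import sys
import types

fake = types.ModuleType('fake_parallel')
fake.rank = 0                       # value at import time: no MPI yet
sys.modules['fake_parallel'] = fake

from fake_parallel import rank      # the user's "line 1"

fake.rank = 3                       # MPI initialised later, rank updated
print(rank, fake.rank)              # the imported copy is still 0
```

Code that instead does `import ase.parallel` and reads `ase.parallel.rank` at call time would see the update, but as noted, a lot of existing code binds the value early.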
>
>> When you start gpaw-python it will immediately initialize all the MPI
>> stuff, guaranteeing that there can be no problem - but we can only do
>> that because we control the startup sequence in gpaw-python.
>
> Couldn't you do that when GPAW is loaded instead? I mean, if mpi.c is converted to a proper module, its moduleinit could handle the same things, couldn't it? I don't know why mpi.c should be aware of anything in _gpaw.c to initialise itself correctly, and if it doesn't depend on anything in _gpaw, it's better if it doesn't appear there.
>
>>
>> Probably it is a good idea to migrate to mpi4py in the long run from
>> a
>> development perspective, but we often get very unpopular for adding
>> dependencies, so it is not something that I would push for unless
>> there are very good reasons to ditch the old framework.
>
> I understand that.
>
> There is no need to ditch the code at all (though I must admit, that was totally the plan)! The mpi.c-based Python module, mpi4py, etc. could be considered as mere communicator providers. If people know what MPI implementation they want to use at compile time, mpi.c would still work as it does now. But if people/distros don't want to hardcode such a relationship, it is hard today, not because of mpi.c but because it is explicitly used in _gpaw.c.
>
>
> My present approach is to define a proper Communicator interface in ASE (based entirely on GPAW's). When ASE is loaded, it would look for suitable Communicator implementations (mpi4py, mpi.c, etc.) and use whichever is specified or works. By default, everything should be exactly the same, except that the MPI initialization occurs later.
>
> Does that make sense?
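The lookup being proposed might look roughly like the sketch below. The backend module names are placeholders, and the convention that each backend exposes a `world` object is an assumption, not an existing ASE or GPAW API.

```python
# Sketch of a communicator-provider lookup with a serial fallback.
import importlib

class SerialComm:
    """Fallback communicator when no parallel backend imports."""
    rank = 0
    size = 1

def get_communicator(candidates=('demo_mpi4py_backend',
                                 'demo_gpaw_backend')):
    # Try each candidate provider in order; each is assumed to expose
    # a `world` communicator object. The first importable one wins.
    for name in candidates:
        try:
            backend = importlib.import_module(name)
        except ImportError:
            continue
        return backend.world
    return SerialComm()

world = get_communicator()
print(world.rank)   # no demo backend exists here: serial fallback, 0
```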
When ASE is imported, it does not know whether the user is later going
to import GPAW or ASAP or mpi4py, and hence whether to get a
communicator from there. But a ton of mechanisms need to know whether
things will occur in parallel, and the way they are written, they
cannot change dynamically. They could be rewritten to change
dynamically, but I think it would be very prone to regressions.
So I'd say the (MPI-based) parallelism must be completely determined
when the program starts, and definitely before any line written by the
user is executed.
Best regards
Ask
>
> Gaël
>
>>
>> Best regards
>> Ask
>>
>> >
>> > Gaël
>> >
>> > _______________________________________________
>> > gpaw-users mailing list
>> > gpaw-users at listserv.fysik.dtu.dk
>> > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>