[gpaw-users] Different MPI worlds ASE vs. GPAW: small fix, big fix or migrating to mpi4py?

Thu Aug 9 09:56:29 CEST 2018

On 08/09/2018 05:46 AM, Ask Hjorth Larsen via gpaw-users wrote:
> Hello,
>
> 2018-08-08 13:27 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
>> Hi,
>>> Hi,
>>>
>>> 2018-08-08 9:59 GMT-05:00 Gaël Donval via gpaw-users
>>> <gpaw-users at listserv.fysik.dtu.dk>:
>>>> Hi,
>>>>
>>>> The test `generic/hydrogen.py` hangs when run with the pristine Python
>>>> interpreter and the PARALLEL support provided in this MR:
>>>>
>>>>     https://gitlab.com/gpaw/gpaw/merge_requests/403
>>>>
>>>> This is caused by the ASE DB access part of the test:
>>>>
>>>>   * gpaw.mpi.rank gives the right rank.
>>>>   * ase.parallel.rank is always 0 ("DummyMPI()" is used because
>>>>     _gpaw is then not built in).
>>>>
>>>> I can submit an MR in ASE to fix that last point but I'm really
>>>> starting to wonder if we shouldn't:
>>>>
>>>>   * at the very least separate mpi.c from _gpaw.c to avoid having MPI
>>>>     coupling all over the place;
>>>>   * migrate to mpi4py and keep it well-separated as well.
>>> What is the motivation for separating MPI support from _gpaw?
>> Maintainability, separation of concerns, making MPI optional/switchable with other implementations, or something else entirely.
>>
>> Maintainability: more in Python, less in C, at negligible runtime cost.
> Only less C if we remove the C stuff though.  Having what we have now
> *plus* mpi4py means more complexity, even if there are also
> advantages.  What this means is that the advantages must be absolutely
> clear, and significant.  Most of this is rather abstract to me though.
> It would be nice to have some very concrete advantages relating to
> typical use.
>
>> Separation of concerns: currently _gpaw is a lib, a Python module and an executable, all working slightly differently from each other. The "lib" part aggregates the content of hardcoded dependencies and code written in C. The "module" part makes those dependencies visible to Python. The "executable" part is an interpreter that leverages nothing from the "module" part of the file. Change something in mpi.c and you HAVE to grep through the whole folder to adapt the rest of the code to the changes.
> Yes, things could be made more beautiful.  I think it is not GPAW's
> greatest problem though, because most of the things in _gpaw do not
> interfere with each other.
>
>> Making MPI optional: well, if it is optional, it is simpler to distribute and use (for instance, Python's concurrent.futures module could be used to implement a multithreaded _Communicator class). Not very useful on an HPC cluster (though now with 32+ core nodes...), but it would be a boon for including GPAW in Linux distros while still being able to use some degree of parallelism for post-processing.
> Which distros?  In Debian it is already parallel, surely?
>
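Gaël's idea of a multithreaded _Communicator can be illustrated with a minimal sketch built only on the standard library (`threading.Barrier` and `concurrent.futures.ThreadPoolExecutor`). The names `ThreadGroup`, `ThreadCommunicator` and `run_parallel`, and the small interface (`rank`, `size`, `barrier`, `broadcast`), are hypothetical and not part of GPAW or ASE. Because of the GIL, this gives concurrency rather than a CPU speed-up for pure-Python code; it would only pay off where the heavy work releases the GIL.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadGroup:
    """State shared by all threads acting as 'ranks' of one group (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.barrier = threading.Barrier(size)
        self.slot = None  # Holds the object currently being broadcast.


class ThreadCommunicator:
    """Hypothetical thread-backed communicator; one instance per worker thread."""

    def __init__(self, group: ThreadGroup, rank: int) -> None:
        self.group = group
        self.rank = rank
        self.size = group.size

    def barrier(self) -> None:
        self.group.barrier.wait()

    def broadcast(self, obj, root: int = 0):
        if self.rank == root:
            self.group.slot = obj  # Only the root publishes its object.
        self.group.barrier.wait()  # Wait until the root has published.
        result = self.group.slot
        self.group.barrier.wait()  # Nobody may overwrite the slot before all have read it.
        return result


def run_parallel(func, size: int):
    """Run func(comm) on `size` threads and return the results in rank order."""
    group = ThreadGroup(size)
    # max_workers=size so every rank gets its own thread; fewer would deadlock at the barrier.
    with ThreadPoolExecutor(max_workers=size) as pool:
        futures = [pool.submit(func, ThreadCommunicator(group, r)) for r in range(size)]
        return [f.result() for f in futures]


def work(comm):
    # Rank 0 chooses a value and every rank receives it.
    value = comm.broadcast("hello" if comm.rank == 0 else None, root=0)
    return comm.rank, value


print(run_parallel(work, 4))  # [(0, 'hello'), (1, 'hello'), (2, 'hello'), (3, 'hello')]
```

If one rank raises before reaching a barrier, the others block forever, so anything beyond a sketch would need `Barrier.abort()` or timeouts on `wait()`.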
>> Making MPI switchable: multithreading could be used, as mentioned, but not having to know your MPI implementation at compile time would let you switch to whatever you want at runtime. Instead of compiling "gpaw 1.4.0 with openmpi 2" and then "gpaw 1.4.0 with openmpi 3", you compile "gpaw 1.4.0" once and plug in the MPI provider of your choice, which you can swap however you want.
>>
>> (The same thing could be said and done with FFT and BLAS.)
> Having flexible threading is nice, but is it all that important?  For
> computations people will use one MPI or the other, and presumably not
> threading.  Unless we have some bold plans for improving threading
> within GPAW, in which case there could be some MPI/threading
> combination.
>
>>
>>
>>>> From Python's perspective, all the MPI stuff in `_gpaw` does not
>>>> exist: it is only ever used in `gpaw.mpi`, precisely to provide MPI.
>>>> Yet it is everywhere: I'd really like to get rid of it.
>>>>
>>>> What do you think about it?
>>> What exactly is it that you would like to get rid of?
>> The coupling between _gpaw and MPI things.
>>
>> MPI (or any other parallelism provider) can be selected in Python instead.
> If the user does "from ase.parallel import rank" in line 1, then the
> damage has happened: The rank is what it was and can never be changed.
>
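Ask's point about "from ase.parallel import rank" comes down to ordinary Python name binding: a from-import copies whatever object the module attribute refers to at that moment, and a later rebinding of the module attribute is not seen by the importer. A minimal, self-contained illustration, using an in-memory stand-in module called `parallel` (hypothetical, not ASE itself):

```python
import types

# In-memory stand-in for a module such as ase.parallel (hypothetical).
parallel = types.ModuleType("parallel")
parallel.rank = 0  # Value before any MPI initialization has happened.

# Equivalent of "from parallel import rank": binds the current value locally.
rank = parallel.rank

# Later, something initializes MPI and rebinds the module attribute.
parallel.rank = 3

print(rank)           # 0: the local name kept the value from import time
print(parallel.rank)  # 3: attribute lookup goes through the module every time
```

Code that only ever reads `parallel.rank` (or a `world.rank` attribute) can follow a late change; code that did the from-import cannot.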
>>> When you start gpaw-python it will immediately initialize all the MPI
>>> stuff, guaranteeing that there can be no problem - but we can only do
>>> that because we control the startup sequence in gpaw-python.
>> Couldn't you do that when GPAW is loaded instead? I mean, if mpi.c is converted to a proper module, moduleinit could handle the same things, couldn't it? I don't know why mpi.c should be aware of anything in _gpaw.c to initialise itself correctly and if it doesn't depend on anything in _gpaw, it's better if it doesn't appear there.
>>
>>> Probably it is a good idea to migrate to mpi4py in the long run from a
>>> development perspective, but we often become very unpopular when we add
>>> dependencies, so it is not something that I would push for unless
>>> there are very good reasons to ditch the old framework.
>> I understand that.
>>
>> There is no need to ditch the code at all (though I must admit, that was totally the plan)! The mpi.c-based Python module, mpi4py, etc. could be considered mere communicator providers. If people know what MPI implementation they want to use at compile time, mpi.c would still work as it does now. But if people/distros don't want to hardcode such a relationship, it is hard today, not because of mpi.c but because it is explicitly used in _gpaw.c.
>>
>>
>> My present approach is to define a proper Communicator interface in ASE (entirely based on GPAW's). When ASE is loaded, it would look for suitable Communicator implementations (mpi4py, mpi.c, etc.) and use whichever is specified or works. By default, everything should be exactly the same, except that MPI initialization would occur later.
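A minimal sketch of the provider lookup described above, assuming a small communicator interface (`rank`, `size`, `barrier`, `broadcast`). `SerialCommunicator`, `Mpi4pyCommunicator`, `PROVIDERS` and `get_world` are hypothetical names, not ASE's or GPAW's API; the mpi4py calls used (`MPI.COMM_WORLD`, `Get_rank`, `Get_size`, `Barrier`, `bcast`) are the real mpi4py ones. As Ask points out below, whatever `get_world()` returns would have to be decided before any code reads `rank`.

```python
from typing import Callable, Dict, List, Optional


class SerialCommunicator:
    """Hypothetical fallback when no MPI provider is available: rank 0 of 1."""

    rank = 0
    size = 1

    def barrier(self) -> None:
        pass  # Nothing to synchronize with a single process.

    def broadcast(self, obj, root: int = 0):
        return obj  # With one process, the root already holds the data.


class Mpi4pyCommunicator:
    """Hypothetical adapter exposing mpi4py's COMM_WORLD through the same interface."""

    def __init__(self) -> None:
        from mpi4py import MPI  # ImportError if mpi4py is missing; importing initializes MPI.

        self._comm = MPI.COMM_WORLD
        self.rank = self._comm.Get_rank()
        self.size = self._comm.Get_size()

    def barrier(self) -> None:
        self._comm.Barrier()

    def broadcast(self, obj, root: int = 0):
        return self._comm.bcast(obj, root=root)


# Hypothetical registry: providers in order of preference, as zero-argument factories.
PROVIDERS: Dict[str, Callable[[], object]] = {
    "mpi4py": Mpi4pyCommunicator,
    "serial": SerialCommunicator,
}


def get_world(preferred: Optional[str] = None):
    """Return the preferred provider's communicator, else the first one that works."""
    names: List[str] = [preferred] if preferred else list(PROVIDERS)
    for name in names:
        try:
            return PROVIDERS[name]()
        except ImportError:
            continue  # Provider not installed; try the next one.
    raise RuntimeError(f"no usable communicator among {names}")
```

An mpi.c-based provider would slot into the same registry as one more factory, which is what would let a single build choose its MPI at runtime.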
>>
>> Does that make sense?
> When ASE is imported, it does not know whether the user is later going
> to import GPAW or ASAP or mpi4py, and hence whether to get a
> communicator from there.  But a ton of mechanisms need to know whether
> things will occur in parallel, and the way they are written, they
> cannot change dynamically.  They could be rewritten to change
> dynamically, but I think it would be very prone to regressions.

It would be a bit of work, but doable and worth it in my opinion. In the 
long run, I'd like to get rid of gpaw-python and just have _gpaw.so.

We could start by getting rid of all "from ase.parallel import rank" and 
"from gpaw.mpi import rank" and maybe add a deprecation warning (I don't 
know how that can be done).  Then we could create a new 
ase.parallel.world object that can be modified at run time.

I think switching to mpi4py would be difficult and, as Ask mentioned, 
not so nice for users.  And our C-extension still needs to call MPI 
functions.

Jens Jørgen

> So I'd say the (MPI-based) parallelism must be completely determined
> when the program starts, and definitely before any line written by the
> user is executed.
>
> Best regards
> Ask
>
>> Gaël
>>
>>> Best regards
>>> Ask
>>>
>>>> Gaël
>>>>
>>>> _______________________________________________
>>>> gpaw-users mailing list
>>>> gpaw-users at listserv.fysik.dtu.dk
>>>> https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users