[gpaw-users] Different MPI worlds ASE vs. GPAW: small fix, big fix or migrating to mpi4py?

Ask Hjorth Larsen asklarsen at gmail.com
Thu Aug 9 17:06:05 CEST 2018


Hello,

2018-08-09 5:31 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
> Hi both,
>> On 08/09/2018 05:46 AM, Ask Hjorth Larsen via gpaw-users wrote:
>> > Hello,
>> >
>> > 2018-08-08 13:27 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
(....)
>> I think switching to mpi4py would be difficult and, as Ask
>> mentioned,
>> not so nice for users.  And our C-extension still needs to call MPI
>> functions.
>
> Let's forget about the switch then: I get your point.
>
> Ask, I also get that you don't see any compelling reasons to separate
> those things, so let's put that on hold for now.
>
> Let's assume I'm restricting myself to having a fully working parallel
> _gpaw.so version. Nothing more (I don't plan to touch `gpaw-python` at
> all). For that I need a single point of entry to MPI from Python. Why?
> Because it is simpler for both users and developers (i.e. single way of
> doing things, single place to look at, single place to update) AND
> because that would allow us to provide the same guarantees as `gpaw-
> python`.
>

+1; I hope it is not too difficult.

> That point of entry could check whether MPI is already initialized and
> raise a suitable exception if that's the case: that way, if no
> exception is raised, we know we are in control, just like in `gpaw-
> python`. The user could still load mpi4py after the fact and meddle
> with MPI, but they can do that within `gpaw-python` too...
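
A minimal pure-Python sketch of that single entry point (all names are
illustrative, not real GPAW API; the real check would call
MPI_Initialized() inside _gpaw.so):

```python
# Hypothetical sketch of a single MPI entry point.  `is_initialized`
# stands in for the C-level MPI_Initialized() check in _gpaw.so.

class MPIAlreadyInitializedError(RuntimeError):
    """Someone else (e.g. mpi4py) initialized MPI before GPAW did."""

def acquire_mpi(is_initialized):
    """Claim ownership of MPI initialization, or fail loudly."""
    if is_initialized():
        raise MPIAlreadyInitializedError(
            "MPI was initialized before GPAW; ordering guarantees are lost")
    # Here the real code would call MPI_Init() in the C extension and
    # return the world communicator; a string stands in for it.
    return "world"

print(acquire_mpi(lambda: False))      # we got there first: in control
try:
    acquire_mpi(lambda: True)          # somebody initialized MPI already
except MPIAlreadyInitializedError as err:
    print("refused:", err)
```

If no exception is raised, we know we own initialization, exactly as
with `gpaw-python`.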

What do you think about having a Python-level 'gpaw' subcommand in
which we manage our parallelization?

What matters is that we control the preemptive imports and code
initialization.  When we don't, the user may perform imports and
initialization in any order, which becomes very difficult to support.

>
>
> I get that ASE needs to know whether it's running in parallel but does
> not know what program it's going to use. There are three obvious
> solutions to that:
>  * make a separate MPI communicator subproject that implements the
>    required interface (that would be reimplementing mpi4py) or
>    alternatively, migrate mpi.c to ase since ase needs to know about
>    MPI! (I know this is not really a solution but this is what makes
>    sense)
>  * try to load a working communicator implementation from well-known
>    compiled modules such as gpaw, asap, etc. (as long as the interface
>    is identical, it wouldn't change anything...)

I prefer that the program (gpaw, asap) know what it wants and tell
ASE what it knows about the runtime.  That is more explicit.

>  * Make a modifiable ase.parallel.world and add a registration
>    mechanism for gpaw to declare the existence of its MPI
>    implementation (Jens Jørgen's suggestion).
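
The last route could be sketched like this (pure Python; the class and
function names are illustrative, not existing ase.parallel API):

```python
# Sketch of the registration idea: ASE starts with a serial default
# world, and a compiled package (gpaw, asap) swaps in its real MPI
# world when it is imported.  All names here are hypothetical.

class SerialWorld:
    rank = 0
    size = 1
    def sum(self, value):           # trivial reduction on one process
        return value

world = SerialWorld()

def register_world(new_world):
    """Called by e.g. gpaw once its _gpaw.so communicator exists."""
    global world
    world = new_world

class FakeMPIWorld:                 # stands in for the gpaw communicator
    rank = 1
    size = 4
    def sum(self, value):
        return value * self.size    # pretend every rank contributed

register_world(FakeMPIWorld())
print(world.rank, world.size)       # 1 4
```

As long as the interface matches, code using ase.parallel.world would
not need to know which implementation is behind it.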
>
>
> Assuming I follow the last route, what exactly would pose problem in
> ASE?
>
> The static rank numbers could become a Rank object instead, with
>    is_master() and is_slave() methods: that would seem to cover ~95% of
> the use cases in ASE (from a quick grep).
>
> There doesn't seem to be any static construct whose construction we
> can't postpone until just before the calculation. Actually,
> ase.parallel.world could itself be a smart object so that local
> `self.world` copies are still up to date.
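
Those two ideas might look roughly like this (illustrative names only;
none of this exists in ASE):

```python
# Sketch of a Rank object plus a delegating world proxy, so local
# `self.world` copies taken early never go stale.  Hypothetical names.

class Rank:
    def __init__(self, value):
        self.value = value
    def is_master(self):
        return self.value == 0
    def is_slave(self):
        return self.value != 0

class WorldProxy:
    """Forwards every attribute to the currently registered backend."""
    _backend = None
    def _register(self, backend):
        WorldProxy._backend = backend
    def __getattr__(self, name):
        return getattr(WorldProxy._backend, name)

class Serial:
    rank = Rank(0)
    size = 1

class FakeParallel:
    rank = Rank(2)
    size = 4

world = WorldProxy()
world._register(Serial())
copy = world                  # a local `self.world` copy taken early
world._register(FakeParallel())
print(copy.rank.is_slave())   # True: the copy sees the new backend
```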

We can capture all we need at a single point: startup.  That is
fundamentally simpler than requiring things to initialize correctly at
several different points in the code.  The user can always resolve a
rank into an integer or boolean, and that cached value can go out of
sync if parallel initialization happens later.
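
A tiny illustration of that staleness, under the assumption that the
world object is swapped out after the user has cached a value:

```python
# Demonstrates the out-of-sync problem: a boolean resolved early from
# a placeholder world never reflects the world installed later.

class World:
    def __init__(self, rank):
        self.rank = rank

world = World(0)                    # serial placeholder at import time
is_master = world.rank == 0         # user caches a boolean early
world = World(3)                    # real parallel world arrives later
print(is_master, world.rank == 0)   # True False: the cache is wrong
```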

I have fought circular communicator imports and speculative attempts
at initializing things on a few different occasions.  That's why I am
so much in favour of initializing at startup level and ensuring a
layer (gpaw-python or gpaw subcommand) that *we* control.

Best regards
Ask

>
> Gaël
>
>>
>> Jens Jørgen
>>
>> > So I'd say the (MPI-based) parallelism must be completely
>> > determined
>> > when the program starts, and definitely before any line written by
>> > the
>> > user is executed.
>> >
>> > Best regards
>> > Ask
>> >
>> > > Gaël
>> > >
>> > > > Best regards
>> > > > Ask
>> > > >
>> > > > > Gaël
>> > > > >
>> > > > > _______________________________________________
>> > > > > gpaw-users mailing list
>> > > > > gpaw-users at listserv.fysik.dtu.dk
>> > > > > https://listserv.fysik.dtu.dk/mailman/listinfo/gpaw-users
>> >
>>
>>
>



More information about the gpaw-users mailing list