[gpaw-users] Different MPI worlds ASE vs. GPAW: small fix, big fix or migrating to mpi4py?
Ask Hjorth Larsen
asklarsen at gmail.com
Thu Aug 9 18:31:22 CEST 2018
Hi,
I think I agree with all of the below now (sorry for top-posting :)).
Thank you.
GPAW and ASAP MPI objects are probably almost compatible since they
are almost copies. I think I have run ASAP calculations with
gpaw-python at some point. But nothing guarantees that it still
works.
It will definitely be very good to have a way to guarantee that they
can work together, and also alongside e.g. mpi4py.
FWIW I would be in favour of having a communicator object whose main
interface is written in Python. Things become easier to work with.
The debug mode communicator interface accomplishes exactly this, but
it is usually not instantiated at all, and therefore one does not gain
any maintenance benefits from implementing stuff on it.
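To make that concrete, here is roughly the shape I have in mind (class
and method names invented for illustration, not the actual GPAW
classes): the Python object carries the convenience logic and sanity
checks in one maintained place, and only the raw MPI calls are
delegated to the compiled backend:

    import numpy as np

    class PyCommunicator:
        """Python-level communicator; raw MPI calls go to a C backend."""

        def __init__(self, backend):
            self.backend = backend      # e.g. the _gpaw.so communicator
            self.rank = backend.rank
            self.size = backend.size

        def sum(self, array, root=-1):
            # Checks like this one live in Python for everybody,
            # not only in a rarely-instantiated debug wrapper.
            assert isinstance(array, np.ndarray) and array.flags.c_contiguous
            self.backend.sum(array, root)

        def broadcast(self, array, root=0):
            assert 0 <= root < self.size
            self.backend.broadcast(array, root)

A serial fallback, a debug wrapper and the real thing could then share
the same Python code.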
Best regards
Ask
2018-08-09 11:01 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
> On Thu, 2018-08-09 at 10:06 -0500, Ask Hjorth Larsen wrote:
>> Hello,
>>
>> 2018-08-09 5:31 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
>> > Hi both,
>> > > On 08/09/2018 05:46 AM, Ask Hjorth Larsen via gpaw-users wrote:
>> > > > Hello,
>> > > >
>> > > > 2018-08-08 13:27 GMT-05:00 Gaël Donval <G.Donval at bath.ac.uk>:
>>
>> (....)
>> > > I think switching to mpi4py would be difficult and, as Ask
>> > > mentioned, not so nice for users. And our C-extension still needs
>> > > to call MPI functions.
>> >
>> > Let's forget about the switch then: I get your point.
>> >
>> > Ask, I also get that you don't see any compelling reasons to
>> > separate those things, so let's put that on hold for now.
>> >
>> > Let's assume I'm restricting myself to having a fully working
>> > parallel _gpaw.so version. Nothing more (I don't plan to touch
>> > `gpaw-python` at all). For that I need a single point of entry to
>> > MPI from Python. Why? Because it is simpler for both users and
>> > developers (i.e. single way of doing things, single place to look
>> > at, single place to update) AND because that would allow us to
>> > provide the same guarantees as `gpaw-python`.
>> >
>>
>> +1, I hope it is not too difficult
>
> So do I...
>
>
>>
>> > That point of entry could check whether MPI is already initialized
>> > and raise a suitable exception if that's the case: that way, if no
>> > exception is raised, we know we are in control, just like in
>> > `gpaw-python`. The user could still load mpi4py after the fact and
>> > meddle with MPI, but they can do that within `gpaw-python` too...
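>> >
>> > Very roughly (every name below is invented purely to illustrate the
>> > check, it is not the current _gpaw API):
>> >
>> >     _world = None
>> >
>> >     def get_world():
>> >         """Single entry point: hand out our world or refuse loudly."""
>> >         global _world
>> >         if _world is not None:
>> >             return _world
>> >         import _gpaw
>> >         if _gpaw.mpi_initialized():   # invented; would wrap MPI_Initialized()
>> >             raise RuntimeError('MPI was already initialized outside of '
>> >                                'GPAW; refusing to guess who owns it')
>> >         _world = _gpaw.new_world()    # invented; wraps MPI_COMM_WORLD
>> >         return _world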
>>
>> What do you think about having a Python-level 'gpaw' subcommand in
>> which we manage our parallelization?
>
> Something along the lines of the following?
>
> mpiexec python -m gpaw blah.py
>
> A "compatibility" `gpaw-python` script doing just that could also be
> provided.
>
> If so, I don't see why not.
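>
> For what it's worth, gpaw/__main__.py could then be almost trivial;
> a sketch (assuming that importing gpaw.mpi is what sets up and freezes
> the parallel world before any user code runs):
>
>     # gpaw/__main__.py (sketch only)
>     import runpy
>     import sys
>
>     import gpaw.mpi  # noqa: F401 -- world set up before the user script
>
>     def main():
>         script = sys.argv[1]
>         sys.argv = sys.argv[1:]     # let the script see its own argv
>         runpy.run_path(script, run_name='__main__')
>
>     if __name__ == '__main__':
>         main()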
>
>
>>
>> What matters is that we control the preemptive imports and code
>> initialization. When we don't, the user may do things in any order
>> and it becomes very difficult.
>>
>> >
>> >
>> > I get that ASE needs to know whether it's running in parallel but
>> > does not know what program it's going to use. There are 3 obvious
>> > solutions to that:
>> > * make a separate MPI communicator subproject that implements the
>> >   required interface (that would be reimplementing mpi4py) or
>> >   alternatively, migrate mpi.c to ase since ase needs to know about
>> >   MPI! (I know this is not really a solution but this is what makes
>> >   sense)
>> > * try to load a working communicator implementation from well-known
>> >   compiled modules such as gpaw, asap, etc. (as long as the
>> >   interface is identical, it wouldn't change anything...)
>>
>> I prefer/suggest that the program (gpaw, asap) knows what it wants
>> and tells ASE what it knows about the runtime. It is more explicit.
>
> I agree. How would you handle scripts using both GPAW and ASAP?
>
>>
>> > * Make a modifiable ase.parallel.world and add a registration
>> >   mechanism for gpaw to declare the existence of its MPI
>> >   implementation (Jens Jørgen's suggestion).
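>> >
>> > For that last option, the registration hook could be as small as
>> > this (a sketch; register_world() does not exist in ase.parallel
>> > today):
>> >
>> >     # in ase/parallel.py
>> >     world = None                  # replaced by whoever registers
>> >
>> >     def register_world(comm):
>> >         global world
>> >         if world is not None and world is not comm:
>> >             raise RuntimeError('another world is already registered')
>> >         world = comm
>> >
>> >     # in gpaw, at startup
>> >     import ase.parallel
>> >     import gpaw.mpi
>> >     ase.parallel.register_world(gpaw.mpi.world)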
>> >
>> >
>> > Assuming I follow the last route, what exactly would pose a problem
>> > in ASE?
>> >
>> > The static rank numbers could become a Rank object instead, with
>> > is_master() and is_slave() methods: that would seem to solve ~95%
>> > of the use cases in ASE (from a quick grep).
>> >
>> > There doesn't seem to be any static construct whose construction we
>> > can't postpone until just before the calculation. Actually,
>> > ase.parallel.world could itself be a smart object so that local
>> > `self.world` copies are still up to date.
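>> >
>> > Building on the invented register_world() above (which would then
>> > set World._comm instead of rebinding a module global), the smart
>> > object and the Rank-like helpers could look like this sketch:
>> >
>> >     class _Serial:                # fallback when nothing registered
>> >         rank = 0
>> >         size = 1
>> >
>> >     class World:
>> >         """Proxy, so stored references never go stale."""
>> >         _comm = _Serial()         # swapped by register_world()
>> >
>> >         def __getattr__(self, name):
>> >             return getattr(World._comm, name)
>> >
>> >         def is_master(self):
>> >             return self.rank == 0
>> >
>> >         def is_slave(self):
>> >             return self.rank != 0
>> >
>> >     world = World()               # what ase.parallel would export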
>>
>> We can capture all we need at a single point: startup. This is
>> fundamentally simpler than needing things to initialize correctly at
>> different points in the code. The user can always somehow resolve a
>> rank into an integer or boolean, which could then become out-of-sync
>> if parallel initialization is allowed to happen later.
>
> I agree: it could be done in gpaw.__main__ then, guaranteeing a frozen
> world from the very start. It's also very explicit and versatile: if
> you want to do something else, then don't use `python -m gpaw`... I
> quite like that approach.
>
> I need to think about how it can be done. If you also have ideas about
> how to get rid of ranks altogether, I'm all for it.
>
>>
>> I have fought circular communicator imports and speculative attempts
>> at initializing things on a few different occasions. That's why I am
>> so much in favour of initializing at startup level and ensuring a
>> layer (gpaw-python or gpaw subcommand) that *we* control.
>
> I'll work with that in mind.
>
> Thanks both for your input.
> Gaël
>
>>
>> Best regards
>> Ask
>>
>> >
>> > Gaël
>> >
>> > >
>> > > Jens Jørgen
>> > >
>> > > > So I'd say the (MPI-based) parallelism must be completely
>> > > > determined when the program starts, and definitely before any
>> > > > line written by the user is executed.
>> > > >
>> > > > Best regards
>> > > > Ask
>> > > >
>> > > > > Gaël
>> > > > >
>> > > > > > Best regards
>> > > > > > Ask
>> > > > > >
>> > > > > > > Gaël
>> > > > > > >
>> > > >
>> > >
>> > >
>