[ase-users] ase-users Digest, Vol 100, Issue 5

Fri Oct 7 01:48:25 CEST 2016

2016-10-06 15:48 GMT+02:00 John Kitchin via ase-users
<ase-users at listserv.fysik.dtu.dk>:
>
>> 2016-10-05 15:58 GMT+02:00 John Kitchin via ase-users
>> <ase-users at listserv.fysik.dtu.dk>:
>>>> Message: 1
>>>> Date: Wed, 5 Oct 2016 08:06:10 +0200
>>>> From: Jens Jørgen Mortensen <jensj at fysik.dtu.dk>
>>>> To: Ask Hjorth Larsen <asklarsen at gmail.com>,
>>>>       "ase-users at listserv.fysik.dtu.dk"       <ase-users at listserv.fysik.dtu.dk>
>>>> Subject: Re: [ase-users] 'Compatibility' of Atoms object with other
>>>>       codes
>>>> Message-ID: <479efb61-24d8-2371-8d41-87ae05824c00 at fysik.dtu.dk>
>>>> Content-Type: text/plain; charset="windows-1252"; format=flowed
>>>>
>>>> On 03-10-2016 at 18:39, Ask Hjorth Larsen via ase-users wrote:
>>>>> Hello
>>>>>
>>>>> Maybe we have discussed this before, but I am not quite sure.
>>>>>
>>>>> With ASE you can call other codes, and this is never much of a problem
>>>>> when ASE defines what the code should do.  However, for reading data
>>>>> produced with other codes, the Atoms object frequently falls short on
>>>>> two counts:
>>>>>
>>>>>    1) The Atoms object always has a 3x3 cell, and many codes have
>>>>> either no cell, irregular shapes, or something else.
>>>>>    2) In ASE, each atom has a chemical symbol which corresponds
>>>>> one-to-one with an atomic number.  Many codes allow arbitrary names
>>>>> for species or something entirely different.
>>>>>
>>>>> I suggest somehow improving these two things.
>>>>>
>>>>> We could allow the 'cell' to be None.  This will undoubtedly be an
>>>>> annoyance, although only when something wrong was happening in the
>>>>> first place.  One could also define a boolean or something more
>>>>> complex to describe the cell when an array is not appropriate.
>>>>>
>>>>> An extra optional array of 'labels' could represent names when
>>>>> something more than the chemical symbol is necessary.  I know we
>>>>> already have 'tags', which are numbers.
>>>
>>> Isn't there already an info dictionary that can accommodate this?
>>>
>>> I think it is worth sketching out how this would be used. For example,
>>> are you sure an array is what you want for labels? Numpy arrays usually
>>> have just one data type in them, unless you specify the type of each
>>> element (and then you have to keep the order straight). A list could
>>> have all kinds of things in it.
>>>
>>> For example, the first result here is False because NumPy casts 4 to the
>>> string '4' when it builds the array.
>>>
>>> #+BEGIN_SRC python
>>> import numpy as np
>>>
>>> print(4 in np.array(['aa', 'b', 4]))
>>> print(4 in ['aa', 'b', 4])
>>> #+END_SRC
>>>
>>> #+RESULTS:
>>> :RESULTS:
>>> False
>>> True
>>> :END:
>>>
>>> That is an example where you might want to filter on some label
>>> criterion. Filtering can be done with tags pretty nicely too (although I
>>> always refer to "Advanced Tagging" at
>>> http://kitchingroup.cheme.cmu.edu/dft-book/dft.html#orgheadline63
>>> to remember how to do it!).
>>>
>>> Anyway, if labels end up being just an array or list of strings, I do not think we gain much
>>> over just the tags.
>>
>> When slicing or otherwise manipulating an Atoms object, the functions
>> can never know how to handle the contents of 'info' because this is
>> something the user decides.  But if something is a per-atom array,
>> then it will be obvious what to do with it.  Therefore, if we
>> acknowledge that it is worth being able to label atoms, we must also
>> do it with a systematic mechanism.
>
> Fair enough. The info dictionary belongs to the Atoms object as a whole,
> and this is a proposal to attach labels to individual atoms.
>
>>
>> A real-life application is if you are using the spacegroup module to
>> construct a system out of a list of atoms, then you want the tags and
>> associated information duplicated systematically (presently tags are
>> neglected, which is something I might fix).
>
> This should definitely be fixed.

Actually that was not correct.  After a closer look, and upon testing,
it turns out that it does copy tags, so never mind.  I think I was
passing something incorrectly when testing it before.

>
>> Else you would have to
>> use a hack.  Consider a case where we have Al and Al-2, which are
>> distinct species.  Then we substitute Pu for Al-2 so they can be
>> distinguished, build the crystal, and then search for the Pu atoms to
>> identify those that are really Al-2.
>
> That is a hack for sure, and one that might not be needed if tags were
> duplicated.

Right, so we have established that such an awful hack would not be needed :)

(But if the atom labels are strings, then I will still need to build a
dictionary that associates the strings with numbers, then use the
numbers for tags, then generate the crystal, and then convert the tags
back to string labels.  This is still something I would rather not
have to do, and given that many codes allow it, I think it's best for
ASE to facilitate this.)
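
A minimal sketch of that round trip, assuming plain Python lists stand in
for the per-atom labels and tags (nothing here calls ASE itself, and the
label values and the "duplicate the atoms" step are purely illustrative):

```python
# Hypothetical per-atom string labels, as another code might provide them.
labels = ['Al', 'Al-2', 'Al', 'O', 'Al-2']

# Step 1: associate each distinct label with an integer, in order of appearance.
label_to_tag = {}
for label in labels:
    label_to_tag.setdefault(label, len(label_to_tag))

# Step 2: use the integers as tags.
tags = [label_to_tag[label] for label in labels]

# Step 3: stand-in for generating the crystal; the atoms (and their tags)
# are simply duplicated once here.
tags_after = tags * 2

# Step 4: convert the tags back to string labels.
tag_to_label = {tag: label for label, tag in label_to_tag.items()}
labels_after = [tag_to_label[tag] for tag in tags_after]

print(tags)
print(labels_after)
```

The two dictionaries are exactly the bookkeeping that a per-atom 'labels'
array would make unnecessary.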

>
>>
>> In the above example, 'tags' would work.  But nothing is as good as
>> being able to represent the information that many other atomistic
>> codes have.
>
> I do not argue with the utility of the labels! I only want to understand
> the tradeoffs in syntax, whether you gain a capability that is currently
> impossible or hard, and how you would actually use them. Your example is
> also like a filter example where you might do something like:
>
> for atom in atoms:
>     if 'Al-2' in atom.labels:  # 'labels' is the proposed attribute
>         atom.symbol = 'Pu'
>
> It is the syntax of the second line, and what it enables that I want to
> think through.
>
> With tags it could look like:
>
> AL_2 = 1  # integer tag chosen to stand for the 'Al-2' species
>
> for atom in atoms:
>     if atom.tag == AL_2:
>         atom.symbol = 'Pu'
>
> I am for the idea, I just want to see how people would try to use it
> before it is implemented.

I think the most important use of this would be when writing
calculator interfaces.  If someone wants to extract an Atoms object
from the output (or input) of some code, then this work becomes easier
the more closely the Atoms object can be mapped to the output.  In bad
cases you end up with an Atoms object plus something extra which must
then be stored on the calculator.  Octopus allows arbitrarily named
"atoms" (an "atom" can even be an external potential), force field
codes seem to commonly have lots of special labels, and so on.
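
As a rough illustration of the mapping problem, here is a sketch of what a
reader for such output might have to do today.  Everything in it is an
assumption made for the example: the species names, the helper
split_species, the tiny symbol table, and the use of 'X' as a placeholder
symbol for entries that are not chemical elements.

```python
import re

# Small illustrative subset of element symbols; a real reader would use a
# complete table.
KNOWN_SYMBOLS = {'H', 'C', 'O', 'Al', 'Pu'}


def split_species(name):
    """Return (symbol, label) for a species name from a hypothetical output.

    The symbol is the leading element-like token if it is recognized,
    otherwise the placeholder 'X'.  The full name is kept as the label.
    """
    match = re.match(r'[A-Z][a-z]?', name)
    if match and match.group(0) in KNOWN_SYMBOLS:
        symbol = match.group(0)
    else:
        symbol = 'X'
    return symbol, name


species = ['Al', 'Al-2', 'O', 'user_potential']
pairs = [split_species(name) for name in species]
print(pairs)
```

The symbols fit into an Atoms object as it is now; the labels are the
"something extra" that currently has no systematic per-atom home.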

Best regards
Ask