[ase-users] Clarifications on genetic algorithm

Sun Dec 4 01:49:43 CET 2016

Thank you Esben.

I now get correct energies in the database file, and the GA run looks okay, I guess. All I did was comment out the "break" inside read_energy() in ase/calculators/nwchem.py.
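For context, here is a sketch of what removing that `break` does. The parser keeps scanning the whole output, so the last energy printed (the final optimization step) overwrites the earlier ones, and the first one is no longer returned. This is a minimal stdlib sketch and not ASE's actual read_energy(). The function name and the assumed `Total DFT energy = <value>` line format (value in Hartree) are hypothetical.

```python
import re

# Assumed NWChem line format, e.g. "   Total DFT energy =   -76.419737359".
_TOTAL_DFT = re.compile(r"Total DFT energy\s*=\s*(-?\d+\.\d+)")


def last_total_dft_energy(text):
    """Return the last 'Total DFT energy' value (Hartree) found in text.

    There is no early break: every match overwrites the previous one, so
    the value returned belongs to the final step of an optimization.
    """
    energy = None
    for line in text.splitlines():
        match = _TOTAL_DFT.search(line)
        if match:
            energy = float(match.group(1))  # keep scanning, last match wins
    if energy is None:
        raise ValueError("no 'Total DFT energy' line found")
    return energy


sample = """
         Total DFT energy =     -76.400000000000
         Total DFT energy =     -76.419737359000
"""
print(last_total_dft_energy(sample))  # -76.419737359
```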

I'll try what you suggested: load the relaxed trajectory file as “a”/“a3” after the relaxation, then set the raw_score and add it to the db.

Thanks again.

Best,
Satish
________________________________
From: Esben Leonhard Kolsbjerg <esb at phys.au.dk>
Sent: Thursday, December 1, 2016 11:28:52 AM
To: Iyemperumal, Satish Kumar; Kondov, Ivan (SCC); ase-users at listserv.fysik.dtu.dk
Subject: Re: Clarifications on genetic algorithm

You are, as you say, always optimizing the raw score, and you have to set the raw_score manually. Personally, I have never optimized anything but the calculated energy, but this should not be an issue.

When you call “get_all_relaxed_candidates”, it returns a list of all the relaxed candidates sorted by raw_score.
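Here is a minimal sketch of what "sorted by raw_score" means in practice. It assumes the best candidate comes first, which follows from the GA maximizing raw_score. Plain dicts stand in for Atoms objects, and sort_by_raw_score is a hypothetical helper, not part of ASE.

```python
# Plain dicts standing in for relaxed candidates (hypothetical data).
candidates = [
    {"id": 1, "raw_score": 1.2},
    {"id": 2, "raw_score": 3.4},
    {"id": 3, "raw_score": -0.5},
]


def sort_by_raw_score(cands):
    """Order candidates from best (highest raw_score) to worst."""
    return sorted(cands, key=lambda c: c["raw_score"], reverse=True)


print([c["id"] for c in sort_by_raw_score(candidates)])  # [2, 1, 3]
```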

What I don’t understand is why the raw score and the energy don’t agree, since you (I guess) just set the energy in the calculator manually to match the raw_score.

To answer your question: no, I don’t think it is an issue!

(Now I’m speculating!)
If the wrong data is saved in the database, I think the error must come from the atoms object (normally called “a”/“a3”) not being updated in the script you are running. If the relaxation happens outside the GA script, this could explain why neither the correct energies nor the correct structures show up in the database. Could you, as an ugly hack (but maybe prettier than what you are doing now), try to load the relaxed trajectory file as “a”/“a3” after the relaxation, then set the raw_score and add it to the db?
I don’t fully understand your relaxation procedure, but I don’t think it is the standard way of doing it, and this might be the reason.

Esben

From: "Iyemperumal, Satish Kumar" <siyemperumal at wpi.edu>
Date: Wednesday, 30 November 2016 at 16:50
To: Esben Leonhard Kolsbjerg <esb at phys.au.dk>, "Kondov, Ivan (SCC)" <ivan.kondov at kit.edu>, "ase-users at listserv.fysik.dtu.dk" <ase-users at listserv.fysik.dtu.dk>
Subject: Re: Clarifications on genetic algorithm

Thanks again for your input, Esben.

The code does work. However, as you indicated, it leads to another question. Do (1) changing the energy via a.calc.results['energy'] = actual_energy * Hartree and (2) manually defining the raw_score via a.info['key_value_pairs']['raw_score'] = -actual_energy * Hartree affect the GA procedure?

I do not think (1) affects it, since, as you also mentioned earlier, it is the raw score that the GA maximizes.
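The relationship between (1) and (2) can be sketched like this, assuming the NWChem total energy is read in Hartree. The energy stored in the calculator and the raw_score differ only in sign, so the lowest energy gets the highest raw_score. The conversion constant below is the CODATA 2018 value and is an assumption; in a real script, ase.units.Hartree should be used instead. energy_and_raw_score is a hypothetical helper.

```python
# Assumed Hartree-to-eV factor (CODATA 2018); ase.units.Hartree is what an
# actual ASE script would use and may differ in the last digits.
HARTREE_IN_EV = 27.211386245988


def energy_and_raw_score(total_dft_energy_hartree):
    """Return (energy in eV, raw_score) for one relaxed candidate.

    The GA maximizes raw_score, so raw_score = -energy: the lowest
    (most negative) energy receives the highest raw_score.
    """
    energy_ev = total_dft_energy_hartree * HARTREE_IN_EV
    return energy_ev, -energy_ev


energy, raw_score = energy_and_raw_score(-76.419737359)
print(f"energy = {energy:.3f} eV, raw_score = {raw_score:.3f}")
```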

In the case of (2), I am defining the correct raw scores manually within the GA script. However, this produces an all_candidates.traj file that is not sorted from low- to high-energy structures (see the attached example traj file). Moreover, the geometries in all_candidates.traj are not the final ground-state geometries (I am not sure whether they are the first-ionic-step geometries of each relaxed candidate). This makes me think that setting raw_scores perhaps affects output files generated by ASE, such as all_candidates.traj. Because of this, I manually save all the relaxed candidate structures from my NWChem runs in separate directories. To get the correct structures and energies, I then loop over these NWChem directories and read the results directly, instead of post-processing ASE-generated output files like all_candidates.traj and gadb.db.
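The directory-based post-processing described above could look roughly like the sketch below. It rests on several assumptions: each relaxed candidate sits in its own subdirectory, its NWChem output is named in.out, and the relevant energy is the last `Total DFT energy` line (in Hartree). The names rank_runs and final_energy are hypothetical.

```python
import re
from pathlib import Path

# Assumed NWChem line format, e.g. "   Total DFT energy =   -76.419737359".
_TOTAL_DFT = re.compile(r"Total DFT energy\s*=\s*(-?\d+\.\d+)")


def final_energy(output_file):
    """Last 'Total DFT energy' in one NWChem output file (Hartree), or None."""
    energy = None
    with open(output_file) as f:
        for line in f:
            match = _TOTAL_DFT.search(line)
            if match:
                energy = float(match.group(1))  # last match wins
    return energy


def rank_runs(root, output_name="in.out"):
    """Return [(energy, run_dir_name), ...] sorted from lowest to highest.

    Assumes (hypothetically) one subdirectory per relaxed candidate under
    root, each containing an NWChem output file called output_name.
    """
    results = []
    for out in Path(root).glob(f"*/{output_name}"):
        energy = final_energy(out)
        if energy is not None:  # skip runs that never printed an energy
            results.append((energy, out.parent.name))
    return sorted(results)
```

With this layout, rank_runs("some_dir")[0] would name the lowest-energy candidate. Sorting (energy, name) tuples keeps the ordering independent of the order in which the filesystem lists the directories.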

Do you think this approach is reliable, or do you think changing raw scores inherently messes something up within the GA procedure itself?

Thank you again for your time!

Best,
Satish
________________________________
From: Esben Leonhard Kolsbjerg <esb at phys.au.dk>
Sent: Wednesday, November 30, 2016 5:09:04 AM
To: Iyemperumal, Satish Kumar; Kondov, Ivan (SCC); ase-users at listserv.fysik.dtu.dk
Subject: Re: Clarifications on genetic algorithm

I’m not an expert on the database module, but I took a look at ase.db.core, and as far as I can see, ASE does not let you update values such as the energy of a row that is already in a database file.

So, for example,
from ase.db import connect
con = connect('gadb.db')
con.update(ids=53, **{'energy':100})

will give you:

ValueError: Bad key: energy

What you might be able to do instead is set “new” results directly in the calculator attached to the atoms object.

It would then look something like this:
...
    # assumes `import re` and `from ase.units import Hartree` in the script
    with open('in.out') as f:
        for line in f:
            if re.search('Total.DFT', line):
                actual_energy = float(line.split()[-1])  # last match wins (Hartree)
    a.info['key_value_pairs']['raw_score'] = -actual_energy * Hartree
    # set the calculated property directly in the results dict of the calculator;
    # the energy is actual_energy * Hartree (eV), and raw_score is its negative
    a.calc.results['energy'] = actual_energy * Hartree
    da.add_relaxed_step(a)
...

Be aware that this hack might cause you trouble elsewhere.

I just ran a GA with the EMT potential, and there what you are looking for seems to work.

Hope it helps!

Esben

From: "Iyemperumal, Satish Kumar" <siyemperumal at wpi.edu>
Date: Monday, 28 November 2016 at 02:15
To: "Kondov, Ivan (SCC)" <ivan.kondov at kit.edu>, Esben Leonhard Kolsbjerg <esb at phys.au.dk>, "ase-users at listserv.fysik.dtu.dk" <ase-users at listserv.fysik.dtu.dk>
Subject: Re: Clarifications on genetic algorithm

I was able to read the right energies from the NWChem output files and use them as the raw_scores in the genetic algorithm. But I also want the correct energy to be displayed when I run "ase-db gadb.db", which prints id, age, user ... magmom. To set the raw_scores to the correct values, I did:

...
    with open('in.out') as f:
        for line in f:
            line = line.rstrip()
            if re.search('Total.DFT', line):
                actual_energy = float(line.strip().split()[-1])
    a.info['key_value_pairs']['raw_score'] = -actual_energy * Hartree
    #a.info['energy'] = -actual_energy * Hartree
    da.add_relaxed_step(a)
...

However, when I set a.info['energy'] = -actual_energy * Hartree, the energy is not updated to actual_energy. I tried ase.db.connect, but I could not find a way to replace the energy entries in gadb.db with the actual_energy values. How can I fix this?

Thank you for your time on this. The complete updated script is attached.

Best,
Satish
________________________________
From: Kondov, Ivan (SCC) <ivan.kondov at kit.edu>
Sent: Tuesday, November 22, 2016 3:19:09 AM
To: Iyemperumal, Satish Kumar; Esben Leonhard Kolsbjerg; ase-users at listserv.fysik.dtu.dk
Subject: RE: Clarifications on genetic algorithm

Dear all,

> 2) I think I found the reason. get_potential_energy() gives the energy
> corresponding to task=energy. So even if I set task=optimize, the energy
> corresponds to the energy after the first ionic step. I can read the correct
> energy corresponding to task=optimize using some readlines() method, but I
> wanted to know if just updating raw_score of a row in the database file would
> be sufficient to make a GA run as follows.

Apropos: task='optimize' with NWChem does not work properly. See my post from 17 October 2016 on this list:

https://listserv.fysik.dtu.dk/pipermail/ase-users/2016-October/003187.html

Meanwhile, I have a local fix that I am testing in operation, and I will soon make a merge request to include it.

Best regards,
Ivan
