[ase-users] [EXTERNAL] Potential Pubchem API Integration

Ollodart, David Bernd davidbo2 at illinois.edu
Sat Aug 10 00:30:19 CEST 2019


Hello,


I've used PUG REST to fetch structure data for organic molecules and build Atoms objects from it. For molecules with many conformers, the user has to choose which one to take. This code is slow, since it makes one request per conformer, but it may be helpful for your current needs.


<code>

import json
from urllib.parse import quote
from urllib.request import urlopen

from ase import Atoms
from ase.visualize import view

PUG_REST = 'https://pubchem.ncbi.nlm.nih.gov/rest/pug'


def atoms_from_pubchem(name):
    """Return a list of Atoms objects, one per PubChem conformer of `name`."""
    # Look up the conformer IDs for the compound name.
    # Note: cid = compound id != conformer id.
    url = PUG_REST + '/compound/name/' + quote(name) + '/conformers/JSON'
    with urlopen(url) as query:
        record = json.loads(query.read())
    conformer_ids = record['InformationList']['Information'][0]['ConformerID']

    conformers = []
    for cr_id in conformer_ids:
        # One request per conformer; this is what makes the function slow.
        curl = PUG_REST + '/conformers/' + cr_id + '/JSON'
        with urlopen(curl) as b:
            cr_dct = json.loads(b.read())['PC_Compounds'][0]
        elements = cr_dct['atoms']['element']  # atomic numbers
        coords = cr_dct['coords'][0]['conformers'][0]
        positions = list(zip(coords['x'], coords['y'], coords['z']))
        conformers.append(Atoms(numbers=elements, positions=positions))

    return conformers


for molecule in ['acetaldehyde', 'ethanol', 'octane']:
    conformers = atoms_from_pubchem(molecule)
    print(len(conformers))
    view(conformers[0])

</code>
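
If the slowness is a problem, one option might be to download conformers lazily and stop after the first few. The following is only a sketch under that assumption: the names parse_conformer and iter_conformers and the limit parameter are hypothetical, and the JSON keys are simply the ones used in the code above.

```python
import json
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def parse_conformer(pc_compound):
    """Return (atomic_numbers, positions) from one 'PC_Compounds' entry."""
    numbers = list(pc_compound["atoms"]["element"])
    xyz = pc_compound["coords"][0]["conformers"][0]
    positions = [list(p) for p in zip(xyz["x"], xyz["y"], xyz["z"])]
    return numbers, positions


def iter_conformers(conformer_ids, limit=None):
    """Yield (conformer_id, numbers, positions), one download per step.

    `limit` (hypothetical parameter) caps how many records are requested;
    None means all of them.
    """
    for cr_id in conformer_ids[:limit]:
        url = "{}/conformers/{}/JSON".format(PUG_REST, cr_id)
        with urlopen(url) as response:
            record = json.loads(response.read())
        yield (cr_id,) + parse_conformer(record["PC_Compounds"][0])
```

Each yielded pair can then be turned into an Atoms object with Atoms(numbers=numbers, positions=positions), so the caller pays only for the conformers actually used.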

Best,

David Ollodart



________________________________
From: ase-users-bounces at listserv.fysik.dtu.dk <ase-users-bounces at listserv.fysik.dtu.dk> on behalf of Hermes, Eric via ase-users <ase-users at listserv.fysik.dtu.dk>
Sent: Friday, August 9, 2019 4:19:24 PM
To: Ben Comer <bcomer3 at gatech.edu>; ase-users at listserv.fysik.dtu.dk <ase-users at listserv.fysik.dtu.dk>
Subject: Re: [ase-users] [EXTERNAL] Potential Pubchem API Integration

I really like this idea. Please consider creating an MR on GitLab.

I would think this functionality should go in ase.data. It looks like ase.data.isotopes does something similar with physics.nist.gov to get isotope information, so ASE fetching data from an external site isn't completely unheard of.

The function should probably also return a list of Atoms objects, since PubChem will provide multiple conformers. We may also want to improve the "sdf" parser so that it stores PUBCHEM_MMFF94_PARTIAL_CHARGES in arrays['initial_charges'].
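
As a rough illustration of the partial-charge idea, here is a standalone sketch rather than a change to the ASE "sdf" parser itself. It assumes the PUBCHEM_MMFF94_PARTIAL_CHARGES data item consists of a count line followed by "atom_index charge" pairs with 1-based indices, and that atoms not listed carry zero charge; the function name parse_mmff94_charges is hypothetical.

```python
def parse_mmff94_charges(sdf_text, natoms):
    """Return a list of `natoms` partial charges read from SDF text.

    Assumed layout of the data item (not verified against a spec):
        > <PUBCHEM_MMFF94_PARTIAL_CHARGES>
        <count>
        <atom_index> <charge>    (repeated <count> times, 1-based index)
    Atoms that are not listed keep a charge of 0.0.
    """
    charges = [0.0] * natoms
    lines = sdf_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and "<PUBCHEM_MMFF94_PARTIAL_CHARGES>" in line:
            count = int(lines[i + 1])
            for entry in lines[i + 2:i + 2 + count]:
                index, charge = entry.split()
                charges[int(index) - 1] = float(charge)
            break
    return charges
```

The result could then be attached to an existing Atoms object with atoms.set_initial_charges(charges).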

It also looks like the URL needs to have "?record_type=3d" at the end, otherwise everything gets squashed into the xy-plane.
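
A minimal sketch of what that could look like, using only the standard library to build the URL. The names pubchem_sdf_url and fetch_atoms are hypothetical, and the ASE import is kept inside fetch_atoms so the URL helper can be used on its own.

```python
from urllib.parse import quote, urlencode

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_sdf_url(name, record_type="3d"):
    """Build the PUG REST URL for the SDF record of a compound name."""
    path = "{}/compound/name/{}/sdf".format(PUG_REST, quote(name, safe=""))
    # Without record_type=3d the coordinates come back flattened into
    # the xy-plane, as noted above.
    return path + "?" + urlencode({"record_type": record_type})


def fetch_atoms(name):
    """Download the 3D SDF record for `name` and parse it with ASE."""
    # Imports are local so that building a URL does not require ASE.
    from io import StringIO
    from urllib.request import urlopen
    from ase.io import read

    with urlopen(pubchem_sdf_url(name)) as response:
        text = response.read().decode("utf-8")
    return read(StringIO(text), format="sdf")
```

For example, pubchem_sdf_url('ammonia') gives the same URL as in the original proposal with "?record_type=3d" appended.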
--
Eric Hermes
Postdoctoral Researcher
Sandia National Laboratories

On 8/9/19, 13:01, "ase-users-bounces at listserv.fysik.dtu.dk on behalf of Ben Comer via ase-users" <ase-users-bounces at listserv.fysik.dtu.dk on behalf of ase-users at listserv.fysik.dtu.dk> wrote:

    Hello all,

    I've noticed that ASE has a very limited number of molecules that can be
    easily made into an Atoms object. An easy fix for this could be to
    integrate a RESTful API to a chemical database to query and pull in
    structures from the internet. PubChem
    (https://pubchem.ncbi.nlm.nih.gov/) might make sense for this, since it
    does not require an API key to access. We could just use the requests
    library to access molecules by name (or anything else, like CID, for
    example). Some code to access the API would be a fairly short putt; we
    might make something like this:

    from io import StringIO
    from ase.io import read
    import requests


    def pubchem(name):
        url = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{}/sdf".format(name)
        r = requests.get(url)
        f_like = StringIO(r.text)
        atoms = read(f_like, format='sdf')
        return atoms

    atoms = pubchem('ammonia')
    print(atoms)


    Just an idea, what do you guys think?


    Thanks,

    Ben Comer


    _______________________________________________
    ase-users mailing list
    ase-users at listserv.fysik.dtu.dk
    https://listserv.fysik.dtu.dk/mailman/listinfo/ase-users

