[ase-users] [EXTERNAL] Potential Pubchem API Integration
Hermes, Eric
ehermes at sandia.gov
Tue Aug 20 21:57:35 CEST 2019
Thanks, this is starting to look good.
I have one suggestion though: give pubchem_search keyword arguments for the fields that it supports. Something like:
def pubchem_search(name=None, cid=None, smiles=None):
If the user provides multiple arguments, you can raise a ValueError, since it is ambiguous what the user wants in that case.
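To make the suggestion concrete, here is a rough sketch of the argument checking, not a finished implementation. The helper name `pubchem_search_url` is made up for illustration; it only builds the request URL and does no network access. It assumes the URL layout from Ben's original example plus the `?record_type=3d` suffix mentioned earlier in the thread, and that `cid` and `smiles` are valid PUG REST input fields alongside `name`.

```python
from urllib.parse import quote

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound"


def pubchem_search_url(name=None, cid=None, smiles=None):
    """Hypothetical helper: build the PUG REST URL for exactly one field."""
    candidates = {"name": name, "cid": cid, "smiles": smiles}
    given = {field: value for field, value in candidates.items()
             if value is not None}
    if len(given) != 1:
        # Zero arguments gives nothing to search for; more than one is
        # ambiguous, so refuse both cases.
        raise ValueError(
            "Provide exactly one of name, cid or smiles; got: {}".format(
                sorted(given) or "nothing"))
    (field, value), = given.items()
    # Percent-encode the query so spaces and SMILES characters such as
    # '#' and '/' cannot break the URL path.
    return "{}/{}/{}/sdf?record_type=3d".format(
        PUG_BASE, field, quote(str(value), safe=""))


print(pubchem_search_url(name="ammonia"))
print(pubchem_search_url(cid=222))
```

Raising for zero arguments as well as for several keeps the failure modes in one place; the real function could then pass the URL to whatever download code it already has.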
In case you haven’t already, make sure to add some tests for this new functionality.
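One way such a test might look, so that it does not depend on PubChem being reachable, is to patch the download call. This is only a sketch: `fetch_sdf` is a stand-in helper invented for the example (it is not the function in the merge request), and the payload is fake data.

```python
import io
import urllib.request
from unittest import mock

SDF_URL = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
           "{}/sdf?record_type=3d")


def fetch_sdf(name):
    """Hypothetical helper: download the SDF text for a compound name."""
    with urllib.request.urlopen(SDF_URL.format(name)) as response:
        return response.read().decode("utf-8")


def test_fetch_sdf_offline():
    # BytesIO stands in for the HTTP response: it supports read() and
    # can be used as a context manager, which is all fetch_sdf needs.
    fake_response = io.BytesIO(b"fake sdf payload")
    with mock.patch("urllib.request.urlopen",
                    return_value=fake_response) as patched:
        text = fetch_sdf("ammonia")
    assert text == "fake sdf payload"
    # Check the URL that would have been requested.
    requested_url = patched.call_args[0][0]
    assert "/name/ammonia/" in requested_url
    assert requested_url.endswith("?record_type=3d")


test_fetch_sdf_offline()
```

A test against the live service could still be useful, but it would probably need to be skipped when no network is available.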
--
Eric Hermes
Postdoctoral Researcher
Sandia National Laboratories
From: <ase-users-bounces at listserv.fysik.dtu.dk> on behalf of Ben Comer via ase-users <ase-users at listserv.fysik.dtu.dk>
Reply-To: Ben Comer <bcomer3 at gatech.edu>
Date: Tuesday, August 20, 2019 at 12:41
To: "ase-users at listserv.fysik.dtu.dk" <ase-users at listserv.fysik.dtu.dk>
Subject: Re: [ase-users] [EXTERNAL] Potential Pubchem API Integration
Hey guys,
Thanks for the great feedback. I've put together a more substantial piece of Python code, and I'd love feedback before trying to merge it, in case there is anything I've overlooked. I've added error messages for names that cannot be found and the ability to return all conformers for a given structure (thanks to David's code). It should also be able to handle SMILES strings, given the correct arguments.
Ask, I take your point about the user not quite knowing what will be accepted. The function I've written takes generic "query" and "field" arguments, so you can ask for "ammonia" in the field "name" or "222" in the field "CID" (which returns the same thing). SMILES strings can also be used. I would expect that a user who wants a very complex compound would go to PubChem and find the CID to pass to the function. This appears to cover most of the functionality PubChem has at the moment.
https://gitlab.com/benmcomer/ase/blob/pubchem/ase/data/pubchem.py
On 8/9/19 6:30 PM, Ollodart, David Bernd wrote:
Hello,
I've used PUG REST to get structure data for organic molecules and build Atoms objects from it. For molecules with many conformers, the user has to select which one to take. This code is slow but may be helpful for your current needs.
<code>
from urllib.request import urlopen
import json

from ase import Atom, Atoms
from ase.visualize import view
from ase.data import chemical_symbols


def atoms_from_pubchem(name):
    # Look up the conformer IDs for the named compound.
    url = ('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/'
           + name + '/conformers/JSON')
    query = urlopen(url)
    record = json.loads(query.read())
    # Note: cid = compound ID != conformer ID.
    conformer_ids = record['InformationList']['Information'][0]['ConformerID']
    conformers = []
    for cr_id in conformer_ids:
        atoms = Atoms()
        # One request per conformer, which is why this is slow.
        curl = ('https://pubchem.ncbi.nlm.nih.gov/rest/pug/conformers/'
                + cr_id + '/JSON')
        b = urlopen(curl)
        cr_dct = json.loads(b.read())['PC_Compounds'][0]
        elements = cr_dct['atoms']['element']  # atomic numbers
        xcoords = cr_dct['coords'][0]['conformers'][0]['x']
        ycoords = cr_dct['coords'][0]['conformers'][0]['y']
        zcoords = cr_dct['coords'][0]['conformers'][0]['z']
        for i in range(len(xcoords)):
            atom_position = [xcoords[i], ycoords[i], zcoords[i]]
            atoms += Atom(chemical_symbols[elements[i]], atom_position)
        conformers.append(atoms)
    return conformers


for molecule in ['acetaldehyde', 'ethanol', 'octane']:
    conformers = atoms_from_pubchem(molecule)
    print(len(conformers))
    view(conformers[0])
</code>
Best,
David Ollodart
________________________________
From: ase-users-bounces at listserv.fysik.dtu.dk <ase-users-bounces at listserv.fysik.dtu.dk> on behalf of Hermes, Eric via ase-users <ase-users at listserv.fysik.dtu.dk>
Sent: Friday, August 9, 2019 4:19:24 PM
To: Ben Comer <bcomer3 at gatech.edu>; ase-users at listserv.fysik.dtu.dk <ase-users at listserv.fysik.dtu.dk>
Subject: Re: [ase-users] [EXTERNAL] Potential Pubchem API Integration
I really like this idea. Please consider creating an MR on GitLab.
I would think this functionality should go in ase.data. It looks like ase.data.isotopes does something similar with physics.nist.gov to get isotope information, so ASE connecting to NIST isn't completely unheard of.
The function should probably also return a list of Atoms objects, since pubchem will provide multiple conformers. We may also want to improve the "sdf" parser so that it stores PUBCHEM_MMFF94_PARTIAL_CHARGES in arrays['initial_charges'].
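As a rough illustration of the partial-charge idea, the sketch below pulls the charges out of raw SDF text. It is standalone and is not the ASE "sdf" parser; the function name `parse_mmff94_charges` is made up. It assumes the PubChem layout in which the `> <PUBCHEM_MMFF94_PARTIAL_CHARGES>` tag is followed by a count line and then one "atom-index charge" pair per line (1-based indices), with unlisted atoms taken as zero. The sample numbers are invented for the example, not real PubChem output.

```python
CHARGE_TAG = "> <PUBCHEM_MMFF94_PARTIAL_CHARGES>"


def parse_mmff94_charges(sdf_text, natoms):
    """Return a list of per-atom charges, or None if the tag is absent."""
    lines = sdf_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != CHARGE_TAG:
            continue
        # Assumed layout: a count line, then "index charge" pairs.
        count = int(lines[i + 1])
        charges = [0.0] * natoms
        for entry in lines[i + 2:i + 2 + count]:
            index, charge = entry.split()
            charges[int(index) - 1] = float(charge)  # 1-based to 0-based
        return charges
    return None


# Invented sample data in the assumed layout.
sample = """> <PUBCHEM_MMFF94_PARTIAL_CHARGES>
4
1 -1.08
2 0.36
3 0.36
4 0.36

$$$$
"""
print(parse_mmff94_charges(sample, natoms=4))
```

The resulting list could then be passed to `Atoms.set_initial_charges`, which fills the `initial_charges` array.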
It also looks like the URL needs to have "?record_type=3d" at the end, otherwise everything gets squashed into the xy-plane.
--
Eric Hermes
Postdoctoral Researcher
Sandia National Laboratories
On 8/9/19, 13:01, "ase-users-bounces at listserv.fysik.dtu.dk on behalf of Ben Comer via ase-users" <ase-users-bounces at listserv.fysik.dtu.dk on behalf of ase-users at listserv.fysik.dtu.dk> wrote:
Hello all,
I've noticed that ASE has a very limited number of molecules that can be
easily made into an atoms object. An easy fix for this could be to
integrate a restful API to a chemical database to query and pull in
structures from the internet. Pubchem
(https://pubchem.ncbi.nlm.nih.gov/) might make sense for this, since it
does not require an API key to access. We could just use the requests
library to access molecules by name (or anything else, like cid for
example.) Some code to access the API would be a fairly short putt, we
might make something like this:
from io import StringIO
from ase.io import read
import requests

def pubchem(name):
    r = requests.get("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{}/sdf".format(name))
    f_like = StringIO(r.text)
    atoms = read(f_like, format='sdf')
    return atoms

atoms = pubchem('ammonia')
print(atoms)
Just an idea, what do you guys think?
Thanks,
Ben Comer
_______________________________________________
ase-users mailing list
ase-users at listserv.fysik.dtu.dk
https://listserv.fysik.dtu.dk/mailman/listinfo/ase-users