PubChemQuery: A Python Package for Accessing Chemical Information from PubChem.
PubChemQuery is a Python package that provides a simple and intuitive API for retrieving chemical information from the PubChem database. With this package, you can easily fetch chemical data, including:
- CID (Compound ID) by name
- All CIDs by name
- 2D images by CID or name
- SDF (Structure Data File) by CID or name
- Compound properties, including:
  - Molecular formula and weight
  - SMILES and InChI representations
  - IUPAC name and title
  - Physicochemical properties (e.g., XLogP, exact mass, TPSA)
  - Structural features (e.g., bond and atom counts, stereochemistry)
  - 3D properties (e.g., volume, steric quadrupole moments, feature counts)
  - Fingerprint and conformer information
The package offers a straightforward interface, allowing users to access PubChem data with minimal code. Whether you're a chemist, researcher, or developer, PubChemQuery simplifies the process of integrating chemical information into your projects.
Key Features:
- Retrieve chemical data by name or CID
- Access 2D images and SDF files
- Get compound properties, including physicochemical, structural, and 3D features
- Easy-to-use API with minimal code required
Simple and Concise API:
There are functions that perform all of the above-mentioned tasks, making it easy to integrate PubChem data into your projects:
- get_cid_by_inchi(inchi): Get a CID by InChI
- get_cids_by_formula(formula): Get CIDs by formula
- get_cid_by_name(name): Get CID by name
- get_cids_by_name(name): Get all CIDs by name
- get_image_by_cid(cid): Get 2D image by CID
- get_image_by_name(name): Get 2D image by name
- get_image_by_inchi(inchi): Get 2D image by InChI
- get_structure_by_cid(cid): Get SDF by CID
- get_structure_by_name(name): Get SDF by name
- get_similar_structures_cids_by_compound_id(cid/SMILES/InChI): Get CIDs of similar structures by CID, SMILES, or InChI
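The lookups listed above correspond closely to requests that PubChem's public PUG REST service accepts. The sketch below shows how such request URLs can be assembled. It assumes, without confirmation from this README, that a wrapper like PubChemQuery talks to PUG REST; `build_compound_url` is a hypothetical helper for illustration and is not part of the package.

```python
from urllib.parse import quote

# Base address of PubChem's PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_compound_url(namespace, identifier, *parts):
    """Build a PUG REST compound URL (hypothetical helper, not part of PubChemQuery)."""
    # Percent-encode the identifier so names containing spaces stay valid in a URL path.
    encoded = quote(str(identifier), safe="")
    return "/".join([PUG_REST_BASE, "compound", namespace, encoded, *parts])


# CIDs for a name, then a 2D image and an SDF record for a CID.
print(build_compound_url("name", "benzene", "cids", "JSON"))
print(build_compound_url("cid", 241, "PNG"))
print(build_compound_url("cid", 241, "SDF"))
```

InChI strings contain slashes, so they are usually sent in a POST body rather than placed in the URL path; the sketch leaves that case out.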
Compound Object:
The package also includes a Compound object that encapsulates the retrieved data, providing a convenient way to access and manipulate it.
- compound(cid_or_name): Create a compound object with properties and methods
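To illustrate what "encapsulating the retrieved data" can look like, here is a minimal sketch of a wrapper that exposes a dictionary of properties as attributes. `CompoundSketch` and `as_dict` are hypothetical names chosen for this example; the package's real Compound class may be built quite differently.

```python
class CompoundSketch:
    """Hypothetical stand-in for a Compound-like wrapper; not PubChemQuery's class."""

    def __init__(self, cid, properties):
        self.cid = cid
        self._properties = dict(properties)

    def __getattr__(self, name):
        # Runs only when normal attribute lookup fails, so `cid` and
        # `_properties` never pass through here.
        try:
            return self.__dict__["_properties"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __repr__(self):
        return f"CompoundSketch(cid={self.cid})"

    def as_dict(self):
        """Return the CID together with all stored properties."""
        return {"CID": self.cid, **self._properties}


# Property values for aspirin (CID 2244), supplied by hand so the sketch runs offline.
aspirin = CompoundSketch(
    2244,
    {"IUPACName": "2-acetyloxybenzoic acid", "MolecularFormula": "C9H8O4"},
)
print(aspirin)
print(aspirin.IUPACName)
print(aspirin.as_dict())
```

Attribute-style access keeps call sites short, while `as_dict` gives a plain structure that is easy to hand to a tabular library.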
Getting Started:
To use PubChemQuery, install the package and import it into your Python script. The example snippets below give a quick start.
Install PubChemQuery with pip:
pip install PubChemQuery
Import the package as:
import pubchemquery as pcq
Use the functions to retrieve data:
# get all cids matching a molecular formula
cids = pcq.get_cids_by_formula('C6H6')
print(type(cids), len(cids))
# get a cid by inchi
cid = pcq.get_cid_by_inchi(
    'InChI=1S/C6H5NO3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8H')
print(cid)
# get a cid by name
cid = pcq.get_cid_by_name('benzene')
print(cid)
# get all cids by name
cids = pcq.get_cids_by_name('benzene')
print(type(cids), len(cids))
# get 2d image
# by cid
image = pcq.get_image_by_cid('241')
image
# by name
image = pcq.get_image_by_name('benzene')
image
# by inchi
image = pcq.get_image_by_inchi(
    'InChI=1S/C6H5NO3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8H')
print(image)
# get sdf by cid
sdf = pcq.get_structure_by_cid('241')
print(sdf)
# get sdf by name
sdf = pcq.get_structure_by_name('benzene')
print(sdf)
# get similar structure cids
# by cid
# cids = pcq.get_similar_structures_cids_by_compound_id('241')
# by SMILES
# cids = pcq.get_similar_structures_cids_by_compound_id(
#     'C1=CC=CC=C1', compound_id='SMILES')
# by InChI
cids = pcq.get_similar_structures_cids_by_compound_id(
    'InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H', compound_id='InChI')
print(type(cids), len(cids))
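When looking up many names at once, a single failed query should not stop the whole run. The sketch below wraps any lookup function so that failures are recorded as None. `lookup_many` and `fake_lookup` are hypothetical helpers, and the idea that a failed PubChemQuery call raises an exception is an assumption, since this README does not describe error behavior.

```python
def lookup_many(names, lookup):
    """Apply `lookup` to each name; map a name to None if its lookup raises.

    Hypothetical helper, not part of PubChemQuery.
    """
    results = {}
    for name in names:
        try:
            results[name] = lookup(name)
        except Exception:
            # Assumption: a failed query surfaces as an exception.
            results[name] = None
    return results


# Stand-in lookup so the sketch runs offline; with the package installed,
# pcq.get_cid_by_name could be passed instead.
def fake_lookup(name):
    known = {"benzene": 241}
    return known[name]  # raises KeyError for unknown names


print(lookup_many(["benzene", "not-a-compound"], fake_lookup))
```

Passing the lookup function as an argument keeps the helper independent of any one function, so the same loop could serve name, formula, or structure queries.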
Make a compound and then get its properties:
# make a compound
# by cid
cid = 2244
# compound = pcq.compound(cid)
# by name
name = '2-acetyloxybenzoic acid'
compound = pcq.compound(name)
print(compound)
# properties
# InChI
print(compound.InChI)
# InChIKey
print(compound.InChIKey)
# IUPACName
print(compound.IUPACName)
# similar structure cids
print(len(compound.similar_structure_cids))
# image
compound.image
# dataframe
compound.prop_df()
For any questions, contact me on LinkedIn.