Skip to content

Latest commit

 

History

History
169 lines (131 loc) · 4.75 KB

README.md

File metadata and controls

169 lines (131 loc) · 4.75 KB

PubChemQuery

Downloads PyPI Python Version License Open In Colab

PubChemQuery: A Python Package for Accessing Chemical Information from PubChem.

PubChemQuery is a Python package that provides a simple and intuitive API for retrieving chemical information from the PubChem database. With this package, you can easily fetch chemical data, including:

  • CID (Compound ID) by name
  • All CIDs by name
  • 2D images by CID or name
  • SDF (Structure Data File) by CID or name
  • Compound properties, including:
    • Molecular formula and weight
    • SMILES and InChI representations
    • IUPAC name and title
    • Physicochemical properties (e.g., XLogP, exact mass, TPSA)
    • Structural features (e.g., bond and atom counts, stereochemistry)
    • 3D properties (e.g., volume, steric quadrupole moments, feature counts)
    • Fingerprint and conformer information

The package offers a straightforward interface, allowing users to access PubChem data with minimal code. Whether you're a chemist, researcher, or developer, PubChemQuery simplifies the process of integrating chemical information into your projects.

Key Features:

Retrieve chemical data by name or CID Access 2D images and SDF files Get compound properties, including physicochemical, structural, and 3D features Easy-to-use API with minimal code required

Simple and Concise API:

There are functions that perform all of the above-mentioned tasks, making it easy to integrate PubChem data into your projects:

  • get_cid_by_inchi(inchi): Get a CID by InChI
  • get_cids_by_formula(formula): Get CIDs by formula
  • get_cid_by_name(name): Get CID by name
  • get_cids_by_name(name): Get all CIDs by name
  • get_image_by_cid(cid): Get 2D image by CID
  • get_image_by_name(name): Get 2D image by name
  • get_image_by_inchi(inchi): Get 2D image by InChI
  • get_structure_by_cid(cid): Get SDF by CID
  • get_structure_by_name(name): Get SDF by name
  • get_similar_structures_cids_by_compound_id(cid/SMILES/InChI): Get similar structures CIDs by cid, SMILES, InChI

Compound Object: The package also includes a Compound object that encapsulates the retrieved data, providing a convenient way to access and manipulate the data.

  • compound(cid_or_name): Create a compound object with properties and methods

Getting Started:

To use PubChemQuery, simply install the package and import it into your Python script. Refer to the example code snippets above for a quick start.

Installation

Install PubChemQuery with pip

  pip install PubChemQuery

Examples

Import package as:

import pubchemquery as pcq

Use the functions to retrieve data:

# get a cid by formula
cid = pcq.get_cids_by_formula('C6H6')
print(type(cid), len(cid))
# get a cid by inchi
cid = pcq.get_cid_by_inchi(
    'InChI=1S/C6H5NO3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8H')
print(cid)
# get a cid by name
cid = pcq.get_cid_by_name('benzene')
print(cid)
# get all cids by name
cids = pcq.get_cids_by_name('benzene')
print(type(cids), len(cids))
# get 2d image
# by cid
image = pcq.get_image_by_cid('241')
image

# by name
image = pcq.get_image_by_name('benzene')
image

# by inchi
image = pcq.get_image_by_inchi(
    'InChI=1S/C6H5NO3/c8-6-3-1-5(2-4-6)7(9)10/h1-4,8H')
print(image)
# get sdf by cid
sdf = pcq.get_structure_by_cid('241')
print(sdf)
# get sdf by name
sdf = pcq.get_structure_by_name('benzene')
print(sdf)
# get similar structure cids by cid
# cids = pcq.get_similar_structures_cids_by_compound_id('241')
# cids = pcq.get_similar_structures_cids_by_compound_id(
#     'C1=CC=CC=C1', compound_id='SMILES')
cids = pcq.get_similar_structures_cids_by_compound_id(
    'InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H', compound_id='InChI')
print(type(cids), len(cids))

Make a compound and then get its properties:

# make a compound
cid = 2244
# compound = pcq.compound(cid)
# name
name = '2-acetyloxybenzoic acid'
compound = pcq.compound(name)
print(compound)
# properties
# InChI
print(compound.InChI)
# InChIKey
print(compound.InChIKey)
# IUPACName
print(compound.IUPACName)
# similar structure cids
print(len(compound.similar_structure_cids))
# image
compound.image
# dataframe
compound.prop_df()

FAQ

For any question, contact me on LinkedIn

Authors