Showing 18 changed files with 172 additions and 0 deletions.
# Python Summer School 2024

## Substructure search

Substructure search is a core tool in cheminformatics: it finds the compounds whose structures contain a given structural fragment. It is widely applied in drug discovery, materials science, and environmental research, where it helps scientists and engineers identify compounds with desired properties, predict reactivity, and understand the mechanisms of chemical reactions.

Modern chemical compound databases contain millions of entries, so scanning them naively is slow. Substructure search algorithms are based on graph theory: a molecule is treated as a graph with atoms as nodes and bonds as edges, and the query fragment is looked up as a subgraph of each candidate. Query patterns are commonly written in SMARTS (SMILES ARbitrary Target Specification), a pattern language that extends SMILES.

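The graph view of substructure search can be sketched in plain Python. This is a toy backtracking matcher, not the algorithm RDKit or any production toolkit uses; the `make_graph` and `find_match` helpers and the hand-written heavy-atom graphs are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Graph = Tuple[List[str], Dict[Tuple[int, int], int]]


def make_graph(atoms: List[str], bonds: Dict[Tuple[int, int], int]) -> Graph:
    """Store each bond in both directions so lookups are symmetric."""
    symmetric = {}
    for (a, b), order in bonds.items():
        symmetric[(a, b)] = order
        symmetric[(b, a)] = order
    return atoms, symmetric


def find_match(query: Graph, target: Graph) -> Optional[List[int]]:
    """Return the target atom index for each query atom, or None if no match."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    mapping: List[int] = []  # mapping[i] is the target atom matched to query atom i

    def extend() -> bool:
        q = len(mapping)
        if q == len(q_atoms):
            return True  # every query atom has been placed
        for t in range(len(t_atoms)):
            if t in mapping or t_atoms[t] != q_atoms[q]:
                continue
            # Every query bond from q to an already placed atom must exist
            # in the target with the same bond order.
            consistent = all(
                t_bonds.get((t, mapping[p])) == order
                for (a, p), order in q_bonds.items()
                if a == q and p < q
            )
            if consistent:
                mapping.append(t)
                if extend():
                    return True
                mapping.pop()  # backtrack and try the next candidate
        return False

    return list(mapping) if extend() else None


# Heavy-atom graphs written by hand (hydrogens omitted).
carboxyl = make_graph(["C", "O", "O"], {(0, 1): 2, (0, 2): 1})
acetic_acid = make_graph(["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1})
ethanol = make_graph(["C", "C", "O"], {(0, 1): 1, (1, 2): 1})

print(find_match(carboxyl, acetic_acid))  # [1, 2, 3]
print(find_match(carboxyl, ethanol))      # None
```

The matcher tries every placement in the worst case, which is why real toolkits rely on heavily optimized matching and pre-filtering when searching millions of compounds.
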
## SMILES

A key element in the representation of chemical structures is the Simplified Molecular Input Line Entry System (SMILES). SMILES is a notation that represents a chemical structure as a short line of text that computers can process easily. These strings can then be used for various computational analyses, including substructure searches. The simplicity and compactness of SMILES make it a widely adopted standard in cheminformatics.

Here are some examples of SMILES notation (hydrogen atoms are usually implicit):

- Water (H₂O): O
- Methane (CH₄): C
- Ethanol (C₂H₅OH): CCO
- Benzene (C₆H₆): c1ccccc1
- Acetic acid (CH₃COOH): CC(=O)O
- Aspirin (C₉H₈O₄): CC(=O)Oc1ccccc1C(=O)O

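As a quick check of the strings above, the following sketch parses each one with RDKit and prints the molecular formula RDKit derives from it. It assumes RDKit is installed; the `examples` and `formulas` names are illustrative.

```python
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

# The SMILES strings from the list above.
examples = {
    "water": "O",
    "methane": "C",
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "acetic acid": "CC(=O)O",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

formulas = {}
for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)  # implicit hydrogens are filled in by RDKit
    formulas[name] = CalcMolFormula(mol)
    print(f"{name:12s} {smiles:24s} {formulas[name]}")
```
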
## Examples of Substructure Search

Below are some examples of substructure searches with visual representations:

1. Searching for the Benzene Ring:
   - Substructure (Benzene): c1ccccc1

     <img title="a title" alt="Alt text" src="./images/c1ccccc1.png">

   - Example of Found Compound (Toluene): Cc1ccccc1

     <img title="a title" alt="Alt text" src="./images/Cc1ccccc1.png">

2. Searching for a Carboxylic Acid Group:
   - Substructure (Carboxylic Acid): C(=O)O

     <img title="a title" alt="Alt text" src="./images/C(=O)O.png">

   - Example of Found Compound (Acetic Acid): CC(=O)O

     <img title="a title" alt="Alt text" src="./images/CC(=O)O.png">

These examples illustrate how substructure searches can be used to find compounds containing specific functional groups or structural motifs. By using SMILES notation and cheminformatics tools, researchers can efficiently identify and study compounds of interest.

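The two searches above can be reproduced with RDKit roughly as follows. This is a minimal sketch that assumes RDKit is installed; the variable names are illustrative.

```python
from rdkit import Chem

# Query fragments and the compounds from the examples above.
benzene = Chem.MolFromSmiles("c1ccccc1")
toluene = Chem.MolFromSmiles("Cc1ccccc1")
carboxyl = Chem.MolFromSmiles("C(=O)O")
acetic_acid = Chem.MolFromSmiles("CC(=O)O")

# HasSubstructMatch answers yes or no.
toluene_has_ring = toluene.HasSubstructMatch(benzene)
print("Benzene ring in toluene:", toluene_has_ring)

# GetSubstructMatch returns the indices of the matching atoms,
# listed in the order of the query atoms.
ring_atoms = toluene.GetSubstructMatch(benzene)
acid_atoms = acetic_acid.GetSubstructMatch(carboxyl)
print("Ring atoms in toluene:", sorted(ring_atoms))
print("Carboxyl atoms in acetic acid:", acid_atoms)
```

Atom indices follow the order in which atoms appear in the SMILES string, so the methyl carbon of toluene is atom 0 and the ring occupies atoms 1 to 6.
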
## Homework

Over the course of the homework assignments, we will build a web service for storing chemical compounds and running substructure searches over them:

- Implement substructure search with the **RDKit** library

<img title="a title" alt="Alt text" src="./images/1.png">

- Build a RESTful API with **FastAPI**

<img title="a title" alt="Alt text" src="./images/2.png">

- Containerize the solution with **Docker**

<img title="a title" alt="Alt text" src="./images/3.png">

- Add tests with **pytest**

<img title="a title" alt="Alt text" src="./images/4.png">

- Set up **CI / CD**

<img title="a title" alt="Alt text" src="./images/5.png">

- Add a **database** for storing molecules

<img title="a title" alt="Alt text" src="./images/6.png">

- Add **logging**

<img title="a title" alt="Alt text" src="./images/7.png">

- Add caching with **Redis** to optimize queries

<img title="a title" alt="Alt text" src="./images/8.png">

- Move long-running queries to background tasks with **Celery**

<img title="a title" alt="Alt text" src="./images/9.png">

Every week we will add tasks to the folder **hw**.
Delivery details are available in the **README.md** in the **hw** folder.
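
To give a feel for the caching step in the list above, here is a minimal sketch of the cache-aside pattern. A plain dictionary stands in for Redis, and `make_cached_search` and `fake_search` are hypothetical names invented for this illustration; a real setup would use a Redis client and decide when cached entries expire.

```python
import json
from typing import Callable, Dict, List

SearchFn = Callable[[List[str], str], List[str]]


def make_cached_search(search: SearchFn, cache: Dict[str, str]) -> SearchFn:
    """Wrap `search` so that repeated queries are answered from `cache`."""

    def cached(molecules: List[str], query: str) -> List[str]:
        # Serialize the arguments into a string key, as a key-value store expects.
        key = "search:" + json.dumps([molecules, query])
        if key in cache:
            return json.loads(cache[key])  # cache hit: skip the expensive search
        result = search(molecules, query)
        cache[key] = json.dumps(result)    # cache miss: compute, then store
        return result

    return cached


calls = []


def fake_search(molecules: List[str], query: str) -> List[str]:
    """Placeholder for a real substructure search; records each call."""
    calls.append(query)
    return [m for m in molecules if m == query]  # not real chemistry


cache: Dict[str, str] = {}  # stand-in for a Redis instance
cached_search = make_cached_search(fake_search, cache)

first = cached_search(["CCO", "c1ccccc1"], "c1ccccc1")
second = cached_search(["CCO", "c1ccccc1"], "c1ccccc1")
print(first, second, "underlying calls:", len(calls))
```

The second call is served from the cache, so the wrapped function runs only once.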
# Homework 4
## Introduction to RDKit

RDKit is an open-source cheminformatics library that provides a wide range of tools for representing, manipulating, and visualizing molecules. It is widely used in both academia and industry for tasks such as molecular modeling, chemical database searching, and molecular property prediction. One of RDKit's key strengths is its Python interface, which makes it accessible and flexible for a variety of applications.

In this introduction, we will explore the basic functionality of RDKit, focusing on how to represent, manipulate, and visualize chemical structures using Python. We will cover the following topics:

1. **Installation of RDKit**
2. **Basic molecular representations**
3. **Substructure search**
4. **Molecular visualization**

### 1. Installation of RDKit

To install [RDKit](https://www.rdkit.org/docs/Install.html), run the following commands in your terminal. A conda environment is recommended because it makes dependencies easier to manage:

```sh
conda create -c conda-forge -n my-rdkit-env rdkit
conda activate my-rdkit-env
```
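
Once the environment is active, a short script like the one below can confirm that the installation works. It is only a sanity check. If you prefer not to use conda, RDKit is also published on PyPI as `rdkit`.

```python
import rdkit
from rdkit import Chem

# Print the installed version and parse a trivial molecule (water).
print("RDKit version:", rdkit.__version__)

water = Chem.MolFromSmiles("O")
print("Parsed water:", water is not None)
```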

### 2. Basic Molecular Representations

RDKit allows you to create and manipulate molecular structures easily. Here's how you can create a molecule from a SMILES string and visualize it:

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Create a molecule from a SMILES string
smiles = "CCO"  # Ethanol
molecule = Chem.MolFromSmiles(smiles)

# Draw the molecule; MolToImage returns a PIL image
image = Draw.MolToImage(molecule)
image.show()
```
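
Beyond drawing, a parsed molecule can be inspected and written back out as text. The sketch below shows a few such calls and what happens with an invalid string; it assumes RDKit is installed, and the variable names are illustrative.

```python
from rdkit import Chem

ethanol = Chem.MolFromSmiles("CCO")

# Hydrogens are implicit, so only heavy atoms are counted.
heavy_atoms = ethanol.GetNumAtoms()
symbols = [atom.GetSymbol() for atom in ethanol.GetAtoms()]
print(heavy_atoms, symbols)

# Different SMILES strings for the same molecule give the same canonical SMILES.
same_molecule = Chem.MolToSmiles(Chem.MolFromSmiles("OCC")) == Chem.MolToSmiles(ethanol)
print("OCC and CCO are the same molecule:", same_molecule)

# MolFromSmiles returns None when the string cannot be parsed,
# here because the ring opened with "1" is never closed.
broken = Chem.MolFromSmiles("C1CC")
print("Invalid SMILES parsed to:", broken)
```

Checking for `None` before using a molecule avoids confusing errors later, especially when the SMILES strings come from user input.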

### 3. Substructure Search

RDKit can be used to perform substructure searches, identifying specific functional groups or substructures within a molecule.

```python
from rdkit import Chem

# Define the molecule and the substructure to search for
benzene = Chem.MolFromSmiles("c1ccccc1")
ethanol = Chem.MolFromSmiles("CCO")

# Perform the substructure search: does ethanol contain a benzene ring?
match = ethanol.HasSubstructMatch(benzene)
print("Benzene ring found in ethanol:", match)  # False
```

### 4. Molecular Visualization

RDKit provides several options for visualizing molecules. You can visualize individual molecules or draw multiple molecules in a grid.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Create a list of molecules
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"]
molecules = [Chem.MolFromSmiles(smiles) for smiles in smiles_list]

# Draw the molecules in a grid
img = Draw.MolsToGridImage(molecules, molsPerRow=2, subImgSize=(200, 200), returnPNG=False)
img.show()
```

## Substructure search
Now you have everything you need to implement the substructure search function:

<img title="a title" alt="Alt text" src="../../images/1.png">

Implement a function in the file /src/main.py that takes two arguments:
1. A list of molecules, given as SMILES strings
2. A substructure, also given as a SMILES string, used to filter the list: the function returns only the molecules that contain it

```python
substructure_search(["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"], "c1ccccc1")
# returns ["c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
```
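
One possible shape for such a function is sketched below. It is a starting point rather than a reference solution: raising on an invalid query and silently skipping unparsable molecules are assumptions, and your own version may handle those cases differently.

```python
from typing import Iterable, List

from rdkit import Chem


def substructure_search(mols: Iterable[str], mol: str) -> List[str]:
    """Return the SMILES strings in `mols` that contain `mol` as a substructure."""
    query = Chem.MolFromSmiles(mol)
    if query is None:
        # Assumption: an unparsable query is treated as a caller error.
        raise ValueError(f"Invalid query SMILES: {mol!r}")

    matches = []
    for smiles in mols:
        candidate = Chem.MolFromSmiles(smiles)
        # Assumption: unparsable molecules are skipped rather than reported.
        if candidate is not None and candidate.HasSubstructMatch(query):
            matches.append(smiles)
    return matches


result = substructure_search(
    ["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"], "c1ccccc1"
)
print(result)
```

The query is parsed once outside the loop, so the cost per candidate is one parse and one match.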
# Homework
def substructure_search(mols, mol):
    """Return the molecules in mols that contain mol as a substructure."""
    pass