Commit 02d3e8a

first hw

1 parent a77013c commit 02d3e8a

18 files changed (+172, -0 lines)

README.md

Lines changed: 87 additions & 0 deletions
@@ -1 +1,88 @@
# Python Summer School 2024

## Substructure search

Substructure search of chemical compounds is a crucial tool in cheminformatics, enabling researchers to identify and analyze chemical structures containing specific substructures. This method is widely applied in various fields of chemistry, including drug discovery, materials science, and environmental research. Substructure search helps scientists and engineers identify compounds with desired properties, predict reactivity, and understand the mechanisms of chemical reactions.

Modern chemical compound databases contain millions of entries, making traditional search methods inefficient and time-consuming. Substructure search uses algorithms that quickly and accurately identify compounds containing a specified structural fragment. These algorithms are based on graph theory: each molecule is treated as a graph whose nodes are atoms and whose edges are bonds, and a match means the query graph is contained in the molecule's graph. Query fragments are often written as SMARTS (SMILES Arbitrary Target Specification) patterns.

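To make the graph view concrete, here is a minimal sketch, assuming RDKit is installed, that converts a SMILES string into plain Python nodes (atoms) and edges (bonds with their bond order). The helper name `to_graph` is hypothetical and only used for illustration.

```python
from rdkit import Chem


def to_graph(smiles: str) -> tuple[dict[int, str], list[tuple[int, int, float]]]:
    """Hypothetical helper: describe a molecule as a graph of atoms (nodes) and bonds (edges)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for unparsable input
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    # Nodes: atom index -> element symbol (implicit hydrogens are not included)
    nodes = {atom.GetIdx(): atom.GetSymbol() for atom in mol.GetAtoms()}
    # Edges: (begin atom, end atom, bond order); aromatic bonds have order 1.5
    edges = [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondTypeAsDouble())
        for bond in mol.GetBonds()
    ]
    return nodes, edges


nodes, edges = to_graph("CC(=O)O")  # acetic acid
print(nodes)  # {0: 'C', 1: 'C', 2: 'O', 3: 'O'}
print(edges)  # [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 1.0)]
```

A substructure search then asks whether the query's graph can be mapped onto part of the target's graph (a subgraph isomorphism problem). RDKit performs this matching internally, so in practice you rarely build these structures yourself.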
## SMILES

A key element in representing chemical structures is the Simplified Molecular Input Line Entry System (SMILES). SMILES is a line notation that encodes a molecular structure as a single short text string that computers can easily store and process. These strings can then be used for various computational analyses, including substructure searches. Hydrogen atoms are usually left implicit: a toolkit reading the string adds them back according to each atom's standard valence. The simplicity and efficiency of SMILES have made it a widely adopted standard in cheminformatics.

Here are some examples of SMILES notation:

- Water (H₂O): `O`
- Methane (CH₄): `C`
- Ethanol (C₂H₅OH): `CCO`
- Benzene (C₆H₆): `c1ccccc1`
- Acetic acid (CH₃COOH): `CC(=O)O`
- Aspirin (C₉H₈O₄): `CC(=O)Oc1ccccc1C(=O)O`

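Since hydrogens are implicit, one way to check these examples is to parse each string with RDKit and compute its molecular formula. This is a minimal sketch, assuming RDKit is installed. `CalcMolFormula` writes formulas in Hill order (carbon, then hydrogen, then other elements alphabetically), so ethanol comes out as C2H6O rather than C₂H₅OH.

```python
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

examples = {
    "Water": "O",
    "Methane": "C",
    "Ethanol": "CCO",
    "Benzene": "c1ccccc1",
    "Acetic acid": "CC(=O)O",
    "Aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

formulas = {}
for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)  # implicit hydrogens are inferred from standard valences
    formulas[name] = CalcMolFormula(mol)
    print(f"{name:12} {smiles:25} {formulas[name]}")
```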
## Examples of Substructure Search

Below are some examples of substructure searches with visual representations:

1. Searching for the benzene ring:
   - Substructure (benzene): `c1ccccc1`

     <img title="a title" alt="Alt text" src="./images/c1ccccc1.png">

   - Example of a matching compound (toluene): `Cc1ccccc1`

     <img title="a title" alt="Alt text" src="./images/Cc1ccccc1.png">

2. Searching for a carboxylic acid group:
   - Substructure (carboxylic acid): `C(=O)O`

     <img title="a title" alt="Alt text" src="./images/C(=O)O.png">

   - Example of a matching compound (acetic acid): `CC(=O)O`

     <img title="a title" alt="Alt text" src="./images/CC(=O)O.png">

These examples illustrate how substructure searches can be used to find compounds containing specific functional groups or structural motifs. By using SMILES notation and cheminformatics tools, researchers can efficiently identify and study compounds of interest.

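The two searches above can be reproduced with RDKit in a few lines. A minimal sketch, assuming RDKit is installed: `GetSubstructMatch` returns a tuple of target-atom indices in which position *i* holds the atom matched to query atom *i*; an empty tuple means there is no match.

```python
from rdkit import Chem

# (query SMILES, target SMILES) pairs taken from the examples above
searches = [
    ("c1ccccc1", "Cc1ccccc1"),  # benzene ring in toluene
    ("C(=O)O", "CC(=O)O"),      # carboxylic acid group in acetic acid
]

results = []
for query_smiles, target_smiles in searches:
    query = Chem.MolFromSmiles(query_smiles)
    target = Chem.MolFromSmiles(target_smiles)
    # Position i of the tuple is the target atom matched to query atom i
    match = target.GetSubstructMatch(query)
    results.append(match)
    print(f"{query_smiles} in {target_smiles}: matched target atoms {match}")
```

In toluene the benzene query maps onto the six ring atoms (indices 1 to 6), leaving the methyl carbon (index 0) unmatched.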
## Homework

As part of the homework assignments, we will build a web service for storing chemical compounds and searching them by substructure:

- Use the **RDKit** library to implement substructure search

  <img title="a title" alt="Alt text" src="./images/1.png">

- Build a RESTful API using **FastAPI**

  <img title="a title" alt="Alt text" src="./images/2.png">

- Containerize the solution using **Docker**

  <img title="a title" alt="Alt text" src="./images/3.png">

- Add tests using **pytest**

  <img title="a title" alt="Alt text" src="./images/4.png">

- Set up **CI/CD**

  <img title="a title" alt="Alt text" src="./images/5.png">

- Add a **database** for storing molecules

  <img title="a title" alt="Alt text" src="./images/6.png">

- Add **logging**

  <img title="a title" alt="Alt text" src="./images/7.png">

- Add caching with **Redis** to speed up repeated queries

  <img title="a title" alt="Alt text" src="./images/8.png">

- Use **Celery** to offload long-running queries to background workers

  <img title="a title" alt="Alt text" src="./images/9.png">

Every week, new tasks will be added to the **hw** folder. Submission details are available in the **README.md** in the **hw** folder.

hw/1/README.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Homework 1

## Introduction to RDKit

RDKit is an open-source cheminformatics library that provides a wide range of tools for representing, manipulating, and visualizing molecules. It is widely used in both academia and industry for tasks such as molecular modeling, chemical database searching, and molecular property prediction. Its core is written in C++, and its Python bindings make it accessible and flexible for a variety of applications.

In this introduction, we will explore the basic functionality of RDKit, focusing on how to represent, manipulate, and visualize chemical structures using Python. We will cover the following topics:

1. **Installation of RDKit**
2. **Basic molecular representations**
3. **Substructure search**
4. **Molecular visualization**

### 1. Installation of RDKit

To install [RDKit](https://www.rdkit.org/docs/Install.html), run the following commands in your terminal. Using a dedicated conda environment is recommended because it keeps RDKit's dependencies isolated and easy to manage:

```sh
conda create -c conda-forge -n my-rdkit-env rdkit
conda activate my-rdkit-env
```

RDKit is also published on PyPI, so `pip install rdkit` inside a virtual environment is an alternative.

### 2. Basic Molecular Representations

RDKit lets you create and manipulate molecular structures easily. Here's how to create a molecule from a SMILES string and draw it:

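```python
from rdkit import Chem
from rdkit.Chem import Draw

# Create a molecule from a SMILES string
smiles = "CCO"  # Ethanol
molecule = Chem.MolFromSmiles(smiles)
if molecule is None:  # MolFromSmiles returns None for an invalid SMILES string
    raise ValueError(f"Could not parse SMILES: {smiles}")

# Draw the molecule (returns a PIL image; Jupyter displays it automatically)
img = Draw.MolToImage(molecule)
img.save("ethanol.png")
```
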
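Beyond drawing, a parsed molecule can be inspected atom by atom and written back out as SMILES. A minimal sketch; the helper name `describe` is hypothetical. `Chem.MolToSmiles` produces canonical SMILES by default, so different spellings of the same molecule (here `OCC` and `CCO` for ethanol) yield the same string, which is useful for deduplicating stored molecules.

```python
from rdkit import Chem


def describe(smiles: str) -> dict:
    """Hypothetical helper: parse SMILES and collect a few basic properties."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for unparsable input
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    return {
        "canonical": Chem.MolToSmiles(mol),  # canonical SMILES (RDKit's default output)
        "num_atoms": mol.GetNumAtoms(),      # heavy atoms only; hydrogens are implicit
        "num_bonds": mol.GetNumBonds(),
        # (index, element, number of attached hydrogens) for every atom
        "atoms": [(a.GetIdx(), a.GetSymbol(), a.GetTotalNumHs()) for a in mol.GetAtoms()],
    }


print(describe("OCC"))
print(describe("CCO")["canonical"] == describe("OCC")["canonical"])  # True
```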
### 3. Substructure Search

RDKit can be used to perform substructure searches, identifying specific functional groups or substructures within a molecule.

```python
from rdkit import Chem

# The query substructure and two molecules to search in
benzene = Chem.MolFromSmiles("c1ccccc1")
ethanol = Chem.MolFromSmiles("CCO")
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

# HasSubstructMatch is called on the molecule being searched, with the query as its argument
print("Benzene ring found in ethanol:", ethanol.HasSubstructMatch(benzene))  # False
print("Benzene ring found in aspirin:", aspirin.HasSubstructMatch(benzene))  # True
```

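A query built with `MolFromSmiles` matches on elements and bonds but ignores hydrogen counts, so a pattern like `C(=O)O` also matches esters such as methyl acetate (and the ester group in aspirin). SMARTS patterns, parsed with `Chem.MolFromSmarts`, can state such constraints explicitly. A minimal sketch; `[CX3](=O)[OX2H1]` is one common way to write a carboxylic acid, not the only one.

```python
from rdkit import Chem

acetic_acid = Chem.MolFromSmiles("CC(=O)O")
methyl_acetate = Chem.MolFromSmiles("CC(=O)OC")  # an ester, not an acid

# A query parsed from SMILES ignores hydrogen counts
smiles_query = Chem.MolFromSmiles("C(=O)O")
# SMARTS: X = total connections, H1 = exactly one hydrogen on the single-bonded oxygen
smarts_query = Chem.MolFromSmarts("[CX3](=O)[OX2H1]")

for name, mol in [("acetic acid", acetic_acid), ("methyl acetate", methyl_acetate)]:
    print(
        f"{name}: SMILES query -> {mol.HasSubstructMatch(smiles_query)}, "
        f"SMARTS query -> {mol.HasSubstructMatch(smarts_query)}"
    )
```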
### 4. Molecular Visualization

RDKit provides several options for visualizing molecules: you can draw individual molecules or draw several molecules in a grid.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Create a list of molecules
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"]
molecules = [Chem.MolFromSmiles(smiles) for smiles in smiles_list]

# Draw the molecules in a grid, two per row, labeled with their SMILES
img = Draw.MolsToGridImage(
    molecules, molsPerRow=2, subImgSize=(200, 200), legends=smiles_list, returnPNG=False
)
img.show()  # opens the image in the default image viewer
```

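Drawing and searching can be combined: `MolsToGridImage` also accepts a `highlightAtomLists` argument, so the atoms found by a substructure search can be highlighted in each grid cell. A minimal sketch, assuming Pillow is installed as in the grid example above.

```python
from rdkit import Chem
from rdkit.Chem import Draw

smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"]
molecules = [Chem.MolFromSmiles(smiles) for smiles in smiles_list]
query = Chem.MolFromSmiles("c1ccccc1")

# Atoms matched by the query in each molecule (an empty list means no match)
highlights = [list(mol.GetSubstructMatch(query)) for mol in molecules]

img = Draw.MolsToGridImage(
    molecules,
    molsPerRow=2,
    subImgSize=(200, 200),
    legends=smiles_list,
    highlightAtomLists=highlights,
    returnPNG=False,
)
# img is a PIL image: call img.show() or img.save(...) to view it
```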
## Substructure search

Now you have everything you need to implement the substructure search function:

<img title="a title" alt="Alt text" src="../../images/1.png">

Implement a function `substructure_search` in the file `src/main.py` that takes two arguments:

1. A list of molecules, given as SMILES strings
2. A query molecule, also given as a SMILES string

The function should return the molecules from the list that contain the query molecule as a substructure.

```python
substructure_search(["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"], "c1ccccc1")
# -> ["c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
```
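For reference, one possible sketch of `substructure_search`. It assumes both arguments are SMILES strings as in the example, keeps the input order, and raises `ValueError` on invalid SMILES; the task statement does not specify error handling, so that last choice is an assumption.

```python
from rdkit import Chem


def substructure_search(mols: list[str], mol: str) -> list[str]:
    """Return the SMILES strings from `mols` that contain `mol` as a substructure."""
    query = Chem.MolFromSmiles(mol)
    if query is None:
        raise ValueError(f"Invalid query SMILES: {mol!r}")

    matches = []
    for smiles in mols:
        candidate = Chem.MolFromSmiles(smiles)
        if candidate is None:
            raise ValueError(f"Invalid SMILES in input list: {smiles!r}")
        # Keep the original string (not a canonicalized one) so the output matches the input
        if candidate.HasSubstructMatch(query):
            matches.append(smiles)
    return matches
```

Parsing the query once, outside the loop, avoids repeating that work for every molecule in the list.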

hw/README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# Homework

images/1.png

114 KB

images/2.png

97.6 KB

images/3.png

114 KB

images/4.png

120 KB

images/5.png

146 KB

images/6.png

202 KB

images/7.png

257 KB

images/8.png

290 KB

images/9.png

305 KB

images/C(=O)O.png

42 KB

images/CC(=O)O.png

22.2 KB

images/Cc1ccccc1.png

28.2 KB

images/c1ccccc1.png

25.2 KB

src/__init__.py

Whitespace-only changes.

src/main.py

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
def substructure_search(mols, mol):
    pass
