first hw
JDima committed Jul 11, 2024
1 parent a77013c commit 02d3e8a
Showing 18 changed files with 172 additions and 0 deletions.
87 changes: 87 additions & 0 deletions README.md
@@ -1 +1,88 @@
# Python Summer School 2024

## Substructure search

Substructure search of chemical compounds is a crucial tool in cheminformatics, enabling researchers to identify and analyze chemical structures containing specific substructures. This method is widely applied in various fields of chemistry, including drug discovery, materials science, and environmental research. Substructure search helps scientists and engineers identify compounds with desired properties, predict reactivity, and understand the mechanisms of chemical reactions.

Modern chemical compound databases contain millions of entries, so inspecting structures one by one is impractical. Substructure search relies on graph theory: each molecule is treated as a graph of atoms (nodes) and bonds (edges), and the search looks for compounds whose graph contains the query fragment. Queries are commonly written in SMARTS (SMiles ARbitrary Target Specification), a pattern language that extends SMILES.
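
The graph view can be illustrated with a toy matcher in plain Python. This is only a sketch under strong simplifying assumptions: atoms are bare element symbols, bonds are unordered index pairs, and bond orders, aromaticity, and hydrogens are ignored. The function name `has_substructure` and the data layout are hypothetical and are not how a cheminformatics library represents molecules.

```python
def has_substructure(target_atoms, target_bonds, query_atoms, query_bonds):
    """Return True if the query graph can be embedded in the target graph."""
    target_adj = {frozenset(bond) for bond in target_bonds}
    query_adj = {frozenset(bond) for bond in query_bonds}

    def extend(mapping):
        # mapping[i] is the target atom index assigned to query atom i
        q = len(mapping)
        if q == len(query_atoms):
            return True  # every query atom has been placed
        for t in range(len(target_atoms)):
            if t in mapping or target_atoms[t] != query_atoms[q]:
                continue
            # Each bond from q to an already placed query atom
            # must also exist between the mapped target atoms.
            bonds_ok = all(
                frozenset((mapping[p], t)) in target_adj
                for p in range(q)
                if frozenset((p, q)) in query_adj
            )
            if bonds_ok and extend(mapping + [t]):
                return True
        return False  # no candidate worked, backtrack

    return extend([])


# Heavy-atom graphs for ethanol (C-C-O) and acetic acid (C-C(-O)-O)
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
acetic_acid = (["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)])
o_c_o = (["O", "C", "O"], [(0, 1), (1, 2)])  # query: O-C-O fragment

print(has_substructure(*acetic_acid, *o_c_o))  # True
print(has_substructure(*ethanol, *o_c_o))      # False
```

The backtracking search tries every compatible assignment, which is fine for a handful of atoms but grows quickly with molecule size. Production toolkits use far more optimized matching.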

## SMILES

A key element in the representation of chemical structures is the Simplified Molecular Input Line Entry System (SMILES). SMILES is a notation that represents a chemical structure in a form that computers can process easily. It encodes each molecular structure as a single line of text, which can then be used for various computational analyses, including substructure searches. The simplicity and compactness of SMILES make it a widely adopted standard in cheminformatics.

Here are some examples of SMILES notation:

- Water (H₂O): O

- Methane (CH₄): C

- Ethanol (C₂H₅OH): CCO

- Benzene (C₆H₆): c1ccccc1

- Acetic acid (CH₃COOH): CC(=O)O

- Aspirin (C₉H₈O₄): CC(=O)Oc1ccccc1C(=O)O
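
The formulas in the list above can be cross-checked programmatically. The following is a minimal sketch, assuming the RDKit library (used later in the homework) is installed. The `examples` and `formulas` dictionaries are illustrative names only.

```python
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

# SMILES strings taken from the list above
examples = {
    "Water": "O",
    "Methane": "C",
    "Ethanol": "CCO",
    "Benzene": "c1ccccc1",
    "Acetic acid": "CC(=O)O",
    "Aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

formulas = {}
for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)      # hydrogens are implicit in SMILES
    formulas[name] = CalcMolFormula(mol)  # molecular formula incl. hydrogens
    print(f"{name}: {smiles} -> {formulas[name]}")
```

Note that the hydrogens never appear in these SMILES strings. They are inferred from the normal valence of each atom.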

## Examples of Substructure Search


Below are some examples of substructure searches with visual representations:

1. Searching for the Benzene Ring:
- Substructure (Benzene): c1ccccc1

<img title="a title" alt="Alt text" src="./images/c1ccccc1.png">

- Example of Found Compound (Toluene): Cc1ccccc1
<img title="a title" alt="Alt text" src="./images/Cc1ccccc1.png">

2. Searching for a Carboxylic Acid Group:
- Substructure (Carboxylic Acid): C(=O)O
<img title="a title" alt="Alt text" src="./images/C(=O)O.png">
- Example of Found Compound (Acetic Acid): CC(=O)O
<img title="a title" alt="Alt text" src="./images/CC(=O)O.png">

These examples illustrate how substructure searches can be used to find compounds containing specific functional groups or structural motifs. By using SMILES notation and cheminformatics tools, researchers can efficiently identify and study compounds of interest.

## Homework

As part of the homework assignments, we will build a web service for storing chemical compounds and running substructure searches on them.

- Use the **RDKit** library to implement substructure search

<img title="a title" alt="Alt text" src="./images/1.png">

- Build a RESTful API using **FastAPI**

<img title="a title" alt="Alt text" src="./images/2.png">

- Containerize the solution using **Docker**

<img title="a title" alt="Alt text" src="./images/3.png">

- Add tests using **pytest**

<img title="a title" alt="Alt text" src="./images/4.png">

- **CI / CD**

<img title="a title" alt="Alt text" src="./images/5.png">

- Add a **database** for storing molecules

<img title="a title" alt="Alt text" src="./images/6.png">

- **Logging**

<img title="a title" alt="Alt text" src="./images/7.png">

- Add caching using **Redis** to optimize queries

<img title="a title" alt="Alt text" src="./images/8.png">

- Add **Celery** to offload slow queries to background workers
<img title="a title" alt="Alt text" src="./images/9.png">

Every week we will add new tasks to the **hw** folder.
Delivery details are available in the **README.md** file in the **hw** folder.
82 changes: 82 additions & 0 deletions hw/1/README.md
@@ -0,0 +1,82 @@
# Homework 1
## Introduction to RDKit

RDKit is an open-source cheminformatics library that provides a wide range of tools for representing, manipulating, and visualizing molecules. It is widely used in both academia and industry for tasks such as molecular modeling, chemical database searching, and molecular property prediction. One of RDKit's key strengths is its Python API, which makes it accessible and flexible for a variety of applications.

In this introduction, we will explore the basic functionalities of RDKit, focusing on how to represent, manipulate, and visualize chemical structures using Python. We will cover the following topics:

1. **Installation of RDKit**
2. **Basic molecular representations**
3. **Substructure search**
4. **Molecular visualization**

### 1. Installation of RDKit

To install [RDKit](https://www.rdkit.org/docs/Install.html), you can use the following commands in your terminal. It's recommended to use a conda environment to manage dependencies easily:

```sh
conda create -c conda-forge -n my-rdkit-env rdkit
conda activate my-rdkit-env
```

### 2. Basic Molecular Representations

RDKit allows you to create and manipulate molecular structures easily. Here's how you can create a molecule from a SMILES string and visualize it:

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Create a molecule from a SMILES string
smiles = "CCO" # Ethanol
molecule = Chem.MolFromSmiles(smiles)

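# Draw the molecule; MolToImage returns a PIL image
image = Draw.MolToImage(molecule)
image.show()  # opens an image viewer; in a Jupyter notebook, end the cell with `image` instead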
```
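
One detail worth knowing when creating molecules: `Chem.MolFromSmiles` does not raise an exception for an unparsable string, it returns `None`. The sketch below wraps it in a small helper. The name `parse_smiles` is hypothetical and not part of RDKit.

```python
from rdkit import Chem


def parse_smiles(smiles):
    """Hypothetical helper: return an RDKit molecule or raise ValueError."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit signals a parse failure with None
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    return mol


ethanol = parse_smiles("OCC")      # same molecule as "CCO", written differently
print(Chem.MolToSmiles(ethanol))   # canonical SMILES: CCO
print(ethanol.GetNumAtoms())       # 3 heavy atoms; hydrogens are implicit

try:
    parse_smiles("c1ccccc")        # ring opened with "1" but never closed
except ValueError as error:
    print(error)
```

Converting input to canonical SMILES with `Chem.MolToSmiles` is a common way to recognize that two differently written strings describe the same molecule.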

### 3. Substructure Search

RDKit can be used to perform substructure searches, identifying specific functional groups or substructures within a molecule.

```python
from rdkit import Chem

# Define the molecule and the substructure to search for
benzene = Chem.MolFromSmiles("c1ccccc1")
ethanol = Chem.MolFromSmiles("CCO")

# Perform the substructure search: is the benzene ring contained in ethanol?
match = ethanol.HasSubstructMatch(benzene)
print("Benzene ring found in ethanol:", match)  # False: ethanol has no aromatic ring
```
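
Beyond a yes/no answer, RDKit can report which atoms matched, and a SMARTS pattern can make a query stricter than a plain SMILES fragment. The example below is a sketch. The SMARTS string `C(=O)[OH]` is one possible way to describe a protonated carboxylic acid group, chosen here for illustration.

```python
from rdkit import Chem

toluene = Chem.MolFromSmiles("Cc1ccccc1")
benzene = Chem.MolFromSmiles("c1ccccc1")

# Indices of the toluene atoms that map onto the benzene ring
ring_atoms = toluene.GetSubstructMatch(benzene)
print(sorted(ring_atoms))  # the six ring carbons; atom 0 is the methyl carbon

acid = Chem.MolFromSmiles("CC(=O)O")      # acetic acid
ester = Chem.MolFromSmiles("CCOC(C)=O")   # ethyl acetate

# A query built from SMILES ignores hydrogen counts,
# so the fragment C(=O)O is also found in esters.
loose_query = Chem.MolFromSmiles("C(=O)O")
print(acid.HasSubstructMatch(loose_query), ester.HasSubstructMatch(loose_query))

# A SMARTS query can require a hydrogen on the hydroxyl oxygen.
strict_query = Chem.MolFromSmarts("C(=O)[OH]")
print(acid.HasSubstructMatch(strict_query), ester.HasSubstructMatch(strict_query))
```

Whether the loose or the strict behaviour is wanted depends on the question being asked, so it helps to decide early which query format a search function accepts.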

### 4. Molecular Visualization

RDKit provides several options for visualizing molecules. You can visualize individual molecules or draw multiple molecules in a grid.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Create a list of molecules
smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"]
molecules = [Chem.MolFromSmiles(smiles) for smiles in smiles_list]

# Draw the molecules in a grid
img = Draw.MolsToGridImage(molecules, molsPerRow=2, subImgSize=(200, 200), returnPNG=False)
img.show()
```
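
Drawing and substructure search can be combined: the grid can show only the molecules that contain a fragment, with the matched atoms highlighted. This is a sketch that assumes the same four molecules as above. The output file name `benzene_hits.png` is arbitrary.

```python
from rdkit import Chem
from rdkit.Chem import Draw

smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"]
benzene = Chem.MolFromSmiles("c1ccccc1")

# Keep only the molecules that contain the benzene ring
hits = []
for smiles in smiles_list:
    mol = Chem.MolFromSmiles(smiles)
    if mol.HasSubstructMatch(benzene):
        hits.append((smiles, mol))

img = Draw.MolsToGridImage(
    [mol for _, mol in hits],
    molsPerRow=2,
    subImgSize=(200, 200),
    legends=[smiles for smiles, _ in hits],  # caption under each molecule
    highlightAtomLists=[list(mol.GetSubstructMatch(benzene)) for _, mol in hits],
    returnPNG=False,
)
img.save("benzene_hits.png")  # save to disk instead of opening a viewer
```

Saving the image rather than calling `img.show()` keeps the script usable on machines without a display, such as a container or a CI runner.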

## Substructure search
Now you have everything to implement the substructure search function:

<img title="a title" alt="Alt text" src="../../images/1.png">

Implement a function in the file `src/main.py` that takes two arguments:
1. A list of molecules, given as SMILES strings
2. A substructure, given as a SMILES string; the function returns the molecules from the list that contain this substructure

```python
substructure_search(["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"], "c1ccccc1")
# returns ["c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
```
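
For orientation, one possible shape of such a function is sketched below. It is not the reference solution. It assumes both arguments are SMILES strings, raises `ValueError` for an unparsable query, and silently skips unparsable entries in the list. Those error-handling choices are assumptions, and the assignment may expect different behaviour.

```python
from rdkit import Chem


def substructure_search(mols, mol):
    """Return the SMILES strings from `mols` that contain the substructure `mol`."""
    query = Chem.MolFromSmiles(mol)
    if query is None:
        raise ValueError(f"Invalid substructure SMILES: {mol!r}")

    matches = []
    for smiles in mols:
        candidate = Chem.MolFromSmiles(smiles)
        # Assumption: entries that fail to parse are skipped rather than reported
        if candidate is not None and candidate.HasSubstructMatch(query):
            matches.append(smiles)
    return matches


result = substructure_search(
    ["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"], "c1ccccc1"
)
print(result)
```

The function returns the original input strings rather than canonical SMILES, which keeps the output directly comparable with what the caller passed in.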
1 change: 1 addition & 0 deletions hw/README.md
@@ -0,0 +1 @@
# Homework
Binary file added images/1.png
Binary file added images/2.png
Binary file added images/3.png
Binary file added images/4.png
Binary file added images/5.png
Binary file added images/6.png
Binary file added images/7.png
Binary file added images/8.png
Binary file added images/9.png
Binary file added images/C(=O)O.png
Binary file added images/CC(=O)O.png
Binary file added images/Cc1ccccc1.png
Binary file added images/c1ccccc1.png
Empty file added src/__init__.py
Empty file.
2 changes: 2 additions & 0 deletions src/main.py
@@ -0,0 +1,2 @@
def substructure_search(mols, mol):
    pass
