|
| 1 | +# Homework 4 |
| 2 | +## Introduction to RDKit |
| 3 | + |
| 4 | +RDKit is an open-source cheminformatics library for representing, manipulating, and visualizing molecules. It is widely used in both academia and industry for tasks such as molecular modeling, chemical database searching, and molecular property prediction. Its core is written in C++ and exposed through Python bindings, which makes it accessible and flexible for a variety of applications. |
| 5 | + |
| 6 | +In this introduction, we will explore the basic functionalities of RDKit, focusing on how to represent, manipulate, and visualize chemical structures using Python. We will cover the following topics: |
| 7 | + |
| 8 | +1. **Installation of RDKit** |
| 9 | +2. **Basic molecular representations** |
| 10 | +3. **Substructure search** |
| 11 | +4. **Molecular visualization** |
| 12 | + |
| 13 | +### 1. Installation of RDKit |
| 14 | + |
| 15 | +To install [RDKit](https://www.rdkit.org/docs/Install.html), you can use the following commands in your terminal. It's recommended to use a conda environment to manage dependencies easily: |
| 16 | + |
| 17 | +```sh |
| 18 | +conda create -c conda-forge -n my-rdkit-env rdkit |
| 19 | +conda activate my-rdkit-env |
| 20 | +``` |
| 21 | + |
| 22 | +### 2. Basic Molecular Representations |
| 23 | + |
| 24 | +RDKit allows you to create and manipulate molecular structures easily. Here's how you can create a molecule from a SMILES string and visualize it: |
| 25 | + |
| 26 | +```python |
| 27 | +from rdkit import Chem |
| 28 | +from rdkit.Chem import Draw |
| 29 | + |
| 30 | +# Create a molecule from a SMILES string |
| 31 | +smiles = "CCO" # Ethanol |
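| 32 | +molecule = Chem.MolFromSmiles(smiles)  # None if the SMILES cannot be parsed |
| 33 | + |
| 34 | +# Draw the molecule (returns a PIL image, which a Jupyter notebook displays inline) |
| 35 | +Draw.MolToImage(molecule) |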
| 36 | +``` |
| 37 | + |
| 38 | +### 3. Substructure Search |
| 39 | + |
| 40 | +RDKit can be used to perform substructure searches, identifying specific functional groups or substructures within a molecule. |
| 41 | + |
| 42 | +```python |
| 43 | +from rdkit import Chem |
| 44 | + |
| 45 | +# Define the molecule and the substructure to search for |
| 46 | +benzene = Chem.MolFromSmiles("c1ccccc1") |
| 47 | +ethanol = Chem.MolFromSmiles("CCO") |
| 48 | + |
| 49 | +# Perform the substructure search: does ethanol contain a benzene ring? |
| 50 | +match = ethanol.HasSubstructMatch(benzene) |
| 51 | +print("Benzene ring found in ethanol:", match)  # prints False: ethanol has no aromatic ring |
| 52 | +``` |
| 53 | + |
| 54 | +### 4. Molecular Visualization |
| 55 | + |
| 56 | +RDKit provides several options for visualizing molecules. You can visualize individual molecules or draw multiple molecules in a grid. |
| 57 | + |
| 58 | +```python |
| 59 | +from rdkit import Chem |
| 60 | +from rdkit.Chem import Draw |
| 61 | +# Create a list of molecules |
| 62 | +smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"] |
| 63 | +molecules = [Chem.MolFromSmiles(smiles) for smiles in smiles_list] |
| 64 | + |
| 65 | +# Draw the molecules in a grid |
| 66 | +img = Draw.MolsToGridImage(molecules, molsPerRow=2, subImgSize=(200, 200), returnPNG=False) |
| 67 | +img.show() |
| 68 | +``` |
| 69 | + |
| 70 | +## Substructure search |
| 71 | +Now you have everything to implement the substructure search function: |
| 72 | + |
| 73 | +<img title="a title" alt="Alt text" src="../../images/1.png"> |
| 74 | + |
| 75 | +Implement a function `substructure_search` in the file /src/main.py that takes two arguments: |
| 76 | +1. A list of molecules, given as SMILES strings |
| 77 | +2. A substructure, given as a SMILES string; the function returns the molecules from the list that contain this substructure |
| 78 | + |
| 79 | +```python |
| 80 | +substructure_search(["CCO", "c1ccccc1", "CC(=O)O", "CC(=O)Oc1ccccc1C(=O)O"], "c1ccccc1") |
| 81 | +# returns ["c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"] |
| 82 | +``` |
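
One possible shape for the function described above, as a minimal sketch rather than a reference solution. The parameter names `mols` and `mol`, the choice to return the original SMILES strings in input order, and the decision to skip unparsable entries instead of raising are all assumptions; the assignment only fixes the function name and the example call.

```python
from rdkit import Chem


def substructure_search(mols, mol):
    """Return the SMILES strings from `mols` that contain `mol` as a substructure."""
    # Parse the query once; MolFromSmiles returns None for invalid SMILES
    query = Chem.MolFromSmiles(mol)
    if query is None:
        raise ValueError(f"Invalid substructure SMILES: {mol!r}")

    matches = []
    for smiles in mols:
        candidate = Chem.MolFromSmiles(smiles)
        # Assumption: silently skip entries that cannot be parsed
        if candidate is not None and candidate.HasSubstructMatch(query):
            matches.append(smiles)
    return matches
```

Parsing the query outside the loop avoids repeating the same work for every candidate, and returning the input strings unchanged keeps the result comparable to the expected list in the example call.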