Merge pull request #53 from Goosang-Yu/dev_goosang
Update docs and fix bugs in SynonymousPE.stack
Goosang-Yu authored Jan 3, 2024
2 parents 04d3096 + 394a52f commit f36ea1e
Showing 50 changed files with 875 additions and 306 deletions.
Binary file removed dataset/genes/DP_variant_293T_PE2_Conv_220428.npy
Binary file not shown.
8 changes: 4 additions & 4 deletions docs/en/1_Predict/1_howworks.md
@@ -3,14 +3,14 @@
In CRISPR systems, genome editing efficiency is determined by the gRNA and its corresponding target sequence. Specific sequence motifs, GC content, and similar features can affect it.

## High-throughput screening
![CRISPR High-throughput screening](../images/en_1_1_1_High-throughput_screening.svg)
![CRISPR High-throughput screening](../assets/contents/en_1_1_1_High-throughput_screening.svg)


## Features determining genome editing efficiencies
![CRISPR High-throughput screening](../images/en_1_1_2_SHAP_analysis.svg)
![CRISPR High-throughput screening](../assets/contents/en_1_1_2_SHAP_analysis.svg)

![CRISPR High-throughput screening](../images/en_1_1_3_SHAP_feature_value.svg)
![CRISPR High-throughput screening](../assets/contents/en_1_1_3_SHAP_feature_value.svg)

![CRISPR High-throughput screening](../images/en_1_1_4_SHAP_force_plot.svg)
![CRISPR High-throughput screening](../assets/contents/en_1_1_4_SHAP_force_plot.svg)


2 changes: 1 addition & 1 deletion docs/en/1_Predict/4_predict_pe.md
@@ -1,6 +1,6 @@

### Predict Prime editing efficiency
![](../images/en_1_4_1_DeepPrime_architecture.svg)
![](../assets/contents/en_1_4_1_DeepPrime_architecture.svg)
DeepPrime is a model that predicts the efficiency of prime editing guide RNAs (pegRNAs) at a given target site ([Yu et al. Cell 2023](https://doi.org/10.1016/j.cell.2023.03.034)). A DeepSpCas9 score is calculated at the same time, which requires TensorFlow (version >=2.6). DeepPrime itself was developed with PyTorch.

```python
8 changes: 4 additions & 4 deletions docs/en/2_Design/2_SynonymousPE.md
@@ -51,7 +51,7 @@ DNA sequence information targeted by the pegRNA. Within this sequence, `SynonymousPE`
`frame`: int
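The reading frame of the reference sequence (`ref_seq`). It is determined by the codon frame of the CDS and takes one of the values 0, 1, or 2. For example, if the CDS sequence starts at the first base of a codon (3 bp), set `frame` to 0. If the frame is wrong, synonymous mutations are generated from an entirely different amino acid sequence, so check it carefully before entering it.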

![codon_frame](../images/en_2_2_1_codon_frame.svg)
![codon_frame](../assets/contents/en_2_2_1_codon_frame.svg)

`cds_start`: int
The position in `ref_seq` where the CDS starts.
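
To see why a wrong `frame` leads to a different amino acid sequence, the sketch below reads one short sequence at each of the three possible offsets. The helper `codons_at_offset` and the example sequence are hypothetical and not part of GenET, and how this offset maps onto GenET's `frame` argument is not asserted here.

```python
def codons_at_offset(seq, offset):
    """Split seq into complete codons, starting `offset` bases in (0, 1, or 2)."""
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1, or 2")
    # Stop before any trailing partial codon.
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


example = "ATGGCCAAG"  # hypothetical sequence, for illustration only

for offset in (0, 1, 2):
    print(offset, codons_at_offset(example, offset))
# 0 ['ATG', 'GCC', 'AAG']
# 1 ['TGG', 'CCA']
# 2 ['GGC', 'CAA']
```

Each offset yields a different set of codons from the same bases, which is why synonymous changes computed in the wrong frame are not synonymous in the real protein.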
@@ -68,7 +68,7 @@ Specifying the frame and CDS position for SynonymousPE, on first use

#### Example 1:
---
![Example_1](../images/ko_2_2_2_Synony_example_1.svg)
![Example_1](../assets/contents/ko_2_2_2_Synony_example_1.svg)

```python
from genet.design import SynonymousPE
@@ -83,7 +83,7 @@ synony_pegrna = SynonymousPE(dp_record, ref_seq=ref_seq, frame=0, cds_start=0, c

#### Example 2:
---
![Example_2](../images/ko_2_2_3_Synony_example_2.svg)
![Example_2](../assets/contents/ko_2_2_3_Synony_example_2.svg)

```python
from genet.design import SynonymousPE
@@ -98,7 +98,7 @@ synony_pegrna = SynonymousPE(dp_record, ref_seq=ref_seq, frame=2, cds_start=0, c

#### Example 3:
---
![Example_3](../images/ko_2_2_4_Synony_example_3.svg)
![Example_3](../assets/contents/ko_2_2_4_Synony_example_3.svg)

```python
from genet.design import SynonymousPE
37 changes: 37 additions & 0 deletions docs/en/README.md
@@ -0,0 +1,37 @@
<!-- ---
template: home.html
title: Home
social:
cards_layout_options:
title: Documentation that simply works
--- -->

# Welcome to GenET
[![Python](https://img.shields.io/badge/Python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-blue)](https://badge.fury.io/py/genet)
[![PyPI version](https://badge.fury.io/py/genet.svg)](https://badge.fury.io/py/genet)
[![Slack](https://img.shields.io/badge/slack-chat-blueviolet.svg?logo=slack)](https://genethq.slack.com/archives/C04DP727E4E)
[![License](https://img.shields.io/pypi/l/ansicolortags.svg)](https://img.shields.io/pypi/l/ansicolortags.svg)


GenET (Genome Editing Toolkit) is a library of various python functions for the purpose of analyzing and evaluating data from genome editing experiments.

GenET is still in the early stages of development and continues to improve and expand. Currently planned functions include guide RNA design, saturation library design, deep sequencing data analysis, and guide RNA activity prediction.


## Who should use GenET?
GenET was developed for anyone interested in the field of genome editing. In particular, GenET can help those with the following objectives: <br />

- Quickly and easily design a genome editing experiment for a specific gene.
- Perform genome editing analysis based on sequencing data.
- Predict the activity of specific guide RNAs or of all guide RNAs designed for editing a specific product.

## Get Help
The fastest way to get help is through our Slack channel. You can also search our issue log for answers to questions other members have already asked, or raise a new question if yours has not been asked before.
<!-- (If an FAQ page is created) Check our Frequently Asked Questions (FAQs) page. -->

## Support GenET
### ⭐ Star GenET on GitHub
Give GenET a star on our [GitHub repository](https://github.com/Goosang-Yu/genet) (click the star button on the top right corner)

### 📢 Tweet about GenET
Help spread the word about GenET. We love to hear success stories and use cases.
Empty file added docs/en/_README.md
Empty file.
9 changes: 9 additions & 0 deletions docs/en/_index.md
@@ -0,0 +1,9 @@
<!-- ---
template: home.html
title: Home
social:
cards_layout_options:
title: Documentation that simply works
--- -->

Welcome to GenET test page.
File renamed without changes
File renamed without changes
File renamed without changes
Binary file added docs/en/assets/images/dna.jpg
File renamed without changes
File renamed without changes
135 changes: 42 additions & 93 deletions docs/en/getting_started.md
@@ -1,116 +1,65 @@
GenET is a Python-based library for designing and analyzing a wide range of genome editing studies. In particular, it provides functions that make it easy to work with CRISPR systems (Cas9, base editing, prime editing).

## Installation
#### 1/ Create virtual environment and install genet
```python
# Create virtual env for genet. (python 3.8 was tested)
conda create -n genet python=3.8
conda activate genet

# Install genet ( >= ver. 0.7.0)
## Installation
GenET can be installed easily from PyPI. See the [installation page](/genet/installation) for details.
```bash
pip install genet
```

#### 2/ Install Pytorch (v1.11.0 was tested)
PyTorch 2.x is not yet supported.
```python
# For OSX (MacOS)
pip install torch==1.11.0

# For Linux and Windows
# CUDA 11.3 (choose the version depending on your GPU)
pip install torch==1.11.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113

# CPU only
pip install torch==1.11.0+cpu --extra-index-url https://download.pytorch.org/whl/cpu
```
#### 3/ Install ViennaRNA
```python
# install ViennaRNA package for prediction module
conda install viennarna
```

## Who should use GenET?
GenET was developed for anyone interested in the field of genome editing. In particular, GenET can help those with the following objectives: <br />

- Quickly and easily design a genome editing experiment for a specific gene.
- Perform genome editing analysis based on sequencing data.
- Predict the activity of specific guide RNAs or of all guide RNAs designed for editing a specific product.



## Example: Prediction of prime editing efficiency by DeepPrime
![](images/Fig3_DeepPrime_architecture.svg)
DeepPrime is a model that predicts the efficiency of prime editing guide RNAs (pegRNAs) at a given target site ([Yu et al. Cell 2023](https://doi.org/10.1016/j.cell.2023.03.034)). A DeepSpCas9 score is calculated at the same time, which requires TensorFlow (version >=2.6). DeepPrime itself was developed with PyTorch.
## Examples of using GenET for genome editing research
GenET provides (or will provide) a range of functions needed for research that uses genetic information and CRISPR.

```python
from genet.predict import DeepPrime
> Case1:
seq_wt = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT'
seq_ed = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT'

pegrna = DeepPrime('Test', seq_wt, seq_ed, edit_type='sub', edit_len=1)
```python

# check designed pegRNAs
>>> pegrna.features
```

| | ID | WT74_On | Edited74_On | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | type_sub | type_ins | type_del | Tm1 | Tm2 | Tm2new | Tm3 | Tm4 | TmD | nGCcnt1 | nGCcnt2 | nGCcnt3 | fGCcont1 | fGCcont2 | fGCcont3 | MFE3 | MFE4 | DeepSpCas9_score |
| - | ---- | -------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | -------- | -------- | -------- | -------- | ------- | ------- | --------- | -------- | --------- | ------- | ------- | ------- | -------- | -------- | -------- | ------ | ----- | ---------------- |
| 0 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxxCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 7 | 35 | 42 | 34 | 1 | 1 | 1 | 0 | 0 | 16.19097 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 5 | 16 | 21 | 71.42857 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
| 1 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxxCCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 8 | 35 | 43 | 34 | 1 | 1 | 1 | 0 | 0 | 30.19954 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 75 | 45.71429 | 51.16279 | \-10.4 | \-0.6 | 45.96754 |
| 2 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxxACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 9 | 35 | 44 | 34 | 1 | 1 | 1 | 0 | 0 | 33.78395 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 6 | 16 | 22 | 66.66667 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
| 3 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxxCACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 10 | 35 | 45 | 34 | 1 | 1 | 1 | 0 | 0 | 38.51415 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 70 | 45.71429 | 51.11111 | \-10.4 | \-0.6 | 45.96754 |
| 4 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxxACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 11 | 35 | 46 | 34 | 1 | 1 | 1 | 0 | 0 | 40.87411 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 63.63636 | 45.71429 | 50 | \-10.4 | \-0.6 | 45.96754 |
| 5 | Test | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATG | xxxxxxxxxAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGxxxxxxxxxxxxxxxxxx | 12 | 35 | 47 | 34 | 1 | 1 | 1 | 0 | 0 | 40.07098 | 62.1654 | 62.1654 | \-277.939 | 58.22525 | \-340.105 | 7 | 16 | 23 | 58.33333 | 45.71429 | 48.93617 | \-10.4 | \-0.6 | 45.96754 |

Next, select the PE system and cell type, then run DeepPrime.
```python
pe2max_output = pegrna.predict(pe_system='PE2max', cell_type='HEK293T')

>>> pe2max_output.head()
```

| | Target | Spacer | RT-PBS | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | PE2max_score |
| - | ------------------------------------------------- | ------------------------------ | ---------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | ------------ |
| 0 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | 0.904907 |
| 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | 2.377118 |
| 2 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | 2.613841 |
| 3 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | 3.643573 |
| 4 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | 3.770234 |
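
Once scores are predicted, a common next step is to pick the highest-scoring pegRNA. The following is a minimal sketch using only the standard library, with the `PBSlen`, `RTlen`, and `PE2max_score` values copied from the five rows above. `rank_by_score` is a hypothetical helper, not a GenET function. If `pe2max_output` is a pandas DataFrame, as the `.head()` call suggests, `sort_values` would likely serve the same purpose.

```python
# Values copied from the PE2max output table above (other columns omitted).
rows = [
    {"PBSlen": 7, "RTlen": 35, "PE2max_score": 0.904907},
    {"PBSlen": 8, "RTlen": 35, "PE2max_score": 2.377118},
    {"PBSlen": 9, "RTlen": 35, "PE2max_score": 2.613841},
    {"PBSlen": 10, "RTlen": 35, "PE2max_score": 3.643573},
    {"PBSlen": 11, "RTlen": 35, "PE2max_score": 3.770234},
]


def rank_by_score(records, score_key="PE2max_score"):
    """Return the records sorted from highest to lowest predicted score."""
    return sorted(records, key=lambda record: record[score_key], reverse=True)


ranked = rank_by_score(rows)
best = ranked[0]
print(f"Best of these rows: PBSlen={best['PBSlen']}, score={best['PE2max_score']}")
```

Among these five rows the score rises with PBS length, so the row with `PBSlen` 11 ranks first. Only the first five rows are shown here, so this says nothing about the full set of designed pegRNAs.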
## Functions provided by GenET
The functions that GenET provides (including planned ones) are listed below.

| Module   | Functions      | Descriptions                                                           | Status    |
| -------- | -------------- | ---------------------------------------------------------------------- | --------- |
| Predict  | SpCas9         | Uses the DeepSpCas9 model                                              | Available |
| Predict  | SpCas9variants | Uses the DeepSpCas9variants model                                      | Available |
| Predict  | Base editor    | Uses the DeepBE model                                                  | Planned   |
| Predict  | Prime editor   | Uses the DeepPrime model                                               | Available |
| Design   | KOLiD          | Genome-wide KO library design                                          | Planned   |
| Design   | ReLiD          | Gene regulation library design                                         | Planned   |
| Design   | CRISPRStop     | Design gRNA for inducing premature stop codon using CBE                | Planned   |
| Design   | SynonymousPE   | Design pegRNA containing additional synonymous mutation in RT template | Available |
| Database | GetGenome      | Fetches genome data from the NCBI database                             | Available |
| Database | GetGene        | Fetches information on a specific gene from the NCBI database          | Planned   |
| Database | GenBankParser  | Extracts the desired information from a GenBank file                   | Planned   |
| Database | DFConverter    | Converts an NCBI GenBank file into a DataFrame                         | Available |
| Analysis | SGE            | Analyzes saturation genome editing data                                | Planned   |
| Analysis | UMItools       | Functions for UMI analysis (from UMI-tools)                            | Available |
| Utils    | request_file   | Downloads a desired file from a server over HTTP                       | Available |
| Utils    | SplitFastq     | Splits a FASTQ file into smaller files                                 | Available |

The previous function, ```pe_score()```, is still available for use. However, please note that this function will be deprecated in the near future.
```python
from genet import predict as prd

# Provide the WT sequence and the edited sequence.
# Then specify the edit type you want to make.
# Input seq: 60bp 5' context + 1bp center + 60bp 3' context (total 121bp)

seq_wt = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT'
seq_ed = 'ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT'
alt_type = 'sub1'
## Need help?
Look at the issues section to find out about specific cases and others.

df_pe = prd.pe_score(seq_wt, seq_ed, alt_type)
df_pe.head()
```
If you still have doubts or cannot solve the problem, please consider opening an issue.

| | Target | Spacer | RT-PBS | PBSlen | RTlen | RT-PBSlen | Edit_pos | Edit_len | RHA_len | PE2max_score |
| - | ------------------------------------------------- | ------------------------------ | ---------------------------------------------- | ------ | ----- | --------- | -------- | -------- | ------- | ------------ |
| 0 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGG | 7 | 35 | 42 | 34 | 1 | 1 | 0.904907 |
| 1 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGG | 8 | 35 | 43 | 34 | 1 | 1 | 2.377118 |
| 2 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGT | 9 | 35 | 44 | 34 | 1 | 1 | 2.613841 |
| 3 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTG | 10 | 35 | 45 | 34 | 1 | 1 | 3.643573 |
| 4 | ATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGA... | ATAAAAGACAACACCCTTGCCTTGTGGAGT | CGTCTCAGTTTCTGGGAGCTTTGAAAACTCCACAAGGCAAGGGTGT | 11 | 35 | 46 | 34 | 1 | 1 | 3.770234 |
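
The comments in the `pe_score()` example state that each input is 121 bp: 60 bp of 5' context, a 1 bp center, and 60 bp of 3' context. The sketch below checks that layout before any prediction is run. `check_input` and `diff_positions` are hypothetical helpers, not part of GenET, and treating index 60 (0-based) as the center follows from the stated layout rather than from the GenET source.

```python
FLANK = 60              # bases of context on each side of the center
TOTAL = 2 * FLANK + 1   # 121 bases in total


def check_input(seq):
    """Return seq in upper case, or raise if it is not 121 bases of A/C/G/T."""
    seq = seq.upper()
    if len(seq) != TOTAL:
        raise ValueError(f"expected {TOTAL} bases, got {len(seq)}")
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return seq


def diff_positions(wt, ed):
    """Return the 0-based positions where two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(wt, ed)) if a != b]


# Sequences from the example above.
seq_wt = check_input('ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGAAGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT')
seq_ed = check_input('ATGACAATAAAAGACAACACCCTTGCCTTGTGGAGTTTTCAAAGCTCCCAGAAACTGAGACGAACTATAACCTGCAAATGTCAACTGAAACCTTAAAGTGAGTATTTAATTGAGCTGAAGT')

# For a 1 bp substitution, the only difference should sit at the center.
print(diff_positions(seq_wt, seq_ed) == [FLANK])
```

This check only applies as written to substitutions, where both sequences keep the same length. Insertions and deletions would need a different comparison.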
Please send all comments and questions to [email protected]



Predictions can also be made for other cell lines (A549, DLD1...) and PE systems (PE2max, PE4max...).
## Citing GenET

```python
df_pe = prd.pe_score(seq_wt, seq_ed, alt_type, sID='MyGene', pe_system='PE4max', cell_type='A549')
```



Please send all comments and questions to [email protected]
@Manual {GenET,
title = {GenET: Python package for genome editing research},
author = {Goosang Yu},
year = {2024},
month = {January},
note = {GenET version 0.13.1},
url = {https://github.com/Goosang-Yu/genet}
}
```