Commit
Showing 1 changed file with 39 additions and 18 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,30 +1,51 @@ | ||
<p align="center"> | ||
<image src=https://github.com/scharch/aligator/assets/6708960/c6acd3d9-d082-4b0b-9f09-c99c7f8f651a> | ||
</p> | ||
|
||
# ALIGaToR - Annotator of Loci for IG and T-cell Receptors | ||
A pipeline for annotating genomic contigs from the IG and TR loci. The pipeline includes: | ||
- Extract: A parsing script that extracts gene, exon, and RSS names and coordinates from the reference annotations of a closely related species of your choice. | ||
- Predict: A prediction script that calls the DnaGrep submodule to predict RSS sequences in genomic contigs. | ||
- Annotate: An annotator script that uses the extracted reference genome and genomic information to generate a search database for BLAST. BLAST hits are matched with predicted RSSs, and other scripts are called to check for start codons, stop codons, and splice sites. | ||
|
||
## Getting Started | ||
Clone the aligator repository | ||
git clone https://github.com/scharch/aligator.git | ||
|
||
## Dependencies/Prerequisites | ||
- Python | ||
- Beautifulsoup 4.12.3 | ||
- Python 3.6 or greater | ||
- Muscle | ||
- Blast+ | ||
- pyBedTools | ||
|
||
## Usage | ||
aligator --help | ||
### Example | ||
#Download BK063715 fasta file from IMGT.org | ||
#extract IGH annotations from IMGT's rheMac10 | ||
aligator extract https://imgt.org/ligmdb/view.action?id=BK063715 BK063715 | ||
- BedTools | ||
|
||
## Getting Started | ||
Clone the aligator repository: | ||
|
||
git clone https://github.com/scharch/aligator.git | ||
|
||
Install the required Python packages: | ||
|
||
pip install -r aligator/requirements.txt | ||
|
||
Set the environment variable: | ||
|
||
export ALIGATOR_PATH=$(pwd)/aligator | ||
|
||
Quick help: | ||
|
||
`aligator help` | ||
|
||
|
||
## Vignette: annotating MF989451 from Ramesh et al., Frontiers in Immunology 2017 | ||
Data is in `aligator/sample_data`. | ||
|
||
First, get the reference genome from IMGT: | ||
|
||
#Download BK063715 fasta file from https://imgt.org/ligmdb/view.action?format=FASTA&id=BK063715 | ||
#Then create bedfile with reference annotations | ||
aligator extract https://imgt.org/ligmdb/view.action?id=BK063715 BK063715 | ||
|
||
Find possible RSS motifs in the target contig. For MF989451, the output should look the same as `sample_data/MF989451.rss12_pred.bed` and `sample_data/MF989451.rss23_pred.bed`: | ||
|
||
#predict RSS for MF989451 and compare to sample data | ||
aligator predict /sample_data /sample_data/MF989451.fa MF989451 | ||
aligator predict $ALIGATOR_PATH/sample_data/MF989451.fa MF989451 | ||
|
||
Finally, annotate the target contig. For MF989451, the actual annotations provided by Ramesh et al are included as `sample_data/MF989451.ground_truth.bed`: | ||
|
||
#annotate MF989451 and compare to sample data | ||
aligator annotate /sample_data/MF989451.fa /sample_data/MF989451.rss12_pred.bed MF989451.rss23_pred.bed IGH BK063715.fasta BK063715.bed --alleledb coding.fa --outgff annotations.gff --outfasta IgGenes.fa --blast blastn | ||
aligator annotate $ALIGATOR_PATH/sample_data/MF989451.fa MF989451.RSS12.bed MF989451.RSS23.bed IGH BK063715.fasta BK063715.bed | ||
|
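The vignette asks the reader to check that the predicted RSS files look the same as the copies in `sample_data`. A minimal sketch of one way to do that check follows. The helper `compare_bed` is hypothetical and not part of aligator. It assumes both inputs are tab-delimited BED files and that record order does not matter. The predicted file names in the usage comments (`MF989451.RSS12.bed`, `MF989451.RSS23.bed`) are inferred from the `annotate` command above and may differ in practice.

```shell
# Hypothetical helper, not part of aligator: report whether two BED files
# contain the same records, ignoring the order in which they are listed.
compare_bed() {
    # Sort by contig name, then numerically by start and end coordinate.
    predicted_sorted=$(sort -k1,1 -k2,2n -k3,3n "$1") || return 2
    expected_sorted=$(sort -k1,1 -k2,2n -k3,3n "$2") || return 2

    if [ "$predicted_sorted" = "$expected_sorted" ]; then
        echo "match"
    else
        echo "differ"
        return 1
    fi
}

# Example usage (file names are assumptions, see the note above):
# compare_bed MF989451.RSS12.bed "$ALIGATOR_PATH/sample_data/MF989451.rss12_pred.bed"
# compare_bed MF989451.RSS23.bed "$ALIGATOR_PATH/sample_data/MF989451.rss23_pred.bed"
```

When the helper prints `differ`, running `diff` on the two sorted files shows which records disagree.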