zebrafish-tRNA-seq

This repository provides code to reproduce, on OS X/Linux, the results presented in https://doi.org/10.1093/nar/gkae595.

Install Dependencies

Please install the following software:

Snakemake
fastp
cutadapt
umi_tools
cmalign (part of Infernal)
segemehl
Python

and the following Python 3 modules:

pandas
sklearn
numpy
matplotlib
seaborn
yaml
biopython
scipy
BCBio

We recommend installation via Conda, as explained below.

Install Conda

Follow the installation instructions in the official Conda documentation (https://docs.conda.io).

Setting Up the Conda Environment

Create a new Conda environment with all necessary packages:

```
conda create -n zebrafish python=3.8 \
    bioconda::snakemake \
    bioconda::fastp bioconda::cutadapt bioconda::umi_tools bioconda::cmalign bioconda::segemehl \
    conda-forge::numpy conda-forge::pandas conda-forge::matplotlib-base conda-forge::seaborn-base conda-forge::pyyaml \
    conda-forge::biopython conda-forge::scipy bioconda::bcbio-nextgen conda-forge::scikit-learn
```

Activate the Conda environment:
```
conda activate zebrafish
```
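Once the environment is active, you may want to confirm that the tools and Python modules listed under Install Dependencies are actually available before starting a run. The following is a minimal sketch, not part of the workflow: the function names missing_commands, missing_python_modules and check_dependencies are hypothetical, and the executable name segemehl.x as well as the import names Bio (biopython) and BCBio are assumptions about how the packages install.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the workflow) that report dependencies
# missing from the currently active environment.

# Print each command in "$@" that is not on PATH; return 1 if any is missing.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd"
      status=1
    fi
  done
  return "$status"
}

# Print each module in "$@" that python3 cannot import; return 1 if any is missing.
missing_python_modules() {
  local mod status=0
  for mod in "$@"; do
    if ! python3 -c "import $mod" >/dev/null 2>&1; then
      echo "missing Python module: $mod"
      status=1
    fi
  done
  return "$status"
}

# Check the dependencies listed in this README. Assumed names: segemehl.x is
# segemehl's executable, Bio is biopython's import name, BCBio is importable.
check_dependencies() {
  local status=0
  missing_commands snakemake fastp cutadapt umi_tools cmalign segemehl.x python3 || status=1
  missing_python_modules pandas sklearn numpy matplotlib seaborn yaml Bio scipy BCBio || status=1
  return "$status"
}
```

Calling check_dependencies prints one line per missing item and returns a non-zero status if anything is missing, so it can also guard a script with `check_dependencies || exit 1`.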

Download this Workflow

Download this Snakemake workflow, e.g., with git clone:

```
git clone git@github.com:mwaldl/zebrafish-tRNA-seq.git
```

Set Up Input Files

The required inputs are the demultiplexed FASTQ sequencing files, a file with metadata for each sample, and a config file that, among other things, specifies the genome sources. To set up these input files:

Create the raw/rawr_reads folder and copy your (unzipped) FASTQ files into it. The FASTQ files for reproducing the results presented in Rappol et al. 2024 can be downloaded from https://dataview.ncbi.nlm.nih.gov/object/PRJNA1061456.
Set up the config/samples.tsv file with the metadata for each sample. An example for reproducing the results presented in our paper is included with the workflow. The columns fastq, treatment, timepoint, timepoint_name and replicate are required. The value in the fastq column should correspond to the FASTQ file name (e.g., EV01001 for raw/rawr_reads/EV01001.R1.fastq).
Edit config/config.yaml to reflect your desired parameters (see the comments in the config file for details).
Canonical tRNA positions are annotated based on the tRNA Rfam alignment within the per_ref_nt_count rule in workflow/rules/coverage.smk. The Stockholm file raw/canonical_tRNA/RF00005.stockholm_withcanonical annotation.txt contains manually annotated canonical positions after the #=GC SS_cons line. Zebrafish tRNAs are aligned to the Rfam alignment, and their canonical positions are inferred from the corresponding positions in X14835.1_6927-7002, a sequence that covers all canonical positions. To change this assignment, edit the alignment_to_canonical_positions_mapper function within the same rule. No editing is required if you want to use the canonical positions annotated for our publication.
The tRNA Rfam alignment and the corresponding covariance model need to be provided at the paths specified in config/config.yaml. By default:
- rfam_alignment: 'raw/canonical_tRNA/RF00005.stockholm.txt'
- rfam_cm: 'raw/canonical_tRNA/RF00005.cm' (The version used in our paper is provided with the workflow).
The genomic tRNA reference is generated from tRNAscan-SE scans and the zebrafish genome. The FASTA files needed to extract the genomic tRNAs are downloaded automatically; the assembly ID of the zebrafish genome has to be provided in the config file (assembly_id: 'GCF_000002035.6_GRCz11'). The required tRNAscan-SE annotations can, for example, be downloaded from gtrnadb.ucsc.edu ("Download tRNAscan-SE Results"). Copy the following three files to the paths specified in the config file:
- hc_genomic_tRNAs_from_GtRNAdb: 'raw/references/danRer11-tRNAs/danRer11-mature-tRNAs.fa'
- downloaded_tRNAscanSE_summary: 'raw/references/danRer11-tRNAs/danRer11-tRNAs-detailed.out'
- downloaded_tRNAscanSE_name_mapping: 'raw/references/danRer11-tRNAs/danRer11-tRNAs_name_map.txt'
Mitochondrial tRNAs need to be provided at the path specified in the config file (e.g., mitochondrial_tRNAs_from_mt_tRNADB: 'raw/references/tRNAdb/mit_from_tRNAdb.fst', downloaded from tRNAdb and included with the workflow).
A manually curated reference, including SNPs found during our analysis, is provided at raw/references/manual.fa. You can edit it to fit your needs; make sure you follow the same naming scheme.
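Creating raw/rawr_reads and checking config/samples.tsv can be scripted. The following is a minimal sketch, not part of the workflow: stage_fastqs and check_samples are hypothetical helper names, the downloaded reads are assumed to be gzip-compressed *.fastq.gz files, and the <name>.R1.fastq naming is taken from the EV01001 example above, so it may not cover every setup.

```shell
#!/bin/sh
# Hypothetical helpers for preparing the read and sample inputs.

# Decompress every *.fastq.gz in SRC_DIR into DEST_DIR (default raw/rawr_reads),
# leaving the compressed originals untouched.
stage_fastqs() {
  local src="$1" dest="${2:-raw/rawr_reads}" gz
  mkdir -p "$dest"
  for gz in "$src"/*.fastq.gz; do
    [ -e "$gz" ] || continue   # no match: the glob stays literal, skip it
    gunzip -c "$gz" > "$dest/$(basename "$gz" .gz)"
  done
}

# Check that the samples TSV (default config/samples.tsv) has the required
# columns and that each fastq value has a READS_DIR/<name>.R1.fastq file
# (assumed naming). Prints one line per problem; returns 1 if any is found.
check_samples() {
  local tsv="${1:-config/samples.tsv}" reads="${2:-raw/rawr_reads}"
  awk -F'\t' -v reads="$reads" '
    NR == 1 {
      split("fastq treatment timepoint timepoint_name replicate", req, " ")
      for (i = 1; i <= NF; i++) col[$i] = i
      for (k in req) if (!(req[k] in col)) { print "missing column: " req[k]; bad = 1 }
      if (bad) exit 1
      next
    }
    {
      f = reads "/" $(col["fastq"]) ".R1.fastq"
      # getline returns -1 when the file cannot be opened
      if ((getline line < f) < 0) { print "missing FASTQ: " f; bad = 1 }
      close(f)
    }
    END { exit bad }
  ' "$tsv"
}
```

For example, `stage_fastqs ~/Downloads/fastq` followed by `check_samples` (run from the workflow directory) stages the reads and reports any sample whose FASTQ file is missing.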
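The canonical-position assignment described above comes down to reading two aligned sequences column by column. The sketch below only illustrates that idea and is not the logic of alignment_to_canonical_positions_mapper: it assumes two gapped sequences of equal length from the same alignment (for example a zebrafish tRNA and X14835.1_6927-7002), treats '-' and '.' as gaps, and numbers reference residues 1, 2, 3, ... instead of reading the manually annotated line in the Stockholm file. The function name map_to_reference_positions is hypothetical.

```shell
#!/bin/sh
# Simplified, hypothetical sketch: map every residue of an aligned target tRNA
# to the numbered reference residue in the same alignment column.
# Assumptions: equal-length sequences from one alignment, '-' or '.' as gaps,
# reference residues numbered consecutively from 1.
map_to_reference_positions() {
  local ref="$1" target="$2"
  if [ "${#ref}" -ne "${#target}" ]; then
    echo "aligned sequences must have the same length" >&2
    return 1
  fi
  # Output columns: target position, target residue, reference position
  # ("ins" when the reference has a gap in that column, i.e. an insertion).
  awk -v ref="$ref" -v tgt="$target" 'BEGIN {
    r = 0; t = 0
    for (i = 1; i <= length(ref); i++) {
      rc = substr(ref, i, 1)
      tc = substr(tgt, i, 1)
      ref_gap = (rc == "-" || rc == ".")
      if (!ref_gap) r++
      if (tc != "-" && tc != ".") {
        t++
        print t "\t" tc "\t" (ref_gap ? "ins" : r)
      }
    }
  }'
}
```

For example, `map_to_reference_positions 'AC-GU' 'ACAG-'` reports the four target residues at reference positions 1, 2, ins and 3; the last reference residue has no counterpart in the target and is simply skipped.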
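Before running the pipeline, you can check that the reference inputs listed above are in place. The sketch below is a hypothetical helper (missing_references is not part of the workflow); without arguments it checks the default paths quoted in this README, so if you changed them in config/config.yaml, pass your own paths instead. It reports files that are missing or empty.

```shell
#!/bin/sh
# Hypothetical check: print every reference path that is missing or empty and
# return 1 if any is. Defaults are the paths quoted in this README; keep them
# in sync with config/config.yaml by hand.
missing_references() {
  local path status=0
  if [ "$#" -eq 0 ]; then
    set -- \
      raw/canonical_tRNA/RF00005.stockholm.txt \
      raw/canonical_tRNA/RF00005.cm \
      raw/references/danRer11-tRNAs/danRer11-mature-tRNAs.fa \
      raw/references/danRer11-tRNAs/danRer11-tRNAs-detailed.out \
      raw/references/danRer11-tRNAs/danRer11-tRNAs_name_map.txt \
      raw/references/tRNAdb/mit_from_tRNAdb.fst \
      raw/references/manual.fa
  fi
  for path in "$@"; do
    if [ ! -s "$path" ]; then   # -s: exists and is non-empty
      echo "missing or empty: $path"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the workflow directory; no output and a zero exit status mean all reference files were found.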

How to Run the Pipeline

After you have downloaded the workflow, edited the input files and parameters as needed, and installed the dependencies via Conda, you can

move to the workflow directory:
```
cd zebrafish-tRNA-seq
```
activate the Conda environment (if it is not already active):
```
conda activate zebrafish
```
execute the pipeline:
```
snakemake 'results/all.txt' -c2
```
Replace 2 with the number of cores you want to use (at least 2).
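The steps above can be wrapped in a small script. This is a sketch, not part of the workflow: run_pipeline and valid_core_count are hypothetical names, the script assumes it is called from the workflow directory, and it performs a Snakemake dry run (-n) first so that problems such as missing inputs are reported before any job starts.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented command
#   snakemake 'results/all.txt' -c<cores>

# Succeed only if $1 is an integer of at least 2.
valid_core_count() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric
  esac
  [ "$1" -ge 2 ]
}

run_pipeline() {
  local cores="${1:-2}"
  if ! valid_core_count "$cores"; then
    echo "need an integer core count of at least 2, got '$cores'" >&2
    return 1
  fi
  # CONDA_DEFAULT_ENV is set by 'conda activate'; only warn if it differs.
  if [ "${CONDA_DEFAULT_ENV:-}" != "zebrafish" ]; then
    echo "warning: the 'zebrafish' conda environment does not seem to be active" >&2
  fi
  # -n (--dry-run) only plans the jobs; the real run starts only if it succeeds.
  snakemake 'results/all.txt' -c"$cores" -n && snakemake 'results/all.txt' -c"$cores"
}
```

For example, `run_pipeline 8` checks the core count, warns if the environment is not active, and then runs the workflow on 8 cores.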
