A collection of scripts for processing longread UMI data. Tested on Linux 3.10.0
Citation:
Karst, S. M., Ziels, R. M., Kirkegaard, R. H., & Albertsen, M. (2019). Enabling high-accuracy long-read amplicon sequences using unique molecular identifiers and Nanopore sequencing. bioRxiv, 645903.
https://www.biorxiv.org/content/10.1101/645903v2
Conda or Miniconda3 installed
usearch version 10 or higher
- Go to desired installation directory, open a terminal and run:
git clone https://github.com/ziels/longread-UMI-pipeline
- Go to scripts directory:
cd longread-UMI-pipeline/scripts
- Modify dependencies.sh with the path to usearch.
Change the line export USEARCH=usearch_path to give the exact file path to your usearch executable file (instead of usearch_path).
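The edit above can also be scripted. The following is a minimal sketch, assuming dependencies.sh contains the literal text export USEARCH=usearch_path as described; the function name set_usearch_path and the .bak backup suffix are hypothetical choices, not part of the pipeline.

```shell
# Sketch: point dependencies.sh at a usearch binary.
# Assumes the file contains the literal text: export USEARCH=usearch_path
set_usearch_path() {
  deps_file=$1      # path to dependencies.sh
  usearch_bin=$2    # full path to your usearch executable
  # Keep a backup next to the original before rewriting it.
  cp "$deps_file" "$deps_file.bak" || return 1
  # '|' is the sed delimiter so slashes in the path need no escaping;
  # a path containing '|' or '&' would need extra quoting.
  sed "s|export USEARCH=usearch_path|export USEARCH=$usearch_bin|" \
    "$deps_file.bak" > "$deps_file"
}
```

For example, set_usearch_path dependencies.sh /path/to/usearch, run from within the scripts directory.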
- Create the conda environment:
conda env create -f environment.yaml
- Check that the Conda env is installed:
conda info --envs
Make sure you get something like:
# conda environments:
#
longread-UMI <path to conda envs>/longread-UMI
Note the installation path of the longread-UMI environment (for the next steps).
- Activate conda environment
conda activate longread-UMI
Or, depending on your conda version: source activate longread-UMI
- Find the path of the conda environments with the command:
conda info --envs
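To avoid copying the path by hand, the listing can be filtered for the longread-UMI row. This is a sketch only: the helper name env_path_from_listing is hypothetical, and it assumes the path is the last whitespace-separated field of the row, so it will not work for paths that contain spaces.

```shell
# Sketch: print the install path of a conda environment.
# Reads `conda info --envs` output on stdin and prints the last field
# of the row whose first field matches the environment name.
env_path_from_listing() {
  awk -v name="${1:-longread-UMI}" '$1 == name { print $NF }'
}

# Usage (requires conda on PATH):
#   conda info --envs | env_path_from_listing
```

Comment lines in the listing begin with #, so they never match the environment name and are skipped.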
- Check that the porechop path works:
ls <path_to_conda_environments>/longread-UMI/lib/python3.6/site-packages/porechop
Make sure adapters.py is listed in the output of the above command.
- Back up and replace adapters.py:
mv <path_to_conda_environments>/longread-UMI/lib/python3.6/site-packages/porechop/adapters.py <path_to_conda_environments>/longread-UMI/lib/python3.6/site-packages/porechop/adapters_old.py
(From within the longread-UMI-pipeline/scripts directory):
cp ./adapters.py <path_to_conda_environments>/longread-UMI/lib/python3.6/site-packages/porechop/adapters.py
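The backup and replace steps above can be wrapped in one guarded function. This is a sketch under stated assumptions: swap_adapters is a hypothetical helper, and it keeps the adapters_old.py backup name used above. It refuses to run if a backup already exists, so running it twice cannot overwrite the original file.

```shell
# Sketch: back up porechop's adapters.py and install the pipeline's copy.
swap_adapters() {
  porechop_dir=$1    # .../site-packages/porechop inside the environment
  new_adapters=$2    # adapters.py from longread-UMI-pipeline/scripts
  if [ ! -f "$porechop_dir/adapters.py" ]; then
    echo "no adapters.py in $porechop_dir" >&2
    return 1
  fi
  if [ ! -f "$new_adapters" ]; then
    echo "missing $new_adapters" >&2
    return 1
  fi
  # Never clobber an earlier backup of the original file.
  if [ -e "$porechop_dir/adapters_old.py" ]; then
    echo "backup adapters_old.py already exists" >&2
    return 1
  fi
  mv "$porechop_dir/adapters.py" "$porechop_dir/adapters_old.py" &&
    cp "$new_adapters" "$porechop_dir/adapters.py"
}
```

From within the scripts directory this would be called as swap_adapters <path_to_conda_environments>/longread-UMI/lib/python3.6/site-packages/porechop ./adapters.py.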
Go to /path/to/longread-UMI-pipeline/test_data
Open a terminal in the directory and run
../longread_UMI_pipeline.sh -d test_reads.fq -s 10 -c 30 -t 1
- Go to desired installation directory, open a terminal and run:
git clone https://github.com/SorenKarst/longread-UMI-pipeline
- Go to longread-UMI-pipeline directory, open a terminal and run:
find . -name "*.sh" -exec chmod +x {} \;
- Create symlinks in ~/bin by opening a terminal and running:
mkdir -p ~/bin
ln -s /path/to/longread-UMI-pipeline/longread_UMI_pipeline.sh ~/bin/longread-UMI-pipeline
ln -s /path/to/longread-UMI-pipeline/longread_UMI_mockanalysis.sh ~/bin/longread-UMI-mockanalysis
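The symlinks are only found by name if ~/bin is on your PATH, which depends on your shell configuration. A small check, as a sketch (dir_on_path is a hypothetical helper name):

```shell
# Sketch: succeed if the given directory is an entry in $PATH.
dir_on_path() {
  # Wrap PATH and the directory in colons so only whole entries match.
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   dir_on_path "$HOME/bin" || echo "$HOME/bin is not on PATH" >&2
```

If the check fails, either add ~/bin to PATH in your shell profile or call the scripts by their full path.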
- Open /path/to/longread-UMI-pipeline/scripts/dependencies.sh in a text editor.
- Change all paths under "Program paths" to reflect installation paths on your system.
- If unsure of the paths, try typing which <function> in the terminal, e.g. which racon.
- Install any missing dependencies.
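The lookup can be repeated for several programs at once. The sketch below uses command -v, the shell builtin counterpart of which; check_tools is a hypothetical helper, and the tool names in the usage line are only the ones mentioned in this document, so take the full list from the "Program paths" section of dependencies.sh.

```shell
# Sketch: report where each named program is installed.
# Returns non-zero if at least one program is not found on PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if path=$(command -v "$tool"); then
      printf '%-12s %s\n' "$tool" "$path"
    else
      printf '%-12s NOT FOUND\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools racon porechop usearch
```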
- We recommend making a separate installation of porechop to use with the longread-UMI-pipeline.
- Go to path/to/porechop/porechop/
- Backup current adapters.py.
- Replace current adapters.py with path/to/longread-UMI-pipeline/scripts/adapters.py.
- Open a terminal anywhere and run:
longread-UMI-pipeline -h
or /path/to/longread-UMI-pipeline -h
- Test longread-UMI-pipeline on test data:
Go to /path/to/longread-UMI-pipeline/test_data
Open a terminal in the directory and run:
longread-UMI-pipeline -d test_reads.fq -s 10 -c 30 -t 1
- Create a working directory, open a terminal, download the Zymo mock fastq data and decompress:
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR333/003/ERR3336963/ERR3336963_1.fastq.gz; gunzip -c ERR3336963_1.fastq.gz > reads.fq
- Open a terminal in the directory and run:
longread-UMI-pipeline -d reads.fq -s 1000000 -c 30 -t <Number-of-threads>
- Open a terminal in the directory and run:
longread-UMI-mockanalysis <Number-of-threads>