DNA Circuit Detection from Raw Nanopore Sensing Data

Analysis pipeline for extracting, filtering, classifying, and quantifying DNA circuit output on a nanopore sensing platform. Bulk raw data was collected from Oxford Nanopore Technologies' MinION using R9.4.1 flow cells and a custom MinKNOW run script.

Adapted from https://github.com/uwmisl/NanoporeTERs, which uses this pipeline for peptide detection.

System Requirements and Installation

This software is compatible with Linux operating systems. The classification algorithms in this software also utilize a GPU (CUDA 10.0).

This repository primarily consists of IPython (Jupyter) notebooks that were developed and tested on a Jupyter server running Python 2.7. Install the following dependencies:

dask (1.2.2)
future (0.17.1)
h5py (2.9.0)
joblib (0.14.0)
matplotlib (2.2.4)
numpy (1.16.2)
pandas (0.24.2)
scikit-learn (0.20.4)
scipy (1.2.2)
pytorch (1.2.0) for CUDA 10.0
yaml (0.1.7)

Installing these dependencies should take only a few minutes, with the exception of pytorch, which can take several hours depending on download speed.
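To confirm that an environment matches the versions listed above, the installed versions can be compared against the pinned ones. The following is a minimal sketch and is not part of this repository: the names PINNED, installed_version, and check_versions are hypothetical, and pytorch and yaml are left out because their distribution names depend on how they were installed.

```python
from __future__ import print_function

# Versions pinned in this README. pytorch and yaml are omitted on purpose
# (assumption: their distribution names vary between pip and conda installs).
PINNED = {
    "dask": "1.2.2",
    "future": "0.17.1",
    "h5py": "2.9.0",
    "joblib": "0.14.0",
    "matplotlib": "2.2.4",
    "numpy": "1.16.2",
    "pandas": "0.24.2",
    "scikit-learn": "0.20.4",
    "scipy": "1.2.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if unavailable."""
    try:
        import pkg_resources
        return pkg_resources.get_distribution(name).version
    except Exception:
        # Covers a missing pkg_resources as well as a missing distribution.
        return None


def check_versions(pinned, lookup=installed_version):
    """Return {name: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in sorted(pinned.items()):
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in sorted(check_versions(PINNED).items()):
        print("{}: expected {}, found {}".format(
            name, expected, found or "not installed"))
```

Running the script prints one line per mismatched or missing package and prints nothing when every pinned version is present.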

How to Use

The input for this analysis pipeline is the bulk raw fast5 file generated by MinKNOW after an experimental run. Details of the experimental run, including the times at which each analyte is introduced, should be recorded in a Google spreadsheet. An example of this spreadsheet can be found here.

Open nanopore_experiments/prep_experiment_notebook.ipynb. Change date in Cell 2 to match the appropriate experiment. Change f5_base_dir to the directory of the raw fast5 file. Change output_dir to the desired directory for output capture data. Run the entire notebook. This will create a new experiment notebook in nanopore_experiments under the name experiment_DATE_FLOWCELL.ipynb, as well as a config file in nanopore_experiments/configs under the name segment_DATE_FLOWCELL.yml.
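The naming convention for the generated files can be sketched as follows. This is an illustration of the pattern described above, not code from the prep notebook; the helper name experiment_paths is hypothetical, and the example values are the date and flow cell ID of the demo run.

```python
from __future__ import print_function

import os


def experiment_paths(date, flowcell, base_dir="nanopore_experiments"):
    """Return the (notebook, config) paths expected for one experimental run."""
    run_id = "{}_{}".format(date, flowcell)
    notebook = os.path.join(base_dir, "experiment_{}.ipynb".format(run_id))
    config = os.path.join(base_dir, "configs", "segment_{}.yml".format(run_id))
    return notebook, config


# Demo run: date 20210118, flow cell FAP26604.
notebook, config = experiment_paths("20210118", "FAP26604")
print(notebook)
print(config)
```

Checking that both paths exist after running the prep notebook is a quick way to confirm that the date and flow cell were picked up as intended.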

Open the newly generated experiment notebook. The notebook, together with the Methods section of the accompanying manuscript, describes the expected behavior and available parameters for each major step in the data processing pipeline. Run all cells in the notebook in sequential order.

The output from this pipeline should include:

Split fast5 files for each analyte, saved to the same directory as the bulk raw fast5
Example nanopore traces for each analyte, saved to nanopore_experiments/plots
Map of good channels for each analyte, saved to nanopore_experiments/plots
Capture metadata for each analyte, saved to user-defined output_dir
Raw capture data for each analyte, saved to user-defined output_dir
Filtered and classified capture metadata for each analyte, saved to user-defined output_dir
Quantification of each analyte, saved to the concentration directory

Demo

An example raw fast5 file is provided here (file size ~6 GB), corresponding to the experiment logged on the example spreadsheet.

The fully executed experiment notebook for this demo is provided at nanopore_experiments/experiment_20210118_FAP26604.ipynb. The expected runtime for this demo (from raw fast5 file to quantification results) is ~10 minutes. Expected results for both quantification approaches (based on time until capture and based on capture frequency) are provided in the concentration directory.
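This README does not define the two quantification approaches; the experiment notebook and the manuscript are the authoritative description. Purely as an illustration, the sketch below computes a capture frequency and a mean time until capture from capture start and end times on a single channel. The function names, the input format, and the definitions themselves are assumptions made for this example and may differ from what the code in concentration does.

```python
from __future__ import division, print_function


def capture_frequency(capture_starts_sec, window_sec):
    """Captures per minute over an observation window (assumed definition)."""
    if window_sec <= 0:
        raise ValueError("window_sec must be positive")
    return len(capture_starts_sec) * 60.0 / window_sec


def mean_time_until_capture(capture_starts_sec, capture_ends_sec):
    """Mean gap between the end of one capture and the start of the next.

    Assumes both lists are sorted and describe the same captures in order.
    Returns None when fewer than two captures are available.
    """
    if len(capture_starts_sec) != len(capture_ends_sec):
        raise ValueError("starts and ends must have the same length")
    if len(capture_starts_sec) < 2:
        return None
    gaps = [
        next_start - prev_end
        for prev_end, next_start in zip(capture_ends_sec[:-1], capture_starts_sec[1:])
    ]
    return sum(gaps) / len(gaps)


# Hypothetical capture times (seconds) on one channel over a 120 s window.
starts = [10.0, 40.0, 100.0]
ends = [15.0, 50.0, 110.0]
print(capture_frequency(starts, 120.0))
print(mean_time_until_capture(starts, ends))
```

In a sketch like this, either quantity would still have to be related to concentration through a calibration against known standards, which is outside the scope of the example.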
