NanoSwe is a preliminary analysis toolkit for experiments that involve sequencing data from ONT's PromethION device. It has also been used for other long-read SweGen data (e.g. PacBio).
Purpose | Program |
---|---|
Quality Control | NanoPlot and NanoComp |
Mapping to the reference | Minimap2 2.14 |
Sorting, Indexing, and calculating statistics | Samtools 1.9 |
Subsampling | Sambamba 0.7.1 |
BAM QC Statistics | Qualimap 2.2.1 |
Structural Variant Calling | Sniffles 1.0.10 |
Data Extraction (VCF Files only) | bcftools 1.9 |
Finding intersection in genomic regions | SURVIVOR 1.0.7 |
Evaluation of SVs | SURVIVOR 1.0.7 and surpyvor 0.5.0 |
Removing control DNA sequences | NanoLyse |
Trimming Short Reads | BBMap/BBTools |
Homology Detection | Blast 2.7.1+ |
Data Visualisation | R version 3.5.3. See the scripts directory for information on libraries/packages used. |
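
Before running any analysis, it can help to confirm that the programs in the table above are installed. The following is a minimal sketch, not part of the repository: the executable names are assumptions based on each tool's usual command name, and the script only checks that a command is on the PATH, not that its version matches the table.

```python
import shutil

# Purpose -> executable names. The names are assumptions (typical command
# names for each tool); adjust them to match the local installation.
TOOLS = {
    "Quality Control": ["NanoPlot", "NanoComp"],
    "Mapping to the reference": ["minimap2"],
    "Sorting, Indexing, and calculating statistics": ["samtools"],
    "Subsampling": ["sambamba"],
    "BAM QC Statistics": ["qualimap"],
    "Structural Variant Calling": ["sniffles"],
    "Data Extraction (VCF Files only)": ["bcftools"],
    "Evaluation of SVs": ["SURVIVOR", "surpyvor"],
    "Removing control DNA sequences": ["NanoLyse"],
    "Homology Detection": ["blastn"],
    "Data Visualisation": ["Rscript"],
}


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return (purpose, executable) pairs whose executable is not on PATH."""
    return [
        (purpose, exe)
        for purpose, exes in tools.items()
        for exe in exes
        if which(exe) is None
    ]


if __name__ == "__main__":
    for purpose, exe in missing_tools():
        print(f"missing: {exe} ({purpose})")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is enough.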
Example tree structure of nanopore sequencing data files
├── /basecalled/<sample>/<flowcell>/
│   ├── fastq_0.fastq
│   ├── fastq_850.fastq
│   ├── sequencing_summary_0.txt
│   ├── sequencing_summary_850.txt
│   └── reads (1)
│       ├── 0 (2)
│       │   ├── file_read_1_ch_90_strand.fast5
│       │   ├── file_read_41_ch_40_strand2.fast5
│       │   └── file_read_300_ch_40_strand2.fast5
│       └── 850
│           ├── file_read_1000_ch_200_strand.fast5
│           ├── file_read_9000_ch_100_strand.fast5
│           └── file_read_95000_ch_1000_strand2.fast5
└── /bin/
(1) Each folder contains ~8000 fast5 files.
(2) fast5 files are named e.g. PCT0001_YYYYMMDD_0001A20B002222C_{flowcell}_sequencing_run_{library_full_name}__read_{number}_ch_{number}_strand.fast5
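
The read and channel numbers can be recovered from a fast5 file name that follows the pattern in note (2). The sketch below is illustrative only: the function name is hypothetical, and it assumes the name always ends in `_read_{number}_ch_{number}_strand`, optionally followed by digits (as in `strand2` in the tree above), and then `.fast5`.

```python
import re

# Matches the trailing part of the file name described in note (2).
# Assumption: an optional digit may follow "strand" (e.g. "strand2").
FAST5_SUFFIX = re.compile(
    r"_read_(?P<read>\d+)_ch_(?P<channel>\d+)_strand\d*\.fast5$"
)


def parse_fast5_name(filename):
    """Return (read_number, channel_number), or None if the name does not match."""
    match = FAST5_SUFFIX.search(filename)
    if match is None:
        return None
    return int(match.group("read")), int(match.group("channel"))
```

For example, `parse_fast5_name("file_read_41_ch_40_strand2.fast5")` gives the read number 41 and the channel number 40, while a non-fast5 name such as `fastq_0.fastq` gives `None`.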
Example tree structure of data organisation
├── /basecalled/<sample>/<flowcell>/
│   ├── FASTQ_files
│   │   ├── fastq_0.fastq
│   │   └── fastq_850.fastq
│   ├── sequencing_summary
│   │   ├── sequencing_summary_0.txt
│   │   └── sequencing_summary_850.txt
│   ├── reads *
│   │   ├── 0 *
│   │   │   ├── file_read_1_ch_90_strand.fast5
│   │   │   ├── file_read_41_ch_40_strand2.fast5
│   │   │   └── file_read_300_ch_40_strand2.fast5
│   │   └── 850
│   │       ├── file_read_1000_ch_200_strand.fast5
│   │       ├── file_read_9000_ch_100_strand.fast5
│   │       └── file_read_95000_ch_1000_strand2.fast5
│   └── <sample>_analysis
│       ├── reference_genome.fna
│       ├── reference_genome.fna.fai
│       ├── Snakefile
│       ├── /bam_files/
│       ├── /vcf_files/
│       └── /logs/
└── /bin/
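
Moving from the raw layout to the organised layout mostly means sorting the FASTQ and sequencing summary files into their own folders. A minimal sketch of that step follows, assuming the file name patterns shown in the trees above. The function name is hypothetical, and the sketch does not create the `<sample>_analysis` folder or touch the `reads` folder.

```python
from pathlib import Path
import shutil


def organise_flowcell_dir(flowcell_dir):
    """Move FASTQ and sequencing summary files into their own subfolders.

    Returns the list of new file paths. Only files directly inside
    flowcell_dir are considered; subfolders such as reads/ are left alone.
    """
    root = Path(flowcell_dir)
    # Target subfolder -> glob pattern, taken from the example trees.
    moves = {
        "FASTQ_files": "fastq_*.fastq",
        "sequencing_summary": "sequencing_summary_*.txt",
    }
    moved = []
    for subdir, pattern in moves.items():
        target = root / subdir
        target.mkdir(exist_ok=True)
        # Collect matches first so the directory is not modified while iterating.
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                destination = target / path.name
                shutil.move(str(path), str(destination))
                moved.append(destination)
    return moved
```

Running the function a second time on the same folder moves nothing, because the patterns only match files at the top level of the flowcell folder.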
./scRipts
- R scripts created for visualisation of long-read data.
commands.md
- Tool commands used for different analyses.
- SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population
- De novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data.
- Multi-platform discovery of haplotype-resolved structural variation in human genomes
- Which human reference genome to use?
- Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome
- The thesis
- Evaluating nanopore sequencing data processing pipelines for structural variation identification
If you plan to use this repository as a guide, please mention the link https://github.com/Nazeeefa/NanoSwe as an acknowledgment. To cite our publication, use one of the formats shown below, or visit citeas.org to choose a different format. Thank you.
Fatima N, Petri A, Gyllensten U, Feuk L, Ameur A. Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes. Genes. 2020; 11(12):1444.
Fatima, Nazeefa; Petri, Anna; Gyllensten, Ulf; Feuk, Lars; Ameur, Adam. 2020. "Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes." Genes 11, no. 12: 1444.