ONT_data_analysis

Here you will find a collection of scripts to analyse raw long-read sequencing data generated with Oxford Nanopore Technologies (ONT) platforms. Each script runs as a PBS job on an HPC cluster, but it can be adapted to other environments (e.g. SLURM or a local machine).

Script 1: 01_basecalling_dorado.pbs. This script performs basecalling and in-line barcode classification using Dorado with its most accurate basecalling model, which is also the slowest. The .fast5 file generated by the sequencer should be located in the directory ./00_data/00_raw. Depending on the library preparation kit used, you might have to change the value of the option --kit-name. The resulting BAM file is then split into one BAM file per barcode. This job is computationally expensive, so make sure you request enough resources (GPU).
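
As a rough illustration of the two steps in script 1, the sketch below wraps the Dorado calls in two shell functions. It is not the content of 01_basecalling_dorado.pbs: the function names, the `DORADO` variable and the kit name in the usage line are assumptions made for the example.

```shell
# Sketch only: function names and the DORADO variable are hypothetical.
# DORADO lets you point at a specific dorado binary on the cluster.
DORADO=${DORADO:-dorado}

# basecall RAW_DIR KIT OUT_BAM
# Basecall with the "sup" model and classify barcodes in the same pass.
# Dorado writes the BAM to stdout, so it is redirected to OUT_BAM.
basecall() {
  "$DORADO" basecaller sup "$1" --kit-name "$2" > "$3"
}

# split_by_barcode IN_BAM OUT_DIR
# Write one BAM per barcode, reusing the classification stored in IN_BAM.
split_by_barcode() {
  mkdir -p "$2"
  "$DORADO" demux --no-classify --output-dir "$2" "$1"
}
```

A call could look like `basecall ./00_data/00_raw SQK-NBD114-24 calls.bam` followed by `split_by_barcode calls.bam ./demultiplexed`, where the kit name and output paths are placeholders to replace with your own.
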
Script 2: 02_read_statistics.pbs. This script assesses the quality of the raw reads using NanoPlot (which also provides summary statistics) and FastQC.
Script 3: 03_quality_filtering_porechop.pbs. This script finds and removes adapters from the reads using Porechop.
Script 4: 04_rename_IDs.pbs. This script renames the headers of the fastq reads so that they are unique for downstream analyses. The header of each sequence is modified to include a suffix derived from the part after the last _ in the last field of the header line.
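
The renaming in script 4 can be pictured with the awk sketch below. It is one possible reading of the rule, not the code in 04_rename_IDs.pbs: it assumes four-line fastq records, takes the text after the last `_` of the last header field, and appends it to the read ID. The function name `rename_ids` is hypothetical.

```shell
# Sketch only: one interpretation of the renaming rule.
# Reads fastq on stdin and writes the renamed fastq to stdout.
# Assumes four lines per record (no wrapped sequence lines).
rename_ids() {
  awk '
    NR % 4 == 1 {
      m = split($NF, parts, "_")   # pieces of the last header field
      $1 = $1 "_" parts[m]         # append the last piece to the read ID
    }
    { print }                      # other lines pass through unchanged
  '
}
```

For example, `rename_ids < reads.fastq > renamed.fastq` would turn the header `@r1 runid=x file_barcode01` into `@r1_barcode01 runid=x file_barcode01`. Note that rewriting the first field makes awk join the header fields with single spaces.
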
Script 5: 05_quality_filtering_bbmap.pbs. This script performs quality filtering of the reads using BBMap. First, reformat.sh discards any sequences shorter than 250 bp; second, bbduk.sh trims both ends of the reads to a minimum quality of 10 using the Phred algorithm.
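
The two filtering steps of script 5 map onto BBMap options roughly as sketched below. The function names and the `REFORMAT` and `BBDUK` variables are assumptions for the example, and the real script may pass additional options (memory, threads, statistics files).

```shell
# Sketch only: function and variable names are hypothetical.
REFORMAT=${REFORMAT:-reformat.sh}
BBDUK=${BBDUK:-bbduk.sh}

# length_filter IN_FASTQ OUT_FASTQ
# Discard reads shorter than 250 bp.
length_filter() {
  "$REFORMAT" in="$1" out="$2" minlength=250
}

# quality_trim IN_FASTQ OUT_FASTQ
# Trim the left and right ends of each read (qtrim=rl) to quality 10.
quality_trim() {
  "$BBDUK" in="$1" out="$2" qtrim=rl trimq=10
}
```

Chaining them as `length_filter renamed.fastq len250.fastq` and then `quality_trim len250.fastq filtered.fastq` reproduces the order described above; the file names are placeholders.
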
Script 6: 06_reads_statistics.pbs. This script analyses the quality of the filtered reads using NanoPlot and FastQC.
Script 7: 07_assembly_flye.pbs. This script assembles the reads into contigs using Flye in its mode for high-quality reads (to be used in combination with the Dorado sup basecalling model in script 1).
Script 8: 08_assembly_polishing_medaka.pbs. This script polishes the assemblies into consensus genomes using Medaka.
Script 9: 09_quast.pbs. This script calculates assembly statistics for the assembled genomes using QUAST.
Script 10: 10_checkM.pbs. This script assesses the quality of the assembled draft genomes using CheckM.
Script 11: 11_genome_coverage.pbs. This script calculates the genome coverage of the assembled draft genome by mapping the fastq reads used for the assembly with [minimap2](https://github.com/lh3/minimap2) and SAMtools.
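
A minimal sketch of the coverage calculation in script 11 is shown below, assuming the reads are mapped with the minimap2 `map-ont` preset and that coverage is summarised as the mean per-base depth. The function names are hypothetical, and the actual script may report coverage differently.

```shell
# Sketch only: function names are hypothetical.

# map_reads ASSEMBLY_FASTA READS_FASTQ OUT_BAM
# Align ONT reads to the assembly and write a sorted, indexed BAM.
map_reads() {
  minimap2 -ax map-ont "$1" "$2" | samtools sort -o "$3" -
  samtools index "$3"
}

# mean_depth
# Read `samtools depth -a` output (contig, position, depth) on stdin
# and print the mean depth over all reported positions.
mean_depth() {
  awk '{ sum += $3 }
       END { if (NR > 0) printf "%.2f\n", sum / NR; else print "NA" }'
}
```

After `map_reads assembly.fasta reads.fastq mapped.bam`, the mean depth could be obtained with `samtools depth -a mapped.bam | mean_depth`. The `-a` option makes SAMtools report positions with zero depth, so uncovered bases lower the mean.
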
Script 12: 12_reordering_genomes_mauve.pbs. This script reorders the draft contigs according to the reference genome using Mauve. This helps to identify global rearrangements based on the subsequent gene annotations.
Script 13: 13_annotation_prokka.pbs. This script annotates the assembled genomes using Prokka. Annotations are first transferred from a reference genome with the parameter --proteins. Modify the command as desired.
Script 14: 14_AMR_ABRicate.pbs. This script screens the assembled genomes for antimicrobial resistance genes using all the databases in ABRicate.
Script 15: 15_pangenome_roary.pbs. This script constructs the pangenome using Roary with the annotations from Prokka.

ireneortega/ONT_data_analysis
