URLpipe: Output

(Some parts adapted from nf-core TEMPLATE.)

Table of Contents

Introduction
Results

Introduction

This document describes the output produced by URLpipe. By default, all results are saved in the "./results" folder, as specified by the outdir = "./results" parameter. Results from different modules are organized into corresponding subdirectories (e.g. "./results/1_preprocess", "./results/2_qc_and_umi").

Results

Nextflow implements a caching mechanism that stores all intermediate and final results in the "./work/" directory. By default, files in "./results/" are symbolic links to the corresponding files in "./work/". To publish copies instead of symlinks, use the publish_dir_mode = copy argument. The sections below summarize the main contents of each result folder, using Example study1 as an example. Note that since sample_dataset1.config specifies outdir = "./results_dataset1", all results for that example are saved in the "./results_dataset1" directory.
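To check whether a results folder currently holds symlinks or real copies, a minimal Python sketch such as the following may help. The function name summarize_publish_mode is hypothetical and not part of URLpipe; it relies only on the standard library and counts top-level symlinked directories as symlinks without descending into them.

```python
import os
from pathlib import Path


def summarize_publish_mode(results_dir):
    """Count symlinked versus regular entries under results_dir.

    Hypothetical helper for illustration; not part of URLpipe.
    """
    counts = {"symlink": 0, "regular": 0}
    # os.walk does not follow symlinked directories by default,
    # so they show up in `dirs` and are counted here exactly once.
    for root, dirs, files in os.walk(results_dir):
        for name in dirs:
            if (Path(root) / name).is_symlink():
                counts["symlink"] += 1
        for name in files:
            if (Path(root) / name).is_symlink():
                counts["symlink"] += 1
            else:
                counts["regular"] += 1
    return counts
```

Calling summarize_publish_mode("./results_dataset1") after a run would be expected to report mostly symlinks in the default mode and mostly regular files when copying is enabled.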

Subfolder: 0_pipeline_info

This subfolder contains pipeline execution details and the validated sample sheet file.

  • samplesheet.valid.csv: Validated samplesheet file.
Subfolder: 1_preprocess

This subfolder contains intermediate files produced during the preprocessing steps.

  • 1a_lane_merge/: Contains merged fastq files for the same libraries.
  • 1b_umi_extract/: Contains fastq files with UMIs extracted and appended to the read names.
  • 1c_cutadapt/: Contains fastq files with adapters removed.
Subfolder: 2_qc_and_umi

This subfolder contains intermediate files generated during the QC and UMI processing steps.

  • 2a_fastqc/: Contains FastQC reports for sequencing data at each preprocessing step.
    • fastq_raw/
    • fastq_cutadapt/
    • fastq_readthrough/
  • 2b_read_per_umi_cutadapt/: Contains distribution of the number of reads per UMI for fastq files after trimming.
    • stat/ (Example): Contains statistical summary of read count per UMI.
    • plot/ (Example): Contains bin plot visualizing the read count per UMI summary.
  • 2c_read_per_umi_readthrough/: Contains distribution of the number of reads per UMI for readthrough fastq files.
    • stat/ (Example): Contains statistical summary of read count per UMI.
    • plot/ (Example): Contains bin plot visualizing the read count per UMI summary.
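The reads-per-UMI summary described above can be illustrated with a short Python sketch. It assumes, purely for illustration, that the UMI is the last underscore-separated field of the read name (for example "@read1_ACGTACGT"); the actual separator and the function names used here are assumptions, not URLpipe's implementation.

```python
from collections import Counter


def reads_per_umi(fastq_lines, sep="_"):
    """Count reads per UMI from the lines of a FASTQ file.

    Assumes the UMI is the last `sep`-separated field of the read name
    (an illustrative assumption, not confirmed by URLpipe).
    """
    records = [line.strip() for line in fastq_lines]
    records = [line for line in records if line]  # drop blank lines
    counts = Counter()
    for i, line in enumerate(records):
        if i % 4 == 0:  # header line of each 4-line FASTQ record
            read_id = line.split()[0]
            umi = read_id.rsplit(sep, 1)[-1]
            counts[umi] += 1
    return counts


def reads_per_umi_distribution(umi_counts):
    """Map 'number of reads' to 'number of UMIs with that many reads'."""
    return Counter(umi_counts.values())
```

The second function yields the kind of table that a stat/ summary could hold: how many UMIs are supported by one read, by two reads, and so on.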
Subfolder: 3_read_category

This subfolder contains categorized reads and associated statistics.

  • 3a_classify_locus/: Contains reads mapped to the target reference.
    • classify_locus.csv (Example): Statistical summary of on-locus read fractions.
    • on_target_locus/: Contains on-locus reads (the 5'-end of the reference appears in R1 and the 3'-end of the reference appears in R2).
    • off_target_locus/: Contains off-locus reads (neither end of the reference sequence appears in either R1 or R2).
    • problem_reads/: Contains problematic reads (one end of the reference sequence appears in either R1 or R2).
  • 3b_classify_indel/: Contains reads categorized by INDEL presence flanking the repeat region.
    • classify_indel.csv (Example): Statistical summary of non-INDEL read fractions.
    • no_indel/: Contains no-INDEL reads (both sides of repeat flanking sequences appear in either R1 or R2).
    • indel_5p/: Contains reads with INDELs occurring in the 5'-end repeat flanking region.
    • indel_3p/: Contains reads with INDELs occurring in the 3'-end repeat flanking region.
    • indel_5p_and_3p/: Contains reads with INDELs occurring in both the 5'-end and 3'-end repeat flanking regions.
    • indel_5p_or_3p/: Contains reads with INDELs occurring in either the 5'-end or 3'-end repeat flanking region.
  • 3c_classify_readthrough/: Contains reads that span the repeat flanking regions at both ends.
    • classify_readthrough.csv (Example): Statistical summary of readthrough read fractions.
    • readthrough/: Contains readthrough reads (both sides of repeat flanking sequences appear in R1).
    • non_readthrough/: Contains non-readthrough reads.
    • stat/: Contains readthrough read fractions for each individual sample.
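The locus classification rules listed under 3a_classify_locus can be sketched in Python as follows. This is a simplified illustration, not URLpipe's code: it uses exact substring matching, ignores strand orientation and mismatches, and assumes the caller supplies the two reference-end sequences (the hypothetical parameters ref_5p and ref_3p) in the orientation in which they would appear in the reads. Read pairs that fit neither the on-target nor the off-target rule are treated as problem reads.

```python
def classify_locus(r1, r2, ref_5p, ref_3p):
    """Assign a read pair to one of the three locus categories.

    Simplified sketch: exact substring matching only, no orientation
    handling, no mismatch tolerance (assumptions for illustration).
    """
    has_5p_in_r1 = ref_5p in r1
    has_3p_in_r2 = ref_3p in r2
    if has_5p_in_r1 and has_3p_in_r2:
        return "on_target_locus"
    # Check whether any reference end occurs anywhere in the pair.
    any_end_found = (
        has_5p_in_r1 or has_3p_in_r2 or ref_5p in r2 or ref_3p in r1
    )
    if not any_end_found:
        return "off_target_locus"
    return "problem_reads"
```

Counting the returned labels over all read pairs of a sample would give the kind of fractions summarized in classify_locus.csv.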
Subfolder: 4_repeat_statistics

This subfolder contains repeat length statistics for readthrough reads determined in 3_read_category.

  • 4a_repeat_length_distribution/: Contains distribution of repeat lengths for each UMI cutoff.

    • repeat_length_count_default_umi_XXX.csv (Example): Statistical summary of repeat length distribution when collapsed at each UMI cutoff.
    • repeat_length_count_default_umi_XXX.html (Example): Bar plot visualizing the repeat length distribution summary.
  • 4a_repeat_length_distribution_bwa_length/: Contains intermediate files used for length_mode = "reference_align".

  • 4a_repeat_length_distribution_bwa/: Contains intermediate BAM and fastq files used for length_mode = "reference_align".

  • 4b_repeat_length_distribution_per_umi/: Contains repeat length distribution per UMI.

    • csv/ (Example): Contains summary tables.
    • html/ (Example): Contains bar plots visualizing the above tables.
  • 4c_repeat_length_fraction/ (Example): Contains fraction of repeat lengths falling into different ranges, defined by allele-specific repeat lengths for each sample.
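One plausible reading of "collapsed at each UMI cutoff" is sketched below in Python. The rule shown here is an assumption made for illustration and may differ from what URLpipe does: a UMI is kept only if it is supported by at least cutoff reads, and each kept UMI contributes a single repeat length, namely the most frequent one among its reads. The function name collapse_by_umi is hypothetical.

```python
from collections import Counter, defaultdict


def collapse_by_umi(read_records, cutoff):
    """Build a repeat length distribution with one vote per UMI.

    read_records: iterable of (umi, repeat_length) pairs, one per read.
    cutoff: minimum number of reads a UMI needs to be kept.
    The collapsing rule (most frequent length per UMI) is an assumption.
    """
    per_umi = defaultdict(Counter)
    for umi, length in read_records:
        per_umi[umi][length] += 1

    distribution = Counter()
    for length_counts in per_umi.values():
        if sum(length_counts.values()) >= cutoff:
            # Ties are resolved in favor of the length seen first.
            mode_length, _ = length_counts.most_common(1)[0]
            distribution[mode_length] += 1
    return distribution
```

Running this for several cutoff values would produce one distribution per cutoff, analogous to the repeat_length_count_default_umi_XXX.csv files.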

Subfolder: 5_indel_statistics

This subfolder contains repeat length statistics for INDEL reads determined in 3_read_category.

  • 5a_read_count_per_umi_cutoff/ (Example): INDEL read count after UMI correction.
Subfolder: 6_summary

This subfolder contains repeat length summary statistics and plots generated by combining results from 4_repeat_statistics and 5_indel_statistics.

  • 6a_master_table/: Statistical tables summarizing repeat lengths per UMI cutoff.
    • master_table_allele_umi_XXX.csv (Example): Fraction of repeat lengths falling into ranges defined by allele-specific repeat lengths, for all samples.
    • master_table_repeat_bin_umi_XXX.csv (Example): Similar to the above but more flexible: stores the fraction of repeat lengths (including INDEL reads) falling into the bins defined by repeat_bins = "[(0,50), (51,60), (61,137), (138,154), (155,1000)]", for all samples.
  • 6b_bin_plot/: Statistical plots summarizing repeat lengths per UMI cutoff.
    • master_table_repeat_bin_umi_XXX.count.withoutIndel.html (Example): Bin plot for repeat length count for each sample per UMI cutoff excluding INDEL reads.
    • master_table_repeat_bin_umi_XXX.count.withIndel.html (Example): Bin plot for repeat length count for each sample per UMI cutoff including INDEL reads.
    • master_table_repeat_bin_umi_XXX.ratio.withoutIndel.html (Example): Bin plot for repeat length fraction for each sample per UMI cutoff excluding INDEL reads.
    • master_table_repeat_bin_umi_XXX.ratio.withIndel.html (Example): Bin plot for repeat length fraction for each sample per UMI cutoff including INDEL reads.
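To make the repeat_bins setting more concrete, the Python sketch below assigns repeat lengths to bins and computes per-bin counts and fractions. It assumes that both bin bounds are inclusive and that lengths outside every bin are ignored when computing fractions; these choices and the function name bin_repeat_lengths are illustrative assumptions, not URLpipe's confirmed behavior.

```python
def bin_repeat_lengths(lengths, bins):
    """Count repeat lengths per bin and convert the counts to fractions.

    lengths: iterable of integer repeat lengths.
    bins: list of (low, high) tuples, treated as inclusive on both ends
          (an assumption for illustration).
    """
    counts = {b: 0 for b in bins}
    for length in lengths:
        for low, high in bins:
            if low <= length <= high:
                counts[(low, high)] += 1
                break  # each length falls into at most one bin
    # Fractions are relative to the lengths that landed in some bin.
    total = sum(counts.values())
    fractions = {b: (counts[b] / total if total else 0.0) for b in bins}
    return counts, fractions
```

With bins set to [(0, 50), (51, 60), (61, 137), (138, 154), (155, 1000)], the returned counts and fractions correspond conceptually to the count and ratio plots in 6b_bin_plot.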