(Some parts adapted from nf-core TEMPLATE.)
This document describes the output produced by URLpipe. By default, all results are saved in the "./results" folder, as specified by the outdir = "./results" parameter. Results from different modules are organized into corresponding subdirectories (e.g. "./results/1_preprocess", "./results/2_qc_and_umi", etc.).
Nextflow implements a caching mechanism that stores all intermediate and final results in the "./work/" directory. By default, files in "./results/" are symbolic links to the corresponding files in "./work/". To publish copies instead of symlinks, use the publish_dir_mode = copy argument.
The sections below summarize the main contents of each result folder, using Example study1 as an example. Note that because sample_dataset1.config specifies outdir = "./results_dataset1", all results for that example are saved in the "./results_dataset1" directory.
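To see whether a results folder currently holds symlinks or copies, the published files can be inspected directly. The following is a minimal Python sketch and is not part of URLpipe; the helper name count_published_files is hypothetical, and the path passed to it should be whatever outdir was set to.

```python
import os


def count_published_files(results_dir):
    """Count symlinks vs. regular files under results_dir (hypothetical helper).

    Symlinked directories are counted once and are not descended into.
    """
    n_symlinks = 0
    n_regular = 0
    for root, dirs, files in os.walk(results_dir, followlinks=False):
        for name in dirs + files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                n_symlinks += 1
            elif os.path.isfile(path):
                n_regular += 1
    return n_symlinks, n_regular


if __name__ == "__main__":
    links, copies = count_published_files("./results")
    print(f"symlinks: {links}, regular files: {copies}")
```

A folder published with the default mode would be expected to report mostly symlinks, while one published with publish_dir_mode = copy would report regular files.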
Subfolder: 0_pipeline_info
This subfolder contains pipeline execution details and the validated sample sheet file.
- samplesheet.valid.csv: Validated samplesheet file.
Subfolder: 1_preprocess
This subfolder contains intermediate files produced during the preprocessing steps.
- 1a_lane_merge/: Contains merged fastq files for the same libraries.
- 1b_umi_extract/: Contains fastq files with UMIs extracted and appended to the read names.
- 1c_cutadapt/: Contains fastq files with adapters removed.
Subfolder: 2_qc_and_umi
This subfolder contains intermediate files generated during the QC and UMI processing steps.
- 2a_fastqc/: Contains FastQC reports for sequencing data at each preprocessing step.
  - fastq_raw/
  - fastq_cutadapt/
  - fastq_readthrough/
- 2b_read_per_umi_cutadapt/: Contains the distribution of the number of reads per UMI for fastq files after trimming.
- 2c_read_per_umi_readthrough/: Contains the distribution of the number of reads per UMI for readthrough fastq files.
Subfolder: 3_read_category
This subfolder contains categorized reads and associated statistics.
- 3a_classify_locus/: Contains reads mapped to the target reference.
  - classify_locus.csv (Example): Statistical summary of on-locus read fractions.
  - on_target_locus/: Contains on-locus reads (the 5' end of the reference appears in R1 and the 3' end of the reference appears in R2).
  - off_target_locus/: Contains off-locus reads (neither end of the reference sequence appears in either R1 or R2).
  - problem_reads/: Contains problematic reads (only one end of the reference sequence appears in either R1 or R2).
- 3b_classify_indel/: Contains reads categorized by INDEL presence in the regions flanking the repeat.
  - classify_indel.csv (Example): Statistical summary of non-INDEL read fractions.
  - no_indel/: Contains no-INDEL reads (both repeat flanking sequences appear in either R1 or R2).
  - indel_5p/: Contains reads with INDELs occurring in the 5'-end repeat flanking region.
  - indel_3p/: Contains reads with INDELs occurring in the 3'-end repeat flanking region.
  - indel_5p_and_3p/: Contains reads with INDELs occurring in both the 5'-end and 3'-end repeat flanking regions.
  - indel_5p_or_3p/: Contains reads with INDELs occurring in either the 5'-end or 3'-end repeat flanking region.
- 3c_classify_readthrough/: Contains reads that span the repeat flanking regions at both ends.
  - classify_readthrough.csv (Example): Statistical summary of readthrough read fractions.
  - readthrough/: Contains readthrough reads (both repeat flanking sequences appear in R1).
  - non_readthrough/: Contains non-readthrough reads.
  - stat/: Contains readthrough read fractions for each individual sample.
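The three locus categories under 3a_classify_locus/ can be read as a simple decision rule. The sketch below illustrates that rule only and is not URLpipe's implementation: the function name classify_locus is hypothetical, the two boolean inputs are assumed to be computed elsewhere (how each end of the reference is detected in a read is not modeled), and the rule is simplified to the 5' end in R1 and the 3' end in R2.

```python
def classify_locus(five_prime_in_r1, three_prime_in_r2):
    """Return the output folder name for one read pair (illustrative only).

    five_prime_in_r1: True if the 5' end of the reference was found in R1.
    three_prime_in_r2: True if the 3' end of the reference was found in R2.
    """
    if five_prime_in_r1 and three_prime_in_r2:
        # Both ends of the reference are present.
        return "on_target_locus"
    if not five_prime_in_r1 and not three_prime_in_r2:
        # Neither end of the reference is present.
        return "off_target_locus"
    # Exactly one end of the reference is present.
    return "problem_reads"


if __name__ == "__main__":
    for flags in [(True, True), (False, False), (True, False)]:
        print(flags, classify_locus(*flags))
```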
Subfolder: 4_repeat_statistics
This subfolder contains repeat length statistics for the readthrough reads determined in 3_read_category.
- 4a_repeat_length_distribution/: Contains the distribution of repeat lengths for each UMI cutoff.
- 4a_repeat_length_distribution_bwa_length/: Contains intermediate files used for length_mode = "reference_align".
- 4a_repeat_length_distribution_bwa/: Contains intermediate BAM and fastq files used for length_mode = "reference_align".
- 4b_repeat_length_distribution_per_umi/: Contains the repeat length distribution per UMI.
- 4c_repeat_length_fraction/ (Example): Contains the fraction of repeat lengths falling into different ranges, defined by allele-specific repeat lengths for each sample.
Subfolder: 5_indel_statistics
This subfolder contains repeat length statistics for the INDEL reads determined in 3_read_category.
- 5a_read_count_per_umi_cutoff/ (Example): INDEL read count after UMI correction.
Subfolder: 6_summary
This subfolder contains repeat length summary statistics and plots generated by combining results from 4_repeat_statistics and 5_indel_statistics.
- 6a_master_table/: Statistical tables summarizing repeat lengths per UMI cutoff.
  - master_table_allele_umi_XXX.csv (Example): Fraction of repeat lengths falling into different ranges, defined by allele-specific repeat lengths, for all samples.
  - master_table_repeat_bin_umi_XXX.csv (Example): Like the table above but more flexible. Stores the fraction of repeat lengths (including INDEL reads) falling into the bins defined by repeat_bins = "[(0,50), (51,60), (61,137), (138,154), (155,1000)]", for all samples.
- 6b_bin_plot/: Statistical plots summarizing repeat lengths per UMI cutoff.
  - master_table_repeat_bin_umi_XXX.count.withoutIndel.html (Example): Bin plot of repeat length counts for each sample per UMI cutoff, excluding INDEL reads.
  - master_table_repeat_bin_umi_XXX.count.withIndel.html (Example): Bin plot of repeat length counts for each sample per UMI cutoff, including INDEL reads.
  - master_table_repeat_bin_umi_XXX.ratio.withoutIndel.html (Example): Bin plot of repeat length fractions for each sample per UMI cutoff, excluding INDEL reads.
  - master_table_repeat_bin_umi_XXX.ratio.withIndel.html (Example): Bin plot of repeat length fractions for each sample per UMI cutoff, including INDEL reads.
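To make the repeat_bins setting used by the bin tables and plots more concrete, the sketch below parses the example string and computes per-bin fractions for a list of repeat lengths. It is an illustration under stated assumptions, not URLpipe's code: the helper names are hypothetical, bin bounds are assumed to be inclusive on both ends (the example bins are adjacent, e.g. 50 then 51), and lengths outside every bin are assumed to be excluded from the denominator.

```python
import ast

# Example value taken from the repeat_bins parameter shown above.
REPEAT_BINS = "[(0,50), (51,60), (61,137), (138,154), (155,1000)]"


def parse_repeat_bins(text):
    """Parse the repeat_bins string into a list of (low, high) integer tuples."""
    return [(int(lo), int(hi)) for lo, hi in ast.literal_eval(text)]


def assign_bin(length, bins):
    """Return the bin containing length, assuming inclusive bounds, else None."""
    for lo, hi in bins:
        if lo <= length <= hi:
            return (lo, hi)
    return None


def bin_fractions(lengths, bins):
    """Return the fraction of binned lengths falling into each bin."""
    counts = {b: 0 for b in bins}
    for length in lengths:
        b = assign_bin(length, bins)
        if b is not None:  # assumption: lengths outside all bins are ignored
            counts[b] += 1
    total = sum(counts.values())
    return {b: (c / total if total else 0.0) for b, c in counts.items()}


if __name__ == "__main__":
    bins = parse_repeat_bins(REPEAT_BINS)
    print(assign_bin(55, bins))  # (51, 60)
    print(bin_fractions([10, 55, 55, 140], bins))
```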