General otb directory structure
At its core, otb is a Nextflow pipeline with some extra bells and whistles. This page describes the structure of an example otb project.
Here is the directory structure of a genome created for Xylecopa micans:
0_Xylecopa_micans/
├── config
│   ├── none.cfg
│   ├── sge.cfg
│   ├── slurm_atlas.cfg
│   ├── slurm.cfg
│   └── slurm_usda.cfg
├── execute.slurm
├── LICENSE
├── nextflow-Xylecopa_micans.log.txt
├── otb.sh
├── RawHiC
│   ├── JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   └── JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
├── RawHiFI
│   ├── m54334U_210423_073502.ccs.bam
│   ├── m54334U_210423_073502.ccs.bam.bai
│   ├── m54334U_210423_073502.ccs.bam.md5
│   └── m54334U_210423_073502.ccs.bam.pbi
├── README.md
├── reports
├── results
│   ├── busco_no_polish
│   ├── busco_polish
│   ├── filtering
│   │   └── fastq_check.log.txt
│   ├── genome
│   │   ├── left.fastq.gz.stats
│   │   ├── log
│   │   └── right.fastq.gz.stats
│   ├── genomescope
│   │   ├── genomescope2.log.txt
│   │   ├── jellyfish.log.txt
│   │   ├── kcov.txt -> ../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/kcov.txt
│   │   ├── version.txt -> ../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/version.txt
│   │   └── Xylecopa_micans
│   │       ├── fitted_hist.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/fitted_hist.png
│   │       ├── linear_plot.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/linear_plot.png
│   │       ├── log_plot.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/log_plot.png
│   │       ├── lookup_table.txt -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/lookup_table.txt
│   │       ├── model.txt -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/model.txt
│   │       ├── progress.txt -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/progress.txt
│   │       ├── summary.txt -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/summary.txt
│   │       ├── transformed_linear_plot.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/transformed_linear_plot.png
│   │       └── transformed_log_plot.png -> ../../../work/e8/22e87b3c09c5cd59a28b99cf21e31a/Xylecopa_micans/transformed_log_plot.png
│   └── software_versions
│       ├── any2fasta_version.txt
│       ├── bbtools_version.txt
│       ├── bcftools_version.txt
│       ├── busco_version.txt
│       ├── genomescope_version.txt
│       ├── hicstuff_version.txt
│       ├── hifiasm_version.txt
│       ├── jellyfish_version.txt
│       ├── pbadapterfilt_version.txt
│       ├── ragtag_version.txt
│       ├── samtools_version.txt
│       └── shhquis_version.txt
├── run.nf
├── scr
│   ├── check_env.sh
│   ├── force_prefetch_containers.sh
│   ├── io.sh
│   └── prefetch_containers.sh
├── stderr.6595761.ceres19-compute-98
├── stdout.6595761.ceres19-compute-98
├── work
│   ├── 01
│   │   └── 5fd9920526ce4cd891b6181ddb1a70
│   │       ├── any2fasta_stats.flag.txt
│   │       ├── right.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
│   │       └── right.fastq.gz.stats
│   ├── 1b
│   │   └── 1d5bf40157d16bca1b352914141131
│   │       ├── jellyfish_version.flag.txt
│   │       └── version.txt -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/work/42/0283004d9768bf75bf2ff72fef1031/version.txt
│   ├── 22
│   │   └── 5d18f4291708f512b7a1be9148dd2a
│   │       └── any2fasta_version.flag.txt
│   ├── 39
│   │   └── e8f238662358a6b400104fd9dd0f0a
│   │       └── hifiadapterfilt_version.flag.txt
│   ├── 3c
│   │   └── 9475be059b3bd92f42172edda5d853
│   │       └── shhquis_version.flag.txt
│   ├── 42
│   │   └── 0283004d9768bf75bf2ff72fef1031
│   │       ├── jellyfish.flag.txt
│   │       ├── left.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   │       ├── reads.jf
│   │       ├── right.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
│   │       ├── version.txt
│   │       └── Xylecopa_micans.histo
│   ├── 43
│   │   └── 611cc48f608f83c5ddda5d8f8090ac
│   │       └── bbtools_version.flag.txt
│   ├── 51
│   │   └── 318c0111ff007513bca8b4b5df1416
│   │       └── bcftools_version.flag.txt
│   ├── 56
│   │   └── 7e48f7c96d4cefcdec1eda5bf4ac6b
│   │       ├── genomescope_version.flag.txt
│   │       └── version.txt -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/work/e8/22e87b3c09c5cd59a28b99cf21e31a/version.txt
│   ├── 5c
│   │   └── cbff3d6fd1b76022d7f1ec52a00934
│   │       └── busco_version.flag.txt
│   ├── 67
│   │   └── 766c5a6cc10fc153b8f5b37c40a826
│   │       └── hicstuff_version.flag.txt
│   ├── 81
│   │   └── 2b38330f70f55413734e6a140cb3dc
│   │       ├── any2fasta_stats.flag.txt
│   │       ├── left.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   │       └── left.fastq.gz.stats
│   ├── a3
│   │   └── 1a6a8bf4fb272772353cd89c763cb1
│   │       └── ragtag_version.flag.txt
│   ├── b3
│   │   └── a1899efe376be7777d1f6fe8d64baf
│   │       └── samtools_version.flag.txt
│   ├── ba
│   │   └── fe4de41ba6a5e575d6079beef371f5
│   │       └── HiFiASM_version.flag.txt
│   ├── e8
│   │   └── 22e87b3c09c5cd59a28b99cf21e31a
│   │       ├── genomescope.flag.txt
│   │       ├── kcov.txt
│   │       ├── version.txt
│   │       ├── Xylecopa_micans
│   │       │   ├── fitted_hist.png
│   │       │   ├── linear_plot.png
│   │       │   ├── log_plot.png
│   │       │   ├── lookup_table.txt
│   │       │   ├── model.txt
│   │       │   ├── progress.txt
│   │       │   ├── summary.txt
│   │       │   ├── transformed_linear_plot.png
│   │       │   └── transformed_log_plot.png
│   │       └── Xylecopa_micans.histo -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/work/42/0283004d9768bf75bf2ff72fef1031/Xylecopa_micans.histo
│   ├── ec
│   │   └── efe6d84cbf60d95363237250212e5e
│   │       ├── check_fastq.flag.txt
│   │       ├── JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
│   │       ├── JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz -> /90daydata/project/ag100pest/software/OTB_test/0_Xylecopa_micans/RawHiC/JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
│   │       └── out
├── left.fastq.gz -> ../JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R1.fastq.gz
└── right.fastq.gz -> ../JNBK_HiC_NA_NA_ACAGTG_Xylecopa_micans-Xylecopa_micans_HiC_I1143_L4_R2.fastq.gz
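In the listing above, several entries under results (those shown with ->) are symbolic links into work. As a rough illustration, the sketch below walks a results directory and reports where each link points. The function name list_result_links is hypothetical and is not part of otb; it only assumes the results/work layout shown in the listing.

```python
import os
from pathlib import Path


def list_result_links(results_dir):
    """Return sorted (link, resolved target) pairs for every symlink under results_dir."""
    links = []
    # os.walk does not follow symlinked directories by default, but it still
    # lists them in `dirs`, so both file and directory links are reported.
    for root, dirs, files in os.walk(results_dir):
        for name in dirs + files:
            path = Path(root) / name
            if path.is_symlink():
                links.append((path, path.resolve()))
    return sorted(links)


if __name__ == "__main__":
    # Run from the top of an otb project directory (assumed layout).
    for link, target in list_result_links("results"):
        print(f"{link} -> {target}")
```

Because the published files are links, removing work would presumably leave these entries under results dangling, so it is worth checking the targets before cleaning up.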
There's a lot going on here, so let's break it down. Let's assume that this user copied down the otb repo before adding the data they wanted processed.
At the top-level directory, we have the following:
0_Xylecopa_micans/
├── config/
├── .nextflow/
├── RawHiC/
├── RawHiFI/
├── reports/
├── results/
├── scr/
└── work/
The RawHiC and RawHiFI directories were added by the user to store the data they are going to use in their otb run. config holds the files that tell otb what kind of cluster (or lack thereof) it is being run on. .nextflow is a hidden directory that Nextflow uses for history and cache. results is where results are stored; otb creates this directory. scr holds helper scripts for otb. work is where each process in otb is actually computed. If we add in typical files we get this:
0_Xylecopa_micans/
├── config/
├── execute.slurm
├── LICENSE
├── .nextflow/
├── .nextflow.log
├── nextflow-Xylecopa_micans.log.txt
├── otb.sh*
├── RawHiC/
├── RawHiFI/
├── README.md
├── results/
├── reports/
├── run.nf
├── scr/
├── stderr.6595761.ceres19-compute-98
├── stdout.6595761.ceres19-compute-98
├── work/
└── Xylecopa_micans.nextflow.command.txt
Here we add execute.slurm, which the user added to execute otb on a compute node of the cluster instead of the head node, as is fairly typical for Nextflow pipelines. LICENSE is the license for otb. .nextflow.log is Nextflow's log file from running otb, and nextflow-Xylecopa_micans.log.txt is otb's log. otb.sh is the script for running otb. README.md holds information on running otb. run.nf is the Nextflow code that otb.sh calls to run the pipeline. stderr.6595761.ceres19-compute-98 is the standard error of this user's run, and stdout.6595761.ceres19-compute-98 is its standard output. Xylecopa_micans.nextflow.command.txt is the command that otb used to call Nextflow. This file is especially noteworthy when a Nextflow run fails, because the command can be rerun with the -resume flag appended to restart Nextflow after an error.
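The restart step described above can be sketched in a few lines. This is a minimal illustration, not part of otb: it assumes the command file holds a single Nextflow command as plain text, and the helper names build_resume_command and resume_command_from_file are hypothetical. It only builds the command string and does not run anything.

```python
from pathlib import Path


def build_resume_command(command_text):
    """Return the saved command with -resume appended exactly once."""
    command = command_text.strip()  # drop the trailing newline, if any
    if "-resume" in command.split():
        return command  # already a resume command, leave it unchanged
    return f"{command} -resume"


def resume_command_from_file(path):
    """Read a saved command file (assumed to hold one command) and add -resume."""
    return build_resume_command(Path(path).read_text())


if __name__ == "__main__":
    # File name taken from the listing above; run from the project directory.
    saved = Path("Xylecopa_micans.nextflow.command.txt")
    if saved.exists():
        print(resume_command_from_file(saved))
```

Printing the command rather than executing it leaves the decision to rerun with the user, who may want to submit it through the same scheduler script used for the original run.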
Here we see much the same, but with the otb files linked to the corresponding files in the otb repo:
0_Xylecopa_micans
├── config
├── execute.slurm
├── LICENSE
├── nextflow-Xylecopa_micans.log.txt
├── otb.sh
├── RawHiC
├── RawHiFI
├── README.md
├── results
├── run.nf
├── scr
├── stderr.6595761.ceres19-compute-98
├── stdout.6595761.ceres19-compute-98
├── work
└── Xylecopa_micans.nextflow.command.txt
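As a quick way to compare a project against the layout described on this page, the sketch below reports which of the top-level directories are present. The directory names come from the listings above; the grouping labels and the function names summarize_layout and missing_dirs are assumptions made for illustration and are not part of otb.

```python
from pathlib import Path

# Directory names are taken from this page; the group labels are this sketch's own.
EXPECTED_DIRS = {
    "added by the user": ("RawHiC", "RawHiFI"),
    "copied from the otb repo": ("config", "scr"),
    "created during a run": (".nextflow", "results", "work"),
}


def summarize_layout(project_dir):
    """Map each group to {directory name: True if it exists under project_dir}."""
    project = Path(project_dir)
    return {
        group: {name: (project / name).is_dir() for name in names}
        for group, names in EXPECTED_DIRS.items()
    }


def missing_dirs(project_dir):
    """Return the expected directory names that are absent, in listing order."""
    summary = summarize_layout(project_dir)
    return [name for names in summary.values() for name, ok in names.items() if not ok]


if __name__ == "__main__":
    for group, names in summarize_layout(".").items():
        for name, present in names.items():
            print(f"{name:10s} {group:26s} {'present' if present else 'missing'}")
```

Before a first run, only the user-added and repo directories would be expected to exist, so a report of .nextflow, results, and work as missing is not by itself an error.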
otb is in the public domain in the United States per 17 U.S.C. § 105