Usability improvements for the bcbio-nextgen data analysis pipeline.
The directory structure of the wrapper:
├── bcbio_wrapper_scripts
├── deploy.sh
├── downstreamAnalysis
│ ├── bulk_rna_seq-downstream_analysis.R
│ ├── metadata.csv
│ ├── tryDownstreamAnalysis
│ └── tximport-counts.csv
├── downstreamAnalysisVariantCalling
│ ├── gene_annotation_variant_calling.R
│ ├── set_packages.R
│ ├── small_variants_annotation.sh
│ ├── structural_variants_annotation.sh
│ └── variant_annotation.sh
├── envrionment
│ ├── install_bcbio_nextgen.sh
│ ├── install_genome.sh
│ ├── parse_yaml.sh
│ ├── set_environment_variables.sh
│ ├── setup_environment_module.sh
│ ├── setup_python2_env.sh
│ └── setup_python3_env.sh
├── install_dependencies_interface.sh
├── main.py
├── main.yaml
├── utils
│ └── add_to_yaml.py
├── web
│ ├── about.html
│ ├── data
│ │ └── report_context.json
│ ├── downstream_report.html
│ ├── help.html
│ ├── hero
│ │ ├── banner.png
│ │ └── favicon.ico
│ ├── images
│ │ ├── gridspec_ex.webp
│ │ ├── plot_2.png
│ │ └── plot_3.png
│ ├── index.html
│ ├── multiqc_report.html
│ ├── run_config.html
│ ├── script.js
│ └── style.css
├── workflows
│ ├── config_module.sh
│ ├── run_atac_seq.sh
│ ├── run_bulk_rna_seq.sh
│ ├── run_variant_calling.sh
│ └── samples_module.sh
└── yaml_to_table.py
To run an analysis, first set up a YAML configuration file; see main.yaml for an example. The configuration keys are:
- existing_version: ----> choose whether to install bcbio-nextgen from scratch (false) or to use an existing installation (true)
- install_path: ----> path to the existing installation; required when an existing installation of bcbio is used
- development_branch: ----> the development branch to upgrade and install bcbio to
- total_cores: ----> number of cores to run bcbio with
- main_cores: ----> number of cores to run bcbio with
- install_path: ----> the installation path must be specified
WARNING: the path must be located in the home directory of the system;
for the packages installed, paths that exceed 80 characters cannot be processed
- upgrade: ----> choose whether to upgrade bcbio_nextgen or not
- annotated_species: ----> choose whether the analysis runs on an existing genome in bcbio or on a custom genome
- genome_fasta: ----> the path to the .fa file of the custom genome
sort the gtf if no annotated species is used
- transcriptome_gtf: ----> the path to the transcriptome of the custom genome
- species: ----> the annotated species
- genome: ----> the annotated genome
- vep_species: ----> species used by the vep tool
- vep_assembly: ----> genome assembly used by the vep tool
- ensembl_ver: ----> vep tool version
- workflow: ----> name of the workflow
available names:
** variant_calling for variant calling and variant annotation
** atac_seq for ATAC-seq or ChIP-seq workflow
** rna_seq for RNA-seq or ChIP-seq workflow
- variant_annotation: ----> when running the variant calling workflow, choose whether to also run variant annotation
- exclude_lcr: ----> when running the variant calling workflow, choose whether to exclude low complexity regions
- download_samples: ----> choose whether to download the samples or to take them from the local system
- path_to_samples_on_sys: ----> path to the samples; required when locally stored samples are used
- samples: ----> ids of the samples to download
- samples_fastq: ----> file name for each sample id, in order, without the extension
if a sample has more than 1 file, the names are listed in order with the suffixes _1, _2 or _3
if the samples are already on the system, write their names without the extension
- csv_file_path: ----> the path to the csv file for the analysis
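Two of the constraints listed above (the install_path warning and the available workflow names) can be checked before a run. The following is a minimal sketch, not part of the wrapper: `check_config` is a hypothetical helper, and it assumes the configuration has already been loaded into a plain Python mapping with the key names used in this README.

```python
from pathlib import Path

# Workflow names listed in this README.
KNOWN_WORKFLOWS = {"variant_calling", "atac_seq", "rna_seq"}
# Length limit stated in the install_path warning.
MAX_INSTALL_PATH_LENGTH = 80


def check_config(config, home=None):
    """Return a list of problems found in a configuration mapping.

    Hypothetical helper for illustration; the wrapper itself may
    validate its configuration differently, or not at all.
    """
    problems = []
    home = Path(home) if home is not None else Path.home()

    install_path = config.get("install_path")
    if not install_path:
        problems.append("install_path must be specified")
    else:
        # The path is expected to sit somewhere below the home directory.
        if home not in Path(install_path).parents:
            problems.append("install_path is not inside the home directory")
        if len(str(install_path)) > MAX_INSTALL_PATH_LENGTH:
            problems.append("install_path is longer than 80 characters")

    workflow = config.get("workflow")
    if workflow not in KNOWN_WORKFLOWS:
        problems.append(f"unknown workflow: {workflow!r}")

    return problems
```

An empty list means that neither check found a problem; the other keys are not inspected by this sketch.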
The pipeline can be executed like this:
$ bash deploy.sh <your_yaml_configuration_file>.yaml 2>&1 | tee -a deploy.log
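In the command above, `2>&1` merges the error stream into standard output, and `tee -a deploy.log` prints that output while appending it to deploy.log. For readers who prefer to launch the run from Python, the same behaviour can be sketched as follows; `run_and_log` is a hypothetical helper, not part of the wrapper, and `my_config.yaml` is a placeholder file name.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run a command, echo its merged output, and append it to a log file."""
    with open(log_path, "a") as log:  # "a" appends, like tee -a
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)  # show the line on the terminal
            log.write(line)         # and keep a copy in the log
        return process.wait()       # exit code of the command


# Example call (placeholder configuration file name):
# run_and_log(["bash", "deploy.sh", "my_config.yaml"], "deploy.log")
```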