Variant calling workflow for PacBio HiFi data. Part of the LUMC PGx project.

PacBio-variantcalling

Documentation

To download the pipeline and all associated files, including its git submodules, run:

git clone https://github.com/LUMC/PacBio-variantcalling.git
cd PacBio-variantcalling
git submodule update --init --recursive

Next, create the conda environment that is used to execute the pipeline, and activate it:

conda env create --file environment.yml
conda activate PacBio-variantcalling

You can test the workflow using the following two commands. The first command runs the sanity checks to make sure everything required is installed. The second command will run the integration tests.

pytest --kwd tests --tag sanity
pytest --kwd tests --tag integration

To get a better sense of the pipeline and its inputs, you can manually run the most general test case

cromwell run \
    --options tests/data/config/cromwell.options.json \
    --inputs tests/data/config/variant_calling.json \
    PacBio-variantcalling.wdl

This runs the pipeline using the tests/data/config/variant_calling.json example configuration file. After the pipeline has completed, you can find the full execution folder in cromwell-executions. The workflow outputs are also copied to the test-output folder in the current directory, as specified in the tests/data/config/cromwell.options.json options file.

Pipeline configuration file

To generate an input configuration file for the PacBio pipeline, please run the following command.

womtool inputs --optional-inputs false PacBio-variantcalling.wdl
{
  "VariantCalling.samples": "Array[WomCompositeType {\n name -> String\nbamfiles -> Array[File]+ \n}]+",
  "VariantCalling.referenceFileDict": "File",
  "VariantCalling.referenceFileIndex": "File",
  "VariantCalling.referenceFile": "File",
  "VariantCalling.referencePrefix": "String"
}

If you also want to see the optional pipeline inputs, you can leave out the --optional-inputs false argument.
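
As an illustration of how the template above maps to a filled-in configuration, the following Python sketch builds the five required inputs and prints them as JSON. The input keys are taken from the womtool output; the helper name build_inputs, the sample name, and all file paths are hypothetical placeholders, not files shipped with the pipeline.

```python
import json


def build_inputs(samples, reference_file, reference_index,
                 reference_dict, reference_prefix):
    """Return a dict holding the five required VariantCalling inputs.

    `samples` maps a sample name to a non-empty list of BAM file paths,
    mirroring the sample struct (name, bamfiles) in the womtool output.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    for name, bamfiles in samples.items():
        if not bamfiles:
            raise ValueError(f"sample {name!r} needs at least one BAM file")
    return {
        "VariantCalling.samples": [
            {"name": name, "bamfiles": list(bamfiles)}
            for name, bamfiles in samples.items()
        ],
        "VariantCalling.referenceFile": reference_file,
        "VariantCalling.referenceFileIndex": reference_index,
        "VariantCalling.referenceFileDict": reference_dict,
        "VariantCalling.referencePrefix": reference_prefix,
    }


if __name__ == "__main__":
    # All names and paths below are made-up examples; replace them with your own.
    inputs = build_inputs(
        samples={"sample1": ["data/sample1.run1.bam", "data/sample1.run2.bam"]},
        reference_file="reference/ref.fasta",
        reference_index="reference/ref.fasta.fai",
        reference_dict="reference/ref.dict",
        reference_prefix="ref",
    )
    print(json.dumps(inputs, indent=2))
```

Redirecting the printed JSON to a file gives you a configuration that can be passed to cromwell with --inputs, in the same way as the example configuration used for the tests.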

Common configuration options

| Setting | Type | Required | Description |
| --- | --- | --- | --- |
| VariantCalling.samples | Array | Required | One or more sample structs. |
| VariantCalling.referenceFileDict | File | Required | The Picard dictionary file for the reference. |
| VariantCalling.referenceFileIndex | File | Required | The samtools index file for the reference. |
| VariantCalling.referenceFile | File | Required | The FASTA reference file. |
| VariantCalling.referencePrefix | String | Required | The name of the reference. |
| VariantCalling.useDeepVariant | Boolean | Optional | Use DeepVariant instead of GATK4 for variant calling. |
| VariantCalling.generateGVCF | Boolean | Optional | Generate g.vcf files for all samples. This is extremely slow when used in combination with VariantCalling.useDeepVariant. |
| VariantCalling.targetGenes | File | Optional | BED file containing the target genes. Used to determine the PGx phasing and Picard HsMetrics. |
| VariantCalling.dbsnp | File | Optional | dbSNP file used to annotate the discovered variants. The results are displayed in the MultiQC report. |
| VariantCalling.dbsnpIndex | File | Optional | Index for the dbSNP file, required when VariantCalling.dbsnp is specified. |
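
The rules in the table above can be checked before starting a run. The sketch below is a minimal, hypothetical helper (check_inputs is not part of the pipeline) that reports missing required inputs, a dbSNP file given without its index, and the slow combination of generateGVCF with useDeepVariant. It only looks at which keys are present and does not check whether the files exist.

```python
import json
import sys

REQUIRED_INPUTS = (
    "VariantCalling.samples",
    "VariantCalling.referenceFileDict",
    "VariantCalling.referenceFileIndex",
    "VariantCalling.referenceFile",
    "VariantCalling.referencePrefix",
)


def check_inputs(inputs):
    """Return a list of problems found in an inputs dict (empty if none)."""
    problems = [
        f"missing required input: {key}"
        for key in REQUIRED_INPUTS
        if key not in inputs
    ]
    # dbsnpIndex is required whenever dbsnp is specified.
    if "VariantCalling.dbsnp" in inputs and "VariantCalling.dbsnpIndex" not in inputs:
        problems.append(
            "VariantCalling.dbsnp is set but VariantCalling.dbsnpIndex is missing"
        )
    # Not an error, but documented as extremely slow.
    if inputs.get("VariantCalling.useDeepVariant") and inputs.get("VariantCalling.generateGVCF"):
        problems.append(
            "warning: generateGVCF combined with useDeepVariant is extremely slow"
        )
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_inputs.py path/to/your_config.json
    with open(sys.argv[1]) as handle:
        for problem in check_inputs(json.load(handle)):
            print(problem)
```

This is a complement to womtool validate rather than a replacement for it, since womtool checks the inputs against the types declared in the workflow itself.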

Check your configuration file

If you have created your own configuration file, you can use the following command to make sure all inputs are valid. Replace tests/data/config/variant_calling.json with the path to your own configuration file.

womtool validate --inputs tests/data/config/variant_calling.json PacBio-variantcalling.wdl
Success!