Version 2.5
This release includes a standard Nextflow workflow that takes unaligned PacBio HiFi
reads from AAV sequencing as input, aligns them to genome reference sequences in the
recommended way, and then performs processing as before to generate the AAV report.
This workflow includes configuration to run locally, on Google Cloud, and on the Form
Bio platform. The individual Nextflow processes (under modules/local/laava.nf) are
designed to be easily included in other Nextflow workflows, for extensibility and
customization.
Packaging
- Dockerfiles are provided to create images 'laava', for typical use within the
workflows and interactively, and 'laava_dev', for development. The latter excludes the
AAV processing scripts and includes more dependencies. The corresponding conda
environments have matching names.
- The 'laava' container image is now publicly hosted on GitHub Packages, linked to this
source repo.
- The semi-automated test suite includes a small BAM file in the repo, downsampled from
the public PacBio scAAV example dataset. This enables end-to-end testing of the
pipeline.
Processing scripts
- New script prepare_annotation.py generates the specialized "annotation.txt" file
from vector annotations in standard BED format and a simple list of additional
non-vector sequence names (e.g. host genome, helper and repcap plasmids).
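To illustrate the two inputs described above, here is a minimal Python sketch of reading them. It does not reproduce prepare_annotation.py or the "annotation.txt" layout. The names Region, parse_bed, and parse_sequence_names are hypothetical, as is the assumption that the sequence-name list holds one name per line. The only format fact relied on is that standard BED uses 0-based, half-open coordinates, so a BED start is shifted by one to get a 1-based, inclusive position.

```python
from typing import Iterable, List, NamedTuple


class Region(NamedTuple):
    """One annotated interval, in 1-based inclusive coordinates (hypothetical type)."""
    seq_name: str
    start: int
    end: int
    label: str


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse tab-separated BED lines into 1-based inclusive regions."""
    regions = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and the optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"BED line needs at least 3 columns: {line!r}")
        start0, end = int(fields[1]), int(fields[2])
        label = fields[3] if len(fields) > 3 else ""
        # BED is 0-based half-open, so only the start shifts by one.
        regions.append(Region(fields[0], start0 + 1, end, label))
    return regions


def parse_sequence_names(lines: Iterable[str]) -> List[str]:
    """Read sequence names, assuming one name per line (an assumption)."""
    names = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#"):
            names.append(name)
    return names


if __name__ == "__main__":
    # Made-up example values, not taken from the LAAVA test data.
    print(parse_bed(["myVector\t1794\t6553\tvector\n"]))
    print(parse_sequence_names(["chr1\n", "helper_plasmid\n"]))
```

Converting coordinates once at parse time keeps later steps from mixing the two conventions.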
Report
- Updated the overview figures to include a "snapback" classification for certain reads
in ssAAV samples.
- Updated the text in the AAV type/subtype definition tables, also including the
"snapback" read-level definition.
- In the first two data tables, list "Frequency in AAV" and "Total Frequency" as
separate columns.
- Resolved a few quirks in rendering tables and conditional subsections.
- Deleted legacy report generation script plotAAVreport.R.
- Updated the "Methods" section text to describe the end-to-end workflow.
Full Changelog: v2.1...v2.5