MiXCR v4.2.0
Built-in support for new protocols
-
BD Rpahsody full-length protocol
-
Smart-Seq2 single cell RNA-Seq protocol
-
Oxford Nanopore long-read technology
Sample barcodes
Complete support of sample barcodes that may be picked up from all possible sources:
- from names of input files;
- from index I1/I2 FASTQ files;
- from sequence header lines;
- from inside the tag pattern.
Now one can analyze multiple patient samples at once. Along with a powerful file name expansion functionality, one can process any kind of sequencing protocol with any custom combination of sample, cell and UMI barcoding.
Processing of multiple samples can be done in two principal modes in respect to sample barcodes: (1) data can be split by samples right on the align
stage and processed separately, or (2) all samples can be processed as a single set of sequences and separated only on the very last exportClones
step, both approaches have their pros and cons allowing to use the best strategy given the experimental setup and study goals.
New robust filters for single cell and molecular barcoded data
For 10x Genomics and other fragmented protocols, a new powerful k-mer based filtering algorithm is now used to eliminate cross-cell contamination coming from plasmatic cells.
For UMI filtering, a new algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering.
List of all changes
Sample barcodes
- support for more than two
fastq
files as input (I1
andI2
reads support) - multiple possible sources of data for sample resolution:
- sequences extracted with tag pattern (including those coming from
I1
andI2
reads) - samples can be based on specific pattern variant (with multi-variant patterns, separated by
||
, allows to easily adopt MiGEC-style-like sample files) - parts of file names (extracted using file name expansion mechanism)
- sequences extracted with tag pattern (including those coming from
- flexible sample table matching criteria
- matching multiple tags
- matching variant id from multi-variant tag patterns
- special
--sample-table
mixin option allowing for flexible sample table definition in a tab-delimited table form - special
--infer-sample-table
mixin option to infer sample table for sample tags from file name expansion - special generic presets for multiplexed data analysis scenarios (e.g.
generic-tcr-amplicon-separate-samples-umi
) align
command now optionally allows to split output alignments by sample into separatevdjca
filesexportClones
command now supports splitting the output into multiple files by sampleanalyze
command supports new splitting behaviour of thealign
command, separately running all the analysis steps for all the output files (if splitting is enabled)
Filters and error correction
- preset for 10X VDJ BCR enhanced with k-mer-based filter to eliminate rare cross-cell contamination from plasmatic cells
- new advanced thresholding algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering
- rework of clustering step aimed at PCR / reverse-transcription error correction in
assemble
, now it correctly handles any possible tag combination (sample, cell or molecule) - new feature to add histogram preprocessing steps in automated thresholding
Quality trimming
- turn on default quality trimming (
trimmingQualityThreshold
changed from0
to10
), this setting showed better performance in many real world use-cases
Reference library
- reference V/D/J/C gene library upgrade to repseqio v2.1 (see changelog)
New commands
- added command
exportReportsTable
that prints file in tabular format with report data from commands that were run
Other
- optimized aligner parameters for long-read data
- fixed system temp folder detection behaviour, now mixcr respects
TMPDIR
environment variable - rework of preset-mixin logic, now external presets (like those starting from
local:...
) are packed into the output*.vdjca
file onalign
step, the same applies to all externally linked information, like tag whitelists and sample lists. This behaviour facilitates better analysis reproducibility and more transparent parameter logistics. - new mixin options to adjust tag refinement whitelists with
analyze
:--set-whitelist
and--reset-whitelist
- removed
refineTagsAndSort
options-w
and--whitelist
; corresponding deprecation error message printed if used - new grouping feature for
exportClones
, allowing to normalize values for-readFraction
and-uniqueTagFraction ...
columns to totals for certain compartments instead of normalizing to the whole dataset. This feature allows to output e.g. fractions of reads inside the cell. - new mixin options
--add-export-clone-table-splitting
,--reset-export-clone-table-splitting
,--add-export-clone-grouping
and--reset-export-clone-grouping
- improved sensitivity of
findAlleles
command - add tags info in
exportAlignmentsPretty
andexportClonesPretty
- add
--chains
filter forexportShmTrees
,exportShmTreesWithNodes
,exportShmTreesNewick
andexportPlots shmTrees
commands - fixed old bug #353, now all aligners favor leftmost J gene in situations where multiple genes can ve found in the sequence (i.e. mis-spliced mRNA)
- fixes exception in
align
happening for not-parsed sequences withwriteFailedAlignments=true
- new filter and parameter added in
assemblePartial
; parameter name isminimalNOverlapShare
, it controls minimal relative part of N region that must be covered by the overlap to conclude that two reads are from the same V(D)J rearrangement - default paired-end overlap parameters changed to slightly more relaxed version
- better criteria for alignments to be accepted for the
assemblePartial
procedure - fixed NPE in
assemblePartial
executed for the data without C-gene alignment settings - fixed rare exception in
exportAirr
command - by default exports show messages like 'region_not_covered' for data that can't be extracted (requesting
-nFeature
for not covered region or not existed tag). Option--not-covered-as-empty
will save previous behaviour - info about genes with enough data to find allele was added into report of
findAlleles
and description of alleles - fixed error message appearing when analysis parameter already assigned to
null
is overridden bynull
using the-O...
option - fixed wrong reporting of number of trimmed letters from the right side of R1 and R2 sequence
- fixed error message about repeated generic mixin overrides
- fixed error of
exportClones
with some arguments - fixes for report indention artefacts
- fixed bug when chains filter set to
ALL
inexportAlignments
was preventing not-aligned records to be exported - fixed runtime exception in
assemble
rising in analysis of data with CELL barcodes but without UMIs, with turned off consensus assembly - fixed bug leading to incorrect mixin option ordering during it's application to parameters bundle
- minor change to the contigAssembly filtering parametrization
- added mix-in
--export-productive-clones-only
- warning message about automatically set
-Xmx..
JVM option inmixcr
script - safer automatic value for
-Xms..
- fix: added
species
flag to 10x, nanopore and smart-seq2 presets