Built-in support for new protocols

BD Rhapsody full-length protocol
Smart-Seq2 single cell RNA-Seq protocol
Oxford Nanopore long-read technology

Sample barcodes

Complete support of sample barcodes that may be picked up from all possible sources:

from names of input files;
from index I1/I2 FASTQ files;
from sequence header lines;
from inside the tag pattern.
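
To make two of these sources concrete, here is a minimal Python sketch (not MiXCR code) that pulls a sample name from a file name and an index sequence from a FASTQ header line. It assumes Illumina-style file naming (`<sample>_S<n>_L<lane>_R1_001.fastq.gz`) and Illumina-style header comments (`read:filtered:control:index`); the function names and the regular expression are illustrative and do not reflect how MiXCR itself parses these fields.

```python
import re
from typing import Optional

# Assumed naming scheme: <sample>_S<n>_L<lane>_<R1|R2|I1|I2>_001.fastq[.gz]
FILE_NAME_RE = re.compile(
    r"^(?P<sample>.+?)_S\d+_L\d{3}_[RI][12]_001\.fastq(\.gz)?$"
)


def sample_from_file_name(name: str) -> Optional[str]:
    """Return the sample part of the file name, or None if it does not match."""
    match = FILE_NAME_RE.match(name)
    return match.group("sample") if match else None


def index_from_header(header: str) -> Optional[str]:
    """Return the index sequence(s) from an Illumina-style header comment.

    The comment after the first space is expected to look like
    'read:filtered:control:index', e.g. '1:N:0:ATCACG+TTAGGC'.
    """
    parts = header.strip().split(" ", 1)
    if len(parts) < 2:
        return None  # header has no comment field
    fields = parts[1].split(":")
    if len(fields) != 4 or not fields[3]:
        return None  # comment is not in the assumed format
    return fields[3]
```

For example, `sample_from_file_name("patient1_S1_L001_R1_001.fastq.gz")` yields the sample name, and `index_from_header` applied to a header ending in `1:N:0:ATCACG+TTAGGC` yields the dual-index string.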

Now one can analyze multiple patient samples at once. Combined with the powerful file name expansion functionality, this makes it possible to process any kind of sequencing protocol with any custom combination of sample, cell and UMI barcoding.

Multiple samples can be processed in two principal modes with respect to sample barcodes: (1) data can be split by sample right at the align stage and processed separately, or (2) all samples can be processed as a single set of sequences and separated only at the very last exportClones step. Both approaches have their pros and cons, so one can choose the best strategy for the experimental setup and study goals.

New robust filters for single cell and molecular barcoded data

For 10x Genomics and other fragmented protocols, a new powerful k-mer-based filtering algorithm is now used to eliminate cross-cell contamination coming from plasma cells.

For UMI filtering, a new algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering.
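
As an illustration of what automated histogram thresholding means for UMI data, the sketch below picks a reads-per-UMI cutoff that separates low-count (likely erroneous) UMIs from well-covered ones. It deliberately uses the classic Otsu method on log-scaled counts as a simple stand-in; it is not the algorithm from the cited paper and not MiXCR's implementation. The function names and the bin count are assumptions made for this example.

```python
import math
from typing import Sequence


def otsu_threshold(values: Sequence[float], bins: int = 64) -> float:
    """Classic Otsu threshold: the bin edge maximizing between-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # all values identical, nothing to separate
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))

    best_score, best_edge = -1.0, lo
    weight_low, sum_low = 0, 0.0
    for i in range(bins - 1):
        weight_low += hist[i]
        sum_low += centers[i] * hist[i]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue  # one class is empty, split is meaningless
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Proportional to the between-class variance of the two groups.
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score, best_edge = score, lo + (i + 1) * width
    return best_edge


def umi_read_threshold(reads_per_umi: Sequence[int]) -> float:
    """Threshold on reads-per-UMI counts, computed in log2 space."""
    logs = [math.log2(n) for n in reads_per_umi]
    return 2 ** otsu_threshold(logs)
```

UMIs whose read count falls below the returned value would be treated as noise in this toy setting. Working in log space is a design choice for the sketch, since reads-per-UMI distributions typically span several orders of magnitude.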

List of all changes

Sample barcodes

support for more than two fastq files as input (I1 and I2 reads support)
multiple possible sources of data for sample resolution:
- sequences extracted with tag pattern (including those coming from I1 and I2 reads)
- samples can be based on a specific pattern variant (with multi-variant patterns, separated by ||, this makes it easy to adopt MiGEC-style sample files)
- parts of file names (extracted using file name expansion mechanism)
flexible sample table matching criteria
- matching multiple tags
- matching variant id from multi-variant tag patterns
special --sample-table mixin option allowing for flexible sample table definition in a tab-delimited table form
special --infer-sample-table mixin option to infer sample table for sample tags from file name expansion
special generic presets for multiplexed data analysis scenarios (e.g. generic-tcr-amplicon-separate-samples-umi)
align command can now optionally split output alignments by sample into separate vdjca files
exportClones command now supports splitting the output into multiple files by sample
analyze command supports new splitting behaviour of the align command, separately running all the analysis steps for all the output files (if splitting is enabled)

Filters and error correction

preset for 10X VDJ BCR enhanced with k-mer-based filter to eliminate rare cross-cell contamination from plasma cells
new advanced thresholding algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering
rework of clustering step aimed at PCR / reverse-transcription error correction in assemble; it now correctly handles any possible tag combination (sample, cell or molecule)
new feature to add histogram preprocessing steps in automated thresholding

Quality trimming

quality trimming turned on by default (trimmingQualityThreshold changed from 0 to 10); this setting showed better performance in many real-world use cases
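
To show what a quality threshold of 10 versus 0 means in practice, here is a deliberately simplified Python sketch of end trimming on a single read. It assumes Phred+33 encoded quality strings and simply removes bases from both ends while their score is below the threshold; MiXCR's actual trimming algorithm is not described in these notes and may work differently. The function names are hypothetical.

```python
from typing import List, Tuple


def phred_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in quality]


def trim_ends(seq: str, quality: str, threshold: int) -> Tuple[str, str]:
    """Trim low-quality bases from both ends of a read.

    A threshold of 0 (or less) disables trimming, mirroring the old default.
    """
    if threshold <= 0:
        return seq, quality
    scores = phred_scores(quality)
    start, end = 0, len(seq)
    # Advance from the left while the base quality is below the threshold.
    while start < end and scores[start] < threshold:
        start += 1
    # Retreat from the right under the same condition.
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return seq[start:end], quality[start:end]
```

With `threshold=0` the read is returned unchanged, while with `threshold=10` bases with scores such as 2 (`#` in Phred+33) are removed from the read ends.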

Reference library

reference V/D/J/C gene library upgrade to repseqio v2.1 (see changelog)

New commands

added command exportReportsTable that prints file in tabular format with report data from commands that were run

Other

optimized aligner parameters for long-read data
fixed system temp folder detection behaviour, now mixcr respects TMPDIR environment variable
rework of preset-mixin logic: external presets (like those starting with local:...) are now packed into the output *.vdjca file at the align step. The same applies to all externally linked information, like tag whitelists and sample lists. This behaviour facilitates better analysis reproducibility and more transparent parameter logistics.
new mixin options to adjust tag refinement whitelists with analyze: --set-whitelist and --reset-whitelist
removed refineTagsAndSort options -w and --whitelist; corresponding deprecation error message printed if used
new grouping feature for exportClones that normalizes values of the -readFraction and -uniqueTagFraction ... columns to totals for certain compartments instead of the whole dataset. This makes it possible to output e.g. fractions of reads inside the cell.
new mixin options --add-export-clone-table-splitting, --reset-export-clone-table-splitting, --add-export-clone-grouping and --reset-export-clone-grouping
improved sensitivity of findAlleles command
add tags info in exportAlignmentsPretty and exportClonesPretty
add --chains filter for exportShmTrees, exportShmTreesWithNodes, exportShmTreesNewick and exportPlots shmTrees commands
fixed old bug #353, now all aligners favor leftmost J gene in situations where multiple genes can be found in the sequence (i.e. mis-spliced mRNA)
fixed exception in align occurring for unparsed sequences with writeFailedAlignments=true
new filter and parameter added in assemblePartial; parameter name is minimalNOverlapShare, it controls minimal relative part of N region that must be covered by the overlap to conclude that two reads are from the same V(D)J rearrangement
default paired-end overlap parameters changed to slightly more relaxed version
better criteria for alignments to be accepted for the assemblePartial procedure
fixed NPE in assemblePartial executed for the data without C-gene alignment settings
fixed rare exception in exportAirr command
by default, exports now show messages like 'region_not_covered' for data that can't be extracted (requesting -nFeature for a region that is not covered, or a tag that does not exist). The --not-covered-as-empty option restores the previous behaviour
info about genes with enough data to find alleles was added to the findAlleles report and to the description of alleles
fixed error message appearing when analysis parameter already assigned to null is overridden by null using the -O... option
fixed wrong reporting of number of trimmed letters from the right side of R1 and R2 sequence
fixed error message about repeated generic mixin overrides
fixed error of exportClones with some arguments
fixes for report indentation artefacts
fixed bug when chains filter set to ALL in exportAlignments was preventing not-aligned records from being exported
fixed runtime exception in assemble arising in analysis of data with CELL barcodes but without UMIs, with consensus assembly turned off
fixed bug leading to incorrect mixin option ordering during its application to the parameters bundle
minor change to the contigAssembly filtering parametrization
added mix-in --export-productive-clones-only
warning message about automatically set -Xmx.. JVM option in mixcr script
safer automatic value for -Xms..
fix: added species flag to 10x, nanopore and smart-seq2 presets
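
The grouping feature for exportClones listed above changes the denominator used for fraction columns. The following Python sketch illustrates the idea on toy data: read fractions are computed either against the whole dataset or against the total of each cell. The record layout `(cell barcode, clone id, read count)` and the function name are hypothetical and chosen only for this example; this is not how MiXCR stores or exports clones.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical record layout: (cell barcode, clone id, read count)
Clone = Tuple[str, str, int]


def read_fractions(
    clones: Iterable[Clone], by_cell: bool
) -> Dict[Tuple[str, str], float]:
    """Read fraction per clone, normalized per cell or over the whole dataset."""
    clones = list(clones)

    def group_key(cell: str) -> Hashable:
        # One group per cell, or a single shared group for the whole dataset.
        return cell if by_cell else None

    totals: Dict[Hashable, int] = defaultdict(int)
    for cell, _, reads in clones:
        totals[group_key(cell)] += reads

    return {
        (cell, clone_id): reads / totals[group_key(cell)]
        for cell, clone_id, reads in clones
    }
```

With `by_cell=False` a clone holding 90 of 400 total reads gets a small dataset-wide fraction, whereas with `by_cell=True` the same clone, holding 90 of the 100 reads in its cell, is reported as dominant within that cell.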

MiXCR v4.2.0