Skip to content

MiXCR v4.2.0

Compare
Choose a tag to compare
@github-actions github-actions released this 26 Jan 20:06
b0f194e

Built-in support for new protocols

Sample barcodes

Complete support of sample barcodes that may be picked up from all possible sources:

  • from names of input files;
  • from index I1/I2 FASTQ files;
  • from sequence header lines;
  • from inside the tag pattern.

Now one can analyze multiple patient samples at once. Along with a powerful file name expansion functionality, one can process any kind of sequencing protocol with any custom combination of sample, cell and UMI barcoding.

Processing of multiple samples can be done in two principal modes in respect to sample barcodes: (1) data can be split by samples right on the align stage and processed separately, or (2) all samples can be processed as a single set of sequences and separated only on the very last exportClones step, both approaches have their pros and cons allowing to use the best strategy given the experimental setup and study goals.

New robust filters for single cell and molecular barcoded data

For 10x Genomics and other fragmented protocols, a new powerful k-mer based filtering algorithm is now used to eliminate cross-cell contamination coming from plasmatic cells.

For UMI filtering, a new algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering.

List of all changes

Sample barcodes

  • support for more than two fastq files as input (I1 and I2 reads support)
  • multiple possible sources of data for sample resolution:
    • sequences extracted with tag pattern (including those coming from I1 and I2 reads)
    • samples can be based on specific pattern variant (with multi-variant patterns, separated by ||, allows to easily adopt MiGEC-style-like sample files)
    • parts of file names (extracted using file name expansion mechanism)
  • flexible sample table matching criteria
    • matching multiple tags
    • matching variant id from multi-variant tag patterns
  • special --sample-table mixin option allowing for flexible sample table definition in a tab-delimited table form
  • special --infer-sample-table mixin option to infer sample table for sample tags from file name expansion
  • special generic presets for multiplexed data analysis scenarios (e.g. generic-tcr-amplicon-separate-samples-umi)
  • align command now optionally allows to split output alignments by sample into separate vdjca files
  • exportClones command now supports splitting the output into multiple files by sample
  • analyze command supports new splitting behaviour of the align command, separately running all the analysis steps for all the output files (if splitting is enabled)

Filters and error correction

  • preset for 10X VDJ BCR enhanced with k-mer-based filter to eliminate rare cross-cell contamination from plasmatic cells
  • new advanced thresholding algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering
  • rework of clustering step aimed at PCR / reverse-transcription error correction in assemble, now it correctly handles any possible tag combination (sample, cell or molecule)
  • new feature to add histogram preprocessing steps in automated thresholding

Quality trimming

  • turn on default quality trimming (trimmingQualityThreshold changed from 0 to 10), this setting showed better performance in many real world use-cases

Reference library

  • reference V/D/J/C gene library upgrade to repseqio v2.1 (see changelog)

New commands

  • added command exportReportsTable that prints file in tabular format with report data from commands that were run

Other

  • optimized aligner parameters for long-read data
  • fixed system temp folder detection behaviour, now mixcr respects TMPDIR environment variable
  • rework of preset-mixin logic, now external presets (like those starting from local:...) are packed into the output *.vdjca file on align step, the same applies to all externally linked information, like tag whitelists and sample lists. This behaviour facilitates better analysis reproducibility and more transparent parameter logistics.
  • new mixin options to adjust tag refinement whitelists with analyze: --set-whitelist and --reset-whitelist
  • removed refineTagsAndSort options -w and --whitelist; corresponding deprecation error message printed if used
  • new grouping feature for exportClones, allowing to normalize values for -readFraction and -uniqueTagFraction ... columns to totals for certain compartments instead of normalizing to the whole dataset. This feature allows to output e.g. fractions of reads inside the cell.
  • new mixin options --add-export-clone-table-splitting, --reset-export-clone-table-splitting, --add-export-clone-grouping and --reset-export-clone-grouping
  • improved sensitivity of findAlleles command
  • add tags info in exportAlignmentsPretty and exportClonesPretty
  • add --chains filter for exportShmTrees, exportShmTreesWithNodes, exportShmTreesNewick and exportPlots shmTrees commands
  • fixed old bug #353, now all aligners favor leftmost J gene in situations where multiple genes can ve found in the sequence (i.e. mis-spliced mRNA)
  • fixes exception in align happening for not-parsed sequences with writeFailedAlignments=true
  • new filter and parameter added in assemblePartial; parameter name is minimalNOverlapShare, it controls minimal relative part of N region that must be covered by the overlap to conclude that two reads are from the same V(D)J rearrangement
  • default paired-end overlap parameters changed to slightly more relaxed version
  • better criteria for alignments to be accepted for the assemblePartial procedure
  • fixed NPE in assemblePartial executed for the data without C-gene alignment settings
  • fixed rare exception in exportAirr command
  • by default exports show messages like 'region_not_covered' for data that can't be extracted (requesting -nFeature for not covered region or not existed tag). Option --not-covered-as-empty will save previous behaviour
  • info about genes with enough data to find allele was added into report of findAlleles and description of alleles
  • fixed error message appearing when analysis parameter already assigned to null is overridden by null using the -O... option
  • fixed wrong reporting of number of trimmed letters from the right side of R1 and R2 sequence
  • fixed error message about repeated generic mixin overrides
  • fixed error of exportClones with some arguments
  • fixes for report indention artefacts
  • fixed bug when chains filter set to ALL in exportAlignments was preventing not-aligned records to be exported
  • fixed runtime exception in assemble rising in analysis of data with CELL barcodes but without UMIs, with turned off consensus assembly
  • fixed bug leading to incorrect mixin option ordering during it's application to parameters bundle
  • minor change to the contigAssembly filtering parametrization
  • added mix-in --export-productive-clones-only
  • warning message about automatically set -Xmx.. JVM option in mixcr script
  • safer automatic value for -Xms..
  • fix: added species flag to 10x, nanopore and smart-seq2 presets