07 Aug 19:37

github-actions

976ba14

MiXCR v4.7.0

❗ Breaking changes

Starting from version 4.7.0 of MiXCR, users are required to specify the assembling feature for all presets in cases where it's not defined by the protocol. This can be achieved using either the option --assemble-clonotypes-by [feature] or, for fragmented data (such as RNA-seq or 10x VDJ data), --assemble-contigs-by [feature]. This ensures consistency in assembling features when integrating various samples or types of samples, such as 10x single-cell VDJ and AIRR sequencing data, for downstream analyses like inferring alleles or building SHM trees. The previous behavior for fragmented data, which aimed to assemble the longest possible sequences, can still be achieved with either the option --assemble-contigs-by-cell for single-cell data or --assemble-longest-contigs for RNA-seq/Exome-seq data.
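
As a rough illustration of the new requirement, the sketch below builds two analyze command lines, one for amplicon-style data and one for fragmented data. It only prints the commands, so it runs without MiXCR installed. The preset names are generic presets listed under v4.4.0, and the choice of VDJRegion as the assembling feature, the --species value and the file names are assumptions made for the example, not recommendations.

```shell
#!/bin/sh
# Dry-run sketch: the commands are printed, not executed.
# Feature, species and file names are illustrative assumptions.
FEATURE="VDJRegion"            # assumed assembling feature; CDR3 is another option
INPUT="sample-sheet.csv"       # hypothetical input
OUTPUT="output_prefix"         # hypothetical output prefix

# Amplicon-style data: fix the clonotype assembling feature.
AMPLICON_CMD="mixcr analyze generic-sc-ht-vdj-amplicon --species hsa --assemble-clonotypes-by $FEATURE $INPUT $OUTPUT"

# Fragmented data: fix the contig assembling feature instead.
FRAGMENTED_CMD="mixcr analyze generic-sc-ht-vdj-fragmented --species hsa --assemble-contigs-by $FEATURE $INPUT $OUTPUT"

echo "$AMPLICON_CMD"
echo "$FRAGMENTED_CMD"
```

Using the same feature across all samples in a study is what keeps clonotypes comparable in downstream steps such as allele inference or SHM tree building.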

🚀 Major fixes and upgrades

Fixed assemble behavior for single-cell data: before the fix, in rare cases consensuses were assembled from reads coming from different cells. Now reads from different cells are strictly isolated.
Significant improvement of V gene assignment precision. To facilitate this improvement, the assemble and assembleContigs steps now have individual relativeMeanScore and maxHits parameters.
Improved robustness against expression level differences between TCR/IG chains. Consensus assembly in assemble is now performed separately for each chain. This change is especially important for single-cell presets with cell-level assembly (most of the MiXCR presets for single-cell data).
The options --dont-correct-tag-with-name <tag_name> or --dont-correct-tag-type (Molecule|Cell|Sample) can now be specified to skip tag correction. This trades off some analysis quality and error correction performance for significantly lower memory and analysis time requirements in deeply sequenced datasets with many Cell and Molecular barcodes.
Ability to trigger realignment of left or right read boundaries with the global alignment algorithm using the parameters rightForceRealignmentTrigger or leftForceRealignmentTrigger in cases where reads do not cover the CDR3 regions (rescues alignments in case of fragmented data, like single-cell).
MiTool-based contig pre-assembly step integrated into 10x-sc-xcr-vdj preset, significantly improving overall analysis performance.

🛠️ Other improvements & fixes

Default input quality filter in assemble (badQualityThreshold) stage was decreased to 10, improving total analysis yield
Added validation for assembleCells that input files should be assembled by fixed feature
Export of trees and tree nodes now supports imputed features
Fixed parsing of optional arguments for exportShmTreesWithNodes: -nMutationsRelative, -aaMutations, -nMutations, -aaMutationsRelative, -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
Fixed parsing of optional arguments for exportClones and exportAlignments: -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
Fixed possible errors on exporting amino acid mutations in exportShmTreesWithNodes
Fixed list of required options in listPresets command
Fixed error on building trees in case of JBeginTrimmed started before CDR3Begin
Fixed usage of --remove-step qc
Added --remove-qc-check option
Removed -topChains field from exportShmTreesWithNodes command. Use -chains instead
Removed default splitting clones by V and J for presets where clones are assembled by full-length.
Fixed NullPointerException in some cases of building trees by SC+bulk data
Fixed java.lang.IllegalArgumentException: While adding VEndTrimmed in exportClones
Fixed the tree combination step in findShmTrees: in some cases trees weren't combined even if they could be
Fixed NoSuchElementException in some cases of SC combining of trees
Fixed export of -jBestIdentityPercent in exportShmTreesWithNodes
Added validation on export -aaFeature for features containing UTR
Fixed usage of command exportPlots shmTrees
Fixed topology of trees: previously, common V and J mutations were included in the root node; now the root includes only the reconstructed NDN. The previous behavior led to underestimated distance from the germline. The germline sequence is now exported with common mutations. To fully apply the fix to previously analyzed data, rerun the pipeline starting from findShmTrees
Fixed IllegalStateException on removing unnecessary genes on findAlleles
Added --dont-remove-unused-genes option to findAlleles command
Adjusted consensus assembly parameters (in assemble) for single-cell presets
Command groupClones was renamed to assembleCells. The old name still works, but it's hidden from help. Report and output file names in the analyze step were also renamed accordingly.
Fixed calculation of germline for VCDR3Part and JCDR3Part in case of indels inside CDR3
Fixed export of trees if data assembled by a feature with reference point having offset
Export of VJJunction germline in shmTrees exports now exports mrca as the most plausible content
Fixed parsing and alignment of reads longer than 30 Kbase
downsample now supports molecule variant in --downsampling option
Fixed naming of output files of downsample command
--output-not-used-reads of analyze command now works with bam input files too, alongside --not-aligned-(R1|R2) and --not-parsed-(R1|R2) of align command
Fix replaceWildcards behaviour on parsing BAM. Previous behaviour resulted in discarding of the quality scores on align
v_call, d_call, j_call and c_call columns in AIRR now output only best hit, not the whole list
Stable behavior of replaceWildcards. Before it depended on the position of read in a file, now it depends on read content only
If sample sheet supplied by --sample-sheet[-strict] option has * symbol after tag name, it will be preserved

🧬 Reference gene library changes

IG reference for new species:
- Rabbit (IGH, IGK, IGL)
- Sheep (IGH, IGK, IGL)
Human reference corrections:
- Duplicated entries removed: IGHV1-69*00, IGHV1-69*01, IGHV3-23*00, IGHV3-23*01
- Fix for CDR3Begin position in IGHV4-30-4
- Fix for FR1Begin position in TRBV21-1
- Names of the following human TRAV genes were changed:
  - TRAV14DV4 -> TRAV14/DV4
  - TRAV23DV6 -> TRAV23/DV6
  - TRAV29DV5 -> TRAV29/DV5
  - TRAV36DV7 -> TRAV36/DV7
  - TRAV38-1DV8 -> TRAV38-1/DV8
Correct mapping of V-gene UTRs in Alpaca reference

📚 New Presets

Added preset takara-mouse-rna-bcr-umi-smarseq for new Takara SMART-Seq Mouse BCR (with UMIs) kit
Added preset idt-human-rna-bcr-umi-archer and idt-human-rna-tcr-umi-archer for IDT Archer kits
Presets for Cellecta kits that include TCR/BCR Spike-in mix QC metrics: cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-16-4-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-16-4-1

09 Dec 21:17

github-actions

v4.6.0

c9fafa4

MiXCR v4.6.0

🖇️ Combined Heavy+Light Somatic Hypermutation Trees from Single-Cell data

A special step is added in findShmTrees to combine heavy and light SHM trees utilizing information added to clonotypes by the groupClones command. Nodes in the resulting tree will contain both light and heavy chains. If there is no connection to a clone from a companion chain, a reconstructed sequence will be added.
This behaviour can be disabled with the --dont-combine-tree-by-cells option to reconstruct separate heavy and light SHM trees.
Added exportShmSingleCellTrees command that exports one node per line. If there are several roots in a tree, data will be exported in different columns.
Added -subtreeId to tree exports to differentiate parts of trees from different chains
exportShmTreesWithNodes and exportShmTrees commands will export subtrees with different chains in separate rows.

🚀 Other major upgrades

Changes in `groupClones` command

The previous algorithm was replaced with a new one that handles contamination better, can detect multi-mappers (when one cell barcode marks two different cells) and can work with non-functional clones
Some clones are now explicitly marked as contamination. This information is available as a separate label in exportClones in the groupId column. Such clones can be filtered out from export by --filter-out-group-types contamination
More important algorithm performance metrics are added to the report
Fix for behaviour leading to clones with undefined group being split by cell barcodes
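
The contamination filter can be sketched as a single exportClones call. The snippet prints the command instead of running it; the option and its value come from the notes above, while the input and output file names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the command rather than running it.
INPUT="sample.clns"          # hypothetical clonotype file produced by the pipeline
OUTPUT="sample.clones.tsv"   # hypothetical output table

# Drop clones that groupClones marked as contamination.
EXPORT_CMD="mixcr exportClones --filter-out-group-types contamination $INPUT $OUTPUT"
echo "$EXPORT_CMD"
```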

New characteristics in SHM trees exports

-subtreeId for determination of different chains in the same tree
-numberOfClonesInTree [forChain] Number of unique clones in the SHM tree.
-numberOfNodesWithClones Number of nodes with clones, i.e. nodes with different clone sequences.
-totalReadsCountInTree [forChain] Total sum of read counts of clones in the SHM tree.
-totalUniqueTagCountInTree (Molecule|Cell|Sample) [forChain] Total count of unique tags in the SHM tree with specified type.
-chains Chain type of the tree
-treeHeight Height of the tree
-vGene, -jGene, -vFamily, -jFamily - in the previous version those were exported only for nodes with clones
-vBestIdentityPercent, -jBestIdentityPercent, -isOOF and -isProductive are now exported for reconstructed nodes too

New characteristics in clonotype export

-aaLength and -allAALength are available alongside -nLength and -allNLength
-aaMutationsRate is available alongside -nMutationsRate
Added optional arg germline in -nFeature, -aaFeature, -nLength, -aaLength in exportClones, exportAlignments and exportCloneGroups. It allows exporting the sequence of the germline instead of the sequence of the gene.
For all mutation exports (excluding -mutationsDetailed ) added optional filter by mutation type: ... [(substitutions|indels|inserts|deletions)]
Added -nMutationsCount, -aaMutationsCount, -allNMutationsCount, -allAAMutationsCount for all relatable exports
For mutation exports in exportShmTreesWithNodes, the (germline|mrca|parent) option is now optional. Mutations from germline are exported by default
Nucleotide mutations can now be exported for features that contain VCDR3Part, DCDR3Part or JCDR3Part
Now -nLength, -nMutationsCount, -nMutationsRate can be calculated for multiple gene features (e.g. -nMutationsRate VRegionTrimmed,JRegionTrimmed)
Added --export-clone-groups-sort-chains-by mixin with type of sorting of clones for determination of the primary and the secondary chains. It applies to exportCloneGroups command. By default, it's Auto (by UMI if it's available, by Read otherwise; previous default value was Read)
Added --filter-out-group-types mixin to filter-out clones having certain clone group assignment kind: found, undefined or contamination. It applies to exportClones command
Now exportCloneGroups by default will export groups in separate files for IG, TRAB, TRGD and mixed. This behaviour can be switched off by using --reset-export-clone-table-splitting or a single --export-clone-groups-for-cell-type. In case of several --export-clone-groups-for-cell-type options, every cell type will be exported in a separate file.
In case of --export-clone-groups-for-cell-type in exportCloneGroups all mixed or unmatched groups will be filtered out.
Added read and Molecule fraction columns to single cell exportClones output.
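
A minimal sketch of how some of the new export fields might be combined in one exportClones call. It prints the command instead of running it. The multi-feature form of -nMutationsRate is the one quoted in the notes above; the choice of CDR3 for the length columns and the file names are assumptions made for the example.

```shell
#!/bin/sh
# Dry-run sketch: prints the command rather than running it.
INPUT="sample.clns"             # hypothetical input
OUTPUT="sample.mutations.tsv"   # hypothetical output

# Nucleotide and amino acid length of CDR3 (assumed feature choice),
# plus a mutation rate computed over two gene features at once.
FIELDS="-nLength CDR3 -aaLength CDR3 -nMutationsRate VRegionTrimmed,JRegionTrimmed"

EXPORT_CMD="mixcr exportClones $FIELDS $INPUT $OUTPUT"
echo "$EXPORT_CMD"
```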

🧬 Reference library upgrades

The previous TRAD meta-chain is now split into TRA and TRD, as it should be. Chain assignment for clonotypes is based on J genes.
Rebuilt allelic reference for human IGH, TRB, TRA and TRD chains. Now allelic names correspond to the IUIS nomenclature.
Human IGK VEnd coordinates corrected.
UTR5Begin coordinates added to the following mouse genes: IGKV23-1, IGKV20-101-2, IGKV14-130, IGKV8-28, TRGV2

📚 Preset updates

The milab-human-rna-tcr-umi-race preset has been updated: now clones are assembled by default based on the CDR3, in line with the manufacturer's recommended read length.
The flairr-seq-bcr preset has been updated: now the preset sets species to human by default according to a built-in tag pattern with primer sequences.
The following presets have been added to cover Invivoscribe assay panels: invivoscribe-human-dna-trg-lymphotrack, invivoscribe-human-dna-trb-lymphotrack, invivoscribe-human-dna-igk-lymphotrack, invivoscribe-human-dna-ighv-leader-lymphotrack, invivoscribe-human-dna-igh-fr3-lymphotrack, invivoscribe-human-dna-igh-fr2-lymphotrack, invivoscribe-human-dna-igh-fr1-lymphotrack, invivoscribe-human-dna-igh-fr123-lymphotrack.
The following presets have been added for mouse Thermofisher assays: thermofisher-mouse-rna-tcb-ampliseq-sr, thermofisher-mouse-dna-tcb-ampliseq-sr, thermofisher-mouse-rna-igh-ampliseq-sr, thermofisher-mouse-dna-igh-ampliseq-sr.
Preset for SMARTer Human scTCR a/b Profiling Kit: takara-sc-human-rna-tcr-smarter
The milab-human-rna-ig-umi-multiplex preset has been updated: the pattern now trims fewer nucleotides, which facilitates CDR1 identification. The splits by V and J genes have been removed as redundant due to the full-length assembling feature.

🛠️ Minor improvements & fixes

Stricter tree combination step in findShmTrees command
Better calculation of indel mutations between nodes in the process of building SHM trees
Increased percent of successful alignment-aided overlaps by removing unnecessary overlap region quality sum threshold
Impossible export of germline sequence for VJJunction in shmTrees exports now produces an error
Parameter validation fix in -nMutationsRate
Fix for -nMutationsRate if region is not covered for the clone
Fix for the format of exportAlignmentsPretty broken in the previous version
Fix for IllegalArgumentException in exportAlignmentsPretty for cases where translation can't be performed
Fix error for analyze executed with -f and --output-not-used-reads at the same time
Resolutions of wildcards are excluded from calculation of -nMutationsRate for CDR3 in exportShmTreesWithNodes
Fix OutOfMemory exception in command extend with .vdjca input
In findShmTrees, the filter for productive-only clones now checks for stop codons in all features, not only in CDR3
Change default value for filter for productive clones in findShmTrees to false (was true before)
Add option --productive-only to findShmTrees
Fixed parsing of --export-clone-groups-for-cell-type parameter
Fixed usage of slice command on clnx files that weren't ordered by id.
In slice now default behaviour is to keep original ids. Previous behaviour available with --reassign-ids option
Fixed parsing of composite gene features with offsets like --assemble-clonotypes-by [VDJRegion,CBegin(0,10)]
Fixed parent directory creation for output of exportClonesOverlap
Fixed exportAirr in case of a clone with a CDR3 that doesn't have VCDR3Part and JCDR3Part
Optimize calculation of ranks in clone set. Speeds up export with tags and several other places.
Added clone_id column in exportAirr
Fixed exportClones in case of splitting file by tag:... if there is a clone that has several tags of the requested level
Fixed calculation of -nMutationsCount, -nMutationsRate, -aaMutationsCount and -aaMutationsRate. Previously, in some cases they were calculated on a different region from the one requested.
Added CellBarcodesWithFoundGroups for groupClones QC checks
New filter --no-feature in exportAlignmentsPretty
Fixed reporting in align, now coverage takes into account alignment-aided overlap

❗ Breaking changes

Option --build-from <path> was removed from findShmTrees command

Deprecations of export options

-lengthOf now is deprecated, use -nLength instead
-allLengthOf now is deprecated, use -allNLength instead
-mutationRate now is deprecated, use -nMutationsRate instead
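
For scripts that still use the deprecated names, the three renames above can be applied mechanically. The following is a minimal sketch using sed; the function name and the sample command line are hypothetical, and the substitution is a plain text replacement, so review the result if file names in your scripts happen to contain these strings.

```shell
#!/bin/sh
# Rewrites the three deprecated export options to their replacements.
# Reads text on stdin and writes the migrated text to stdout.
migrate_export_options() {
    sed -e 's/-allLengthOf/-allNLength/g' \
        -e 's/-lengthOf/-nLength/g' \
        -e 's/-mutationRate/-nMutationsRate/g'
}

# Example: a hypothetical pre-4.6.0 command line.
OLD='mixcr exportClones -lengthOf CDR3 -mutationRate VRegion clones.clns clones.tsv'
NEW=$(printf '%s\n' "$OLD" | migrate_export_options)
echo "$NEW"
```

To migrate a whole script, pipe it through the function and write the result to a new file, for example `migrate_export_options < old.sh > new.sh`.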

22 Sep 12:58

github-actions

v4.5.0

cdb24b4

MiXCR v4.5.0

🚀 New features

Multi-chain clone assembly for single-cell data

Now MiXCR calculates Heavy-Light antibody and Alpha-Beta and Gamma-Delta TCR combined clones for single-cell data. Two new commands were introduced to enable this functionality:

groupClones: calculates multi-chain clones from assembled clonotypes and writes the result in a binary format;
exportCloneGroups: exports information about combined clonotypes.

All single-cell presets now automatically produce combined multi-chain output in both binary and textual formats, see files with names matching *.clone.groups.tsv pattern in the output folder.

New characteristics in clonotype export

Export biochemical properties of gene regions with -biochemicalProperty <geneFeature> <property> or -baseBiochemicalProperties <geneFeature> export options. Available in export for alignments, clones and SHM tree nodes. Available properties: Hydrophobicity, Charge, Polarity, Volume, Strength, MjEnergy, Kf1, Kf2, Kf3, Kf4, Kf5, Kf6, Kf7, Kf8, Kf9, Kf10, Rim, Surface, Turn, Alpha, Beta, Core, Disorder, N2Strength, N2Hydrophobicity, N2Volume, N2Surface.
Export isotype with -isotype [<(primary|subclass|auto)>]
Export -mutationRate [<gene_feature>] in exportShmTreesWithNodes, exportClones and exportCloneGroups command: number of mutations relative to corresponding germline divided by the target sequence size. For exportClones and exportCloneGroups CDR3 is not included in calculation.

Support for wider set of input formats

Support for cram files as input for analyze and align commands. Optionally, a reference to the genome can be specified by --reference-for-cram
Fixed usage of BAM input for analyze and align, if file contains both paired and single reads
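
A sketch of passing a CRAM file to analyze together with a genome reference. It prints the command instead of running it. The --reference-for-cram option comes from the notes above; the preset (a generic preset listed under v4.4.0), the species value, the placement of the option on the command line and all file names are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the command rather than running it.
PRESET="generic-sc-gex"       # assumed preset, pick the one matching your protocol
REFERENCE="genome.fasta"      # hypothetical reference the CRAM was compressed against
INPUT="sample.cram"           # hypothetical input
OUTPUT="output_prefix"        # hypothetical output prefix

CRAM_CMD="mixcr analyze $PRESET --species hsa --reference-for-cram $REFERENCE $INPUT $OUTPUT"
echo "$CRAM_CMD"
```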

Algorithm enhancements

The global consensus assembly algorithm, applied in assemble to collapse UMI/Cell groups into contigs, now has a much better empirical seed selection step for multi-consensus assembly scenarios. This significantly increases sensitivity during assembly of secondary consensuses from the same group of sequences.
New constraint in the low-quality read mapping procedure preventing cross-cell read mapping.

📚 Preset updates

Additional improvement of clone filters in 10x-sc-xcr-vdj preset.
Tag pattern upgrade for cellecta-human-rna-xcr-umi-drivermap-air. Now UMI includes a part of the C-gene primer to increase diversity, and R2 is also used for payload.
Assembling feature fix for irepertoire-human-rna-xcr-repseq-plus preset. Now {CDR2Begin:FR4End}.
New preset for BD full-length protocol with enhanced beads V2 featuring B384 whitelists: bd-sc-xcr-rhapsody-full-length-enhanced-bead-v2.
New preset for Takara Bio SMART-Seq Mouse TCR (with UMIs): takara-mouse-rna-tcr-umi-smarseq.
Presets for new Cellecta kits: cellecta-human-dna-xcr-umi-drivermap-air, cellecta-human-rna-xcr-full-length-umi-drivermap-air, cellecta-mouse-rna-xcr-umi-drivermap-air.
Presets for iRepertoire RepSeq+ kits with UMI: irepertoire-mouse-rna-xcr-repseq-plus-umi-pe, irepertoire-human-rna-xcr-repseq-plus-umi-se, irepertoire-human-rna-xcr-repseq-plus-umi-pe.
isotype field added to exportClones for presets supporting isotype identification.
Split by C-gene enabled in thermofisher-human-rna-igh-oncomine-lr and cellecta-human-rna-xcr-umi-drivermap-air presets to facilitate isotype separation.
Default consensus assembly parameters maxNormalizedAlignmentPenalty and altSeedPenaltyTolerance are adjusted to increase sensitivity.
The --split-by-sample option is now set to true by default for all align presets, as well as all presets that inherit from it. This new default behavior applies unless it is directly overridden in the preset or with --dont-split-by-sample mix-in.
exportAlignments now reports UMI and/or Cell barcodes by default for presets with barcodes.

🛠️ Minor improvements & fixes

Fixed possible crash with --dry-run option in analyze
More informative help message when a deprecated preset is used; the previous message incorrectly suggested using --assemble-contigs-by instead of --assemble-clonotypes-by.
When split-by-tags is enabled, exportClones and exportShmTreesWithNodes now output read count as the sum of reads for the given tags selection; a more complicated formula was used in previous versions
exportAlignments now includes the column topChains by default. exportClones reports topChains for single cell presets.
Fixed calculation of geneFamilyName for genes like IGHA*00 (without the number before * symbol)
Better formatting in listPresets command. Added grouping by vendor, labels and optional filtering
Validation of input types in align or analyze by given tag pattern

30 Jul 08:30

github-actions

v4.4.2

cd7ea52

MiXCR v4.4.2

🚀 New features

Two-fold align step speedup for most of the protocol-specific presets (see the list below)
Import tags from sequence headers by parsing its content with regular expressions
Highly optimized generic presets for amplicon/RNA-seq data sequenced on long-read sequencers such as Pacific Biosciences and Oxford Nanopore: generic-ont, generic-ont-with-umi, generic-pacbio, generic-pacbio-with-umi

🐞 Bug fixes

Fixes crash when input contains quality scores > 70
Fixes excessive memory consumption issue for long read data
Fix for crash in assemble with UMI tags but with consensus assembler turned off

👷 Other minor adjustments

Long-read J gene aligner optimization
FLAIRR-seq preset optimized with new long-read-optimized aligner
Quality trimming is disabled for long-read aligner
Removed qc reports for clustered alignments and clones
The following presets have been optimized by specifying a single reverse/direct alignment mode and now work faster:
takara-human-rna-bcr-umi-smartseq, takara-human-rna-bcr-umi-smarter, takara-human-rna-tcr-umi-smartseq, takara-human-rna-tcr-umi-smarter-v2, takara-human-rna-tcr-smarter, takara-mouse-rna-bcr-smarter, takara-mouse-rna-tcr-smarter, 10x-sc-xcr-vdj, 10x-sc-5gex, abhelix-human-rna-xcr, bd-human-sc-xcr-rhapsody-cdr3, bd-mouse-sc-xcr-rhapsody-cdr3, bd-sc-xcr-rhapsody-full-length, cellecta-human-rna-xcr-umi-drivermap-air, illumina-human-rna-trb-ampliseq-sr, illumina-human-rna-trb-ampliseq-plus, irepertoire-human-rna-xcr-repseq-sr, irepertoire-human-rna-xcr-repseq-lr, irepertoire-mouse-rna-xcr-repseq-sr, irepertoire-mouse-rna-xcr-repseq-lr, irepertoire-human-rna-xcr-repseq-plus, irepertoire-mouse-rna-xcr-repseq-plus, irepertoire-human-dna-xcr-repseq-sr, irepertoire-human-dna-xcr-repseq-lr, milab-human-rna-ig-umi-multiplex, milab-human-rna-tcr-umi-race, milab-human-rna-tcr-umi-multiplex, milab-human-dna-tcr-multiplex, milab-human-dna-xcr-7genes-multiplex, milab-mouse-rna-tcr-umi-race, neb-human-rna-xcr-umi-nebnext, qiagen-human-rna-tcr-umi-qiaseq

06 Jul 17:49

github-actions

v4.4.1

f7cd556

MiXCR v4.4.1

🐞 Bug fixes

Resolves an issue encountered while executing MiXCR on Windows OS

05 Jul 18:51

github-actions

v4.4.0

29721c9

MiXCR v4.4.0

🚀 New features

Built-in alleles database

MiXCR features robust support for inferring donor-specific allelic variants of V and J genes from NGS data, using the findAlleles command. With this new release, we introduce a comprehensive built-in database of human alleles. Now, the findAlleles command will utilize known allele names from this integrated library. Feel free to explore our database at https://vdj.online/library.

New rigorous quality checks

MiXCR now offers detailed insights into the quality of input data with its new quality control (QC) checks. A comprehensive list of checks provides complete information about the data and facilitates immediate feedback to the wet lab if any issues are detected

Convenient way to build custom libraries

Now one can build a gene segment reference library for de-novo libraries or for chimeric model animals with just a single buildLibrary command. Check out our updated guide. In the example below, the --d-genes-from-fasta and --c-genes-from-fasta inputs are optional.

mixcr buildLibrary \
    --v-genes-from-fasta v-genes.IGH.fasta \
    --v-gene-feature VRegion \
    --j-genes-from-fasta j-genes.IGH.fasta \
    --d-genes-from-fasta d-genes.IGH.fasta \
    --c-genes-from-fasta c-genes.IGH.fasta \
    --chain IGH \
    --species phocoena \
    phocoena-IGH.json.gz

Comprehensive support of sample sheets

Now one can pass sample sheet directly to MiXCR analyze command as input. This way one can easily run MiXCR for arbitrary structure of input files, demultiplexed or not, with any type of multiplexing used:

mixcr analyze generic-sc-ht-vdj-amplicon --species hsa \
    sample-sheet.csv \
    output_prefix

🤩 New presets

Support of MiLaboratories Human 7 Genes DNA Multiplex: milab-human-dna-xcr-7genes-multiplex
Support of Parse Bio Evercode Whole Transcriptome presets: parsebio-sc-3gex-evercode-wt-mini, parsebio-sc-3gex-evercode-wt and parsebio-sc-3gex-evercode-wt-mega
Support of FLAIRR-Seq protocol via flairr-seq preset
New generic single cell presets:
- Low throughput (e.g. micro-wells) amplicon-based single cell:
  - No UMIs: generic-sc-lt-vdj-amplicon
  - With UMIs: generic-sc-lt-vdj-amplicon-umi
- Low throughput (e.g. micro-wells) single cell with fragmentation (RNA-Seq):
  - No UMIs: generic-sc-lt-vdj-fragmented
  - With UMIs: generic-sc-lt-vdj-fragmented-umi
- High throughput (e.g. droplets) amplicon-based single cell:
  - No UMIs: generic-sc-ht-vdj-amplicon
  - With UMIs: generic-sc-ht-vdj-amplicon-umi
- High throughput (e.g. droplets) single cell with fragmentation (RNA-Seq):
  - No UMIs: generic-sc-ht-vdj-fragmented
  - With UMIs: generic-sc-ht-vdj-fragmented-umi
- Reconstructing VDJ from generic gene expression data:
  - No UMIs: generic-sc-gex
  - With UMIs: generic-sc-gex-umi
New Biomed2 primer sets: biomed2-human-rna-igkl, biomed2-human-rna-trbdg.

💪 Major changes

Improved aligner parameters for all protocols. We spent in total more than 100,000 CPU/hours running optimization. As a result alignment rate is better for most of the protocols, especially in the case of average data quality.
Adds new minSequenceCount parameter for k-mer filter, allowing construction of more flexible filtering pipelines with better fallback behaviour for under-sequenced libraries.
Now full sample sheet with input file names can be provided as an input to the pipeline.
Sample sheets provided either with the --sample-sheet mixin or as a pipeline input will be fuzzy matched against the data, allowing for one substitution in unambiguous cases. This behaviour can be turned off by using the --sample-sheet-strict mixin instead, or by adding a --strict-sample-sheet-matching option if a full sample sheet is used as pipeline input.
New commands: mixcr qc, mixcr buildLibrary, mixcr mergeLibrary, mixcr debugLibrary
Various major improvements to sequencing and PCR error correction algorithms for tags and clonotypes:
- tag refinement now uses average quality in statistical inference; this is the correct approach from the mathematical point of view, and it slightly increases performance judging by better consensus assembly downstream
- statistical inference in PCR error correction redone from scratch, now it takes into account aggregated quality scores of clonotypes, which makes the procedure automatically adapt to low quality samples and better perform in many marginal cases in both UMI and non-UMI protocols
- better algorithm for quality score aggregation in clonotype assembly
- better algorithm for quality score aggregation in consensus assembly
Mechanism to apply different tag transformations on the align step. Transformations include mappings, string and sequence manipulations and various arithmetic operations. This feature supports single-cell scenarios where multiple well-known barcodes mark the same cell, converts sequence barcodes to a textual representation to adopt the different barcode naming schemas used in some protocols, and converts multiple barcodes to a single cell id. The feature is currently used in presets for analysis of data from Parse Bioscience and BD Rhapsody single-cell platforms.
Special mechanism to allow for NaN values in metrics in group filters (used in minSequenceCount parameter in k-mer filter, see above).
Added fallback behaviour for under-sequenced libraries

🐞 Bug fixes

Fix for naming of intermediate files and reports produced by analyze if target folder is specified
Tag pattern now is also searched in reverse strand for single-ended input with --tag-parse-unstranded
fix for value in report line Reads dropped due to low quality, percent of total report string
Fixed bug not allowing to parse more than two reads with tag pattern
Fixed bug when --chains is used with exportClonesOverlap
Fix for export...: tag quality field added back to export columns

👷 Minor fixes and improvements

Added gene feature coverage in alignment report
On Linux platforms default calculation of -Xmx now based on "available" memory (previously "free" was used)
New gene aligner parameter edgeRealignmentMinScoreOverride for more sensitive alignments for short paired-end reads
Report values downstream align now calculate percents relative to the number of reads in the sample rather than the total number of reads in multi-sample analysis
Options helping with advanced analysis of data quality and consensus assembly process added to assemble (--consensus-alignments, --consensus-state-stat, --downsample-consensus-state-stat) and analyze (--output-consensus-alignments, --output-consensus-state-stat, --downsample-consensus-state-stat)
Better tag pattern search projection representation in reports
findAlleles now recalculates functionality of de novo found alleles
Better algorithm for calculating the checksum of a VDJC library
Additional report string "Aligned reads processed" in assemble report
Added options --by-feature and --by-gene to sortClones
Added options -rankByReads and -rankByTag <(Molecule|Cell|Sample)> to exportClones and exportShmTreesWithNodes.txt
Export readIds in exportAlignments by default
Added recalculation functionality for de-novo found alleles in findAlleles
Add info about CDR3 in generated hash for de-novo alleles
Remove de-novo alleles that are actually the same
findAlleles will remove unused genes from the library (genes that are not represented in the given donor)
Make --chains optional in downsampling command and allow multiple input
Write empty file on exportClones if file doesn't contain any clones
Better exception messages on incorrect inputs for export commands
In exportClones write no_d_gene if requested VDJunction, DCDR3Part or DJJunction in absence of D hit
Columns in exportReportsTable now cover most of the significant statistics from reports

🐬 Docker image changes

Custom entry-point of the image removed, and now is set to /bin/bash. Now one needs to specify mixcr command at the beginning of argument list:

Old: docker run ghcr.io/milaboratory/mixcr/mixcr analyze ...

New: docker run ghcr.io/milaboratory/mixcr/mixcr mixcr analyze ...

The new image is based on Amazon Corretto, which in turn is based on Amazon Linux 2. If customization is required for the image, one now needs to use the yum package manager instead of apt/apt-get.

With old image:
```
FROM ghcr.io/milaboratory/mixcr/mixcr:4.3.2
# ...
RUN apt-get install -y wget
# ...
```
With new image:
```
FROM ghcr.io/milaboratory/mixcr/mixcr:4.4....
```

11 Apr 18:50

github-actions

v4.3.2

123d699

MiXCR v4.3.2

🐞 This update addresses a significant issue that first appeared in version 4.3.0, which caused incorrect column names for FR4 nucleotide and amino acid sequences in export tables (e.g. nSeqJGeneWithoutCDR3Part instead of nSeqFR4).

Minor improvements

findAlleles now works much faster for extremely diverse samples

Other bug fixes

fixed inconsistency in reports and behaviour for assemble when badQualityThreshold=0
fixes X axis label for k-mer filters in tags filtering QC plots
adds threshold lines for tags filtering QC plots for composite operators (like operators with cumtop fallbacks)
fixes NPE crash for chain usage plots if chimeric sequences present in the stats

27 Mar 13:45

github-actions

v4.3.1

6018130

MiXCR v4.3.1

Minor improvements

added -isOOF <gene_feature> column to export
added -hasStops <gene_feature> column to export
added -isProductive <gene_feature> column to export
improved report and alleles description table for the findAlleles command
unused genes are now removed from the result library in the findAlleles command
findAlleles is now more resilient to cases where most allele variants of a donor differ from the *00 alleles in the library
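The three new export columns listed above (-isOOF, -hasStops, -isProductive) correspond to common definitions that can be sketched as follows. This is an illustration under stated assumptions, not MiXCR's implementation: the function names are hypothetical, the feature is assumed to be translated from its first base, and "productive" is assumed to mean in frame and free of stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_oof(nt_seq):
    """Out of frame: the length is not a multiple of three."""
    return len(nt_seq) % 3 != 0


def has_stops(nt_seq):
    """True if any complete codon, read from the first base, is a stop codon."""
    seq = nt_seq.upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


def is_productive(nt_seq):
    """Assumed definition: in frame and without stop codons."""
    return not is_oof(nt_seq) and not has_stops(nt_seq)


for seq in ("TGTGCCAGCAGTTTC", "TGTGCCTAGAGTTTC", "TGTGCCAGCAGTTT"):
    print(seq, is_oof(seq), has_stops(seq), is_productive(seq))
```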

Bug fixes

fixed AssertionError in findAlleles command with --output-template argument
fixed wrong behaviour with inferMinRecordsPerConsensus == true and cell level assembly
fixed minRecordsPerConsensus inference mechanism for new filtering features introduced in previous version (4.3.0)

17 Mar 16:46

github-actions

v4.3.0

96be4ef

MiXCR v4.3.0

Key changes

Improved Otsu's method with less stringency for automated histogram thresholding of barcoded data. It allows recovering more "good" UMI groups. The old filter was replaced by the new one in all presets for airr-seq and single-cell V(D)J protocols that utilize UMIs: Cellecta, Milaboratories, NEB, Qiagen, Takara, 10x Genomics, BD, Singleron.
New group filter operators that allow mixing thresholds from multiple operators, taking the lowest or highest value and applying it. This makes it possible to create more universal filtering strategies that are robust to edge cases like undersequencing of barcodes.
Added a default fallback threshold for UMI filtering: if automated UMI thresholding leaves fewer than 85% of reads, MiXCR will preserve UMIs so that at least 85% of reads are always kept.
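The combination of thresholds and the 85% fallback described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names are hypothetical, a UMI is taken to pass when its read count is at least the threshold, and the fallback is modelled as lowering the threshold to the highest value that keeps the required share of reads. MiXCR's actual operators may work differently.

```python
def kept_read_fraction(reads_per_umi, threshold):
    """Share of all reads that belong to UMIs with at least `threshold` reads."""
    total = sum(reads_per_umi)
    kept = sum(c for c in reads_per_umi if c >= threshold)
    return kept / total if total else 1.0


def combine_thresholds(thresholds, mode="lowest"):
    """Mix thresholds proposed by several operators: take the lowest or highest."""
    return min(thresholds) if mode == "lowest" else max(thresholds)


def threshold_with_fallback(reads_per_umi, auto_threshold, min_fraction=0.85):
    """Relax `auto_threshold` until at least `min_fraction` of reads is kept."""
    if kept_read_fraction(reads_per_umi, auto_threshold) >= min_fraction:
        return auto_threshold
    # Try the observed counts below the automatic threshold, highest first.
    for candidate in sorted({c for c in reads_per_umi if c < auto_threshold}, reverse=True):
        if kept_read_fraction(reads_per_umi, candidate) >= min_fraction:
            return candidate
    return auto_threshold


counts = [50, 40, 20, 5, 5]  # made-up reads per UMI
print(threshold_with_fallback(counts, auto_threshold=40))
```

In the example, a threshold of 40 would keep only 75% of reads, so the sketch relaxes it to 20, which keeps about 92%.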

Presets

New preset for Seq-Well VDJ data
New presets for NEBNext® Immune Sequencing Kit TCR and BCR profiling for data with both TCR and BCR.
Improved Takara human TCR and BCR presets

Reference Library

New IGHV genes added to human reference: IGHV3-30-3, IGHV4-30-4, IGHV1-69-2, IGHV2-70D, IGHV3-30-5
IGHV1-69D renamed to IGHV1-69

Minor improvements

Threshold rounding in cumtop and top-n filters
Support of sequence-end token ($) in tag pattern matching algorithm
Added discardAmbiguousNucleotideCalls parameters for contig assembly
Added field -cellId in commands exportClones and exportAlignments
Added fields cell_id, umi_count and consensus_count to exportAirr command
Better text descriptions in align and assemble reports
exportAirr command now splits clones by cells if there are cell barcodes in the data
Replaced analyze options --not-aligned-.. and --not-parsed-.. with one option --output-not-used-reads
Fixed comma-separated chains input in postanalysis --chains option
Split column with tagValue (like tagValueCELL) into two columns: tagValue<tag_name> and tagQuality<tag_name>
Support of system proxy settings for license
# character can now be used to separate groupName from group matcher in the file expansion mechanism (in addition to :), allowing multi-sample analysis on Windows
Fixed usage of composite features for --assemble-contigs-by
Removed some restrictions for possible combinations of gene features used in analysis and export
Fixed behaviour of empirical alignment assignment in assemble if --write-all was used in align

26 Jan 20:06

github-actions

v4.2.0

b0f194e

MiXCR v4.2.0

Built-in support for new protocols

BD Rhapsody full-length protocol
Smart-Seq2 single cell RNA-Seq protocol
Oxford Nanopore long-read technology

Sample barcodes

Complete support of sample barcodes that may be picked up from all possible sources:

from names of input files;
from index I1/I2 FASTQ files;
from sequence header lines;
from inside the tag pattern.

Now one can analyze multiple patient samples at once. Along with a powerful file name expansion functionality, one can process any kind of sequencing protocol with any custom combination of sample, cell and UMI barcoding.

Multiple samples can be processed in two principal modes with respect to sample barcodes: (1) data can be split by sample right at the align stage and processed separately, or (2) all samples can be processed as a single set of sequences and separated only at the very last exportClones step. Both approaches have their pros and cons, so one can choose the best strategy for the experimental setup and study goals.
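The difference between the two modes is only where the split by sample happens. A toy Python sketch, with made-up reads and hypothetical function names, and with clone counting standing in for the whole analysis:

```python
from collections import Counter, defaultdict

# Toy reads as (sample barcode, clonotype) pairs; all values are made up.
READS = [("S1", "CASSL"), ("S1", "CASSL"), ("S2", "CASSL"), ("S2", "CAWSV")]


def split_then_count(reads):
    """Mode 1: separate reads by sample first, then analyze each sample alone."""
    by_sample = defaultdict(list)
    for sample, clone in reads:
        by_sample[sample].append(clone)
    return {sample: Counter(clones) for sample, clones in by_sample.items()}


def count_then_split(reads):
    """Mode 2: analyze the pooled data, separate by sample only at the end."""
    pooled = Counter(reads)  # keyed by (sample, clone)
    tables = defaultdict(Counter)
    for (sample, clone), n in pooled.items():
        tables[sample][clone] = n
    return dict(tables)


print(split_then_count(READS))
print(count_then_split(READS))
```

In this toy the two modes give identical tables because counting is independent per read. In a real pipeline the results may differ, since the intermediate steps see different sets of sequences in each mode.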

New robust filters for single cell and molecular barcoded data

For 10x Genomics and other fragmented protocols, a new powerful k-mer based filtering algorithm is now used to eliminate cross-cell contamination coming from plasmatic cells.

For UMI filtering, a new algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering.
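As background for histogram thresholding, the sketch below implements classic Otsu thresholding on log2-transformed reads-per-UMI counts. It is not Barron's algorithm and not MiXCR's filter; it only shows the baseline idea of choosing the cut that best separates a low-count group (likely errors) from a high-count group. The function name and the log2 transform are assumptions made for this illustration.

```python
import math


def otsu_threshold(reads_per_umi):
    """Return the smallest read count that falls into the upper class.

    The cut maximizes between-class variance of log2(count), which is
    the criterion used by classic Otsu thresholding. Counts must be positive.
    """
    if not reads_per_umi:
        raise ValueError("need at least one UMI")
    counts = sorted(reads_per_umi)
    logs = [math.log2(c) for c in counts]
    n, total = len(logs), sum(logs)
    best_cut, best_score = counts[0], -1.0
    low_n, low_sum = 0, 0.0
    for i in range(n - 1):
        low_n += 1
        low_sum += logs[i]
        if counts[i] == counts[i + 1]:
            continue  # a cut can only fall between distinct values
        high_n = n - low_n
        gap = low_sum / low_n - (total - low_sum) / high_n
        score = low_n * high_n * gap * gap  # proportional to between-class variance
        if score > best_score:
            best_cut, best_score = counts[i + 1], score
    return best_cut


print(otsu_threshold([1, 1, 2, 1, 2, 60, 64, 70, 58]))
```

UMIs with at least the returned number of reads would be kept under this simplified scheme.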

List of all changes

Sample barcodes

support for more than two fastq files as input (I1 and I2 reads support)
multiple possible sources of data for sample resolution:
- sequences extracted with tag pattern (including those coming from I1 and I2 reads)
- samples can be based on a specific pattern variant (with multi-variant patterns, separated by ||; this makes it easy to adopt MiGEC-style sample files)
- parts of file names (extracted using file name expansion mechanism)
flexible sample table matching criteria
- matching multiple tags
- matching variant id from multi-variant tag patterns
special --sample-table mixin option allowing for flexible sample table definition in a tab-delimited table form
special --infer-sample-table mixin option to infer sample table for sample tags from file name expansion
special generic presets for multiplexed data analysis scenarios (e.g. generic-tcr-amplicon-separate-samples-umi)
align command can now optionally split output alignments by sample into separate vdjca files
exportClones command now supports splitting the output into multiple files by sample
analyze command supports new splitting behaviour of the align command, separately running all the analysis steps for all the output files (if splitting is enabled)

Filters and error correction

preset for 10X VDJ BCR enhanced with k-mer-based filter to eliminate rare cross-cell contamination from plasmatic cells
new advanced thresholding algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering
rework of clustering step aimed at PCR / reverse-transcription error correction in assemble, now it correctly handles any possible tag combination (sample, cell or molecule)
new feature to add histogram preprocessing steps in automated thresholding

Quality trimming

default quality trimming turned on (trimmingQualityThreshold changed from 0 to 10); this setting showed better performance in many real-world use cases

Reference library

reference V/D/J/C gene library upgrade to repseqio v2.1 (see changelog)

New commands

added command exportReportsTable that prints file in tabular format with report data from commands that were run

Other

optimized aligner parameters for long-read data
fixed system temp folder detection behaviour, now mixcr respects TMPDIR environment variable
rework of preset-mixin logic: external presets (like those starting from local:...) are now packed into the output *.vdjca file at the align step; the same applies to all externally linked information, like tag whitelists and sample lists. This behaviour facilitates better analysis reproducibility and more transparent parameter logistics.
new mixin options to adjust tag refinement whitelists with analyze: --set-whitelist and --reset-whitelist
removed refineTagsAndSort options -w and --whitelist; corresponding deprecation error message printed if used
new grouping feature for exportClones that allows normalizing values in the -readFraction and -uniqueTagFraction ... columns to totals for certain compartments instead of the whole dataset. For example, this makes it possible to output fractions of reads inside the cell.
new mixin options --add-export-clone-table-splitting, --reset-export-clone-table-splitting, --add-export-clone-grouping and --reset-export-clone-grouping
improved sensitivity of findAlleles command
add tags info in exportAlignmentsPretty and exportClonesPretty
add --chains filter for exportShmTrees, exportShmTreesWithNodes, exportShmTreesNewick and exportPlots shmTrees commands
fixed old bug #353: all aligners now favor the leftmost J gene in situations where multiple genes can be found in the sequence (i.e. mis-spliced mRNA)
fixes exception in align happening for not-parsed sequences with writeFailedAlignments=true
new filter and parameter added in assemblePartial; parameter name is minimalNOverlapShare, it controls minimal relative part of N region that must be covered by the overlap to conclude that two reads are from the same V(D)J rearrangement
default paired-end overlap parameters changed to slightly more relaxed version
better criteria for alignments to be accepted for the assemblePartial procedure
fixed NPE in assemblePartial executed for the data without C-gene alignment settings
fixed rare exception in exportAirr command
by default, exports now show messages like 'region_not_covered' for data that can't be extracted (e.g. requesting -nFeature for a region that is not covered, or a tag that does not exist). The option --not-covered-as-empty restores the previous behaviour
info about genes with enough data to find allele was added into report of findAlleles and description of alleles
fixed error message appearing when analysis parameter already assigned to null is overridden by null using the -O... option
fixed wrong reporting of number of trimmed letters from the right side of R1 and R2 sequence
fixed error message about repeated generic mixin overrides
fixed error of exportClones with some arguments
fixes for report indentation artefacts
fixed bug when chains filter set to ALL in exportAlignments was preventing not-aligned records to be exported
fixed runtime exception in assemble arising in analysis of data with CELL barcodes but without UMIs, with consensus assembly turned off
fixed bug leading to incorrect ordering of mixin options when they are applied to the parameters bundle
minor change to the contigAssembly filtering parametrization
added mix-in --export-productive-clones-only
warning message about automatically set -Xmx.. JVM option in mixcr script
safer automatic value for -Xms..
fix: added species flag to 10x, nanopore and smart-seq2 presets
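The grouping feature for exportClones listed above changes the denominator used for fraction columns. A minimal Python sketch of the idea, assuming hypothetical record fields (`cell`, `reads`) and a hypothetical function name; it does not reproduce MiXCR's export code:

```python
from collections import defaultdict


def read_fractions(clones, group_key=None):
    """Fraction of reads per clone.

    group_key=None   -> normalize to the whole dataset
    group_key="cell" -> normalize to the total of the clone's own cell
    """
    totals = defaultdict(int)
    for clone in clones:
        totals[clone[group_key] if group_key else None] += clone["reads"]
    return [
        clone["reads"] / totals[clone[group_key] if group_key else None]
        for clone in clones
    ]


# Made-up clones: two in cell A, one in cell B.
clones = [
    {"cell": "A", "reads": 30},
    {"cell": "A", "reads": 10},
    {"cell": "B", "reads": 60},
]
print(read_fractions(clones))          # fractions of the whole dataset
print(read_fractions(clones, "cell"))  # fractions inside each cell
```

With grouping by cell, the first clone accounts for 75% of the reads in its cell, even though it holds only 30% of the reads in the dataset.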

Releases: milaboratory/mixcr
