Release MiXCR v4.7.0 · milaboratory/mixcr

❗ Breaking changes

Starting from version 4.7.0 of MiXCR, users are required to specify the assembling feature for all presets in cases where it's not defined by the protocol. This can be achieved using either the option --assemble-clonotypes-by [feature]or --assemble-contigs-by [feature] for fragmented data (such as RNA-seq or 10x VDJ data). This ensures consistency in assembling features when integrating various samples or types of samples, such as 10x single-cell VDJ and AIRR sequencing data, for downstream analyses like inferring alleles or building SHM trees. The previous behavior for fragmented data, which aimed to assemble as long sequences as possible, can still be achieved with either the option --assemble-contigs-by-cell for single-cell data or --assemble-longest-contigs for RNA-seq/Exom-seq data.

🚀 Major fixes and upgrades

Fixed assemble behavior for single-cell data, before the fix, in rare cases consensuses were assembled from reads coming from different cells. Now reads from different cells are strictly isolated.
Significant improvement of V genes assignment precision. To facilitate this improvement assemble and assembleContigs steps now have individual relativeMeanScore and maxHits parameters.
Improved robustness against expression level differences between TCR/IG chains. Consensus assembly in assemble now is performed separately for each chain. This change is specifically important for single-cell presets with cell-level assembly (most of the MiXCR presets for single-cell data).
Now options --dont-correct-tag-with-name <tag_name> or --dont-correct-tag-type (Molecule|Cell|Sample) can be specified to skip tag correction. It will trade off some analysis quality and error correction performance, for significantly lower memory and analysis time requirements, in deeply sequenced datasets with many Cell and Molecular barcodes.
Ability to trigger realignments of left or right reads boundaries with global alignment algorithm using parameters rightForceRealignmentTrigger or leftForceRealignmentTrigger in cases where reads do not cover the CDR3 regions (rescue alignments in case of fragmented data, like single-cell).
MiTool-based contig pre-assembly step integrated into 10x-sc-xcr-vdj preset, significantly improving overall analysis performance.

🛠️ Other improvements & fixes

Default input quality filter in assemble (badQualityThreshold) stage was decreased to 10, improving total analysis yield
Added validation for assembleCells that input files should be assembled by fixed feature
Export of trees and tree nodes now support imputed features
Fixed parsing of optional arguments for exportShmTreesWithNodes: -nMutationsRelative, -aaMutations, -nMutations, -aaMutationsRelative, -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
Fixed parsing of optional arguments for exportClones and exportAlignments: -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
Fixed possible errors on exporting amino acid mutations in exportShmTreesWithNodes
Fixed list of required options in listPresets command
Fixed error on building trees in case of JBeginTrimmed started before CDR3Begin
Fixed usage --remove-step qc
Added --remove-qc-check option
Remove -topChains field from exportShmTreesWithNodes command. Use -chains instead
Removed default splitting clones by V and J for presets where clones are assembled by full-length.
Fixed NullPointerException in some cases of building trees by SC+bulk data
Fixed java.lang.IllegalArgumentException: While adding VEndTrimmed in exportClones
Fixed combination trees step in findShmTrees: in some cases trees weren't combined even if it could be
Fixed NoSuchElementException in some cases of SC combining of trees
Fixed export of -jBestIdentityPercent in exportShmTreesWithNodes
Added validation on export -aaFeature for features containing UTR
Fixed usage of command exportPlots shmTrees
Fixed topology of trees: before common V and J mutations were included in the root node, now root includes only reconstructed NDN. Previous behavior lead to underestimated distance from the germline. Now sequence for the germline exports with common mutations. To fully apply the fix to previously analyzed data, rerun the pipeline starting from findShmTrees
Fixed IllegalStateException on removing unnecessary genes on findAlleles
Added --dont-remove-unused-genes option to findAlleles command
Adjustment consensus assembly (in assemble) parameters for single cell presets
Command groupClones was renamed to assembleCells. Old name is working, but it's hidden from help. Also report and output file names in analyze step were renamed accordingly.
Fixed calculation of germline for VCDR3Part and JCDR3Part in case of indels inside CDR3
Fixed export of trees if data assembled by a feature with reference point having offset
Export of VJJunction gemline in shmTrees exports now export mrca as most plausible content
Fixed parsing and alignment of reads longer than 30 Kbase
downsample now supports molecule variant in --downsampling option
Fixed naming of output files of downsample command
--output-not-used-reads of analyze command now works with bam input files too, alongside --not-aligned-(R1|R2) and --not-parsed-(R1|R2) of align command
Fix replaceWildcards behaviour on parsing BAM. Previous behaviour resulted in discarding of the quality scores on align
v_call, d_call, j_call and c_call columns in AIRR now output only best hit, not the whole list
Stable behavior of replaceWildcards. Before it depended on the position of read in a file, now it depends on read content only
If sample sheet supplied by --sample-sheet[-strict] option has * symbol after tag name, it will be preserved

🧬 Reference gene library changes

IG reference for new species:
- Rabbit (IGH, IGK, IGL)
- Sheep (IGH, IGK, IGL)
Human reference corrections:
- Duplicated entries removed: IGHV1-69*00, IGHV1-69*01, IGHV3-23*00, IGHV3-23*01
- Fix for CDR3Begin position in IGHV4-30-4
- Fix for FR1Begin position in TRBV21-1
- Names of the following human TRAV genes were changed:
  - TRAV14DV4 -> TRAV14/DV4
  - TRAV23DV6 -> TRAV23/DV6
  - TRAV29DV5 -> TRAV29/DV5
  - TRAV36DV7 -> TRAV36/DV7
  - TRAV38-1DV8 -> TRAV38-1/DV8
Correct mapping of V-gene UTRs in Alpaca reference

📚 New Presets

Added preset takara-mouse-rna-bcr-umi-smarseq for new Takara SMART-Seq Mouse BCR (with UMIs) kit
Added preset idt-human-rna-bcr-umi-archer and idt-human-rna-tcr-umi-archer for IDT Archer kits
Presets for Cellecta kits that include TCR/BCR Spike-in mix QC metrics: cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-16-4-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-1-1-1,cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-16-4-1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MiXCR v4.7.0

❗ Breaking changes

🚀 Major fixes and upgrades

🛠️ Other improvements & fixes

🧬 Reference gene library changes

📚 New Presets