❗ Breaking changes
Starting from version 4.7.0 of MiXCR, users are required to specify the assembling feature for all presets where it is not defined by the protocol. This is done with either the option --assemble-clonotypes-by [feature] or, for fragmented data (such as RNA-seq or 10x VDJ data), --assemble-contigs-by [feature]. This ensures consistent assembling features when integrating different samples or sample types, such as 10x single-cell VDJ and AIRR sequencing data, for downstream analyses like inferring alleles or building SHM trees. The previous behavior for fragmented data, which aimed to assemble the longest possible sequences, is still available through either the option --assemble-contigs-by-cell for single-cell data or --assemble-longest-contigs for RNA-seq/Exome-seq data.
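As a rough illustration of the new requirement, the sketch below composes a `mixcr analyze` command line with an explicit assembling feature. The helper `build_analyze_command`, the preset name, the file names and the output prefix are hypothetical placeholders, and `CDR3` is used as the feature only as an example. The command is printed rather than executed.

```python
import shlex


def build_analyze_command(preset, inputs, output_prefix, feature, fragmented=False):
    """Compose a `mixcr analyze` command with an explicit assembling feature.

    Hypothetical helper: preset, input files and output prefix are
    placeholders supplied by the caller; nothing is executed here.
    """
    # Fragmented data (RNA-seq, 10x VDJ) uses --assemble-contigs-by;
    # other data uses --assemble-clonotypes-by (per the text above).
    option = "--assemble-contigs-by" if fragmented else "--assemble-clonotypes-by"
    return ["mixcr", "analyze", preset, option, feature, *inputs, output_prefix]


cmd = build_analyze_command(
    preset="my-preset",  # placeholder, substitute a real preset name
    inputs=["sample_R1.fastq.gz", "sample_R2.fastq.gz"],  # placeholder files
    output_prefix="results/sample",  # placeholder output prefix
    feature="CDR3",
)
print(shlex.join(cmd))
```

Passing `fragmented=True` switches the sketch to `--assemble-contigs-by`. Which features are appropriate for a given kit depends on the protocol and is not covered here.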
🚀 Major fixes and upgrades
Fixed assemble behavior for single-cell data: before the fix, in rare cases consensuses were assembled from reads coming from different cells. Now reads from different cells are strictly isolated.
Significantly improved the precision of V gene assignment. To facilitate this improvement, the assemble and assembleContigs steps now have individual relativeMeanScore and maxHits parameters.
Improved robustness against expression level differences between TCR/IG chains. Consensus assembly in assemble is now performed separately for each chain. This change is especially important for single-cell presets with cell-level assembly (most of the MiXCR presets for single-cell data).
The options --dont-correct-tag-with-name <tag_name> and --dont-correct-tag-type (Molecule|Cell|Sample) can now be specified to skip tag correction. This trades some analysis quality and error-correction performance for significantly lower memory and analysis time requirements in deeply sequenced datasets with many cell and molecular barcodes.
Added the ability to trigger realignment of left or right read boundaries with a global alignment algorithm, using the parameters rightForceRealignmentTrigger or leftForceRealignmentTrigger, in cases where reads do not cover the CDR3 region (this rescues alignments for fragmented data, such as single-cell).
The default input quality filter (badQualityThreshold) in the assemble stage was decreased to 10, improving total analysis yield.
Added validation in assembleCells that input files were assembled by a fixed feature
Export of trees and tree nodes now supports imputed features
Fixed parsing of optional arguments for exportShmTreesWithNodes: -nMutationsRelative, -aaMutations, -nMutations, -aaMutationsRelative, -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
Fixed parsing of optional arguments for exportClones and exportAlignments: -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
Fixed possible errors on exporting amino acid mutations in exportShmTreesWithNodes
Fixed list of required options in listPresets command
Fixed error on building trees when JBeginTrimmed starts before CDR3Begin
Fixed usage of --remove-step qc
Added --remove-qc-check option
Removed the -topChains field from the exportShmTreesWithNodes command. Use -chains instead
Removed default splitting clones by V and J for presets where clones are assembled by full-length.
Fixed NullPointerException in some cases of building trees by SC+bulk data
Fixed java.lang.IllegalArgumentException: While adding VEndTrimmed in exportClones
Fixed the tree combination step in findShmTrees: in some cases trees weren't combined even when they could be
Fixed NoSuchElementException in some cases of SC combining of trees
Fixed export of -jBestIdentityPercent in exportShmTreesWithNodes
Added validation on export -aaFeature for features containing UTR
Fixed usage of command exportPlots shmTrees
Fixed topology of trees: previously, common V and J mutations were included in the root node; now the root includes only the reconstructed NDN. The previous behavior led to an underestimated distance from the germline. The germline sequence is now exported with the common mutations. To fully apply the fix to previously analyzed data, rerun the pipeline starting from findShmTrees
Fixed IllegalStateException on removing unnecessary genes on findAlleles
Added --dont-remove-unused-genes option to findAlleles command
Adjusted consensus assembly parameters (in assemble) for single-cell presets
Command groupClones was renamed to assembleCells. The old name still works, but it is hidden from help. The report and output file names in the analyze step were renamed accordingly.
Fixed calculation of germline for VCDR3Part and JCDR3Part in case of indels inside CDR3
Fixed export of trees when data is assembled by a feature whose reference point has an offset
Export of the VJJunction germline in shmTrees exports now outputs mrca as the most plausible content
Fixed parsing and alignment of reads longer than 30 Kbase
downsample now supports molecule variant in --downsampling option
Fixed naming of output files of downsample command
--output-not-used-reads of analyze command now works with bam input files too, alongside --not-aligned-(R1|R2) and --not-parsed-(R1|R2) of align command
Fixed replaceWildcards behaviour on parsing BAM. The previous behaviour resulted in quality scores being discarded on align
v_call, d_call, j_call and c_call columns in AIRR now output only best hit, not the whole list
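For downstream code that has to read AIRR tables produced both before and after this change, one option is to reduce older multi-hit values to a single entry. The sketch below assumes (this is not stated in these notes) that older exports listed hits as a comma-separated string with the best hit first. The helper `best_hit` and the sample row are illustrative only.

```python
def best_hit(call):
    """Return the first entry of a comma-separated gene call, or '' if empty.

    Assumption: hits are comma-separated with the best hit listed first.
    """
    if not call:
        return ""
    return call.split(",")[0].strip()


# Illustrative row, not real output.
row = {
    "v_call": "IGHV1-69*01,IGHV1-69D*01",
    "d_call": "",
    "j_call": "IGHJ4*02",
    "c_call": "IGHM*01",
}
normalized = {col: best_hit(row[col]) for col in ("v_call", "d_call", "j_call", "c_call")}
print(normalized)
```

Values that already hold a single hit pass through unchanged, so the same function can be applied to tables from either version.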
Stable behavior of replaceWildcards. Before it depended on the position of read in a file, now it depends on read content only
If sample sheet supplied by --sample-sheet[-strict] option has * symbol after tag name, it will be preserved
Names of the following human TRAV genes were changed:
TRAV14DV4 -> TRAV14/DV4
TRAV23DV6 -> TRAV23/DV6
TRAV29DV5 -> TRAV29/DV5
TRAV36DV7 -> TRAV36/DV7
TRAV38-1DV8 -> TRAV38-1/DV8
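Tables produced with earlier versions may still carry the old names. A minimal sketch of a lookup that maps them to the new ones is shown here. The dictionary is copied from the list above, while the helper `update_gene_name` and the handling of an allele suffix after `*` are assumptions made for illustration.

```python
# Old -> new names, copied from the list above.
TRAV_RENAMES = {
    "TRAV14DV4": "TRAV14/DV4",
    "TRAV23DV6": "TRAV23/DV6",
    "TRAV29DV5": "TRAV29/DV5",
    "TRAV36DV7": "TRAV36/DV7",
    "TRAV38-1DV8": "TRAV38-1/DV8",
}


def update_gene_name(name):
    """Map an old TRAV gene name to its new form, keeping any allele suffix.

    Assumption: an allele, if present, follows the gene name after '*'
    (for example 'TRAV14DV4*01'); unknown names are returned unchanged.
    """
    gene, sep, allele = name.partition("*")
    return TRAV_RENAMES.get(gene, gene) + sep + allele


print(update_gene_name("TRAV14DV4*01"))
print(update_gene_name("TRAV12-1*01"))
```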
Corrected mapping of V-gene UTRs in the Alpaca reference
📚 New Presets
Added preset takara-mouse-rna-bcr-umi-smarseq for new Takara SMART-Seq Mouse BCR (with UMIs) kit
Added preset idt-human-rna-bcr-umi-archer and idt-human-rna-tcr-umi-archer for IDT Archer kits
Added presets for Cellecta kits that include TCR/BCR Spike-in mix QC metrics: cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-16-4-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-16-4-1