
Releases: milaboratory/mixcr

MiXCR v4.7.0

07 Aug 19:37
976ba14

❗ Breaking changes

  • Starting from MiXCR 4.7.0, users must specify the assembling feature for all presets where it is not defined by the protocol. Use either --assemble-clonotypes-by [feature] or, for fragmented data (such as RNA-seq or 10x VDJ data), --assemble-contigs-by [feature]. This ensures consistent assembling features when integrating different samples or sample types, such as 10x single-cell VDJ and AIRR sequencing data, for downstream analyses like allele inference or SHM tree building. The previous behavior for fragmented data, which aimed to assemble sequences as long as possible, is still available via --assemble-contigs-by-cell for single-cell data or --assemble-longest-contigs for RNA-seq/exome-seq data.
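As a minimal sketch of the rule above, the hypothetical Python helper analyze_cmd below builds a mixcr analyze argument list and refuses to build one without a usable assembling option. The option names come from this note, and the argument order mirrors the mixcr analyze ... --species hsa ... output_prefix example in the v4.4.0 notes. Treating --assemble-contigs-by-cell and --assemble-longest-contigs as argument-free flags, the input file names, and VDJRegion as the example feature are assumptions.

```python
# Hypothetical helper, not part of MiXCR: builds an argument list for
# "mixcr analyze" that carries exactly one assembling option.
ASSEMBLING_OPTIONS = {
    "--assemble-clonotypes-by": True,     # expects a gene feature argument
    "--assemble-contigs-by": True,        # expects a gene feature argument
    "--assemble-contigs-by-cell": False,  # assumed flag: single-cell data
    "--assemble-longest-contigs": False,  # assumed flag: RNA-seq/exome-seq data
}


def analyze_cmd(preset, species, inputs, output_prefix, option, feature=None):
    """Return the argument list, or raise ValueError if the assembling option is unusable."""
    if option not in ASSEMBLING_OPTIONS:
        raise ValueError(f"unknown assembling option: {option}")
    needs_feature = ASSEMBLING_OPTIONS[option]
    if needs_feature and feature is None:
        raise ValueError(f"{option} requires a gene feature")
    if not needs_feature and feature is not None:
        raise ValueError(f"{option} does not take a gene feature")
    cmd = ["mixcr", "analyze", preset, "--species", species, option]
    if needs_feature:
        cmd.append(feature)
    return cmd + list(inputs) + [output_prefix]


if __name__ == "__main__":
    print(" ".join(analyze_cmd(
        "10x-sc-xcr-vdj", "hsa", ["R1.fastq.gz", "R2.fastq.gz"], "result",
        "--assemble-contigs-by", "VDJRegion")))
```

The returned list can be handed to subprocess.run; validating it up front surfaces the v4.7.0 requirement before a long run starts.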

🚀 Major fixes and upgrades

  • Fixed assemble behavior for single-cell data: previously, in rare cases, consensuses were assembled from reads coming from different cells. Now reads from different cells are strictly isolated.
  • Significantly improved the precision of V gene assignment. To support this improvement, the assemble and assembleContigs steps now have individual relativeMeanScore and maxHits parameters.
  • Improved robustness against differences in expression level between TCR/IG chains. Consensus assembly in assemble is now performed separately for each chain. This change is particularly important for single-cell presets with cell-level assembly (most MiXCR presets for single-cell data).
  • The options --dont-correct-tag-with-name <tag_name> or --dont-correct-tag-type (Molecule|Cell|Sample) can now be specified to skip tag correction. This trades some analysis quality and error-correction performance for significantly lower memory and run-time requirements in deeply sequenced datasets with many cell and molecular barcodes.
  • Added the ability to trigger realignment of the left or right read boundaries with a global alignment algorithm, using the leftForceRealignmentTrigger or rightForceRealignmentTrigger parameters, when reads do not cover the CDR3 region (rescuing alignments for fragmented data such as single-cell).
  • A MiTool-based contig pre-assembly step was integrated into the 10x-sc-xcr-vdj preset, significantly improving overall analysis performance.

🛠️ Other improvements & fixes

  • The default input quality filter in the assemble stage (badQualityThreshold) was decreased to 10, improving total analysis yield
  • Added validation in assembleCells that input files are assembled by a fixed feature
  • Export of trees and tree nodes now supports imputed features
  • Fixed parsing of optional arguments for exportShmTreesWithNodes: -nMutationsRelative, -aaMutations, -nMutations, -aaMutationsRelative, -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
  • Fixed parsing of optional arguments for exportClones and exportAlignments: -allNMutations, -allAAMutations, -allNMutationsCount, -allAAMutationsCount.
  • Fixed possible errors on exporting amino acid mutations in exportShmTreesWithNodes
  • Fixed list of required options in listPresets command
  • Fixed an error when building trees in cases where JBeginTrimmed starts before CDR3Begin
  • Fixed usage of --remove-step qc
  • Added --remove-qc-check option
  • Removed the -topChains field from the exportShmTreesWithNodes command. Use -chains instead
  • Removed the default splitting of clones by V and J genes for presets where clones are assembled by a full-length feature.
  • Fixed NullPointerException in some cases of building trees from SC+bulk data
  • Fixed java.lang.IllegalArgumentException: While adding VEndTrimmed in exportClones
  • Fixed the tree-combining step in findShmTrees: in some cases trees were not combined even when they could be
  • Fixed NoSuchElementException in some cases of single-cell tree combining
  • Fixed export of -jBestIdentityPercent in exportShmTreesWithNodes
  • Added validation of -aaFeature export for features containing UTRs
  • Fixed usage of command exportPlots shmTrees
  • Fixed tree topology: previously, common V and J mutations were included in the root node; now the root includes only the reconstructed NDN. The previous behavior led to an underestimated distance from the germline. The germline sequence is now exported with the common mutations. To fully apply the fix to previously analyzed data, rerun the pipeline starting from findShmTrees
  • Fixed IllegalStateException when removing unnecessary genes in findAlleles
  • Added --dont-remove-unused-genes option to findAlleles command
  • Adjusted consensus assembly parameters (in assemble) for single-cell presets
  • The groupClones command was renamed to assembleCells. The old name still works but is hidden from help. Report and output file names in the analyze step were renamed accordingly.
  • Fixed calculation of germline for VCDR3Part and JCDR3Part in case of indels inside CDR3
  • Fixed export of trees for data assembled by a feature whose reference point has an offset
  • Export of the VJJunction germline in shmTrees exports now outputs the MRCA as the most plausible content
  • Fixed parsing and alignment of reads longer than 30 Kbase
  • downsample now supports the molecule variant in the --downsampling option
  • Fixed naming of output files of downsample command
  • --output-not-used-reads of the analyze command now also works with BAM input files, alongside --not-aligned-(R1|R2) and --not-parsed-(R1|R2) of the align command
  • Fixed replaceWildcards behaviour when parsing BAM. The previous behaviour discarded quality scores at the align step
  • v_call, d_call, j_call and c_call columns in AIRR now output only best hit, not the whole list
  • replaceWildcards is now stable: previously its result depended on the position of the read in the file; now it depends only on the read content
  • If a sample sheet supplied via the --sample-sheet[-strict] option has a * symbol after a tag name, it is preserved
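The note above says only that wildcard resolution now depends on read content. One way to obtain that property, sketched below under the assumption that wildcards are resolved pseudo-randomly, is to seed the random generator from a hash of the read itself. MiXCR's actual resolution rule is not described here and may differ; the function name replace_wildcards and the IUPAC table are illustrative.

```python
import hashlib
import random

# IUPAC nucleotide wildcards and the bases each one can stand for
IUPAC = {
    "N": "ACGT", "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT",
    "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def replace_wildcards(seq):
    """Resolve wildcards using a seed derived only from the read content."""
    digest = hashlib.sha256(seq.encode("ascii")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return "".join(rng.choice(IUPAC[base]) if base in IUPAC else base for base in seq)


if __name__ == "__main__":
    # The same read gives the same result wherever it appears in the file
    print(replace_wildcards("ACGNNRT"), replace_wildcards("ACGNNRT"))
```

Because the seed ignores the read's position, reordering the input file does not change the resolved sequences.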

🧬 Reference gene library changes

  • IG reference for new species:
    • Rabbit (IGH, IGK, IGL)
    • Sheep (IGH, IGK, IGL)
  • Human reference corrections:
    • Duplicated entries removed: IGHV1-69*00, IGHV1-69*01, IGHV3-23*00, IGHV3-23*01
    • Fix for CDR3Begin position in IGHV4-30-4
    • Fix for FR1Begin position in TRBV21-1
    • Names of the following human TRAV genes were changed:
      • TRAV14DV4 -> TRAV14/DV4
      • TRAV23DV6 -> TRAV23/DV6
      • TRAV29DV5 -> TRAV29/DV5
      • TRAV36DV7 -> TRAV36/DV7
      • TRAV38-1DV8 -> TRAV38-1/DV8
  • Corrected mapping of V-gene UTRs in the Alpaca reference

📚 New Presets

  • Added preset takara-mouse-rna-bcr-umi-smarseq for new Takara SMART-Seq Mouse BCR (with UMIs) kit
  • Added presets idt-human-rna-bcr-umi-archer and idt-human-rna-tcr-umi-archer for IDT Archer kits
  • Presets for Cellecta kits that include TCR/BCR Spike-in mix QC metrics: cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-bcr-spikein-16-4-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-1-1-1, cellecta-human-rna-xcr-umi-drivermap-air-tcr-spikein-16-4-1

MiXCR v4.6.0

09 Dec 21:17

🖇️ Combined Heavy+Light Somatic Hypermutation Trees from Single-Cell data

  • A dedicated step was added to findShmTrees to combine heavy- and light-chain SHM trees, using information added to clonotypes by the groupClones command. Nodes in the resulting tree contain both light and heavy chains. If a clone has no connection to a clone from the companion chain, a reconstructed sequence is added.
  • This behaviour can be disabled with the --dont-combine-tree-by-cells option to reconstruct separate heavy and light SHM trees
  • Added the exportShmSingleCellTrees command, which exports one node per line. If a tree has several roots, their data are exported in different columns.
  • Added -subtreeId to tree exports to differentiate parts of trees from different chains
  • exportShmTreesWithNodes and exportShmTrees now export subtrees with different chains in separate rows.

🚀 Other major upgrades

Changes in groupClones command

  • The previous algorithm was replaced with a new one that handles contamination better, can detect multi-mappers (when one cell barcode marks two different cells), and can work with non-functional clones
  • Some clones are now explicitly marked as contamination. This information is available as a separate label in the groupId column of exportClones. Such clones can be filtered out of the export with --filter-out-group-types contamination
  • Additional key algorithm performance metrics were added to the report
  • Fixed behaviour that caused clones with an undefined group to be split by cell barcodes

New characteristics in SHM trees exports

  • -subtreeId for determination of different chains in the same tree
  • -numberOfClonesInTree [forChain] Number of unique clones in the SHM tree.
  • -numberOfNodesWithClones Number of nodes with clones, i.e. nodes with different clone sequences.
  • -totalReadsCountInTree [forChain] Total sum of read counts of clones in the SHM tree.
  • -totalUniqueTagCountInTree (Molecule|Cell|Sample) [forChain] Total count of unique tags in the SHM tree with specified type.
  • -chains Chain type of the tree
  • -treeHeight Height of the tree
  • -vGene, -jGene, -vFamily, -jFamily: previously these were exported only for nodes with clones
  • -vBestIdentityPercent, -jBestIdentityPercent, -isOOF and -isProductive now exported for reconstructed nodes too

New characteristics in clonotype export

  • -aaLength and -allAALength are available alongside -nLength and -allNLength
  • -aaMutationsRate is available alongside -nMutationsRate
  • Added an optional germline argument to -nFeature, -aaFeature, -nLength, -aaLength in exportClones, exportAlignments and exportCloneGroups. It allows exporting the germline sequence instead of the gene sequence.
  • For all mutation exports (excluding -mutationsDetailed ) added optional filter by mutation type: ... [(substitutions|indels|inserts|deletions)]
  • Added -nMutationsCount, -aaMutationsCount, -allNMutationsCount, -allAAMutationsCount for all relevant exports
  • For mutation exports in exportShmTreesWithNodes, the (germline|mrca|parent) argument is now optional. Mutations from the germline are exported by default
  • Nucleotide mutations can now be exported for features that contain VCDR3Part, DCDR3Part or JCDR3Part
  • Now -nLength, -nMutationsCount, -nMutationsRate can be calculated for multiple gene features (e.g. -nMutationsRate VRegionTrimmed,JRegionTrimmed)
  • Added --export-clone-groups-sort-chains-by mixin with type of sorting of clones for determination of the primary and the secondary chains. It applies to exportCloneGroups command. By default, it's Auto (by UMI if it's available, by Read otherwise; previous default value was Read)
  • Added --filter-out-group-types mixin to filter out clones with a certain clone group assignment kind: found, undefined or contamination. It applies to the exportClones command
  • exportCloneGroups now exports groups by default in separate files for IG, TRAB, TRGD and mixed. This behaviour can be switched off with --reset-export-clone-table-splitting or a single --export-clone-groups-for-cell-type. If --export-clone-groups-for-cell-type is given several times, each cell type is exported to a separate file.
  • When --export-clone-groups-for-cell-type is used in exportCloneGroups, all mixed or unmatched groups are filtered out.
  • Added read and Molecule fraction columns to single cell exportClones output.

🧬 Reference library upgrades

  • The previous TRAD meta-chain was split into separate TRA and TRD chains. Chain assignment for clonotypes is based on J genes.
  • Rebuilt the allelic reference for human IGH, TRB, TRA and TRD chains. Allele names now correspond to the IUIS nomenclature.
  • Human IGK Vend coordinates corrected.
  • UTR5Begin coordinates added to the following mouse genes: IGKV23-1, IGKV20-101-2, IGKV14-130, IGKV8-28, TRGV2

📚 Preset updates

  • The milab-human-rna-tcr-umi-race preset has been updated: now clones are assembled by default based on the CDR3, in line with the manufacturer's recommended read length.
  • The flairr-seq-bcr preset has been updated: now the preset sets species to human by default according to a built-in tag pattern with primer sequences.
  • The following presets have been added to cover Invivoscribe assay panels: invivoscribe-human-dna-trg-lymphotrack, invivoscribe-human-dna-trb-lymphotrack, invivoscribe-human-dna-igk-lymphotrack, invivoscribe-human-dna-ighv-leader-lymphotrack, invivoscribe-human-dna-igh-fr3-lymphotrack, invivoscribe-human-dna-igh-fr2-lymphotrack, invivoscribe-human-dna-igh-fr1-lymphotrack, invivoscribe-human-dna-igh-fr123-lymphotrack.
  • The following presets have been added for mouse Thermo Fisher assays: thermofisher-mouse-rna-tcb-ampliseq-sr, thermofisher-mouse-dna-tcb-ampliseq-sr, thermofisher-mouse-rna-igh-ampliseq-sr, thermofisher-mouse-dna-igh-ampliseq-sr.
  • Preset for SMARTer Human scTCR a/b Profiling Kit: takara-sc-human-rna-tcr-smarter
  • The milab-human-rna-ig-umi-multiplex preset has been updated: the pattern now trims fewer nucleotides, which facilitates CDR1 identification. The splits by V and J genes have been removed as redundant due to the full-length assembling feature.

🛠️ Minor improvements & fixes

  • Stricter tree-combining step in the findShmTrees command
  • Better calculation of indel mutations between nodes when building SHM trees
  • Increased percent of successful alignment-aided overlaps by removing unnecessary overlap region quality sum threshold
  • Impossible export of germline sequence for VJJunction in shmTrees exports now produces an error
  • Parameter validation fix in -nMutationsRate
  • Fix for -nMutationsRate if region is not covered for the clone
  • Fix for the format of exportAlignmentsPretty output, which was broken in the previous version
  • Fix for IllegalArgumentException in exportAlignmentsPretty for cases where translation can't be performed
  • Fixed an error when analyze is executed with -f and --output-not-used-reads at the same time
  • Resolutions of wildcards are excluded from calculation of -nMutationsRate for CDR3 in exportShmTreesWithNodes
  • Fixed an OutOfMemory exception in the extend command with .vdjca input
  • In findShmTrees, the productive-only clone filter now checks for stop codons in all features, not only in CDR3
  • Changed the default value of the productive-clone filter in findShmTrees to false (it was true before)
  • Added option --productive-only to findShmTrees
  • Fixed parsing of --export-clone-groups-for-cell-type parameter
  • Fixed usage of slice command on clnx files that weren't ordered by id.
  • In slice, the default behaviour is now to keep original IDs. The previous behaviour is available with the --reassign-ids option
  • Fixed parsing of composite gene features with offsets like --assemble-clonotypes-by [VDJRegion,CBegin(0,10)]
  • Fixed parent directory creation for output of exportClonesOverlap
  • Fixed exportAirr for clones whose CDR3 lacks VCDR3Part and JCDR3Part
  • Optimized calculation of ranks in clone sets. This speeds up export with tags and several other operations.
  • Added clone_id column in exportAirr
  • Fixed exportClones when splitting the file by tag:... if a clone has several tags of the requested level
  • Fixed calculation of -nMutationsCount, -nMutationsRate, -aaMutationsCount and -aaMutationsRate. Previously, in some cases, they were calculated on a region different from the requested one.
  • Added CellBarcodesWithFoundGroups for groupClones QC checks
  • New filter --no-feature in exportAlignmentsPretty
  • Fixed reporting in align, now coverage takes into account alignment-aided overlap

❗ Breaking changes

  • Option --build-from <path> was removed from findShmTrees command

Deprecations of export options

  • -lengthOf is now deprecated; use -nLength instead
  • -allLengthOf is now deprecated; use -allNLength instead
  • -mutationRate is now deprecated; use -nMutationsRate instead

MiXCR v4.5.0

22 Sep 12:58

🚀 New features

Multi-chain clone assembly for single-cell data

Now MiXCR calculates Heavy-Light antibody and Alpha-Beta and Gamma-Delta TCR combined clones for single-cell data. Two new commands were introduced to enable this functionality:

  • groupClones: calculates multi-chain clones from assembled clonotypes and writes result in a binary format;
  • exportCloneGroups: exports information about combined clonotypes.

All single-cell presets now automatically produce combined multi-chain output in both binary and textual formats, see files with names matching *.clone.groups.tsv pattern in the output folder.

New characteristics in clonotype export

  • Export biochemical properties of gene regions with -biochemicalProperty <geneFeature> <property> or -baseBiochemicalProperties <geneFeature> export options. Available in export for alignments, clones and SHM tree nodes. Available properties: Hydrophobicity, Charge, Polarity, Volume, Strength, MjEnergy, Kf1, Kf2, Kf3, Kf4, Kf5, Kf6, Kf7, Kf8, Kf9, Kf10, Rim, Surface, Turn, Alpha, Beta, Core, Disorder, N2Strength, N2Hydrophobicity, N2Volume, N2Surface.
  • Export isotype with -isotype [<(primary|subclass|auto)>]
  • Export -mutationRate [<gene_feature>] in exportShmTreesWithNodes, exportClones and exportCloneGroups command: number of mutations relative to corresponding germline divided by the target sequence size. For exportClones and exportCloneGroups CDR3 is not included in calculation.
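The -mutationRate definition above (mutations relative to the germline divided by the target sequence size) reduces to a simple ratio. The sketch below assumes the germline and target are already aligned without indels and counts substitutions only; MiXCR's handling of indels and its exclusion of CDR3 in exportClones and exportCloneGroups are not reproduced, and mutation_rate is a hypothetical name. -mutationRate was later deprecated in favor of -nMutationsRate (v4.6.0).

```python
def mutation_rate(germline, target):
    """Substitutions versus germline divided by target length (gap-free alignment assumed)."""
    if len(germline) != len(target):
        raise ValueError("this sketch only handles gap-free alignments of equal length")
    if not target:
        raise ValueError("target sequence is empty")
    mutations = sum(g != t for g, t in zip(germline, target))
    return mutations / len(target)


if __name__ == "__main__":
    # 2 substitutions over a 10-nt target
    print(mutation_rate("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```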

Support for wider set of input formats

  • Support for CRAM files as input for the analyze and align commands. Optionally, a reference genome can be specified with --reference-for-cram
  • Fixed usage of BAM input for analyze and align, if file contains both paired and single reads

Algorithm enhancements

  • The global consensus assembly algorithm, applied in assemble to collapse UMI/Cell groups into contigs, now has a much better empirical seed-selection step for multi-consensus assembly scenarios. This significantly increases sensitivity when assembling secondary consensuses from the same group of sequences.
  • New constraint in the low-quality read mapping procedure that prevents cross-cell read mapping.

📚 Preset updates

  • Additional improvement of clone filters in 10x-sc-xcr-vdj preset.
  • Tag pattern upgrade for cellecta-human-rna-xcr-umi-drivermap-air. Now UMI includes a part of the C-gene primer to increase diversity, and R2 is also used for payload.
  • Assembling feature fix for irepertoire-human-rna-xcr-repseq-plus preset. Now {CDR2Begin:FR4End}.
  • New preset for BD full-length protocol with enhanced beads V2 featuring B384 whitelists: bd-sc-xcr-rhapsody-full-length-enhanced-bead-v2.
  • New preset for Takara Bio SMART-Seq Mouse TCR (with UMIs): takara-mouse-rna-tcr-umi-smarseq.
  • Presets for new Cellecta kits: cellecta-human-dna-xcr-umi-drivermap-air, cellecta-human-rna-xcr-full-length-umi-drivermap-air, cellecta-mouse-rna-xcr-umi-drivermap-air.
  • Presets for iRepertoire RepSeq+ kits with UMI: irepertoire-mouse-rna-xcr-repseq-plus-umi-pe, irepertoire-human-rna-xcr-repseq-plus-umi-se,irepertoire-human-rna-xcr-repseq-plus-umi-pe.
  • isotype field added to exportClones for presets supporting isotype identification.
  • Split by C-gene enabled in thermofisher-human-rna-igh-oncomine-lr and cellecta-human-rna-xcr-umi-drivermap-air presets to facilitate isotype separation.
  • Default consensus assembly parameters maxNormalizedAlignmentPenalty and altSeedPenaltyTolerance are adjusted to increase sensitivity.
  • The --split-by-sample option is now set to true by default for all align presets, as well as all presets that inherit from it. This new default behavior applies unless it is directly overridden in the preset or with --dont-split-by-sample mix-in.
  • exportAlignments now reports UMI and/or Cell barcodes by default for presets with barcodes.

🛠️ Minor improvements & fixes

  • Fixed possible crash with --dry-run option in analyze
  • More informative help message that appears when using a deprecated preset and incorrectly suggests using --assemble-contigs-by instead of --assemble-clonotypes-by.
  • When split-by-tags is enabled, exportClones and exportShmTreesWithNodes now output the read count as the sum of reads for the given tag selection; previous versions used a more complicated formula
  • exportAlignments now includes the topChains column by default. exportClones reports topChains for single-cell presets.
  • Fixed calculation of geneFamilyName for genes like IGHA*00 (without the number before * symbol)
  • Better formatting in listPresets command. Added grouping by vendor, labels and optional filtering
  • Validation of input types in align or analyze based on the given tag pattern
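The new read-count rule for split-by-tags exports noted above is a plain sum. A minimal sketch, assuming a clone's reads are available as a mapping from tag value (for example a cell barcode) to read count; reads_by_tag and reads_for_selection are hypothetical names, not MiXCR fields.

```python
def reads_for_selection(reads_by_tag, selected_tags):
    """Read count reported for a tag selection: the sum of reads over the selected tags."""
    return sum(reads_by_tag.get(tag, 0) for tag in selected_tags)


if __name__ == "__main__":
    clone = {"cellA": 12, "cellB": 3, "cellC": 5}  # hypothetical per-cell read counts
    print(reads_for_selection(clone, ["cellA", "cellC"]))  # 17
```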

MiXCR v4.4.2

30 Jul 08:30

🚀 New features

  • Two-fold align step speedup for most of the protocol-specific presets (see the list below)
  • Import tags from sequence headers by parsing their content with regular expressions
  • Highly optimized generic presets for amplicon/RNA-seq data from long-read sequencers such as Pacific Biosciences and Oxford Nanopore: generic-ont, generic-ont-with-umi, generic-pacbio, generic-pacbio-with-umi

🐞 Bug fixes

  • Fixes crash when input contains quality scores > 70
  • Fixes excessive memory consumption issue for long read data
  • Fix for crash in assemble with UMI tags but with consensus assembler turned off

👷 Other minor adjustments

  • Long-read J gene aligner optimization
  • FLAIRR-seq preset optimized with new long-read-optimized aligner
  • Quality trimming is disabled for long-read aligner
  • Removed qc reports for clustered alignments and clones
  • The following presets have been optimized by specifying a single reverse/direct alignment mode and now work faster:
    takara-human-rna-bcr-umi-smartseq, takara-human-rna-bcr-umi-smarter, takara-human-rna-tcr-umi-smartseq, takara-human-rna-tcr-umi-smarter-v2, takara-human-rna-tcr-smarter, takara-mouse-rna-bcr-smarter, takara-mouse-rna-tcr-smarter, 10x-sc-xcr-vdj, 10x-sc-5gex, abhelix-human-rna-xcr, bd-human-sc-xcr-rhapsody-cdr3, bd-mouse-sc-xcr-rhapsody-cdr3, bd-sc-xcr-rhapsody-full-length, cellecta-human-rna-xcr-umi-drivermap-air, illumina-human-rna-trb-ampliseq-sr, illumina-human-rna-trb-ampliseq-plus, irepertoire-human-rna-xcr-repseq-sr, irepertoire-human-rna-xcr-repseq-lr, irepertoire-mouse-rna-xcr-repseq-sr, irepertoire-mouse-rna-xcr-repseq-lr, irepertoire-human-rna-xcr-repseq-plus, irepertoire-mouse-rna-xcr-repseq-plus, irepertoire-human-dna-xcr-repseq-sr, irepertoire-human-dna-xcr-repseq-lr, milab-human-rna-ig-umi-multiplex, milab-human-rna-tcr-umi-race, milab-human-rna-tcr-umi-multiplex, milab-human-dna-tcr-multiplex, milab-human-dna-xcr-7genes-multiplex, milab-mouse-rna-tcr-umi-race, neb-human-rna-xcr-umi-nebnext, qiagen-human-rna-tcr-umi-qiaseq

MiXCR v4.4.1

06 Jul 17:49

🐞 Bug fixes

  • Resolves an issue encountered when running MiXCR on Windows

MiXCR v4.4.0

05 Jul 18:51

🚀 New features

Built-in alleles database

MiXCR features robust support for inferring donor-specific allelic variants of V and J genes from NGS data, using the findAlleles command. With this new release, we introduce a comprehensive built-in database of human alleles. Now, the findAlleles command will utilize known allele names from this integrated library. Feel free to explore our database at https://vdj.online/library.


New rigorous quality checks

MiXCR now offers detailed insights into the quality of input data with its new quality control (QC) checks. A comprehensive list of checks provides complete information about the data and facilitates immediate feedback to the wet lab if any issues are detected.


Convenient way to build custom libraries

Now one can build a gene segment reference library for de novo libraries or for chimeric model animals with a single buildLibrary command. Check out our updated guide.

# --d-genes-from-fasta and --c-genes-from-fasta are optional; drop those lines if not needed
mixcr buildLibrary \
    --v-genes-from-fasta v-genes.IGH.fasta \
    --v-gene-feature VRegion \
    --j-genes-from-fasta j-genes.IGH.fasta \
    --d-genes-from-fasta d-genes.IGH.fasta \
    --c-genes-from-fasta c-genes.IGH.fasta \
    --chain IGH \
    --species phocoena \
    phocoena-IGH.json.gz

Comprehensive support of sample sheets

Sample sheets can now be passed directly to the MiXCR analyze command as input. This makes it easy to run MiXCR on any structure of input files, demultiplexed or not, with any type of multiplexing:

mixcr analyze generic-sc-ht-vdj-amplicon --species hsa \
    sample-sheet.csv \
    output_prefix

🤩 New presets

  • Support of MiLaboratories Human 7 Genes DNA Multiplex: milab-human-dna-xcr-7genes-multiplex

  • Support of Parse Bio Evercode Whole Transcriptome presets: parsebio-sc-3gex-evercode-wt-mini, parsebio-sc-3gex-evercode-wt and parsebio-sc-3gex-evercode-wt-mega

  • Support of FLAIRR-Seq protocol via flairr-seq preset

  • New generic single cell presets:

    • Low throughput (e.g. micro-wells) amplicon-based single cell:

      • No UMIs: generic-sc-lt-vdj-amplicon
      • With UMIs: generic-sc-lt-vdj-amplicon-umi
    • Low throughput (e.g. micro-wells) single cell with fragmentation (RNA-Seq):

      • No UMIs: generic-sc-lt-vdj-fragmented
      • With UMIs: generic-sc-lt-vdj-fragmented-umi
    • High throughput (e.g. droplets) amplicon-based single cell:

      • No UMIs: generic-sc-ht-vdj-amplicon
      • With UMIs: generic-sc-ht-vdj-amplicon-umi
    • High throughput (e.g. droplets) single cell with fragmentation (RNA-Seq):

      • No UMIs: generic-sc-ht-vdj-fragmented
      • With UMIs: generic-sc-ht-vdj-fragmented-umi
    • Reconstructing VDJ from generic gene expression data:

      • No UMIs: generic-sc-gex
      • With UMIs: generic-sc-gex-umi
  • New Biomed2 primer sets: biomed2-human-rna-igkl, biomed2-human-rna-trbdg.

💪 Major changes

  • Improved aligner parameters for all protocols. We spent more than 100,000 CPU-hours in total running the optimization. As a result, the alignment rate is better for most protocols, especially for data of average quality.

  • Added a new minSequenceCount parameter for the k-mer filter, allowing construction of more flexible filtering pipelines with better fallback behaviour for under-sequenced libraries.

  • Now full sample sheet with input file names can be provided as an input to the pipeline.

  • Sample sheets provided either with the --sample-sheet mixin or as pipeline input are fuzzy matched against the data, allowing one substitution in unambiguous cases. This behaviour can be turned off by using the --sample-sheet-strict mixin instead, or by adding the --strict-sample-sheet-matching option if a full sample sheet is used as pipeline input.

  • New commands: mixcr qc, mixcr buildLibrary, mixcr mergeLibrary, mixcr debugLibrary

  • Various major improvements to sequencing and PCR error correction algorithms for tags and clonotypes:

    • tag refinement now uses average quality in statistical inference; this is the correct approach from the mathematical point of view, and it slightly increases performance judging by better consensus assembly downstream
    • statistical inference in PCR error correction redone from scratch, now it takes into account aggregated quality scores of clonotypes, which makes the procedure automatically adapt to low quality samples and better perform in many marginal cases in both UMI and non-UMI protocols
    • better algorithm for quality score aggregation in clonotype assembly
    • better algorithm for quality score aggregation in consensus assembly
  • Mechanism to apply different tag transformations at the align step. Transformations include mappings, string and sequence manipulations, and various arithmetic operations. This feature supports single-cell scenarios where multiple well-known barcodes mark the same cell, allows converting sequence barcodes to a textual representation to adopt the barcode naming schemas used in some protocols, and can convert multiple barcodes to a single cell ID. The feature is currently used in presets for data from the Parse Biosciences and BD Rhapsody single-cell platforms.

  • Special mechanism to allow NaN values in metrics in group filters (used in the minSequenceCount parameter of the k-mer filter, see above).

  • Added fallback behaviour for under-sequenced libraries
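The fuzzy sample-sheet matching described above (one substitution allowed, only when unambiguous) can be sketched as follows. This is an illustration under assumptions: it compares equal-length tag values by Hamming distance, and match_tag is a hypothetical name; MiXCR's matcher may differ in detail.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_tag(observed, sheet_values, strict=False):
    """Return the sample-sheet value matching an observed tag, or None."""
    if observed in sheet_values:
        return observed
    if strict:  # strict mode: exact matches only
        return None
    # Candidates one substitution away; accept only if there is exactly one
    candidates = [v for v in sheet_values
                  if len(v) == len(observed) and hamming(v, observed) == 1]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    sheet = ["ACGTAC", "TTGCAA"]
    print(match_tag("ACGTAA", sheet))               # ACGTAC
    print(match_tag("ACGTAA", sheet, strict=True))  # None
    print(match_tag("AAAC", ["AAAA", "AAAT"]))      # None: ambiguous
```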

🐞 Bug fixes

  • Fix for naming of intermediate files and reports produced by analyze if target folder is specified
  • Tag pattern now is also searched in reverse strand for single-ended input with --tag-parse-unstranded
  • Fixed the value in the "Reads dropped due to low quality, percent of total" report line
  • Fixed bug not allowing to parse more than two reads with tag pattern
  • Fixed bug when --chains is used with exportClonesOverlap
  • Fixed export... commands: the tag quality field was added back to the export columns

👷 Minor fixes and improvements

  • Added gene feature coverage in alignment report
  • On Linux platforms default calculation of -Xmx now based on "available" memory (previously "free" was used)
  • New gene aligner parameter edgeRealignmentMinScoreOverride for more sensitive alignments for short paired-end reads
  • Report values downstream of align now calculate percentages relative to the number of reads in the sample rather than the total number of reads in a multi-sample analysis
  • Options helping with advanced analysis of data quality and consensus assembly process added
    to assemble (--consensus-alignments, --consensus-state-stat, --downsample-consensus-state-stat)
    and analyze (--output-consensus-alignments, --output-consensus-state-stat, --downsample-consensus-state-stat)
  • Better tag pattern search projection representation in reports
  • findAlleles now recalculates the functionality of de novo found alleles
  • Better algorithm for calculating the checksum of the VDJC library
  • Additional report string "Aligned reads processed" in assemble report
  • Added options --by-feature and --by-gene to sortClones
  • Added options -rankByReads and -rankByTag <(Molecule|Cell|Sample)> to exportClones and exportShmTreesWithNodes
  • Export readIds in exportAlignments by default
  • Added recalculation functionality for de-novo found alleles in findAlleles
  • Added CDR3 information to the generated hash for de novo alleles
  • Removed de novo alleles that are actually identical
  • findAlleles removes unused genes (genes not represented in the given donor) from the library
  • Made --chains optional in the downsampling command and allowed multiple inputs
  • exportClones writes an empty file if the input contains no clones
  • Better exception messages on incorrect inputs for export commands
  • exportClones writes no_d_gene if VDJunction, DCDR3Part or DJJunction is requested in the absence of a D hit
  • Columns in exportReportsTable now covers most of significant statistics from reports
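On Linux, the -Xmx change noted above is about two fields of /proc/meminfo: MemFree counts completely unused memory, while MemAvailable (present since kernel 3.14) also counts reclaimable page cache and better estimates what a new process can use. The sketch below only picks the field; the fraction of it that MiXCR assigns to -Xmx is not described here, memory_budget_kb is a hypothetical name, and the sample values are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def memory_budget_kb(text, use_available=True):
    """Memory estimate to size the heap from: MemAvailable (new) or MemFree (old)."""
    info = parse_meminfo(text)
    return info["MemAvailable"] if use_available else info["MemFree"]


if __name__ == "__main__":
    # Made-up values; on a real host read open("/proc/meminfo").read() instead
    sample = "MemTotal:       16000000 kB\nMemFree:         1000000 kB\nMemAvailable:    9000000 kB\n"
    print(memory_budget_kb(sample), memory_budget_kb(sample, use_available=False))
```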

🐬 Docker image changes

  • The custom entry point of the image was removed; it is now set to /bin/bash. The mixcr command must now be specified at the beginning of the argument list:

    Old: docker run ghcr.io/milaboratory/mixcr/mixcr analyze ...

    New: docker run ghcr.io/milaboratory/mixcr/mixcr mixcr analyze ...

  • The new image is based on Amazon Corretto, which in turn is based on Amazon Linux 2. If the image needs customization, use the yum package manager instead of apt/apt-get.

    With old image:

    FROM ghcr.io/milaboratory/mixcr/mixcr:4.3.2
    # ...
    RUN apt-get install -y wget
    # ...

    With new image:

    FROM ghcr.io/milaboratory/mixcr/mixcr:4.4....

MiXCR v4.3.2

11 Apr 18:50

🐞 This update addresses a significant issue that first appeared in version 4.3.0, which caused incorrect column names for FR4 nucleotide and amino acid sequences in export tables (e.g. nSeqJGeneWithoutCDR3Part instead of nSeqFR4).

Minor improvements

  • findAlleles now works much faster for extremely diverse samples

Other bug fixes

  • fixed inconsistency in reports and behaviour for assemble when badQualityThreshold=0
  • fixed the X-axis label for k-mer filters in tag-filtering QC plots
  • added threshold lines to tag-filtering QC plots for composite operators (such as operators with cumtop fallbacks)
  • fixed an NPE crash in chain usage plots when chimeric sequences are present in the stats

MiXCR v4.3.1

27 Mar 13:45

Minor improvements

  • added -isOOF <gene_feature> column to export
  • added -hasStops <gene_feature> column to export
  • added -isProductive <gene_feature> column to export
  • improvements to the report and the alleles description table of the findAlleles command
  • findAlleles now removes unused genes from the resulting library
  • findAlleles is now more resilient to the case where most of the donor's allele variants differ from the *00 alleles in the library

Bug fixes

  • fixed AssertionError in findAlleles command with --output-template argument
  • fixed wrong behaviour with inferMinRecordsPerConsensus == true and cell level assembly
  • fixed minRecordsPerConsensus inference mechanism for new filtering features introduced in previous version (4.3.0)

MiXCR v4.3.0

17 Mar 16:46

Key changes

  • Improved, less stringent variant of Otsu's method for automated histogram thresholding of barcoded data. It recovers more "good" UMI groups. The new filter replaces the old one in all presets for AIRR-seq and single-cell V(D)J protocols that use UMIs: Cellecta, Milaboratories, NEB, Qiagen, Takara, 10x Genomics, BD, Singleron.
  • New group filter operators that combine thresholds from multiple operators by taking the lowest or highest value and applying it. This allows more universal filtering strategies that are robust to edge cases such as undersequencing of barcodes.
  • Added a default fallback threshold for UMI filtering: if automated UMI thresholding leaves fewer than 85% of reads, MiXCR keeps enough UMIs to retain at least 85% of reads.
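The fallback can be pictured as two candidate thresholds combined by taking the lower (less stringent) one. The sketch below is a minimal Python illustration, assuming the filter works on read counts per UMI group; the function names (`cumtop_threshold`, `combined_threshold`, `keep_umis`) are hypothetical and do not correspond to MiXCR's implementation or preset parameters.

```python
from typing import List, Sequence


def cumtop_threshold(reads_per_umi: Sequence[int], min_share: float = 0.85) -> int:
    """Lowest read count to keep so that the largest UMI groups together
    cover at least `min_share` of all reads."""
    counts = sorted(reads_per_umi, reverse=True)
    total = sum(counts)
    covered = 0
    for count in counts:
        covered += count
        if covered >= min_share * total:
            return count
    return 0  # empty input: a threshold of 0 keeps everything


def combined_threshold(auto_threshold: int, reads_per_umi: Sequence[int],
                       min_share: float = 0.85) -> int:
    """Take the lower (less stringent) of the automated and fallback thresholds."""
    return min(auto_threshold, cumtop_threshold(reads_per_umi, min_share))


def keep_umis(reads_per_umi: Sequence[int], threshold: int) -> List[int]:
    """Keep UMI groups whose read count reaches the threshold."""
    return [c for c in reads_per_umi if c >= threshold]


counts = [50, 30, 10, 5, 3, 2]
t = combined_threshold(20, counts)   # threshold 20 alone would keep only 80% of reads
print(t, sum(keep_umis(counts, t)))  # 10 90
```

Taking the minimum mirrors the "lowest value" mode of the group operators; a "highest value" combination would use `max` instead.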

Presets

  • New preset for Seq-Well VDJ data
  • New presets for NEBNext® Immune Sequencing Kit TCR and BCR profiling for data with both TCR and BCR.
  • Improved Takara human TCR and BCR presets

Reference Library

  • New IGHV genes added to human reference: IGHV3-30-3, IGHV4-30-4, IGHV1-69-2, IGHV2-70D, IGHV3-30-5
  • IGHV1-69D renamed to IGHV1-69

Minor improvements

  • Threshold rounding in cumtop and top-n filters
  • Support of sequence-end token ($) in tag pattern matching algorithm
  • Added the discardAmbiguousNucleotideCalls parameter for contig assembly
  • Added the -cellId field to the exportClones and exportAlignments commands
  • Added fields cell_id, umi_count and consensus_count to exportAirr command
  • Better text descriptions in align and assemble reports
  • exportAirr command now splits clones by cell if the data contain cell barcodes
  • Replaced the analyze options --not-aligned-.. and --not-parsed-.. with a single option --output-not-used-reads
  • Fixed comma-separated chains input in the postanalysis --chains option
  • Split the tagValue column (e.g. tagValueCELL) into two columns: tagValue<tag_name> and tagQuality<tag_name>
  • Support for system proxy settings for the license
  • The # character can now be used to separate groupName from the group matcher in the file name expansion mechanism (in addition to :), allowing multi-sample analysis on Windows
  • Fixed usage of composite features for --assemble-contigs-by
  • Removed some restrictions for possible combinations of gene features used in analysis and export
  • Fixed behaviour of empirical alignment assignment in assemble if --write-all was used in align

MiXCR v4.2.0

26 Jan 20:06
b0f194e

Built-in support for new protocols

Sample barcodes

Complete support for sample barcodes, which can be taken from all possible sources:

  • from names of input files;
  • from index I1/I2 FASTQ files;
  • from sequence header lines;
  • from inside the tag pattern.

Now one can analyze multiple patient samples at once. Along with a powerful file name expansion functionality, one can process any kind of sequencing protocol with any custom combination of sample, cell and UMI barcoding.

Processing of multiple samples can be done in two principal modes with respect to sample barcodes: (1) data can be split by sample right at the align stage and processed separately, or (2) all samples can be processed as a single set of sequences and separated only at the very last exportClones step. Both approaches have their pros and cons, so the best strategy can be chosen for the given experimental setup and study goals.

New robust filters for single cell and molecular barcoded data

For 10x Genomics and other fragmented protocols, a new powerful k-mer-based filtering algorithm is now used to eliminate cross-cell contamination coming from plasma cells.

For UMI filtering, a new algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering.
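Barron's algorithm is a generalization of Otsu's method, which the 4.3.0 notes also build on. For orientation only, the classic Otsu criterion picks the histogram split that maximizes between-class variance. The sketch below assumes a plain integer histogram (for example, of read counts per UMI group) and does not reproduce MiXCR's actual thresholding or its histogram preprocessing steps; `otsu_threshold` is a hypothetical name.

```python
from typing import Sequence


def otsu_threshold(hist: Sequence[int]) -> int:
    """Classic Otsu: return the bin index t that maximizes between-class
    variance when splitting bins into [0, t) and [t, n).
    Returns 0 if no split yields two non-empty classes."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of items in the lower class
    sum0 = 0  # sum of bin_index * count over the lower class
    for t in range(1, len(hist)):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


hist = [10, 20, 10, 0, 0, 0, 10, 20, 10]  # two well-separated modes
print(otsu_threshold(hist))  # 3
```

Bins at or above the returned index form the upper class (the "good" barcodes in a filtering context).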

List of all changes

Sample barcodes

  • support for more than two fastq files as input (I1 and I2 reads support)
  • multiple possible sources of data for sample resolution:
    • sequences extracted with tag pattern (including those coming from I1 and I2 reads)
    • samples can be assigned based on a specific pattern variant (multi-variant patterns, separated by ||, make it easy to adopt MiGEC-style sample files)
    • parts of file names (extracted using file name expansion mechanism)
  • flexible sample table matching criteria
    • matching multiple tags
    • matching variant id from multi-variant tag patterns
  • special --sample-table mixin option allowing for flexible sample table definition in a tab-delimited table form
  • special --infer-sample-table mixin option to infer sample table for sample tags from file name expansion
  • special generic presets for multiplexed data analysis scenarios (e.g. generic-tcr-amplicon-separate-samples-umi)
  • align command can now optionally split output alignments by sample into separate vdjca files
  • exportClones command now supports splitting the output into multiple files by sample
  • analyze command supports new splitting behaviour of the align command, separately running all the analysis steps for all the output files (if splitting is enabled)
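Matching on multiple tags can be pictured as a lookup in which a read is assigned to a sample only when every tag value listed in a sample-table row agrees. This is a hypothetical Python sketch: the row layout, the tag names `SAMPLE1`/`SAMPLE2`, and the choice to leave ambiguous reads unassigned are illustrative assumptions, not the format or behaviour of MiXCR's `--sample-table`.

```python
from typing import Dict, List, Optional

# Hypothetical sample-table layout: each row lists the tag values a read must carry.
SAMPLE_TABLE: List[Dict[str, str]] = [
    {"sample": "patient1", "SAMPLE1": "ACGT", "SAMPLE2": "TTAA"},
    {"sample": "patient2", "SAMPLE1": "ACGT", "SAMPLE2": "GGCC"},
]


def assign_sample(read_tags: Dict[str, str],
                  table: List[Dict[str, str]]) -> Optional[str]:
    """Return the sample whose every listed tag value matches the read.
    Reads matching no row, or more than one row, stay unassigned (None)."""
    matches = [
        row["sample"]
        for row in table
        if all(read_tags.get(tag) == value
               for tag, value in row.items() if tag != "sample")
    ]
    return matches[0] if len(matches) == 1 else None


print(assign_sample({"SAMPLE1": "ACGT", "SAMPLE2": "GGCC"}, SAMPLE_TABLE))  # patient2
print(assign_sample({"SAMPLE1": "ACGT"}, SAMPLE_TABLE))                     # None
```

Requiring all listed tags to match is what lets two samples share one barcode (here `SAMPLE1`) and still be told apart by another.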

Filters and error correction

  • preset for 10X VDJ BCR enhanced with a k-mer-based filter to eliminate rare cross-cell contamination from plasma cells
  • new advanced thresholding algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering
  • rework of clustering step aimed at PCR / reverse-transcription error correction in assemble, now it correctly handles any possible tag combination (sample, cell or molecule)
  • new feature to add histogram preprocessing steps in automated thresholding

Quality trimming

  • turn on default quality trimming (trimmingQualityThreshold changed from 0 to 10), this setting showed better performance in many real world use-cases
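To illustrate what a quality threshold of 10 means, the sketch below trims bases whose Phred quality is below the threshold from both ends of a read. It is a deliberately simple end-trimming rule assumed for illustration; MiXCR's actual trimming procedure may differ, and `trim_low_quality_ends` is a hypothetical name. A threshold of 0 (the old default) trims nothing.

```python
from typing import Tuple


def trim_low_quality_ends(seq: str, qual: str, threshold: int = 10,
                          offset: int = 33) -> Tuple[str, str]:
    """Trim bases with Phred quality below `threshold` from both read ends.
    `qual` is a FASTQ quality string with the given ASCII offset (Phred+33 by default)."""
    scores = [ord(ch) - offset for ch in qual]
    start, end = 0, len(seq)
    while start < end and scores[start] < threshold:
        start += 1
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return seq[start:end], qual[start:end]


# '#' encodes Phred 2 and '?' encodes Phred 30 in Phred+33.
print(trim_low_quality_ends("ACGTAC", "#????#"))               # ('CGTA', '????')
print(trim_low_quality_ends("ACGTAC", "#????#", threshold=0))  # ('ACGTAC', '#????#')
```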

Reference library

  • reference V/D/J/C gene library upgrade to repseqio v2.1 (see changelog)

New commands

  • added the exportReportsTable command, which writes report data from the commands that were run to a file in tabular format

Other

  • optimized aligner parameters for long-read data
  • fixed system temp folder detection behaviour, now mixcr respects TMPDIR environment variable
  • rework of preset-mixin logic, now external presets (like those starting from local:...) are packed into the output *.vdjca file on align step, the same applies to all externally linked information, like tag whitelists and sample lists. This behaviour facilitates better analysis reproducibility and more transparent parameter logistics.
  • new mixin options to adjust tag refinement whitelists with analyze: --set-whitelist and --reset-whitelist
  • removed refineTagsAndSort options -w and --whitelist; corresponding deprecation error message printed if used
  • new grouping feature for exportClones that normalizes values of -readFraction and -uniqueTagFraction ... columns to totals for certain compartments instead of the whole dataset. This makes it possible to output, e.g., fractions of reads within a cell.
  • new mixin options --add-export-clone-table-splitting, --reset-export-clone-table-splitting, --add-export-clone-grouping and --reset-export-clone-grouping
  • improved sensitivity of findAlleles command
  • added tag info to exportAlignmentsPretty and exportClonesPretty
  • added --chains filter to the exportShmTrees, exportShmTreesWithNodes, exportShmTreesNewick and exportPlots shmTrees commands
  • fixed old bug #353: all aligners now favor the leftmost J gene in situations where multiple genes can be found in the sequence (i.e. mis-spliced mRNA)
  • fixed an exception in align occurring for not-parsed sequences with writeFailedAlignments=true
  • new filter and parameter in assemblePartial: minimalNOverlapShare controls the minimal relative part of the N region that must be covered by the overlap to conclude that two reads come from the same V(D)J rearrangement
  • default paired-end overlap parameters changed to slightly more relaxed values
  • better criteria for alignments to be accepted for the assemblePartial procedure
  • fixed NPE in assemblePartial executed for the data without C-gene alignment settings
  • fixed rare exception in exportAirr command
  • by default, exports now show messages like 'region_not_covered' for data that can't be extracted (e.g. requesting -nFeature for a region that is not covered or for a nonexistent tag). The --not-covered-as-empty option preserves the previous behaviour
  • information about genes with enough data to find alleles was added to the findAlleles report and to allele descriptions
  • fixed error message appearing when analysis parameter already assigned to null is overridden by null using the -O... option
  • fixed wrong reporting of number of trimmed letters from the right side of R1 and R2 sequence
  • fixed error message about repeated generic mixin overrides
  • fixed error of exportClones with some arguments
  • fixed report indentation artefacts
  • fixed bug where setting the chains filter to ALL in exportAlignments prevented not-aligned records from being exported
  • fixed runtime exception in assemble arising in analysis of data with CELL barcodes but without UMIs, with consensus assembly turned off
  • fixed bug leading to incorrect mixin option ordering during its application to the parameters bundle
  • minor change to the contigAssembly filtering parametrization
  • added mix-in --export-productive-clones-only
  • warning message about automatically set -Xmx.. JVM option in mixcr script
  • safer automatic value for -Xms..
  • fix: added species flag to 10x, nanopore and smart-seq2 presets
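Two items in the list above reduce to simple arithmetic: normalizing read fractions to a grouping compartment (such as a cell) rather than the whole dataset, and the minimalNOverlapShare check as the ratio of overlapped N-region length to total N-region length. The sketch below is hypothetical: `read_fractions`, `passes_n_overlap` and their input layouts are assumptions for illustration, not MiXCR code.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def read_fractions(clones: List[Tuple[str, int]], by_group: bool) -> List[float]:
    """clones: (group, read_count) per clone, e.g. group = cell barcode.
    Returns read fractions normalized to the group total (by_group=True)
    or to the whole dataset (by_group=False)."""
    def key(group: str) -> Optional[str]:
        return group if by_group else None

    totals: Dict[Optional[str], int] = defaultdict(int)
    for group, reads in clones:
        totals[key(group)] += reads
    return [reads / totals[key(group)] for group, reads in clones]


def passes_n_overlap(overlap_in_n: int, n_length: int,
                     minimal_n_overlap_share: float) -> bool:
    """True if the read overlap covers at least the required share of the N region."""
    if n_length == 0:
        return True  # sketch assumption: an empty N region imposes no constraint
    return overlap_in_n / n_length >= minimal_n_overlap_share


clones = [("cellA", 30), ("cellA", 10), ("cellB", 60)]
print(read_fractions(clones, by_group=False))  # [0.3, 0.1, 0.6]
print(read_fractions(clones, by_group=True))   # [0.75, 0.25, 1.0]
print(passes_n_overlap(6, 10, 0.5))            # True
```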