Only up to 8% of clonotypes exported #1782
-
I'm trying to run a full-length BCR repertoire analysis with UMIs, but very few clonotypes are retrieved and I'm unable to troubleshoot this. I just wanted to check what may have gone wrong.

Library construct details: I want to retain the UMIs in the final TSV. The tag pattern is --tagPattern "^CTTCCGATCT(UMI:N{12})N{5}(R1:)^(R2:*)" \

Prep:

Actual result:
Running: mixcr exportClones results/Sample_data.clns results/Sample_data.clones.tsv
Exporting IGH
Exporting clones: 8.3%
Analysis finished successfully.

Exact MiXCR commands:
mixcr align \
MiXCR complete report files
Running: mixcr align --report results/Sample_data.align.report.txt --json-report results/Sample_data.align.report.json --preset generic-amplicon --save-output-file-names results/Sample_data.align.list.tsv --rigid-left-alignment-boundary --rigid-right-alignment-boundary --rna --species alpaca --tag-pattern ^CTTCCGATCT(UMI:N{12})N{5}(R1:)^(R2:*) --assemble-clonotypes-by CDR1_TO_FR4 AL121122BovineIgG_S1_L001_R1_001.fastq.gz AL121122BovineIgG_S1_L001_R2_001.fastq.gz results/Sample_data.alignments.vdjca
The following tags and their roles will be associated with each output alignment:
Payload tags: R1, R2
Molecule tags: UMI(SQ)
Alignment: 0%
Alignment: 10% ETA: 00:01:38
Alignment: 20.9% ETA: 00:01:42
Alignment: 31.3% ETA: 00:01:05
Alignment: 42.2% ETA: 00:00:58
Alignment: 52.6% ETA: 00:00:45
Alignment: 63.2% ETA: 00:00:34
Alignment: 73.8% ETA: 00:00:24
Alignment: 84.1% ETA: 00:00:15
Alignment: 94.2% ETA: 00:00:05
====================== report: align ======================
Analysis time: 1.7m
Total sequencing reads: 10838897
Successfully aligned reads: 845 (0.01%)
Coverage (percent of successfully aligned):
CDR3: 791 (93.61%)
FR3_TO_FR4: 750 (88.76%)
CDR2_TO_FR4: 697 (82.49%)
FR2_TO_FR4: 357 (42.25%)
CDR1_TO_FR4: 298 (35.27%)
VDJRegion: 262 (31.01%)
Alignment failed: no hits (not TCR/IG?): 9 (0%)
Alignment failed: absence of V hits: 1 (0%)
Alignment failed: absence of J hits: 17 (0%)
Alignment failed: no target with both V and J alignments: 36 (0%)
Alignment failed: absent barcode: 10837989 (99.99%)
Overlapped: 299 (0%)
Overlapped and aligned: 291 (0%)
Overlapped and not aligned: 8 (0%)
Alignment-aided overlaps, percent of overlapped and aligned: 192 (65.98%)
No CDR3 parts alignments, percent of successfully aligned: 1 (0.12%)
Partial aligned reads, percent of successfully aligned: 53 (6.27%)
V gene chimeras: 116 (0%)
Paired-end alignment conflicts eliminated: 30 (0%)
Realigned with forced non-floating bound: 1602 (0.01%)
Realigned with forced non-floating right bound in left read: 310 (0%)
Realigned with forced non-floating left bound in right read: 310 (0%)
IGH chains: 845 (100%)
IGH non-functional: 17 (2.01%)
Trimming report:
R1 reads trimmed left: 22146 (0.2%)
R1 reads trimmed right: 3307060 (30.51%)
Average R1 nucleotides trimmed left: 0.10428422744491436
Average R1 nucleotides trimmed right: 2.8993084812965746
R2 reads trimmed left: 8661 (0.08%)
R2 reads trimmed right: 6400637 (59.05%)
Average R2 nucleotides trimmed left: 0.01569430911650881
Average R2 nucleotides trimmed right: 10.942370612065046
Tag parsing report:
Execution time: 0ns
Total reads: 10838897
Matched reads: 908 (0.01%)
Projection +R1 +R2: 908 (0.01%)
For variant 0:
Running: mixcr assemble --report results/Sample_data.assemble.report.txt --json-report results/Sample_data.assemble.report.json results/Sample_data.alignments.vdjca results/Sample_data.clns
Initialization: progress unknown
===================== report: assemble =====================
Analysis time: 205ms
Final clonotype count: 12
Reads used in clonotypes, percent of total: 12 (0%)
Average number of reads per clonotype: 1
Reads dropped due to the lack of a clone sequence, percent of total: 547 (0.01%)
Reads dropped due to a too short clonal sequence, percent of total: 0 (0%)
Reads dropped due to low quality, percent of total: 0 (0%)
Reads dropped due to failed mapping, percent of total: 277 (0%)
Reads dropped with low quality clones, percent of total: 9 (0%)
Aligned reads processed: 298
Reads used in clonotypes before clustering, percent of total: 12 (0%)
Number of reads used as a core, percent of used: 12 (100%)
Mapped low quality reads, percent of used: 0 (0%)
Reads clustered in PCR error correction, percent of used: 0 (0%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)
Clonotypes dropped as low quality: 7
Clonotypes eliminated by PCR error correction: 0
Clonotypes pre-clustered due to the similar VJC-lists: 0
Clones dropped in post filtering: 0 (0%)
Reads dropped in post filtering: 0.0 (0%)
Reads filtered by tag prefix: 0 (0%)
IGH chains: 12 (100%)
IGH non-functional: 0 (0%)
Running: mixcr qc --print-to-stdout results/Sample_data.clns results/Sample_data.qc.txt results/Sample_data.qc.json
Successfully aligned reads: 0.01% [ALERT]
Off target (non TCR/IG) reads: 0.0% [OK]
Reads with no V or J hits: 0.0% [OK]
Reads used in clonotypes: 0.0% [ALERT]
Alignments that do not cover CDR1_TO_FR4: 0.01% [OK]
Alignments dropped due to low sequence quality: 0.0% [OK]
Clones dropped in post-filtering: 0.0% [OK]
Alignments dropped in clones post-filtering: 0.0% [OK]
Running: mixcr exportClones results/Sample_data.clns results/Sample_data.clones.tsv
Exporting IGH
Exporting clones: 8.3%
Analysis finished successfully.
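The counts in the reports above can be lined up as a simple funnel to see where reads are lost. The following is a minimal sketch that only re-does the arithmetic on numbers copied from the reports; the variable names and the `pct` helper are illustrative and not part of MiXCR. It also shows that 8.3% equals 1 out of 12, which is consistent with "Exporting clones: 8.3%" being a progress line over the 12 assembled clonotypes rather than a fraction of clonotypes exported. That reading is an inference from the numbers, not something the reports state.

```python
# Counts copied from the align and assemble reports above.
total_reads = 10_838_897        # Total sequencing reads
absent_barcode = 10_837_989     # Alignment failed: absent barcode
tag_matched = 908               # Tag parsing report: Matched reads
aligned = 845                   # Successfully aligned reads
cover_cdr1_to_fr4 = 298         # Coverage: CDR1_TO_FR4
clonotypes = 12                 # Final clonotype count


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


funnel = {
    "absent barcode / total reads": pct(absent_barcode, total_reads),
    "tag matched / total reads": pct(tag_matched, total_reads),
    "aligned / tag matched": pct(aligned, tag_matched),
    "CDR1_TO_FR4 covered / aligned": pct(cover_cdr1_to_fr4, aligned),
}

for step, value in funnel.items():
    print(f"{step}: {value}%")

# One clonotype out of twelve, expressed as a percentage with one decimal.
progress_after_first_clone = round(100.0 * 1 / clonotypes, 1)
print(f"1 of {clonotypes} clonotypes: {progress_after_first_clone}%")
```

Read this way, almost all of the loss happens at the tag-parsing step: reads that match the tag pattern plus reads reported as "absent barcode" add up to the total number of sequencing reads.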
Replies: 7 comments 3 replies
-
Hi, It seems like you should run:
-
Hi, thanks for the clarification. CTTCCGATCT comes from the i5 index primer, which is part of our library construct. Indeed, the UMI starts after R1 {P5 + i5 + R1 + 12 UMI + GGGGG + insert + R2 + i7 + P7}. We have trimmed off a few nucleotides from the P5 and i5 primers, and hence I start with CTTCCGATCT, which is the end of R1, after which the UMI starts.
-
Based on what you have shared, as expected, CTTCCGATCT is not part of the read, and the read starts with a UMI. You’ve only shared 20 reads, and all of them start with an “N” (see below). If that is the case for all your reads, you can try using the following pattern:

However, if all your reads begin with an ambiguous nucleotide, it’s worth investigating why that’s happening. This could be due to a sequencing issue.
Sometimes this issue only affects a few reads at the top of a FASTQ file. If the rest of the reads are fine, you can continue using the original pattern:
You definitely should not include CTTCCGATCT in the pattern, as it is part of the sequencing primer annealing region and is not present in the data, as you can see. Apart from that, the command I shared above is correct.
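The effect of including the primer sequence in the pattern can be illustrated with a rough Python analogue. This is a sketch using Python's `re` module, not MiXCR's tag-pattern engine, and the read below is made up for illustration: 12 UMI bases, then GGGGG, then an arbitrary insert, following the construct described earlier in the thread. The names `WITH_PRIMER`, `UMI_AT_START`, and `split_read` are hypothetical.

```python
import re

# Python-regex analogue of a pattern that expects the primer before the UMI.
WITH_PRIMER = re.compile(
    r"^CTTCCGATCT(?P<umi>[ACGTN]{12})[ACGTN]{5}(?P<payload>.*)$"
)
# Analogue of a pattern where the read starts directly with the UMI.
UMI_AT_START = re.compile(
    r"^(?P<umi>[ACGTN]{12})[ACGTN]{5}(?P<payload>.*)$"
)


def split_read(seq, pattern):
    """Return (umi, payload) if the pattern matches the read, else None."""
    match = pattern.match(seq)
    if match is None:
        return None
    return match.group("umi"), match.group("payload")


# Made-up example read: UMI + GGGGG spacer + insert (no primer sequence).
example_umi = "ACGTACGTACGT"
example_insert = "CAGGTGCAGCTGGTGGAGTCTGG"
example_read = example_umi + "GGGGG" + example_insert

with_primer_result = split_read(example_read, WITH_PRIMER)
umi_first_result = split_read(example_read, UMI_AT_START)

print("with primer prefix:", with_primer_result)
print("UMI at read start: ", umi_first_result)
```

With a read shaped like this, the primer-anchored pattern finds nothing, while the pattern that starts at the UMI recovers both the UMI and the payload. A failure of this kind on nearly every read would match the "absent barcode" count in the align report.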
-
Thanks a lot. I see that my reads look different in each run, which is something I have to investigate. I'm not sure why the N persists in all F1 reads, not just the 20 that I sent. It's definitely worth looking at. Thanks for the clarification on the regex pattern. I really appreciate your help.
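One way to check this over a whole FASTQ file, rather than the first 20 reads, is to count how many reads begin with "N" and how many begin with the primer sequence. The following is a minimal sketch using only the Python standard library. It assumes a standard 4-line-per-record FASTQ, and the function name `read_start_stats` and the tiny demo file are hypothetical, written here only for illustration.

```python
import gzip
import os
import tempfile
from collections import Counter


def read_start_stats(fastq_path, prefix="CTTCCGATCT", max_reads=None):
    """Count reads that start with `prefix` or with an ambiguous base N.

    Assumes 4 lines per FASTQ record; handles plain or .gz files.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    stats = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 1:  # the sequence is the 2nd line of a record
                continue
            if max_reads is not None and stats["reads"] >= max_reads:
                break
            seq = line.strip().upper()
            stats["reads"] += 1
            stats["starts_with_prefix"] += seq.startswith(prefix)
            stats["starts_with_N"] += seq.startswith("N")
    return stats


# Demo on a made-up three-read file; real data would be the R1 FASTQ.
demo_seqs = [
    "NCGTACGTACGTGGGGGCAGG",   # starts with N
    "ACGTACGTACGTGGGGGCAGG",   # starts with a UMI-like sequence
    "CTTCCGATCTACGTACGTACG",   # starts with the primer sequence
]
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as tmp:
    for n, seq in enumerate(demo_seqs):
        tmp.write(f"@read{n}\n{seq}\n+\n{'I' * len(seq)}\n")
demo_stats = read_start_stats(tmp.name)
os.remove(tmp.name)

print(dict(demo_stats))
```

If `starts_with_N` is close to the total number of reads, the ambiguous first base affects the whole run and not just the top of the file. If `starts_with_prefix` is near zero, the primer sequence is not in the reads.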
Hi,
The barcode you specify (^*CTTCCGATCT(UMI:N{12})N{5}(R1:)\^(R2:*)) is not present in the reads. Based on the scheme you shared, it's unclear why you expect the sequence "CTTCCGATCT" to be at the beginning of R1. Shouldn’t R1 start from the UMI? It seems like you should run: