Only up to 8% of clonotypes exported #1782

PavithraV0223 · 2024-09-06T12:15:55Z


I'm trying to run a full-length BCR repertoire analysis with UMIs, but very few clonotypes are retrieved. I'm unable to troubleshoot this and wanted to check what may have gone wrong.

Library construct details:

I want to retain the UMIs in the final TSV. The tag pattern I use is --tag-pattern "^*CTTCCGATCT(UMI:N{12})N{5}(R1:*)\^(R2:*)"

Prep:
cDNA libraries were sequenced on the Illumina MiSeq platform using the Reagent Kit v3 (2 × 300 bp paired-end).
Libraries were prepared by 5′RACE, including cDNA synthesis, a template-switch reaction, cDNA amplification, and addition of Illumina adaptors.

Actual Result

Running:

mixcr exportClones results/Sample_data.clns results/Sample_data.clones.tsv

Exporting IGH

Exporting clones: 8.3%

Analysis finished successfully.

Exact MiXCR commands

mixcr align \
    -p generic-amplicon \
    --species alpaca \
    --tag-pattern "^*CTTCCGATCT(UMI:N{12})N{5}(R1:*)\^(R2:*)" \
    --report result/sample.align.report.txt \
    --json-report result/sample.align.report.json \
    -OvParameters.geneFeatureToAlign="VTranscriptWithout5UTRWithP" \
    -OvParameters.parameters.floatingLeftBound=false \
    -OjParameters.parameters.floatingRightBound=false \
    -OcParameters.parameters.floatingRightBound=false \
    AL121122BovineIgG_S1_L001_R1_001.fastq.gz \
    AL121122BovineIgG_S1_L001_R2_001.fastq.gz \
    results/Sample.vdjca

MiXCR complete report files

mixcr align <<<<<<<<<<<<<<<<<<<<<<<

Running:

mixcr align --report results/Sample_data.align.report.txt --json-report results/Sample_data.align.report.json --preset generic-amplicon --save-output-file-names results/Sample_data.align.list.tsv --rigid-left-alignment-boundary --rigid-right-alignment-boundary --rna --species alpaca --tag-pattern ^*CTTCCGATCT(UMI:N{12})N{5}(R1:*)\^(R2:*) --assemble-clonotypes-by CDR1_TO_FR4 AL121122BovineIgG_S1_L001_R1_001.fastq.gz AL121122BovineIgG_S1_L001_R2_001.fastq.gz results/Sample_data.alignments.vdjca

The following tags and their roles will be associated with each output alignment:

Payload tags: R1, R2

Molecule tags: UMI(SQ)

Alignment: 0%

Alignment: 10% ETA: 00:01:38

Alignment: 20.9% ETA: 00:01:42

Alignment: 31.3% ETA: 00:01:05

Alignment: 42.2% ETA: 00:00:58

Alignment: 52.6% ETA: 00:00:45

Alignment: 63.2% ETA: 00:00:34

Alignment: 73.8% ETA: 00:00:24

Alignment: 84.1% ETA: 00:00:15

Alignment: 94.2% ETA: 00:00:05

====================== report: align ======================

Analysis time: 1.7m

Total sequencing reads: 10838897

Successfully aligned reads: 845 (0.01%)

Coverage (percent of successfully aligned):

CDR3: 791 (93.61%)

FR3_TO_FR4: 750 (88.76%)

CDR2_TO_FR4: 697 (82.49%)

FR2_TO_FR4: 357 (42.25%)

CDR1_TO_FR4: 298 (35.27%)

VDJRegion: 262 (31.01%)

Alignment failed: no hits (not TCR/IG?): 9 (0%)

Alignment failed: absence of V hits: 1 (0%)

Alignment failed: absence of J hits: 17 (0%)

Alignment failed: no target with both V and J alignments: 36 (0%)

Alignment failed: absent barcode: 10837989 (99.99%)

Overlapped: 299 (0%)

Overlapped and aligned: 291 (0%)

Overlapped and not aligned: 8 (0%)

Alignment-aided overlaps, percent of overlapped and aligned: 192 (65.98%)

No CDR3 parts alignments, percent of successfully aligned: 1 (0.12%)

Partial aligned reads, percent of successfully aligned: 53 (6.27%)

V gene chimeras: 116 (0%)

Paired-end alignment conflicts eliminated: 30 (0%)

Realigned with forced non-floating bound: 1602 (0.01%)

Realigned with forced non-floating right bound in left read: 310 (0%)

Realigned with forced non-floating left bound in right read: 310 (0%)

IGH chains: 845 (100%)

IGH non-functional: 17 (2.01%)

Trimming report:

R1 reads trimmed left: 22146 (0.2%)

R1 reads trimmed right: 3307060 (30.51%)

Average R1 nucleotides trimmed left: 0.10428422744491436

Average R1 nucleotides trimmed right: 2.8993084812965746

R2 reads trimmed left: 8661 (0.08%)

R2 reads trimmed right: 6400637 (59.05%)

Average R2 nucleotides trimmed left: 0.01569430911650881

Average R2 nucleotides trimmed right: 10.942370612065046

Tag parsing report:

Execution time: 0ns

Total reads: 10838897

Matched reads: 908 (0.01%)

Projection +R1 +R2: 908 (0.01%)

For variant 0:

For projection +R1 +R2:

  UMI:Left position:

    10~45: + 7 (0.77%) = 7 (0.77%)

    46: + 237 (26.1%) = 244 (26.87%)

    47~50: + 143 (15.75%) = 387 (42.62%)

    51: + 408 (44.93%) = 795 (87.56%)

    52~265: + 113 (12.44%) = 908 (100%)

  UMI:Right position:

    22~57: + 7 (0.77%) = 7 (0.77%)

    58: + 237 (26.1%) = 244 (26.87%)

    59~62: + 143 (15.75%) = 387 (42.62%)

    63: + 408 (44.93%) = 795 (87.56%)

    64~277: + 113 (12.44%) = 908 (100%)

  R1:Left position:

    27~62: + 7 (0.77%) = 7 (0.77%)

    63: + 237 (26.1%) = 244 (26.87%)

    64~67: + 143 (15.75%) = 387 (42.62%)

    68: + 408 (44.93%) = 795 (87.56%)

    69~282: + 113 (12.44%) = 908 (100%)

  R1:Right position:

    68~297: + 8 (0.88%) = 8 (0.88%)

    298: + 201 (22.14%) = 209 (23.02%)

    299: + 120 (13.22%) = 329 (36.23%)

    300: + 579 (63.77%) = 908 (100%)

  R2:Left position: 0

  Variants: 0

  Cost: 0

  UMI length: 12

  R1 length:

    0~231: + 195 (21.48%) = 195 (21.48%)

    232: + 346 (38.11%) = 541 (59.58%)

    233~234: + 114 (12.56%) = 655 (72.14%)

    235~271: + 253 (27.86%) = 908 (100%)

  R2 length:

    156~298: + 29 (3.19%) = 29 (3.19%)

    299: + 404 (44.49%) = 433 (47.69%)

    300: + 475 (52.31%) = 908 (100%)
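The tag parsing report above shows that only 908 of 10,838,897 reads matched the pattern, and the align report attributes 99.99% of the failures to "absent barcode". A quick sanity check outside MiXCR is to count how many R1 reads contain the literal CTTCCGATCT anywhere. The helpers below (`open_fastq`, `fastq_sequences`, `count_reads_with_motif`, `make_record`) are hypothetical and not part of MiXCR, and the demo reads are invented for illustration.

```python
import gzip
import io
from typing import Iterable, Iterator, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line (second line) of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def count_reads_with_motif(seqs: Iterable[str], motif: str) -> tuple[int, int]:
    """Return (reads containing `motif` anywhere, total reads)."""
    hits = total = 0
    for seq in seqs:
        total += 1
        if motif in seq:
            hits += 1
    return hits, total


def make_record(name: str, seq: str) -> str:
    """Build one FASTQ record with a dummy quality string of matching length."""
    return f"@{name}\n{seq}\n+\n{'#' * len(seq)}\n"


# Invented reads: two start with a UMI, one still carries the primer tail.
demo = io.StringIO(
    make_record("a", "NACGTTAGCCATGGGGGCAGGTGC")
    + make_record("b", "ACGTACGTACGTGGGGGCAGGTGC")
    + make_record("c", "CTTCCGATCTACGTACGTACGTGG")
)
hits, total = count_reads_with_motif(fastq_sequences(demo), "CTTCCGATCT")
print(f"{hits}/{total} reads contain CTTCCGATCT ({100 * hits / total:.1f}%)")

# On the real data (file name from this thread):
# with open_fastq("AL121122BovineIgG_S1_L001_R1_001.fastq.gz") as fh:
#     print(count_reads_with_motif(fastq_sequences(fh), "CTTCCGATCT"))
```

If the real-file count is close to the 908 matched reads in the report, the primer tail is simply absent from almost all reads.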

mixcr assemble <<<<<<<<<<<<<<<<<<<<<<

Running:

mixcr assemble --report results/Sample_data.assemble.report.txt --json-report results/Sample_data.assemble.report.json results/Sample_data.alignments.vdjca results/Sample_data.clns

Initialization: progress unknown

===================== report: assemble =====================

Analysis time: 205ms

Final clonotype count: 12

Reads used in clonotypes, percent of total: 12 (0%)

Average number of reads per clonotype: 1

Reads dropped due to the lack of a clone sequence, percent of total: 547 (0.01%)

Reads dropped due to a too short clonal sequence, percent of total: 0 (0%)

Reads dropped due to low quality, percent of total: 0 (0%)

Reads dropped due to failed mapping, percent of total: 277 (0%)

Reads dropped with low quality clones, percent of total: 9 (0%)

Aligned reads processed: 298

Reads used in clonotypes before clustering, percent of total: 12 (0%)

Number of reads used as a core, percent of used: 12 (100%)

Mapped low quality reads, percent of used: 0 (0%)

Reads clustered in PCR error correction, percent of used: 0 (0%)

Reads pre-clustered due to the similar VJC-lists, percent of used: 0 (0%)

Clonotypes dropped as low quality: 7

Clonotypes eliminated by PCR error correction: 0

Clonotypes pre-clustered due to the similar VJC-lists: 0

Clones dropped in post filtering: 0 (0%)

Reads dropped in post filtering: 0.0 (0%)

Reads filtered by tag prefix: 0 (0%)

IGH chains: 12 (100%)

IGH non-functional: 0 (0%)
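The counts in the align and assemble reports can be reconciled, which shows where the few aligned reads are lost. The snippet below only re-adds numbers copied from the two reports. Reading "Aligned reads processed: 298" as the reads that cover the assembling feature CDR1_TO_FR4 is an assumption based on the matching counts, not something the reports state.

```python
# Numbers copied from the align and assemble reports in this thread.
total_reads = 10_838_897
aligned = 845
covering_cdr1_to_fr4 = 298        # align report: "CDR1_TO_FR4: 298 (35.27%)"
processed = 298                   # assemble report: "Aligned reads processed: 298"
dropped_no_clone_sequence = 547   # "Reads dropped due to the lack of a clone sequence"
dropped_failed_mapping = 277      # "Reads dropped due to failed mapping"
dropped_low_quality_clones = 9    # "Reads dropped with low quality clones"
used_in_clonotypes = 12           # "Reads used in clonotypes"

# Assumption: aligned reads that do not cover CDR1_TO_FR4 are the ones
# dropped for lacking a clone sequence.
assert covering_cdr1_to_fr4 == processed
assert aligned - processed == dropped_no_clone_sequence

# Every processed read ends up in exactly one of these three outcomes.
accounted = dropped_failed_mapping + dropped_low_quality_clones + used_in_clonotypes
assert accounted == processed

print(f"aligned: {100 * aligned / total_reads:.4f}% of all reads")
print(f"used in clonotypes: {100 * used_in_clonotypes / total_reads:.5f}% of all reads")
```

The main loss happens before alignment (absent barcode), so the downstream steps only ever see a few hundred reads.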

mixcr qc <<<<<<<<<<<<<<<<<<<<<<<<<

Running:

mixcr qc --print-to-stdout results/Sample_data.clns results/Sample_data.qc.txt results/Sample_data.qc.json

Successfully aligned reads: 0.01% [ALERT]

Off target (non TCR/IG) reads: 0.0% [OK]

Reads with no V or J hits: 0.0% [OK]

Reads used in clonotypes: 0.0% [ALERT]

Alignments that do not cover CDR1_TO_FR4: 0.01% [OK]

Alignments dropped due to low sequence quality: 0.0% [OK]

Clones dropped in post-filtering: 0.0% [OK]

Alignments dropped in clones post-filtering: 0.0% [OK]

mixcr exportClones <<<<<<<<<<<<<<<<<<<<

Running:

mixcr exportClones results/Sample_data.clns results/Sample_data.clones.tsv

Exporting IGH

Exporting clones: 8.3%

Analysis finished successfully.

Answered by mizraelson

Sep 6, 2024


mizraelson · 2024-09-06T20:39:34Z

Collaborator

Hi,
The barcode you specify (^*CTTCCGATCT(UMI:N{12})N{5}(R1:*)\^(R2:*)) is not present in the reads. Based on the scheme you shared, it's unclear why you expect the sequence "CTTCCGATCT" to be at the beginning of R1. Shouldn’t R1 start with the UMI?

It seems like you should run:

mixcr analyze generic-amplicon-with-umi \
    --species alpaca \
    --rna \
    --tag-pattern "^(UMI:N{12})N{5}(R1:*)\^(R2:*)" \
    --rigid-left-alignment-boundary \
    --floating-right-alignment-boundary C \
    --assemble-clonotypes-by VDJRegion \
     AL121122BovineIgG_S1_L001_R1_001.fastq.gz \
     AL121122BovineIgG_S1_L001_R2_001.fastq.gz \
     results/AL121122BovineIgG
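To make the suggested pattern concrete: in MiXCR tag-pattern syntax, `^` anchors to the start of a read, `\` separates the R1 pattern from the R2 pattern, `N{12}` matches any 12 nucleotides, and `(R1:*)` captures the rest of the read as payload. The sketch below approximates both the suggested and the original R1 patterns with Python regular expressions. It is an illustration only (MiXCR's own matcher also tolerates mismatches and uses base qualities), the function `parse_pair` is hypothetical, and the read pair is invented.

```python
import re
from typing import Optional

# Approximation of the suggested "^(UMI:N{12})N{5}(R1:*)\^(R2:*)":
# the part before "\" applies to R1, the part after it to R2.
SUGGESTED_R1 = re.compile(r"^(?P<UMI>[ACGTN]{12})[ACGTN]{5}(?P<R1>[ACGTN]*)$")
SUGGESTED_R2 = re.compile(r"^(?P<R2>[ACGTN]*)$")

# Approximation of the R1 half of the original "^*CTTCCGATCT(UMI:N{12})N{5}(R1:*)":
# any prefix, then the literal primer tail, then the UMI.
ORIGINAL_R1 = re.compile(
    r"^[ACGTN]*?CTTCCGATCT(?P<UMI>[ACGTN]{12})[ACGTN]{5}(?P<R1>[ACGTN]*)$"
)


def parse_pair(r1_regex: re.Pattern, read1: str, read2: str) -> Optional[dict]:
    """Return the UMI and payloads, or None if either read fails its pattern."""
    m1 = r1_regex.match(read1)
    m2 = SUGGESTED_R2.match(read2)
    if m1 is None or m2 is None:
        return None  # roughly what MiXCR reports as "absent barcode"
    return {"UMI": m1["UMI"], "R1": m1["R1"], "R2": m2["R2"]}


# Invented read pair: 12-nt UMI, 5 G's, then the start of the insert.
read1 = "ACGTACGTACGT" + "GGGGG" + "CAGGTGCAGCTG"
read2 = "TGGGGAGACGGTGACC"

suggested = parse_pair(SUGGESTED_R1, read1, read2)
original = parse_pair(ORIGINAL_R1, read1, read2)
print("suggested:", suggested)
print("original: ", original)
```

A read that begins with the UMI parses with the suggested pattern but not with the original one, because the literal CTTCCGATCT never occurs in it.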


PavithraV0223 · 2024-09-10T08:18:24Z

Author

Hi, thanks for the clarification. CTTCCGATCT comes from the i5 index primer, which is part of our library construct. Indeed, the UMI starts right after R1: {P5 + i5 + R1 + 12-nt UMI + GGGGG + insert + R2 + i7 + P7}. We trimmed off a few nucleotides from the P5 and i5 primers, which is why my pattern starts with CTTCCGATCT: it is the end of R1, and the UMI follows it.
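The layout described above can be written out as a string to show why the primer tail does not appear in read 1: an Illumina read 1 is primed from the R1 sequencing-primer binding site, so the first sequenced base is the one immediately after that site. In the sketch below only CTTCCGATCT comes from the thread; every other segment sequence is a placeholder, and `simulate_read1` is a hypothetical helper.

```python
# Hypothetical construct, 5' to 3', following the layout
# P5 + i5 + R1 + 12 UMI + GGGGG + insert + R2 + i7 + P7.
# Only "CTTCCGATCT" (end of the R1 primer site) is taken from the thread;
# all other sequences are placeholders.
CONSTRUCT = [
    ("P5", "AAAAAAAAAA"),
    ("i5", "CCCCCCCC"),
    ("R1 primer site (tail)", "CTTCCGATCT"),
    ("UMI", "ACGTACGTACGT"),
    ("template-switch G's", "GGGGG"),
    ("insert", "CAGGTGCAGCTGGTGGAGTC"),
    ("R2 primer site", "TTTTTTTTTT"),
    ("i7", "CCCCCC"),
    ("P7", "AAAAAAAAAA"),
]


def simulate_read1(construct: list[tuple[str, str]], read_length: int) -> str:
    """Read 1 starts at the first base after the R1 sequencing-primer site."""
    names = [name for name, _ in construct]
    first = names.index("R1 primer site (tail)") + 1
    downstream = "".join(seq for _, seq in construct[first:])
    return downstream[:read_length]


read1 = simulate_read1(CONSTRUCT, 30)
print(read1)
print("primer tail in read 1:", "CTTCCGATCT" in read1)
```

The simulated read begins with the UMI followed by the G run, which matches the structure `^(UMI:N{12})N{5}(R1:*)` rather than a pattern that requires CTTCCGATCT.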


mizraelson Sep 10, 2024
Collaborator

Did you try the command above? Can you please share a few examples from the raw R1 FASTQ file?

PavithraV0223 · 2024-09-13T12:00:38Z

Author


mizraelson Sep 16, 2024
Collaborator

Hi, I don't see the attached R1 file you mentioned.

PavithraV0223 · 2024-09-25T11:42:20Z

Author


mizraelson Sep 26, 2024
Collaborator

Hi, if you are replying by email (not through GitHub), the files do not get attached.

PavithraV0223 · 2024-09-26T07:28:58Z

Author

Sequences_Fr.txt
Library Construct V2.xlsx


mizraelson · 2024-10-01T23:53:37Z

Collaborator

Based on what you have shared, as expected, CTTCCGATCT is not part of the read, and the read starts with a UMI. You’ve only shared 20 reads, and all of them start with an “N” (see below). If that is the case for all your reads, you can try using the following pattern:
"^N(UMI:N{12})N{5}(R1:*)\^(R2:*)"

However, if all your reads begin with an ambiguous nucleotide, it’s worth investigating why that’s happening. This could be due to a sequencing issue.
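To check whether the leading N affects the whole run or only the first reads in the file, one option is to tabulate the fraction of N calls at each of the first few positions of R1. The function `n_fraction_by_position` below is a hypothetical helper (not part of MiXCR), and the four demo records are invented.

```python
import io
from collections import Counter
from typing import TextIO


def n_fraction_by_position(handle: TextIO, positions: int = 5) -> list[float]:
    """Fraction of reads with an 'N' call at each of the first `positions` bases."""
    n_counts: Counter = Counter()
    total = 0
    for i, line in enumerate(handle):
        if i % 4 != 1:
            continue  # only the sequence line of each 4-line FASTQ record
        seq = line.strip()
        total += 1
        for pos, base in enumerate(seq[:positions]):
            if base == "N":
                n_counts[pos] += 1
    return [n_counts[pos] / total if total else 0.0 for pos in range(positions)]


demo = io.StringIO(
    "@a\nNGAGTATACC\n+\n##########\n"
    "@b\nNACGTACGTA\n+\n##########\n"
    "@c\nTACGNACGTA\n+\n##########\n"
    "@d\nGGCATTACGA\n+\n##########\n"
)
fractions = n_fraction_by_position(demo)
print(fractions)

# For the real file, open it in text mode, e.g.:
# import gzip
# with gzip.open("AL121122BovineIgG_S1_L001_R1_001.fastq.gz", "rt") as fh:
#     print(n_fraction_by_position(fh))
```

A value near 1.0 at position 0 across the whole file would indicate that every read starts with an ambiguous base, not just the first few records.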

>M05233:22:000000000-KN5KK:1:1101:12991:1001 1:N:0:ATCACG+TCTTTCCC
NGAGTATACCCGGGGGGGGTAGAGGAGCCCCAGTCCGGGATTCCCAGCTGCTCCCATTCTCTAACCAGGACTGAGCACAGACGACCCGCCATGGAGCTGGGGCTGAGTTTGGTGGTCCTGGCTGCTCTTTCACAAGGTGTCCAGGCTCAGGTACAACTCGCGGAGTCTGGGGGAGGTTTAGTACAGGCTGGGGGTTCGCTGACACTCTCCTGTACAGCCTCGGGAATCTCACTTCGTGATAATGCCATGGGCTGGTTCCGCCGGGCCCCAGGGAAGGACCGTGACGGGGGCGCATGCCTT

Sometimes this issue only affects a few reads at the top of a FASTQ file. If the rest of the reads are fine, you can continue using the original pattern:

"^(UMI:N{12})N{5}(R1:*)\^(R2:*)"

You definitely should not include CTTCCGATCT in the pattern, as it is part of the sequencing primer annealing region and is not present in the data, as you can see.

Apart from that, the command I shared above is correct.
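The practical effect of the leading N can be seen by applying both patterns to the start of the read shared above. The sketch approximates each MiXCR pattern with a Python regular expression, assuming that the pattern's `N` accepts any base call, including an `N` in the read. It is an illustration, not MiXCR's matcher. Under that assumption both patterns match, but they assign different bases to the UMI.

```python
import re

# First bases of the R1 read shared in this thread (truncated).
read1 = "NGAGTATACCCGGGGGGGGTAGAGGAGCCCCAGTCC"

# Regex approximations; [ACGTN] stands in for MiXCR's "N" wildcard (assumption).
patterns = {
    "^(UMI:N{12})N{5}(R1:*)": re.compile(
        r"^(?P<UMI>[ACGTN]{12})[ACGTN]{5}(?P<R1>[ACGTN]*)$"
    ),
    "^N(UMI:N{12})N{5}(R1:*)": re.compile(
        r"^[ACGTN](?P<UMI>[ACGTN]{12})[ACGTN]{5}(?P<R1>[ACGTN]*)$"
    ),
}

umis = {}
for name, regex in patterns.items():
    match = regex.match(read1)
    umis[name] = match["UMI"] if match else None
    print(f"{name:26} UMI={umis[name]}")
```

With the original pattern the ambiguous first base becomes part of the UMI and the UMI window is shifted one base to the left; the `^N` variant keeps the 12 intended bases together, which only matters if most reads really start with an N.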


PavithraV0223 · 2024-10-09T10:08:09Z

Author

Thanks a lot. I see that my reads look different in each run, which is something I have to investigate. I'm not sure why the N persists in all R1 reads, not just the 20 that I sent. Definitely worth looking at. Thanks for clarifying the pattern; I really appreciate your help.
