generic-amplicon preset #1416
-
Replies: 16 comments
-
Hi, yes, of course. Could you please share the size of the UMI?
-
The UMI is 7 bp long, which is pretty short.
-
It's pretty short. I have read this paper before and contacted the authors about sharing the data so that I could tune the preset (as far as I know, the raw data is not publicly available), but I never heard back from them. If you have raw data generated by this protocol that you can share with us (the same goes for the single-cell data from this publication), that would be of great help.
Because the UMI is quite short, I suggest trying to include a few more letters from the TRBC/TRAC primer, which will at least increase the diversity two-fold.
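To put rough numbers on how short a 7 bp UMI is, here is a minimal Python sketch. It assumes UMIs are drawn uniformly at random from all 4^L possible sequences, which real libraries only approximate, and the molecule count of 100,000 is an arbitrary illustrative figure, not a value from this dataset. The function names are hypothetical.

```python
def umi_space(length: int) -> int:
    """Number of distinct UMIs of the given length over A, C, G, T."""
    return 4 ** length


def expected_shared_fraction(n_molecules: int, umi_length: int) -> float:
    """Expected fraction of molecules whose UMI is also carried by at
    least one other molecule, assuming uniformly random UMIs."""
    if n_molecules < 2:
        return 0.0
    m = umi_space(umi_length)
    return 1.0 - (1.0 - 1.0 / m) ** (n_molecules - 1)


if __name__ == "__main__":
    n = 100_000  # arbitrary illustrative molecule count (assumption)
    for length in (7, 9, 15):
        shared = expected_shared_fraction(n, length)
        print(length, umi_space(length), round(shared, 4))
```

A 7 bp UMI gives only 4^7 = 16,384 possible tags, so once the number of molecules exceeds that, almost every UMI is shared. Appending primer bases that only distinguish TRAC from TRBC multiplies the tag space by at most two, which is why it helps only modestly.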
-
Hello, I do have the bulk and single-cell data generated from this protocol. I will probably have to talk with the data generator first, because the data we have is of clinical origin. I will get back to you once I have talked with the developer. In terms of the bulk data, would a single pair of FASTQ files (R1 and R2) suffice? Also, for the single-cell data, would you need all FASTQ files for the entire batch (384 pairs of FASTQ files in total)?
-
A single pair of files will be enough for our purposes. In the case of single-cell analysis, it's better to see the full picture, as the filtering process includes all cells. If needed, we can provide a secure SFTP server for the data transfer. Nevertheless, I recommend you try the suggested commands so we can see how well they work, as these generic presets should cover most cases.
-
The command above throws an error stating: "Could not invoke public final void com.milaboratory.mixcr.cli.AlignMiXCRMixins.floatingRightAlignmentBoundary(java.lang.String) with /jsimonlab/users/bshim/BMS-Bulk-Reads/BMS-61_S1_L001_R1_001.fastq.gz (java.lang.IllegalArgumentException: Unknown point: /jsimonlab/users/bshim/BMS-Bulk-Reads/BMS-61_S1_L001_R1_001.fastq.gz)". Why might this be?
-
Please try the following:
-
I am also in the process of getting access to sample data for both the single-cell and bulk libraries, which we can share with you. I will let you know as soon as possible.
-
Upon analyzing the bulk dataset, I see that as expected, such a short UMI sequence leads to a high number of distinct clones within a single UMI group, which in some cases makes it hard to assemble consensus. I tweaked the parameters in the example below to recover as many clones as possible.
Nevertheless, I strongly recommend using a longer UMI: in this case it doesn't really mark unique molecules, so de facto it is not a true UMI.
Sincerely,
-
Could you explain a little bit about the tag pattern used here?
-
In your R1 files, the reads have the UMI and Illumina indices at the end. These 8 bp are a small part of the C gene at the very end of the payload sequence (most likely coming from the primer), which I use to trim the artificial barcode sequences.
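The idea of anchoring on a short constant stretch to separate the payload from the trailing barcodes can be illustrated outside MiXCR with a plain regular expression. This is only a sketch of the concept, not the tag pattern used in the command: `ANCHOR` is a made-up placeholder rather than the real C-gene sequence, the 7 bp UMI length is the one reported earlier in this thread, and the assumption that the UMI comes directly after the anchor, followed by the index, is mine.

```python
import re
from typing import NamedTuple, Optional

ANCHOR = "ACGTACGT"  # hypothetical 8 bp placeholder, NOT the real C-gene sequence
UMI_LEN = 7          # UMI length reported earlier in this thread

# payload (ending with the anchor) + UMI + whatever trails (e.g. index bases)
READ_PATTERN = re.compile(
    rf"^(?P<payload>.*{ANCHOR})(?P<umi>[ACGTN]{{{UMI_LEN}}})(?P<tail>.*)$"
)


class SplitRead(NamedTuple):
    payload: str  # biological sequence, up to and including the anchor
    umi: str      # bases immediately after the anchor
    tail: str     # remaining bases, assumed here to be index/adapter


def split_read(read: str) -> Optional[SplitRead]:
    """Split a read at the anchor; return None if the anchor + UMI is absent."""
    match = READ_PATTERN.match(read)
    if match is None:
        return None
    return SplitRead(match["payload"], match["umi"], match["tail"])
```

For example, `split_read("TTTTGGCC" + ANCHOR + "GATTACA" + "CCGGTT")` yields the payload ending in the anchor, the UMI `GATTACA`, and the tail `CCGGTT`. Reads that lack the anchor return `None`, which mirrors reads that fail to match a pattern being dropped.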
-
Hi @mizraelson, I recently read a paper entitled "TCR sequencing and cloning methods for repertoire analysis and isolation of tumor-reactive TCRs". In this paper, they introduced a TCR sequencing method for RNA extracted from T cells under the name
The qc output for
The qc output for
Then I compared the output files of
For the results from
For the results from
I truly value your expertise and insight in this matter and I believe your perspective could be of great help. Best,
-
Hi, you are right: a 9 bp UMI is quite short. As such, we're seeing about half the reads being dropped due to multiple CDR3s being assigned to the same UMI. Considering the UMIs are attached to multiple V gene primers, a good way around this might be to include a few nucleotides right after the UMI, potentially increasing diversity. I'd recommend giving this a go:
As for the CDR3 discrepancy: in the paper they include an extra amino acid from FR4 (sourced from the J gene) within the CDR3. The reasoning behind this addition isn't entirely clear. While some researchers opt to exclude the initial and final amino acids from the CDR3 definition (e.g., IMGT), adding an extra one is a bit unusual. However, since this particular amino acid stems from the J gene, which both methods identify correctly, you can safely consider the clones equivalent. For a quick comparison:
Check out this link, and you'll see that the terminal 'G' belongs to the FR4.
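When comparing clone tables from the two sources, the extra FR4 residue can be stripped before matching. The sketch below assumes exactly one trailing amino acid needs to be removed from the paper-style CDR3, as described above; the function names and the example sequences are hypothetical and not taken from either dataset.

```python
def strip_fr4_residues(cdr3_with_fr4: str, extra: int = 1) -> str:
    """Drop the trailing FR4 residue(s) appended to a paper-style CDR3."""
    if extra <= 0:
        return cdr3_with_fr4
    return cdr3_with_fr4[:-extra]


def same_clone(mixcr_cdr3: str, paper_cdr3: str, extra: int = 1) -> bool:
    """True if the paper-style CDR3 equals the MiXCR CDR3 once the extra
    FR4 residue(s) are removed."""
    return mixcr_cdr3 == strip_fr4_residues(paper_cdr3, extra)


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    print(same_clone("CASSLGQGNTEAFF", "CASSLGQGNTEAFFG"))
```

Matching on V gene, J gene, and the trimmed CDR3 together is a stricter check than the CDR3 string alone, since the trimmed residue carries no information beyond the J gene assignment.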
-
Thank you for clarifying the "G" amino acid shown in the results of the manuscript. I have run the pipeline both with the first 15 bases set as the UMI and with the first 25 bases set as the UMI. The results are slightly different. The results with the first 15 bases set as the UMI are:
The results with the first 25 bp set as the UMI are:
When the first 10 bases are ignored using the aforementioned
Should I try using more bases as the UMI, or should I just ignore the first 10 bases?
-
Actually, the 15 bp UMI looks much better; the number of unassigned alignments has dropped from 53% to 7.8%! Additionally, 85% of the reads are used in clonotype assembly. I would suggest going with the 15 bp UMI. While it's not perfect, it still allows you to leverage the UMI to correct the data effectively.
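Why a longer tag reduces the number of reads lost to conflicting CDR3s can be seen on toy data. The sketch below is not how MiXCR assigns or filters UMI groups; it simply treats the first `tag_length` bases of each read as the tag and counts reads whose tag is shared by more than one distinct CDR3. The reads, CDR3 strings, and function name are all made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def ambiguous_read_fraction(reads: Iterable[Tuple[str, str]], tag_length: int) -> float:
    """Fraction of reads whose tag (first tag_length bases) is associated
    with more than one distinct CDR3. Each read is (sequence, cdr3)."""
    cdr3s_by_tag = defaultdict(set)
    tags = []
    for sequence, cdr3 in reads:
        tag = sequence[:tag_length]
        cdr3s_by_tag[tag].add(cdr3)
        tags.append(tag)
    if not tags:
        return 0.0
    ambiguous = sum(1 for tag in tags if len(cdr3s_by_tag[tag]) > 1)
    return ambiguous / len(tags)


# Toy reads: a 9-base UMI followed by 6 bases from different V primers.
toy_reads = [
    ("AAAAAAAAA" + "CCCCCC", "CASSA"),
    ("AAAAAAAAA" + "GGGGGG", "CASSB"),  # same UMI, different primer and CDR3
    ("CCCCCCCCC" + "TTTTTT", "CASSC"),
]

if __name__ == "__main__":
    for k in (9, 15):
        print(k, ambiguous_read_fraction(toy_reads, k))
```

With a 9-base tag the first two toy reads collide; with 15 bases the primer-derived bases separate them. Extending the tag much further starts to pull in sequence that varies for biological reasons, so a longer tag is not automatically better.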
-
Thank you for your clarification!