bioinformatics-core-shared-training
diff --git a/Markdowns/03_Quantification_with_Salmon_introduction.Rmd b/Markdowns/03_Quantification_with_Salmon_introduction.Rmd
Lines changed: 64 additions & 3 deletions
diff --git a/Markdowns/03_Quantification_with_Salmon_introduction.html b/Markdowns/03_Quantification_with_Salmon_introduction.html
Lines changed: 76 additions & 1 deletion
diff --git a/Markdowns/03_Quantification_with_Salmon_introduction.pdf b/Markdowns/03_Quantification_with_Salmon_introduction.pdf
-627 KB
diff --git a/images/aln_quant_overview.png b/images/aln_quant_overview.png
55.4 KB
diff --git a/images/quasi-mapping_overview.png b/images/quasi-mapping_overview.png
61.5 KB
@@ -2,12 +2,12 @@
 title: "Alignment and Quantification of Gene Expression with Salmon"
 date: "March 2023"
 output:
-  beamer_presentation: default
   ioslides_presentation:
     css: css/stylesheet.css
     logo: images/CRUK_Cambridge_Institute.png
     smaller: yes
     widescreen: yes
+  beamer_presentation: default
 bibliography: ref.bib
 ---
 
@@ -17,6 +17,9 @@ bibliography: ref.bib
 
 <img src="images/workflow_3Day.svg" class="centerimg" style="width: 80%; margin-top: 60px;">
 
+
+
+
 ## Traditional Alignment
 
 AIM: Given a reference sequence and a set of short reads, align each read to
@@ -45,6 +48,14 @@ Aligners: STAR, HISAT2
 
 <img src="images/quasi_mapping_2.svg" class="centerimg" style="width: 90%; margin-top: 40px;">
 
+
+## Alignment and Quantification overview {#less_space_after_title}
+
+<div style="line-height: 10%;"><br></div>
+
+<img src="images/aln_quant_overview.png" class="centerimg" style="width: 48%; margin-top: 60px;">
+
+
 ## Alignment
 * Traditional alignment performs base-by-base alignment
 * Traditional alignment is (relatively) slow and computationally intensive
@@ -111,15 +122,65 @@ Salmon also takes account of biases:
 
 * Because Salmon searches the transcriptome, not the genome, it is not the right tool for finding new genes or isoforms
 
-## Salmon workflow
 
-<img src="images/Salmon_workflow_2.png" class="centerimg" style="width: 55%;">
+
+## Salmon workflow
+* Salmon essential steps
+  1. Salmon indexing
+  2. Quasi-mapping and abundance quantification
+<img src="images/Salmon_workflow_2.png" class="centerimg" style="width: 40%;">
 
 <div style="text-align: right">
   Patro *et al.* (2017) Nature Methods doi:10.1038/nmeth.4197
 </div>
 
+
+## Salmon: Indexing
+
+* Two essential steps
+  1. Create transcriptome index
+    * This makes the downstream quasi-mapping and quantification steps faster and more efficient
+    * Once you have created an index, you can use it again and again
+    * Salmon indexing has two components
+      * A suffix array (SA) built from the reference transcriptome
+      * A hash table that maps each k-mer in the reference transcriptome to its location (interval) in the SA
+  2. Quasi-mapping and quantification    
+
+
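As a rough illustration of the two index components listed above, here is a minimal Python sketch. It is a toy under stated assumptions, not Salmon's implementation: the function name `build_index`, the transcript sequences, and `k = 3` are all made up for the example, and the suffix array is built by naive sorting.

```python
# Toy illustration only: real Salmon uses far more efficient data structures.

def build_index(transcripts, k=3):
    """Build a '$'-separated text T, its suffix array, and a k-mer hash table."""
    text = "".join(seq + "$" for seq in transcripts.values())
    # Suffix array: start positions of all suffixes of T in lexicographic order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Hash table: k-mer -> half-open interval [lo, hi) of suffix-array rows
    # whose suffixes start with that k-mer.
    kmer_intervals = {}
    for row, pos in enumerate(sa):
        kmer = text[pos:pos + k]
        if len(kmer) < k or "$" in kmer:
            continue  # skip k-mers that run off the end or span a separator
        lo, _ = kmer_intervals.get(kmer, (row, row))
        kmer_intervals[kmer] = (lo, row + 1)
    return text, sa, kmer_intervals


transcripts = {"t1": "ACGTAC", "t2": "GTACCA"}  # hypothetical sequences
text, sa, kmer_intervals = build_index(transcripts, k=3)
lo, hi = kmer_intervals["TAC"]
print(sorted(sa[lo:hi]))  # positions in T where "TAC" occurs
```

Because the suffix array is sorted, all suffixes that start with the same k-mer sit in one contiguous block, so a single `(lo, hi)` pair per k-mer is enough to find every occurrence.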
+## Salmon: Quasi-mapping
+<div class="columns-2">
+<img src="images/quasi-mapping_overview.png" class="centerimg" style="width: 100%; height: 100%">
+
+  * The transcriptome (consisting of transcripts $t_1, \ldots, t_6$) is converted into a \$-separated string "T"
+  * From "T", a suffix array, SA[T], and a hash table, h, are constructed (in the indexing step)
+  * The mapping operation begins with a k-mer (here, k = 3)
+  * The read is scanned from left to right until a k-mer appears in the hash table
+  * The k-mer is looked up in the hash table and the SA interval of all suffixes that start with it is retrieved
+  * The maximal matching prefix (MMP) is determined by finding the longest stretch of the read, starting at that k-mer, that exactly matches a reference suffix
+  * This process is repeated until the end of the read
+  * The final mapping is generated by determining the transcripts that appear in all MMPs for the read
+
+</div>
+
+\
+
+Srivastava *et al.* (2016) Bioinformatics 32(12)
+
+
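The quasi-mapping steps above can be sketched in a few lines of Python. This is a simplified illustration, not the published algorithm: `build_index`, `match_length` and `quasi_map` are hypothetical names, the sequences are invented, the hash table stores the text positions found in each k-mer's SA interval rather than the interval itself, and the scan simply resumes right after each MMP.

```python
from bisect import bisect_right


def build_index(transcripts, k):
    """Concatenate transcripts into a '$'-separated text and hash its k-mers."""
    names, starts, text = [], [], ""
    for name, seq in transcripts.items():
        names.append(name)
        starts.append(len(text))  # where this transcript begins in the text
        text += seq + "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    table = {}
    for pos in sa:  # visiting in SA order keeps each k-mer's positions together
        kmer = text[pos:pos + k]
        if len(kmer) == k and "$" not in kmer:
            table.setdefault(kmer, []).append(pos)
    return text, names, starts, table


def match_length(text, pos, read, start):
    """Length of the exact match between read[start:] and the suffix at pos."""
    n = 0
    while (start + n < len(read) and text[pos + n] != "$"
           and text[pos + n] == read[start + n]):
        n += 1
    return n


def quasi_map(read, index, k=3):
    """Return the transcripts that appear in every MMP of the read."""
    text, names, starts, table = index
    hits = []  # one set of transcripts per MMP
    i = 0
    while i + k <= len(read):
        positions = table.get(read[i:i + k])
        if positions is None:
            i += 1  # keep scanning until a k-mer is in the hash table
            continue
        lengths = [match_length(text, p, read, i) for p in positions]
        mmp = max(lengths)
        hits.append({names[bisect_right(starts, p) - 1]
                     for p, n in zip(positions, lengths) if n == mmp})
        i += mmp  # simplification: resume right after the MMP
    return set.intersection(*hits) if hits else set()


transcripts = {"t1": "ACGTACGGA", "t2": "TTACGTACC", "t3": "GGCATTGCA"}
index = build_index(transcripts, k=3)
print(quasi_map("CGTACG", index))  # read that matches t1 over its full length
print(quasi_map("GTAC", index))    # read shared by t1 and t2
```

Note that no base-by-base alignment is computed: the result is only the set of compatible transcripts, which is what the quantification step needs.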
+## Abundance estimation
+
+* With the quasi-mapping method, the best mapping is determined for each read
+* After modeling sample-specific parameters and biases, Salmon generates transcript abundance estimates
+* A read that maps equally well to more than one transcript has its count divided among them (isoform information is not lost)
+* Abundances are estimated with statistical inference methods, including Expectation Maximization (EM), and the underlying model corrects for sample-specific biases:
+  * GC bias
+  * Positional bias
+  * Fragment length bias
+  * Sequence-based bias
+  
+  
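To make the idea of dividing multi-mapping reads concrete, here is a minimal EM sketch in Python. It is an illustration under strong assumptions and not Salmon's actual model: `em_abundance` is a hypothetical function, transcript lengths and all of the bias terms listed above are ignored, and the read-to-transcript compatibility sets are invented. Each iteration splits every read among its compatible transcripts in proportion to the current abundance estimates, then re-estimates the abundances from those fractional counts.

```python
def em_abundance(read_compat, transcripts, n_iter=100):
    """Estimate the fraction of reads from each transcript with a basic EM."""
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the fractional counts.
        n = sum(counts.values())
        theta = {t: c / n for t, c in counts.items()}
    return theta


# Hypothetical data: three reads unique to t1, one unique to t2,
# and two reads compatible with both transcripts.
reads = [{"t1"}, {"t1"}, {"t1"}, {"t2"}, {"t1", "t2"}, {"t1", "t2"}]
theta = em_abundance(reads, ["t1", "t2"])
print({t: round(v, 3) for t, v in theta.items()})
```

In this toy case the estimates settle at about 0.75 for t1 and 0.25 for t2: the two ambiguous reads end up shared 3:1, following the evidence from the uniquely mapping reads rather than being split evenly.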
 ## Practical
 
+
 1. Create an index of the transcriptome with Salmon
 2. Quantify transcript expression using Salmon