diff --git a/docs/no_toc/01-intro.md b/docs/no_toc/01-intro.md
index 8a9bce7b..6ed130fd 100644
--- a/docs/no_toc/01-intro.md
+++ b/docs/no_toc/01-intro.md
@@ -3,7 +3,7 @@
 
 # Introduction
 
-<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_0.png" title="Title image: Choosing Genomics Tools Written by: Candace Savonen. Part of the ITN (ITCR training Network) and created through the Johns Hopkins Data Science Lab" alt="Title image: Choosing Genomics Tools Written by: Candace Savonen. Part of the ITN (ITCR training Network) and created through the Johns Hopkins Data Science Lab" width="100%" />
+<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_0.png" alt="Title image: Choosing Genomics Tools Written by: Candace Savonen. Part of the ITN (ITCR training Network) and created through the Johns Hopkins Data Science Lab" width="100%" />
 
 This is a *living* course, meaning it is constantly changing and being updated. The goal for this course is to be a "Wikipedia" of omic data.
 If you'd like to contribute, [you can file a pull request on GitHub](https://github.com/fhdsl/Choosing_Genomics_Tools) if you are comfortable with that sort of thing or email `csavonen@fredhutch.org` to ask how to get started.
@@ -18,11 +18,11 @@ _This course is written for individuals who:_
 - Want a basic overview of genomic data types.
 - Want to find resources for processing and interpreting genomics data.
 
-<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g116525eff64_0_96.png" title="For individuals who: Have genomic data and don’t know what to do with it. Want a basic overview of their genomic data type. Want to find resources for processing and interpreting genomics data" alt="For individuals who: Have genomic data and don’t know what to do with it. Want a basic overview of their genomic data type. Want to find resources for processing and interpreting genomics data" width="100%" />
+<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g116525eff64_0_96.png" alt="For individuals who: Have genomic data and don’t know what to do with it. Want a basic overview of their genomic data type. Want to find resources for processing and interpreting genomics data" width="100%" />
 
 ## Topics covered:
 
-<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g11db7c97851_0_143.png" title=" " alt=" " width="100%" />
+<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g11db7c97851_0_143.png" alt=" " width="100%" />
 
 ## Motivation
 
@@ -33,17 +33,17 @@ Often students and researchers need to utilize genomic data to reach the next st
 
 Often researchers receive their genomic data processed from another lab or institution, and although they are excited to gain insights from it to inform the next steps of their research, they may not have a practical understanding of how the data they have received came to be or what needs to be done with it.
 
-<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1221ea485b7_0_0.png" title="This researcher is very excited because they’ve received their genomic data and are ready to gain insights from it to inform the next steps of their research. An email sent to them says ‘your data are ready’" alt="This researcher is very excited because they’ve received their genomic data and are ready to gain insights from it to inform the next steps of their research. An email sent to them says ‘your data are ready’" width="100%" />
+<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1221ea485b7_0_0.png" alt="This researcher is very excited because they’ve received their genomic data and are ready to gain insights from it to inform the next steps of their research. An email sent to them says ‘your data are ready’" width="100%" />
 
 As an example, data file formats may not have been covered in their training, and the data they received seems unintelligible and not as straightforward as they hoped.
 
-<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1221ea485b7_0_13.png" title="The researcher may attempt to open their newly received genomic data and be completely perplexed by the file formats or what these data even represent. The researcher says ‘What is this and what do I do with it’" alt="The researcher may attempt to open their newly received genomic data and be completely perplexed by the file formats or what these data even represent. The researcher says ‘What is this and what do I do with it’" width="100%" />
+<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1221ea485b7_0_13.png" alt="The researcher may attempt to open their newly received genomic data and be completely perplexed by the file formats or what these data even represent. The researcher says ‘What is this and what do I do with it’" width="100%" />
 
 This course attempts to give this researcher the basic bearings and resources regarding their data, in hopes that they will be equipped and informed about how to obtain the insights for their research that they originally aimed to find.
 
 ## Curriculum  
 
-<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_10.png" title="Overall Course Learning Objectives. This course will demonstrate how too: Understand the overall workflow associated with processing their genomic data  Be aware of caveats based on their specific type of data. Find tutorials to help them process their genomic data. Choose tools for processing their genomic data. Choose tools for interpreting their genomic data " alt="Overall Course Learning Objectives. This course will demonstrate how too: Understand the overall workflow associated with processing their genomic data  Be aware of caveats based on their specific type of data. Find tutorials to help them process their genomic data. Choose tools for processing their genomic data. Choose tools for interpreting their genomic data " width="100%" />
+<img src="resources/images/01-intro_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_10.png" alt="Overall Course Learning Objectives. This course will demonstrate how to: Understand the overall workflow associated with processing their genomic data. Be aware of caveats based on their specific type of data. Find tutorials to help them process their genomic data. Choose tools for processing their genomic data. Choose tools for interpreting their genomic data." width="100%" />
 
 **Goal of this course:**  
 Equip learners with tutorials and resources so they can understand and interpret their genomic data in a way that helps them meet their goals and handle the data properly.
diff --git a/docs/no_toc/02-genomics_overview.md b/docs/no_toc/02-genomics_overview.md
index 095061f0..f13018c3 100644
--- a/docs/no_toc/02-genomics_overview.md
+++ b/docs/no_toc/02-genomics_overview.md
@@ -4,7 +4,7 @@
 
 ## Learning Objectives
 
-<img src="resources/images/02-genomics_overview_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_16.png" title="Learning objectives This chapter will demonstrate how to: Understand what will be covered in this course. Find information about your particular file format" alt="Learning objectives This chapter will demonstrate how to: Understand what will be covered in this course. Find information about your particular file format" width="100%" />
+<img src="resources/images/02-genomics_overview_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_16.png" alt="Learning objectives This chapter will demonstrate how to: Understand what will be covered in this course. Find information about your particular file format" width="100%" />
 
 In this chapter, we give a very general, high-level overview of sequencing and microarray workflows as a first orientation. As we dive into specific data types and experiments, we will get into more specifics.
 Here we will cover the most common file formats. If you are dealing with a file format that you don't see listed here, it may be specific to your data type, and we will discuss it in that data type's respective chapter. We still suggest you go through this chapter to get a basic understanding of the commonalities of all genomic data types and workflows.
@@ -13,7 +13,7 @@ Here we will cover the most common file formats. If you have a file format you a
 
 In the most general sense, all genomics data is raw when originally collected; it needs to undergo processing to be normalized and ready to use. The normalized data is then generally summarized so that it is ready to be consumed further. Lastly, this summarized data is what can be used to make inferences and create plots and results tables.
 
-<img src="resources/images/02-genomics_overview_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_20.png" title="In the most general sense, all genomics data when originally collected is raw, it needs to undergo processing to be normalized and ready to use. Then normalized data is generally summarized in a way that is ready for it to be further consumed. Lastly this summarized data is what can be used to make inferences and create plots and results tables. " alt="In the most general sense, all genomics data when originally collected is raw, it needs to undergo processing to be normalized and ready to use. Then normalized data is generally summarized in a way that is ready for it to be further consumed. Lastly this summarized data is what can be used to make inferences and create plots and results tables. " width="100%" />
+<img src="resources/images/02-genomics_overview_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_20.png" alt="In the most general sense, all genomics data when originally collected is raw, it needs to undergo processing to be normalized and ready to use. Then normalized data is generally summarized in a way that is ready for it to be further consumed. Lastly this summarized data is what can be used to make inferences and create plots and results tables. " width="100%" />
 
 ### Basic file formats
 
diff --git a/docs/no_toc/03-whats-metadata.md b/docs/no_toc/03-whats-metadata.md
index 33258c57..ecf40a79 100644
--- a/docs/no_toc/03-whats-metadata.md
+++ b/docs/no_toc/03-whats-metadata.md
@@ -5,7 +5,7 @@
 
 ## Learning Objectives
 
-<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_70.png" title="Learning objectives This chapter will demonstrate how to: Understand what metadata are and why they are so critical. Learn the basics of creating crystal clear, readable metadata" alt="Learning objectives This chapter will demonstrate how to: Understand what metadata are and why they are so critical. Learn the basics of creating crystal clear, readable metadata" width="100%" />
+<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_70.png" alt="Learning objectives This chapter will demonstrate how to: Understand what metadata are and why they are so critical. Learn the basics of creating crystal clear, readable metadata" width="100%" />
 
 ## What are metadata?
 
@@ -15,11 +15,11 @@ Metadata are critically important descriptive information about your data.
 
 Metadata describe how your data came to be, what organism or patient the data are from and include any and every relevant piece of information about the samples in your data set.
 
-<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_12.png" title="Question: What are metadata? Answer: Anything and everything that should be known about your samples! Samples labeled A-H are in test tubes. A corresponding spreadsheet has metadata such as mouse id, processing date, treatment and etc. The researcher says ‘I know everything I need to know about these samples from their metadata!’" alt="Question: What are metadata? Answer: Anything and everything that should be known about your samples! Samples labeled A-H are in test tubes. A corresponding spreadsheet has metadata such as mouse id, processing date, treatment and etc. The researcher says ‘I know everything I need to know about these samples from their metadata!’" width="100%" />
+<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_12.png" alt="Question: What are metadata? Answer: Anything and everything that should be known about your samples! Samples labeled A-H are in test tubes. A corresponding spreadsheet has metadata such as mouse id, processing date, treatment and etc. The researcher says ‘I know everything I need to know about these samples from their metadata!’" width="100%" />
 
 Metadata include, but aren't limited to, the following example categories:
 
-<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_45.png" title="Examples of metadata categories: Patient/organism of origin, Patient/organism information including: Demographics, Disease state, Treatment state, Time point (if applicable). Metadata also includes: Processing information like: Batch information and Processing details (for example: Isolation methods: Poly-A vs Ribo-minus) Metadata is Anything that should be known about the samples and their handling!" alt="Examples of metadata categories: Patient/organism of origin, Patient/organism information including: Demographics, Disease state, Treatment state, Time point (if applicable). Metadata also includes: Processing information like: Batch information and Processing details (for example: Isolation methods: Poly-A vs Ribo-minus) Metadata is Anything that should be known about the samples and their handling!" width="100%" />
+<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_45.png" alt="Examples of metadata categories: Patient/organism of origin, Patient/organism information including: Demographics, Disease state, Treatment state, Time point (if applicable). Metadata also includes: Processing information like: Batch information and Processing details (for example: Isolation methods: Poly-A vs Ribo-minus) Metadata is Anything that should be known about the samples and their handling!" width="100%" />
 
 <div class = "warning">
 At this time it's important to note that if you work with human data or samples, your metadata will likely contain personal identifiable information (PII) and protected health information (PHI). It's critical that you protect this information! For more details on this, we encourage you to see our [course about data management](https://jhudatascience.org/Ethical_Data_Handling_for_Cancer_Research/data-privacy.html).
@@ -74,13 +74,13 @@ Toward these two goals, [this excellent article](https://www.tandfonline.com/doi
 
 <div class = "warning">
 Note that it is very dangerous to open gene data with Excel. According to @Ziemann2016, approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions. This happens because Excel automatically converts some gene symbols (for example, SEPT2 or MARCH1) to dates. We strongly caution against opening (and saving afterward) gene data in Excel.
-<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13a7f78e577_0_0.png" title="‘Approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions’ Ziemann, Eren, El-Osta, 2016. On the left, a meme that shows Excel asking ‘is this a date?’ in response to seeing ‘any data at all’. " alt="‘Approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions’ Ziemann, Eren, El-Osta, 2016. On the left, a meme that shows Excel asking ‘is this a date?’ in response to seeing ‘any data at all’. " width="100%" />
+<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13a7f78e577_0_0.png" alt="‘Approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions’ Ziemann, Eren, El-Osta, 2016. On the left, a meme that shows Excel asking ‘is this a date?’ in response to seeing ‘any data at all’. " width="100%" />
 </div>
 
 ### To recap:
 
-<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_52.png" title="Rules for creating metadata (from Broman &amp; Woo, 2017) Be Consistent. Choose good names for things. Write Dates as YYYY-MM-DD.No Empty Cells. Put Just One Thing in a Cell. Make it a Rectangle" alt="Rules for creating metadata (from Broman &amp; Woo, 2017) Be Consistent. Choose good names for things. Write Dates as YYYY-MM-DD.No Empty Cells. Put Just One Thing in a Cell. Make it a Rectangle" width="100%" />
+<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_52.png" alt="Rules for creating metadata (from Broman &amp; Woo, 2017) Be Consistent. Choose good names for things. Write Dates as YYYY-MM-DD. No Empty Cells. Put Just One Thing in a Cell. Make it a Rectangle" width="100%" />
 
-<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_1.png" title="Rules for creating metadata continued  (from Broman &amp; Woo, 2017). Create a Data Dictionary. No Calculations in the Raw Data Files. Do Not Use Font Color or Highlighting as Data. Make Backups. Use Data Validation to Avoid Errors" alt="Rules for creating metadata continued  (from Broman &amp; Woo, 2017). Create a Data Dictionary. No Calculations in the Raw Data Files. Do Not Use Font Color or Highlighting as Data. Make Backups. Use Data Validation to Avoid Errors" width="100%" />
+<img src="resources/images/03-whats-metadata_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_1.png" alt="Rules for creating metadata continued  (from Broman &amp; Woo, 2017). Create a Data Dictionary. No Calculations in the Raw Data Files. Do Not Use Font Color or Highlighting as Data. Make Backups. Use Data Validation to Avoid Errors" width="100%" />
 
 If you are not the person who has the information needed to create metadata, or you believe that another individual already has this information, make sure you get ahold of the metadata that correspond to your data. You will need them to do any sort of meaningful analysis!
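
Some of the rules recapped above can also be checked programmatically. The following is a minimal Python sketch, assuming the metadata are stored as comma-separated text. The column names (`sample_id`, `processing_date`, `treatment`) and the helper `check_metadata` are hypothetical and only for illustration. It checks three of the rules: the table is a rectangle, there are no empty cells, and dates are written as YYYY-MM-DD.

```python
import csv
import io
import re
from datetime import datetime

# Strict pattern: four-digit year, two-digit month, two-digit day.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_iso_date(value):
    """Return True if `value` is a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.match(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_metadata(csv_text, date_columns=()):
    """Return a list of human-readable problems found in a metadata table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    problems = []
    for line_number, record in enumerate(records, start=2):
        # "Make it a rectangle": every row has as many cells as the header.
        if len(record) != len(header):
            problems.append(
                f"line {line_number}: expected {len(header)} cells, found {len(record)}"
            )
            continue
        for column, value in zip(header, record):
            if value.strip() == "":
                # "No empty cells": a blank cell is ambiguous.
                problems.append(f"line {line_number}: empty cell in '{column}'")
            elif column in date_columns and not is_iso_date(value):
                # "Write dates as YYYY-MM-DD".
                problems.append(
                    f"line {line_number}: '{value}' in '{column}' is not YYYY-MM-DD"
                )
    return problems


# Made-up example table with one badly formatted date and one empty cell.
example = """sample_id,processing_date,treatment
A,2022-03-01,control
B,03/02/2022,drug
C,2022-03-03,
"""

for problem in check_metadata(example, date_columns=("processing_date",)):
    print(problem)
```

A check like this only catches formatting problems. It cannot tell you whether the recorded values are correct, so it complements, rather than replaces, a data dictionary and careful record keeping.
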
diff --git a/docs/no_toc/04-considerations-for-choosing.md b/docs/no_toc/04-considerations-for-choosing.md
index dcc31e08..bc9617df 100644
--- a/docs/no_toc/04-considerations-for-choosing.md
+++ b/docs/no_toc/04-considerations-for-choosing.md
@@ -5,7 +5,7 @@
 
 ## Learning Objectives
 
-<img src="resources/images/04-considerations-for-choosing_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_0.png" title="This chapter will demonstrate how to: Recognize the key aspects of a tool that you should consider when constructing an analysis. Form questions to ask others for advice regarding your data " alt="This chapter will demonstrate how to: Recognize the key aspects of a tool that you should consider when constructing an analysis. Form questions to ask others for advice regarding your data " width="100%" />
+<img src="resources/images/04-considerations-for-choosing_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_0.png" alt="This chapter will demonstrate how to: Recognize the key aspects of a tool that you should consider when constructing an analysis. Form questions to ask others for advice regarding your data " width="100%" />
 
 ## Overview
 
@@ -13,7 +13,7 @@ In this course, we will introduce you to the fundamentals of various data types
 
 We will discuss the following considerations, which you should gather information about and weigh when comparing tools for your analysis:
 
-<img src="resources/images/04-considerations-for-choosing_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_5.png" title="Considerations for choosing tools: Is it appropriate for your data type? Is in an interface or programming language you feel comfortable with? How much computing power do you have? Are there benchmarking papers that compare the tool options? Is the tool well documented and usable? Is the tool well-maintained? Is the tool generally accepted by the field? " alt="Considerations for choosing tools: Is it appropriate for your data type? Is in an interface or programming language you feel comfortable with? How much computing power do you have? Are there benchmarking papers that compare the tool options? Is the tool well documented and usable? Is the tool well-maintained? Is the tool generally accepted by the field? " width="100%" />
+<img src="resources/images/04-considerations-for-choosing_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_5.png" alt="Considerations for choosing tools: Is it appropriate for your data type? Is it in an interface or programming language you feel comfortable with? How much computing power do you have? Are there benchmarking papers that compare the tool options? Is the tool well documented and usable? Is the tool well-maintained? Is the tool generally accepted by the field? " width="100%" />
 
 ### Is this tool appropriate for your data type?
 
diff --git a/docs/no_toc/05-general-data-analysis-tools.md b/docs/no_toc/05-general-data-analysis-tools.md
index fb100034..1acba1c0 100644
--- a/docs/no_toc/05-general-data-analysis-tools.md
+++ b/docs/no_toc/05-general-data-analysis-tools.md
@@ -5,7 +5,7 @@
 
 ## Learning Objectives
 
-<img src="resources/images/05-general-data-analysis-tools_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g20fbd76736e_0_0.png" title="This chapter will demonstrate how to: Understand the difference between command line and GUI based applications. Understand what R and Python languages are. Find many links to resources where you can learn R or Python" alt="This chapter will demonstrate how to: Understand the difference between command line and GUI based applications. Understand what R and Python languages are. Find many links to resources where you can learn R or Python" width="100%" />
+<img src="resources/images/05-general-data-analysis-tools_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g20fbd76736e_0_0.png" alt="This chapter will demonstrate how to: Understand the difference between command line and GUI based applications. Understand what R and Python languages are. Find many links to resources where you can learn R or Python" width="100%" />
 
 ## Command Line vs GUI
 
diff --git a/docs/no_toc/06-sequencing-data.md b/docs/no_toc/06-sequencing-data.md
index 5f5f0023..7b068a75 100644
--- a/docs/no_toc/06-sequencing-data.md
+++ b/docs/no_toc/06-sequencing-data.md
@@ -9,7 +9,7 @@ This chapter is in a beta stage. If you wish to contribute, please [go to this f
 
 ## Learning Objectives
 
-<img src="resources/images/06-sequencing-data_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_7.png" title="This chapter will demonstrate how to: Understand the very general basics of sequencing data collection and processing workflow. Understand the limitations and strengths of sequencing data in general." alt="This chapter will demonstrate how to: Understand the very general basics of sequencing data collection and processing workflow. Understand the limitations and strengths of sequencing data in general." width="100%" />
+<img src="resources/images/06-sequencing-data_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_7.png" alt="This chapter will demonstrate how to: Understand the very general basics of sequencing data collection and processing workflow. Understand the limitations and strengths of sequencing data in general." width="100%" />
 
 In this section, we are going to discuss generalities that apply to all sequencing data. This is meant to be a "primer" that the data-type-specific chapters will build on to give you more specific and practical steps and advice for your data type.
 
@@ -31,7 +31,7 @@ At the end of this process, base sequences are called for the samples (with vary
 
 ### Inherent biases
 
-<img src="resources/images/06-sequencing-data_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_5.png" title="Sequence related biases GC bias - guanine and cytosine bond melts at higher temp - if a sequence has a lot of G’s and C’s Sequence complexity - certain sequences more likely to have primers bound to them (and more likely to be sequenced). Length bias - longer targets are more likely to be amplified or sequenced. These biases are worsened by PCR amplification!" alt="Sequence related biases GC bias - guanine and cytosine bond melts at higher temp - if a sequence has a lot of G’s and C’s Sequence complexity - certain sequences more likely to have primers bound to them (and more likely to be sequenced). Length bias - longer targets are more likely to be amplified or sequenced. These biases are worsened by PCR amplification!" width="100%" />
+<img src="resources/images/06-sequencing-data_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_5.png" alt="Sequence related biases GC bias - guanine and cytosine bond melts at higher temp - if a sequence has a lot of G’s and C’s Sequence complexity - certain sequences more likely to have primers bound to them (and more likely to be sequenced). Length bias - longer targets are more likely to be amplified or sequenced. These biases are worsened by PCR amplification!" width="100%" />
 
 Sequences are not all sequenced or amplified at the same rate. In a perfect world, we could take a simple snapshot of the genome we are interested in and know exactly what and how many sequences were in a sample. But in reality, sequencing methods and the resulting data always have some biases that we have to be aware of and, where possible, mitigate with appropriate methods.
 
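
To make GC bias more concrete, the GC content of a sequence can be computed directly. This is a minimal Python sketch. The function name `gc_fraction` and the example sequences are hypothetical and only illustrate that some sequences are far more GC-rich than others.

```python
def gc_fraction(sequence):
    """Return the fraction of bases in `sequence` that are G or C.

    Every character counts toward the length, so ambiguous bases
    such as N lower the reported fraction.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


# Two made-up sequences of equal length with very different GC content.
at_rich = "ATATTAAATTATAT"
gc_rich = "GCGCGGCCGCGGCA"

print(f"AT-rich: {gc_fraction(at_rich):.2f}")
print(f"GC-rich: {gc_fraction(gc_rich):.2f}")
```

A fraction near 0 or 1 marks a sequence with extreme GC content, which is the kind of sequence the GC bias described above concerns.
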
diff --git a/docs/no_toc/07-microarray-data.md b/docs/no_toc/07-microarray-data.md
index 8d76f43d..b3db7658 100644
--- a/docs/no_toc/07-microarray-data.md
+++ b/docs/no_toc/07-microarray-data.md
@@ -9,7 +9,7 @@ This chapter is in a beta stage. If you wish to contribute, please [go to this f
 
 ## Learning Objectives
 
-<img src="resources/images/07-microarray-data_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_12.png" title="This chapter will demonstrate how to: Understand the very general basics of microarray data collection and processing workflow. Understand the limitations and strengths of microarray data in general." alt="This chapter will demonstrate how to: Understand the very general basics of microarray data collection and processing workflow. Understand the limitations and strengths of microarray data in general." width="100%" />
+<img src="resources/images/07-microarray-data_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_12.png" alt="This chapter will demonstrate how to: Understand the very general basics of microarray data collection and processing workflow. Understand the limitations and strengths of microarray data in general." width="100%" />
 
 ## Summary of microarrays
 
diff --git a/docs/no_toc/08-annotating-genomes.md b/docs/no_toc/08-annotating-genomes.md
index 49cd2c02..b6fd1602 100644
--- a/docs/no_toc/08-annotating-genomes.md
+++ b/docs/no_toc/08-annotating-genomes.md
@@ -9,7 +9,7 @@ This chapter is in a beta stage. If you wish to contribute, please [go to this f
 
 ## Learning Objectives
 
-<img src="resources/images/08-annotating-genomes_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_23.png" title="The learning objectives for this chapter are to: Understand the fundamentals of annotating genomic data. Be aware of how reference genomes and their versions affect annotation. Be able to find genomic annotation from the respective databases" alt="The learning objectives for this chapter are to: Understand the fundamentals of annotating genomic data. Be aware of how reference genomes and their versions affect annotation. Be able to find genomic annotation from the respective databases" width="100%" />
+<img src="resources/images/08-annotating-genomes_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_23.png" alt="The learning objectives for this chapter are to: Understand the fundamentals of annotating genomic data. Be aware of how reference genomes and their versions affect annotation. Be able to find genomic annotation from the respective databases" width="100%" />
 
 In this chapter, we are going to discuss a step that affects every genomic method and may take up the majority of your time as a genomic data analyst: annotation.
 
@@ -21,7 +21,7 @@ Proper annotation requires an understanding of how the annotation data you are u
 
 Every individual organism has its own unique DNA sequence. So how can we compare organisms to each other? In some studies, sequencing data is obtained and the genome is built de novo (aka from scratch), but this takes a lot of time and computing power. So instead, most genomic studies use the imperfect method of comparing to a reference genome. Reference genomes are built from prior data and are available online. They inherently have biases in them. For example, human reference genomes are generally not made from diverse populations but instead from mostly males of European descent. It is inherently bad for both ethical and scientific reasons to have [genome references that are too white](https://www.sciencenews.org/article/genetics-race-dna-databases-reference-genome-too-white). For more on the problems with reference genomes, [read this](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1774-4).
 
-<img src="resources/images/08-annotating-genomes_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_45.png" title="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" alt="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" width="100%" />
+<img src="resources/images/08-annotating-genomes_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_45.png" alt="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" width="100%" />
 
 In summary, reference genomes are used for comparison and as a 'source of truth' of sorts, but it's important to note that this method is biased and better alternatives need to be realized.
 
@@ -29,7 +29,7 @@ In summary, reference genomes are used for comparison and as a 'source of truth'
 
 If you are familiar with software development, or have used any app before, you're familiar with software updates and releases. Similarly, genomes have updates and releases as continued cloning and assembly of organisms teach us more. The image below shows an example of how a genome version may be noted (note that different databases may have different terminology -- here we are showing the Genome Reference Consortium). You may also notice that their website shows the date the genome version was released and what was fixed.
 
-<img src="resources/images/08-annotating-genomes_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_33.png" title="Genome assemblies are changed and updated over time much like software packages. " alt="Genome assemblies are changed and updated over time much like software packages. " width="100%" />
+<img src="resources/images/08-annotating-genomes_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_33.png" alt="Genome assemblies are changed and updated over time much like software packages. " width="100%" />
 
 The details of how genome versions are fixed and released are not really of concern for your data analysis. This is merely to explain that genomes change and what is most important in your analysis is that:
 
@@ -40,7 +40,7 @@ The details of how genome versions are fixed and released are not really of conc
 
 Although we can't walk you through every organism and database setup, we will walk through the files and structure of one example here.
 
-<img src="resources/images/08-annotating-genomes_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_28.png" title="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" alt="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" width="100%" />
+<img src="resources/images/08-annotating-genomes_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_28.png" alt="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" width="100%" />
 
 The above screenshot, [from Ensembl](https://useast.ensembl.org/info/data/ftp/index.html), shows different organisms in the rows and a variety of different files across the columns. In this example, DNA refers to the DNA sequence of the organism's genome, but cDNA refers to complementary DNA -- aka DNA that has been reverse transcribed from RNA. If you are working with RNA data you may want to use the cDNA file. CDS files, in contrast, contain only coding sequences, and ncRNA files contain only non-coding sequences. Most of these files are FASTA files. Gene sets are also their own annotation files called GTF or GFF files. Ensembl provides more [detailed information about what these files contain](https://useast.ensembl.org/info/website/upload/gff.html), but briefly, each row is a feature and has information describing that feature such as genomic locations, the relevant feature type (gene, coding sequence, pseudogene, etc.), and the gene ID or name. For a reminder on what these different file types are [see the previous chapter](http://hutchdatascience.org/Choosing_Genomics_Tools/a-very-general-genomics-overview.html#basic-file-formats).
 
diff --git a/docs/no_toc/09-DNA.md b/docs/no_toc/09-DNA.md
index dc3be96d..664bb9da 100644
--- a/docs/no_toc/09-DNA.md
+++ b/docs/no_toc/09-DNA.md
@@ -11,7 +11,7 @@ This chapter is in a beta stage. If you wish to contribute, please [go to this f
 
 ## Learning Objectives
 
-<img src="resources/images/09-DNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_71.png" title="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection for DNA sequence collection and variant identification. Compare and contrast the following methods: DNA/SNP microarrays, Whole Genome Sequencing, Whole Exome Sequencing, and Targeted Sequencing" alt="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection for DNA sequence collection and variant identification. Compare and contrast the following methods: DNA/SNP microarrays, Whole Genome Sequencing, Whole Exome Sequencing, and Targeted Sequencing" width="100%" />
+<img src="resources/images/09-DNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_71.png" alt="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection for DNA sequence collection and variant identification. Compare and contrast the following methods: DNA/SNP microarrays, Whole Genome Sequencing, Whole Exome Sequencing, and Targeted Sequencing" width="100%" />
 
 ## What are the goals of analyzing DNA sequences?
 
@@ -35,7 +35,7 @@ There are several larger goals behind DNA sequencing experiments ranging from as
 
 ## Comparison of DNA methods
 
-<img src="resources/images/09-DNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_18.png" title="Comparing DNA Sequencing Techniques. The most common DNA sequencing techniques are described. Whole genome sequencing coverages all genes and non-coding DNA. 3.2 billion bases are covered when applied to human samples. This the most expensive of the techniques. Depth of coverage required for 99.9% sensitivity is 30X. Whole exome sequencing coverage is the exome or expressed genes. Approximately 45 million bases are sequenced. This is a cost-effective technique. The depth of coverage required for 99.9% sensitivity is 100X. Targeted gene panel sequencing coverages 50-500 genes. 20,000 to 62 million bases are sequenced. This is the most cost-effective technique. Depth of coverage is &gt;500X." alt="Comparing DNA Sequencing Techniques. The most common DNA sequencing techniques are described. Whole genome sequencing coverages all genes and non-coding DNA. 3.2 billion bases are covered when applied to human samples. This the most expensive of the techniques. Depth of coverage required for 99.9% sensitivity is 30X. Whole exome sequencing coverage is the exome or expressed genes. Approximately 45 million bases are sequenced. This is a cost-effective technique. The depth of coverage required for 99.9% sensitivity is 100X. Targeted gene panel sequencing coverages 50-500 genes. 20,000 to 62 million bases are sequenced. This is the most cost-effective technique. Depth of coverage is &gt;500X." width="100%" />
+<img src="resources/images/09-DNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_18.png" alt="Comparing DNA Sequencing Techniques. The most common DNA sequencing techniques are described. Whole genome sequencing coverages all genes and non-coding DNA. 3.2 billion bases are covered when applied to human samples. This the most expensive of the techniques. Depth of coverage required for 99.9% sensitivity is 30X. Whole exome sequencing coverage is the exome or expressed genes. Approximately 45 million bases are sequenced. This is a cost-effective technique. The depth of coverage required for 99.9% sensitivity is 100X. Targeted gene panel sequencing coverages 50-500 genes. 20,000 to 62 million bases are sequenced. This is the most cost-effective technique. Depth of coverage is &gt;500X." width="100%" />
 There are four DNA sequencing methods discussed in this chapter. The above graph compares WGS, WXS, and Targeted gene sequencing. The last section compares all 4.
 
 1. Whole genome sequencing (WGS)
@@ -81,6 +81,6 @@ If your research question does not pertain to non-coding regions of the genome o
 Furthermore, if you are able to narrow down even further what regions are of interest, this would be better in terms of cost and detection abilities. A targeted sequencing panel or DNA microarray is ideal for assaying known groups of targets. DNA microarrays are the least costly of all the methods to identify DNA variants, but with both targeted sequencing and DNA microarrays you will need to find or create a custom probe or primer set. Ideally a probe or primer set that hits your regions of interest already exists commercially but if not, then you will have to design your own -- which also costs time and money.
 
 
-<img src="resources/images/09-DNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_6.png" title="There are three general methods we will discuss for evaluating DNA sequences. Whole Genome Sequencing (WGS) assays more of the genome than other methods but is much more costly and computationally intensive. Depending on your goals WGS may be overkill. SNP microarrays on the other hand, are much more cost effective but are not able to be used for exploratory purposes. Whole Exome Sequencing (WXS or WES) and other targeted sequencing methods allow you to survey regions of the genome in way that is more cost effective and potentially at higher depths." alt="There are three general methods we will discuss for evaluating DNA sequences. Whole Genome Sequencing (WGS) assays more of the genome than other methods but is much more costly and computationally intensive. Depending on your goals WGS may be overkill. SNP microarrays on the other hand, are much more cost effective but are not able to be used for exploratory purposes. Whole Exome Sequencing (WXS or WES) and other targeted sequencing methods allow you to survey regions of the genome in way that is more cost effective and potentially at higher depths." width="100%" />
+<img src="resources/images/09-DNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_6.png" alt="There are three general methods we will discuss for evaluating DNA sequences. Whole Genome Sequencing (WGS) assays more of the genome than other methods but is much more costly and computationally intensive. Depending on your goals WGS may be overkill. SNP microarrays on the other hand, are much more cost effective but are not able to be used for exploratory purposes. Whole Exome Sequencing (WXS or WES) and other targeted sequencing methods allow you to survey regions of the genome in way that is more cost effective and potentially at higher depths." width="100%" />
 
 In these upcoming chapters we will discuss in more detail each of these methods, what the data represent, what you need to consider, and what resources you can consult for analyzing your data.
diff --git a/docs/no_toc/09a-WGS-and-WXS.md b/docs/no_toc/09a-WGS-and-WXS.md
index ef295688..b07e55f4 100644
--- a/docs/no_toc/09a-WGS-and-WXS.md
+++ b/docs/no_toc/09a-WGS-and-WXS.md
@@ -9,14 +9,14 @@ This chapter is in a beta stage. If you wish to contribute, please [go to this f
 
 ## Learning Objectives
 
-<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_51.png" title="The learning objectives for this course are to: 1 Define the uses and applications of WGS/WXS 2 Describe the steps for generating WGS/WXS data 3 Understand the data analysis workflow for WGS/WXS" alt="The learning objectives for this course are to: 1 Define the uses and applications of WGS/WXS 2 Describe the steps for generating WGS/WXS data 3 Understand the data analysis workflow for WGS/WXS" width="100%" />
+<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_51.png" alt="The learning objectives for this course are to: 1 Define the uses and applications of WGS/WXS 2 Describe the steps for generating WGS/WXS data 3 Understand the data analysis workflow for WGS/WXS" width="100%" />
 The learning objectives for this course are to explain the use and application of Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES/WXS) for genomics studies, outline the technical steps in generating WGS/WXS data, and detail the processing steps for analyzing and interpreting WGS/WXS data.
 
 **To familiarize yourself with sequencing methods as a whole, we recommend you read our [chapter on sequencing first](http://hutchdatascience.org/Choosing_Genomics_Tools/sequencing-data.html).**
 
 ## WGS and WXS Overview
 
-<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_8.png" title="Whole genome sequencing overview, Process of determining entirety of DNA sequence of organism’s genome at single time. Includes sequencing all chromosomal data and DNA from mitochondria. Used to identify functional variants associated with disease" alt="Whole genome sequencing overview, Process of determining entirety of DNA sequence of organism’s genome at single time. Includes sequencing all chromosomal data and DNA from mitochondria. Used to identify functional variants associated with disease" width="100%" />
+<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_8.png" alt="Whole genome sequencing overview, Process of determining entirety of DNA sequence of organism’s genome at single time. Includes sequencing all chromosomal data and DNA from mitochondria. Used to identify functional variants associated with disease" width="100%" />
 The difference between WGS and WXS sequencing is whether or not the open reading frames and thus coding regions are targeted in sequencing. WGS attempts to sequence the whole genome, while for WXS only exons with open reading frames are targeted for sequencing. Both of these methods can be massively beneficial for studying rare and complex diseases.
 
 Thus, whole genome sequencing is a technique to thoroughly analyze the entire DNA sequence of an organism's genome. This includes sequencing all genes both coding and non-coding and all mitochondrial DNA. WGS is beneficial for identifying new and previously established variants related to disease and the regulatory elements of the genome including promoters, enhancers, and silencers. Increasingly non-coding RNAs have also been identified to play a functional role in biological mechanisms and diseases. In order to learn more about the non-coding regions of the genome, WGS is necessary.
@@ -25,7 +25,7 @@ Alternatively whole exome sequencing is used to sequence the coding regions of a
 
 ## Advantages and Disadvantages of WGS vs WXS
 
-<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_13.png" title="Advantages and Disadvantages of WGS as opposed to WXS: Most complete account of individual variation, Ability to study: Structural rearrangements, Copy number variations, Insertion-Deletions, SNPs, Sequencing repeats, Coding, non-coding, and mitochondrial genome coverage, allows for discovery - identify causative variants; Disadvantages include higher cost and more resources for storing and analyzing data" alt="Advantages and Disadvantages of WGS as opposed to WXS: Most complete account of individual variation, Ability to study: Structural rearrangements, Copy number variations, Insertion-Deletions, SNPs, Sequencing repeats, Coding, non-coding, and mitochondrial genome coverage, allows for discovery - identify causative variants; Disadvantages include higher cost and more resources for storing and analyzing data" width="100%" />
+<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_13.png" alt="Advantages and Disadvantages of WGS as opposed to WXS: Most complete account of individual variation, Ability to study: Structural rearrangements, Copy number variations, Insertion-Deletions, SNPs, Sequencing repeats, Coding, non-coding, and mitochondrial genome coverage, allows for discovery - identify causative variants; Disadvantages include higher cost and more resources for storing and analyzing data" width="100%" />
 
 
 We more thoroughly discuss how to choose DNA sequencing methods [here in the previous chapter](http://hutchdatascience.org/Choosing_Genomics_Tools/dna-methods.html), but we will briefly cover this here. Alternatives to WGS include Whole Exome Sequencing (WES/WXS), which sequences the open reading frame areas of the genome or Targeted Gene Sequencing where probes have been designed to sequence only regions of interest.
@@ -33,7 +33,7 @@ The main advantages of WGS include the ability to comprehensively analyze all re
 
 ## WGS/WXS Considerations
 
-<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_28.png" title="WGS/WXS Considerations , Genome type/size, Coverage requirements, Tissue source: fresh tissue, FFPE, blood, Library preparation protocol: PCR vs PCR-free" alt="WGS/WXS Considerations , Genome type/size, Coverage requirements, Tissue source: fresh tissue, FFPE, blood, Library preparation protocol: PCR vs PCR-free" width="100%" />
+<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_28.png" alt="WGS/WXS Considerations , Genome type/size, Coverage requirements, Tissue source: fresh tissue, FFPE, blood, Library preparation protocol: PCR vs PCR-free" width="100%" />
 Some important considerations for WGS/WXS include:  
 
 - What genome you are studying and the size of this genome. Included in this consideration is whether this genome has been sequenced before and you will have a "reference" genome to compare your data against or whether you will have to make a reference genome yourself. [This bioinformatics resource](https://eriqande.github.io/eca-bioinf-handbook/alignment-of-sequence-data-to-a-reference-genome-and-associated-steps.html) provides a great overview of genome alignment.
@@ -52,19 +52,19 @@ For WXS or other targeted sequencing specifically (so not relevant to WGS data),
 
 ## DNA Sequencing Pipeline Overview
 
-<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_33.png" title="Pipeline overview: Step 1: DNA extraction from sample, Step 2: library preparation, Step 3: Sequencing, Step 4: Analysis including data processing from Fastq, aligning reads to generate a BAM file, identifying variants to create a final VCF file" alt="Pipeline overview: Step 1: DNA extraction from sample, Step 2: library preparation, Step 3: Sequencing, Step 4: Analysis including data processing from Fastq, aligning reads to generate a BAM file, identifying variants to create a final VCF file" width="100%" />
+<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_33.png" alt="Pipeline overview: Step 1: DNA extraction from sample, Step 2: library preparation, Step 3: Sequencing, Step 4: Analysis including data processing from Fastq, aligning reads to generate a BAM file, identifying variants to create a final VCF file" width="100%" />
 In order to create WGS/WXS data, DNA is first extracted from a specific sample type (tissue, blood samples, cells, FFPE blocks, etc.). Either traditional methods (involving phenol and chloroform) or commercial kits can be used for this first step. Next, the DNA sequencing libraries are prepared. This involves fragmenting the DNA, adding sequencing adapters, and DNA amplification if the input DNA is not of sufficient quantity. Recall that for WXS, only exons with open reading frames are targeted for sequencing. After sequencing, data is analyzed by converting and aligning reads to generate a BAM file. Many analysis tools will use the BAM file to identify variants, which then generates a VCF file. More information about sequencing and BAM and VCF file generation can be found [here](http://hutchdatascience.org/Choosing_Genomics_Tools/sequencing-data.html) in the sequencing data chapter.  
 
 
 ## Data Pre-processing
 
-<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_38.png" title="Data pre-processing pipeline overview: Raw data from sequencing is transformed into a Fastq file, reads are aligned and a Bam file is created, the data is sorted and merged, duplicates are identified, and the base quality score is recalibrated to create a final BAM file " alt="Data pre-processing pipeline overview: Raw data from sequencing is transformed into a Fastq file, reads are aligned and a Bam file is created, the data is sorted and merged, duplicates are identified, and the base quality score is recalibrated to create a final BAM file " width="100%" />
+<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_38.png" alt="Data pre-processing pipeline overview: Raw data from sequencing is transformed into a Fastq file, reads are aligned and a Bam file is created, the data is sorted and merged, duplicates are identified, and the base quality score is recalibrated to create a final BAM file " width="100%" />
 Raw sequencing reads are first transformed into a fastq file (more information about fastq files can be found [here](http://hutchdatascience.org/Choosing_Genomics_Tools/sequencing-data.html) in the sequencing data chapter in the Quality Controls section). Then the sequencing reads are aligned to a reference genome to create a BAM file. This data is sorted and merged, and PCR duplicates are identified. The confidence that each base was sequenced correctly is reflected in the base quality score. This score must be recalibrated at this step before variants are called. A final BAM file is thus created. This can be used for future analysis steps including variant or mutation identification, which is outlined on the following slide.
 
 
 ## Commonly Used Tools
 
-<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_43.png" title="Tools commonly used in WGS data analysis" alt="Tools commonly used in WGS data analysis" width="100%" />
+<img src="resources/images/09a-WGS-and-WXS_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_43.png" alt="Tools commonly used in WGS data analysis" width="100%" />
 The following link provides the data analysis pipeline written by researchers in the NCI division of the NIH and provides a helpful overview of the typical steps necessary for [WGS analysis](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/).
 
 Here are many of the tools and resources used by researchers for analyzing WGS data.
diff --git a/docs/no_toc/10-RNA.md b/docs/no_toc/10-RNA.md
index 0fa4783a..21488653 100644
--- a/docs/no_toc/10-RNA.md
+++ b/docs/no_toc/10-RNA.md
@@ -9,19 +9,19 @@ This chapter is in a beta stage. Some of it has been written with AI tools. If y
 
 ## Learning Objectives
 
-<img src="resources/images/10-RNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_76.png" title="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection processes for gene expression assays. Compare and contrast the following methods: Bulk RNA-seq, Single cell RNA-seq, Gene expression microarrays" alt="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection processes for gene expression assays. Compare and contrast the following methods: Bulk RNA-seq, Single cell RNA-seq, Gene expression microarrays" width="100%" />
+<img src="resources/images/10-RNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_76.png" alt="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection processes for gene expression assays. Compare and contrast the following methods: Bulk RNA-seq, Single cell RNA-seq, Gene expression microarrays" width="100%" />
 
 ## What are the goals of gene expression analysis?
 
 The goal of gene expression analysis is to quantify RNAs across the genome. This can signify the extent to which various RNAs are being transcribed in a particular cell. This can be informative for what kinds of activity a cell is undergoing and responding to.
 
-<img src="resources/images/10-RNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_0.png" title="The goal of gene expression analysis is to quantify RNAs on a genome wide level" alt="The goal of gene expression analysis is to quantify RNAs on a genome wide level" width="100%" />
+<img src="resources/images/10-RNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_0.png" alt="The goal of gene expression analysis is to quantify RNAs on a genome wide level" width="100%" />
 
 ## Comparison of RNA methods
 
 There are three general methods we will discuss for evaluating gene expression. RNA sequencing (whether bulk or single-cell) allows you to catch more targets than gene expression microarrays but is much more costly and computationally intensive. Gene expression microarrays have a lower dynamic range than RNA-seq generally but are much more cost effective. Spatial transcriptomics is the newest method on the block and has the ability to relate gene expression to tissue regions and subpopulations.
 
-<img src="resources/images/10-RNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_80.png" title="Gene expression microarrays are low cost and low computationally intensive. Bulk RNA-seq is higher cost, requires more computational resources but covers more targets than gene expression arrays. Single cell RNA-seq is higher cost, requires more computational resources but as opposed to Bulk RNA-seq gives single cell resolution." alt="Gene expression microarrays are low cost and low computationally intensive. Bulk RNA-seq is higher cost, requires more computational resources but covers more targets than gene expression arrays. Single cell RNA-seq is higher cost, requires more computational resources but as opposed to Bulk RNA-seq gives single cell resolution." width="100%" />
+<img src="resources/images/10-RNA_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_80.png" alt="Gene expression microarrays are low cost and low computationally intensive. Bulk RNA-seq is higher cost, requires more computational resources but covers more targets than gene expression arrays. Single cell RNA-seq is higher cost, requires more computational resources but as opposed to Bulk RNA-seq gives single cell resolution." width="100%" />
 
 ### Single-cell RNA-seq (scRNA-seq):
 
diff --git a/docs/no_toc/10a-bulk-RNA-seq.md b/docs/no_toc/10a-bulk-RNA-seq.md
index 6cba11da..f79cd13a 100644
--- a/docs/no_toc/10a-bulk-RNA-seq.md
+++ b/docs/no_toc/10a-bulk-RNA-seq.md
@@ -10,17 +10,17 @@ This chapter is in a beta stage. If you wish to contribute, please [go to this f
 
 ## Learning Objectives
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_56.png" title="This chapter will demonstrate how to: Understand the basics of RNA-Seq data collection and processing workflow. Identify the next steps for your particular RNA-seq data. Formulate questions to ask about your RNA-seq data" alt="This chapter will demonstrate how to: Understand the basics of RNA-Seq data collection and processing workflow. Identify the next steps for your particular RNA-seq data. Formulate questions to ask about your RNA-seq data" width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_56.png" alt="This chapter will demonstrate how to: Understand the basics of RNA-Seq data collection and processing workflow. Identify the next steps for your particular RNA-seq data. Formulate questions to ask about your RNA-seq data" width="100%" />
 
 ## Where RNA-seq data comes from
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_5.png" title="Bulk RNA-seq data is generated by extracting total RNA and then isolating RNA specific species by either Poly-A selection, Ribo depletion, or size selection. The isolated RNA is then converted to cDNA so it is more stable for sequencing. This cDNA is used to construct a sequencing library. Lastly PCR amplification is used to make many copies to use for sequencing." alt="Bulk RNA-seq data is generated by extracting total RNA and then isolating RNA specific species by either Poly-A selection, Ribo depletion, or size selection. The isolated RNA is then converted to cDNA so it is more stable for sequencing. This cDNA is used to construct a sequencing library. Lastly PCR amplification is used to make many copies to use for sequencing." width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_5.png" alt="Bulk RNA-seq data is generated by extracting total RNA and then isolating RNA specific species by either Poly-A selection, Ribo depletion, or size selection. The isolated RNA is then converted to cDNA so it is more stable for sequencing. This cDNA is used to construct a sequencing library. Lastly PCR amplification is used to make many copies to use for sequencing." width="100%" />
 
 ## RNA-seq workflow
 
 In a very general sense, RNA-seq workflows involve first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses.
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_23.png" title="In a very general sense, RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses. " alt="In a very general sense, RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses. " width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_23.png" alt="In a very general sense, RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses. " width="100%" />
 
 In this chapter we will highlight some of the more popular RNA-seq tools that are generally suitable for most experimental data, but there is no "one size fits all" for computational analysis of RNA-seq data [@Conesa2016]. You may find tools out there that better suit your needs than the ones we discuss here.
 
@@ -34,7 +34,7 @@ In this chapter we will highlight some of the more popular RNA-seq tools, that a
 
 RNA-seq suffers from a lot of the common sequence biases which are further worsened by PCR amplification steps. We discussed some of the sequence biases in the [previous sequencing chapter]().
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142e3de7ce8_0_19.png" title="RNA-seq data has various biases introduced to the data upon data generation. RNA targets are more likely to be picked up if they are long, if they are from the 3 prime end, have a particular GC content and have a particular read start sequence." alt="RNA-seq data has various biases introduced to the data upon data generation. RNA targets are more likely to be picked up if they are long, if they are from the 3 prime end, have a particular GC content and have a particular read start sequence." width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142e3de7ce8_0_19.png" alt="RNA-seq data has various biases introduced to the data upon data generation. RNA targets are more likely to be picked up if they are long, if they are from the 3 prime end, have a particular GC content and have a particular read start sequence." width="100%" />
 
 These biases are nicely covered in [this blog by Mike Love](https://mikelove.wordpress.com/2016/09/26/rna-seq-fragment-sequence-bias/) and we'll summarize them here:
 
@@ -45,7 +45,7 @@ These biases are nicely covered in [this blog by Mike Love](https://mikelove.wor
 
 _Main Takeaway_: When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way.
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_65.png" title="When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way." alt="When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way." width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_65.png" alt="When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way." width="100%" />
 
 ## RNA-seq data considerations
 
@@ -58,7 +58,7 @@ Most of the RNA in the cell is not mRNA or noncoding RNAs of interest, but inste
 
 [This blog by Sitools Biotech does a good summary](https://blog.sitoolsbiotech.com/2019/08/ribo-depletion-rna-seq-ribosomal-rna-depletion-method-works-best/) of the pros and cons of either selection method.
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_47.png" title="Poly A selection advantages: lower sequencing depth needed. Greater exonic coverage. Disadvantages of Poly A selection is that it does not detect non-polyA transcripts including miRNAs, snoRNAs, and some lncRNAs. It obtains less information on immature transcripts. It performs poorly for degraded RNA or Formalin-Fixed Paraffin-Embedded (FFPE) samples Bias towards 3’ end of transcripts. Cannot be used for prokaryotes. Ribo minus advantages are: It is able to detect small and non-polyadenylated RNAs. It detects long and short transcripts (no 3’ bias). It has better performance on degraded RNa or FFPE samples. It is applicable for prokaryotes. It can be applied toward other abundant RNA. The disadvantages of Ribo minus is that it will collect more intronic reads and immature RNAs (if you are not interested in those). And thus because of the greater quantity of the returned RNA pool. It requires greater sequencing depths." alt="Poly A selection advantages: lower sequencing depth needed. Greater exonic coverage. Disadvantages of Poly A selection is that it does not detect non-polyA transcripts including miRNAs, snoRNAs, and some lncRNAs. It obtains less information on immature transcripts. It performs poorly for degraded RNA or Formalin-Fixed Paraffin-Embedded (FFPE) samples Bias towards 3’ end of transcripts. Cannot be used for prokaryotes. Ribo minus advantages are: It is able to detect small and non-polyadenylated RNAs. It detects long and short transcripts (no 3’ bias). It has better performance on degraded RNa or FFPE samples. It is applicable for prokaryotes. It can be applied toward other abundant RNA. The disadvantages of Ribo minus is that it will collect more intronic reads and immature RNAs (if you are not interested in those). And thus because of the greater quantity of the returned RNA pool. It requires greater sequencing depths." width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_47.png" alt="Poly A selection advantages: lower sequencing depth needed. Greater exonic coverage. Disadvantages of Poly A selection is that it does not detect non-polyA transcripts including miRNAs, snoRNAs, and some lncRNAs. It obtains less information on immature transcripts. It performs poorly for degraded RNA or Formalin-Fixed Paraffin-Embedded (FFPE) samples Bias towards 3’ end of transcripts. Cannot be used for prokaryotes. Ribo minus advantages are: It is able to detect small and non-polyadenylated RNAs. It detects long and short transcripts (no 3’ bias). It has better performance on degraded RNa or FFPE samples. It is applicable for prokaryotes. It can be applied toward other abundant RNA. The disadvantages of Ribo minus is that it will collect more intronic reads and immature RNAs (if you are not interested in those). And thus because of the greater quantity of the returned RNA pool. It requires greater sequencing depths." width="100%" />
 
 ### Transcriptome mapping
 
@@ -80,7 +80,7 @@ _Examples of pseudo aligners_:
 
 These strategies are discussed at greater length [in this excellent manuscript by Conesa et al, 2016](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8).
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_72.png" title="TopHat Uses an expectation-maximization approach that estimates transcript abundances. Cufflinks is designed to take advantage of PE reads, and may use GTF information to identify expressed transcripts, or can infer transcripts de novo from the mapping data alone. RSEM, eXpress, Sailfish, Kallisto, and Salmon - Quantify expression from transcriptome mapping and allocate multi-mapping reads among transcript and output within-sample normalized values corrected for sequencing biases. NURD Provides an efficient way of estimating transcript expression from SE reads with a low memory and computing cost." alt="TopHat Uses an expectation-maximization approach that estimates transcript abundances. Cufflinks is designed to take advantage of PE reads, and may use GTF information to identify expressed transcripts, or can infer transcripts de novo from the mapping data alone. RSEM, eXpress, Sailfish, Kallisto, and Salmon - Quantify expression from transcriptome mapping and allocate multi-mapping reads among transcript and output within-sample normalized values corrected for sequencing biases. NURD Provides an efficient way of estimating transcript expression from SE reads with a low memory and computing cost." width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_72.png" alt="TopHat Uses an expectation-maximization approach that estimates transcript abundances. Cufflinks is designed to take advantage of PE reads, and may use GTF information to identify expressed transcripts, or can infer transcripts de novo from the mapping data alone. RSEM, eXpress, Sailfish, Kallisto, and Salmon - Quantify expression from transcriptome mapping and allocate multi-mapping reads among transcript and output within-sample normalized values corrected for sequencing biases. NURD Provides an efficient way of estimating transcript expression from SE reads with a low memory and computing cost." width="100%" />
 
 ### Abundance measures
 
@@ -116,11 +116,11 @@ TPM has gained a popularity in recent years because it is more intuitive to unde
 
 > When you use TPM, the sum of all TPMs in each sample are the same. This makes it easier to compare the proportion of reads that mapped to a gene in each sample. In contrast, with RPKM and FPKM, the sum of the normalized reads in each sample may be different, and this makes it harder to compare samples directly.
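
The difference described in the quote above can be checked with a small worked example. This is only a minimal sketch, assuming the commonly used definitions of RPKM and TPM; the gene lengths and read counts are made-up illustration values, and the function names `rpkm` and `tpm` are hypothetical helpers, not part of any tool discussed in this chapter.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads (assumed standard definition)."""
    per_million = sum(counts) / 1e6
    # Scale by sequencing depth first, then by gene length in kilobases.
    return [c / per_million / (length / 1e3) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million (assumed standard definition)."""
    # Scale by gene length first to get reads per kilobase.
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    # Then rescale so the values in this sample sum to one million.
    scale = sum(rates) / 1e6
    return [r / scale for r in rates]


# Hypothetical data: three genes measured in two samples.
lengths_bp = [1000, 2000, 4000]
sample_a = [10, 20, 30]
sample_b = [30, 20, 10]

for name, counts in [("A", sample_a), ("B", sample_b)]:
    print(name, "TPM sum:", round(sum(tpm(counts, lengths_bp))),
          "RPKM sum:", round(sum(rpkm(counts, lengths_bp))))
```

Both samples have the same total number of reads, yet only the TPM values add up to the same total in each sample; the RPKM totals differ because the reads fall on genes of different lengths.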
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_59.png" title="When looking for analysis tools, pay attention to what abundance measures the tool expects to be given with respect to what transformations have already been done to your data (if any)." alt="When looking for analysis tools, pay attention to what abundance measures the tool expects to be given with respect to what transformations have already been done to your data (if any)." width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_59.png" alt="When looking for analysis tools, pay attention to what abundance measures the tool expects to be given with respect to what transformations have already been done to your data (if any)." width="100%" />
 
 ### RNA-seq downstream analysis tools
 
-<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_80.png" title="For DESeq: Read count distribution assumes a negative binomial distribution. Raw counts are expected. Replicates are not dealt with.Normalization is done with respect to Library size. For edgeR: Read count distribution assumes Bayesian methods for negative binomial distribution Input is raw counts. Yes, can deal with replicates. Normalization is done with respect to Library size, TMM, RLE, Upperquartile. For baySeq: Read count distribution assumes bayesian methods for negative binomial distribution. Input is raw counts. Can deal with replicates. Library size, Quantile, TMM For NOISeq: Read count distribution is assumed to be Non-parametric. Input is raw or normalized counts Does not deal with replicates. Normalization is done with respect to Library size, RPKM, TMM, Upperquartile" alt="For DESeq: Read count distribution assumes a negative binomial distribution. Raw counts are expected. Replicates are not dealt with.Normalization is done with respect to Library size. For edgeR: Read count distribution assumes Bayesian methods for negative binomial distribution Input is raw counts. Yes, can deal with replicates. Normalization is done with respect to Library size, TMM, RLE, Upperquartile. For baySeq: Read count distribution assumes bayesian methods for negative binomial distribution. Input is raw counts. Can deal with replicates. Library size, Quantile, TMM For NOISeq: Read count distribution is assumed to be Non-parametric. Input is raw or normalized counts Does not deal with replicates. Normalization is done with respect to Library size, RPKM, TMM, Upperquartile" width="100%" />
+<img src="resources/images/10a-bulk-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_80.png" alt="For DESeq: Read count distribution assumes a negative binomial distribution. Raw counts are expected. Replicates are not dealt with.Normalization is done with respect to Library size. For edgeR: Read count distribution assumes Bayesian methods for negative binomial distribution Input is raw counts. Yes, can deal with replicates. Normalization is done with respect to Library size, TMM, RLE, Upperquartile. For baySeq: Read count distribution assumes bayesian methods for negative binomial distribution. Input is raw counts. Can deal with replicates. Library size, Quantile, TMM For NOISeq: Read count distribution is assumed to be Non-parametric. Input is raw or normalized counts Does not deal with replicates. Normalization is done with respect to Library size, RPKM, TMM, Upperquartile" width="100%" />
 
 - [ComplexHeatmap](https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html#:~:text=Complex%20heatmaps%20are%20efficient%20to,and%20supports%20various%20annotation%20graphics.) is great for visualizations
 - [DESeq2](https://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) and [edgeR](https://www.bioconductor.org/packages/release/bioc/html/edgeR.html) are great for differential expression analyses.
diff --git a/docs/no_toc/10b-single-cell-RNA-seq.md b/docs/no_toc/10b-single-cell-RNA-seq.md
index 450a8a12..cefcd671 100644
--- a/docs/no_toc/10b-single-cell-RNA-seq.md
+++ b/docs/no_toc/10b-single-cell-RNA-seq.md
@@ -9,17 +9,17 @@ This chapter is in a beta stage. If you wish to contribute, please [go to this f
 
 ## Learning Objectives
 
-<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_1.png" title="This chapter will demonstrate how to: Understand the basics of single cell RNA-Seq data collection and processing workflow. Identify the next steps for your particular single cell RNA-seq data. Formulate questions to ask about your single cell RNA-seq data" alt="This chapter will demonstrate how to: Understand the basics of single cell RNA-Seq data collection and processing workflow. Identify the next steps for your particular single cell RNA-seq data. Formulate questions to ask about your single cell RNA-seq data" width="100%" />
+<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_1.png" alt="This chapter will demonstrate how to: Understand the basics of single cell RNA-Seq data collection and processing workflow. Identify the next steps for your particular single cell RNA-seq data. Formulate questions to ask about your single cell RNA-seq data" width="100%" />
 
 ## Where single-cell RNA-seq data comes from
 
-<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_6.png" title="As opposed to bulk RNA-seq which can only tell us about tissue level and within patient variation, single-cell RNA-seq is able to tell us cell to cell variation in transcriptomics including intra-tumor heterogeneity" alt="As opposed to bulk RNA-seq which can only tell us about tissue level and within patient variation, single-cell RNA-seq is able to tell us cell to cell variation in transcriptomics including intra-tumor heterogeneity" width="100%" />
+<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_6.png" alt="As opposed to bulk RNA-seq which can only tell us about tissue level and within patient variation, single-cell RNA-seq is able to tell us cell to cell variation in transcriptomics including intra-tumor heterogeneity" width="100%" />
 
 As opposed to bulk RNA-seq which can only tell us about tissue level and within patient variation, single-cell RNA-seq is able to tell us cell to cell variation in transcriptomics including intra-tumor heterogeneity.
 
 Single-cell RNA-seq can give us cell-level transcriptional profiles, whereas bulk RNA-seq masks cell-to-cell heterogeneity. If your research questions require cell-level transcriptional information, single-cell RNA-seq will be of interest to you.
 
-<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_11.png" title="Single cell RNA-seq can give us cell level transcriptional profiles. Whereas bulk RNA-seq masks cell to cell heterogeneity." alt="Single cell RNA-seq can give us cell level transcriptional profiles. Whereas bulk RNA-seq masks cell to cell heterogeneity." width="100%" />
+<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_11.png" alt="Single cell RNA-seq can give us cell level transcriptional profiles. Whereas bulk RNA-seq masks cell to cell heterogeneity." width="100%" />
 
 ## Single-cell RNA-seq data types
 
@@ -30,9 +30,9 @@ There are broadly two categories of single-cell RNA-seq data methods we will dis
 
 Depending on your goals for your single cell RNA-seq analysis, you may want to choose one method over the other.
 
-<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_25.png" title="Full length single cell RNA-seq **Pros**: Can be paired end sequencing which has less 3' bias. More complete coverage of transcripts which may be better for transcript discovery purposes. Cons: Is not very efficient (96 wells per plate). Takes longer to run days/weeks depending on the sample size. Expensive." alt="Full length single cell RNA-seq **Pros**: Can be paired end sequencing which has less 3' bias. More complete coverage of transcripts which may be better for transcript discovery purposes. Cons: Is not very efficient (96 wells per plate). Takes longer to run days/weeks depending on the sample size. Expensive." width="100%" />
+<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_25.png" alt="Full length single cell RNA-seq **Pros**: Can be paired end sequencing which has less 3' bias. More complete coverage of transcripts which may be better for transcript discovery purposes. Cons: Is not very efficient (96 wells per plate). Takes longer to run days/weeks depending on the sample size. Expensive." width="100%" />
 
-<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_30.png" title="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" alt="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" width="100%" />
+<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_30.png" alt="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" width="100%" />
 
 (Material borrowed from [@AlexsLemonade2022]).
 
@@ -40,13 +40,13 @@ Depending on your goals for your single cell RNA-seq analysis, you may want to c
 
 Tag-based single-cell RNA-seq methods often include not only a cell barcode for cell identification but also a unique molecular identifier (UMI) for identifying the original molecule. UMIs give insight into the original snapshot of the cell and can help combat PCR amplification biases.
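
To make the role of UMIs concrete, here is a minimal sketch of how reads might be collapsed into molecule counts. It assumes each read has already been assigned a cell barcode, a UMI, and a gene; the function name `count_molecules` and the example reads are hypothetical. Real pipelines typically also account for sequencing errors in barcodes and UMIs, which this sketch does not attempt.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique (cell barcode, UMI, gene) combinations per cell and gene.

    Reads sharing the same cell barcode, UMI, and gene are treated as PCR
    copies of one original molecule and are counted only once.
    """
    seen = set()
    counts = defaultdict(int)
    for cell, umi, gene in reads:
        key = (cell, umi, gene)
        if key not in seen:
            seen.add(key)
            counts[(cell, gene)] += 1
    return dict(counts)


# Hypothetical reads as (cell barcode, UMI, gene) tuples.
reads = [
    ("AAAC", "TTG", "GeneA"),
    ("AAAC", "TTG", "GeneA"),  # PCR duplicate of the read above
    ("AAAC", "TTG", "GeneA"),  # another PCR duplicate
    ("AAAC", "CCA", "GeneA"),  # same cell and gene, different original molecule
    ("GGGT", "TTG", "GeneA"),  # same UMI, but a different cell
]

print(count_molecules(reads))
```

Five reads collapse to three molecules here: two for `GeneA` in cell `AAAC` and one in cell `GGGT`, so the amplified molecule is not over-counted.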
 
-<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_41.png" title="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" alt="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" width="100%" />
+<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_41.png" alt="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" width="100%" />
 
 ## Single cell RNA-seq tools
 
 There are a lot of scRNA-seq tools for various steps along the way.
 
-<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_0.png" title="In a very general sense, single cell RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting duplets, and using this information to filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed - a lot of genes barely or not detected and a few genes that are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses." alt="In a very general sense, single cell RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting duplets, and using this information to filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed - a lot of genes barely or not detected and a few genes that are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses." width="100%" />
+<img src="resources/images/10b-single-cell-RNA-seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_0.png" alt="In a very general sense, single cell RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting duplets, and using this information to filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed - a lot of genes barely or not detected and a few genes that are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses." width="100%" />
 
 In a very general sense, single cell RNA-seq workflows involve first quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting doublets (also known as duplets), and using this information to filter out data that is not trustworthy. [Doublets are transcriptome data generated from two cells](https://bioconductor.org/books/3.15/OSCA.advanced/doublet-detection.html), and they are an undesired technical artifact because single cell RNA-seq workflows aim for data representing a single cell at a time. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed: many genes are barely detected or not detected at all, and a few genes are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses.
 
diff --git a/docs/no_toc/10c-spatial-transcriptomics.md b/docs/no_toc/10c-spatial-transcriptomics.md
index a4b3719c..ed7b927d 100644
--- a/docs/no_toc/10c-spatial-transcriptomics.md
+++ b/docs/no_toc/10c-spatial-transcriptomics.md
@@ -2,13 +2,9 @@
 
 # Spatial transcriptomics
 
-::: warning
-This chapter has currently been written by ChatGPT and has not been verified by experts. We need help writing and reviewing it! If you wish to contribute, please [go to this form](https://forms.gle/dqYgmKH8XXE2ohwD9) or our [GitHub page](https://github.com/fhdsl/Choosing_Genomics_Tools).
-:::
-
 ## Learning objectives
 
-<img src="resources/images/10c-spatial-transcriptomics_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g258b14267ad_278_14.png" title="This chapter will demonstrate how to: Approach collection of spatial transcriptomics data and design a typical analysis pipeline. Adjust your analysis pipeline to the research question, opportunities, and limitations concerning you spatial transcriptomics project. Learn about the questions that can be addressed with spatial transcriptomics data" alt="This chapter will demonstrate how to: Approach collection of spatial transcriptomics data and design a typical analysis pipeline. Adjust your analysis pipeline to the research question, opportunities, and limitations concerning you spatial transcriptomics project. Learn about the questions that can be addressed with spatial transcriptomics data" width="100%" />
+<img src="resources/images/10c-spatial-transcriptomics_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g258b14267ad_278_14.png" alt="This chapter will demonstrate how to: Approach collection of spatial transcriptomics data and design a typical analysis pipeline. Adjust your analysis pipeline to the research question, opportunities, and limitations concerning you spatial transcriptomics project. Learn about the questions that can be addressed with spatial transcriptomics data" width="100%" />
 
 ## What are the goals of spatial transcriptomic analysis?
 
@@ -24,7 +20,7 @@ Spatial transcriptomics (ST) technologies have been developed as a solution to t
 
 There is a large diversity in approaches to spatially profile tissues. Some ST technologies allow profiling at coarse cellular resolution, where regions of interest (ROIs) are usually identified by a pathologist. These ROIs may include tens of cells up to a few hundred (e.g., GeoMx @bergholtz2021best). Smaller ROI sizes can be found in other technologies such as Visium, where ROIs of 55 µm in diameter (or "spots") often contain no more than 10 cells (<https://www.10xgenomics.com/resources/analysis-guides/integrating-single-cell-and-visium-spatial-gene-expression-data>). For finer cellular resolution, technologies such as MERFISH, SMI, or Xenium, among others, can measure gene expression in individual cells [@yue2023guidebook]. In general, there is a trade-off between the cellular resolution and molecular resolution, as the number of quantified genes and RNA molecules is lower in single-cell level spatial technologies compared to those at the ROI or spot level. In single-cell ST, often a panel of hundreds of genes is quantified, while in "mini-bulk" (ROI/spot) ST, it is possible to quantify genes at the whole-transcriptome level.
 
-<img src="resources/images/10c-spatial-transcriptomics_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_461_0.png" title="A trade-off exists between the cellular resolution and molecular resolution in spatial transcriptomics." alt="A trade-off exists between the cellular resolution and molecular resolution in spatial transcriptomics." width="100%" />
+<img src="resources/images/10c-spatial-transcriptomics_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_461_0.png" alt="A trade-off exists between the cellular resolution and molecular resolution in spatial transcriptomics." width="100%" />
 
 In addition to the differences in cellular and molecular resolution, there are fundamental differences in the chemistry used to count the RNA transcripts in the tissue [@wang2021spatial; @yue2023guidebook]. Capture or hybridization of RNA followed by sequencing, and fluorescent imaging, are two of the most common techniques used in ST methods. Because of the large diversity in resolution and chemical procedures among ST technologies, data collection workflows are equally diverse. Finally, each study poses specific questions that cannot be addressed with traditional scRNA-seq pipelines, requiring customized workflows.
 
diff --git a/docs/no_toc/11-chromatin.md b/docs/no_toc/11-chromatin.md
index 9c4021da..dbd2ed6d 100644
--- a/docs/no_toc/11-chromatin.md
+++ b/docs/no_toc/11-chromatin.md
@@ -11,7 +11,7 @@ In its existing form, this chapter has been written with AI and still needs furt
 
 ## Learning Objectives
 
-<img src="resources/images/11-chromatin_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_0.png" title="This chapter will demonstrate how to: Understand the goals and data collection processes for chromatin assays. Compare and contrast ATAC-seq, Single cell ATAC-seq, ChIP-seq, CUT&amp;RUN and CUT&amp;Tag." alt="This chapter will demonstrate how to: Understand the goals and data collection processes for chromatin assays. Compare and contrast ATAC-seq, Single cell ATAC-seq, ChIP-seq, CUT&amp;RUN and CUT&amp;Tag." width="100%" />
+<img src="resources/images/11-chromatin_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_0.png" alt="This chapter will demonstrate how to: Understand the goals and data collection processes for chromatin assays. Compare and contrast ATAC-seq, Single cell ATAC-seq, ChIP-seq, CUT&amp;RUN and CUT&amp;Tag." width="100%" />
 
 ## Why are people interested in chromatin?
 
@@ -41,7 +41,7 @@ Therefore, understanding the mechanisms that regulate chromatin structure and fu
 
 ## Comparison of technologies
 
-<img src="resources/images/11-chromatin_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_5.png" title="A table that compares all the technologies:" alt="A table that compares all the technologies:" width="100%" />
+<img src="resources/images/11-chromatin_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_5.png" alt="A table that compares all the technologies:" width="100%" />
 
 ### ATAC-seq:
 
diff --git a/docs/no_toc/11a-ATAC-Seq.md b/docs/no_toc/11a-ATAC-Seq.md
index cbf404c9..cf9acd53 100644
--- a/docs/no_toc/11a-ATAC-Seq.md
+++ b/docs/no_toc/11a-ATAC-Seq.md
@@ -9,28 +9,28 @@ This chapter is incomplete! If you wish to contribute, please [go to this form](
 
 ## Learning Objectives
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_66.png" title="Learning objectives This chapter will demonstrate how to: Understand the basics of ATAC-Seq data collection and processing workflow. Identify the next steps for your particular ATAC-Seq data. Formulate questions to ask about your ATAC-Seq data" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of ATAC-Seq data collection and processing workflow. Identify the next steps for your particular ATAC-Seq data. Formulate questions to ask about your ATAC-Seq data" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_66.png" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of ATAC-Seq data collection and processing workflow. Identify the next steps for your particular ATAC-Seq data. Formulate questions to ask about your ATAC-Seq data" width="100%" />
 
 ## What are the goals of ATAC-Seq analysis?
 
 The goal of ATAC-seq is to identify the accessible regions of the genome in a particular set of samples. These data allow us to relate chromatin accessibility patterns to cell states, and to understand the mechanistic causes and consequences of those patterns.
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_23.png" title="What does accessibility to chromatin represent? In ATAC-seq we are able to sequence open chromatin and find out DNA sequences where chromatin is accessible for activity. " alt="What does accessibility to chromatin represent? In ATAC-seq we are able to sequence open chromatin and find out DNA sequences where chromatin is accessible for activity. " width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_23.png" alt="What does accessibility to chromatin represent? In ATAC-seq we are able to sequence open chromatin and find out DNA sequences where chromatin is accessible for activity. " width="100%" />
 
 ATAC-seq data is generated by fragmenting the genome with the Tn5 transposase and sequencing the shorter DNA fragments. While most of the genome is associated with protein complexes that preclude the digestion of DNA by Tn5, some regions of the genome have accessible chromatin that can be cleaved by Tn5, resulting in short (<500bp) fragments. These regions of the genome are of biological interest as they are likely to harbor transcription factor binding sites and to constitute cis-regulatory elements, genomic regions that are involved in the regulation of gene expression.
 
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png" title="Schematic of how Tn5 fragments open chromatin + inserts adapters. This step is important for the quick protocol and low required cell inputs of ATAC-seq" alt="Schematic of how Tn5 fragments open chromatin + inserts adapters. This step is important for the quick protocol and low required cell inputs of ATAC-seq" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png" alt="Schematic of how Tn5 fragments open chromatin + inserts adapters. This step is important for the quick protocol and low required cell inputs of ATAC-seq" width="100%" />
 
 ### What questions can be answered with ATAC-seq?
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_0.png" title="What types of questions can we ask with ATAC-seq?What regions of the genome have accessible chromatin? How does accessibility differ between biological samples or change over time? What transcription factor motifs or transcription factor footprints can be found at accessible regions of interest?" alt="What types of questions can we ask with ATAC-seq?What regions of the genome have accessible chromatin? How does accessibility differ between biological samples or change over time? What transcription factor motifs or transcription factor footprints can be found at accessible regions of interest?" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_0.png" alt="What types of questions can we ask with ATAC-seq?What regions of the genome have accessible chromatin? How does accessibility differ between biological samples or change over time? What transcription factor motifs or transcription factor footprints can be found at accessible regions of interest?" width="100%" />
 
 ## ATAC-Seq general workflow overview
 
 A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means.
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png" title="A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means." alt="A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means." width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png" alt="A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means." width="100%" />
 
 ### Data quality metrics:
 
@@ -38,13 +38,13 @@ A basic ATAC-seq workflow involves mapping sequence reads to the genome, identif
 
 #### Sequencing considerations:
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_57.png" title="Single end sequencing. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting" alt="Single end sequencing. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_57.png" alt="Single end sequencing. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting" width="100%" />
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_63.png" title="Single vs. paired end sequencing. Single. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting. Read length &amp; read depth. 75bp or more read length (keep in mind nucleosomes are 147bp). ~50 million reads/sample usually recommended" alt="Single vs. paired end sequencing. Single. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting. Read length &amp; read depth. 75bp or more read length (keep in mind nucleosomes are 147bp). ~50 million reads/sample usually recommended" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_63.png" alt="Single vs. paired end sequencing. Single. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting. Read length &amp; read depth. 75bp or more read length (keep in mind nucleosomes are 147bp). ~50 million reads/sample usually recommended" width="100%" />
 
 #### Pre-alignment QC:
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png" title="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" alt="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png" alt="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" width="100%" />
 
 A tool such as FastQC should be used to check GC content, read quality and length, and primer or adapter contamination prior to alignment. Trimmomatic is a useful tool for removing primer and adapter sequences if they are present. ATAC-seq experiments should be sequenced with paired-end sequencing, and existing pipelines will expect paired-end input (two files, `*_R1.fastq` and `*_R2.fastq`).
 
@@ -61,7 +61,7 @@ As for all DNA-sequencing based genomics technologies, a sufficient number of ma
 
 #### Post-alignment QC:
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png" title="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" alt="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png" alt="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" width="100%" />
 
 After alignment, check the percentage of matched, unmatched, unpaired, and duplicated reads. Duplicated or unmatched reads should be filtered out.
 [Picard](https://broadinstitute.github.io/picard/) is a useful tool for this step.
@@ -71,7 +71,7 @@ Reads on the + strand should be shifted +4bp, reads on the - strand should be sh
 
 ATAC-seq data is often generated using paired-end sequencing technologies, which allow the lengths of ATAC-seq fragments to be characterized. Histograms of fragment length using single base pair bins reveal patterns of enrichment at the nucleosome scale of 147bp and the DNA-helix scale of ~10.5bp.
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_43.png" title="Considerations for quality data: QC checkpoints. Pre-sequencing. Library distribution" alt="Considerations for quality data: QC checkpoints. Pre-sequencing. Library distribution" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_43.png" alt="Considerations for quality data: QC checkpoints. Pre-sequencing. Library distribution" width="100%" />
 
 When comparing ATAC-seq samples, it is important to consider the fragment size distributions of the samples being compared. Differences in the distributions could lead to results that are unrelated to biology.
 
@@ -81,7 +81,7 @@ When comparing ATAC-seq samples, it is important to consider the fragment size d
 ATAC-seq peak calling typically makes use of analysis tools developed for ChIP-seq. MACS2 is one of the most common choices for a peak calling tool, but HOMER or other common ChIP-seq peak callers are also acceptable.
 An input sample is not typically generated for ATAC-seq as it would be for a ChIP-seq experiment, so the major requirement for the peak caller is that it does not require the input control to call peaks.
 
-<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_26.png" title="Overview of ATAC-seq data analysis pipeline" alt="Overview of ATAC-seq data analysis pipeline" width="100%" />
+<img src="resources/images/11a-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_26.png" alt="Overview of ATAC-seq data analysis pipeline" width="100%" />
 #### Number of peaks:
 
 Although the number of accessible chromatin regions can vary from one cell type to another, there are several regions that appear to be constitutively accessible across most cell types. At least 20,000 peaks can be identified in a high quality experiment. The deeper the sequencing, the more peaks will be detected in an ATAC-seq experiment. At a very high sequencing depth, some of the statistically significant peaks might not be of biological interest. When analyzing such data sets, the fold enrichment relative to background, or the absolute peak signal, ought to be taken into account in addition to statistical significance.
diff --git a/docs/no_toc/11b-sc-ATAC-Seq.md b/docs/no_toc/11b-sc-ATAC-Seq.md
index c4d2464a..4a39aa53 100644
--- a/docs/no_toc/11b-sc-ATAC-Seq.md
+++ b/docs/no_toc/11b-sc-ATAC-Seq.md
@@ -9,7 +9,7 @@ This chapter is incomplete! If you wish to contribute, please [go to this form](
 
 ## Learning Objectives
 
-<img src="resources/images/11b-sc-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_41.png" title="Learning objectives This chapter will demonstrate how to: Understand the basics of single cell ATAC-Seq data collection and processing workflow Identify the next steps for your particular single cell ATAC-Seq data. Formulate questions to ask about your single cell ATAC-Seq data" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of single cell ATAC-Seq data collection and processing workflow Identify the next steps for your particular single cell ATAC-Seq data. Formulate questions to ask about your single cell ATAC-Seq data" width="100%" />
+<img src="resources/images/11b-sc-ATAC-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_41.png" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of single cell ATAC-Seq data collection and processing workflow Identify the next steps for your particular single cell ATAC-Seq data. Formulate questions to ask about your single cell ATAC-Seq data" width="100%" />
 
 ## What are the goals of scATAC-seq analysis?
 
diff --git a/docs/no_toc/11c-ChIP-Seq.md b/docs/no_toc/11c-ChIP-Seq.md
index 14b4c891..ac6aaa4c 100644
--- a/docs/no_toc/11c-ChIP-Seq.md
+++ b/docs/no_toc/11c-ChIP-Seq.md
@@ -9,12 +9,12 @@ This chapter is in a beta stage. If you wish to contribute, please [go to this f
 
 ## Learning Objectives
 
-<img src="resources/images/11c-ChIP-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_61.png" title="Learning objectives This chapter will demonstrate how to: Understand the basics of ChIP-Seq data collection and processing workflow. Identify the next steps for your particular ChIP-Seq data. Formulate questions to ask about your ChIP-Seq data" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of ChIP-Seq data collection and processing workflow. Identify the next steps for your particular ChIP-Seq data. Formulate questions to ask about your ChIP-Seq data" width="100%" />
+<img src="resources/images/11c-ChIP-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_61.png" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of ChIP-Seq data collection and processing workflow. Identify the next steps for your particular ChIP-Seq data. Formulate questions to ask about your ChIP-Seq data" width="100%" />
 
 ## What are the goals of ChIP-Seq analysis?
 
 
-<img src="resources/images/11c-ChIP-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_18.png" title="The goal of ChIP-seq is to identify, for a particular DNA binding protein, all of the DNA sequences that it binds to." alt="The goal of ChIP-seq is to identify, for a particular DNA binding protein, all of the DNA sequences that it binds to." width="100%" />
+<img src="resources/images/11c-ChIP-Seq_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_18.png" alt="The goal of ChIP-seq is to identify, for a particular DNA binding protein, all of the DNA sequences that it binds to." width="100%" />
 
 ChIP-Seq (chromatin immunoprecipitation sequencing) and related approaches are used to identify genome-wide binding sites of specific proteins or protein complexes. Given the diversity of interactions at the DNA-protein interface, sequencing-based methods for targeted chromatin capture have evolved to meet precise research needs and improve the quality of the results. Specifically, ChIP-Seq builds on protein immunoprecipitation techniques (IP) by applying next generation sequencing to a pulldown product. IP followed by sequencing can be applied to any nucleic-acid binding protein for which an antibody is available, including a known or putative transcription factor (TF), chromatin remodeler or histone modifications, or other DNA- or chromatin-specific factors. ChIP-Seq approaches have been honed to increase signal-to-noise, reduce input material, and more specifically map protein-DNA interactions, for example by treating the IP product with an exonuclease that chews back unprotected DNA ends (e.g. ChIP-exo).
 
diff --git a/docs/no_toc/12-methylation.md b/docs/no_toc/12-methylation.md
index 34bed4c1..c227800f 100644
--- a/docs/no_toc/12-methylation.md
+++ b/docs/no_toc/12-methylation.md
@@ -9,7 +9,7 @@ This chapter is incomplete! If you wish to contribute, please [go to this form](
 
 ## Learning Objectives
 
-<img src="resources/images/12-methylation_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_91.png" title="This chapter will demonstrate how to: Understand the basics of bisulfite sequencing data collection and processing workflow. Identify the next steps for your particular bisulfite  sequencing data. Formulate questions to ask about your bisulfite sequencing data" alt="This chapter will demonstrate how to: Understand the basics of bisulfite sequencing data collection and processing workflow. Identify the next steps for your particular bisulfite  sequencing data. Formulate questions to ask about your bisulfite sequencing data" width="100%" />
+<img src="resources/images/12-methylation_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_91.png" alt="This chapter will demonstrate how to: Understand the basics of bisulfite sequencing data collection and processing workflow. Identify the next steps for your particular bisulfite  sequencing data. Formulate questions to ask about your bisulfite sequencing data" width="100%" />
 
 ## What are the goals of analyzing DNA methylation?
 
@@ -47,7 +47,7 @@ Because of this, its been proposed that the most appropriate way to model these
 
 ## Methylation data workflow
 
-<img src="resources/images/12-methylation_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_5.png" title="In a very general sense, methylation workflow involves sequence quality control and genome alignment like many other sequencing methods. But next, the data needs to be used to identify methylation calls and calculations of methylation fractions. Lastly, you will likely want to group the methylated bases together to identify what regions of the genome are differentially methylated and of interest. " alt="In a very general sense, methylation workflow involves sequence quality control and genome alignment like many other sequencing methods. But next, the data needs to be used to identify methylation calls and calculations of methylation fractions. Lastly, you will likely want to group the methylated bases together to identify what regions of the genome are differentially methylated and of interest. " width="100%" />
+<img src="resources/images/12-methylation_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_5.png" alt="In a very general sense, methylation workflow involves sequence quality control and genome alignment like many other sequencing methods. But next, the data needs to be used to identify methylation calls and calculations of methylation fractions. Lastly, you will likely want to group the methylated bases together to identify what regions of the genome are differentially methylated and of interest. " width="100%" />
 
 Like other sequencing methods, you will first need to start with quality control checks. Next, you will need to align your sequences to the genome. Then, using the base calls, you will need to make methylation calls, that is, determine which bases are methylated and which are not. The details of this step depend on whether you are measuring 5mC and/or 5hmC. Lastly, you will likely want to use your methylation calls as a whole to identify differentially methylated regions of interest.
 
diff --git a/docs/no_toc/13-microbiome.md b/docs/no_toc/13-microbiome.md
index fc0707a4..95c4a3f1 100644
--- a/docs/no_toc/13-microbiome.md
+++ b/docs/no_toc/13-microbiome.md
@@ -9,7 +9,8 @@ This chapter is incomplete! If you wish to contribute, please [go to this form](
 
 ## Learning Objectives
 
-<img src="resources/images/13-microbiome_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_0_0.png" title="Learning Objectives" alt="Learning Objectives" width="100%" />
+<img src="resources/images/13-microbiome_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_0_0.png" alt="Learning Objectives" width="100%" />
+
 ## A Brief Introduction to Microbiomes
 
 
@@ -22,14 +23,16 @@ Microbes are everywhere. We have found these tiny organisms in the deepest regio
 
  If we looked hard enough, I think we’d find them on the surface of the moon and Mars, though they are probably microbes who stowed away on our spacecraft and are now patiently waiting for a drop of water that may or may not ever show up. If we ever colonize those worlds, microbes will be an indispensable ally in creating an environment that could sustain us.
 
-<img src="resources/images/13-microbiome_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g26ebab787e9_0_0.png" title="Learning Objectives" alt="Learning Objectives" width="100%" />
+<img src="resources/images/13-microbiome_files/figure-html//1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g26ebab787e9_0_0.png" alt="Learning Objectives" width="100%" />
 This figure is adapted from [@Tignat-Perrier2022] under Creative Commons license.
 
 Microbes almost never live alone in the real world (i.e., outside of a laboratory). Rather they exist in communities of different species who are interacting with each other and their environment. Some of these communities will have many different types of organisms, and some will have only a few. Because of the large number of species and individuals involved, no two communities will ever be exactly alike, and quantifying differences between microbial communities is an important area of research at the moment. The types of interactions between organisms are also highly varied. These can include mutualistic relationships, where both organisms benefit from the interaction; parasitic relationships, where one organism exclusively benefits to the detriment of the other; and the full gradient in between.
 
 Microbiome science is everywhere. There are tens of articles published daily in the scientific literature, and many popular science articles and books present these findings to the world of non-scientists. Understanding the promises and limitations of the methods of microbiome science can help avoid misconceptions about microbiome research, and it’s important for practitioners of microbiome science to understand and convey the promise and limitations of our field. Misconceptions abound, frequently arising from the same sources as high-quality popular science microbiome reporting.
 
-    For example, on 5 Feb 2015 an article appeared in the New York Times noting (almost offhand) that Yersinia pestis, the organism responsible for Bubonic plague, had been found in multiple locations throughout the New York City subway system as part of its normal built environment microbiome. This was rapidly followed up on 6 Feb 2015 with an article noting that there was probably not Bubonic plague on the subway system after all, but rather that the approaches used by the research team are limited in their taxonomic resolution, and that likely a harmless close relative of Y. pestis was observed: “What the researchers probably found, [a spokesman for the university where the study originated] said, was bacteria from an unknown species or from organisms that happened to share some gene sequences with the plague bacterium…”.
+```
+For example, on 5 Feb 2015 an article appeared in the New York Times noting (almost offhand) that Yersinia pestis, the organism responsible for Bubonic plague, had been found in multiple locations throughout the New York City subway system as part of its normal built environment microbiome. This was rapidly followed up on 6 Feb 2015 with an article noting that there was probably not Bubonic plague on the subway system after all, but rather that the approaches used by the research team are limited in their taxonomic resolution, and that likely a harmless close relative of Y. pestis was observed: “What the researchers probably found, [a spokesman for the university where the study originated] said, was bacteria from an unknown species or from organisms that happened to share some gene sequences with the plague bacterium…”.
+```
 
  As microbiome services and products are increasingly marketed directly to the public, consumers of microbiome research findings, products, and services need to know how to critically evaluate these offerings and their associated claims. As practitioners in the field, we can help by ensuring that the methods we apply are appropriate and reliable, and that we make our work accessible.
 
diff --git a/docs/no_toc/404.html b/docs/no_toc/404.html
index 44f92ae3..e7a12887 100644
--- a/docs/no_toc/404.html
+++ b/docs/no_toc/404.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Page not found | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Page not found | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 
 
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Page not found | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -548,10 +542,17 @@ <h1>Page not found</h1>
 the table of contents to find the page you are looking for.</p>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -621,7 +622,7 @@ <h1>Page not found</h1>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/About.md b/docs/no_toc/About.md
index d54c3981..f3a6ba4f 100644
--- a/docs/no_toc/About.md
+++ b/docs/no_toc/About.md
@@ -31,76 +31,95 @@ These credits are based on our [course contributors table guidelines](https://gi
 
 ```
 ## ─ Session info ───────────────────────────────────────────────────────────────
-##  setting  value                       
-##  version  R version 4.0.2 (2020-06-22)
-##  os       Ubuntu 20.04.5 LTS          
-##  system   x86_64, linux-gnu           
-##  ui       X11                         
-##  language (EN)                        
-##  collate  en_US.UTF-8                 
-##  ctype    en_US.UTF-8                 
-##  tz       Etc/UTC                     
-##  date     2024-05-23                  
+##  setting  value
+##  version  R version 4.3.2 (2023-10-31)
+##  os       Ubuntu 22.04.4 LTS
+##  system   x86_64, linux-gnu
+##  ui       X11
+##  language (EN)
+##  collate  en_US.UTF-8
+##  ctype    en_US.UTF-8
+##  tz       Etc/UTC
+##  date     2024-12-11
+##  pandoc   3.1.1 @ /usr/local/bin/ (via rmarkdown)
 ## 
 ## ─ Packages ───────────────────────────────────────────────────────────────────
-##  package     * version date       lib source                            
-##  askpass       1.1     2019-01-13 [1] RSPM (R 4.0.3)                    
-##  assertthat    0.2.1   2019-03-21 [1] RSPM (R 4.0.5)                    
-##  bookdown      0.24    2024-03-13 [1] Github (rstudio/bookdown@88bc4ea) 
-##  bslib         0.6.1   2023-11-28 [1] CRAN (R 4.0.2)                    
-##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.0.2)                    
-##  callr         3.5.0   2020-10-08 [1] RSPM (R 4.0.2)                    
-##  cli           3.6.2   2023-12-11 [1] CRAN (R 4.0.2)                    
-##  crayon        1.3.4   2017-09-16 [1] RSPM (R 4.0.0)                    
-##  desc          1.2.0   2018-05-01 [1] RSPM (R 4.0.3)                    
-##  devtools      2.3.2   2020-09-18 [1] RSPM (R 4.0.3)                    
-##  digest        0.6.25  2020-02-23 [1] RSPM (R 4.0.0)                    
-##  ellipsis      0.3.1   2020-05-15 [1] RSPM (R 4.0.3)                    
-##  evaluate      0.23    2023-11-01 [1] CRAN (R 4.0.2)                    
-##  fansi         0.4.1   2020-01-08 [1] RSPM (R 4.0.0)                    
-##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.0.2)                    
-##  fs            1.5.0   2020-07-31 [1] RSPM (R 4.0.3)                    
-##  glue          1.4.2   2020-08-27 [1] RSPM (R 4.0.5)                    
-##  hms           0.5.3   2020-01-08 [1] RSPM (R 4.0.0)                    
-##  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.0.2)                    
-##  httr          1.4.2   2020-07-20 [1] RSPM (R 4.0.3)                    
-##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.0.2)                    
-##  jsonlite      1.7.1   2020-09-07 [1] RSPM (R 4.0.2)                    
-##  knitr         1.33    2024-03-13 [1] Github (yihui/knitr@a1052d1)      
-##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.0.2)                    
-##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.0.2)                    
-##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.0.2)                    
-##  openssl       1.4.3   2020-09-18 [1] RSPM (R 4.0.3)                    
-##  ottrpal       1.2.1   2024-03-13 [1] Github (jhudsl/ottrpal@48e8c44)   
-##  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.0.2)                    
-##  pkgbuild      1.1.0   2020-07-13 [1] RSPM (R 4.0.2)                    
-##  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.0.3)                    
-##  pkgload       1.1.0   2020-05-29 [1] RSPM (R 4.0.3)                    
-##  prettyunits   1.1.1   2020-01-24 [1] RSPM (R 4.0.3)                    
-##  processx      3.4.4   2020-09-03 [1] RSPM (R 4.0.2)                    
-##  ps            1.4.0   2020-10-07 [1] RSPM (R 4.0.2)                    
-##  R6            2.4.1   2019-11-12 [1] RSPM (R 4.0.0)                    
-##  readr         1.4.0   2020-10-05 [1] RSPM (R 4.0.2)                    
-##  remotes       2.2.0   2020-07-21 [1] RSPM (R 4.0.3)                    
-##  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.0.2)                    
-##  rmarkdown     2.10    2024-03-13 [1] Github (rstudio/rmarkdown@02d3c25)
-##  rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.0.2)                    
-##  sass          0.4.8   2023-12-06 [1] CRAN (R 4.0.2)                    
-##  sessioninfo   1.1.1   2018-11-05 [1] RSPM (R 4.0.3)                    
-##  stringi       1.5.3   2020-09-09 [1] RSPM (R 4.0.3)                    
-##  stringr       1.4.0   2019-02-10 [1] RSPM (R 4.0.3)                    
-##  testthat      3.0.1   2024-03-13 [1] Github (R-lib/testthat@e99155a)   
-##  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.0.2)                    
-##  usethis       1.6.3   2020-09-17 [1] RSPM (R 4.0.2)                    
-##  utf8          1.1.4   2018-05-24 [1] RSPM (R 4.0.3)                    
-##  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.0.2)                    
-##  withr         2.3.0   2020-09-22 [1] RSPM (R 4.0.2)                    
-##  xfun          0.26    2024-03-13 [1] Github (yihui/xfun@74c2a66)       
-##  xml2          1.3.2   2020-04-23 [1] RSPM (R 4.0.3)                    
-##  yaml          2.2.1   2020-02-01 [1] RSPM (R 4.0.3)                    
+##  package     * version date (UTC) lib source
+##  askpass       1.2.0   2023-09-03 [1] RSPM (R 4.3.0)
+##  bookdown      0.41    2024-10-16 [1] CRAN (R 4.3.2)
+##  bslib         0.6.1   2023-11-28 [1] RSPM (R 4.3.0)
+##  cachem        1.0.8   2023-05-01 [1] RSPM (R 4.3.0)
+##  chromote      0.3.1   2024-08-30 [1] CRAN (R 4.3.2)
+##  cli           3.6.2   2023-12-11 [1] RSPM (R 4.3.0)
+##  devtools      2.4.5   2022-10-11 [1] RSPM (R 4.3.0)
+##  digest        0.6.34  2024-01-11 [1] RSPM (R 4.3.0)
+##  dplyr         1.1.4   2023-11-17 [1] RSPM (R 4.3.0)
+##  ellipsis      0.3.2   2021-04-29 [1] RSPM (R 4.3.0)
+##  evaluate      0.23    2023-11-01 [1] RSPM (R 4.3.0)
+##  fansi         1.0.6   2023-12-08 [1] RSPM (R 4.3.0)
+##  fastmap       1.1.1   2023-02-24 [1] RSPM (R 4.3.0)
+##  fs            1.6.3   2023-07-20 [1] RSPM (R 4.3.0)
+##  generics      0.1.3   2022-07-05 [1] RSPM (R 4.3.0)
+##  glue          1.7.0   2024-01-09 [1] RSPM (R 4.3.0)
+##  hms           1.1.3   2023-03-21 [1] RSPM (R 4.3.0)
+##  htmltools     0.5.7   2023-11-03 [1] RSPM (R 4.3.0)
+##  htmlwidgets   1.6.4   2023-12-06 [1] RSPM (R 4.3.0)
+##  httpuv        1.6.14  2024-01-26 [1] RSPM (R 4.3.0)
+##  httr          1.4.7   2023-08-15 [1] RSPM (R 4.3.0)
+##  janitor       2.2.0   2023-02-02 [1] RSPM (R 4.3.0)
+##  jquerylib     0.1.4   2021-04-26 [1] RSPM (R 4.3.0)
+##  jsonlite      1.8.8   2023-12-04 [1] RSPM (R 4.3.0)
+##  knitr         1.48    2024-07-07 [1] CRAN (R 4.3.2)
+##  later         1.3.2   2023-12-06 [1] RSPM (R 4.3.0)
+##  lifecycle     1.0.4   2023-11-07 [1] RSPM (R 4.3.0)
+##  lubridate     1.9.3   2023-09-27 [1] RSPM (R 4.3.0)
+##  magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.3.0)
+##  memoise       2.0.1   2021-11-26 [1] RSPM (R 4.3.0)
+##  mime          0.12    2021-09-28 [1] RSPM (R 4.3.0)
+##  miniUI        0.1.1.1 2018-05-18 [1] RSPM (R 4.3.0)
+##  openssl       2.1.1   2023-09-25 [1] RSPM (R 4.3.0)
+##  ottrpal       1.3.0   2024-10-23 [1] Github (jhudsl/ottrpal@2e19782)
+##  pillar        1.9.0   2023-03-22 [1] RSPM (R 4.3.0)
+##  pkgbuild      1.4.3   2023-12-10 [1] RSPM (R 4.3.0)
+##  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.3.0)
+##  pkgload       1.3.4   2024-01-16 [1] RSPM (R 4.3.0)
+##  processx      3.8.3   2023-12-10 [1] RSPM (R 4.3.0)
+##  profvis       0.3.8   2023-05-02 [1] RSPM (R 4.3.0)
+##  promises      1.2.1   2023-08-10 [1] RSPM (R 4.3.0)
+##  ps            1.7.6   2024-01-18 [1] RSPM (R 4.3.0)
+##  purrr         1.0.2   2023-08-10 [1] RSPM (R 4.3.0)
+##  R6            2.5.1   2021-08-19 [1] RSPM (R 4.3.0)
+##  Rcpp          1.0.12  2024-01-09 [1] RSPM (R 4.3.0)
+##  readr         2.1.5   2024-01-10 [1] RSPM (R 4.3.0)
+##  remotes       2.4.2.1 2023-07-18 [1] RSPM (R 4.3.0)
+##  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.3.2)
+##  rmarkdown     2.25    2023-09-18 [1] RSPM (R 4.3.0)
+##  rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.3.2)
+##  sass          0.4.8   2023-12-06 [1] RSPM (R 4.3.0)
+##  sessioninfo   1.2.2   2021-12-06 [1] RSPM (R 4.3.0)
+##  shiny         1.8.0   2023-11-17 [1] RSPM (R 4.3.0)
+##  snakecase     0.11.1  2023-08-27 [1] RSPM (R 4.3.0)
+##  stringi       1.8.3   2023-12-11 [1] RSPM (R 4.3.0)
+##  stringr       1.5.1   2023-11-14 [1] RSPM (R 4.3.0)
+##  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.2)
+##  tidyselect    1.2.0   2022-10-10 [1] RSPM (R 4.3.0)
+##  timechange    0.3.0   2024-01-18 [1] RSPM (R 4.3.0)
+##  tzdb          0.4.0   2023-05-12 [1] RSPM (R 4.3.0)
+##  urlchecker    1.0.1   2021-11-30 [1] RSPM (R 4.3.0)
+##  usethis       2.2.3   2024-02-19 [1] RSPM (R 4.3.0)
+##  utf8          1.2.4   2023-10-22 [1] RSPM (R 4.3.0)
+##  vctrs         0.6.5   2023-12-01 [1] RSPM (R 4.3.0)
+##  webshot2      0.1.1   2023-08-11 [1] CRAN (R 4.3.2)
+##  websocket     1.4.2   2024-07-22 [1] CRAN (R 4.3.2)
+##  xfun          0.48    2024-10-03 [1] CRAN (R 4.3.2)
+##  xml2          1.3.6   2023-12-04 [1] RSPM (R 4.3.0)
+##  xtable        1.8-4   2019-04-21 [1] RSPM (R 4.3.0)
+##  yaml          2.3.8   2023-12-11 [1] RSPM (R 4.3.0)
 ## 
-## [1] /usr/local/lib/R/site-library
-## [2] /usr/local/lib/R/library
+##  [1] /usr/local/lib/R/site-library
+##  [2] /usr/local/lib/R/library
+## 
+## ──────────────────────────────────────────────────────────────────────────────
 ```
 
 <!-- Author information -->
diff --git a/docs/no_toc/a-very-general-genomics-overview.html b/docs/no_toc/a-very-general-genomics-overview.html
index 2e0130e8..4fb925c3 100644
--- a/docs/no_toc/a-very-general-genomics-overview.html
+++ b/docs/no_toc/a-very-general-genomics-overview.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 2 A Very General Genomics Overview | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 2 A Very General Genomics Overview | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="introduction.html"/>
 <link rel="next" href="guidelines-for-good-metadata.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 2 A Very General Genomics Overview | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,27 +535,27 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="a-very-general-genomics-overview" class="section level1" number="2">
-<h1><span class="header-section-number">Chapter 2</span> A Very General Genomics Overview</h1>
-<div id="learning-objectives" class="section level2" number="2.1">
-<h2><span class="header-section-number">2.1</span> Learning Objectives</h2>
-<p><img src="resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_16.png" title="Learning objectives This chapter will demonstrate how to: Understand what will be covered in this course. Find information about your particular file format" alt="Learning objectives This chapter will demonstrate how to: Understand what will be covered in this course. Find information about your particular file format" width="100%" /></p>
+<div id="a-very-general-genomics-overview" class="section level1 hasAnchor" number="2">
+<h1><span class="header-section-number">Chapter 2</span> A Very General Genomics Overview<a href="a-very-general-genomics-overview.html#a-very-general-genomics-overview" class="anchor-section" aria-label="Anchor link to header"></a></h1>
+<div id="learning-objectives" class="section level2 hasAnchor" number="2.1">
+<h2><span class="header-section-number">2.1</span> Learning Objectives<a href="a-very-general-genomics-overview.html#learning-objectives" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_16.png" alt="Learning objectives This chapter will demonstrate how to: Understand what will be covered in this course. Find information about your particular file format" width="100%" /></p>
 <p>In this chapter we will cover sequencing and microarray workflows at a very general, high level to give you a first orientation. As we dive into specific data types and experiments, we will get into more specifics.
 Here we will cover the most common file formats. If you are dealing with a file format that you don’t see listed here, it may be specific to your data type, and we will discuss it in that data type’s respective chapter. We still suggest you go through this chapter to get a basic understanding of what all genomic data types and workflows have in common.</p>
-<div id="what-do-genomics-workflows-look-like" class="section level3" number="2.1.1">
-<h3><span class="header-section-number">2.1.1</span> What do genomics workflows look like?</h3>
+<div id="what-do-genomics-workflows-look-like" class="section level3 hasAnchor" number="2.1.1">
+<h3><span class="header-section-number">2.1.1</span> What do genomics workflows look like?<a href="a-very-general-genomics-overview.html#what-do-genomics-workflows-look-like" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>In the most general sense, all genomics data is raw when originally collected and needs to undergo processing to be normalized and ready to use. The normalized data is then generally summarized so that it is ready to be further consumed. Lastly, this summarized data is what can be used to make inferences and create plots and results tables.</p>
-<p><img src="resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_20.png" title="In the most general sense, all genomics data when originally collected is raw, it needs to undergo processing to be normalized and ready to use. Then normalized data is generally summarized in a way that is ready for it to be further consumed. Lastly this summarized data is what can be used to make inferences and create plots and results tables. " alt="In the most general sense, all genomics data when originally collected is raw, it needs to undergo processing to be normalized and ready to use. Then normalized data is generally summarized in a way that is ready for it to be further consumed. Lastly this summarized data is what can be used to make inferences and create plots and results tables. " width="100%" /></p>
+<p><img src="resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_20.png" alt="In the most general sense, all genomics data when originally collected is raw, it needs to undergo processing to be normalized and ready to use. Then normalized data is generally summarized in a way that is ready for it to be further consumed. Lastly this summarized data is what can be used to make inferences and create plots and results tables. " width="100%" /></p>
 </div>
-<div id="basic-file-formats" class="section level3" number="2.1.2">
-<h3><span class="header-section-number">2.1.2</span> Basic file formats</h3>
+<div id="basic-file-formats" class="section level3 hasAnchor" number="2.1.2">
+<h3><span class="header-section-number">2.1.2</span> Basic file formats<a href="a-very-general-genomics-overview.html#basic-file-formats" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Before we get into bioinformatic file types, we should establish some general file types that you likely have already worked with on your computer. These file types are used in all kinds of applications and not specific to bioinformatics.</p>
-<div id="txt---text" class="section level4" number="2.1.2.1">
-<h4><span class="header-section-number">2.1.2.1</span> TXT - Text</h4>
+<div id="txt---text" class="section level4 hasAnchor" number="2.1.2.1">
+<h4><span class="header-section-number">2.1.2.1</span> TXT - Text<a href="a-very-general-genomics-overview.html#txt---text" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A text file is a very basic file format that contains text!</p>
 </div>
-<div id="tsv---tab-separated-values" class="section level4" number="2.1.2.2">
-<h4><span class="header-section-number">2.1.2.2</span> TSV - Tab Separated Values</h4>
+<div id="tsv---tab-separated-values" class="section level4 hasAnchor" number="2.1.2.2">
+<h4><span class="header-section-number">2.1.2.2</span> TSV - Tab Separated Values<a href="a-very-general-genomics-overview.html#tsv---tab-separated-values" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A tab separated values file is a text file that is good for storing a data table.
 It has rows and columns where each value is separated by (you guessed it), <em>tabs</em>.
 Most commonly, if your genomics data has been provided to you in a TSV or CSV file, it has been processed and summarized! It will be your job to know <em>how</em> it was processed and summarized.</p>
@@ -570,8 +564,8 @@ <h4><span class="header-section-number">2.1.2.2</span> TSV - Tab Separated Value
 gene_a⇥12⇥15
 gene_b⇥13⇥14</code></pre>
 </div>
-<div id="csv---comma-separated-values" class="section level4" number="2.1.2.3">
-<h4><span class="header-section-number">2.1.2.3</span> CSV - Comma Separated Values</h4>
+<div id="csv---comma-separated-values" class="section level4 hasAnchor" number="2.1.2.3">
+<h4><span class="header-section-number">2.1.2.3</span> CSV - Comma Separated Values<a href="a-very-general-genomics-overview.html#csv---comma-separated-values" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A comma separated values file is just like a TSV file, but instead of being separated by tabs, the values are separated by… (you guessed it), <em>commas</em>!</p>
 <p>In its raw form, a CSV file might look like our example below (but if you open it with a program for spreadsheets, like Excel or Google Sheets, it will look like a table).</p>
 <pre><code>gene_id, sample_1, sample_2
@@ -579,25 +573,25 @@ <h4><span class="header-section-number">2.1.2.3</span> CSV - Comma Separated Val
 gene_b, 13, 14</code></pre>
 </div>
 </div>
-<div id="sequencing-file-formats" class="section level3" number="2.1.3">
-<h3><span class="header-section-number">2.1.3</span> Sequencing file formats</h3>
-<div id="sam---sequence-alignment-map" class="section level4" number="2.1.3.1">
-<h4><span class="header-section-number">2.1.3.1</span> SAM - Sequence Alignment Map</h4>
+<div id="sequencing-file-formats" class="section level3 hasAnchor" number="2.1.3">
+<h3><span class="header-section-number">2.1.3</span> Sequencing file formats<a href="a-very-general-genomics-overview.html#sequencing-file-formats" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="sam---sequence-alignment-map" class="section level4 hasAnchor" number="2.1.3.1">
+<h4><span class="header-section-number">2.1.3.1</span> SAM - Sequence Alignment Map<a href="a-very-general-genomics-overview.html#sam---sequence-alignment-map" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>SAM files are text-based files that contain sequence information, typically reads along with where they align to a reference (unaligned reads can be stored too). The reads have generally not been quantified yet. <a href="https://samtools.github.io/hts-specs/SAMv1.pdf">For more about SAM files</a>.</p>
 </div>
-<div id="bam---binary-alignment-map" class="section level4" number="2.1.3.2">
-<h4><span class="header-section-number">2.1.3.2</span> BAM - Binary Alignment Map</h4>
+<div id="bam---binary-alignment-map" class="section level4 hasAnchor" number="2.1.3.2">
+<h4><span class="header-section-number">2.1.3.2</span> BAM - Binary Alignment Map<a href="a-very-general-genomics-overview.html#bam---binary-alignment-map" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>BAM files are like SAM files but are compressed (made to take up less space on your computer). This means if you double click on a BAM file to look at it, it will look jumbled and unintelligible. You will need to convert it to a SAM file if you want to read it yourself (but this usually isn’t necessary).</p>
 </div>
-<div id="fasta---fast-a" class="section level4" number="2.1.3.3">
-<h4><span class="header-section-number">2.1.3.3</span> FASTA - “fast A”</h4>
+<div id="fasta---fast-a" class="section level4 hasAnchor" number="2.1.3.3">
+<h4><span class="header-section-number">2.1.3.3</span> FASTA - “fast A”<a href="a-very-general-genomics-overview.html#fasta---fast-a" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Fasta files are sequence files that can hold either nucleotide or amino acid sequences. They look something like this (the example below illustrates a nucleotide sequence):</p>
 <pre><code>&gt;SEQ_ID
 GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT</code></pre>
 <p>For <a href="https://en.wikipedia.org/wiki/FASTA_format">more about fasta files</a>.</p>
 </div>
-<div id="fastq---fast-q" class="section level4" number="2.1.3.4">
-<h4><span class="header-section-number">2.1.3.4</span> FASTQ - “Fast q”</h4>
+<div id="fastq---fast-q" class="section level4 hasAnchor" number="2.1.3.4">
+<h4><span class="header-section-number">2.1.3.4</span> FASTQ - “Fast q”<a href="a-very-general-genomics-overview.html#fastq---fast-q" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A Fastq file is like a Fasta file except that it also contains information about the <strong>Q</strong>uality of the read. By quality, we mean: how sure was the sequencing machine that each nucleotide was called correctly?</p>
 <pre><code>@SEQ_ID
 GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
@@ -607,53 +601,53 @@ <h4><span class="header-section-number">2.1.3.4</span> FASTQ - “Fast q”</h4>
 <p>Later in this course we will discuss the importance of examining the quality of your sequencing data and how to do that. If you received your data from a bioinformatics core it is possible that they’ve already done this quality analysis for you.</p>
 <p><em>Sequencing data that is not of high enough quality should not be trusted!</em> It may need to be re-run entirely or may need extra processing (trimming) in order to make it more trustworthy. We will discuss this more in later chapters.</p>
 </div>
-<div id="bcl---binary-base-call-bcl-sequence-file-format" class="section level4" number="2.1.3.5">
-<h4><span class="header-section-number">2.1.3.5</span> BCL - binary base call (BCL) sequence file format</h4>
+<div id="bcl---binary-base-call-bcl-sequence-file-format" class="section level4 hasAnchor" number="2.1.3.5">
+<h4><span class="header-section-number">2.1.3.5</span> BCL - binary base call (BCL) sequence file format<a href="a-very-general-genomics-overview.html#bcl---binary-base-call-bcl-sequence-file-format" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>This type of sequence file is specific to Illumina data. In most cases, you will simply want to convert it to Fastq files for use with non-Illumina programs.</p>
 <p><a href="https://medium.com/@marija190396/bcl-to-fastq-conversion-e289852823d0">More about BCL to Fastq conversion</a>.</p>
 </div>
-<div id="vcf---variant-call-format" class="section level4" number="2.1.3.6">
-<h4><span class="header-section-number">2.1.3.6</span> VCF - Variant Call Format</h4>
+<div id="vcf---variant-call-format" class="section level4 hasAnchor" number="2.1.3.6">
+<h4><span class="header-section-number">2.1.3.6</span> VCF - Variant Call Format<a href="a-very-general-genomics-overview.html#vcf---variant-call-format" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>VCF files are a more processed form of data than the sequence files we discussed above. VCF files are specifically for storing only where a particular sample’s sequences differ from (are <em>variant</em> from) the reference genome or each other.</p>
 <p>This will only be pertinent to you if you care about DNA variants. We will discuss this in the DNA seq chapter.</p>
 <p>For <a href="https://en.wikipedia.org/wiki/Variant_Call_Format">more on VCF files</a>.</p>
 </div>
-<div id="maf---mutation-annotation-format" class="section level4" number="2.1.3.7">
-<h4><span class="header-section-number">2.1.3.7</span> MAF - Mutation Annotation Format</h4>
+<div id="maf---mutation-annotation-format" class="section level4 hasAnchor" number="2.1.3.7">
+<h4><span class="header-section-number">2.1.3.7</span> MAF - Mutation Annotation Format<a href="a-very-general-genomics-overview.html#maf---mutation-annotation-format" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>MAF files are aggregated versions of VCF files. If each sample in a group has its own VCF file, the whole group’s variants can be summarized in a single MAF file.</p>
 <p>For <a href="https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/#:~:text=Mutation%20Annotation%20Format%20(MAF)%20is,(or%20open%2Daccess).">more on MAF files</a>.</p>
 </div>
 </div>
-<div id="microarray-file-formats" class="section level3" number="2.1.4">
-<h3><span class="header-section-number">2.1.4</span> Microarray file formats</h3>
-<div id="idat---intensity-data-file" class="section level4" number="2.1.4.1">
-<h4><span class="header-section-number">2.1.4.1</span> IDAT - intensity data file</h4>
+<div id="microarray-file-formats" class="section level3 hasAnchor" number="2.1.4">
+<h3><span class="header-section-number">2.1.4</span> Microarray file formats<a href="a-very-general-genomics-overview.html#microarray-file-formats" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="idat---intensity-data-file" class="section level4 hasAnchor" number="2.1.4.1">
+<h4><span class="header-section-number">2.1.4.1</span> IDAT - intensity data file<a href="a-very-general-genomics-overview.html#idat---intensity-data-file" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>This is an Illumina microarray specific file that contains the chip image intensity information for each location on the microarray. It is a binary file, which means it will not be readable by double clicking and attempting to open the file directly.</p>
 <p>Currently, Illumina appears to suggest directly converting IDAT files into a GTC format. We advise looking into <a href="https://github.com/freeseek/gtc2vcf">this package to help you do that</a>.</p>
 <p><a href="https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/technote_array_analysis_workflows.pdf">For more on IDAT files</a>.</p>
 </div>
-<div id="dat---data-file" class="section level4" number="2.1.4.2">
-<h4><span class="header-section-number">2.1.4.2</span> DAT - data file</h4>
+<div id="dat---data-file" class="section level4 hasAnchor" number="2.1.4.2">
+<h4><span class="header-section-number">2.1.4.2</span> DAT - data file<a href="a-very-general-genomics-overview.html#dat---data-file" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>This is an Affymetrix microarray specific file, parallel to the IDAT file in that it contains the image intensity information for each location on the microarray. The information is stored as pixels.</p>
 <p><a href="https://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/dat.html">For more on DAT files</a>.</p>
 </div>
-<div id="cel" class="section level4" number="2.1.4.3">
-<h4><span class="header-section-number">2.1.4.3</span> CEL</h4>
+<div id="cel" class="section level4 hasAnchor" number="2.1.4.3">
+<h4><span class="header-section-number">2.1.4.3</span> CEL<a href="a-very-general-genomics-overview.html#cel" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>This is an Affymetrix microarray specific file that is made from a DAT file but translated into numeric values. It is not normalized yet but can be normalized into a CHP file.</p>
 <p><a href="https://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html">For more on CEL files</a>.</p>
 </div>
-<div id="chp" class="section level4" number="2.1.4.4">
-<h4><span class="header-section-number">2.1.4.4</span> CHP</h4>
+<div id="chp" class="section level4 hasAnchor" number="2.1.4.4">
+<h4><span class="header-section-number">2.1.4.4</span> CHP<a href="a-very-general-genomics-overview.html#chp" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>CHP files contain the gene-level and normalized data from an Affymetrix array chip. CHP files are obtained by normalizing and processing CEL files.</p>
 <p><a href="https://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/chp-xda.html">For more about CHP files</a>.</p>
 </div>
 </div>
 </div>
-<div id="general-informatics-files" class="section level2" number="2.2">
-<h2><span class="header-section-number">2.2</span> General informatics files</h2>
+<div id="general-informatics-files" class="section level2 hasAnchor" number="2.2">
+<h2><span class="header-section-number">2.2</span> General informatics files<a href="a-very-general-genomics-overview.html#general-informatics-files" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>At various points in your genomics workflows, you may need to use other types of files to help you annotate your data. We’ll also discuss some of these common files that you may encounter:</p>
-<div id="bed---browser-extensible-data" class="section level4" number="2.2.0.1">
-<h4><span class="header-section-number">2.2.0.1</span> BED - Browser Extensible Data</h4>
+<div id="bed---browser-extensible-data" class="section level4 hasAnchor" number="2.2.0.1">
+<h4><span class="header-section-number">2.2.0.1</span> BED - Browser Extensible Data<a href="a-very-general-genomics-overview.html#bed---browser-extensible-data" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A BED file is a text file that contains the coordinates of genomic regions. The other columns that accompany the genomic coordinates vary depending on the context, but every BED file starts with the <code>chrom</code>, <code>chromStart</code> and <code>chromEnd</code> columns.</p>
 <p>A BED file might look like this:</p>
 <pre><code>chrom   chromStart  chromEnd other_optional_columns
@@ -661,8 +655,8 @@ <h4><span class="header-section-number">2.2.0.1</span> BED - Browser Extensible
 chr2    100    3000  bad</code></pre>
 <p>For <a href="https://en.wikipedia.org/wiki/BED_(file_format)">more on BED files</a>.</p>
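<p>As a minimal sketch of how the three required columns might be read, the Python below parses BED text into records. The names <code>BedRegion</code>, <code>parse_bed</code> and <code>SAMPLE_BED</code> are hypothetical and the sample rows are invented for illustration. It assumes the file has no column-name header line and that <code>chromStart</code> is 0-based while <code>chromEnd</code> is excluded from the region, as in the BED convention.</p>

```python
from typing import List, NamedTuple, Tuple


class BedRegion(NamedTuple):
    chrom: str
    chrom_start: int  # 0-based start, as in the BED convention
    chrom_end: int  # end position, not included in the region
    extra: Tuple[str, ...]  # optional columns, kept as raw strings


def parse_bed(text: str) -> List[BedRegion]:
    """Parse BED-formatted text into a list of regions (illustrative helper)."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and genome-browser "track"/"browser" lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"BED line needs at least 3 columns: {line!r}")
        regions.append(
            BedRegion(fields[0], int(fields[1]), int(fields[2]), tuple(fields[3:]))
        )
    return regions


# Hypothetical sample data, invented for illustration only.
SAMPLE_BED = "chr1\t50\t200\tgood\nchr2\t100\t3000\tbad\n"

regions = parse_bed(SAMPLE_BED)
# Because the end coordinate is excluded, a region's length is end minus start.
lengths = [r.chrom_end - r.chrom_start for r in regions]
```

<p>Keeping the optional columns as raw strings reflects the point above that their meaning depends on the context. A real workflow would more likely rely on an established genomics package than on a hand-written parser.</p>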
 </div>
-<div id="gffgtf-general-feature-formatgene-transfer-format" class="section level4" number="2.2.0.2">
-<h4><span class="header-section-number">2.2.0.2</span> GFF/GTF General Feature Format/Gene Transfer Format</h4>
+<div id="gffgtf-general-feature-formatgene-transfer-format" class="section level4 hasAnchor" number="2.2.0.2">
+<h4><span class="header-section-number">2.2.0.2</span> GFF/GTF General Feature Format/Gene Transfer Format<a href="a-very-general-genomics-overview.html#gffgtf-general-feature-formatgene-transfer-format" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A GFF file is a tab delimited file that contains information about genomic features. These types of files are available from databases and are what you can use to annotate your data.</p>
 <p>You may see that there are GFF2, GFF3, and GTF files. These are different versions and variations of the same format, and they generally carry the same information. GFF2 is being phased out, so GFF3 is usually a better bet unless the program or package you are using specifies that it needs the older GFF2 version.</p>
 <p>A GFF file may look like this (borrowed example from Ensembl):</p>
@@ -670,18 +664,25 @@ <h4><span class="header-section-number">2.2.0.2</span> GFF/GTF General Feature F
 <p>Note that it will be useful for annotating genes and what we know about them.</p>
 <p>For <a href="https://useast.ensembl.org/info/website/upload/gff.html">more about GTF and GFF files</a>.</p>
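<p>To make the tab-delimited layout concrete, here is a hedged Python sketch that splits GFF3 text into its nine columns. The names <code>GFF_COLUMNS</code>, <code>parse_gff3</code> and <code>SAMPLE_GFF3</code> are hypothetical, and the sample records are invented rather than taken from Ensembl. It assumes GFF3-style <code>key=value</code> attributes; GTF files write the attribute column differently and would need a different attribute parser.</p>

```python
from typing import Dict, List

# The nine tab-separated columns of a GFF/GTF line.
GFF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]


def parse_gff3(text: str) -> List[Dict[str, object]]:
    """Parse GFF3-formatted text into one dictionary per feature (illustrative helper)."""
    records = []
    for line in text.splitlines():
        # Skip blank lines and "#" comment or directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"GFF line needs exactly 9 columns: {line!r}")
        record: Dict[str, object] = dict(zip(GFF_COLUMNS, fields))
        record["start"] = int(fields[3])
        record["end"] = int(fields[4])
        # Assumes GFF3-style attributes such as "ID=gene1;Name=EXAMPLE1".
        record["attribute"] = dict(
            pair.split("=", 1) for pair in fields[8].split(";") if pair
        )
        records.append(record)
    return records


# Hypothetical sample data, invented for illustration only.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "example_source", "gene", "1000", "5000", ".", "+", ".",
               "ID=gene1;Name=EXAMPLE1"]),
    "\t".join(["chr1", "example_source", "exon", "1000", "1200", ".", "+", ".",
               "ID=exon1;Parent=gene1"]),
])

records = parse_gff3(SAMPLE_GFF3)
genes = [r for r in records if r["feature"] == "gene"]
# GFF coordinates are 1-based and include the end position.
gene_length = genes[0]["end"] - genes[0]["start"] + 1
```

<p>Note the contrast with BED: here the end coordinate is part of the feature, so the length is end minus start plus one.</p>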
 </div>
-<div id="other-files" class="section level3" number="2.2.1">
-<h3><span class="header-section-number">2.2.1</span> Other files</h3>
+<div id="other-files" class="section level3 hasAnchor" number="2.2.1">
+<h3><span class="header-section-number">2.2.1</span> Other files<a href="a-very-general-genomics-overview.html#other-files" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>* If you didn’t see the file type you are looking for listed here, take a look at this <a href="https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats">list by the Broad Institute</a>. It may also be covered in the data type specific chapters.</p>
 
 </div>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -751,7 +752,7 @@ <h3><span class="header-section-number">2.2.1</span> Other files</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/about-the-authors.html b/docs/no_toc/about-the-authors.html
index 4e5ba074..35d5d810 100644
--- a/docs/no_toc/about-the-authors.html
+++ b/docs/no_toc/about-the-authors.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>About the Authors | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="About the Authors | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="itcr--omic-tool-glossary.html"/>
 <link rel="next" href="references.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>About the Authors | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,8 +535,8 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="about-the-authors" class="section level1 unnumbered">
-<h1>About the Authors</h1>
+<div id="about-the-authors" class="section level1 unnumbered hasAnchor">
+<h1>About the Authors<a href="about-the-authors.html#about-the-authors" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <p>These credits are based on our <a href="https://github.com/jhudsl/OTTR_Template/wiki/How-to-give-credits">course contributors table guidelines</a>.</p>
 <p> 
  </p>
@@ -630,85 +624,111 @@ <h1>About the Authors</h1>
 </table>
 <p> </p>
 <pre><code>## ─ Session info ───────────────────────────────────────────────────────────────
-##  setting  value                       
-##  version  R version 4.0.2 (2020-06-22)
-##  os       Ubuntu 20.04.5 LTS          
-##  system   x86_64, linux-gnu           
-##  ui       X11                         
-##  language (EN)                        
-##  collate  en_US.UTF-8                 
-##  ctype    en_US.UTF-8                 
-##  tz       Etc/UTC                     
-##  date     2024-05-23                  
+##  setting  value
+##  version  R version 4.3.2 (2023-10-31)
+##  os       Ubuntu 22.04.4 LTS
+##  system   x86_64, linux-gnu
+##  ui       X11
+##  language (EN)
+##  collate  en_US.UTF-8
+##  ctype    en_US.UTF-8
+##  tz       Etc/UTC
+##  date     2024-12-11
+##  pandoc   3.1.1 @ /usr/local/bin/ (via rmarkdown)
 ## 
 ## ─ Packages ───────────────────────────────────────────────────────────────────
-##  package     * version date       lib source                            
-##  askpass       1.1     2019-01-13 [1] RSPM (R 4.0.3)                    
-##  assertthat    0.2.1   2019-03-21 [1] RSPM (R 4.0.5)                    
-##  bookdown      0.24    2024-03-13 [1] Github (rstudio/bookdown@88bc4ea) 
-##  bslib         0.6.1   2023-11-28 [1] CRAN (R 4.0.2)                    
-##  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.0.2)                    
-##  callr         3.5.0   2020-10-08 [1] RSPM (R 4.0.2)                    
-##  cli           3.6.2   2023-12-11 [1] CRAN (R 4.0.2)                    
-##  crayon        1.3.4   2017-09-16 [1] RSPM (R 4.0.0)                    
-##  desc          1.2.0   2018-05-01 [1] RSPM (R 4.0.3)                    
-##  devtools      2.3.2   2020-09-18 [1] RSPM (R 4.0.3)                    
-##  digest        0.6.25  2020-02-23 [1] RSPM (R 4.0.0)                    
-##  ellipsis      0.3.1   2020-05-15 [1] RSPM (R 4.0.3)                    
-##  evaluate      0.23    2023-11-01 [1] CRAN (R 4.0.2)                    
-##  fansi         0.4.1   2020-01-08 [1] RSPM (R 4.0.0)                    
-##  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.0.2)                    
-##  fs            1.5.0   2020-07-31 [1] RSPM (R 4.0.3)                    
-##  glue          1.4.2   2020-08-27 [1] RSPM (R 4.0.5)                    
-##  hms           0.5.3   2020-01-08 [1] RSPM (R 4.0.0)                    
-##  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.0.2)                    
-##  httr          1.4.2   2020-07-20 [1] RSPM (R 4.0.3)                    
-##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.0.2)                    
-##  jsonlite      1.7.1   2020-09-07 [1] RSPM (R 4.0.2)                    
-##  knitr         1.33    2024-03-13 [1] Github (yihui/knitr@a1052d1)      
-##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.0.2)                    
-##  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.0.2)                    
-##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.0.2)                    
-##  openssl       1.4.3   2020-09-18 [1] RSPM (R 4.0.3)                    
-##  ottrpal       1.2.1   2024-03-13 [1] Github (jhudsl/ottrpal@48e8c44)   
-##  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.0.2)                    
-##  pkgbuild      1.1.0   2020-07-13 [1] RSPM (R 4.0.2)                    
-##  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.0.3)                    
-##  pkgload       1.1.0   2020-05-29 [1] RSPM (R 4.0.3)                    
-##  prettyunits   1.1.1   2020-01-24 [1] RSPM (R 4.0.3)                    
-##  processx      3.4.4   2020-09-03 [1] RSPM (R 4.0.2)                    
-##  ps            1.4.0   2020-10-07 [1] RSPM (R 4.0.2)                    
-##  R6            2.4.1   2019-11-12 [1] RSPM (R 4.0.0)                    
-##  readr         1.4.0   2020-10-05 [1] RSPM (R 4.0.2)                    
-##  remotes       2.2.0   2020-07-21 [1] RSPM (R 4.0.3)                    
-##  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.0.2)                    
-##  rmarkdown     2.10    2024-03-13 [1] Github (rstudio/rmarkdown@02d3c25)
-##  rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.0.2)                    
-##  sass          0.4.8   2023-12-06 [1] CRAN (R 4.0.2)                    
-##  sessioninfo   1.1.1   2018-11-05 [1] RSPM (R 4.0.3)                    
-##  stringi       1.5.3   2020-09-09 [1] RSPM (R 4.0.3)                    
-##  stringr       1.4.0   2019-02-10 [1] RSPM (R 4.0.3)                    
-##  testthat      3.0.1   2024-03-13 [1] Github (R-lib/testthat@e99155a)   
-##  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.0.2)                    
-##  usethis       1.6.3   2020-09-17 [1] RSPM (R 4.0.2)                    
-##  utf8          1.1.4   2018-05-24 [1] RSPM (R 4.0.3)                    
-##  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.0.2)                    
-##  withr         2.3.0   2020-09-22 [1] RSPM (R 4.0.2)                    
-##  xfun          0.26    2024-03-13 [1] Github (yihui/xfun@74c2a66)       
-##  xml2          1.3.2   2020-04-23 [1] RSPM (R 4.0.3)                    
-##  yaml          2.2.1   2020-02-01 [1] RSPM (R 4.0.3)                    
+##  package     * version date (UTC) lib source
+##  askpass       1.2.0   2023-09-03 [1] RSPM (R 4.3.0)
+##  bookdown      0.41    2024-10-16 [1] CRAN (R 4.3.2)
+##  bslib         0.6.1   2023-11-28 [1] RSPM (R 4.3.0)
+##  cachem        1.0.8   2023-05-01 [1] RSPM (R 4.3.0)
+##  chromote      0.3.1   2024-08-30 [1] CRAN (R 4.3.2)
+##  cli           3.6.2   2023-12-11 [1] RSPM (R 4.3.0)
+##  devtools      2.4.5   2022-10-11 [1] RSPM (R 4.3.0)
+##  digest        0.6.34  2024-01-11 [1] RSPM (R 4.3.0)
+##  dplyr         1.1.4   2023-11-17 [1] RSPM (R 4.3.0)
+##  ellipsis      0.3.2   2021-04-29 [1] RSPM (R 4.3.0)
+##  evaluate      0.23    2023-11-01 [1] RSPM (R 4.3.0)
+##  fansi         1.0.6   2023-12-08 [1] RSPM (R 4.3.0)
+##  fastmap       1.1.1   2023-02-24 [1] RSPM (R 4.3.0)
+##  fs            1.6.3   2023-07-20 [1] RSPM (R 4.3.0)
+##  generics      0.1.3   2022-07-05 [1] RSPM (R 4.3.0)
+##  glue          1.7.0   2024-01-09 [1] RSPM (R 4.3.0)
+##  hms           1.1.3   2023-03-21 [1] RSPM (R 4.3.0)
+##  htmltools     0.5.7   2023-11-03 [1] RSPM (R 4.3.0)
+##  htmlwidgets   1.6.4   2023-12-06 [1] RSPM (R 4.3.0)
+##  httpuv        1.6.14  2024-01-26 [1] RSPM (R 4.3.0)
+##  httr          1.4.7   2023-08-15 [1] RSPM (R 4.3.0)
+##  janitor       2.2.0   2023-02-02 [1] RSPM (R 4.3.0)
+##  jquerylib     0.1.4   2021-04-26 [1] RSPM (R 4.3.0)
+##  jsonlite      1.8.8   2023-12-04 [1] RSPM (R 4.3.0)
+##  knitr         1.48    2024-07-07 [1] CRAN (R 4.3.2)
+##  later         1.3.2   2023-12-06 [1] RSPM (R 4.3.0)
+##  lifecycle     1.0.4   2023-11-07 [1] RSPM (R 4.3.0)
+##  lubridate     1.9.3   2023-09-27 [1] RSPM (R 4.3.0)
+##  magrittr      2.0.3   2022-03-30 [1] RSPM (R 4.3.0)
+##  memoise       2.0.1   2021-11-26 [1] RSPM (R 4.3.0)
+##  mime          0.12    2021-09-28 [1] RSPM (R 4.3.0)
+##  miniUI        0.1.1.1 2018-05-18 [1] RSPM (R 4.3.0)
+##  openssl       2.1.1   2023-09-25 [1] RSPM (R 4.3.0)
+##  ottrpal       1.3.0   2024-10-23 [1] Github (jhudsl/ottrpal@2e19782)
+##  pillar        1.9.0   2023-03-22 [1] RSPM (R 4.3.0)
+##  pkgbuild      1.4.3   2023-12-10 [1] RSPM (R 4.3.0)
+##  pkgconfig     2.0.3   2019-09-22 [1] RSPM (R 4.3.0)
+##  pkgload       1.3.4   2024-01-16 [1] RSPM (R 4.3.0)
+##  processx      3.8.3   2023-12-10 [1] RSPM (R 4.3.0)
+##  profvis       0.3.8   2023-05-02 [1] RSPM (R 4.3.0)
+##  promises      1.2.1   2023-08-10 [1] RSPM (R 4.3.0)
+##  ps            1.7.6   2024-01-18 [1] RSPM (R 4.3.0)
+##  purrr         1.0.2   2023-08-10 [1] RSPM (R 4.3.0)
+##  R6            2.5.1   2021-08-19 [1] RSPM (R 4.3.0)
+##  Rcpp          1.0.12  2024-01-09 [1] RSPM (R 4.3.0)
+##  readr         2.1.5   2024-01-10 [1] RSPM (R 4.3.0)
+##  remotes       2.4.2.1 2023-07-18 [1] RSPM (R 4.3.0)
+##  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.3.2)
+##  rmarkdown     2.25    2023-09-18 [1] RSPM (R 4.3.0)
+##  rprojroot     2.0.4   2023-11-05 [1] CRAN (R 4.3.2)
+##  sass          0.4.8   2023-12-06 [1] RSPM (R 4.3.0)
+##  sessioninfo   1.2.2   2021-12-06 [1] RSPM (R 4.3.0)
+##  shiny         1.8.0   2023-11-17 [1] RSPM (R 4.3.0)
+##  snakecase     0.11.1  2023-08-27 [1] RSPM (R 4.3.0)
+##  stringi       1.8.3   2023-12-11 [1] RSPM (R 4.3.0)
+##  stringr       1.5.1   2023-11-14 [1] RSPM (R 4.3.0)
+##  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.2)
+##  tidyselect    1.2.0   2022-10-10 [1] RSPM (R 4.3.0)
+##  timechange    0.3.0   2024-01-18 [1] RSPM (R 4.3.0)
+##  tzdb          0.4.0   2023-05-12 [1] RSPM (R 4.3.0)
+##  urlchecker    1.0.1   2021-11-30 [1] RSPM (R 4.3.0)
+##  usethis       2.2.3   2024-02-19 [1] RSPM (R 4.3.0)
+##  utf8          1.2.4   2023-10-22 [1] RSPM (R 4.3.0)
+##  vctrs         0.6.5   2023-12-01 [1] RSPM (R 4.3.0)
+##  webshot2      0.1.1   2023-08-11 [1] CRAN (R 4.3.2)
+##  websocket     1.4.2   2024-07-22 [1] CRAN (R 4.3.2)
+##  xfun          0.48    2024-10-03 [1] CRAN (R 4.3.2)
+##  xml2          1.3.6   2023-12-04 [1] RSPM (R 4.3.0)
+##  xtable        1.8-4   2019-04-21 [1] RSPM (R 4.3.0)
+##  yaml          2.3.8   2023-12-11 [1] RSPM (R 4.3.0)
+## 
+##  [1] /usr/local/lib/R/site-library
+##  [2] /usr/local/lib/R/library
 ## 
-## [1] /usr/local/lib/R/site-library
-## [2] /usr/local/lib/R/library</code></pre>
+## ──────────────────────────────────────────────────────────────────────────────</code></pre>
 <!-- Author information -->
 <!-- Links -->
 
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -778,7 +798,7 @@ <h1>About the Authors</h1>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/annotating-genomes.html b/docs/no_toc/annotating-genomes.html
index 57df8208..598c6cfc 100644
--- a/docs/no_toc/annotating-genomes.html
+++ b/docs/no_toc/annotating-genomes.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 8 Annotating Genomes | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 8 Annotating Genomes | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="microarray-data.html"/>
 <link rel="next" href="dna-methods-overview.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 8 Annotating Genomes | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,42 +535,42 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="annotating-genomes" class="section level1" number="8">
-<h1><span class="header-section-number">Chapter 8</span> Annotating Genomes</h1>
+<div id="annotating-genomes" class="section level1 hasAnchor" number="8">
+<h1><span class="header-section-number">Chapter 8</span> Annotating Genomes<a href="annotating-genomes.html#annotating-genomes" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-6" class="section level2" number="8.1">
-<h2><span class="header-section-number">8.1</span> Learning Objectives</h2>
-<p><img src="resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_23.png" title="The learning objectives for this chapter are to: Understand the fundamentals of annotating genomic data. Be aware of how reference genomes and their versions affect annotation. Be able to find genomic annotation from the respective databases" alt="The learning objectives for this chapter are to: Understand the fundamentals of annotating genomic data. Be aware of how reference genomes and their versions affect annotation. Be able to find genomic annotation from the respective databases" width="100%" /></p>
+<div id="learning-objectives-6" class="section level2 hasAnchor" number="8.1">
+<h2><span class="header-section-number">8.1</span> Learning Objectives<a href="annotating-genomes.html#learning-objectives-6" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_23.png" alt="The learning objectives for this chapter are to: Understand the fundamentals of annotating genomic data. Be aware of how reference genomes and their versions affect annotation. Be able to find genomic annotation from the respective databases" width="100%" /></p>
 <p>In this chapter, we are going to discuss methods that affect every genomic method and may take up the majority of your time as a genomic data analyst: Annotation.</p>
 <p>We know that sequencing or array data is not useful on its own. For our human minds to comprehend it and apply it to something, we need a tangible piece of information to be attached to it. This is where annotation comes in. At its best, annotation helps you and others interpret genomic data. At its worst, it’s a time consuming activity that, done incorrectly, can lead to erroneous conclusions and labeling.</p>
 <p>Proper annotation requires an understanding of how the annotation data you are using was derived, as well as the realization that all annotation data is constantly changing and that confidence in these data is never 100%. Some organisms’ genomes are better annotated than others, but nearly all are at least somewhat incomplete.</p>
 </div>
-<div id="what-are-reference-genomes" class="section level2" number="8.2">
-<h2><span class="header-section-number">8.2</span> What are reference genomes?</h2>
+<div id="what-are-reference-genomes" class="section level2 hasAnchor" number="8.2">
+<h2><span class="header-section-number">8.2</span> What are reference genomes?<a href="annotating-genomes.html#what-are-reference-genomes" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Every individual organism has its own DNA sequence that is unique to it. So how can we compare organisms to each other? In some studies, sequencing data is obtained and the genome is built de novo (aka from scratch), but this takes a lot of time and computing power. So instead, most genomic studies use the imperfect method of comparing to a reference genome. Reference genomes are built from prior data and are available online. They inherently have biases in them. For example, human genomes are generally not made from diverse populations but instead from mostly males of European descent. It is inherently bad for both ethical and scientific reasons to have <a href="https://www.sciencenews.org/article/genetics-race-dna-databases-reference-genome-too-white">genome references that are too white</a>. For more on the problems with reference genomes, <a href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1774-4">read this</a>.</p>
-<p><img src="resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_45.png" title="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" alt="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" width="100%" /></p>
+<p><img src="resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_45.png" alt="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" width="100%" /></p>
 <p>In summary, reference genomes are used for comparison and as a ‘source of truth’ of sorts, but it’s important to note that this method is biased and better alternatives need to be realized.</p>
 </div>
-<div id="what-are-genome-versions" class="section level2" number="8.3">
-<h2><span class="header-section-number">8.3</span> What are genome versions?</h2>
+<div id="what-are-genome-versions" class="section level2 hasAnchor" number="8.3">
+<h2><span class="header-section-number">8.3</span> What are genome versions?<a href="annotating-genomes.html#what-are-genome-versions" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>If you are familiar with software development, or have used any app before, you’re familiar with software updates and releases. Similarly, the genome has updates and releases as continued cloning and assemblies of organisms teach us more. In the image below we are showing an example of how a genome version may be noted (different databases may use different terminology; here we are showing the Genome Reference Consortium). You may also notice that their website shows the date the genome version was released and what was fixed.</p>
-<p><img src="resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_33.png" title="Genome assemblies are changed and updated over time much like software packages. " alt="Genome assemblies are changed and updated over time much like software packages. " width="100%" /></p>
+<p><img src="resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_33.png" alt="Genome assemblies are changed and updated over time much like software packages. " width="100%" /></p>
 <p>The details of how genome versions are fixed and released are not really of concern for your data analysis. This is merely to explain that genomes change and what is most important in your analysis is that:</p>
 <ol style="list-style-type: decimal">
 <li>You choose one genome version and consistently use it in all your analyses.</li>
 <li>Choose a genome version that the rest of your field has generally reached a consensus on and is also using. Generally this means sticking with major releases of a genome instead of always going with the latest version. Most databases will try to point you to their major release, so just stick with that. We will point you to where you can find genome annotation for a lot of the major organisms.</li>
 </ol>
 </div>
-<div id="what-are-the-different-files" class="section level2" number="8.4">
-<h2><span class="header-section-number">8.4</span> What are the different files?</h2>
+<div id="what-are-the-different-files" class="section level2 hasAnchor" number="8.4">
+<h2><span class="header-section-number">8.4</span> What are the different files?<a href="annotating-genomes.html#what-are-the-different-files" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Although we can’t walk you through every organism and database setup, we will walk through the files and structure of one example here.</p>
-<p><img src="resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_28.png" title="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" alt="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" width="100%" /></p>
+<p><img src="resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_28.png" alt="Reference genomes are often used to make sense of genomic data through comparison. Here we are showing a screenshot of Ensembl's website which has many different organisms and file types" width="100%" /></p>
 <p>The above screenshot, <a href="https://useast.ensembl.org/info/data/ftp/index.html">from Ensembl</a>, shows different organisms in the rows and a variety of different files across the columns. In this example, DNA refers to the DNA sequence of the organism’s genome, while cDNA refers to complementary DNA (DNA that has been reverse transcribed from RNA). If you are working with RNA data you may want to use the cDNA file. CDS files contain only coding sequences, and ncRNA files contain only non-coding RNA sequences. Most of these files are FASTA files. Gene sets are provided as their own annotation files, called GTF or GFF files. Ensembl provides more <a href="https://useast.ensembl.org/info/website/upload/gff.html">detailed information about what these files contain</a>, but briefly, each row is a feature and has information describing that feature, such as its genomic location, the relevant feature type (gene, coding sequence, pseudogene, etc.), and the gene ID or name. For a reminder on what these different file types are, <a href="http://hutchdatascience.org/Choosing_Genomics_Tools/a-very-general-genomics-overview.html#basic-file-formats">see the previous chapter</a>.</p>
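 <p>To make the "each row is a feature" description concrete, here is a minimal Python sketch that splits one GTF row into its nine tab-separated columns. The function name <code>parse_gtf_line</code>, the <code>GtfFeature</code> type, and the example row are hypothetical and only for illustration; the attribute parsing is simplified and assumes no semicolons inside quoted values.</p>

```python
from typing import Dict, NamedTuple


class GtfFeature(NamedTuple):
    """A subset of the fields found in one GTF row (hypothetical helper type)."""
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfFeature:
    """Split one tab-delimited GTF row into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqname, source, feature, start, end, _score, strand, _frame, attr_text = cols

    # Attributes look like: key "value"; key "value";
    # Simplification: assumes values never contain a semicolon.
    attributes: Dict[str, str] = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return GtfFeature(seqname, source, feature, int(start), int(end), strand, attributes)


# Made-up row for illustration only; it is not taken from a real annotation file.
example_row = 'chr1\texample_source\tgene\t300\t400\t.\t+\t.\tgene_id "GENE0001"; gene_name "EXAMPLE1";'
parsed = parse_gtf_line(example_row)
print(parsed.feature, parsed.start, parsed.end, parsed.attributes["gene_id"])
```

 <p>In practice you would more likely rely on an established annotation package than hand-parse these files; the sketch is only meant to show what information a row carries.</p>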
 <p>Depending on the tool you are using, the data file and type you need will vary. Some tools have these data built in or are compatible with other packages that have annotation. If a tool automatically includes annotation within it, you will need to ensure that any additional tools you are using are also pulling from the same genome and version. Look into a tool’s documentation to find out what genome versions it is based on. If it doesn’t tell you at all, you don’t want to be using that tool. You cannot assume that cross-genome analyses will translate.</p>
-<div id="how-to-download-annotation-files" class="section level3" number="8.4.1">
-<h3><span class="header-section-number">8.4.1</span> How to download annotation files</h3>
+<div id="how-to-download-annotation-files" class="section level3 hasAnchor" number="8.4.1">
+<h3><span class="header-section-number">8.4.1</span> How to download annotation files<a href="annotating-genomes.html#how-to-download-annotation-files" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>For another database example we’ll look at the human data on ENA’s servers. Note that if you see FTP, that stands for “File Transfer Protocol”, and it just means that is where you can get the files themselves. For more on computing lingo, you can take our <a href="https://www.itcrtraining.org/courses#h.civy2cnri95t">Computing in Cancer Informatics course</a>.</p>
 <p><a href="https://ena-docs.readthedocs.io/en/latest/retrieval/file-download.html">There are many ways you can download these files, and they are described here</a>. In summary:
 - If you don’t feel comfortable using the command line, <a href="https://www.ebi.ac.uk/ena/browser/home">you can use the browser downloader for ENA here</a>
@@ -584,10 +578,10 @@ <h3><span class="header-section-number">8.4.1</span> How to download annotation
 <p>Also note that if you are working from a high-performance computing cluster or other online server, these annotation files may already be available to you. You don’t want to take up more computing resources by downloading extra files, so ask an administrator or informatics expert who also uses the cluster or cloud whether the annotation files already exist in your workspace.</p>
 </div>
 </div>
-<div id="considerations-for-annotating-genomic-data" class="section level2" number="8.5">
-<h2><span class="header-section-number">8.5</span> Considerations for annotating genomic data</h2>
-<div id="make-sure-you-have-the-right-file-to-start" class="section level3" number="8.5.1">
-<h3><span class="header-section-number">8.5.1</span> Make sure you have the right file to start!</h3>
+<div id="considerations-for-annotating-genomic-data" class="section level2 hasAnchor" number="8.5">
+<h2><span class="header-section-number">8.5</span> Considerations for annotating genomic data<a href="annotating-genomes.html#considerations-for-annotating-genomic-data" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<div id="make-sure-you-have-the-right-file-to-start" class="section level3 hasAnchor" number="8.5.1">
+<h3><span class="header-section-number">8.5.1</span> Make sure you have the right file to start!<a href="annotating-genomes.html#make-sure-you-have-the-right-file-to-start" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ol style="list-style-type: decimal">
 <li>Is the annotation from the right organism?
 You may think this is a dumb question, but it’s critical that you make sure you have the genome annotation for the organism that matches your data. Indeed, the author of this has made this mistake in the past, so double check that you are using the correct organism.</li>
@@ -595,22 +589,22 @@ <h3><span class="header-section-number">8.5.1</span> Make sure you have the righ
 Genome versions are constantly being updated. Files from older genome versions cannot be used with newer ones (without some sort of liftover conversion). This also goes for transcriptome and genome data. All analyses need to be done using the same genome version so that chromosomal coordinates can translate between files. For example, in one genome version a particular gene may be annotated at chromosome base pairs 300 - 400, but in the next version it may have been changed to 305 - 405. This can throw off an analysis if you are not careful. This type of annotation mapping becomes even more complicated when considering different splice variants, non-coding genes, or regulatory regions, which have even less confidence and annotation about them.</li>
 </ol>
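 <p>One way to guard against mixing genome versions is to record the version each input file is based on and check that they all agree before analysis. The following is a minimal Python sketch of that idea; the function name <code>check_same_genome_version</code> and the file names are hypothetical, and it assumes you have looked up each file’s genome version yourself (for example from the database download page).</p>

```python
from typing import Dict


def check_same_genome_version(file_versions: Dict[str, str]) -> str:
    """Return the shared genome version, or raise if the files disagree.

    file_versions maps a file name to the genome version it was built on.
    """
    if not file_versions:
        raise ValueError("no files were provided")
    versions = set(file_versions.values())
    if len(versions) != 1:
        details = ", ".join(f"{name}: {ver}" for name, ver in sorted(file_versions.items()))
        raise ValueError(f"mixed genome versions found ({details})")
    return versions.pop()


# Hypothetical file names, recorded by hand alongside their genome version.
inputs = {
    "reference.fa": "GRCh38",
    "annotation.gtf": "GRCh38",
    "peaks.bed": "GRCh38",
}
print("All inputs use", check_same_genome_version(inputs))
```

 <p>If one of the files had instead been labeled with a different version, the check would stop with an error rather than letting mismatched coordinates slip into the analysis.</p>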
 </div>
-<div id="be-consistent-in-your-annotations" class="section level3" number="8.5.2">
-<h3><span class="header-section-number">8.5.2</span> Be consistent in your annotations</h3>
+<div id="be-consistent-in-your-annotations" class="section level3 hasAnchor" number="8.5.2">
+<h3><span class="header-section-number">8.5.2</span> Be consistent in your annotations<a href="annotating-genomes.html#be-consistent-in-your-annotations" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ol style="list-style-type: decimal">
 <li>If at all possible, avoid cross-species analyses unless you are an evolutionary genomics expert and understand what you are doing. For most applications, cross-species analyses are wishful thinking at best, so stick to one organism.</li>
 <li>Avoid mixing genome/transcriptome versions. Yes, there is liftover annotation data to help you identify which loci correspond between releases, but it’s really much simpler to stick with the same version throughout your analyses’ annotations.</li>
 </ol>
 </div>
-<div id="be-clear-in-your-write-ups" class="section level3" number="8.5.3">
-<h3><span class="header-section-number">8.5.3</span> Be clear in your write ups!</h3>
+<div id="be-clear-in-your-write-ups" class="section level3 hasAnchor" number="8.5.3">
+<h3><span class="header-section-number">8.5.3</span> Be clear in your write ups!<a href="annotating-genomes.html#be-clear-in-your-write-ups" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Above all else, no matter what you end up doing, make sure that your steps, what files you use, and what tool versions you use are clear and reproducible! Be sure to clearly link to and state the database files you used and include your code and steps so others can track what you did and reproduce it. For more information on how to create reproducible analyses, you can take our reproducibility in cancer informatics courses: <a href="https://www.itcrtraining.org/courses#h.n5yoq68qj0rz">Introduction to Reproducibility</a> and <a href="https://www.itcrtraining.org/courses#h.i5zyiyjyttr4">Advanced Reproducibility in Cancer Informatics</a>.</p>
 </div>
 </div>
-<div id="resources-you-will-need-for-annotation" class="section level2" number="8.6">
-<h2><span class="header-section-number">8.6</span> Resources you will need for annotation!</h2>
-<div id="annotation-databases" class="section level3" number="8.6.1">
-<h3><span class="header-section-number">8.6.1</span> Annotation databases</h3>
+<div id="resources-you-will-need-for-annotation" class="section level2 hasAnchor" number="8.6">
+<h2><span class="header-section-number">8.6</span> Resources you will need for annotation!<a href="annotating-genomes.html#resources-you-will-need-for-annotation" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<div id="annotation-databases" class="section level3 hasAnchor" number="8.6.1">
+<h3><span class="header-section-number">8.6.1</span> Annotation databases<a href="annotating-genomes.html#annotation-databases" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://useast.ensembl.org/downloads.html">Ensembl</a></li>
 <li><a href="https://www.ebi.ac.uk/services">EMBL-EBI</a></li>
@@ -618,18 +612,18 @@ <h3><span class="header-section-number">8.6.1</span> Annotation databases</h3>
 <li><a href="https://ftp.ncbi.nlm.nih.gov/genomes/">NCBI Genomes download page</a></li>
 </ul>
 </div>
-<div id="gui-based-annotation-tools" class="section level3" number="8.6.2">
-<h3><span class="header-section-number">8.6.2</span> GUI based annotation tools</h3>
+<div id="gui-based-annotation-tools" class="section level3 hasAnchor" number="8.6.2">
+<h3><span class="header-section-number">8.6.2</span> GUI based annotation tools<a href="annotating-genomes.html#gui-based-annotation-tools" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://genome.ucsc.edu/cgi-bin/hgGateway?hgsid=1571980135_Ym6A5aa3nDyOfZKtGishprdrhLDm">UCSCGenomeBrowser</a></li>
 <li><a href="https://software.broadinstitute.org/software/igv/">BROAD’s IGV</a></li>
 <li><a href="https://useast.ensembl.org/info/data/biomart/index.html">Ensembl’s biomart</a></li>
 </ul>
 </div>
-<div id="command-line-based-tools" class="section level3" number="8.6.3">
-<h3><span class="header-section-number">8.6.3</span> Command line based tools</h3>
-<div id="r-based-packages" class="section level4" number="8.6.3.1">
-<h4><span class="header-section-number">8.6.3.1</span> R-based packages:</h4>
+<div id="command-line-based-tools" class="section level3 hasAnchor" number="8.6.3">
+<h3><span class="header-section-number">8.6.3</span> Command line based tools<a href="annotating-genomes.html#command-line-based-tools" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="r-based-packages" class="section level4 hasAnchor" number="8.6.3.1">
+<h4><span class="header-section-number">8.6.3.1</span> R-based packages:<a href="annotating-genomes.html#r-based-packages" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><p><a href="https://bioconductor.org/packages/release/bioc/html/annotatr.html">annotatr</a></p></li>
 <li><p><a href="https://bioconductor.org/packages/release/bioc/html/ensembldb.html">ensembldb</a></p></li>
@@ -640,16 +634,16 @@ <h4><span class="header-section-number">8.6.3.1</span> R-based packages:</h4>
 <li><p><a href="https://bioconductor.org/packages/3.16/data/annotation/">A full list of Bioconductor’s annotation packages</a> - contains annotation for all kinds of species and versions of genomes and transcriptomes.</p></li>
 </ul>
 </div>
-<div id="python-based-packages" class="section level4" number="8.6.3.2">
-<h4><span class="header-section-number">8.6.3.2</span> Python-based packages:</h4>
+<div id="python-based-packages" class="section level4 hasAnchor" number="8.6.3.2">
+<h4><span class="header-section-number">8.6.3.2</span> Python-based packages:<a href="annotating-genomes.html#python-based-packages" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><a href="https://biopython.org/">BioPython</a></li>
 <li><a href="https://github.com/ialbert/chipexo">genetrack</a></li>
 </ul>
 </div>
 </div>
-<div id="more-resources-about-genome-annotation" class="section level3" number="8.6.4">
-<h3><span class="header-section-number">8.6.4</span> More resources about genome annotation</h3>
+<div id="more-resources-about-genome-annotation" class="section level3 hasAnchor" number="8.6.4">
+<h3><span class="header-section-number">8.6.4</span> More resources about genome annotation<a href="annotating-genomes.html#more-resources-about-genome-annotation" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 
 </div>
 </div>
@@ -658,10 +652,17 @@ <h3><span class="header-section-number">8.6.4</span> More resources about genome
 
 
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -731,7 +732,7 @@ <h3><span class="header-section-number">8.6.4</span> More resources about genome
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/atac-seq-1.html b/docs/no_toc/atac-seq-1.html
index abf77acf..74ed695c 100644
--- a/docs/no_toc/atac-seq-1.html
+++ b/docs/no_toc/atac-seq-1.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 16 ATAC-Seq | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 16 ATAC-Seq | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="chromatin-methods-overview.html"/>
 <link rel="next" href="single-cell-atac-seq-1.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 16 ATAC-Seq | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,121 +535,121 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="atac-seq-1" class="section level1" number="16">
-<h1><span class="header-section-number">Chapter 16</span> ATAC-Seq</h1>
+<div id="atac-seq-1" class="section level1 hasAnchor" number="16">
+<h1><span class="header-section-number">Chapter 16</span> ATAC-Seq<a href="atac-seq-1.html#atac-seq-1" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is incomplete! If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-14" class="section level2" number="16.1">
-<h2><span class="header-section-number">16.1</span> Learning Objectives</h2>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_66.png" title="Learning objectives This chapter will demonstrate how to: Understand the basics of ATAC-Seq data collection and processing workflow. Identify the next steps for your particular ATAC-Seq data. Formulate questions to ask about your ATAC-Seq data" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of ATAC-Seq data collection and processing workflow. Identify the next steps for your particular ATAC-Seq data. Formulate questions to ask about your ATAC-Seq data" width="100%" /></p>
+<div id="learning-objectives-14" class="section level2 hasAnchor" number="16.1">
+<h2><span class="header-section-number">16.1</span> Learning Objectives<a href="atac-seq-1.html#learning-objectives-14" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_66.png" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of ATAC-Seq data collection and processing workflow. Identify the next steps for your particular ATAC-Seq data. Formulate questions to ask about your ATAC-Seq data" width="100%" /></p>
 </div>
-<div id="what-are-the-goals-of-atac-seq-analysis" class="section level2" number="16.2">
-<h2><span class="header-section-number">16.2</span> What are the goals of ATAC-Seq analysis?</h2>
+<div id="what-are-the-goals-of-atac-seq-analysis" class="section level2 hasAnchor" number="16.2">
+<h2><span class="header-section-number">16.2</span> What are the goals of ATAC-Seq analysis?<a href="atac-seq-1.html#what-are-the-goals-of-atac-seq-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The goals of ATAC-seq are to identify the accessible regions of the genome in a particular set of samples. These data allow us to understand the relationships between the chromatin accessibility patterns and cell states, and to understand the mechanistic causes and consequences of these chromatin accessibility patterns.</p>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_23.png" title="What does accessibility to chromatin represent? In ATAC-seq we are able to sequence open chromatin and find out DNA sequences where chromatin is accessible for activity. " alt="What does accessibility to chromatin represent? In ATAC-seq we are able to sequence open chromatin and find out DNA sequences where chromatin is accessible for activity. " width="100%" /></p>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_23.png" alt="What does accessibility to chromatin represent? In ATAC-seq we are able to sequence open chromatin and find out DNA sequences where chromatin is accessible for activity. " width="100%" /></p>
 <p>ATAC-seq data is generated by fragmenting the genome with the Tn5 transposase and sequencing the shorter DNA fragments. While most of the genome is associated with protein complexes that preclude the digestion of DNA by Tn5, some regions of the genome have accessible chromatin that can be cleaved by Tn5, resulting in short (&lt;500bp) fragments. These regions of the genome are of biological interest as they are likely to harbor transcription factor binding sites and to constitute cis-regulatory elements, genomic regions that are involved in the regulation of gene expression.</p>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png" title="Schematic of how Tn5 fragments open chromatin + inserts adapters. This step is important for the quick protocol and low required cell inputs of ATAC-seq" alt="Schematic of how Tn5 fragments open chromatin + inserts adapters. This step is important for the quick protocol and low required cell inputs of ATAC-seq" width="100%" /></p>
-<div id="what-questions-can-be-answered-with-atac-seq" class="section level3" number="16.2.1">
-<h3><span class="header-section-number">16.2.1</span> What questions can be answered with ATAC-seq?</h3>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_0.png" title="What types of questions can we ask with ATAC-seq?What regions of the genome have accessible chromatin? How does accessibility differ between biological samples or change over time? What transcription factor motifs or transcription factor footprints can be found at accessible regions of interest?" alt="What types of questions can we ask with ATAC-seq?What regions of the genome have accessible chromatin? How does accessibility differ between biological samples or change over time? What transcription factor motifs or transcription factor footprints can be found at accessible regions of interest?" width="100%" /></p>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png" alt="Schematic of how Tn5 fragments open chromatin + inserts adapters. This step is important for the quick protocol and low required cell inputs of ATAC-seq" width="100%" /></p>
+<div id="what-questions-can-be-answered-with-atac-seq" class="section level3 hasAnchor" number="16.2.1">
+<h3><span class="header-section-number">16.2.1</span> What questions can be answered with ATAC-seq?<a href="atac-seq-1.html#what-questions-can-be-answered-with-atac-seq" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_0.png" alt="What types of questions can we ask with ATAC-seq?What regions of the genome have accessible chromatin? How does accessibility differ between biological samples or change over time? What transcription factor motifs or transcription factor footprints can be found at accessible regions of interest?" width="100%" /></p>
 </div>
 </div>
-<div id="atac-seq-general-workflow-overview" class="section level2" number="16.3">
-<h2><span class="header-section-number">16.3</span> ATAC-Seq general workflow overview</h2>
+<div id="atac-seq-general-workflow-overview" class="section level2 hasAnchor" number="16.3">
+<h2><span class="header-section-number">16.3</span> ATAC-Seq general workflow overview<a href="atac-seq-1.html#atac-seq-general-workflow-overview" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means.</p>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png" title="A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means." alt="A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means." width="100%" /></p>
-<div id="data-quality-metrics" class="section level3" number="16.3.1">
-<h3><span class="header-section-number">16.3.1</span> Data quality metrics:</h3>
-<div id="pre-sequencing-qc" class="section level4" number="16.3.1.1">
-<h4><span class="header-section-number">16.3.1.1</span> Pre-sequencing QC:</h4>
-</div>
-<div id="sequencing-considerations" class="section level4" number="16.3.1.2">
-<h4><span class="header-section-number">16.3.1.2</span> Sequencing considerations:</h4>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_57.png" title="Single end sequencing. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting" alt="Single end sequencing. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting" width="100%" /></p>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_63.png" title="Single vs. paired end sequencing. Single. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting. Read length &amp; read depth. 75bp or more read length (keep in mind nucleosomes are 147bp). ~50 million reads/sample usually recommended" alt="Single vs. paired end sequencing. Single. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting. Read length &amp; read depth. 75bp or more read length (keep in mind nucleosomes are 147bp). ~50 million reads/sample usually recommended" width="100%" /></p>
-</div>
-<div id="pre-alignment-qc" class="section level4" number="16.3.1.3">
-<h4><span class="header-section-number">16.3.1.3</span> Pre-alignment QC:</h4>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png" title="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" alt="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" width="100%" /></p>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png" alt="A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means." width="100%" /></p>
+<div id="data-quality-metrics" class="section level3 hasAnchor" number="16.3.1">
+<h3><span class="header-section-number">16.3.1</span> Data quality metrics:<a href="atac-seq-1.html#data-quality-metrics" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="pre-sequencing-qc" class="section level4 hasAnchor" number="16.3.1.1">
+<h4><span class="header-section-number">16.3.1.1</span> Pre-sequencing QC:<a href="atac-seq-1.html#pre-sequencing-qc" class="anchor-section" aria-label="Anchor link to header"></a></h4>
+</div>
+<div id="sequencing-considerations" class="section level4 hasAnchor" number="16.3.1.2">
+<h4><span class="header-section-number">16.3.1.2</span> Sequencing considerations:<a href="atac-seq-1.html#sequencing-considerations" class="anchor-section" aria-label="Anchor link to header"></a></h4>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_57.png" alt="Single end sequencing. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting" width="100%" /></p>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_63.png" alt="Single vs. paired end sequencing. Single. Cheaper. OK for most standard applications. Paired-end sequencing. More expensive. Useful for looking at nucleosome positioning and transcription factor footprinting. Read length &amp; read depth. 75bp or more read length (keep in mind nucleosomes are 147bp). ~50 million reads/sample usually recommended" width="100%" /></p>
+</div>
+<div id="pre-alignment-qc" class="section level4 hasAnchor" number="16.3.1.3">
+<h4><span class="header-section-number">16.3.1.3</span> Pre-alignment QC:<a href="atac-seq-1.html#pre-alignment-qc" class="anchor-section" aria-label="Anchor link to header"></a></h4>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png" alt="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" width="100%" /></p>
 <p>A tool such as FastQC should be used to check GC content, read quality, read length, and the presence of primer or adapter sequences prior to alignment. Trimmomatic is a useful tool for removing primer and adapter sequences if they are present. ATAC-seq experiments should be sequenced with paired-end sequencing, and existing pipelines will expect paired-end input (two files per sample: *_R1.fastq and *_R2.fastq).</p>
 <ul>
 <li>Use fasterq-dump to download files from the NCBI Sequence Read Archive. This tool will automatically split the reads into multiple files.</li>
 </ul>
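 <p>Because pipelines expect both mates of every sample, it can help to confirm that each <code>*_R1.fastq</code> file has a matching <code>*_R2.fastq</code> file before starting. The sketch below is only an illustration: the function name <code>pair_fastq_files</code> is hypothetical, and it assumes the exact <code>_R1.fastq</code> / <code>_R2.fastq</code> suffixes mentioned above (compressed or differently named files would need other suffixes).</p>

```python
from pathlib import PurePath


def pair_fastq_files(filenames):
    """Group FASTQ file names into (R1, R2) pairs keyed by sample name.

    Assumes the naming convention <sample>_R1.fastq / <sample>_R2.fastq.
    Returns (paired, unpaired): a dict of complete pairs and a sorted list
    of sample names for which one mate is missing.
    """
    mates = {}
    for name in filenames:
        base = PurePath(name).name
        for tag in ("_R1.fastq", "_R2.fastq"):
            if base.endswith(tag):
                sample = base[: -len(tag)]
                # tag[1:3] is "R1" or "R2"
                mates.setdefault(sample, {})[tag[1:3]] = name
    paired = {s: (m["R1"], m["R2"]) for s, m in mates.items() if len(m) == 2}
    unpaired = sorted(s for s, m in mates.items() if len(m) != 2)
    return paired, unpaired
```

 <p>Any sample reported in the second return value is missing a mate and is worth investigating before alignment.</p>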
 </div>
-<div id="number-of-mapped-reads" class="section level4" number="16.3.1.4">
-<h4><span class="header-section-number">16.3.1.4</span> Number of mapped reads</h4>
+<div id="number-of-mapped-reads" class="section level4 hasAnchor" number="16.3.1.4">
+<h4><span class="header-section-number">16.3.1.4</span> Number of mapped reads<a href="atac-seq-1.html#number-of-mapped-reads" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>As for all DNA-sequencing based genomics technologies, a sufficient number of mapped reads is required to obtain meaningful results from a sample. You can read more about <a href="http://hutchdatascience.org/Choosing_Genomics_Tools/sequencing-data.html">general sequencing technologies in our previous chapter here</a>. For experiments on human samples this number should be greater than 20 million mapped unique reads.</p>
 <p><a href="https://bowtie-bio.sourceforge.net/index.shtml">Bowtie2</a> is commonly used for mapping fragments to the genome.</p>
 </div>
-<div id="post-alignment-qc" class="section level4" number="16.3.1.5">
-<h4><span class="header-section-number">16.3.1.5</span> Post-alignment QC:</h4>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png" title="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" alt="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" width="100%" /></p>
+<div id="post-alignment-qc" class="section level4 hasAnchor" number="16.3.1.5">
+<h4><span class="header-section-number">16.3.1.5</span> Post-alignment QC:<a href="atac-seq-1.html#post-alignment-qc" class="anchor-section" aria-label="Anchor link to header"></a></h4>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png" alt="Post-sequencing. Signal to noise ratio (link resources at end) Comparison with DNase hypersensitivity datasets (or other computational QC method- check current resources available)" width="100%" /></p>
 <p>After alignment, check the percentage of matched, unmatched, unpaired, and duplicated reads. Reads which are duplicated or unmatched should be filtered out.
 <a href="https://broadinstitute.github.io/picard/">Picard</a> is a useful tool for this step.
 Reads on the + strand should be shifted +4 bp, and reads on the - strand should be shifted -5 bp.</p>
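 <p>The strand-specific shift described above can be sketched as a small coordinate adjustment. This is a minimal illustration, not the implementation used by any particular tool: the function name <code>shift_tn5</code> is hypothetical, and it assumes each read is represented as a simple (start, end, strand) interval whose whole span is moved by the offset.</p>

```python
def shift_tn5(start, end, strand, plus_shift=4, minus_shift=-5):
    """Shift an aligned read to account for the Tn5 insertion offset.

    Reads on the + strand are moved by +4 bp and reads on the - strand
    by -5 bp, following the offsets given in the text.
    """
    if strand == "+":
        offset = plus_shift
    elif strand == "-":
        offset = minus_shift
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return start + offset, end + offset
```

 <p>In practice this adjustment is usually applied by an existing tool rather than by hand; the sketch only shows what the adjustment does to the coordinates.</p>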
 </div>
-<div id="fragment-size-distribution" class="section level4" number="16.3.1.6">
-<h4><span class="header-section-number">16.3.1.6</span> Fragment size distribution:</h4>
+<div id="fragment-size-distribution" class="section level4 hasAnchor" number="16.3.1.6">
+<h4><span class="header-section-number">16.3.1.6</span> Fragment size distribution:<a href="atac-seq-1.html#fragment-size-distribution" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>ATAC-seq data is often generated using paired end sequencing technologies, which allow for characterization of ATAC-seq fragments. Histograms of these distributions using single base pair resolution bins reveal patterns of enrichment relative to the nucleosome scale of 147bp and the DNA-helix scale ~10.5bp.</p>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_43.png" title="Considerations for quality data: QC checkpoints. Pre-sequencing. Library distribution" alt="Considerations for quality data: QC checkpoints. Pre-sequencing. Library distribution" width="100%" /></p>
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_43.png" alt="Considerations for quality data: QC checkpoints. Pre-sequencing. Library distribution" width="100%" /></p>
 <p>When comparing ATAC-seq samples, it is important to consider the fragment size distributions of the samples being compared. Differences in the distributions could lead to results that are unrelated to biology.</p>
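 <p>As a rough illustration of the comparison above, the sketch below builds a single-base-pair histogram of fragment lengths and reports the fraction of fragments shorter than the 147 bp nucleosome scale. The function names are hypothetical, and the input is assumed to be a plain list of fragment lengths already extracted from properly paired alignments.</p>

```python
from collections import Counter

NUCLEOSOME_BP = 147  # nucleosome scale quoted in the text


def fragment_size_histogram(fragment_lengths):
    """Count fragments per length using single-base-pair bins."""
    return Counter(fragment_lengths)


def fraction_sub_nucleosomal(fragment_lengths, cutoff=NUCLEOSOME_BP):
    """Return the fraction of fragments shorter than the cutoff."""
    lengths = list(fragment_lengths)
    if not lengths:
        return 0.0
    short = sum(1 for n in lengths if n < cutoff)
    return short / len(lengths)
```

 <p>Two samples with very different sub-nucleosomal fractions may differ for technical rather than biological reasons, which is worth checking before comparing them.</p>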
 </div>
-<div id="peak-calling" class="section level4" number="16.3.1.7">
-<h4><span class="header-section-number">16.3.1.7</span> Peak calling:</h4>
+<div id="peak-calling" class="section level4 hasAnchor" number="16.3.1.7">
+<h4><span class="header-section-number">16.3.1.7</span> Peak calling:<a href="atac-seq-1.html#peak-calling" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>ATAC-seq peak calling typically makes use of analysis tools developed for ChIP-seq. MACS2 is one of the most common choices for a peak calling tool, but HOMER or other common ChIP-seq peak callers are also acceptable.
 An input sample is not typically generated for ATAC-seq as it would be for a ChIP-seq experiment, so the major requirement for the peak caller is that it does not require the input control to call peaks.</p>
-<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_26.png" title="Overview of ATAC-seq data analysis pipeline" alt="Overview of ATAC-seq data analysis pipeline" width="100%" />
+<p><img src="resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_26.png" alt="Overview of ATAC-seq data analysis pipeline" width="100%" />
 </p>
 <p><strong>Number of peaks:</strong></p>
 <p>Although the number of accessible chromatin regions can vary from one cell type to another, there are several regions that appear to be constitutively accessible across most cell types. At least 20,000 peaks can be identified in a high-quality experiment. The deeper the sequencing, the more peaks will be detected in an ATAC-seq experiment. At very high sequencing depth, some statistically significant peaks might not be of biological interest. When analyzing such data sets, the fold enrichment relative to background, or the absolute peak signal, ought to be taken into account in addition to statistical significance.</p>
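 <p>One way to combine statistical significance with effect size is to filter peaks on both. The sketch below is illustrative only: the <code>Peak</code> record and the default thresholds (<code>min_fold=2.0</code>, <code>max_q=0.05</code>) are assumptions chosen for the example, not recommended values.</p>

```python
from typing import NamedTuple


class Peak(NamedTuple):
    """Minimal peak record; field names are illustrative."""
    chrom: str
    start: int
    end: int
    fold_enrichment: float
    q_value: float


def filter_peaks(peaks, min_fold=2.0, max_q=0.05):
    """Keep peaks that are both significant and sufficiently enriched."""
    return [
        p for p in peaks
        if p.fold_enrichment >= min_fold and p.q_value <= max_q
    ]
```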
 </div>
-<div id="frip-score-fraction-of-reads-in-peaks" class="section level4" number="16.3.1.8">
-<h4><span class="header-section-number">16.3.1.8</span> FRiP score (fraction of reads in peaks)</h4>
+<div id="frip-score-fraction-of-reads-in-peaks" class="section level4 hasAnchor" number="16.3.1.8">
+<h4><span class="header-section-number">16.3.1.8</span> FRiP score (fraction of reads in peaks)<a href="atac-seq-1.html#frip-score-fraction-of-reads-in-peaks" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>In high-quality ATAC-seq data a large fraction of reads overlap with peaks, while in low-quality data a high proportion of fragments map to background regions. Ideally, the FRiP score is greater than 0.3 (30 percent or more of reads overlap with peaks), with a score below 0.2 indicating low-quality data.</p>
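 <p>The FRiP calculation itself is a simple ratio. The sketch below assumes reads are reduced to single (chromosome, position) points and that peaks are non-overlapping, half-open (chromosome, start, end) intervals; the function name <code>frip_score</code> is hypothetical, and real pipelines typically count overlaps directly from alignment files.</p>

```python
from bisect import bisect_right


def frip_score(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak.

    read_positions: list of (chrom, pos)
    peaks: list of (chrom, start, end), half-open and non-overlapping
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    in_peaks = 0
    for chrom, pos in read_positions:
        if chrom not in by_chrom:
            continue
        # Index of the last peak starting at or before this position.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < by_chrom[chrom][i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions) if read_positions else 0.0
```

 <p>The result can be compared against the 0.3 and 0.2 guideline values given above.</p>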
 </div>
-<div id="overlap-with-other-chromatin-accessibility-data" class="section level4" number="16.3.1.9">
-<h4><span class="header-section-number">16.3.1.9</span> Overlap with other chromatin accessibility data</h4>
+<div id="overlap-with-other-chromatin-accessibility-data" class="section level4 hasAnchor" number="16.3.1.9">
+<h4><span class="header-section-number">16.3.1.9</span> Overlap with other chromatin accessibility data<a href="atac-seq-1.html#overlap-with-other-chromatin-accessibility-data" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Thousands of ATAC-seq samples have been produced in human and mouse. High quality ATAC-seq data will share a substantial proportion of peaks with many of these datasets. Publicly available ATAC-seq data can be found and comparisons made at the Cistrome Data Browser [<a href="http://cistrome.org/db/" class="uri">http://cistrome.org/db/</a>].</p>
 </div>
-<div id="overlap-with-promoters" class="section level4" number="16.3.1.10">
-<h4><span class="header-section-number">16.3.1.10</span> Overlap with promoters</h4>
+<div id="overlap-with-promoters" class="section level4 hasAnchor" number="16.3.1.10">
+<h4><span class="header-section-number">16.3.1.10</span> Overlap with promoters<a href="atac-seq-1.html#overlap-with-promoters" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>The promoter regions of many genes are constitutively accessible. Examining peak overlap with regions close to known protein coding gene transcription start sites can be used as a check for data quality.</p>
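 <p>This check can be sketched as the fraction of peaks that lie within a window around a known transcription start site (TSS). The function name and the 1,000 bp default window are assumptions made for illustration; the TSS list is assumed to come from whichever gene annotation is in use.</p>

```python
def fraction_peaks_near_tss(peaks, tss_sites, window=1000):
    """Fraction of peaks within `window` bp of any TSS on the same chromosome.

    peaks: list of (chrom, start, end), half-open intervals
    tss_sites: list of (chrom, position)
    """
    if not peaks:
        return 0.0
    tss_by_chrom = {}
    for chrom, pos in tss_sites:
        tss_by_chrom.setdefault(chrom, []).append(pos)

    hits = 0
    for chrom, start, end in peaks:
        candidates = tss_by_chrom.get(chrom, [])
        # A peak counts if any TSS falls inside the peak extended by the window.
        if any(start - window <= pos < end + window for pos in candidates):
            hits += 1
    return hits / len(peaks)
```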
 </div>
 </div>
-<div id="information-from-atac-seq-analysis" class="section level3" number="16.3.2">
-<h3><span class="header-section-number">16.3.2</span> Information from ATAC-seq analysis:</h3>
-<div id="major-approaches" class="section level4" number="16.3.2.1">
-<h4><span class="header-section-number">16.3.2.1</span> Major approaches:</h4>
+<div id="information-from-atac-seq-analysis" class="section level3 hasAnchor" number="16.3.2">
+<h3><span class="header-section-number">16.3.2</span> Information from ATAC-seq analysis:<a href="atac-seq-1.html#information-from-atac-seq-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="major-approaches" class="section level4 hasAnchor" number="16.3.2.1">
+<h4><span class="header-section-number">16.3.2.1</span> Major approaches:<a href="atac-seq-1.html#major-approaches" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>Compare changes in transcription factor motif enrichment in accessible regions between samples</li>
 <li>Compare changes in accessibility of regions (differential accessibility) between samples</li>
 <li>Footprinting - identify regions where insertion is below expected level</li>
 </ul>
 </div>
-<div id="differential-accessibility-analysis" class="section level4" number="16.3.2.2">
-<h4><span class="header-section-number">16.3.2.2</span> Differential accessibility analysis:</h4>
+<div id="differential-accessibility-analysis" class="section level4 hasAnchor" number="16.3.2.2">
+<h4><span class="header-section-number">16.3.2.2</span> Differential accessibility analysis:<a href="atac-seq-1.html#differential-accessibility-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Differential accessibility analysis typically uses packages developed for RNA-seq differential expression analysis, such as DESeq2, edgeR, or limma.</p>
 <p>All three are available as R packages and can be installed using Bioconductor, a bioinformatics package manager for R. Unfortunately, there are no well-established packages for this analysis in other languages such as Python. Differential accessibility analysis is an approach with high potential, but care must be taken in processing and normalizing the data for accurate results.</p>
 </div>
-<div id="motif-analysis" class="section level4" number="16.3.2.3">
-<h4><span class="header-section-number">16.3.2.3</span> Motif analysis:</h4>
+<div id="motif-analysis" class="section level4 hasAnchor" number="16.3.2.3">
+<h4><span class="header-section-number">16.3.2.3</span> Motif analysis:<a href="atac-seq-1.html#motif-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Motif analysis in ATAC-seq is more complex than for ChIP-seq because a larger set of TFs is responsible for the emergence of accessible chromatin regions than for the binding sites of a particular TF. Nevertheless, in the analysis of differential ATAC-seq peaks, motif analysis can be used to reveal the TFs related to differences between conditions. This type of analysis is most likely to be successful when ATAC-seq data from closely related conditions or cell types are being compared.</p>
 <p><a href="https://meme-suite.org/meme/">The MEME suite</a> has a variety of tools for motif analysis available in both web and command-line versions.</p>
 </div>
-<div id="motif-scanning" class="section level4" number="16.3.2.4">
-<h4><span class="header-section-number">16.3.2.4</span> Motif Scanning</h4>
+<div id="motif-scanning" class="section level4 hasAnchor" number="16.3.2.4">
+<h4><span class="header-section-number">16.3.2.4</span> Motif Scanning<a href="atac-seq-1.html#motif-scanning" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Motif scanning is an analysis technique which identifies putative transcription factor binding sites (TFBS) which sufficiently match a given TF motif’s position-weight matrix. PWMscan is a straightforward online tool, but not the best option for high throughput. FIMO is an alternative which can be used either on the web or the command line. This approach will identify all sites within the genome which are likely to bind a single transcription factor.</p>
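 <p>To make the idea of matching a position-weight matrix concrete, the sketch below converts per-position base probabilities into log-odds scores and reports every window of a sequence that scores at or above a threshold. It is a simplified illustration, not how PWMscan or FIMO are implemented: the function names, the uniform background of 0.25, the pseudocount, and the threshold are all assumptions, and only the forward strand is scanned.</p>

```python
import math


def log_odds_matrix(pfm, background=0.25, pseudocount=0.01):
    """Convert per-position base probabilities into log2-odds scores.

    pfm: list of dicts mapping each of A, C, G, T to a probability.
    A small pseudocount avoids taking the log of zero.
    """
    norm = 1 + 4 * pseudocount
    return [
        {
            b: math.log2((col.get(b, 0.0) + pseudocount) / norm / background)
            for b in "ACGT"
        }
        for col in pfm
    ]


def scan_sequence(sequence, pwm, threshold):
    """Return (offset, score) for each forward-strand window >= threshold."""
    width = len(pwm)
    seq = sequence.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in "ACGT" for b in window):
            continue  # skip windows containing N or other ambiguous bases
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits
```

 <p>A real scan would also cover the reverse complement and use a threshold derived from a p-value rather than a fixed score.</p>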
 </div>
-<div id="motif-discovery" class="section level4" number="16.3.2.5">
-<h4><span class="header-section-number">16.3.2.5</span> Motif discovery:</h4>
+<div id="motif-discovery" class="section level4 hasAnchor" number="16.3.2.5">
+<h4><span class="header-section-number">16.3.2.5</span> Motif discovery:<a href="atac-seq-1.html#motif-discovery" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>HOMER and MEME are common choices for motif discovery. These tools identify overrepresented sequences within the accessible peaks, regardless of whether they match a previously defined motif.
 Once the ATAC-seq peaks are determined, the next step is to search for enriched DNA sequence motifs within these regions. This is accomplished by using motif discovery algorithms such as MEME Suite, HOMER, or DREME. These tools scan the ATAC-seq peaks for overrepresented sequence patterns, which may correspond to binding sites for specific transcription factors or other regulatory elements. The motifs discovered can be compared against existing motif databases, such as JASPAR or TRANSFAC, to annotate the potential transcription factor binding sites.</p>
 </div>
-<div id="motif-enrichment" class="section level4" number="16.3.2.6">
-<h4><span class="header-section-number">16.3.2.6</span> Motif Enrichment:</h4>
+<div id="motif-enrichment" class="section level4 hasAnchor" number="16.3.2.6">
+<h4><span class="header-section-number">16.3.2.6</span> Motif Enrichment:<a href="atac-seq-1.html#motif-enrichment" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Motif enrichment tools scan accessible sites for matches to known motif sequences and additionally quantify whether each motif is significantly enriched compared to a control sample (input, which is uncommon with ATAC-seq) or to shuffled sequences that mimic background.</p>
 <p>After identifying the enriched motifs, researchers can perform motif enrichment analysis to determine the significance of these motifs in the ATAC-seq peaks. This is often done using statistical tools like Fisher’s exact test or hypergeometric test, which assess the enrichment of specific motifs compared to their background occurrence in the genome. Additionally, tools like GREAT or HOMER can be employed to perform gene ontology analysis and assess the functional relevance of the identified motifs in biological processes and pathways.</p>
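 <p>The hypergeometric test mentioned above can be written out directly. In this sketch the function and parameter names are hypothetical: it asks how likely it is to see at least the observed number of motif-containing peaks if the peaks were drawn at random from a background set of regions.</p>

```python
from math import comb


def hypergeom_enrichment_p(peaks_with_motif, n_peaks,
                           background_with_motif, n_background):
    """One-sided hypergeometric p-value, P(X >= peaks_with_motif).

    n_background: total regions in the background set (peaks included)
    background_with_motif: regions in that set containing the motif
    n_peaks: number of peaks drawn from the background set
    peaks_with_motif: peaks that contain the motif
    """
    upper = min(n_peaks, background_with_motif)
    total = comb(n_background, n_peaks)
    tail = sum(
        comb(background_with_motif, i)
        * comb(n_background - background_with_motif, n_peaks - i)
        for i in range(peaks_with_motif, upper + 1)
    )
    return tail / total
```

 <p>When many motifs are tested, the resulting p-values would still need correction for multiple testing.</p>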
 <p>Overall, ATAC-seq motif enrichment analysis provides researchers with valuable insights into the regulatory landscape of the genome. By identifying enriched motifs within accessible chromatin regions, researchers can gain a deeper understanding of the transcriptional regulatory networks and potentially uncover novel transcription factors involved in specific biological processes or diseases. This analysis serves as a powerful tool for unraveling the intricacies of gene regulation and can pave the way for further investigations in functional genomics and therapeutic development.
@@ -663,8 +657,8 @@ <h4><span class="header-section-number">16.3.2.6</span> Motif Enrichment:</h4>
 </div>
 </div>
 </div>
-<div id="atac-seq-data-strengths" class="section level2" number="16.4">
-<h2><span class="header-section-number">16.4</span> ATAC-Seq data <strong>strengths</strong>:</h2>
+<div id="atac-seq-data-strengths" class="section level2 hasAnchor" number="16.4">
+<h2><span class="header-section-number">16.4</span> ATAC-Seq data <strong>strengths</strong>:<a href="atac-seq-1.html#atac-seq-data-strengths" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li>ATAC-seq is easy to adopt and has been used by many laboratories to generate high-quality data for characterizing accessible chromatin in cell lines or sorted cells derived from tissues.</li>
 <li>In principle, ATAC-seq can identify a large proportion of cis-regulatory elements.</li>
@@ -672,8 +666,8 @@ <h2><span class="header-section-number">16.4</span> ATAC-Seq data <strong>streng
 <li>In comparison with histone modification ChIP-seq, ATAC-seq provides a higher resolution assessment of the cis-regulatory genomic regions. Histone modification ChIP-seq, in contrast, tends to be localized on nucleosomes flanking the site of interest and can spread to nucleosomes beyond the immediate flanking ones.</li>
 </ul>
 </div>
-<div id="atac-seq-data-limitations" class="section level2" number="16.5">
-<h2><span class="header-section-number">16.5</span> ATAC-Seq data <strong>limitations</strong>:</h2>
+<div id="atac-seq-data-limitations" class="section level2 hasAnchor" number="16.5">
+<h2><span class="header-section-number">16.5</span> ATAC-Seq data <strong>limitations</strong>:<a href="atac-seq-1.html#atac-seq-data-limitations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li>ATAC-seq does not precisely identify the transcription factors or other chromatin-associated factors that bind in or around accessible chromatin regions. This type of information needs to be inferred through transcription factor binding motif analysis or from ChIP-seq data.</li>
 <li>Whereas ATAC-seq indicates the presence of a putative cis-regulatory element, H3K27ac ChIP-seq is able to separate accessible regions from those that are accessible and active.</li>
@@ -682,81 +676,81 @@ <h2><span class="header-section-number">16.5</span> ATAC-Seq data <strong>limita
 <li>ATAC-seq data can be biased and affected by batch effects, like any other genomics data type. When comparing ATAC-seq data, good experimental design principles, such as the inclusion of biological replicates and consideration of controls, are needed for a meaningful outcome.</li>
 </ul>
 </div>
-<div id="atac-seq-data-considerations" class="section level2" number="16.6">
-<h2><span class="header-section-number">16.6</span> ATAC-Seq data considerations</h2>
+<div id="atac-seq-data-considerations" class="section level2 hasAnchor" number="16.6">
+<h2><span class="header-section-number">16.6</span> ATAC-Seq data considerations<a href="atac-seq-1.html#atac-seq-data-considerations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The nucleosome is the fundamental unit of chromatin packaging in the genome, and nucleosomal DNA is far less likely to be cleaved by the Tn5 transposase than linker DNA. When DNA is fragmented by Tn5, the positions of the endpoints relative to the nucleosomes are an important consideration. When the ends are less than 147bp apart it is likely that both ends originate from the same linker region. Longer fragments can result from cuts on opposite sides of the same nucleosome, or even opposite sides of a genomic interval that encompasses multiple nucleosomes. The short fragments are therefore most likely to be nucleosome free and provide stronger evidence for transcription factor binding sites.</p>
 <p>As with other genomics protocols, ATAC-seq data is subject to biases introduced in the ATAC-seq protocol and in the sequencing itself. ATAC-seq data generated in different batches, by different laboratories, or using different protocols might not be directly comparable. In addition, the Tn5 transposase has biases in the precise DNA sequences it cuts. This should be taken into consideration when carrying out base pair resolution analyses, including footprinting analysis and analysis of the effects of sequence variants on chromatin accessibility.</p>
 <p>Read depth will impact ATAC-seq signal, but enzyme strength and conditions can also alter the distribution of cuts.</p>
 <p>When using ATAC-seq data to answer biological questions it is important to understand what types of bias could impact the results. To ensure valid results, the analysis needs to use appropriate statistical methods, include enough high-quality ATAC-seq data with controls, and possibly reframe the questions being asked.</p>
 </div>
-<div id="atac-seq-analysis-tools" class="section level2" number="16.7">
-<h2><span class="header-section-number">16.7</span> ATAC-seq analysis tools</h2>
+<div id="atac-seq-analysis-tools" class="section level2 hasAnchor" number="16.7">
+<h2><span class="header-section-number">16.7</span> ATAC-seq analysis tools<a href="atac-seq-1.html#atac-seq-analysis-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <div class="warning">
 <p>This section has been written by AI and needs verification by experts. This is meant to give you a basic idea of the pros and cons of these tools but should ultimately be used with your own judgment.</p>
 </div>
 <ul>
-<li><a href="https://github.com/macs3-project/MACS">MACS2</a><span class="citation">(<a href="#ref-zhang2008model" role="doc-biblioref">Y. Zhang et al. 2008</a>)</span>:
+<li><a href="https://github.com/macs3-project/MACS">MACS2</a><span class="citation">(<a href="#ref-zhang2008model">Y. Zhang et al. 2008</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: widely used, handles both paired-end and single-end sequencing data, allows for differential peak calling between different samples.</li>
 <li><strong>Cons</strong>: assumes that all peaks have the same shape, may not be as accurate as other peak-calling tools in some cases.</li>
 </ul></li>
-<li><a href="http://homer.ucsd.edu/homer/introduction/programs.html">HOMER</a><span class="citation">(<a href="#ref-heinz2010simple" role="doc-biblioref">Heinz et al. 2010</a>)</span>:
+<li><a href="http://homer.ucsd.edu/homer/introduction/programs.html">HOMER</a><span class="citation">(<a href="#ref-heinz2010simple">Heinz et al. 2010</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: includes tools for peak-calling, motif analysis, and annotation of nearby genes, user-friendly interface, handles both paired-end and single-end sequencing data.</li>
 <li><strong>Cons</strong>: may not be as accurate as other peak-calling tools in some cases.</li>
 </ul></li>
-<li><a href="https://bioconductor.org/packages/release/bioc/html/ATACseqQC.html">ATACseqQC</a><span class="citation">(<a href="#ref-schep2017chromvar" role="doc-biblioref">Schep et al. 2017</a>)</span>:
+<li><a href="https://bioconductor.org/packages/release/bioc/html/ATACseqQC.html">ATACseqQC</a><span class="citation">(<a href="#ref-schep2017chromvar">Schep et al. 2017</a>)</span>:
 <ul>
 <li><strong>Pros:</strong> provides several metrics and plots for evaluating data quality, identifies potential issues with data such as batch effects, sequencing depth, and library complexity.</li>
 <li><strong>Cons</strong>: does not perform peak-calling or downstream analysis.</li>
 </ul></li>
-<li><a href="https://deeptools.readthedocs.io/en/develop/">deeptools</a><span class="citation">(<a href="#ref-ramirez2016deeptools2" role="doc-biblioref">Ramı́rez et al. 2016</a>)</span>:
+<li><a href="https://deeptools.readthedocs.io/en/develop/">deeptools</a><span class="citation">(<a href="#ref-ramirez2016deeptools2">Ramı́rez et al. 2016</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: includes tools for normalization, visualization, and comparison of ATAC-seq data, generates heatmaps, profiles, and other plots for visualizing chromatin accessibility.</li>
 <li><strong>Cons</strong>: may require some programming skills to use effectively.</li>
 </ul></li>
-<li><a href="https://reggenlab.github.io/DFilter/tutorial.html">DFilter</a> <span class="citation">(<a href="#ref-ghavi2019highly" role="doc-biblioref">Ghavi-Helm et al. 2019</a>)</span>:
+<li><a href="https://reggenlab.github.io/DFilter/tutorial.html">DFilter</a> <span class="citation">(<a href="#ref-ghavi2019highly">Ghavi-Helm et al. 2019</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: uses a deep learning approach to predict the likelihood of a genomic region being an ATAC-seq peak, can handle both paired-end and single-end sequencing data, has been shown to outperform other peak-calling tools in some cases.</li>
 <li><strong>Cons</strong>: may require more computational resources than other tools.</li>
 </ul></li>
 </ul>
 </div>
-<div id="additional-tutorials-and-tools-1" class="section level2" number="16.9">
-<h2><span class="header-section-number">16.9</span> Additional tutorials and tools</h2>
+<div id="additional-tutorials-and-tools-1" class="section level2 hasAnchor" number="16.9">
+<h2><span class="header-section-number">16.9</span> Additional tutorials and tools<a href="atac-seq-1.html#additional-tutorials-and-tools-1" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://training.galaxyproject.org/training-material/topics/epigenetics/tutorials/atac-seq/tutorial.html">A Galaxy based tutorial for ATAC-seq</a> - Galaxy is a good recommendation for those new to informatics who would like a cloud-based GUI option to use for the analysis of their data.</li>
 <li><a href="https://macs3-project.github.io/MACS/">MACS - Model-based analysis for ChIP-Seq</a> - A command line tool for the identification of transcription factor binding sites. Can be used with ChIP-seq or ATAC-seq.</li>
@@ -765,16 +759,16 @@ <h2><span class="header-section-number">16.9</span> Additional tutorials and too
 <li><a href="https://github.com/zang-lab/SELMA">SELMA - Simplex Encoded Linear Model for Accessible Chromatin</a> - SELMA is a Python-based tool for assessing biases in chromatin-based data.</li>
 </ul>
 </div>
-<div id="online-visualization-tools" class="section level2" number="16.10">
-<h2><span class="header-section-number">16.10</span> Online Visualization tools</h2>
+<div id="online-visualization-tools" class="section level2 hasAnchor" number="16.10">
+<h2><span class="header-section-number">16.10</span> Online Visualization tools<a href="atac-seq-1.html#online-visualization-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="http://cistrome.org/db/#/">Cistrome DB</a> - a visual tool to allow you to browse your ATAC-seq data.</li>
 <li><a href="http://xena.ucsc.edu/">UCSC Xena</a> is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with ATAC-seq data.</li>
 <li><a href="https://software.broadinstitute.org/software/igv/">Integrative Genomics Viewer (IGV)</a> is a track-based browser for interactively exploring genomic data mapped to a reference genome.</li>
 </ul>
 </div>
-<div id="more-resources-about-atac-seq-data" class="section level2" number="16.11">
-<h2><span class="header-section-number">16.11</span> More resources about ATAC-seq data</h2>
+<div id="more-resources-about-atac-seq-data" class="section level2 hasAnchor" number="16.11">
+<h2><span class="header-section-number">16.11</span> More resources about ATAC-seq data<a href="atac-seq-1.html#more-resources-about-atac-seq-data" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://training.galaxyproject.org/training-material/topics/epigenetics/tutorials/atac-seq/slides.html#1">ATAC-seq overview from Galaxy</a> - these slides explain the overarching concepts of ATAC-seq.</li>
 <li><a href="https://github.com/harvardinformatics/ATAC-seq">ATAC seq guidelines from Harvard</a> - this workflow walks step by step through how to analyze ATAC-seq data and what the different parameters mean.</li>
@@ -787,7 +781,7 @@ <h2><span class="header-section-number">16.11</span> More resources about ATAC-s
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-ghavi2019highly" class="csl-entry">
 Ghavi-Helm, Yad, Aleksander Jankowski, Sascha Meiers, Rebecca R Viales, Jan O Korbel, and Eileen EM Furlong. 2019. <span>“Highly Rearranged Chromosomes Reveal Uncoupling Between Genome Topology and Gene Expression.”</span> <em>Nature Genetics</em> 51 (8): 1272–82.
@@ -806,10 +800,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -879,7 +880,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/bulk-rna-seq-1.html b/docs/no_toc/bulk-rna-seq-1.html
index b6c89f91..281c768d 100644
--- a/docs/no_toc/bulk-rna-seq-1.html
+++ b/docs/no_toc/bulk-rna-seq-1.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 12 Bulk RNA-seq | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 12 Bulk RNA-seq | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="rna-methods-overview.html"/>
 <link rel="next" href="single-cell-rna-seq.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 12 Bulk RNA-seq | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,37 +535,37 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="bulk-rna-seq-1" class="section level1" number="12">
-<h1><span class="header-section-number">Chapter 12</span> Bulk RNA-seq</h1>
+<div id="bulk-rna-seq-1" class="section level1 hasAnchor" number="12">
+<h1><span class="header-section-number">Chapter 12</span> Bulk RNA-seq<a href="bulk-rna-seq-1.html#bulk-rna-seq-1" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-10" class="section level2" number="12.1">
-<h2><span class="header-section-number">12.1</span> Learning Objectives</h2>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_56.png" title="This chapter will demonstrate how to: Understand the basics of RNA-Seq data collection and processing workflow. Identify the next steps for your particular RNA-seq data. Formulate questions to ask about your RNA-seq data" alt="This chapter will demonstrate how to: Understand the basics of RNA-Seq data collection and processing workflow. Identify the next steps for your particular RNA-seq data. Formulate questions to ask about your RNA-seq data" width="100%" /></p>
+<div id="learning-objectives-10" class="section level2 hasAnchor" number="12.1">
+<h2><span class="header-section-number">12.1</span> Learning Objectives<a href="bulk-rna-seq-1.html#learning-objectives-10" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_56.png" alt="This chapter will demonstrate how to: Understand the basics of RNA-Seq data collection and processing workflow. Identify the next steps for your particular RNA-seq data. Formulate questions to ask about your RNA-seq data" width="100%" /></p>
 </div>
-<div id="where-rna-seq-data-comes-from" class="section level2" number="12.2">
-<h2><span class="header-section-number">12.2</span> Where RNA-seq data comes from</h2>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_5.png" title="Bulk RNA-seq data is generated by extracting total RNA and then isolating RNA specific species by either Poly-A selection, Ribo depletion, or size selection. The isolated RNA is then converted to cDNA so it is more stable for sequencing. This cDNA is used to construct a sequencing library. Lastly PCR amplification is used to make many copies to use for sequencing." alt="Bulk RNA-seq data is generated by extracting total RNA and then isolating RNA specific species by either Poly-A selection, Ribo depletion, or size selection. The isolated RNA is then converted to cDNA so it is more stable for sequencing. This cDNA is used to construct a sequencing library. Lastly PCR amplification is used to make many copies to use for sequencing." width="100%" /></p>
+<div id="where-rna-seq-data-comes-from" class="section level2 hasAnchor" number="12.2">
+<h2><span class="header-section-number">12.2</span> Where RNA-seq data comes from<a href="bulk-rna-seq-1.html#where-rna-seq-data-comes-from" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_5.png" alt="Bulk RNA-seq data is generated by extracting total RNA and then isolating RNA specific species by either Poly-A selection, Ribo depletion, or size selection. The isolated RNA is then converted to cDNA so it is more stable for sequencing. This cDNA is used to construct a sequencing library. Lastly PCR amplification is used to make many copies to use for sequencing." width="100%" /></p>
 </div>
-<div id="rna-seq-workflow" class="section level2" number="12.3">
-<h2><span class="header-section-number">12.3</span> RNA-seq workflow</h2>
+<div id="rna-seq-workflow" class="section level2 hasAnchor" number="12.3">
+<h2><span class="header-section-number">12.3</span> RNA-seq workflow<a href="bulk-rna-seq-1.html#rna-seq-workflow" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>In a very general sense, RNA-seq workflows begin with quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After the data has been normalized, you are ready to conduct your downstream analyses. These will be highly dependent on the original goals and questions of your experiment. They may include dimension reduction, differential expression, or any number of other analyses.</p>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_23.png" title="In a very general sense, RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses. " alt="In a very general sense, RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses. " width="100%" /></p>
-<p>In this chapter we will highlight some of the more popular RNA-seq tools, that are generally suitable for most experiment data but there is no “one size fits all” for computational analysis of RNA-seq data <span class="citation">(<a href="#ref-Conesa2016" role="doc-biblioref">Conesa et al. 2016</a>)</span>. You may find tools out there that better suit your needs than the ones we discuss here.</p>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_23.png" alt="In a very general sense, RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses. " width="100%" /></p>
+<p>In this chapter we will highlight some of the more popular RNA-seq tools, that are generally suitable for most experiment data but there is no “one size fits all” for computational analysis of RNA-seq data <span class="citation">(<a href="#ref-Conesa2016">Conesa et al. 2016</a>)</span>. You may find tools out there that better suit your needs than the ones we discuss here.</p>
 </div>
-<div id="rna-seq-data-strengths" class="section level2" number="12.4">
-<h2><span class="header-section-number">12.4</span> RNA-seq data <strong>strengths</strong></h2>
+<div id="rna-seq-data-strengths" class="section level2 hasAnchor" number="12.4">
+<h2><span class="header-section-number">12.4</span> RNA-seq data <strong>strengths</strong><a href="bulk-rna-seq-1.html#rna-seq-data-strengths" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li>RNA-seq can give you an idea of the transcriptional activity of a sample.</li>
 <li>RNA-seq has a wider dynamic range of quantification than gene expression microarrays.</li>
 <li>Unlike gene expression microarrays, RNA-seq can be used for transcript discovery.</li>
 </ul>
 </div>
-<div id="rna-seq-data-limitations" class="section level2" number="12.5">
-<h2><span class="header-section-number">12.5</span> RNA-seq data <strong>limitations</strong></h2>
+<div id="rna-seq-data-limitations" class="section level2 hasAnchor" number="12.5">
+<h2><span class="header-section-number">12.5</span> RNA-seq data <strong>limitations</strong><a href="bulk-rna-seq-1.html#rna-seq-data-limitations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>RNA-seq suffers from a lot of the common sequence biases which are further worsened by PCR amplification steps. We discussed some of the sequence biases in the <a href="">previous sequencing chapter</a>.</p>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142e3de7ce8_0_19.png" title="RNA-seq data has various biases introduced to the data upon data generation. RNA targets are more likely to be picked up if they are long, if they are from the 3 prime end, have a particular GC content and have a particular read start sequence." alt="RNA-seq data has various biases introduced to the data upon data generation. RNA targets are more likely to be picked up if they are long, if they are from the 3 prime end, have a particular GC content and have a particular read start sequence." width="100%" /></p>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142e3de7ce8_0_19.png" alt="RNA-seq data has various biases introduced to the data upon data generation. RNA targets are more likely to be picked up if they are long, if they are from the 3 prime end, have a particular GC content and have a particular read start sequence." width="100%" /></p>
 <p>These biases are nicely covered in <a href="https://mikelove.wordpress.com/2016/09/26/rna-seq-fragment-sequence-bias/">this blog by Mike Love</a> and we’ll summarize them here:</p>
 <ul>
 <li><strong>Fragment length</strong>: Longer transcripts are more likely to be identified than shorter transcripts because there’s more material to pull from.<br />
@@ -581,22 +575,22 @@ <h2><span class="header-section-number">12.5</span> RNA-seq data <strong>limitat
 <li><strong>Read start bias</strong>: Certain reads are more likely to be bound by random hexamer primers than others.</li>
 </ul>
 <p><em>Main Takeaway</em>: When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way.</p>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_65.png" title="When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way." alt="When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way." width="100%" /></p>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_65.png" alt="When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way." width="100%" /></p>
 </div>
-<div id="rna-seq-data-considerations" class="section level2" number="12.6">
-<h2><span class="header-section-number">12.6</span> RNA-seq data considerations</h2>
-<div id="ribo-minus-vs-poly-a-selection" class="section level3" number="12.6.1">
-<h3><span class="header-section-number">12.6.1</span> Ribo minus vs poly A selection</h3>
+<div id="rna-seq-data-considerations" class="section level2 hasAnchor" number="12.6">
+<h2><span class="header-section-number">12.6</span> RNA-seq data considerations<a href="bulk-rna-seq-1.html#rna-seq-data-considerations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<div id="ribo-minus-vs-poly-a-selection" class="section level3 hasAnchor" number="12.6.1">
+<h3><span class="header-section-number">12.6.1</span> Ribo minus vs poly A selection<a href="bulk-rna-seq-1.html#ribo-minus-vs-poly-a-selection" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Most of the RNA in the cell is not mRNA or noncoding RNAs of interest, but instead loads of ribosomal RNA. So before you can prepare and sequence your data, you need to isolate the RNAs you are interested in. There are two major methods to do this:</p>
 <ul>
 <li><strong>Poly A selection</strong> - Keep only RNAs that have poly A tails – remember that mRNAs and some kinds of noncoding RNAs have poly A tails added to them after they are transcribed. A drawback of this method is that transcripts that are not generally polyadenylated (microRNAs, snoRNAs, certain long noncoding RNAs, and immature transcripts) will be discarded. There is also generally a worse 3’ bias with this method since you are selecting based on poly A tails on the 3’ end.</li>
 <li><strong>Ribo-minus</strong> - Subtract all the ribosomal RNA and be left with an RNA pool of interest. A drawback of this method is that you will need to use greater sequencing depths than you would with poly A selection (because there is more material in your resulting transcript pool).</li>
 </ul>
 <p><a href="https://blog.sitoolsbiotech.com/2019/08/ribo-depletion-rna-seq-ribosomal-rna-depletion-method-works-best/">This blog by Sitools Biotech gives a good summary</a> of the pros and cons of each selection method.</p>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_47.png" title="Poly A selection advantages: lower sequencing depth needed. Greater exonic coverage. Disadvantages of Poly A selection is that it does not detect non-polyA transcripts including miRNAs, snoRNAs, and some lncRNAs. It obtains less information on immature transcripts. It performs poorly for degraded RNA or Formalin-Fixed Paraffin-Embedded (FFPE) samples Bias towards 3’ end of transcripts. Cannot be used for prokaryotes. Ribo minus advantages are: It is able to detect small and non-polyadenylated RNAs. It detects long and short transcripts (no 3’ bias). It has better performance on degraded RNa or FFPE samples. It is applicable for prokaryotes. It can be applied toward other abundant RNA. The disadvantages of Ribo minus is that it will collect more intronic reads and immature RNAs (if you are not interested in those). And thus because of the greater quantity of the returned RNA pool. It requires greater sequencing depths." alt="Poly A selection advantages: lower sequencing depth needed. Greater exonic coverage. Disadvantages of Poly A selection is that it does not detect non-polyA transcripts including miRNAs, snoRNAs, and some lncRNAs. It obtains less information on immature transcripts. It performs poorly for degraded RNA or Formalin-Fixed Paraffin-Embedded (FFPE) samples Bias towards 3’ end of transcripts. Cannot be used for prokaryotes. Ribo minus advantages are: It is able to detect small and non-polyadenylated RNAs. It detects long and short transcripts (no 3’ bias). It has better performance on degraded RNa or FFPE samples. It is applicable for prokaryotes. It can be applied toward other abundant RNA. The disadvantages of Ribo minus is that it will collect more intronic reads and immature RNAs (if you are not interested in those). And thus because of the greater quantity of the returned RNA pool. 
It requires greater sequencing depths." width="100%" /></p>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_47.png" alt="Poly A selection advantages: lower sequencing depth needed. Greater exonic coverage. Disadvantages of Poly A selection is that it does not detect non-polyA transcripts including miRNAs, snoRNAs, and some lncRNAs. It obtains less information on immature transcripts. It performs poorly for degraded RNA or Formalin-Fixed Paraffin-Embedded (FFPE) samples Bias towards 3’ end of transcripts. Cannot be used for prokaryotes. Ribo minus advantages are: It is able to detect small and non-polyadenylated RNAs. It detects long and short transcripts (no 3’ bias). It has better performance on degraded RNa or FFPE samples. It is applicable for prokaryotes. It can be applied toward other abundant RNA. The disadvantages of Ribo minus is that it will collect more intronic reads and immature RNAs (if you are not interested in those). And thus because of the greater quantity of the returned RNA pool. It requires greater sequencing depths." width="100%" /></p>
 </div>
-<div id="transcriptome-mapping" class="section level3" number="12.6.2">
-<h3><span class="header-section-number">12.6.2</span> Transcriptome mapping</h3>
+<div id="transcriptome-mapping" class="section level3 hasAnchor" number="12.6.2">
+<h3><span class="header-section-number">12.6.2</span> Transcriptome mapping<a href="bulk-rna-seq-1.html#transcriptome-mapping" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>How do you know which read belongs to which transcript? This is where alignment comes into play for RNA-seq. There are two major approaches we will discuss, with examples of tools that employ them.</p>
 <ul>
 <li><p><strong>Traditional aligners</strong> - Align your data to a reference using standard alignment algorithms. Can be very computationally intensive. Traditional alignment is the original approach to alignment, which takes each read and finds where and how in the genome/transcriptome it aligns. If you are interested in identifying the intricacies of different splices and their boundaries, you may need to use one of these traditional alignment methods. But for common quantification purposes, you may want to look into pseudo alignment to save you time.
@@ -615,10 +609,10 @@ <h3><span class="header-section-number">12.6.2</span> Transcriptome mapping</h3>
 <li><p><strong>Reference free assembly</strong> - The first two methods we’ve discussed align reads to a reference genome or transcriptome. Alternatively, if you are much more interested in transcript identification, or you are working with an organism that doesn’t have a well characterized reference genome/transcriptome, then de novo assembly is another approach to take. As you may suspect, this is the most computationally demanding approach and also requires greater sequencing depth than alignment to a reference. But depending on your goals, this may be your preferred option.</p></li>
 </ul>
 <p>These strategies are discussed at greater length <a href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0881-8">in this excellent manuscript by Conesa et al, 2016</a>.</p>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_72.png" title="TopHat Uses an expectation-maximization approach that estimates transcript abundances. Cufflinks is designed to take advantage of PE reads, and may use GTF information to identify expressed transcripts, or can infer transcripts de novo from the mapping data alone. RSEM, eXpress, Sailfish, Kallisto, and Salmon - Quantify expression from transcriptome mapping and allocate multi-mapping reads among transcript and output within-sample normalized values corrected for sequencing biases. NURD Provides an efficient way of estimating transcript expression from SE reads with a low memory and computing cost." alt="TopHat Uses an expectation-maximization approach that estimates transcript abundances. Cufflinks is designed to take advantage of PE reads, and may use GTF information to identify expressed transcripts, or can infer transcripts de novo from the mapping data alone. RSEM, eXpress, Sailfish, Kallisto, and Salmon - Quantify expression from transcriptome mapping and allocate multi-mapping reads among transcript and output within-sample normalized values corrected for sequencing biases. NURD Provides an efficient way of estimating transcript expression from SE reads with a low memory and computing cost." width="100%" /></p>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_72.png" alt="TopHat Uses an expectation-maximization approach that estimates transcript abundances. Cufflinks is designed to take advantage of PE reads, and may use GTF information to identify expressed transcripts, or can infer transcripts de novo from the mapping data alone. RSEM, eXpress, Sailfish, Kallisto, and Salmon - Quantify expression from transcriptome mapping and allocate multi-mapping reads among transcript and output within-sample normalized values corrected for sequencing biases. NURD Provides an efficient way of estimating transcript expression from SE reads with a low memory and computing cost." width="100%" /></p>
 </div>
-<div id="abundance-measures" class="section level3" number="12.6.3">
-<h3><span class="header-section-number">12.6.3</span> Abundance measures</h3>
+<div id="abundance-measures" class="section level3 hasAnchor" number="12.6.3">
+<h3><span class="header-section-number">12.6.3</span> Abundance measures<a href="bulk-rna-seq-1.html#abundance-measures" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>If your RNA-seq data has already been processed, it may already have abundance measures reported with it. But there are various types of abundance measures used – what do they represent?</p>
 <ul>
 <li><strong>raw counts</strong> - this is a raw number of how many times a transcript was counted in a sample.</li>
@@ -658,11 +652,11 @@ <h3><span class="header-section-number">12.6.3</span> Abundance measures</h3>
 <blockquote>
 <p>When you use TPM, the sum of all TPMs in each sample are the same. This makes it easier to compare the proportion of reads that mapped to a gene in each sample. In contrast, with RPKM and FPKM, the sum of the normalized reads in each sample may be different, and this makes it harder to compare samples directly.</p>
 </blockquote>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_59.png" title="When looking for analysis tools, pay attention to what abundance measures the tool expects to be given with respect to what transformations have already been done to your data (if any)." alt="When looking for analysis tools, pay attention to what abundance measures the tool expects to be given with respect to what transformations have already been done to your data (if any)." width="100%" /></p>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_59.png" alt="When looking for analysis tools, pay attention to what abundance measures the tool expects to be given with respect to what transformations have already been done to your data (if any)." width="100%" /></p>
 </div>
-<div id="rna-seq-downstream-analysis-tools" class="section level3" number="12.6.4">
-<h3><span class="header-section-number">12.6.4</span> RNA-seq downstream analysis tools</h3>
-<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_80.png" title="For DESeq: Read count distribution assumes a negative binomial distribution. Raw counts are expected. Replicates are not dealt with.Normalization is done with respect to Library size. For edgeR: Read count distribution assumes Bayesian methods for negative binomial distribution Input is raw counts. Yes, can deal with replicates. Normalization is done with respect to Library size, TMM, RLE, Upperquartile. For baySeq: Read count distribution assumes bayesian methods for negative binomial distribution. Input is raw counts. Can deal with replicates. Library size, Quantile, TMM For NOISeq: Read count distribution is assumed to be Non-parametric. Input is raw or normalized counts Does not deal with replicates. Normalization is done with respect to Library size, RPKM, TMM, Upperquartile" alt="For DESeq: Read count distribution assumes a negative binomial distribution. Raw counts are expected. Replicates are not dealt with.Normalization is done with respect to Library size. For edgeR: Read count distribution assumes Bayesian methods for negative binomial distribution Input is raw counts. Yes, can deal with replicates. Normalization is done with respect to Library size, TMM, RLE, Upperquartile. For baySeq: Read count distribution assumes bayesian methods for negative binomial distribution. Input is raw counts. Can deal with replicates. Library size, Quantile, TMM For NOISeq: Read count distribution is assumed to be Non-parametric. Input is raw or normalized counts Does not deal with replicates. Normalization is done with respect to Library size, RPKM, TMM, Upperquartile" width="100%" /></p>
+<div id="rna-seq-downstream-analysis-tools" class="section level3 hasAnchor" number="12.6.4">
+<h3><span class="header-section-number">12.6.4</span> RNA-seq downstream analysis tools<a href="bulk-rna-seq-1.html#rna-seq-downstream-analysis-tools" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<p><img src="resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_80.png" alt="For DESeq: Read count distribution assumes a negative binomial distribution. Raw counts are expected. Replicates are not dealt with.Normalization is done with respect to Library size. For edgeR: Read count distribution assumes Bayesian methods for negative binomial distribution Input is raw counts. Yes, can deal with replicates. Normalization is done with respect to Library size, TMM, RLE, Upperquartile. For baySeq: Read count distribution assumes bayesian methods for negative binomial distribution. Input is raw counts. Can deal with replicates. Library size, Quantile, TMM For NOISeq: Read count distribution is assumed to be Non-parametric. Input is raw or normalized counts Does not deal with replicates. Normalization is done with respect to Library size, RPKM, TMM, Upperquartile" width="100%" /></p>
 <ul>
 <li><a href="https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html#:~:text=Complex%20heatmaps%20are%20efficient%20to,and%20supports%20various%20annotation%20graphics.">ComplexHeatmap</a> is great for visualizations</li>
 <li><a href="https://www.bioconductor.org/packages/release/bioc/html/DESeq2.html">DESeq2</a> and <a href="https://www.bioconductor.org/packages/release/bioc/html/edgeR.html">edgeR</a> are great for differential expression analyses.</li>
@@ -672,8 +666,8 @@ <h3><span class="header-section-number">12.6.4</span> RNA-seq downstream analysi
 </ul>
 </div>
 </div>
-<div id="visualization-gui-tools" class="section level2" number="12.7">
-<h2><span class="header-section-number">12.7</span> Visualization GUI tools</h2>
+<div id="visualization-gui-tools" class="section level2 hasAnchor" number="12.7">
+<h2><span class="header-section-number">12.7</span> Visualization GUI tools<a href="bulk-rna-seq-1.html#visualization-gui-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://webmev.tm4.org">WebMeV</a> uniquely provides a user-friendly, intuitive, interactive interface to processed analytical data, uses cloud-computing elasticity for computationally intensive analyses, and is compatible with single cell or bulk RNA-seq input data.</li>
 <li><a href="http://xena.ucsc.edu/">UCSC Xena</a> is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with single cell RNA-seq data.</li>
@@ -681,28 +675,28 @@ <h2><span class="header-section-number">12.7</span> Visualization GUI tools</h2>
 <li><a href="https://www.ndexbio.org/#/">Network Data Exchange (NDEx)</a> is a project that provides an open-source framework where scientists and organizations can store, share and publish biological network knowledge.</li>
 </ul>
 </div>
-<div id="rna-seq-data-resources" class="section level2" number="12.8">
-<h2><span class="header-section-number">12.8</span> RNA-seq data resources</h2>
+<div id="rna-seq-data-resources" class="section level2 hasAnchor" number="12.8">
+<h2><span class="header-section-number">12.8</span> RNA-seq data resources<a href="bulk-rna-seq-1.html#rna-seq-data-resources" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://maayanlab.cloud/archs4/">ARCHS4</a> (All RNA-seq and ChIP-seq sample and signature search) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from GEO and SRA.</li>
 <li><a href="https://www.refine.bio/">Refine.bio</a> - a repository of uniformly processed and normalized, ready-to-use transcriptome data from publicly available sources.</li>
 </ul>
 </div>
-<div id="more-reading-about-rna-seq-data" class="section level2" number="12.9">
-<h2><span class="header-section-number">12.9</span> More reading about RNA-seq data</h2>
+<div id="more-reading-about-rna-seq-data" class="section level2 hasAnchor" number="12.9">
+<h2><span class="header-section-number">12.9</span> More reading about RNA-seq data<a href="bulk-rna-seq-1.html#more-reading-about-rna-seq-data" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://alexslemonade.github.io/refinebio-examples/03-rnaseq/00-intro-to-rnaseq.html">Refine.bio’s introduction to RNA-seq</a></li>
-<li><a href="https://www.youtube.com/watch?v=tlf6wYJrwKY">StatQuest: A gentle introduction to RNA-seq</a> <span class="citation">(<a href="#ref-Starmer2017-rnaseq" role="doc-biblioref">Starmer 2017</a>)</span>.</li>
-<li><a href="https://bitesizebio.com/13542/what-everyone-should-know-about-rna-seq/">A general background on the wet lab methods of RNA-seq</a> <span class="citation">(<a href="#ref-Hadfield2016" role="doc-biblioref">Hadfield 2016</a>)</span>.</li>
-<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5143225/">Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation</a> <span class="citation">(<a href="#ref-Love2016" role="doc-biblioref">M. I. Love, Hogenesch, and Irizarry 2016</a>)</span>.</li>
-<li><a href="https://mikelove.wordpress.com/2016/09/26/rna-seq-fragment-sequence-bias/">Mike Love blog post about sequencing biases</a> <span class="citation">(<a href="#ref-bias-blog" role="doc-biblioref">M. Love 2016</a>)</span></li>
-<li><a href="https://pdfs.semanticscholar.org/9d16/997f5de72d6c606fef3d673db70e5d1d8e1e.pdf?_ga=2.131436679.965169313.1600175795-124991789.1600175795">Biases in Illumina transcriptome sequencing caused by random hexamer priming</a> <span class="citation">(<a href="#ref-Hansen2010" role="doc-biblioref">Hansen, Brenner, and Dudoit 2010</a>)</span>.</li>
-<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4121056/">Computation for RNA-seq and ChIP-seq studies</a> <span class="citation">(<a href="#ref-Pepke2009" role="doc-biblioref">Pepke, Wold, and Mortazavi 2009</a>)</span>.</li>
+<li><a href="https://www.youtube.com/watch?v=tlf6wYJrwKY">StatQuest: A gentle introduction to RNA-seq</a> <span class="citation">(<a href="#ref-Starmer2017-rnaseq">Starmer 2017</a>)</span>.</li>
+<li><a href="https://bitesizebio.com/13542/what-everyone-should-know-about-rna-seq/">A general background on the wet lab methods of RNA-seq</a> <span class="citation">(<a href="#ref-Hadfield2016">Hadfield 2016</a>)</span>.</li>
+<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5143225/">Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation</a> <span class="citation">(<a href="#ref-Love2016">M. I. Love, Hogenesch, and Irizarry 2016</a>)</span>.</li>
+<li><a href="https://mikelove.wordpress.com/2016/09/26/rna-seq-fragment-sequence-bias/">Mike Love blog post about sequencing biases</a> <span class="citation">(<a href="#ref-bias-blog">M. Love 2016</a>)</span></li>
+<li><a href="https://pdfs.semanticscholar.org/9d16/997f5de72d6c606fef3d673db70e5d1d8e1e.pdf?_ga=2.131436679.965169313.1600175795-124991789.1600175795">Biases in Illumina transcriptome sequencing caused by random hexamer priming</a> <span class="citation">(<a href="#ref-Hansen2010">Hansen, Brenner, and Dudoit 2010</a>)</span>.</li>
+<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4121056/">Computation for RNA-seq and ChIP-seq studies</a> <span class="citation">(<a href="#ref-Pepke2009">Pepke, Wold, and Mortazavi 2009</a>)</span>.</li>
 </ul>
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-Conesa2016" class="csl-entry">
 Conesa, Ana, Pedro Madrigal, Sonia Tarazona, David Gomez-Cabrero, Alejandra Cervera, Andrew McPherson, Michał Wojciech Szcześniak, et al. 2016. <span>“A Survey of Best Practices for <span>RNA</span>-Seq Data Analysis.”</span> <em>Genome Biology</em> 17 (1). <a href="https://doi.org/10.1186/s13059-016-0881-8">https://doi.org/10.1186/s13059-016-0881-8</a>.
@@ -727,10 +721,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -800,7 +801,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/chip-seq-1.html b/docs/no_toc/chip-seq-1.html
index 7a2d8433..465e5c18 100644
--- a/docs/no_toc/chip-seq-1.html
+++ b/docs/no_toc/chip-seq-1.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 18 ChIP-Seq | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 18 ChIP-Seq | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="single-cell-atac-seq-1.html"/>
 <link rel="next" href="cutrun-and-cuttag.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 18 ChIP-Seq | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,18 +535,18 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="chip-seq-1" class="section level1" number="18">
-<h1><span class="header-section-number">Chapter 18</span> ChIP-Seq</h1>
+<div id="chip-seq-1" class="section level1 hasAnchor" number="18">
+<h1><span class="header-section-number">Chapter 18</span> ChIP-Seq<a href="chip-seq-1.html#chip-seq-1" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-16" class="section level2" number="18.1">
-<h2><span class="header-section-number">18.1</span> Learning Objectives</h2>
-<p><img src="resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_61.png" title="Learning objectives This chapter will demonstrate how to: Understand the basics of ChIP-Seq data collection and processing workflow. Identify the next steps for your particular ChIP-Seq data. Formulate questions to ask about your ChIP-Seq data" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of ChIP-Seq data collection and processing workflow. Identify the next steps for your particular ChIP-Seq data. Formulate questions to ask about your ChIP-Seq data" width="100%" /></p>
+<div id="learning-objectives-16" class="section level2 hasAnchor" number="18.1">
+<h2><span class="header-section-number">18.1</span> Learning Objectives<a href="chip-seq-1.html#learning-objectives-16" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_61.png" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of ChIP-Seq data collection and processing workflow. Identify the next steps for your particular ChIP-Seq data. Formulate questions to ask about your ChIP-Seq data" width="100%" /></p>
 </div>
-<div id="what-are-the-goals-of-chip-seq-analysis" class="section level2" number="18.2">
-<h2><span class="header-section-number">18.2</span> What are the goals of ChIP-Seq analysis?</h2>
-<p><img src="resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_18.png" title="The goal of ChIP-seq is to identify, for a particular DNA binding protein, all of the DNA sequences that it binds to." alt="The goal of ChIP-seq is to identify, for a particular DNA binding protein, all of the DNA sequences that it binds to." width="100%" /></p>
+<div id="what-are-the-goals-of-chip-seq-analysis" class="section level2 hasAnchor" number="18.2">
+<h2><span class="header-section-number">18.2</span> What are the goals of ChIP-Seq analysis?<a href="chip-seq-1.html#what-are-the-goals-of-chip-seq-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_18.png" alt="The goal of ChIP-seq is to identify, for a particular DNA binding protein, all of the DNA sequences that it binds to." width="100%" /></p>
 <p>ChIP-Seq (chromatin immunoprecipitation sequencing) and related approaches are used to identify genome-wide binding sites of specific proteins or protein complexes. Given the diversity of interactions at the DNA-protein interface, sequencing-based methods for targeted chromatin capture have evolved to meet precise research needs and improve the quality of the results. Specifically, ChIP-Seq builds on protein immunoprecipitation (IP) techniques by applying next generation sequencing to a pulldown product. IP followed by sequencing can be applied to any nucleic-acid binding protein for which an antibody is available, including a known or putative transcription factor (TF), chromatin remodeler or histone modifications, or other DNA- or chromatin-specific factors. ChIP-Seq approaches have been honed to increase signal-to-noise, reduce input material, and more specifically map protein-DNA interactions, for example by treating the IP product with an exonuclease that chews back unprotected DNA ends (e.g. ChIP-exo).</p>
 <p>The main goals of analysis for ChIP-Seq approaches are:</p>
 <ul>
@@ -563,8 +557,8 @@ <h2><span class="header-section-number">18.2</span> What are the goals of ChIP-S
 <li><strong>Integration with other -omics data:</strong> Given the expansive repositories of publicly available sequencing data, creating a comprehensive narrative from a ChIP-Seq experiment usually involves comparison with other types of sequencing data. Just like how a ChIP-Seq peak list can be interpreted through existing genome annotations, other sequencing data can be interpreted through the binding sites identified from a given ChIP-Seq experiment. For example, a sequence variant might be enriched for or against in protein binding sites versus previously identified motifs. This would suggest that a mutation would alter DNA-protein interactions. Binding of a specific gene-regulatory element might also correlate with changes in gene expression.</li>
 </ul>
 </div>
-<div id="chip-seq-general-workflow-overview" class="section level2" number="18.3">
-<h2><span class="header-section-number">18.3</span> ChIP-Seq general workflow overview</h2>
+<div id="chip-seq-general-workflow-overview" class="section level2 hasAnchor" number="18.3">
+<h2><span class="header-section-number">18.3</span> ChIP-Seq general workflow overview<a href="chip-seq-1.html#chip-seq-general-workflow-overview" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>&lt;TODO: add data formats in a graphical format&gt;</p>
 <p>A key contribution of large consortia, such as the ENCODE consortium, is standardized processing workflows that facilitate the integration of ChIP-seq data generated in different labs. While the exact data processing needs of any given experiment may vary, established pipelines provide a helpful starting point. In choosing a data processing workflow, it is essential to note the input data format. For example, the read length should be considered, as well as the sequencing paradigm (i.e. whether the data is single-end or paired-end).</p>
 <p>The most generic steps for processing ChIP-Seq data are:</p>
@@ -577,8 +571,8 @@ <h2><span class="header-section-number">18.3</span> ChIP-Seq general workflow ov
 <li><strong>Integrative analysis:</strong> Finally, integrative analysis with other -omics data can be performed to gain biological insights into the ChIP-Seq data. This can involve interpreting ChIP-Seq data through existing annotations by looking at signal enrichment in different genomic regions, like transcription start sites (TSSs), gene bodies, and previously-identified cis-regulatory elements (CREs). ChIP-Seq data can even be interpreted through other ChIP-seq data to see whether features overlap, with statistical testing for similarity using packages like BEDTools and BEDOPS.</li>
 </ul>
 </div>
-<div id="chip-seq-data-strengths" class="section level2" number="18.4">
-<h2><span class="header-section-number">18.4</span> ChIP-Seq data <strong>strengths</strong>:</h2>
+<div id="chip-seq-data-strengths" class="section level2 hasAnchor" number="18.4">
+<h2><span class="header-section-number">18.4</span> ChIP-Seq data <strong>strengths</strong>:<a href="chip-seq-1.html#chip-seq-data-strengths" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>ChIP-Seq (chromatin immunoprecipitation sequencing) is a powerful tool for understanding the genomic locations where a specific protein or protein complex binds.</p>
 <p>ChIP-Seq is particularly good at showing or illustrating:</p>
 <ul>
@@ -588,8 +582,8 @@ <h2><span class="header-section-number">18.4</span> ChIP-Seq data <strong>streng
 <li><strong>Differential binding analysis:</strong> ChIP-Seq can be used to compare the binding of a protein or protein complex in different conditions or cell types, which can provide insight into the mechanisms that regulate protein binding and the impact of different cellular states on the regulatory networks.</li>
 </ul>
 </div>
-<div id="chip-seq-data-limitations" class="section level2" number="18.5">
-<h2><span class="header-section-number">18.5</span> ChIP-Seq data <strong>limitations</strong>:</h2>
+<div id="chip-seq-data-limitations" class="section level2 hasAnchor" number="18.5">
+<h2><span class="header-section-number">18.5</span> ChIP-Seq data <strong>limitations</strong>:<a href="chip-seq-1.html#chip-seq-data-limitations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>ChIP-Seq (chromatin immunoprecipitation sequencing) is a powerful technique, but there are several biases, caveats, and problems that can arise when analyzing ChIP-Seq data.</p>
 <p>Some of the most common biases, caveats, and problems are:</p>
 <ul>
@@ -601,16 +595,16 @@ <h2><span class="header-section-number">18.5</span> ChIP-Seq data <strong>limita
 <li><strong>Interpretation of binding sites:</strong> Finally, the interpretation of binding sites identified by ChIP-Seq can be complex and requires additional validation to confirm their biological relevance and function. Notably, ChIP-Seq cannot distinguish direct protein-DNA interaction from indirect binding (e.g. where a protein may bind another protein that binds to DNA).</li>
 </ul>
 </div>
-<div id="chip-seq-data-considerations" class="section level2" number="18.6">
-<h2><span class="header-section-number">18.6</span> ChIP-Seq data considerations</h2>
+<div id="chip-seq-data-considerations" class="section level2 hasAnchor" number="18.6">
+<h2><span class="header-section-number">18.6</span> ChIP-Seq data considerations<a href="chip-seq-1.html#chip-seq-data-considerations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>As a general guideline, a minimum sequencing depth of 20 million reads is recommended for ChIP-seq experiments in Drosophila, whereas 40–50 million reads is a practical minimum for most marks in human tissue (PMID: 24598259). However, this depth may not be sufficient for some analyses, particularly for studies that require high resolution or have a low signal-to-noise ratio. In such cases, deeper sequencing may be necessary to achieve the desired level of sensitivity and specificity.</p>
 <p>In general, epitopes that cover large sequence space (e.g. repressive histone modification such as H3K27me3) require greater sequencing depth than epitopes confined to more narrow genomic regions (e.g. active histone modifications such as H3K4 methylation and H3K27ac). ChIP-seq for TFs may require even less sequencing depth; however, low antibody specificity may necessitate deeper sequencing due to low signal-to-noise.
 In practice, the depth of sequencing required for ChIP-seq experiments can vary widely depending on the specific experimental design and research question. It is important to perform a pilot study or use appropriate statistical methods to estimate the necessary sequencing depth for a given experiment. Choosing a highly specific antibody is essential; otherwise, even deep sequencing may not recover signal over high background. Sequencing depth should also account for genome size (e.g. a larger genome requires deeper sequencing).</p>
 </div>
-<div id="chip-seq-analysis-tools" class="section level2" number="18.7">
-<h2><span class="header-section-number">18.7</span> ChiP-seq analysis tools</h2>
-<div id="tools-for-quality-checks" class="section level3" number="18.7.1">
-<h3><span class="header-section-number">18.7.1</span> Tools for quality checks</h3>
+<div id="chip-seq-analysis-tools" class="section level2 hasAnchor" number="18.7">
+<h2><span class="header-section-number">18.7</span> ChIP-seq analysis tools<a href="chip-seq-1.html#chip-seq-analysis-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<div id="tools-for-quality-checks" class="section level3 hasAnchor" number="18.7.1">
+<h3><span class="header-section-number">18.7.1</span> Tools for quality checks<a href="chip-seq-1.html#tools-for-quality-checks" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/">FastQC</a> is a widely used tool for assessing the quality of sequencing data. It analyzes the raw sequencing data and generates a report that provides an overview of various metrics such as base quality, sequence length distribution, and GC content.</li>
 <li><a href="https://broadinstitute.github.io/picard/">Picard tools</a> and <a href="http://www.htslib.org/">SAMtools</a>: Picard tools and SAMtools are two collections of command-line tools that are used to manipulate and analyze high-throughput sequencing data. They can be used to check the quality of the data, remove duplicates, and generate summary statistics.</li>
@@ -619,8 +613,8 @@ <h3><span class="header-section-number">18.7.1</span> Tools for quality checks</
 </ul>
 <p>These tools are just a few examples of the many quality control tools available for ChIP-Seq analysis. The choice of tool(s) to use will depend on the specific analysis being performed and the preferences of the user.</p>
 </div>
-<div id="tools-for-peak-calling" class="section level3" number="18.7.2">
-<h3><span class="header-section-number">18.7.2</span> Tools for Peak calling:</h3>
+<div id="tools-for-peak-calling" class="section level3 hasAnchor" number="18.7.2">
+<h3><span class="header-section-number">18.7.2</span> Tools for Peak calling:<a href="chip-seq-1.html#tools-for-peak-calling" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://pypi.org/project/MACS2/">MACS2 (Model-based Analysis of ChIP-Seq)</a> is a widely used tool for peak calling in ChIP-Seq data. It uses a Poisson distribution to model the local noise and identifies peaks based on the fold enrichment over the background noise.</li>
 <li><a href="https://zanglab.github.io/SICER2/">SICER</a>: Spatial Clustering for Identification of ChIP-Enriched Regions (SICER) is a peak caller that takes into account the spatial clustering of enriched regions in ChIP-Seq data. It uses a clustering algorithm to identify peaks based on the local density of enriched regions.</li>
@@ -628,8 +622,8 @@ <h3><span class="header-section-number">18.7.2</span> Tools for Peak calling:</h
 <li><a href="https://github.com/gersteinlab/PeakSeq">PeakSeq</a> is a peak caller that uses a Bayesian approach to identify enriched regions in ChIP-Seq data. It models the relationship between the read counts and the signal-to-noise ratio and identifies peaks based on the posterior probability of enrichment.</li>
 </ul>
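 <p>To make the idea of Poisson-based enrichment more concrete, here is a minimal sketch of how a single genomic window could be scored against a background rate. This is an illustration of the general statistical idea only and is not how MACS2 or any other peak caller listed above is actually implemented; the function names <code>poisson_upper_tail</code> and <code>score_window</code> and the example numbers are assumptions made up for this sketch.</p>

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    # Clamp at zero to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - cdf)


def score_window(chip_count, background_rate):
    """Fold enrichment and Poisson p-value for one window (illustrative only)."""
    fold = chip_count / background_rate
    p_value = poisson_upper_tail(chip_count, background_rate)
    return fold, p_value


# Hypothetical windows: (reads observed in ChIP, reads expected from background).
for observed, expected in [(5, 5.0), (12, 5.0), (30, 5.0)]:
    fold, p = score_window(observed, expected)
    print(f"observed={observed} expected={expected} fold={fold:.1f} p={p:.3g}")
```

 <p>Windows whose read count is close to the background expectation get a large p-value, while windows with many more reads than expected get a high fold enrichment and a very small p-value. Real peak callers add further steps, such as estimating the background locally and correcting for multiple testing.</p>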
 </div>
-<div id="tools-for-differential-analysis" class="section level3" number="18.7.3">
-<h3><span class="header-section-number">18.7.3</span> Tools for Differential Analysis</h3>
+<div id="tools-for-differential-analysis" class="section level3 hasAnchor" number="18.7.3">
+<h3><span class="header-section-number">18.7.3</span> Tools for Differential Analysis<a href="chip-seq-1.html#tools-for-differential-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://bioconductor.org/packages/release/bioc/html/DESeq2.html">DESeq2</a>: This is a widely used R package for differential analysis of sequencing count data, including ChIP-seq. It uses a negative binomial model to normalize and test for differential enrichment of ChIP-seq peaks.</li>
 <li><a href="https://bioconductor.org/packages/release/bioc/html/edgeR.html">edgeR</a>: Another popular R package for differential expression analysis of RNA-seq data, edgeR can also be used for differential analysis of ChIP-seq data. It uses a generalized linear model to estimate differential enrichment and has been shown to be effective for ChIP-seq data with low read counts.
@@ -643,8 +637,8 @@ <h3><span class="header-section-number">18.7.3</span> Tools for Differential Ana
 <li><a href="http://dbtoolkit.cistrome.org/">Cistrome DB</a>: The website allows users to upload their enriched regions, returning TF ChIP-seq, DNase-seq or ATAC-seq samples with similar profiles.</li>
 </ul>
 </div>
-<div id="motif-analysis-1" class="section level3" number="18.7.4">
-<h3><span class="header-section-number">18.7.4</span> Motif Analysis</h3>
+<div id="motif-analysis-1" class="section level3 hasAnchor" number="18.7.4">
+<h3><span class="header-section-number">18.7.4</span> Motif Analysis<a href="chip-seq-1.html#motif-analysis-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://meme-suite.org/meme/">MEME Suite</a>: The MEME Suite is a comprehensive suite of tools for motif analysis, including motif discovery and motif-based sequence analysis. It includes tools for discovering de novo motifs from ChIP-Seq data and for searching for known motifs in the regions bound by the protein of interest.</li>
 <li><a href="http://homer.ucsd.edu/homer/">HOMER</a> is a suite of tools for motif discovery and analysis. It includes tools for identifying de novo motifs from ChIP-Seq data, as well as for searching for known motifs in the regions bound by the protein of interest. HOMER also provides tools for performing gene ontology analysis and pathway analysis based on the identified motifs.</li>
@@ -652,8 +646,8 @@ <h3><span class="header-section-number">18.7.4</span> Motif Analysis</h3>
 <li><a href="https://meme-suite.org/meme/doc/centrimo.html">CentriMo</a> is a tool for identifying enriched motifs in ChIP-Seq data based on the position of the motif relative to the peak summit. It can be used to identify motifs that are enriched at the center of the peak, as well as those that are enriched near the edges of the peak.</li>
 </ul>
 </div>
-<div id="tools-for-preprocessing" class="section level3" number="18.7.5">
-<h3><span class="header-section-number">18.7.5</span> Tools for preprocessing</h3>
+<div id="tools-for-preprocessing" class="section level3 hasAnchor" number="18.7.5">
+<h3><span class="header-section-number">18.7.5</span> Tools for preprocessing<a href="chip-seq-1.html#tools-for-preprocessing" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="http://www.usadellab.org/cms/index.php?page=trimmomatic">Trimmomatic</a> is a widely used tool for trimming and filtering Illumina sequencing data. It is often used to remove low-quality reads, adapter sequences, and other artifacts that can affect downstream analysis.</li>
 <li><a href="https://cutadapt.readthedocs.io/en/stable/">Cutadapt</a> is another popular tool for trimming adapter sequences from high-throughput sequencing data. It is particularly useful for removing adapters that contain degenerate nucleotides or that have been ligated with variable lengths.</li>
@@ -662,8 +656,8 @@ <h3><span class="header-section-number">18.7.5</span> Tools for preprocessing</h
 <li><a href="http://www.htslib.org/">BEDTools</a> is a powerful suite of tools for working with genomic intervals, such as those generated by ChIP-Seq peak calling. It can be used for operations such as intersecting, merging, and subtracting intervals.</li>
 </ul>
 </div>
-<div id="tools-for-making-visualizations" class="section level3" number="18.7.6">
-<h3><span class="header-section-number">18.7.6</span> Tools for making visualizations</h3>
+<div id="tools-for-making-visualizations" class="section level3 hasAnchor" number="18.7.6">
+<h3><span class="header-section-number">18.7.6</span> Tools for making visualizations<a href="chip-seq-1.html#tools-for-making-visualizations" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://software.broadinstitute.org/software/igv/">Integrative Genomics Viewer (IGV)</a> is a popular genome browser that is widely used for the visualization of genomic data, including ChIP-Seq data. It provides a user-friendly interface for exploring genomic data at different levels of resolution, from the whole-genome level down to individual nucleotides.</li>
 <li>The <a href="https://genome.ucsc.edu/">UCSC Genome Browser</a> is another widely used genome browser that can be used to visualize ChIP-Seq data. It provides an intuitive interface for navigating and visualizing genomic data, including the ability to zoom in and out and to overlay multiple data tracks.</li>
@@ -672,8 +666,8 @@ <h3><span class="header-section-number">18.7.6</span> Tools for making visualiza
 <li><a href="http://gehlenborglab.org/research/projects/cistrome-explorer/">Cistrome-Explorer</a> is a web-based visualization of compendia of ATAC-seq and histone modification ChIP-seq data for diverse samples, represented as a heatmap. Users can upload their ChIP-seq peak sets to assess the tissue specificity of their regions on the genome.</li>
 </ul>
 </div>
-<div id="tools-for-making-heatmaps" class="section level3" number="18.7.7">
-<h3><span class="header-section-number">18.7.7</span> Tools for making heatmaps</h3>
+<div id="tools-for-making-heatmaps" class="section level3 hasAnchor" number="18.7.7">
+<h3><span class="header-section-number">18.7.7</span> Tools for making heatmaps<a href="chip-seq-1.html#tools-for-making-heatmaps" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://github.com/fidelram/deepTools">deepTools</a> is a widely used package for analyzing ChIP-seq data. It includes a tool called “plotHeatmap” that can generate heatmaps from ChIP-seq data.</li>
 <li><a href="https://software.broadinstitute.org/software/igv/">Integrative Genomics Viewer (IGV)</a> is a popular tool for visualizing and exploring genomic data. It includes a heatmap function that can be used to generate heatmaps from ChIP-seq data.</li>
@@ -686,8 +680,8 @@ <h3><span class="header-section-number">18.7.7</span> Tools for making heatmaps<
 <p>The Cistrome Project has a large collection of human and mouse ChIP-seq, DNase-seq and ATAC-seq data, as well as tools for analyzing user-generated ChIP-seq data alongside publicly available samples. These tools include the Cistrome Data Browser toolkit function, which can find publicly available datasets that are similar to a ChIP-Seq peak set, and Cistrome-GO for gene ontology analysis of TF ChIP-seq target genes.</p>
 </div>
 </div>
-<div id="more-resources-about-chip-seq-data" class="section level2" number="18.8">
-<h2><span class="header-section-number">18.8</span> More resources about ChiP-seq data</h2>
+<div id="more-resources-about-chip-seq-data" class="section level2 hasAnchor" number="18.8">
+<h2><span class="header-section-number">18.8</span> More resources about ChIP-seq data<a href="chip-seq-1.html#more-resources-about-chip-seq-data" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>&lt;TODO: Put links to any resources and tutorials that are useful for ChIP-Seq data&gt;</p>
 <ul>
 <li><a href="https://liulab-dfci.github.io/bioinfo-combio/chip.html">Shirley Liu’s Computational biology course</a></li>
@@ -705,10 +699,17 @@ <h2><span class="header-section-number">18.8</span> More resources about ChiP-se
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -778,7 +779,7 @@ <h2><span class="header-section-number">18.8</span> More resources about ChiP-se
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/chromatin-methods-overview.html b/docs/no_toc/chromatin-methods-overview.html
index bc4aebd1..64b480d5 100644
--- a/docs/no_toc/chromatin-methods-overview.html
+++ b/docs/no_toc/chromatin-methods-overview.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 15 Chromatin Methods Overview | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 15 Chromatin Methods Overview | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="spatial-transcriptomics-1.html"/>
 <link rel="next" href="atac-seq-1.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 15 Chromatin Methods Overview | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,24 +535,24 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="chromatin-methods-overview" class="section level1" number="15">
-<h1><span class="header-section-number">Chapter 15</span> Chromatin Methods Overview</h1>
+<div id="chromatin-methods-overview" class="section level1 hasAnchor" number="15">
+<h1><span class="header-section-number">Chapter 15</span> Chromatin Methods Overview<a href="chromatin-methods-overview.html#chromatin-methods-overview" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is incomplete! If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 <p>In its existing form, this chapter has been written with AI and still needs further verification by experts.</p>
 </div>
-<div id="learning-objectives-13" class="section level2" number="15.1">
-<h2><span class="header-section-number">15.1</span> Learning Objectives</h2>
-<p><img src="resources/images/11-chromatin_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_0.png" title="This chapter will demonstrate how to: Understand the goals and data collection processes for chromatin assays. Compare and contrast ATAC-seq, Single cell ATAC-seq, ChIP-seq, CUT&amp;RUN and CUT&amp;Tag." alt="This chapter will demonstrate how to: Understand the goals and data collection processes for chromatin assays. Compare and contrast ATAC-seq, Single cell ATAC-seq, ChIP-seq, CUT&amp;RUN and CUT&amp;Tag." width="100%" /></p>
+<div id="learning-objectives-13" class="section level2 hasAnchor" number="15.1">
+<h2><span class="header-section-number">15.1</span> Learning Objectives<a href="chromatin-methods-overview.html#learning-objectives-13" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/11-chromatin_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_0.png" alt="This chapter will demonstrate how to: Understand the goals and data collection processes for chromatin assays. Compare and contrast ATAC-seq, Single cell ATAC-seq, ChIP-seq, CUT&amp;RUN and CUT&amp;Tag." width="100%" /></p>
 </div>
-<div id="why-are-people-interested-in-chromatin" class="section level2" number="15.2">
-<h2><span class="header-section-number">15.2</span> Why are people interested in chromatin?</h2>
+<div id="why-are-people-interested-in-chromatin" class="section level2 hasAnchor" number="15.2">
+<h2><span class="header-section-number">15.2</span> Why are people interested in chromatin?<a href="chromatin-methods-overview.html#why-are-people-interested-in-chromatin" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Chromatin plays a crucial role in regulating gene expression, which is essential for a wide range of biological processes. It is the complex of DNA and proteins that make up the structure of chromosomes in the nucleus of a cell. The DNA in chromatin is packaged around histone proteins in a way that can either promote or inhibit access to the DNA by other proteins that control gene expression. Specifically, chromatin structure can affect the ability of transcription factors and RNA polymerase to bind to and transcribe genes.</p>
 <p>Changes in chromatin structure can lead to changes in gene expression, which can have profound effects on cell function and development. For example, chromatin remodeling is a key step in cell differentiation, during which cells become specialized and take on specific functions. Dysregulation of chromatin structure can also lead to the development of diseases, such as cancer, in which aberrant gene expression contributes to uncontrolled cell growth and proliferation.</p>
 <p>Therefore, understanding the mechanisms that regulate chromatin structure and function is crucial for advancing our understanding of cellular processes, disease development, and potential therapies. This is why chromatin research has become a major area of focus in molecular biology and genomics research.</p>
 </div>
-<div id="what-kinds-of-questions-can-chromatin-answer" class="section level2" number="15.3">
-<h2><span class="header-section-number">15.3</span> What kinds of questions can chromatin answer?</h2>
+<div id="what-kinds-of-questions-can-chromatin-answer" class="section level2 hasAnchor" number="15.3">
+<h2><span class="header-section-number">15.3</span> What kinds of questions can chromatin answer?<a href="chromatin-methods-overview.html#what-kinds-of-questions-can-chromatin-answer" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li>How are genes turned on and off in response to developmental cues or environmental stimuli?</li>
 <li>What are the mechanisms by which chromatin structure is altered during cell differentiation and development?</li>
@@ -567,8 +561,8 @@ <h2><span class="header-section-number">15.3</span> What kinds of questions can
 <li>How is chromatin structure altered in diseases such as cancer, and how can this knowledge be used to develop new therapies?</li>
 <li>How can we manipulate chromatin structure to selectively activate or repress specific genes, and what are the potential applications of such approaches?</li>
 </ul>
-<div id="chromatin-is-involved-in-a-variety-of-biological-processes" class="section level3" number="15.3.1">
-<h3><span class="header-section-number">15.3.1</span> Chromatin is involved in a variety of biological processes:</h3>
+<div id="chromatin-is-involved-in-a-variety-of-biological-processes" class="section level3 hasAnchor" number="15.3.1">
+<h3><span class="header-section-number">15.3.1</span> Chromatin is involved in a variety of biological processes:<a href="chromatin-methods-overview.html#chromatin-is-involved-in-a-variety-of-biological-processes" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><strong>Gene expression</strong>: Chromatin structure and organization play a crucial role in regulating gene expression. The packaging of DNA around histone proteins can either promote or inhibit access to the DNA by other proteins that control gene expression.</li>
 <li><strong>DNA replication and repair</strong>: Chromatin structure can also affect DNA replication and repair. For example, histone modifications and chromatin remodeling can facilitate access to DNA replication and repair machinery.</li>
@@ -579,30 +573,30 @@ <h3><span class="header-section-number">15.3.1</span> Chromatin is involved in a
 </ul>
 </div>
 </div>
-<div id="comparison-of-technologies" class="section level2" number="15.4">
-<h2><span class="header-section-number">15.4</span> Comparison of technologies</h2>
-<p><img src="resources/images/11-chromatin_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_5.png" title="A table that compares all the technologies:" alt="A table that compares all the technologies:" width="100%" /></p>
-<div id="atac-seq" class="section level3" number="15.4.1">
-<h3><span class="header-section-number">15.4.1</span> ATAC-seq:</h3>
+<div id="comparison-of-technologies" class="section level2 hasAnchor" number="15.4">
+<h2><span class="header-section-number">15.4</span> Comparison of technologies<a href="chromatin-methods-overview.html#comparison-of-technologies" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/11-chromatin_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_5.png" alt="A table that compares all the technologies:" width="100%" /></p>
+<div id="atac-seq" class="section level3 hasAnchor" number="15.4.1">
+<h3><span class="header-section-number">15.4.1</span> ATAC-seq:<a href="chromatin-methods-overview.html#atac-seq" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>ATAC-seq (Assay for Transposase Accessible Chromatin using sequencing) is a technique that uses transposases to fragment DNA and insert sequencing adapters into accessible chromatin regions. The DNA fragments are then sequenced to identify regions of open chromatin. This technique is widely used to study the epigenetic regulation of gene expression.</p>
-<div id="when-to-use-atac-seq" class="section level4" number="15.4.1.1">
-<h4><span class="header-section-number">15.4.1.1</span> When to use ATAC-seq:</h4>
+<div id="when-to-use-atac-seq" class="section level4 hasAnchor" number="15.4.1.1">
+<h4><span class="header-section-number">15.4.1.1</span> When to use ATAC-seq:<a href="chromatin-methods-overview.html#when-to-use-atac-seq" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>When you want to study the epigenetic regulation of gene expression.</li>
 <li>When you want to identify open chromatin regions associated with regulatory elements such as enhancers and promoters.</li>
 <li>When you want to study various cell types and tissues, including difficult-to-access cell types.</li>
 </ul>
 </div>
-<div id="advantages" class="section level4" number="15.4.1.2">
-<h4><span class="header-section-number">15.4.1.2</span> Advantages:</h4>
+<div id="advantages" class="section level4 hasAnchor" number="15.4.1.2">
+<h4><span class="header-section-number">15.4.1.2</span> Advantages:<a href="chromatin-methods-overview.html#advantages" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>ATAC-seq is a simple and cost-effective technique that requires a low amount of starting material.</li>
 <li>It allows the identification of open chromatin regions, which are usually associated with regulatory elements such as enhancers and promoters.</li>
 <li>ATAC-seq can be used to study various cell types and tissues, including difficult-to-access cell types.</li>
 </ul>
 </div>
-<div id="disadvantages" class="section level4" number="15.4.1.3">
-<h4><span class="header-section-number">15.4.1.3</span> Disadvantages:</h4>
+<div id="disadvantages" class="section level4 hasAnchor" number="15.4.1.3">
+<h4><span class="header-section-number">15.4.1.3</span> Disadvantages:<a href="chromatin-methods-overview.html#disadvantages" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>ATAC-seq can have high background noise due to non-specific cleavage of chromatin.</li>
 <li>It may miss regions of low accessibility because the signal is biased towards highly accessible regions.</li>
@@ -610,27 +604,27 @@ <h4><span class="header-section-number">15.4.1.3</span> Disadvantages:</h4>
 </ul>
 </div>
 </div>
-<div id="single-cell-atac-seq" class="section level3" number="15.4.2">
-<h3><span class="header-section-number">15.4.2</span> Single-cell ATAC-seq:</h3>
+<div id="single-cell-atac-seq" class="section level3 hasAnchor" number="15.4.2">
+<h3><span class="header-section-number">15.4.2</span> Single-cell ATAC-seq:<a href="chromatin-methods-overview.html#single-cell-atac-seq" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Single-cell ATAC-seq is a technique that combines single-cell sequencing and ATAC-seq to identify open chromatin regions in individual cells. This technique allows the study of epigenetic heterogeneity between cells and the identification of cell-specific regulatory elements.</p>
-<div id="when-to-use-single-cell-atac-seq" class="section level4" number="15.4.2.1">
-<h4><span class="header-section-number">15.4.2.1</span> When to use single-cell ATAC-seq:</h4>
+<div id="when-to-use-single-cell-atac-seq" class="section level4 hasAnchor" number="15.4.2.1">
+<h4><span class="header-section-number">15.4.2.1</span> When to use single-cell ATAC-seq:<a href="chromatin-methods-overview.html#when-to-use-single-cell-atac-seq" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>When you want to study the epigenetic heterogeneity between cells and identify cell-specific regulatory elements.</li>
 <li>When you want to identify rare cell types or rare cell states that may be missed by bulk techniques.</li>
 <li>When you want to study the epigenetic dynamics of cells in response to environmental changes.</li>
 </ul>
 </div>
-<div id="advantages-1" class="section level4" number="15.4.2.2">
-<h4><span class="header-section-number">15.4.2.2</span> Advantages:</h4>
+<div id="advantages-1" class="section level4 hasAnchor" number="15.4.2.2">
+<h4><span class="header-section-number">15.4.2.2</span> Advantages:<a href="chromatin-methods-overview.html#advantages-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>Single-cell ATAC-seq allows the identification of open chromatin regions in individual cells, which provides cell-specific epigenetic information.</li>
 <li>It can identify rare cell types and rare cell states that may be missed by bulk techniques.</li>
 <li>It can be used to study the epigenetic dynamics of cells in response to environmental changes.</li>
 </ul>
 </div>
-<div id="disadvantages-1" class="section level4" number="15.4.2.3">
-<h4><span class="header-section-number">15.4.2.3</span> Disadvantages:</h4>
+<div id="disadvantages-1" class="section level4 hasAnchor" number="15.4.2.3">
+<h4><span class="header-section-number">15.4.2.3</span> Disadvantages:<a href="chromatin-methods-overview.html#disadvantages-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>Single-cell ATAC-seq can have a higher level of technical noise due to the low amount of starting material.</li>
 <li>It can be challenging to obtain high-quality single-cell suspensions from tissues.</li>
@@ -638,19 +632,19 @@ <h4><span class="header-section-number">15.4.2.3</span> Disadvantages:</h4>
 </ul>
 </div>
 </div>
-<div id="chip-seq" class="section level3" number="15.4.3">
-<h3><span class="header-section-number">15.4.3</span> ChIP-seq:</h3>
+<div id="chip-seq" class="section level3 hasAnchor" number="15.4.3">
+<h3><span class="header-section-number">15.4.3</span> ChIP-seq:<a href="chromatin-methods-overview.html#chip-seq" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>ChIP-seq (Chromatin Immunoprecipitation sequencing) is a technique that uses antibodies to isolate specific DNA-protein complexes, such as transcription factors or histone modifications. The DNA fragments associated with the protein complexes are then sequenced to identify the genomic regions that are bound by the protein.</p>
-<div id="advantages-2" class="section level4" number="15.4.3.1">
-<h4><span class="header-section-number">15.4.3.1</span> Advantages:</h4>
+<div id="advantages-2" class="section level4 hasAnchor" number="15.4.3.1">
+<h4><span class="header-section-number">15.4.3.1</span> Advantages:<a href="chromatin-methods-overview.html#advantages-2" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>ChIP-seq allows the identification of specific protein-DNA interactions, which provides information on the regulation of gene expression.</li>
 <li>It can be used to study the epigenetic changes associated with specific cellular processes, such as differentiation or development.</li>
 <li>ChIP-seq can identify the binding sites of transcription factors, which can be used to identify regulatory elements such as enhancers and promoters.</li>
 </ul>
 </div>
-<div id="disadvantages-2" class="section level4" number="15.4.3.2">
-<h4><span class="header-section-number">15.4.3.2</span> Disadvantages:</h4>
+<div id="disadvantages-2" class="section level4 hasAnchor" number="15.4.3.2">
+<h4><span class="header-section-number">15.4.3.2</span> Disadvantages:<a href="chromatin-methods-overview.html#disadvantages-2" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>ChIP-seq requires a high amount of starting material and can be costly.</li>
 <li>It can have a high level of background noise due to non-specific binding of antibodies.</li>
@@ -658,11 +652,11 @@ <h4><span class="header-section-number">15.4.3.2</span> Disadvantages:</h4>
 </ul>
 </div>
 </div>
-<div id="cutrun" class="section level3" number="15.4.4">
-<h3><span class="header-section-number">15.4.4</span> CUT&amp;RUN</h3>
-<p>CUT&amp;RUN (Cleavage Under Targets &amp; Release Using Nuclease) is a relatively new genomic method that involves the targeted cleavage of DNA by a specific antibody or protein of interest, followed by the release and sequencing of the DNA fragments. The CUT&amp;RUN method was developed as a more streamlined alternative to the ChIP-seq (Chromatin Immunoprecipitation sequencing) method, which involves a more complex series of steps <span class="citation">Skene and Henikoff (<a href="#ref-skene2018targeted" role="doc-biblioref">2018</a>)</span>.</p>
-<div id="how-cutrun-works" class="section level4" number="15.4.4.1">
-<h4><span class="header-section-number">15.4.4.1</span> How CUT&amp;RUN works:</h4>
+<div id="cutrun" class="section level3 hasAnchor" number="15.4.4">
+<h3><span class="header-section-number">15.4.4</span> CUT&amp;RUN<a href="chromatin-methods-overview.html#cutrun" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<p>CUT&amp;RUN (Cleavage Under Targets &amp; Release Using Nuclease) is a relatively new genomic method that involves the targeted cleavage of DNA near sites bound by a protein of interest, guided by a specific antibody, followed by the release and sequencing of the DNA fragments. The CUT&amp;RUN method was developed as a more streamlined alternative to the ChIP-seq (Chromatin Immunoprecipitation sequencing) method, which involves a more complex series of steps <span class="citation">Skene and Henikoff (<a href="#ref-skene2018targeted">2018</a>)</span>.</p>
+<div id="how-cutrun-works" class="section level4 hasAnchor" number="15.4.4.1">
+<h4><span class="header-section-number">15.4.4.1</span> How CUT&amp;RUN works:<a href="chromatin-methods-overview.html#how-cutrun-works" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Cells are permeabilized and incubated with an antibody specific to the protein of interest. A fusion protein called Protein A-Micrococcal Nuclease (pA-MNase) is then added and binds to the antibody.
 After incubation, the pA-MNase is activated by adding calcium and cleaves the DNA in the vicinity of the bound antibody.
 The released DNA fragments are then purified and sequenced to identify the genomic regions that were bound by the protein of interest.</p>
@@ -674,11 +668,11 @@ <h4><span class="header-section-number">15.4.4.1</span> How CUT&amp;RUN works:</
 </ul>
 </div>
 </div>
-<div id="cuttag" class="section level3" number="15.4.5">
-<h3><span class="header-section-number">15.4.5</span> CUT&amp;Tag</h3>
-<p>CUT&amp;Tag (Cleavage Under Targets and Tagmentation) is similar to CUT&amp;RUN. It was developed as an improvement over CUT&amp;RUN, with the goal of reducing the amount of background noise and improving the efficiency of the method <span class="citation">(<a href="#ref-kaya2019cut" role="doc-biblioref">Kaya-Okur et al. 2019</a>)</span>.</p>
-<div id="how-cuttag-works" class="section level4" number="15.4.5.1">
-<h4><span class="header-section-number">15.4.5.1</span> How CUT&amp;Tag works:</h4>
+<div id="cuttag" class="section level3 hasAnchor" number="15.4.5">
+<h3><span class="header-section-number">15.4.5</span> CUT&amp;Tag<a href="chromatin-methods-overview.html#cuttag" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<p>CUT&amp;Tag (Cleavage Under Targets and Tagmentation) is similar to CUT&amp;RUN. It was developed as an improvement over CUT&amp;RUN, with the goal of reducing the amount of background noise and improving the efficiency of the method <span class="citation">(<a href="#ref-kaya2019cut">Kaya-Okur et al. 2019</a>)</span>.</p>
+<div id="how-cuttag-works" class="section level4 hasAnchor" number="15.4.5.1">
+<h4><span class="header-section-number">15.4.5.1</span> How CUT&amp;Tag works:<a href="chromatin-methods-overview.html#how-cuttag-works" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ol style="list-style-type: decimal">
 <li>Cells are permeabilized and incubated with an antibody specific to the protein of interest; a fusion protein called Protein A-Tn5 transposase then binds to the antibody.</li>
 <li>The Protein A-Tn5 transposase inserts sequencing adapters into the genomic DNA in the vicinity of the bound antibody or protein of interest.</li>
@@ -686,8 +680,8 @@ <h4><span class="header-section-number">15.4.5.1</span> How CUT&amp;Tag works:</
 </ol>
 <p>Like CUT&amp;RUN, CUT&amp;Tag allows for the specific cleavage of DNA in the vicinity of a target protein or antibody, but the addition of sequencing adapters in CUT&amp;Tag occurs directly in the nucleus, prior to DNA release. This results in less background noise and more efficient DNA recovery.</p>
 </div>
-<div id="advantages-3" class="section level4" number="15.4.5.2">
-<h4><span class="header-section-number">15.4.5.2</span> Advantages:</h4>
+<div id="advantages-3" class="section level4 hasAnchor" number="15.4.5.2">
+<h4><span class="header-section-number">15.4.5.2</span> Advantages:<a href="chromatin-methods-overview.html#advantages-3" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>CUT&amp;Tag has a lower level of background noise and higher sensitivity due to the addition of sequencing adapters in situ.</li>
 <li>CUT&amp;Tag requires less input material than CUT&amp;RUN, which makes it a more efficient method.</li>
@@ -696,19 +690,19 @@ <h4><span class="header-section-number">15.4.5.2</span> Advantages:</h4>
 <p>Overall, both CUT&amp;RUN and CUT&amp;Tag are powerful genomic methods that allow for the efficient study of protein-DNA interactions and epigenetics. The choice between the two methods may depend on the specific research question and the availability of specific reagents or equipment.</p>
 </div>
 </div>
-<div id="gro-seq-global-run-on-sequencing" class="section level3" number="15.4.6">
-<h3><span class="header-section-number">15.4.6</span> GRO-seq (Global Run-On sequencing)</h3>
-<p>Allows for the genome-wide analysis of transcriptional activity by measuring the nascent RNA transcripts that are actively being synthesized by RNA polymerase. GRO-seq is a high-throughput sequencing-based technique that provides a snapshot of the transcriptional landscape of a cell <span class="citation">Park and Won (<a href="#ref-park2018gene" role="doc-biblioref">2018</a>)</span>.</p>
+<div id="gro-seq-global-run-on-sequencing" class="section level3 hasAnchor" number="15.4.6">
+<h3><span class="header-section-number">15.4.6</span> GRO-seq (Global Run-On sequencing)<a href="chromatin-methods-overview.html#gro-seq-global-run-on-sequencing" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<p>GRO-seq allows for the genome-wide analysis of transcriptional activity by measuring the nascent RNA transcripts that are actively being synthesized by RNA polymerase. It is a high-throughput sequencing-based technique that provides a snapshot of the transcriptional landscape of a cell <span class="citation">Park and Won (<a href="#ref-park2018gene">2018</a>)</span>.</p>
 </div>
-<div id="how-gro-seq-works" class="section level3" number="15.4.7">
-<h3><span class="header-section-number">15.4.7</span> How GRO-seq works:</h3>
+<div id="how-gro-seq-works" class="section level3 hasAnchor" number="15.4.7">
+<h3><span class="header-section-number">15.4.7</span> How GRO-seq works:<a href="chromatin-methods-overview.html#how-gro-seq-works" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ol style="list-style-type: decimal">
 <li>Nuclei are isolated from cells and incubated with a biotinylated nucleotide triphosphate, which is incorporated into nascent RNA transcripts by RNA polymerase.</li>
 <li>The labeled RNA is then selectively captured using streptavidin beads, and the RNA is reverse-transcribed into cDNA.</li>
 <li>The cDNA is then sequenced to identify the regions of the genome that are actively transcribed.</li>
 </ol>
-<div id="advantages-4" class="section level4" number="15.4.7.1">
-<h4><span class="header-section-number">15.4.7.1</span> Advantages:</h4>
+<div id="advantages-4" class="section level4 hasAnchor" number="15.4.7.1">
+<h4><span class="header-section-number">15.4.7.1</span> Advantages:<a href="chromatin-methods-overview.html#advantages-4" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li>Its ability to distinguish between the sense and antisense strands of transcribed RNA</li>
 <li>Its ability to quantify the level of transcriptional activity in individual genes</li>
@@ -720,7 +714,7 @@ <h4><span class="header-section-number">15.4.7.1</span> Advantages:</h4>
 </div>
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-kaya2019cut" class="csl-entry">
 Kaya-Okur, Hicran S, Suzanna J Wu, Catherine A Codomo, Emily S Pledger, Timothy D Bryson, and Jorja G Henikoff. 2019. <span>“CUT&amp;tag for Efficient Epigenomic Profiling of Small Samples and Single Cells.”</span> <em>Nature Communications</em> 10 (1): 1930.
@@ -733,10 +727,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -806,7 +807,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/considerations-for-choosing-tools.html b/docs/no_toc/considerations-for-choosing-tools.html
index 97512f8c..6de63940 100644
--- a/docs/no_toc/considerations-for-choosing-tools.html
+++ b/docs/no_toc/considerations-for-choosing-tools.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 4 Considerations for choosing tools | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 4 Considerations for choosing tools | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="guidelines-for-good-metadata.html"/>
 <link rel="next" href="general-data-analysis-tools.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 4 Considerations for choosing tools | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,62 +535,62 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="considerations-for-choosing-tools" class="section level1" number="4">
-<h1><span class="header-section-number">Chapter 4</span> Considerations for choosing tools</h1>
-<div id="learning-objectives-2" class="section level2" number="4.1">
-<h2><span class="header-section-number">4.1</span> Learning Objectives</h2>
-<p><img src="resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_0.png" title="This chapter will demonstrate how to: Recognize the key aspects of a tool that you should consider when constructing an analysis. Form questions to ask others for advice regarding your data " alt="This chapter will demonstrate how to: Recognize the key aspects of a tool that you should consider when constructing an analysis. Form questions to ask others for advice regarding your data " width="100%" /></p>
+<div id="considerations-for-choosing-tools" class="section level1 hasAnchor" number="4">
+<h1><span class="header-section-number">Chapter 4</span> Considerations for choosing tools<a href="considerations-for-choosing-tools.html#considerations-for-choosing-tools" class="anchor-section" aria-label="Anchor link to header"></a></h1>
+<div id="learning-objectives-2" class="section level2 hasAnchor" number="4.1">
+<h2><span class="header-section-number">4.1</span> Learning Objectives<a href="considerations-for-choosing-tools.html#learning-objectives-2" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_0.png" alt="This chapter will demonstrate how to: Recognize the key aspects of a tool that you should consider when constructing an analysis. Form questions to ask others for advice regarding your data " width="100%" /></p>
 </div>
-<div id="overview" class="section level2" number="4.2">
-<h2><span class="header-section-number">4.2</span> Overview</h2>
+<div id="overview" class="section level2 hasAnchor" number="4.2">
+<h2><span class="header-section-number">4.2</span> Overview<a href="considerations-for-choosing-tools.html#overview" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>In this course, we will introduce you to the fundamentals of various data types and give you advice about choosing tutorials and tools whenever possible. However, it is critical to note that there is no “one size fits all” when it comes to genomic data decisions. Instead, our goals are to equip you with the knowledge you need as well as the questions you need to ask yourself (or others) when making decisions about your genomics data.</p>
 <p>We will discuss the following considerations, which you should gather information about and weigh when comparing tools for your analysis:</p>
-<p><img src="resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_5.png" title="Considerations for choosing tools: Is it appropriate for your data type? Is in an interface or programming language you feel comfortable with? How much computing power do you have? Are there benchmarking papers that compare the tool options? Is the tool well documented and usable? Is the tool well-maintained? Is the tool generally accepted by the field? " alt="Considerations for choosing tools: Is it appropriate for your data type? Is in an interface or programming language you feel comfortable with? How much computing power do you have? Are there benchmarking papers that compare the tool options? Is the tool well documented and usable? Is the tool well-maintained? Is the tool generally accepted by the field? " width="100%" /></p>
-<div id="is-this-tool-appropriate-for-your-data-type" class="section level3" number="4.2.1">
-<h3><span class="header-section-number">4.2.1</span> Is this tool appropriate for your data type?</h3>
+<p><img src="resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_5.png" alt="Considerations for choosing tools: Is it appropriate for your data type? Is in an interface or programming language you feel comfortable with? How much computing power do you have? Are there benchmarking papers that compare the tool options? Is the tool well documented and usable? Is the tool well-maintained? Is the tool generally accepted by the field? " width="100%" /></p>
+<div id="is-this-tool-appropriate-for-your-data-type" class="section level3 hasAnchor" number="4.2.1">
+<h3><span class="header-section-number">4.2.1</span> Is this tool appropriate for your data type?<a href="considerations-for-choosing-tools.html#is-this-tool-appropriate-for-your-data-type" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Certain tools are built for certain kinds of data. In each data-type-specific chapter, we will attempt to point you to tools that are appropriate for the given data type. However, note that some tools might also require tweaks to their parameters for non-standard data collection methods. If you are not sure which data collection methods were used for your data, be sure to follow the data-type-specific advice in the chapter to find out what you need to know about your data to make an informed decision.</p>
 </div>
-<div id="is-this-tool-appropriate-for-your-scientific-question" class="section level3" number="4.2.2">
-<h3><span class="header-section-number">4.2.2</span> Is this tool appropriate for your scientific question?</h3>
+<div id="is-this-tool-appropriate-for-your-scientific-question" class="section level3 hasAnchor" number="4.2.2">
+<h3><span class="header-section-number">4.2.2</span> Is this tool appropriate for your scientific question?<a href="considerations-for-choosing-tools.html#is-this-tool-appropriate-for-your-scientific-question" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Some tools may be appropriate for the general data type, but might mask information you will need to answer your particular scientific question or hypothesis. For example, for RNA-seq if you are interested in splice variants, you may not be able to use certain alignment tools that do not differentiate between splice variants.</p>
 <p>Be sure to make your goals and scientific questions clear when asking for advice or guidance. Some tools may be applicable to certain scientific questions, but other accommodations or preprocessing may need to be done.</p>
 </div>
-<div id="is-this-tool-in-an-interface-or-programming-language-you-feel-comfortable-with" class="section level3" number="4.2.3">
-<h3><span class="header-section-number">4.2.3</span> Is this tool in an interface or programming language you feel comfortable with?</h3>
+<div id="is-this-tool-in-an-interface-or-programming-language-you-feel-comfortable-with" class="section level3 hasAnchor" number="4.2.3">
+<h3><span class="header-section-number">4.2.3</span> Is this tool in an interface or programming language you feel comfortable with?<a href="considerations-for-choosing-tools.html#is-this-tool-in-an-interface-or-programming-language-you-feel-comfortable-with" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Genomics and informatics tools can be classified into two groups based on how you interact with them: 1) command line tools and 2) graphical user interface (GUI) tools. GUIs are tools that you can use by pointing and clicking with your mouse, whereas command line tools require input through writing out commands.</p>
 <p>Command line tools often lend themselves to greater reproducibility of an analysis, since a script can contain all the steps needed to re-run the analysis. This means you could re-run and reproduce your results with one command, instead of clicking various buttons in a particular order as you would need to do with a GUI-based tool.</p>
 <p>Your level of comfort or willingness/time available to learn a programming language like R or Python will influence what tool options you have. If you are unfamiliar and uncomfortable writing in R, Python, or Bash scripting, this will influence what tools you have available to you or whether you will need to enlist more outside help.</p>
 <p>If you are interested in learning to use command line, we have many resources and recommendations for you to use for learning in this next chapter. However, if you do not have the bandwidth or motivation to learn how to code, you will want to gravitate toward tools that have GUIs.</p>
 </div>
-<div id="how-much-computing-power-do-you-have" class="section level3" number="4.2.4">
-<h3><span class="header-section-number">4.2.4</span> How much computing power do you have?</h3>
+<div id="how-much-computing-power-do-you-have" class="section level3 hasAnchor" number="4.2.4">
+<h3><span class="header-section-number">4.2.4</span> How much computing power do you have?<a href="considerations-for-choosing-tools.html#how-much-computing-power-do-you-have" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Some tools require a lot more computing resources (or runtime) than others. Many institutions have cloud computing resources or high-powered computing clusters for your use. <a href="https://jhudatascience.org/Computing_for_Cancer_Informatics/computing-resources.html">We recommend our Computing Course for more information about this</a>.</p>
 <p>But your computing budget, access, and time allotment may influence what tools you would like to use for a project. For example, for RNA-seq data alignment, traditional aligners that use the genome take an order of magnitude more time to run than quantifying transcripts with pseudoalignment-based tools. For many applications, pseudoaligners are perfectly appropriate and efficient choices that can be run on a laptop. But if you prefer a traditional aligner because you are interested in something that is not detected by pseudoaligners, such as splice variants, then you may want to look into using some computing resources for this task. All these decisions need to be weighed in balance with each other.</p>
 </div>
-<div id="are-there-benchmarking-papers-that-compare-this-tool-to-other-options" class="section level3" number="4.2.5">
-<h3><span class="header-section-number">4.2.5</span> Are there benchmarking papers that compare this tool to other options?</h3>
+<div id="are-there-benchmarking-papers-that-compare-this-tool-to-other-options" class="section level3 hasAnchor" number="4.2.5">
+<h3><span class="header-section-number">4.2.5</span> Are there benchmarking papers that compare this tool to other options?<a href="considerations-for-choosing-tools.html#are-there-benchmarking-papers-that-compare-this-tool-to-other-options" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Some tools and their algorithms have been more thoroughly examined and tested than others. And this doesn’t always align to a tool’s popularity. Seek out the literature and what studies have been done comparing this tool to others like it. Keep in mind the tool developer’s own bias if the paper is coming directly from the group or individual who is the creator of the tool. Developers will be more likely to understand and know how to tweak parameters of their own tool properly, while not necessarily spending as much time testing and adjusting tools made by others. This concept has sometimes been called the <a href="https://www.biorxiv.org/content/10.1101/385534v4">“Continental Breakfast Included”</a> concept.</p>
 </div>
-<div id="is-the-tool-well-documented-and-usable" class="section level3" number="4.2.6">
-<h3><span class="header-section-number">4.2.6</span> Is the tool well documented and usable?</h3>
+<div id="is-the-tool-well-documented-and-usable" class="section level3 hasAnchor" number="4.2.6">
+<h3><span class="header-section-number">4.2.6</span> Is the tool well documented and usable?<a href="considerations-for-choosing-tools.html#is-the-tool-well-documented-and-usable" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Well documented and usable tools can be very powerful. Poorly documented tools may lead to misunderstood parameters or other mishandling of the data if their behavior has not been made clear by the tool developers and maintainers. A good understanding of what a tool is doing with the data you give it is perhaps more important than using fancy algorithms that are unclear. Not only do documentation and usability increase your ability to use a tool, but your analysis will be more reproducible if others can also understand the tools that you used.</p>
 <p>The existence of forums and user groups for a particular tool not only gives you a useful resource for analysis, troubleshooting, and interpretation of your results, but also indicates a drive for the tool to continue to be maintained and developed over time.</p>
 </div>
-<div id="is-the-tool-well-maintained" class="section level3" number="4.2.7">
-<h3><span class="header-section-number">4.2.7</span> Is the tool well maintained?</h3>
+<div id="is-the-tool-well-maintained" class="section level3 hasAnchor" number="4.2.7">
+<h3><span class="header-section-number">4.2.7</span> Is the tool well maintained?<a href="considerations-for-choosing-tools.html#is-the-tool-well-maintained" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>If a tool is actively being maintained, this will aid in the reproducibility of your results. Tools on GitHub (a platform for hosting open-source software) or other repositories often indicate when the latest updates to a tool were made. Ideally, updates are being made regularly to the tool; a lack of updates does not speak well for the tool’s future. A tool that is not well maintained or supported may become deprecated, making it increasingly difficult, if not impossible, to reproduce, re-run, or further develop your analysis.</p>
 </div>
-<div id="is-the-tool-generally-accepted-by-the-field" class="section level3" number="4.2.8">
-<h3><span class="header-section-number">4.2.8</span> Is the tool generally accepted by the field?</h3>
+<div id="is-the-tool-generally-accepted-by-the-field" class="section level3 hasAnchor" number="4.2.8">
+<h3><span class="header-section-number">4.2.8</span> Is the tool generally accepted by the field?<a href="considerations-for-choosing-tools.html#is-the-tool-generally-accepted-by-the-field" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>While tool popularity should not be the only consideration when choosing a tool, it is an aspect that can influence communication or acceptance of your results. All things being equal, it can be better to choose a tool that is accepted by the community as tried and true and is well benchmarked, as opposed to bleeding-edge technology that may not have been truly scrutinized yet. In an analysis, it is perhaps more valuable to know and weigh the known limitations of an older tool than to use a newer tool whose limitations may not have been identified yet (but which will certainly have its own limitations identified in time).</p>
 </div>
 </div>
-<div id="coming-to-a-decision" class="section level2" number="4.3">
-<h2><span class="header-section-number">4.3</span> Coming to a decision</h2>
+<div id="coming-to-a-decision" class="section level2 hasAnchor" number="4.3">
+<h2><span class="header-section-number">4.3</span> Coming to a decision<a href="considerations-for-choosing-tools.html#coming-to-a-decision" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>It’s important to note that the questions we have discussed here need to be considered in balance with one another. Rarely should you make a decision about a tool without considering all of these items together. For example, a tool may have better benchmarking, but if it is more computationally costly and you do not have access to the necessary computing resources to run it, then you may need to consider other options.</p>
 </div>
-<div id="more-resources" class="section level2" number="4.4">
-<h2><span class="header-section-number">4.4</span> More resources</h2>
+<div id="more-resources" class="section level2 hasAnchor" number="4.4">
+<h2><span class="header-section-number">4.4</span> More resources<a href="considerations-for-choosing-tools.html#more-resources" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><p><a href="https://hutchdatascience.org/code_review/more_resources.html">A longer list of tools and resources can be found here</a></p></li>
 <li><p><a href="https://datatrail-jhu.github.io/DataTrail/index.html">DataTrail curriculum</a></p></li>
@@ -608,10 +602,17 @@ <h2><span class="header-section-number">4.4</span> More resources</h2>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -681,7 +682,7 @@ <h2><span class="header-section-number">4.4</span> More resources</h2>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/cutrun-and-cuttag.html b/docs/no_toc/cutrun-and-cuttag.html
index 73070508..05d6ab87 100644
--- a/docs/no_toc/cutrun-and-cuttag.html
+++ b/docs/no_toc/cutrun-and-cuttag.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 19 CUT&amp;RUN and CUT&amp;Tag | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 19 CUT&amp;RUN and CUT&amp;Tag | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="chip-seq-1.html"/>
 <link rel="next" href="dna-methylation-sequencing.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 19 CUT&amp;RUN and CUT&amp;Tag | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,21 +535,21 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="cutrun-and-cuttag" class="section level1" number="19">
-<h1><span class="header-section-number">Chapter 19</span> CUT&amp;RUN and CUT&amp;Tag</h1>
+<div id="cutrun-and-cuttag" class="section level1 hasAnchor" number="19">
+<h1><span class="header-section-number">Chapter 19</span> CUT&amp;RUN and CUT&amp;Tag<a href="cutrun-and-cuttag.html#cutrun-and-cuttag" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-17" class="section level2" number="19.1">
-<h2><span class="header-section-number">19.1</span> Learning Objectives</h2>
+<div id="learning-objectives-17" class="section level2 hasAnchor" number="19.1">
+<h2><span class="header-section-number">19.1</span> Learning Objectives<a href="cutrun-and-cuttag.html#learning-objectives-17" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a3974d9533_0_10.png" width="100%" /></p>
 </div>
-<div id="technologies" class="section level2" number="19.2">
-<h2><span class="header-section-number">19.2</span> Technologies</h2>
+<div id="technologies" class="section level2 hasAnchor" number="19.2">
+<h2><span class="header-section-number">19.2</span> Technologies<a href="cutrun-and-cuttag.html#technologies" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2aeb8138211_0_0.png" width="100%" /></p>
 </div>
-<div id="advantages-of-cutrun-and-cuttag-over-the-traditional-chip-seq-technology" class="section level2" number="19.3">
-<h2><span class="header-section-number">19.3</span> Advantages of CUT&amp;RUN and CUT&amp;Tag over the Traditional ChIP-seq Technology</h2>
+<div id="advantages-of-cutrun-and-cuttag-over-the-traditional-chip-seq-technology" class="section level2 hasAnchor" number="19.3">
+<h2><span class="header-section-number">19.3</span> Advantages of CUT&amp;RUN and CUT&amp;Tag over the Traditional ChIP-seq Technology<a href="cutrun-and-cuttag.html#advantages-of-cutrun-and-cuttag-over-the-traditional-chip-seq-technology" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><strong>Lower Cell Number and Less Starting Material Requirement</strong>: CUT&amp;RUN and CUT&amp;Tag can be performed with far fewer cells than ChIP-seq. This is particularly beneficial when working with rare cell types or limited biological samples. The CUT&amp;RUN and CUT&amp;Tag techniques also involve less sample manipulation than ChIP-seq, which minimizes the risk of losing material and of introducing artifacts through extensive sample handling and processing.</li>
 </ul>
@@ -572,8 +566,8 @@ <h2><span class="header-section-number">19.3</span> Advantages of CUT&amp;RUN an
 <li><p><strong>Cost-Effectiveness</strong>: In addition to high efficiency in sequencing the target region, due to the lower requirement for reagents and enzymes, CUT&amp;RUN and CUT&amp;Tag can be more cost-effective, especially in high-throughput settings.</p></li>
 <li><p><strong>More Efficient Protocol Workflow and Faster Turnaround Time</strong>: The protocol for CUT&amp;RUN and CUT&amp;Tag is more streamlined and less labor-intensive than ChIP-seq. It eliminates the need for sonication, DNA purification, and ligation steps, simplifying the procedure. The overall protocols of CUT&amp;RUN and CUT&amp;Tag are generally quicker and more straightforward than ChIP-seq, leading to faster experiment turnaround times.</p></li>
 </ul>
-<div id="cutrun-1" class="section level3" number="19.3.1">
-<h3><span class="header-section-number">19.3.1</span> CUT&amp;RUN</h3>
+<div id="cutrun-1" class="section level3 hasAnchor" number="19.3.1">
+<h3><span class="header-section-number">19.3.1</span> CUT&amp;RUN<a href="cutrun-and-cuttag.html#cutrun-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><strong>Cleavage Under Targets and Release Using Nuclease</strong>, <strong>CUT&amp;RUN</strong> for short, is an antibody-targeted chromatin profiling method for measuring histone modification enrichment or transcription factor binding. It is a more advanced technology for epigenomic landscape profiling than traditional ChIP-seq and is known for its easy implementation and low cost. The procedure is carried out in situ: micrococcal nuclease tethered to protein A binds to an antibody of choice and cuts the immediately adjacent DNA, releasing the DNA bound to the antibody target. Therefore, CUT&amp;RUN produces precise transcription factor or histone modification profiles while avoiding crosslinking and solubilization issues. Extremely low backgrounds make profiling possible with typically one-tenth of the sequencing depth required for ChIP-seq and permit profiling using low cell numbers (i.e., a few hundred cells) without losing quality.</p>
 <!-- [Henikoff lab](https://research.fredhutch.org/henikoff/en.html) constructed a 6xHis and HA-tagged protein A-protein G-MNase fusion (pAG-MNase) that allows direct binding of mouse antibodies that bind poorly to protein A, eliminating the need for a secondary antibody. The His tag allows the purification of pAG-MNase with a commercial kit, while the HA tag can be used for pulling out pAG-MNase chromatin complexes for CUT&RUN.ChIP. Henikoff lab developed low salt and high calcium conditions that prevent diffusion of released complexes into the supernatant, allowing for longer digestion times and increased yields without increased cleavage at non-specific accessible sites. E. coli DNA carried over from pA-MNase or pAG-MNase preparation is sufficient for internal calibration of samples without adding heterologous spike-in DNA. -->
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_0.png" width="100%" /></p>
@@ -592,8 +586,8 @@ <h3><span class="header-section-number">19.3.1</span> CUT&amp;RUN</h3>
 <li><a href="https://www.protocols.io/view/cut-amp-run-targeted-in-situ-genome-wide-profiling-14egnr4ql5dy/v3">CUT&amp;RUN: Targeted in situ genome-wide profiling with high efficiency for low cell numbers (Version 3)</a></li>
 <li><a href="https://www.protocols.io/view/cut-run-with-drosophila-tissues-14egnx28pl5d/v1">CUT&amp;RUN with Drosophila tissues (Version 1)</a></li>
 </ol>
-<div id="autocutrun" class="section level4" number="19.3.1.1">
-<h4><span class="header-section-number">19.3.1.1</span> AutoCUT&amp;RUN</h4>
+<div id="autocutrun" class="section level4 hasAnchor" number="19.3.1.1">
+<h4><span class="header-section-number">19.3.1.1</span> AutoCUT&amp;RUN<a href="cutrun-and-cuttag.html#autocutrun" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>CUT&amp;RUN has been automated using a Beckman Biomek FX liquid-handling robot so that a 96-well format can be used to profile chromatin for high-throughput samples, such as in a clinical setting. DNA end polishing and direct ligation of adapters permit sample-to-Illumina library processing of 96 samples in two days. AutoCUT&amp;RUN can be used for cell-type specific gene activity and enhancer profiling based on histone modifications and transcription factors, including in frozen tissue samples of tumor xenografts.</p>
 <ul>
 <li>Publication:</li>
@@ -609,8 +603,8 @@ <h4><span class="header-section-number">19.3.1.1</span> AutoCUT&amp;RUN</h4>
 </ol>
 </div>
 </div>
-<div id="cuttag-1" class="section level3" number="19.3.2">
-<h3><span class="header-section-number">19.3.2</span> CUT&amp;Tag</h3>
+<div id="cuttag-1" class="section level3 hasAnchor" number="19.3.2">
+<h3><span class="header-section-number">19.3.2</span> CUT&amp;Tag<a href="cutrun-and-cuttag.html#cuttag-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><strong>Cleavage Under Targets and Tagmentation</strong>, <strong>CUT&amp;Tag</strong> for short, is an enzyme tethering approach to profiling chromatin proteins, including histone marks and RNA Pol II. CUT&amp;Tag generates sequence-ready libraries without the need for end polishing and adapter ligation. It uses a proteinA-Tn5 fusion to tether Tn5 transposase near the site of an antibody to a chromatin protein of interest. A secondary antibody, such as guinea pig anti-rabbit antibody, is used to increase the efficiency of tethering the pA-Tn5 to the target primary antibody. The pA-Tn5 complex is pre-loaded with sequencing adapters that insert into adjacent DNA upon activation with magnesium. CUT&amp;Tag has a very low background and can be performed in a single tube in as little as a day, though primary antibodies are typically incubated overnight. It can also be used with the ICELL8 nano dispensation system to profile single cells.</p>
 <p>A streamlined CUT&amp;Tag protocol was introduced by the <a href="https://research.fredhutch.org/henikoff/en.html">Henikoff Lab</a> that suppresses DNA accessibility artifacts to ensure high-fidelity mapping of the antibody-targeted protein and improves the signal-to-noise ratio over current chromatin profiling methods. Streamlined CUT&amp;Tag can be performed in a single PCR tube, from cells to amplified libraries, providing low-cost genome-wide chromatin maps. By simplifying library preparation, CUT&amp;Tag-direct requires less than a day at the bench, from live cells to sequencing-ready barcoded libraries. As a result of low background levels, barcoded and pooled CUT&amp;Tag libraries can be sequenced for as little as $25 per sample. This enables routine genome-wide profiling of chromatin proteins and modifications and requires no special skills or equipment.</p>
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_7.png" width="100%" /></p>
@@ -630,8 +624,8 @@ <h3><span class="header-section-number">19.3.2</span> CUT&amp;Tag</h3>
 <li><a href="https://www.protocols.io/view/3xflag-patn5-protein-purification-and-meds-loading-j8nlke4e5l5r/v1">3XFlag-pATn5 Protein Purification and MEDS-loading (5x scale, 2L volume, Version 1)</a></li>
 <li><a href="https://www.protocols.io/view/cut-tag-with-drosophila-tissues-3byl4kkprvo5/v1">CUT&amp;Tag with Drosophila tissues (Version 1)</a></li>
 </ol>
-<div id="autocuttag" class="section level4" number="19.3.2.1">
-<h4><span class="header-section-number">19.3.2.1</span> AutoCUT&amp;Tag</h4>
+<div id="autocuttag" class="section level4 hasAnchor" number="19.3.2.1">
+<h4><span class="header-section-number">19.3.2.1</span> AutoCUT&amp;Tag<a href="cutrun-and-cuttag.html#autocuttag" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>CUT&amp;Tag has been automated using a Beckman Coulter Biomek FX liquid handling robot so that a 96-well format can be used to profile chromatin for high-throughput samples, such as in a clinical setting. AutoCUT&amp;Tag can be used to profile the gene targets of fusions of the KMT2A lysine methyltransferase to other chromatin proteins, which characterize lymphoid, myeloid, and mixed lineage leukemias, uncovering heterogeneities that may underlie lineage plasticity.</p>
 <ul>
 <li>Publication:</li>
@@ -648,8 +642,8 @@ <h4><span class="header-section-number">19.3.2.1</span> AutoCUT&amp;Tag</h4>
 <li><a href="https://www.protocols.io/view/autocut-amp-tag-streamlined-genome-wide-profiling-14egn819qg5d/v1">AutoCUT&amp;Tag: streamlined genome-wide profiling of chromatin proteins on a liquid handling robot (Version 1)</a></li>
 </ol>
 </div>
-<div id="cutac" class="section level4" number="19.3.2.2">
-<h4><span class="header-section-number">19.3.2.2</span> CUTAC</h4>
+<div id="cutac" class="section level4 hasAnchor" number="19.3.2.2">
+<h4><span class="header-section-number">19.3.2.2</span> CUTAC<a href="cutrun-and-cuttag.html#cutac" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Cleavage Under Targeted Accessible Chromatin, CUTAC for short, is a simple modification of the Tn5 transposase-mediated antibody-directed CUT&amp;Tag method that provides high-quality accessibility mapping in parallel with mapping of specific components of the chromatin landscape. Findings imply that regulatory sites detected by hyperaccessibility mapping are coupled to the initiation of RNA Polymerase II transcription via H3K4 methylation. CUTAC requires few resources and is sufficiently simple that it can be performed from nuclei to purified sequencing-ready libraries in single PCR tubes on a home workbench.</p>
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_14.png" width="100%" /></p>
 <ul>
@@ -667,16 +661,16 @@ <h4><span class="header-section-number">19.3.2.2</span> CUTAC</h4>
 </div>
 </div>
 </div>
-<div id="differences-between-cutrun-and-cuttag" class="section level2" number="19.4">
-<h2><span class="header-section-number">19.4</span> Differences between CUT&amp;RUN and CUT&amp;Tag</h2>
+<div id="differences-between-cutrun-and-cuttag" class="section level2 hasAnchor" number="19.4">
+<h2><span class="header-section-number">19.4</span> Differences between CUT&amp;RUN and CUT&amp;Tag<a href="cutrun-and-cuttag.html#differences-between-cutrun-and-cuttag" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><p>CUT&amp;RUN is more suitable than CUT&amp;Tag for transcription factor (TF) profiling because the high-salt incubation used in CUT&amp;Tag competes with TF binding to DNA. Depending on motif affinity, a TF contacts only a few DNA base pairs, so its binding can be weak and easily outcompeted by salt. As demonstrated by <a href="https://www.nature.com/articles/s41467-019-09982-5">Kaya-Okur et al. 2019</a>, the CUT&amp;Tag signal of CTCF, one of the strongest binding factors, can be observed but is relatively weak. It can therefore be challenging for a peak caller to detect the enrichment of CTCF profiled by CUT&amp;Tag, and in practice it can also be hard to recover the motif pattern.</p></li>
 <li><p>CUT&amp;Tag is more suitable for histone modification and RNA polymerase profiling because DNA wraps around histones and RNA polymerase grips the DNA it engages, so the association of both histone marks and Pol II with DNA is strong. CUT&amp;Tag for histone modification also showed moderately higher signals compared to CUT&amp;RUN throughout the list of sites in <a href="https://www.nature.com/articles/s41467-019-09982-5">Kaya-Okur et al. 2019</a>.</p></li>
 <li><p>CUT&amp;RUN must be followed by DNA end polishing and adapter ligation to prepare sequencing libraries, which increases the time, cost, and effort of the overall procedure. Moreover, the release of MNase-cleaved fragments into the supernatant with CUT&amp;RUN is not well-suited for application to single-cell platforms.</p></li>
 </ul>
 </div>
-<div id="limitation-of-cutrun-and-cuttag" class="section level2" number="19.5">
-<h2><span class="header-section-number">19.5</span> Limitation of CUT&amp;RUN and CUT&amp;Tag</h2>
+<div id="limitation-of-cutrun-and-cuttag" class="section level2 hasAnchor" number="19.5">
+<h2><span class="header-section-number">19.5</span> Limitation of CUT&amp;RUN and CUT&amp;Tag<a href="cutrun-and-cuttag.html#limitation-of-cutrun-and-cuttag" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><p>Dependency on Antibody Quality: Similar to ChIP-seq, CUT&amp;RUN and CUT&amp;Tag’s success heavily relies on the quality and specificity of the antibodies used. High-quality, highly specific antibodies are essential for reliable results, and the lack of such antibodies can limit the application of this technique.</p></li>
 <li><p>Likelihood of Over-digestion of DNA: If the calcium-dependent MNase reaction in CUT&amp;RUN (or the magnesium-dependent Tn5 reaction in CUT&amp;Tag) is timed inappropriately, DNA can be over-cut. A similar limitation exists for contemporary ChIP-seq protocols, where enzymatic or sonication-based DNA shearing must be optimized.</p></li>
@@ -685,12 +679,12 @@ <h2><span class="header-section-number">19.5</span> Limitation of CUT&amp;RUN an
 <li><p>Challenges in Detecting Low Abundance TFs: While CUT&amp;RUN and CUT&amp;Tag are more sensitive than ChIP-seq, they can still face challenges in detecting TFs present in very low abundance in the cell.</p></li>
 </ul>
 </div>
-<div id="general-data-analysis-workflow" class="section level2" number="19.6">
-<h2><span class="header-section-number">19.6</span> General Data Analysis Workflow</h2>
+<div id="general-data-analysis-workflow" class="section level2 hasAnchor" number="19.6">
+<h2><span class="header-section-number">19.6</span> General Data Analysis Workflow<a href="cutrun-and-cuttag.html#general-data-analysis-workflow" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>CUT&amp;RUN and CUT&amp;Tag data analysis share a very similar strategy. Data analysis generally involves raw sequencing data alignment, quality control, normalization, peak calling, visualization, differential analysis, and other analyses specific to the scientific question. A detailed data processing and analysis tutorial with reproducible code and demo data can be found at <a href="https://www.protocols.io/view/cut-amp-tag-data-processing-and-analysis-tutorial-e6nvw93x7gmk/v1">CUT&amp;Tag Data Processing and Analysis Tutorial</a>.</p>
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_22.png" width="100%" /></p>
-<div id="adapter-trimming" class="section level3" number="19.6.1">
-<h3><span class="header-section-number">19.6.1</span> Adapter Trimming</h3>
+<div id="adapter-trimming" class="section level3 hasAnchor" number="19.6.1">
+<h3><span class="header-section-number">19.6.1</span> Adapter Trimming<a href="cutrun-and-cuttag.html#adapter-trimming" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>If the read length is long, adapter trimming may be needed for more accurate alignment results. However, for CUT&amp;RUN and CUT&amp;Tag, if the read length is short (i.e., 25bp per end), the aligner can use a “soft-match” style algorithm to handle the remaining adapter at the end of the read. Therefore, the adapter trimming is not necessary in that scenario.</p>
 <ul>
 <li><a href="https://cutadapt.readthedocs.io/en/stable/">Cutadapt</a>: Cutadapt finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequences from your high-throughput sequencing reads. It can remove a wide range of adapter sequences and is not limited to Illumina-specific adapters. Users can specify multiple adapter sequences. Cutadapt supports quality trimming, though with less granularity than Trimmomatic. It can be used for both paired-end and single-end reads and allows for filtering based on length after trimming.</li>
@@ -701,8 +695,8 @@ <h3><span class="header-section-number">19.6.1</span> Adapter Trimming</h3>
 <li><a href="http://www.usadellab.org/cms/?page=trimmomatic">Trimmomatic</a>: A flexible trimmer for Illumina Sequence Data. It trims low-quality bases from the start and end of the reads and scans the read with a sliding window to trim based on average quality. Trimmomatic can also remove Illumina-specific adapters with an option to specify custom adapter sequences. It is known for its high precision and flexibility. It can handle paired-end and single-end data.</li>
 </ul>
 </div>
-<div id="alignment-1" class="section level3" number="19.6.2">
-<h3><span class="header-section-number">19.6.2</span> Alignment</h3>
+<div id="alignment-1" class="section level3 hasAnchor" number="19.6.2">
+<h3><span class="header-section-number">19.6.2</span> Alignment<a href="cutrun-and-cuttag.html#alignment-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml">Bowtie2</a>: Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100 characters to relatively large (e.g., mammalian) genomes. When aligning paired-end reads to the reference genome, filter and keep read pairs whose fragment lengths are between 10bp and 1000bp. Detailed recommended parameters can be found in the <a href="https://www.protocols.io/view/cut-amp-tag-data-processing-and-analysis-tutorial-e6nvw93x7gmk/v1">tutorial</a>.</li>
 </ul>
@@ -712,8 +706,8 @@ <h3><span class="header-section-number">19.6.2</span> Alignment</h3>
 <li><a href="https://bio-bwa.sourceforge.net/bwa.shtml">BWA</a>: BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome.</li>
 </ul>
 </div>
-<div id="quality-control" class="section level3" number="19.6.3">
-<h3><span class="header-section-number">19.6.3</span> Quality control</h3>
+<div id="quality-control" class="section level3 hasAnchor" number="19.6.3">
+<h3><span class="header-section-number">19.6.3</span> Quality control<a href="cutrun-and-cuttag.html#quality-control" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>The quality of the aligned data can be evaluated from the following aspects:</p>
 <ul>
 <li><p><strong>Sequencing depth</strong>: Check the number of reads mapped to the genome to see if it matches the expected sequencing depth. CUT&amp;RUN/CUT&amp;Tag data typically has very low backgrounds, so as few as 1 million mapped fragments can give robust profiles for a histone modification in the human genome.</p></li>
@@ -725,51 +719,51 @@ <h3><span class="header-section-number">19.6.3</span> Quality control</h3>
 </ul>
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_0.png" width="100%" /></p>
 </div>
-<div id="normalization" class="section level3" number="19.6.4">
-<h3><span class="header-section-number">19.6.4</span> Normalization</h3>
-<div id="spike-in-scaling" class="section level4" number="19.6.4.1">
-<h4><span class="header-section-number">19.6.4.1</span> Spike-in Scaling</h4>
+<div id="normalization" class="section level3 hasAnchor" number="19.6.4">
+<h3><span class="header-section-number">19.6.4</span> Normalization<a href="cutrun-and-cuttag.html#normalization" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="spike-in-scaling" class="section level4 hasAnchor" number="19.6.4.1">
+<h4><span class="header-section-number">19.6.4.1</span> Spike-in Scaling<a href="cutrun-and-cuttag.html#spike-in-scaling" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>E. coli DNA is carried along with bacterially produced pA-Tn5 protein and is tagmented non-specifically during the reaction. The fraction of total reads that map to the E. coli genome depends on the yield of epitope-targeted CUT&amp;Tag, and so depends on the number of cells used and the abundance of that epitope in chromatin. Since a constant amount of pA-Tn5 is added to CUT&amp;Tag reactions and brings along a fixed amount of E. coli DNA, E. coli reads can be used to normalize epitope abundance across samples within a set of experiments.</p>
 <p>The underlying assumption is that the ratio of fragments mapped to the primary genome to fragments mapped to the E. coli genome (or to other added DNA sequences, if pA-Tn5 is purified and E. coli DNA is no longer available) is the same for a series of samples, each using the same number of cells. Because of this assumption, we do not normalize between experiments or batches of pA-Tn5, which can have very different amounts of carry-over E. coli DNA. Using a constant C to avoid small fractions in normalized data, we define a scaling factor S as</p>
 <p><span class="math inline">\(S = \frac{C}{\text{fragments mapped to the E. coli genome}}\)</span></p>
 <p><span class="math inline">\(\text{Normalized coverage} = (\text{primary genome coverage}) \times S\)</span></p>
 <p>The scaling can be done with the “genomecov” function of <a href="https://bedtools.readthedocs.io/en/latest/">bedtools</a>, using its “-scale” parameter.</p>
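<p>As a rough illustration of the scaling factor defined above, the following Python sketch computes S for two samples and applies it to a toy coverage track. The sample names, the fragment counts, and the choice of C = 10,000 are assumptions made for this example only; they do not come from a real experiment.</p>

```python
def spike_in_scale_factor(ecoli_fragments, constant=10_000):
    """Return S = C / (fragments mapped to the E. coli genome).

    `constant` plays the role of C; 10,000 is an illustrative choice.
    """
    if ecoli_fragments <= 0:
        raise ValueError("at least one spike-in fragment is required")
    return constant / ecoli_fragments


def scale_coverage(coverage, scale):
    """Multiply every primary-genome coverage value by the scaling factor."""
    return [value * scale for value in coverage]


# Hypothetical E. coli fragment counts for two samples from the same batch.
ecoli_counts = {"sample_rep1": 2000, "sample_rep2": 4000}
factors = {name: spike_in_scale_factor(n) for name, n in ecoli_counts.items()}

# Toy coverage values for three genomic bins of sample_rep2.
scaled = scale_coverage([1, 2, 4], factors["sample_rep2"])
```

<p>A sample with more carry-over E. coli fragments receives a smaller factor, so its primary-genome coverage is scaled down relative to the other samples in the series.</p>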
 </div>
-<div id="sequencing-depth-and-coverage-normalization" class="section level4" number="19.6.4.2">
-<h4><span class="header-section-number">19.6.4.2</span> Sequencing depth and coverage normalization</h4>
+<div id="sequencing-depth-and-coverage-normalization" class="section level4 hasAnchor" number="19.6.4.2">
+<h4><span class="header-section-number">19.6.4.2</span> Sequencing depth and coverage normalization<a href="cutrun-and-cuttag.html#sequencing-depth-and-coverage-normalization" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Without a spike-in, normalization to eliminate the sequencing depth and coverage variations can be done by the following formula:</p>
 <p>Normalized Count = <span class="math inline">\(\frac{\text{Raw Count}}{\text{Sum of Fragments Coverage}} \times \text{Genome Size}\)</span></p>
 <p>Sum of Fragments Coverage is the sum of all fragment lengths, so it captures both the sequencing depth and the coverage information. Note that only fragments between 1bp and 1000bp long are considered.</p>
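<p>The formula above can be sketched in Python as follows. The function names, the fragment lengths, and the toy genome size are hypothetical and serve only to show the arithmetic, including the 1bp to 1000bp fragment length filter.</p>

```python
def sum_of_fragments_coverage(fragment_lengths, min_len=1, max_len=1000):
    """Sum the lengths of fragments that fall within [min_len, max_len] bp."""
    return sum(n for n in fragment_lengths if min_len <= n <= max_len)


def normalized_count(raw_count, fragment_lengths, genome_size):
    """Raw Count / Sum of Fragments Coverage * Genome Size."""
    total = sum_of_fragments_coverage(fragment_lengths)
    if total == 0:
        raise ValueError("no fragments within the accepted length range")
    # Multiply before dividing to limit floating-point rounding error.
    return raw_count * genome_size / total


# Toy example: the 1500 bp fragment is excluded by the length filter,
# so the sum of fragments coverage is 150 + 200 + 650 = 1000.
lengths = [150, 200, 1500, 650]
value = normalized_count(raw_count=3, fragment_lengths=lengths, genome_size=1_000_000)
```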
 </div>
 </div>
-<div id="peak-calling-2" class="section level3" number="19.6.5">
-<h3><span class="header-section-number">19.6.5</span> Peak Calling</h3>
-<div id="seacr" class="section level4" number="19.6.5.1">
-<h4><span class="header-section-number">19.6.5.1</span> <a href="https://github.com/FredHutch/SEACR">SEACR</a></h4>
+<div id="peak-calling-2" class="section level3 hasAnchor" number="19.6.5">
+<h3><span class="header-section-number">19.6.5</span> Peak Calling<a href="cutrun-and-cuttag.html#peak-calling-2" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="seacr" class="section level4 hasAnchor" number="19.6.5.1">
+<h4><span class="header-section-number">19.6.5.1</span> <a href="https://github.com/FredHutch/SEACR">SEACR</a><a href="cutrun-and-cuttag.html#seacr" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>The <a href="https://epigeneticsandchromatin.biomedcentral.com/articles/10.1186/s13072-019-0287-4">Sparse Enrichment Analysis for CUT&amp;RUN</a>, SEACR for short, is an R-based tool designed to call peaks and enriched regions from chromatin profiling data with very low backgrounds (i.e., regions with no read coverage) that are typical for CUT&amp;Tag chromatin profiling experiments. SEACR requires bedGraph files from paired-end sequencing as input and defines peaks as contiguous blocks of basepair coverage that do not overlap with blocks of background signal delineated in the IgG control dataset. If an IgG control is available, use the IgG sample as the “control sample” and choose the “norm stringent” setting. If IgG is unavailable, users can use the “top *% peaks” option by providing only the target marker sample.</p>
 <p>Web server:</p>
 <p><a href="https://seacr.fredhutch.org/">Peak calling by Sparse Enrichment Analysis for CUT&amp;RUN (SEACR) Web Interface</a></p>
 </div>
-<div id="macs2" class="section level4" number="19.6.5.2">
-<h4><span class="header-section-number">19.6.5.2</span> <a href="https://pypi.org/project/MACS2/">MACS2</a></h4>
+<div id="macs2" class="section level4 hasAnchor" number="19.6.5.2">
+<h4><span class="header-section-number">19.6.5.2</span> <a href="https://pypi.org/project/MACS2/">MACS2</a><a href="cutrun-and-cuttag.html#macs2" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>The <a href="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-9-r137">Model-based Analysis of ChIP-Seq</a> version 2, MACS2 for short, is widely used for identifying transcription factor binding sites and histone modification regions in ChIP-Seq data. MACS2 has been widely adapted to analyze the CUT&amp;RUN/CUT&amp;Tag data. Installation details can be found at <a href="https://github.com/taoliu/MACS/wiki" class="uri">https://github.com/taoliu/MACS/wiki</a>.</p>
 </div>
-<div id="seacr-vs-macs2" class="section level4" number="19.6.5.3">
-<h4><span class="header-section-number">19.6.5.3</span> SEACR vs MACS2</h4>
+<div id="seacr-vs-macs2" class="section level4 hasAnchor" number="19.6.5.3">
+<h4><span class="header-section-number">19.6.5.3</span> SEACR vs MACS2<a href="cutrun-and-cuttag.html#seacr-vs-macs2" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><p>SEACR is better suited for datasets with broad signal enrichment, such as H3K27me3, where peaks are broader and can continuously cover a large genomic region. MACS2 excels in datasets with sharp peaks, such as H3K4me3, where peaks are concentrated and isolated from the background and adjacent peaks.</p></li>
 <li><p>SEACR uses a straightforward thresholding approach, which can be more intuitive but may miss some nuances in the data. MACS2 uses a more complex statistical model to identify peaks, offering potentially greater accuracy but at the cost of computational complexity.</p></li>
 <li><p>SEACR offers more flexibility in handling different types of CUT&amp;RUN/CUT&amp;Tag data, especially when control samples are absent or of low quality. MACS2 generally requires high-quality control samples for best performance and is less flexible in this regard.</p></li>
 </ul>
 </div>
-<div id="fragment-proportion-in-peaks-regions-frips" class="section level4" number="19.6.5.4">
-<h4><span class="header-section-number">19.6.5.4</span> FRagment proportion in Peaks regions (FRiPs)</h4>
+<div id="fragment-proportion-in-peaks-regions-frips" class="section level4 hasAnchor" number="19.6.5.4">
+<h4><span class="header-section-number">19.6.5.4</span> FRagment proportion in Peaks regions (FRiPs)<a href="cutrun-and-cuttag.html#fragment-proportion-in-peaks-regions-frips" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Fragment proportion in Peak Regions, FRiPs for short, is also a critical signal-to-noise measurement: it is the percentage of fragments that fall within called peaks, that is, the share of sequencing resources allocated to the target epitope regions. Although sequencing depths for CUT&amp;Tag are typically only 1-5 million reads, the low background of the method usually results in high FRiP scores. Note that the number of peaks and FRiPs typically increase with sequencing depth and mappable fragment number, so comparisons should be made after downsampling samples to the same number of fragments. For example, the comparison across technologies in <a href="https://elifesciences.org/articles/63274">Efficient chromatin accessibility mapping in situ by nucleosome-tethered tagmentation</a> Figure 5A:</p>
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a7dbc78b0e_0_0.png" width="100%" /></p>
 </div>
 </div>
-<div id="visualization-1" class="section level3" number="19.6.6">
-<h3><span class="header-section-number">19.6.6</span> Visualization</h3>
+<div id="visualization-1" class="section level3 hasAnchor" number="19.6.6">
+<h3><span class="header-section-number">19.6.6</span> Visualization<a href="cutrun-and-cuttag.html#visualization-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><p><a href="https://igv.org/doc/desktop/">Integrative Genomics Viewer</a>: IGV visualizes the chromatin landscape in regions of interest using a genome browser. It is available as a web app and as an easy-to-use local desktop version.</p></li>
 <li><p><a href="https://genome.ucsc.edu/">UCSC Genome Browser</a>: UCSC Genome Browser provides the most comprehensive supplementary genome information.</p></li>
@@ -777,8 +771,8 @@ <h3><span class="header-section-number">19.6.6</span> Visualization</h3>
 </ul>
 <p><img src="resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_6.png" width="100%" /></p>
 </div>
-<div id="differential-analysis" class="section level3" number="19.6.7">
-<h3><span class="header-section-number">19.6.7</span> Differential Analysis</h3>
+<div id="differential-analysis" class="section level3 hasAnchor" number="19.6.7">
+<h3><span class="header-section-number">19.6.7</span> Differential Analysis<a href="cutrun-and-cuttag.html#differential-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><p><a href="https://greenleaflab.github.io/chromVAR/reference/getCounts.html">chromVAR - getCounts</a>. The “getCounts” function in the chromVAR R package can convert an aligned BAM file into a region-by-sample matrix, where the regions can be genomic bins or peaks. Differential analysis can then be performed on this region-by-sample matrix.</p></li>
 <li><p><a href="https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html">DESeq2</a>: <a href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8">Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2</a>
@@ -790,8 +784,8 @@ <h3><span class="header-section-number">19.6.7</span> Differential Analysis</h3>
 </ul>
 </div>
 </div>
-<div id="more-resources-about-cutrun-and-cuttag-data-analysis" class="section level2" number="19.7">
-<h2><span class="header-section-number">19.7</span> More resources about CUT&amp;RUN and CUT&amp;Tag data analysis</h2>
+<div id="more-resources-about-cutrun-and-cuttag-data-analysis" class="section level2 hasAnchor" number="19.7">
+<h2><span class="header-section-number">19.7</span> More resources about CUT&amp;RUN and CUT&amp;Tag data analysis<a href="cutrun-and-cuttag.html#more-resources-about-cutrun-and-cuttag-data-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><p><a href="https://bitbucket.org/qzhudfci/cutruntools/src/master/">CUT&amp;RUNTools</a>: <a href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1802-4">a flexible pipeline for CUT&amp;RUN processing and footprint analysis</a>. CUT&amp;RUNTools is a flexible and general pipeline for facilitating the identification of chromatin-associated protein binding and genomic footprinting analysis from antibody-targeted CUT&amp;RUN primary cleavage data. CUT&amp;RUNTools extracts endonuclease cut site information from sequences of short-read fragments and produces single-locus binding estimates, aggregate motif footprints, and informative visualizations to support the high-resolution mapping capability of CUT&amp;RUN.</p></li>
 <li><p><a href="https://github.com/fl-yu/CUT-RUNTools-2.0">CUT&amp;RUNTools 2.0</a>: <a href="https://academic.oup.com/bioinformatics/article/38/1/252/6318389?login=true">a pipeline for single-cell and bulk-level CUT&amp;RUN and CUT&amp;Tag data analysis</a>. CUT&amp;RUNTools 2.0 is a major update of CUT&amp;RUNTools, including a set of new features specially designed for CUT&amp;RUN and CUT&amp;Tag experiments. Both the bulk and single-cell data can be processed, analyzed, and interpreted using CUT&amp;RUNTools 2.0.</p></li>
@@ -802,10 +796,17 @@ <h2><span class="header-section-number">19.7</span> More resources about CUT&amp
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -875,7 +876,7 @@ <h2><span class="header-section-number">19.7</span> More resources about CUT&amp
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/dna-methods-overview.html b/docs/no_toc/dna-methods-overview.html
index 9df74c75..d2d01521 100644
--- a/docs/no_toc/dna-methods-overview.html
+++ b/docs/no_toc/dna-methods-overview.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 9 DNA Methods Overview | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 9 DNA Methods Overview | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="annotating-genomes.html"/>
 <link rel="next" href="whole-genome-or-exome-sequencing.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 9 DNA Methods Overview | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,32 +535,32 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="dna-methods-overview" class="section level1" number="9">
-<h1><span class="header-section-number">Chapter 9</span> DNA Methods Overview</h1>
+<div id="dna-methods-overview" class="section level1 hasAnchor" number="9">
+<h1><span class="header-section-number">Chapter 9</span> DNA Methods Overview<a href="dna-methods-overview.html#dna-methods-overview" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-7" class="section level2" number="9.1">
-<h2><span class="header-section-number">9.1</span> Learning Objectives</h2>
-<p><img src="resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_71.png" title="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection for DNA sequence collection and variant identification. Compare and contrast the following methods: DNA/SNP microarrays, Whole Genome Sequencing, Whole Exome Sequencing, and Targeted Sequencing" alt="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection for DNA sequence collection and variant identification. Compare and contrast the following methods: DNA/SNP microarrays, Whole Genome Sequencing, Whole Exome Sequencing, and Targeted Sequencing" width="100%" /></p>
+<div id="learning-objectives-7" class="section level2 hasAnchor" number="9.1">
+<h2><span class="header-section-number">9.1</span> Learning Objectives<a href="dna-methods-overview.html#learning-objectives-7" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_71.png" alt="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection for DNA sequence collection and variant identification. Compare and contrast the following methods: DNA/SNP microarrays, Whole Genome Sequencing, Whole Exome Sequencing, and Targeted Sequencing" width="100%" /></p>
 </div>
-<div id="what-are-the-goals-of-analyzing-dna-sequences" class="section level2" number="9.2">
-<h2><span class="header-section-number">9.2</span> What are the goals of analyzing DNA sequences?</h2>
+<div id="what-are-the-goals-of-analyzing-dna-sequences" class="section level2 hasAnchor" number="9.2">
+<h2><span class="header-section-number">9.2</span> What are the goals of analyzing DNA sequences?<a href="dna-methods-overview.html#what-are-the-goals-of-analyzing-dna-sequences" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>There are several larger goals behind DNA sequencing experiments ranging from assembling whole genomes, to identifying variation or performing a functional genomic analysis or comparative genomic study. Each of these has implications when studying disease.</p>
 <ul>
 <li><p>Assembling whole genomes:</p>
-<p>Because an organism’s genome determines how an organism develops and functions <span class="citation">(<a href="#ref-NHGRIGlossary2024" role="doc-biblioref">NHGRI 2024</a>)</span>, an important task in the genomics field is assembling the genome of an organism from sequencing reads. This assembly process attempts to reconstruct how the sequencing reads overlap or fit together <span class="citation">(<a href="#ref-Schatz2010" role="doc-biblioref">Schatz, Delcher, and Salzberg 2010</a>; <a href="#ref-Li_Durbin_2024" role="doc-biblioref">Li and Durbin 2024</a>)</span>. Recent examples of genome assembly in the genomics field include a complete 3.055 billion-base pair sequence of the human reference genome which was published by the Telomere-to-Telomere (T2T) Consortium <span class="citation">(<a href="#ref-Nurk2022" role="doc-biblioref">2022</a>)</span>, the T2T-CHM13 version (followed not long after by the complete sequence of the human Y chromosome <span class="citation">(<a href="#ref-Rhie2023" role="doc-biblioref">2023</a>)</span>). A goal of the field is to better capture human genetic diversity by creating a reference pangenome, assembled from multiple donors within the population <span class="citation">(<a href="#ref-Taylor_2024" role="doc-biblioref">2024</a>)</span>. Genome assemblies are an important part of genomics beyond human genomics research; there are reference gnomes available for most model organisms as well as many plants, animals, and pathogens, with more and more being published at a high frequency <span class="citation">(<a href="#ref-Miller2023" role="doc-biblioref">Miller, Zimin, and Gordus 2023</a>; <a href="#ref-Alonge2022" role="doc-biblioref">Alonge et al. 2022</a>; <a href="#ref-Gershman2023" role="doc-biblioref">Gershman et al. 2023</a>; <a href="#ref-Sistrom2016" role="doc-biblioref">Sistrom et al. 2016</a>)</span>. 
These reference genomes each act as an extensive compilation of the observed DNA sequence of genes, regulatory elements, etc. and the related coordinate systems for these elements, such that, for the corresponding organism, sequencing reads from other experiments can be mapped or aligned to the reference in order to localize where that read was in the genome. In the case of cancer informatics, a recent approach utilized personalized genome assembly to more accurately detect tumor somatic mutations. This is likely to be an area of future research for application in precision medicine <span class="citation">(<a href="#ref-Xiao2022" role="doc-biblioref">Xiao et al. 2022</a>; <a href="#ref-Ermini_Driguez_2024" role="doc-biblioref">Ermini and Driguez 2024</a>)</span>.</p></li>
+<p>Because an organism’s genome determines how an organism develops and functions <span class="citation">(<a href="#ref-NHGRIGlossary2024">NHGRI 2024</a>)</span>, an important task in the genomics field is assembling the genome of an organism from sequencing reads. This assembly process attempts to reconstruct how the sequencing reads overlap or fit together <span class="citation">(<a href="#ref-Schatz2010">Schatz, Delcher, and Salzberg 2010</a>; <a href="#ref-Li_Durbin_2024">Li and Durbin 2024</a>)</span>. Recent examples of genome assembly in the genomics field include a complete 3.055 billion-base pair sequence of the human reference genome which was published by the Telomere-to-Telomere (T2T) Consortium <span class="citation">(<a href="#ref-Nurk2022">2022</a>)</span>, the T2T-CHM13 version (followed not long after by the complete sequence of the human Y chromosome <span class="citation">(<a href="#ref-Rhie2023">2023</a>)</span>). A goal of the field is to better capture human genetic diversity by creating a reference pangenome, assembled from multiple donors within the population <span class="citation">(<a href="#ref-Taylor_2024">2024</a>)</span>. Genome assemblies are an important part of genomics beyond human genomics research; there are reference genomes available for most model organisms as well as many plants, animals, and pathogens, with more and more being published at a high frequency <span class="citation">(<a href="#ref-Miller2023">Miller, Zimin, and Gordus 2023</a>; <a href="#ref-Alonge2022">Alonge et al. 2022</a>; <a href="#ref-Gershman2023">Gershman et al. 2023</a>; <a href="#ref-Sistrom2016">Sistrom et al. 2016</a>)</span>. These reference genomes each act as an extensive compilation of the observed DNA sequence of genes, regulatory elements, etc. 
and the related coordinate systems for these elements, such that, for the corresponding organism, sequencing reads from other experiments can be mapped or aligned to the reference in order to localize where that read was in the genome. In the case of cancer informatics, a recent approach utilized personalized genome assembly to more accurately detect tumor somatic mutations. This is likely to be an area of future research for application in precision medicine <span class="citation">(<a href="#ref-Xiao2022">Xiao et al. 2022</a>; <a href="#ref-Ermini_Driguez_2024">Ermini and Driguez 2024</a>)</span>.</p></li>
 <li><p>Identifying variation:</p>
-<p>Variant caller software is used within the field of genomics to identify places where reads from a DNA sequencing experiment differ from a comparative reference genome sequence <span class="citation">(<a href="#ref-NHGRIfactsheet2022" role="doc-biblioref">NHGRI 2022</a>)</span>. Variants may be as small as single nucleotide differences (single-nucleotide polymorphisms or SNPs) or much larger (50 base pairs or more) structural variation (SVs) such as duplications, deletions, insertions, inversions, translocations <span class="citation">(<a href="#ref-Wong2011" role="doc-biblioref">Wong, Hudson, and McPherson 2011</a>)</span>. (Shorter insertions or deletions are termed indels.) The SVs involving gains or losses in genomic DNA can lead to copy number variations (CNVs). Mutation and structural variants are very common in cancer as well as larger-scale catastrophic genomic rearrangements <span class="citation">(<a href="#ref-Zhang2022" role="doc-biblioref">C.-Z. Zhang and Pellman 2022</a>)</span>. Overall, variants may be rare in a population or fairly common <span class="citation">(<a href="#ref-Audano2019" role="doc-biblioref">Audano et al. 2019</a>)</span>. Further, variants may be somatic or germline variants: germline variants are hereditary and will be passed down from parent to offspring; in the offspring, the variant will be present in every cell, while somatic variants are generally not hereditary and present only in some cells rather than every cell <span class="citation">(<a href="#ref-NHSFrost2022" role="doc-biblioref">Frost 2022</a>)</span>. Because variation, specifically genetic diversity is a necessary part of a healthy species <span class="citation">(<a href="#ref-GeneticDiversity" role="doc-biblioref"><span>“What Is Genetic Diversity and Why Does It Matter?”</span> n.d.</a>)</span> and because variation, specifically mutations/variants may cause disease, identifying variation is a common goal in a DNA sequencing workflow. 
An example of research focusing on studying genetic diversity in humans is the 1000 Genomes Project which recently expanded its resource of sequenced genomes and in doing so discovered even more variation present in the population <span class="citation">(<a href="#ref-Byrska-Bishop2022" role="doc-biblioref">Byrska-Bishop et al. 2022</a>)</span>.</p></li>
+<p>Variant caller software is used within the field of genomics to identify places where reads from a DNA sequencing experiment differ from a comparative reference genome sequence <span class="citation">(<a href="#ref-NHGRIfactsheet2022">NHGRI 2022</a>)</span>. Variants may be as small as single nucleotide differences (single-nucleotide polymorphisms or SNPs) or much larger (50 base pairs or more) structural variants (SVs) such as duplications, deletions, insertions, inversions, and translocations <span class="citation">(<a href="#ref-Wong2011">Wong, Hudson, and McPherson 2011</a>)</span>. (Shorter insertions or deletions are termed indels.) The SVs involving gains or losses in genomic DNA can lead to copy number variations (CNVs). Mutations and structural variants, as well as larger-scale catastrophic genomic rearrangements, are very common in cancer <span class="citation">(<a href="#ref-Zhang2022">C.-Z. Zhang and Pellman 2022</a>)</span>. Overall, variants may be rare in a population or fairly common <span class="citation">(<a href="#ref-Audano2019">Audano et al. 2019</a>)</span>. Further, variants may be somatic or germline variants: germline variants are hereditary and will be passed down from parent to offspring; in the offspring, the variant will be present in every cell, while somatic variants are generally not hereditary and present only in some cells rather than every cell <span class="citation">(<a href="#ref-NHSFrost2022">Frost 2022</a>)</span>. Because variation, specifically genetic diversity, is a necessary part of a healthy species <span class="citation">(<a href="#ref-GeneticDiversity"><span>“What Is Genetic Diversity and Why Does It Matter?”</span> n.d.</a>)</span> and because variation, specifically mutations/variants, may cause disease, identifying variation is a common goal in a DNA sequencing workflow. 
An example of research focusing on studying genetic diversity in humans is the 1000 Genomes Project which recently expanded its resource of sequenced genomes and in doing so discovered even more variation present in the population <span class="citation">(<a href="#ref-Byrska-Bishop2022">Byrska-Bishop et al. 2022</a>)</span>.</p></li>
 <li><p>Functional genomic analysis:</p>
-<p>Genomes contain more than just genes (the coding sequences that will be transcribed and translated into a protein); they also contain functional elements such as promoters, enhancers, or silencers that modulate the expression of genes <span class="citation">(<a href="#ref-Kellis2014" role="doc-biblioref">Kellis et al. 2014</a>)</span>. Further, differential gene expression is the phenomenon by which cells with the same DNA sequence show different patterns of gene expression. Functional genomic analyses aim to better understand differential gene expression and the impact of genetic variation found in functional elements. For example, many human genetic variants associated with common traits and diseases are localized in or near known functional elements <span class="citation">(<a href="#ref-Hindorff2009" role="doc-biblioref">Hindorff et al. 2009</a>)</span>. These variants may impact gene expression due to either changes in transcription factor binding at that site, or resulting epigenetic changes, which are defined as chemical modifications of chromatin or nucleotides beyond the DNA sequence. Such epigenetic modifications, which include histone marks and DNA methylation, can alter DNA compaction and influence a functional element’s accessibility for transcriptional machinery (e.g., if the element isn’t accessible, transcription may not occur; while previously the element was accessible and the gene could be transcribed). In later sections, methods that study epigenetic modifications like chromatin accessibility, DNA methylation, or binding of specific proteins will be discussed. All of these methods support functional genomic analyses and are important for better understanding differential gene expression and the impact of genetic variants located in functional elements may have on disease occurrence. A somewhat recent and high profile example of a functional genomic analysis centers again on work from the T2T Consortium. 
Not only did they publish a new, complete reference genome, but they also studied the epigenetic landscape in the newly resolved regions of the genome and pointed to potential newly discovered functional elements in a region previously thought to be transcriptionally inactive <span class="citation">(<a href="#ref-Gershman2022" role="doc-biblioref">Gershman et al. 2022</a>)</span>.</p></li>
+<p>Genomes contain more than just genes (the coding sequences that will be transcribed and translated into a protein); they also contain functional elements such as promoters, enhancers, or silencers that modulate the expression of genes <span class="citation">(<a href="#ref-Kellis2014">Kellis et al. 2014</a>)</span>. Further, differential gene expression is the phenomenon by which cells with the same DNA sequence show different patterns of gene expression. Functional genomic analyses aim to better understand differential gene expression and the impact of genetic variation found in functional elements. For example, many human genetic variants associated with common traits and diseases are localized in or near known functional elements <span class="citation">(<a href="#ref-Hindorff2009">Hindorff et al. 2009</a>)</span>. These variants may impact gene expression due to either changes in transcription factor binding at that site, or resulting epigenetic changes, which are defined as chemical modifications of chromatin or nucleotides beyond the DNA sequence. Such epigenetic modifications, which include histone marks and DNA methylation, can alter DNA compaction and influence a functional element’s accessibility for transcriptional machinery (e.g., if the element isn’t accessible, transcription may not occur; while previously the element was accessible and the gene could be transcribed). In later sections, methods that study epigenetic modifications like chromatin accessibility, DNA methylation, or binding of specific proteins will be discussed. All of these methods support functional genomic analyses and are important for better understanding differential gene expression and the impact that genetic variants located in functional elements may have on disease occurrence. A somewhat recent and high-profile example of a functional genomic analysis centers again on work from the T2T Consortium. 
Not only did they publish a new, complete reference genome, but they also studied the epigenetic landscape in the newly resolved regions of the genome and pointed to potential newly discovered functional elements in a region previously thought to be transcriptionally inactive <span class="citation">(<a href="#ref-Gershman2022">Gershman et al. 2022</a>)</span>.</p></li>
 <li><p>Comparative genomics:</p>
-<p>A common saying in the genomics field is that structure determines function and conserved structure may be constrained such that there is an important function which needs to be conserved <span class="citation">(<a href="#ref-Alföldi_Lindblad-Toh_2013" role="doc-biblioref">Alföldi and Lindblad-Toh 2013</a>)</span>. Further, similarities in structure may be due to shared ancestry through the processes of evolution; therefore, some comparative genomics studies aim to infer homology or an evolutionary relationship from structural similarity <span class="citation">(<a href="#ref-Pearson2013" role="doc-biblioref">Pearson 2013</a>)</span>. More pertinent to the topics discussed previously, comparative genomics studies are also useful for identifying functional elements <span class="citation">(<a href="#ref-Taylor2006" role="doc-biblioref">J. Taylor et al. 2006</a>)</span> and variants associated with disease (e.g., by comparing the genomes of those with the disease and those without it and identifying differences) <span class="citation">(<a href="#ref-Alföldi_Lindblad-Toh_2013" role="doc-biblioref">Alföldi and Lindblad-Toh 2013</a>; <a href="#ref-Eichler_2019" role="doc-biblioref">Eichler 2019</a>)</span>.</p></li>
+<p>A common saying in the genomics field is that structure determines function and conserved structure may be constrained such that there is an important function which needs to be conserved <span class="citation">(<a href="#ref-Alföldi_Lindblad-Toh_2013">Alföldi and Lindblad-Toh 2013</a>)</span>. Further, similarities in structure may be due to shared ancestry through the processes of evolution; therefore, some comparative genomics studies aim to infer homology or an evolutionary relationship from structural similarity <span class="citation">(<a href="#ref-Pearson2013">Pearson 2013</a>)</span>. More pertinent to the topics discussed previously, comparative genomics studies are also useful for identifying functional elements <span class="citation">(<a href="#ref-Taylor2006">J. Taylor et al. 2006</a>)</span> and variants associated with disease (e.g., by comparing the genomes of those with the disease and those without it and identifying differences) <span class="citation">(<a href="#ref-Alföldi_Lindblad-Toh_2013">Alföldi and Lindblad-Toh 2013</a>; <a href="#ref-Eichler_2019">Eichler 2019</a>)</span>.</p></li>
 </ul>
 </div>
-<div id="comparison-of-dna-methods" class="section level2" number="9.3">
-<h2><span class="header-section-number">9.3</span> Comparison of DNA methods</h2>
-<p><img src="resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_18.png" title="Comparing DNA Sequencing Techniques. The most common DNA sequencing techniques are described. Whole genome sequencing coverages all genes and non-coding DNA. 3.2 billion bases are covered when applied to human samples. This the most expensive of the techniques. Depth of coverage required for 99.9% sensitivity is 30X. Whole exome sequencing coverage is the exome or expressed genes. Approximately 45 million bases are sequenced. This is a cost-effective technique. The depth of coverage required for 99.9% sensitivity is 100X. Targeted gene panel sequencing coverages 50-500 genes. 20,000 to 62 million bases are sequenced. This is the most cost-effective technique. Depth of coverage is &gt;500X." alt="Comparing DNA Sequencing Techniques. The most common DNA sequencing techniques are described. Whole genome sequencing coverages all genes and non-coding DNA. 3.2 billion bases are covered when applied to human samples. This the most expensive of the techniques. Depth of coverage required for 99.9% sensitivity is 30X. Whole exome sequencing coverage is the exome or expressed genes. Approximately 45 million bases are sequenced. This is a cost-effective technique. The depth of coverage required for 99.9% sensitivity is 100X. Targeted gene panel sequencing coverages 50-500 genes. 20,000 to 62 million bases are sequenced. This is the most cost-effective technique. Depth of coverage is &gt;500X." width="100%" />
+<div id="comparison-of-dna-methods" class="section level2 hasAnchor" number="9.3">
+<h2><span class="header-section-number">9.3</span> Comparison of DNA methods<a href="dna-methods-overview.html#comparison-of-dna-methods" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_18.png" alt="Comparing DNA Sequencing Techniques. The most common DNA sequencing techniques are described. Whole genome sequencing covers all genes and non-coding DNA. 3.2 billion bases are covered when applied to human samples. This is the most expensive of the techniques. Depth of coverage required for 99.9% sensitivity is 30X. Whole exome sequencing covers the exome or expressed genes. Approximately 45 million bases are sequenced. This is a cost-effective technique. The depth of coverage required for 99.9% sensitivity is 100X. Targeted gene panel sequencing covers 50-500 genes. 20,000 to 62 million bases are sequenced. This is the most cost-effective technique. Depth of coverage is &gt;500X." width="100%" />
 There are four DNA sequencing methods discussed in this chapter. The above graph compares WGS, WXS, and Targeted gene sequencing. The last section compares all 4.</p>
 <ol style="list-style-type: decimal">
 <li>Whole genome sequencing (WGS)</li>
@@ -574,42 +568,42 @@ <h2><span class="header-section-number">9.3</span> Comparison of DNA methods</h2
 <li>Targeted gene sequencing</li>
 <li>DNA/SNP microarrays</li>
 </ol>
-<p>Compared to WXS and Targeted Gene Sequencing, WGS is the most expensive but requires the lowest depth of coverage to achieve 95% sensitivity. In other words, WGS requires sequencing each region of the genome (3.2 billion bases) 30 times in order to confidently be able to pick up all possible meaningful variants. <span class="citation">(<a href="#ref-Sims2014" role="doc-biblioref">Sims et al. 2014</a>)</span> goes into more depth on how these depths are calculated.</p>
+<p>Compared to WXS and Targeted Gene Sequencing, WGS is the most expensive but requires the lowest depth of coverage to achieve 95% sensitivity. In other words, WGS requires sequencing each region of the genome (3.2 billion bases) 30 times in order to confidently be able to pick up all possible meaningful variants. <span class="citation">(<a href="#ref-Sims2014">Sims et al. 2014</a>)</span> goes into more depth on how these depths are calculated.</p>
 <p>Alternatively, WXS is a more cost-effective way to study the genome, focusing on places in the genome that have open reading frames, that is, generally genes that are able to be expressed. This approach enriches for exons and not introns, so splicing variants may be missed. In this case, each gene must be sequenced 80-100x for sufficient sensitivity to pick up meaningful variants.</p>
 <p>In targeted gene sequencing, a panel of 50-500 regions of interest are selected. This technique is very applicable for studying a set of specific genes of interest at great depth to identify all varieties of mutations within those specific genes. These genes must be sequenced at much greater depth (&gt;500x) to confidently identify all meaningful variants. <a href="https://www.illumina.com/science/technology/next-generation-sequencing/plan-experiments/coverage.html">This page from Illumina</a> also provides information regarding sequencing depth considerations for different modalities.</p>
 <p>Additional references:
-WGS: <span class="citation">(<a href="#ref-Bentley2008" role="doc-biblioref">Bentley et al. 2008</a>)</span>
-WES: <span class="citation">(<a href="#ref-Clark2011" role="doc-biblioref">Clark et al. 2011</a>)</span>
-Targeted: <span class="citation">(<a href="#ref-BewickeCopley2019" role="doc-biblioref">Bewicke-Copley et al. 2019</a>)</span></p>
+WGS: <span class="citation">(<a href="#ref-Bentley2008">Bentley et al. 2008</a>)</span>
+WES: <span class="citation">(<a href="#ref-Clark2011">Clark et al. 2011</a>)</span>
+Targeted: <span class="citation">(<a href="#ref-BewickeCopley2019">Bewicke-Copley et al. 2019</a>)</span></p>
 </div>
-<div id="how-to-choose-a-dna-sequencing-method" class="section level2" number="9.4">
-<h2><span class="header-section-number">9.4</span> How to choose a DNA sequencing method</h2>
+<div id="how-to-choose-a-dna-sequencing-method" class="section level2 hasAnchor" number="9.4">
+<h2><span class="header-section-number">9.4</span> How to choose a DNA sequencing method<a href="dna-methods-overview.html#how-to-choose-a-dna-sequencing-method" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Before starting any sequencing method, you likely have a research question or hypothesis in mind. In order to choose a DNA sequencing method, you will need to consider a few items in balance of each other:</p>
-<div id="what-regions-of-the-genome-pertain-to-your-research-question" class="section level3" number="9.4.1">
-<h3><span class="header-section-number">9.4.1</span> 1. What region(s) of the genome pertain to your research question?</h3>
+<div id="what-regions-of-the-genome-pertain-to-your-research-question" class="section level3 hasAnchor" number="9.4.1">
+<h3><span class="header-section-number">9.4.1</span> 1. What region(s) of the genome pertain to your research question?<a href="dna-methods-overview.html#what-regions-of-the-genome-pertain-to-your-research-question" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Is this unknown? Can it be narrowed down to non-coding or coding regions? Is there an even more specific subset of interest?</p>
 </div>
-<div id="what-does-your-project-budget-allow-for" class="section level3" number="9.4.2">
-<h3><span class="header-section-number">9.4.2</span> 2. What does your project budget allow for?</h3>
+<div id="what-does-your-project-budget-allow-for" class="section level3 hasAnchor" number="9.4.2">
+<h3><span class="header-section-number">9.4.2</span> 2. What does your project budget allow for?<a href="dna-methods-overview.html#what-does-your-project-budget-allow-for" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Some methods are much more costly than others. Cost is not only a factor for the reagents needed to sequence, but also the computing power needed to process and store the data and people’s compensation for their work on the data. All of these costs increase as the amounts of data that are collected increase. For more information on computing decisions see our <a href="https://www.itcrtraining.org/courses#h.civy2cnri95t">Computing in Cancer Informatics course</a>.</p>
 </div>
-<div id="what-is-your-detection-power-for-these-variants" class="section level3" number="9.4.3">
-<h3><span class="header-section-number">9.4.3</span> 3. What is your detection power for these variants?</h3>
+<div id="what-is-your-detection-power-for-these-variants" class="section level3 hasAnchor" number="9.4.3">
+<h3><span class="header-section-number">9.4.3</span> 3. What is your detection power for these variants?<a href="dna-methods-overview.html#what-is-your-detection-power-for-these-variants" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Detecting DNA variants is not simply a matter of yes or no, but a confidence level due to sequencing errors in data collection. Are the variants you are looking for very rare and/or small (single nucleotide or very few copy number differences)? If so you will need more samples and potentially more sequencing depth to detect these variants with confidence.</p>
 </div>
 </div>
-<div id="strengths-and-weaknesses-of-different-methods" class="section level2" number="9.5">
-<h2><span class="header-section-number">9.5</span> Strengths and Weaknesses of different methods</h2>
+<div id="strengths-and-weaknesses-of-different-methods" class="section level2 hasAnchor" number="9.5">
+<h2><span class="header-section-number">9.5</span> Strengths and Weaknesses of different methods<a href="dna-methods-overview.html#strengths-and-weaknesses-of-different-methods" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Is little known about DNA variants in your organism or disease of interest? In this instance you may want to cast a wide net and explore more variants by using WGS.</p>
 <p>If previous research has identified sections of the genome that are of interest to your research question, then it’s highly advisable to not sequence the entire genome with WGS methods. Not only will whole genome sequencing be more costly, but it will decrease your statistical power to discover true positive variants of interest and increase your chances of discovering false positive variants. This is because multiple testing correction needs to be applied in instances where many tests are being done concurrently. In this instance, the tests being performed are across the whole genome.</p>
 <p>If your research question does not pertain to non-coding regions of the genome or splicing, then it’s advisable to use WXS. Recall that only about 1-2% of the genome is coding sequence, meaning that if you are uninterested in non-coding regions but still use WGS, then 98-99% of your data will be uninteresting to you and will only serve to increase your chances of finding false positives and your costs. Not only does sequencing more of the genome take more money and time, but it also requires more computing power to analyze.</p>
 <p>Furthermore, if you are able to narrow down even further what regions are of interest this would be better in terms of cost and detection abilities. A targeted sequencing panel or DNA microarray is ideal for assaying known groups of targets. DNA microarrays are the least costly of all the methods to identify DNA variants, but with both targeted sequencing and DNA microarray you will need to find or create a custom probe or primer set. Ideally a probe or primer set that hits your regions of interest already exists commercially but if not, then you will have to design your own – which also costs time and money.</p>
-<p><img src="resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_6.png" title="There are three general methods we will discuss for evaluating DNA sequences. Whole Genome Sequencing (WGS) assays more of the genome than other methods but is much more costly and computationally intensive. Depending on your goals WGS may be overkill. SNP microarrays on the other hand, are much more cost effective but are not able to be used for exploratory purposes. Whole Exome Sequencing (WXS or WES) and other targeted sequencing methods allow you to survey regions of the genome in way that is more cost effective and potentially at higher depths." alt="There are three general methods we will discuss for evaluating DNA sequences. Whole Genome Sequencing (WGS) assays more of the genome than other methods but is much more costly and computationally intensive. Depending on your goals WGS may be overkill. SNP microarrays on the other hand, are much more cost effective but are not able to be used for exploratory purposes. Whole Exome Sequencing (WXS or WES) and other targeted sequencing methods allow you to survey regions of the genome in way that is more cost effective and potentially at higher depths." width="100%" /></p>
+<p><img src="resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_6.png" alt="There are three general methods we will discuss for evaluating DNA sequences. Whole Genome Sequencing (WGS) assays more of the genome than other methods but is much more costly and computationally intensive. Depending on your goals WGS may be overkill. SNP microarrays on the other hand, are much more cost effective but are not able to be used for exploratory purposes. Whole Exome Sequencing (WXS or WES) and other targeted sequencing methods allow you to survey regions of the genome in way that is more cost effective and potentially at higher depths." width="100%" /></p>
 <p>In these upcoming chapters we will discuss in more detail each of these methods, what the data represent, what you need to consider, and what resources you can consult for analyzing your data.</p>
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-Alföldi_Lindblad-Toh_2013" class="csl-entry">
 Alföldi, Jessica, and Kerstin Lindblad-Toh. 2013. <span>“Comparative Genomics as a Tool to Understand Evolution and Disease.”</span> <em>Genome Research</em> 23 (7): 1063–68. <a href="https://doi.org/10.1101/gr.157503.113">https://doi.org/10.1101/gr.157503.113</a>.
@@ -703,10 +697,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -776,7 +777,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/dna-methylation-sequencing.html b/docs/no_toc/dna-methylation-sequencing.html
index 56a50283..159670dc 100644
--- a/docs/no_toc/dna-methylation-sequencing.html
+++ b/docs/no_toc/dna-methylation-sequencing.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 20 DNA Methylation Sequencing | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 20 DNA Methylation Sequencing | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="cutrun-and-cuttag.html"/>
 <link rel="next" href="microbiome-sequencing.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 20 DNA Methylation Sequencing | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,84 +535,84 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="dna-methylation-sequencing" class="section level1" number="20">
-<h1><span class="header-section-number">Chapter 20</span> DNA Methylation Sequencing</h1>
+<div id="dna-methylation-sequencing" class="section level1 hasAnchor" number="20">
+<h1><span class="header-section-number">Chapter 20</span> DNA Methylation Sequencing<a href="dna-methylation-sequencing.html#dna-methylation-sequencing" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is incomplete! If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-18" class="section level2" number="20.1">
-<h2><span class="header-section-number">20.1</span> Learning Objectives</h2>
-<p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_91.png" title="This chapter will demonstrate how to: Understand the basics of bisulfite sequencing data collection and processing workflow. Identify the next steps for your particular bisulfite  sequencing data. Formulate questions to ask about your bisulfite sequencing data" alt="This chapter will demonstrate how to: Understand the basics of bisulfite sequencing data collection and processing workflow. Identify the next steps for your particular bisulfite  sequencing data. Formulate questions to ask about your bisulfite sequencing data" width="100%" /></p>
+<div id="learning-objectives-18" class="section level2 hasAnchor" number="20.1">
+<h2><span class="header-section-number">20.1</span> Learning Objectives<a href="dna-methylation-sequencing.html#learning-objectives-18" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_91.png" alt="This chapter will demonstrate how to: Understand the basics of bisulfite sequencing data collection and processing workflow. Identify the next steps for your particular bisulfite  sequencing data. Formulate questions to ask about your bisulfite sequencing data" width="100%" /></p>
 </div>
-<div id="what-are-the-goals-of-analyzing-dna-methylation" class="section level2" number="20.2">
-<h2><span class="header-section-number">20.2</span> What are the goals of analyzing DNA methylation?</h2>
+<div id="what-are-the-goals-of-analyzing-dna-methylation" class="section level2 hasAnchor" number="20.2">
+<h2><span class="header-section-number">20.2</span> What are the goals of analyzing DNA methylation?<a href="dna-methylation-sequencing.html#what-are-the-goals-of-analyzing-dna-methylation" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_10.png" width="100%" /></p>
 <p>To detect methylated cytosines (5mC), DNA samples are prepped using bisulfite (BS) conversion. This converts unmethylated cytosines into uracils and leaves methylated cytosines untouched. Probes are then designed to bind to either the uracil or the cytosine, representing the unmethylated and methylated cytosines respectively.</p>
 <p>For a given sample, you will obtain a fraction, known as the Beta value, that indicates the relative abundance of the methylated and unmethylated versions of the sequence. Beta values exist then on a scale of 0 to 1 where 0 indicates none of this particular base is methylated in the sample and 1 indicates all are methylated.</p>
 <p>Note that bisulfite conversion alone will not distinguish between 5mC and 5hmC, though these may often indicate different biological mechanisms.</p>
-<p>Additionally, 5-hydroxymethylated cytosines (5hmC) can also be detected by oxidative bisulfite sequencing (OxBS) [<span class="citation">Booth et al. (<a href="#ref-Booth2013" role="doc-biblioref">2013</a>)</span>. oxidative bisulfite conversion measures both 5mC and 5hmC. If you want to identify 5hmC bases you either have to pair oxBS data with BS data OR you have to use Tet-assisted bisulfite (TAB) sequencing which will exclusively tag 5hmC bases <span class="citation">(<a href="#ref-Yu2012" role="doc-biblioref">Yu et al. 2012</a>)</span>.</p>
+<p>Additionally, 5-hydroxymethylated cytosines (5hmC) can be detected with the help of oxidative bisulfite sequencing (OxBS) <span class="citation">Booth et al. (<a href="#ref-Booth2013">2013</a>)</span>. Standard bisulfite conversion measures 5mC and 5hmC together, while oxidative bisulfite conversion measures 5mC alone. If you want to identify 5hmC bases you either have to pair OxBS data with BS data OR you have to use Tet-assisted bisulfite (TAB) sequencing, which will exclusively tag 5hmC bases <span class="citation">(<a href="#ref-Yu2012">Yu et al. 2012</a>)</span>.</p>
 <p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_35.png" width="100%" /></p>
 </div>
-<div id="methylation-data-considerations" class="section level2" number="20.3">
-<h2><span class="header-section-number">20.3</span> Methylation data considerations</h2>
-<div id="beta-values-binomially-distributed" class="section level3" number="20.3.1">
-<h3><span class="header-section-number">20.3.1</span> Beta values binomially distributed</h3>
+<div id="methylation-data-considerations" class="section level2 hasAnchor" number="20.3">
+<h2><span class="header-section-number">20.3</span> Methylation data considerations<a href="dna-methylation-sequencing.html#methylation-data-considerations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<div id="beta-values-binomially-distributed" class="section level3 hasAnchor" number="20.3.1">
+<h3><span class="header-section-number">20.3.1</span> Beta values binomially distributed<a href="dna-methylation-sequencing.html#beta-values-binomially-distributed" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Because beta values are a ratio, by their nature, they are not normally distributed data and should be treated appropriately. This means data models (like those used by the <code>limma</code> package) built for RNA-seq data should not be used on methylation data. More accurately, Beta values follow a binomial distribution.</p>
 <p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_0.png" width="100%" /></p>
 <p>This generally involves applying a generalized linear model.</p>
 </div>
-<div id="measuring-5mc-andor-5hmc" class="section level3" number="20.3.2">
-<h3><span class="header-section-number">20.3.2</span> Measuring 5mC and/or 5hmC</h3>
+<div id="measuring-5mc-andor-5hmc" class="section level3 hasAnchor" number="20.3.2">
+<h3><span class="header-section-number">20.3.2</span> Measuring 5mC and/or 5hmC<a href="dna-methylation-sequencing.html#measuring-5mc-andor-5hmc" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>If your questions involve both 5mC and 5hmC, you will have two sequencing datasets for each sample: one from the BS-processed sample and one from the OxBS-processed sample. 5mC is often a step toward 5hmC conversion and therefore the 5mC and 5hmC measurements are, by nature, not independent from each other. In theory, the proportions of 5mC, 5hmC and unmethylated cytosines should add up to 1.</p>
 <p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_42.png" width="100%" /></p>
-<p>Because of this, its been proposed that the most appropriate way to model these data is to combine them together in a model <span class="citation">(<a href="#ref-Kochmanski2019" role="doc-biblioref">Kochmanski, Savonen, and Bernstein 2019</a>)</span>.</p>
+<p>Because of this, it’s been proposed that the most appropriate way to model these data is to combine them together in a model <span class="citation">(<a href="#ref-Kochmanski2019">Kochmanski, Savonen, and Bernstein 2019</a>)</span>.</p>
 <p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_49.png" width="100%" /></p>
 </div>
 </div>
-<div id="methylation-data-workflow" class="section level2" number="20.4">
-<h2><span class="header-section-number">20.4</span> Methylation data workflow</h2>
-<p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_5.png" title="In a very general sense, methylation workflow involves sequence quality control and genome alignment like many other sequencing methods. But next, the data needs to be used to identify methylation calls and calculations of methylation fractions. Lastly, you will likely want to group the methylated bases together to identify what regions of the genome are differentially methylated and of interest. " alt="In a very general sense, methylation workflow involves sequence quality control and genome alignment like many other sequencing methods. But next, the data needs to be used to identify methylation calls and calculations of methylation fractions. Lastly, you will likely want to group the methylated bases together to identify what regions of the genome are differentially methylated and of interest. " width="100%" /></p>
+<div id="methylation-data-workflow" class="section level2 hasAnchor" number="20.4">
+<h2><span class="header-section-number">20.4</span> Methylation data workflow<a href="dna-methylation-sequencing.html#methylation-data-workflow" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_5.png" alt="In a very general sense, methylation workflow involves sequence quality control and genome alignment like many other sequencing methods. But next, the data needs to be used to identify methylation calls and calculations of methylation fractions. Lastly, you will likely want to group the methylated bases together to identify what regions of the genome are differentially methylated and of interest. " width="100%" /></p>
 <p>Like other sequencing methods, you will first need to start with quality control checks. Next, you will need to align your sequences to the genome. Then, using the base calls, you will need to make methylation calls, determining which bases are methylated and which are not. The details of this step depend on whether you are measuring 5mC and/or 5hmC methylation calls. Lastly, you will likely want to use your methylation calls as a whole to identify differentially methylated regions of interest.</p>
 </div>
-<div id="methylation-tools-pros-and-cons" class="section level2" number="20.5">
-<h2><span class="header-section-number">20.5</span> Methylation Tools Pros and Cons</h2>
+<div id="methylation-tools-pros-and-cons" class="section level2 hasAnchor" number="20.5">
+<h2><span class="header-section-number">20.5</span> Methylation Tools Pros and Cons<a href="dna-methylation-sequencing.html#methylation-tools-pros-and-cons" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <div class="warning">
 <p>The following pros and cons sections have been written by AI and may need verification by experts. They are meant to give you a basic idea of the pros and cons of these tools but should ultimately be used with your own judgment.</p>
 </div>
-<div id="quality-control-1" class="section level3" number="20.5.1">
-<h3><span class="header-section-number">20.5.1</span> Quality control:</h3>
+<div id="quality-control-1" class="section level3 hasAnchor" number="20.5.1">
+<h3><span class="header-section-number">20.5.1</span> Quality control:<a href="dna-methylation-sequencing.html#quality-control-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
-<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/">FastQC</a>: A popular tool for evaluating the quality of sequencing reads, generating various quality control plots and statistics. It is fast, easy to use and has a simple user interface <span class="citation">(<a href="#ref-andrews2010fastqc" role="doc-biblioref">Andrews, n.d.</a>)</span>.
+<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/">FastQC</a>: A popular tool for evaluating the quality of sequencing reads, generating various quality control plots and statistics. It is fast, easy to use and has a simple user interface <span class="citation">(<a href="#ref-andrews2010fastqc">Andrews, n.d.</a>)</span>.
 <ul>
 <li><strong>Pros</strong>: Fast and easy to use. Very commonly used. Provides various quality control metrics and plots. Can generate reports that can be easily shared with collaborators</li>
 <li><strong>Cons</strong>: Does not perform any trimming or filtering of low-quality reads. Not specifically designed for bisulfite sequencing data</li>
 </ul></li>
-<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/">Trim Galore!</a>: A wrapper tool for Cutadapt and FastQC that provides a simple way to trim adapters and low-quality reads. It also has built-in support for bisulfite sequencing data <span class="citation">(<a href="#ref-krueger2015trim" role="doc-biblioref">Krueger and Andrews, n.d.</a>)</span>.
+<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/">Trim Galore!</a>: A wrapper tool for Cutadapt and FastQC that provides a simple way to trim adapters and low-quality reads. It also has built-in support for bisulfite sequencing data <span class="citation">(<a href="#ref-krueger2015trim">Krueger and Andrews, n.d.</a>)</span>.
 <ul>
 <li><strong>Pros</strong>: Easy to use, with a simple command line interface. Automatically trims adapters and low-quality reads. Specifically designed for bisulfite sequencing data</li>
 <li><strong>Cons</strong>: Limited flexibility in terms of the trimming and filtering options. Does not provide quality control metrics or plots</li>
 </ul></li>
 </ul>
 </div>
-<div id="analysis" class="section level3" number="20.5.2">
-<h3><span class="header-section-number">20.5.2</span> Analysis:</h3>
+<div id="analysis" class="section level3 hasAnchor" number="20.5.2">
+<h3><span class="header-section-number">20.5.2</span> Analysis:<a href="dna-methylation-sequencing.html#analysis" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
-<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/bismark/">Bismark</a>: A widely used tool for aligning bisulfite sequencing reads to a reference genome. It allows for paired-end and single-end reads, provides many options for handling sequencing errors and can output methylation calls in various formats <span class="citation">(<a href="#ref-liu2019bismark" role="doc-biblioref">Liu et al. 2019</a>)</span>.
+<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/bismark/">Bismark</a>: A widely used tool for aligning bisulfite sequencing reads to a reference genome. It allows for paired-end and single-end reads, provides many options for handling sequencing errors and can output methylation calls in various formats <span class="citation">(<a href="#ref-liu2019bismark">Liu et al. 2019</a>)</span>.
 <ul>
 <li><strong>Pros</strong>: Performs alignment, quantification and methylation calling in a single tool. Can output methylation calls in various formats. Provides many options for handling sequencing errors and optimizing methylation calling parameters</li>
 <li><strong>Cons</strong>: Can be computationally intensive for large datasets. Requires a pre-built bisulfite-converted reference genome</li>
 </ul></li>
-<li><a href="https://github.com/BenLangmead/bowtie2">Bowtie2</a>: A fast and efficient aligner that can be used for bisulfite sequencing data, and can align reads to bisulfite-converted genomes or to an unconverted genome with a pre-built bisulfite index <span class="citation">(<a href="#ref-langmead2012fast" role="doc-biblioref">Langmead and Salzberg 2012</a>)</span>.
+<li><a href="https://github.com/BenLangmead/bowtie2">Bowtie2</a>: A fast and efficient aligner that can be used for bisulfite sequencing data, and can align reads to bisulfite-converted genomes or to an unconverted genome with a pre-built bisulfite index <span class="citation">(<a href="#ref-langmead2012fast">Langmead and Salzberg 2012</a>)</span>.
 <ul>
 <li><strong>Pros</strong>: Very fast and efficient, making it suitable for large datasets. Can align reads to either a bisulfite-converted genome or to an unconverted genome with a pre-built bisulfite index. Provides options for handling sequencing errors and optimizing alignment parameters</li>
 <li><strong>Cons</strong>: Does not perform methylation calling or quantification</li>
 </ul></li>
 </ul>
 </div>
-<div id="methylation-calling" class="section level3" number="20.5.3">
-<h3><span class="header-section-number">20.5.3</span> Methylation calling:</h3>
+<div id="methylation-calling" class="section level3 hasAnchor" number="20.5.3">
+<h3><span class="header-section-number">20.5.3</span> Methylation calling:<a href="dna-methylation-sequencing.html#methylation-calling" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
-<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/bismark/">Bismark</a>: As well as performing alignment, Bismark can also be used to call methylation from aligned reads. It reports the percentage of cytosines methylated at each site <span class="citation">(<a href="#ref-liu2019bismark" role="doc-biblioref">Liu et al. 2019</a>)</span>.
+<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/bismark/">Bismark</a>: As well as performing alignment, Bismark can also be used to call methylation from aligned reads. It reports the percentage of cytosines methylated at each site <span class="citation">(<a href="#ref-liu2019bismark">Liu et al. 2019</a>)</span>.
 <ul>
 <li><strong>Pros</strong>: Performs both alignment and methylation calling in a single tool. Can output methylation calls in various formats. Provides many options for handling sequencing errors and optimizing methylation calling parameters</li>
 <li><strong>Cons</strong>: Can be computationally intensive for large datasets. Requires a pre-built bisulfite-converted reference genome</li>
@@ -630,31 +624,31 @@ <h3><span class="header-section-number">20.5.3</span> Methylation calling:</h3>
 </ul></li>
 </ul>
 </div>
-<div id="methylation-quantification" class="section level3" number="20.5.4">
-<h3><span class="header-section-number">20.5.4</span> Methylation quantification:</h3>
+<div id="methylation-quantification" class="section level3 hasAnchor" number="20.5.4">
+<h3><span class="header-section-number">20.5.4</span> Methylation quantification:<a href="dna-methylation-sequencing.html#methylation-quantification" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
-<li><a href="https://www.bioconductor.org/packages/release/bioc/html/methylKit.html">MethylKit</a>: A popular tool for quantifying methylation levels from bisulfite sequencing data. It can handle various types of data and provides options for filtering out low-quality data and detecting differentially methylated regions <span class="citation">(<a href="#ref-akalin2012methylome" role="doc-biblioref">Akalin et al. 2012</a>)</span>.
+<li><a href="https://www.bioconductor.org/packages/release/bioc/html/methylKit.html">MethylKit</a>: A popular tool for quantifying methylation levels from bisulfite sequencing data. It can handle various types of data and provides options for filtering out low-quality data and detecting differentially methylated regions <span class="citation">(<a href="#ref-akalin2012methylome">Akalin et al. 2012</a>)</span>.
 <ul>
 <li><strong>Pros</strong>: Provides various options for filtering out low-quality data and detecting differentially methylated regions. Can handle various types of data, including bisulfite sequencing and reduced representation bisulfite sequencing. Provides many visualization tools for analyzing methylation data</li>
 <li><strong>Cons</strong>: Can be computationally intensive for large datasets. Requires some knowledge of the R programming language to use effectively</li>
 </ul></li>
-<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/bismark/">Bismark</a>: As well as methylation calling, Bismark can also quantify methylation levels at each cytosine site. It reports the number of methylated and unmethylated reads, as well as the percentage of methylation <span class="citation">(<a href="#ref-liu2019bismark" role="doc-biblioref">Liu et al. 2019</a>)</span>.</li>
+<li><a href="https://www.bioinformatics.babraham.ac.uk/projects/bismark/">Bismark</a>: As well as methylation calling, Bismark can also quantify methylation levels at each cytosine site. It reports the number of methylated and unmethylated reads, as well as the percentage of methylation <span class="citation">(<a href="#ref-liu2019bismark">Liu et al. 2019</a>)</span>.</li>
 </ul>
 </div>
-<div id="analysis-1" class="section level3" number="20.5.5">
-<h3><span class="header-section-number">20.5.5</span> Analysis:</h3>
+<div id="analysis-1" class="section level3 hasAnchor" number="20.5.5">
+<h3><span class="header-section-number">20.5.5</span> Analysis:<a href="dna-methylation-sequencing.html#analysis-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
-<li><a href="http://www.bioconductor.org/packages/release/bioc/vignettes/DSS/inst/doc/DSS.html">DSS</a>: A popular tool for identifying differentially methylated regions (DMRs) between groups of samples. It uses a statistical model to detect significant changes in methylation levels and reports DMRs with associated p-values <span class="citation">(<a href="#ref-feng2014dss" role="doc-biblioref">Feng and Conneely 2016</a>)</span>.
+<li><a href="http://www.bioconductor.org/packages/release/bioc/vignettes/DSS/inst/doc/DSS.html">DSS</a>: A popular tool for identifying differentially methylated regions (DMRs) between groups of samples. It uses a statistical model to detect significant changes in methylation levels and reports DMRs with associated p-values <span class="citation">(<a href="#ref-feng2014dss">Feng and Conneely 2016</a>)</span>.
 <ul>
 <li><strong>Pros</strong>: Uses a statistical model to identify differentially methylated regions between groups of samples. Provides various options for controlling false discovery rate and adjusting for multiple comparisons. Suitable for large datasets.</li>
 <li><strong>Cons</strong>: Requires some knowledge of statistical methods and programming to use effectively. May not be suitable for smaller datasets or datasets with low coverage.</li>
 </ul></li>
-<li><a href="https://www.bioconductor.org/packages/release/bioc/html/methylKit.html">MethylKit</a>: As well as methylation quantification, MethylKit can also be used for downstream analysis, such as clustering samples based on methylation patterns and performing functional annotation of differentially methylated regions <span class="citation">(<a href="#ref-akalin2012methylome" role="doc-biblioref">Akalin et al. 2012</a>)</span>.</li>
+<li><a href="https://www.bioconductor.org/packages/release/bioc/html/methylKit.html">MethylKit</a>: As well as methylation quantification, MethylKit can also be used for downstream analysis, such as clustering samples based on methylation patterns and performing functional annotation of differentially methylated regions <span class="citation">(<a href="#ref-akalin2012methylome">Akalin et al. 2012</a>)</span>.</li>
 </ul>
 </div>
 </div>
-<div id="more-resources-2" class="section level2" number="20.6">
-<h2><span class="header-section-number">20.6</span> More resources</h2>
+<div id="more-resources-2" class="section level2 hasAnchor" number="20.6">
+<h2><span class="header-section-number">20.6</span> More resources<a href="dna-methylation-sequencing.html#more-resources-2" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://training.galaxyproject.org/training-material/topics/epigenetics/tutorials/methylation-seq/tutorial.html">DNA methylation analysis with Galaxy tutorial</a></li>
 <li><a href="https://github.com/sartorlab/mint/blob/master/README.md">The mint pipeline</a> for analyzing methylation and hydroxymethylation data.</li>
@@ -663,7 +657,7 @@ <h2><span class="header-section-number">20.6</span> More resources</h2>
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-akalin2012methylome" class="csl-entry">
 Akalin, Altuna, Matthias Kormaksson, Sheng Li, Francine E Garrett-Bakelman, Maria E Figueroa, Ari Melnick, and Christopher E Mason. 2012. <span>“methylKit: A Comprehensive r Package for the Analysis of Genome-Wide DNA Methylation Profiles.”</span> <em>Genome Biology</em> 13 (10): R87. <a href="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2012-13-10-r87">https://genomebiology.biomedcentral.com/articles/10.1186/gb-2012-13-10-r87</a>.
@@ -694,10 +688,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -767,7 +768,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/general-data-analysis-tools.html b/docs/no_toc/general-data-analysis-tools.html
index bddb1702..d00c09be 100644
--- a/docs/no_toc/general-data-analysis-tools.html
+++ b/docs/no_toc/general-data-analysis-tools.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 5 General Data Analysis Tools | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 5 General Data Analysis Tools | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="considerations-for-choosing-tools.html"/>
 <link rel="next" href="sequencing-data.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 5 General Data Analysis Tools | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,29 +535,29 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="general-data-analysis-tools" class="section level1" number="5">
-<h1><span class="header-section-number">Chapter 5</span> General Data Analysis Tools</h1>
-<div id="learning-objectives-3" class="section level2" number="5.1">
-<h2><span class="header-section-number">5.1</span> Learning Objectives</h2>
-<p><img src="resources/images/05-general-data-analysis-tools_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g20fbd76736e_0_0.png" title="This chapter will demonstrate how to: Understand the difference between command line and GUI based applications. Understand what R and Python languages are. Find many links to resources where you can learn R or Python" alt="This chapter will demonstrate how to: Understand the difference between command line and GUI based applications. Understand what R and Python languages are. Find many links to resources where you can learn R or Python" width="100%" /></p>
+<div id="general-data-analysis-tools" class="section level1 hasAnchor" number="5">
+<h1><span class="header-section-number">Chapter 5</span> General Data Analysis Tools<a href="general-data-analysis-tools.html#general-data-analysis-tools" class="anchor-section" aria-label="Anchor link to header"></a></h1>
+<div id="learning-objectives-3" class="section level2 hasAnchor" number="5.1">
+<h2><span class="header-section-number">5.1</span> Learning Objectives<a href="general-data-analysis-tools.html#learning-objectives-3" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/05-general-data-analysis-tools_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g20fbd76736e_0_0.png" alt="This chapter will demonstrate how to: Understand the difference between command line and GUI based applications. Understand what R and Python languages are. Find many links to resources where you can learn R or Python" width="100%" /></p>
 </div>
-<div id="command-line-vs-gui" class="section level2" number="5.2">
-<h2><span class="header-section-number">5.2</span> Command Line vs GUI</h2>
+<div id="command-line-vs-gui" class="section level2 hasAnchor" number="5.2">
+<h2><span class="header-section-number">5.2</span> Command Line vs GUI<a href="general-data-analysis-tools.html#command-line-vs-gui" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>When using computers there are two different ways you can tell a computer program what you want it to do. You can use a Graphical User Interface (abbreviated as GUI), where you point and click buttons, or you can use a Command Line Interface, where you type in commands and write scripts that tell the program what you want it to do.</p>
 <p>Command Line Interfaces take a bit more time to learn and get used to, but they generally make an analysis easier to reproduce, because every step of the analysis can be written in a script. Graphical User Interfaces can be more intuitive and quicker to pick up, but it can be difficult to repeat an analysis in exactly the same way. If you know you will be doing the same analysis many times (either with different or the same samples), it is a good use of your time to learn how to use Command Line tools. We will discuss some of the most commonly used Command Line tools here.</p>
-<div id="bash" class="section level3" number="5.2.1">
-<h3><span class="header-section-number">5.2.1</span> Bash</h3>
+<div id="bash" class="section level3 hasAnchor" number="5.2.1">
+<h3><span class="header-section-number">5.2.1</span> Bash<a href="general-data-analysis-tools.html#bash" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Bash is a command language used by a lot of computers and programs. Many of the same tasks that you might do every day on your computer by clicking on various items on your desktop and menus can also be performed using bash.</p>
 <p>On a Mac computer, you can use bash commands by finding your <code>Terminal</code> window. Go to your search bar and search for the <code>Terminal</code>. You may want to keep this application handy.</p>
 <p>In Windows, you can use bash commands by searching for the <code>Command Prompt</code> application. Go to your search bar and search for <code>Command Prompt</code>. You may want to keep this application handy.</p>
 </div>
-<div id="r" class="section level3" number="5.2.2">
-<h3><span class="header-section-number">5.2.2</span> R</h3>
+<div id="r" class="section level3 hasAnchor" number="5.2.2">
+<h3><span class="header-section-number">5.2.2</span> R<a href="general-data-analysis-tools.html#r" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>R is a program commonly used for statistics and data analysis. It’s free and has lots of R packages built for genomics analysis purposes. Many of these packages have been highlighted in this course or otherwise listed in our <a href="http://hutchdatascience.org/Choosing_Genomics_Tools/genomic-tool-glossary.html">tool glossary</a>.</p>
-<div id="resources-for-learning-r" class="section level4" number="5.2.2.1">
-<h4><span class="header-section-number">5.2.2.1</span> Resources for learning R</h4>
-<div id="r-and-tidyverse" class="section level5" number="5.2.2.1.1">
-<h5><span class="header-section-number">5.2.2.1.1</span> R and Tidyverse</h5>
+<div id="resources-for-learning-r" class="section level4 hasAnchor" number="5.2.2.1">
+<h4><span class="header-section-number">5.2.2.1</span> Resources for learning R<a href="general-data-analysis-tools.html#resources-for-learning-r" class="anchor-section" aria-label="Anchor link to header"></a></h4>
+<div id="r-and-tidyverse" class="section level5 hasAnchor" number="5.2.2.1.1">
+<h5><span class="header-section-number">5.2.2.1.1</span> R and Tidyverse<a href="general-data-analysis-tools.html#r-and-tidyverse" class="anchor-section" aria-label="Anchor link to header"></a></h5>
 <ul>
 <li><a href="https://swirlstats.com/">Swirl, an interactive tutorial</a><br />
 </li>
@@ -578,8 +572,8 @@ <h5><span class="header-section-number">5.2.2.1.1</span> R and Tidyverse</h5>
 <li><a href="https://www.spl.org/books-and-media/books-and-ebooks/safari-books-online">O’Reilly books</a> available through Seattle Public Library</li>
 </ul>
 </div>
-<div id="r-notebooks" class="section level5" number="5.2.2.1.2">
-<h5><span class="header-section-number">5.2.2.1.2</span> R notebooks</h5>
+<div id="r-notebooks" class="section level5 hasAnchor" number="5.2.2.1.2">
+<h5><span class="header-section-number">5.2.2.1.2</span> R notebooks<a href="general-data-analysis-tools.html#r-notebooks" class="anchor-section" aria-label="Anchor link to header"></a></h5>
 <ul>
 <li><a href="http://rmarkdown.rstudio.com">R Markdown</a><br />
 </li>
@@ -589,8 +583,8 @@ <h5><span class="header-section-number">5.2.2.1.2</span> R notebooks</h5>
 <li><a href="https://bookdown.org/yihui/rmarkdown/">R Notebooks tutorial</a></li>
 </ul>
 </div>
-<div id="r-and-genomics" class="section level5" number="5.2.2.1.3">
-<h5><span class="header-section-number">5.2.2.1.3</span> R and Genomics</h5>
+<div id="r-and-genomics" class="section level5 hasAnchor" number="5.2.2.1.3">
+<h5><span class="header-section-number">5.2.2.1.3</span> R and Genomics<a href="general-data-analysis-tools.html#r-and-genomics" class="anchor-section" aria-label="Anchor link to header"></a></h5>
 <ul>
 <li><a href="https://github.com/AlexsLemonade/training-modules/tree/master/intro-to-R-tidyverse">Intro to R and Tidyverse course and exercises</a> from the Childhood Cancer Data Lab.</li>
 <li><a href="https://alexslemonade.github.io/refinebio-examples/index.html">Refine.bio examples</a> from the Childhood Cancer Data Lab.</li>
@@ -599,11 +593,11 @@ <h5><span class="header-section-number">5.2.2.1.3</span> R and Genomics</h5>
 </div>
 </div>
 </div>
-<div id="python" class="section level3" number="5.2.3">
-<h3><span class="header-section-number">5.2.3</span> Python</h3>
+<div id="python" class="section level3 hasAnchor" number="5.2.3">
+<h3><span class="header-section-number">5.2.3</span> Python<a href="general-data-analysis-tools.html#python" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Python is a programming language that is also used for data analysis, among many other things. It can be a very powerful development tool. Some of the Python packages highlighted in this course, along with others, are listed in our <a href="http://hutchdatascience.org/Choosing_Genomics_Tools/genomic-tool-glossary.html">tool glossary</a>.</p>
-<div id="resources-for-learning-python" class="section level4" number="5.2.3.1">
-<h4><span class="header-section-number">5.2.3.1</span> Resources for learning python</h4>
+<div id="resources-for-learning-python" class="section level4 hasAnchor" number="5.2.3.1">
+<h4><span class="header-section-number">5.2.3.1</span> Resources for learning python<a href="general-data-analysis-tools.html#resources-for-learning-python" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><a href="https://jakevdp.github.io/PythonDataScienceHandbook/">Python Data Science Handbook</a></li>
 <li><a href="https://www.pythonforbiologists.org/">Python for Biologists</a></li>
@@ -611,8 +605,8 @@ <h4><span class="header-section-number">5.2.3.1</span> Resources for learning py
 </div>
 </div>
 </div>
-<div id="more-resources-1" class="section level2" number="5.3">
-<h2><span class="header-section-number">5.3</span> More resources</h2>
+<div id="more-resources-1" class="section level2 hasAnchor" number="5.3">
+<h2><span class="header-section-number">5.3</span> More resources<a href="general-data-analysis-tools.html#more-resources-1" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><p><a href="https://hutchdatascience.org/code_review/more_resources.html">A longer list of tools and resources can be found here</a></p></li>
 <li><p><a href="https://datatrail-jhu.github.io/DataTrail/index.html">DataTrail curriculum</a></p></li>
@@ -624,10 +618,17 @@ <h2><span class="header-section-number">5.3</span> More resources</h2>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -697,7 +698,7 @@ <h2><span class="header-section-number">5.3</span> More resources</h2>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/guidelines-for-good-metadata.html b/docs/no_toc/guidelines-for-good-metadata.html
index 73ba6701..2829477c 100644
--- a/docs/no_toc/guidelines-for-good-metadata.html
+++ b/docs/no_toc/guidelines-for-good-metadata.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 3 Guidelines for Good Metadata | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 3 Guidelines for Good Metadata | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="a-very-general-genomics-overview.html"/>
 <link rel="next" href="considerations-for-choosing-tools.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 3 Guidelines for Good Metadata | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,31 +535,31 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="guidelines-for-good-metadata" class="section level1" number="3">
-<h1><span class="header-section-number">Chapter 3</span> Guidelines for Good Metadata</h1>
-<div id="learning-objectives-1" class="section level2" number="3.1">
-<h2><span class="header-section-number">3.1</span> Learning Objectives</h2>
-<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_70.png" title="Learning objectives This chapter will demonstrate how to: Understand what metadata are and why they are so critical. Learn the basics of creating crystal clear, readable metadata" alt="Learning objectives This chapter will demonstrate how to: Understand what metadata are and why they are so critical. Learn the basics of creating crystal clear, readable metadata" width="100%" /></p>
+<div id="guidelines-for-good-metadata" class="section level1 hasAnchor" number="3">
+<h1><span class="header-section-number">Chapter 3</span> Guidelines for Good Metadata<a href="guidelines-for-good-metadata.html#guidelines-for-good-metadata" class="anchor-section" aria-label="Anchor link to header"></a></h1>
+<div id="learning-objectives-1" class="section level2 hasAnchor" number="3.1">
+<h2><span class="header-section-number">3.1</span> Learning Objectives<a href="guidelines-for-good-metadata.html#learning-objectives-1" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_70.png" alt="Learning objectives This chapter will demonstrate how to: Understand what metadata are and why they are so critical. Learn the basics of creating crystal clear, readable metadata" width="100%" /></p>
 </div>
-<div id="what-are-metadata" class="section level2" number="3.2">
-<h2><span class="header-section-number">3.2</span> What are metadata?</h2>
+<div id="what-are-metadata" class="section level2 hasAnchor" number="3.2">
+<h2><span class="header-section-number">3.2</span> What are metadata?<a href="guidelines-for-good-metadata.html#what-are-metadata" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Metadata are critically important descriptive information about your data.</p>
 <p><strong>Without metadata, the data themselves are useless or at best vastly limited.</strong></p>
 <p>Metadata describe how your data came to be, what organism or patient the data are from and include any and every relevant piece of information about the samples in your data set.</p>
-<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_12.png" title="Question: What are metadata? Answer: Anything and everything that should be known about your samples! Samples labeled A-H are in test tubes. A corresponding spreadsheet has metadata such as mouse id, processing date, treatment and etc. The researcher says ‘I know everything I need to know about these samples from their metadata!’" alt="Question: What are metadata? Answer: Anything and everything that should be known about your samples! Samples labeled A-H are in test tubes. A corresponding spreadsheet has metadata such as mouse id, processing date, treatment and etc. The researcher says ‘I know everything I need to know about these samples from their metadata!’" width="100%" /></p>
+<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_12.png" alt="Question: What are metadata? Answer: Anything and everything that should be known about your samples! Samples labeled A-H are in test tubes. A corresponding spreadsheet has metadata such as mouse id, processing date, treatment and etc. The researcher says ‘I know everything I need to know about these samples from their metadata!’" width="100%" /></p>
 <p>Metadata include, but aren’t limited to, the following example categories:</p>
-<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_45.png" title="Examples of metadata categories: Patient/organism of origin, Patient/organism information including: Demographics, Disease state, Treatment state, Time point (if applicable). Metadata also includes: Processing information like: Batch information and Processing details (for example: Isolation methods: Poly-A vs Ribo-minus) Metadata is Anything that should be known about the samples and their handling!" alt="Examples of metadata categories: Patient/organism of origin, Patient/organism information including: Demographics, Disease state, Treatment state, Time point (if applicable). Metadata also includes: Processing information like: Batch information and Processing details (for example: Isolation methods: Poly-A vs Ribo-minus) Metadata is Anything that should be known about the samples and their handling!" width="100%" /></p>
+<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_45.png" alt="Examples of metadata categories: Patient/organism of origin, Patient/organism information including: Demographics, Disease state, Treatment state, Time point (if applicable). Metadata also includes: Processing information like: Batch information and Processing details (for example: Isolation methods: Poly-A vs Ribo-minus) Metadata is Anything that should be known about the samples and their handling!" width="100%" /></p>
 <div class="warning">
 <p>At this time it’s important to note that if you work with human data or samples, your metadata will likely contain personal identifiable information (PII) and protected health information (PHI). It’s critical that you protect this information! For more details on this, we encourage you to see our <a href="https://jhudatascience.org/Ethical_Data_Handling_for_Cancer_Research/data-privacy.html">course about data management</a>.</p>
 </div>
 </div>
-<div id="how-to-create-metadata" class="section level2" number="3.3">
-<h2><span class="header-section-number">3.3</span> How to create metadata?</h2>
+<div id="how-to-create-metadata" class="section level2 hasAnchor" number="3.3">
+<h2><span class="header-section-number">3.3</span> How to create metadata?<a href="guidelines-for-good-metadata.html#how-to-create-metadata" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Where do these metadata come from? The notes and experimental design from anyone who played a part in collecting or processing the data and its original samples. If this includes you (meaning you have collected data and need to create metadata) let’s discuss how metadata can be made in the most useful and reproducible manner.</p>
-<div id="the-goals-in-creating-your-metadata" class="section level3" number="3.3.1">
-<h3><span class="header-section-number">3.3.1</span> The goals in creating your metadata:</h3>
-<div id="goal-a-make-it-crystal-clear-and-easily-readable-by-both-humans-and-computers" class="section level4" number="3.3.1.1">
-<h4><span class="header-section-number">3.3.1.1</span> Goal A: Make it <em>crystal clear</em> and <em>easily readable</em> by both humans and computers!</h4>
+<div id="the-goals-in-creating-your-metadata" class="section level3 hasAnchor" number="3.3.1">
+<h3><span class="header-section-number">3.3.1</span> The goals in creating your metadata:<a href="guidelines-for-good-metadata.html#the-goals-in-creating-your-metadata" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="goal-a-make-it-crystal-clear-and-easily-readable-by-both-humans-and-computers" class="section level4 hasAnchor" number="3.3.1.1">
+<h4><span class="header-section-number">3.3.1.1</span> Goal A: Make it <em>crystal clear</em> and <em>easily readable</em> by both humans and computers!<a href="guidelines-for-good-metadata.html#goal-a-make-it-crystal-clear-and-easily-readable-by-both-humans-and-computers" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Some examples of how to make your data crystal clear:
 - Look out for typos and spelling errors!
 - Don’t use acronyms unless you need to and then if you do need to make sure to explain what the acronym means.
@@ -578,8 +572,8 @@ <h4><span class="header-section-number">3.3.1.1</span> Goal A: Make it <em>cryst
 &gt; - Every cell is a single value.</li>
 </ul>
 </div>
-<div id="goal-b-avoid-introducing-errors-into-your-metadata-in-the-future" class="section level4" number="3.3.1.2">
-<h4><span class="header-section-number">3.3.1.2</span> Goal B: Avoid introducing errors into your metadata in the future!</h4>
+<div id="goal-b-avoid-introducing-errors-into-your-metadata-in-the-future" class="section level4 hasAnchor" number="3.3.1.2">
+<h4><span class="header-section-number">3.3.1.2</span> Goal B: Avoid introducing errors into your metadata in the future!<a href="guidelines-for-good-metadata.html#goal-b-avoid-introducing-errors-into-your-metadata-in-the-future" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Toward these two goals, <a href="https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989">this excellent article</a> by Broman &amp; Woo discusses metadata design rules. We will very briefly cover the major points here but highly suggest you read the original article.</p>
 <ol style="list-style-type: decimal">
 <li><p><em>Be Consistent</em> - Whatever labels and systems you choose, use them universally. This means not only in your metadata spreadsheet but also anywhere you discuss your metadata variables.</p></li>
@@ -595,31 +589,38 @@ <h4><span class="header-section-number">3.3.1.2</span> Goal B: Avoid introducing
 <li><p><em>Use Data Validation to Avoid Errors</em> - Set data types so that Google Sheets or Excel checks that the data in each column are the type expected for that variable.</p></li>
 </ol>
 <div class="warning">
-<p>Note that it is very dangerous to open gene data with Excel. According to <span class="citation">Ziemann, Eren, and El-Osta (<a href="#ref-Ziemann2016" role="doc-biblioref">2016</a>)</span>, approximately one-fifth of papers with Excel gene lists have errors. This happens because Excel wants to interpret everything as a date. We strongly caution against opening (and saving afterward) gene data in Excel.
-<img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13a7f78e577_0_0.png" title="‘Approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions’ Ziemann, Eren, El-Osta, 2016. On the left, a meme that shows Excel asking ‘is this a date?’ in response to seeing ‘any data at all’. " alt="‘Approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions’ Ziemann, Eren, El-Osta, 2016. On the left, a meme that shows Excel asking ‘is this a date?’ in response to seeing ‘any data at all’. " width="100%" /></p>
+<p>Note that it is very dangerous to open gene data with Excel. According to <span class="citation">Ziemann, Eren, and El-Osta (<a href="#ref-Ziemann2016">2016</a>)</span>, approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions. This happens because Excel automatically converts some gene symbols (for example, SEPT2 or MARCH1) into dates. We strongly caution against opening (and saving afterward) gene data in Excel.
+<img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13a7f78e577_0_0.png" alt="‘Approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions’ Ziemann, Eren, El-Osta, 2016. On the left, a meme that shows Excel asking ‘is this a date?’ in response to seeing ‘any data at all’. " width="100%" /></p>
 </div>
 </div>
 </div>
-<div id="to-recap" class="section level3" number="3.3.2">
-<h3><span class="header-section-number">3.3.2</span> To recap:</h3>
-<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_52.png" title="Rules for creating metadata (from Broman &amp; Woo, 2017) Be Consistent. Choose good names for things. Write Dates as YYYY-MM-DD.No Empty Cells. Put Just One Thing in a Cell. Make it a Rectangle" alt="Rules for creating metadata (from Broman &amp; Woo, 2017) Be Consistent. Choose good names for things. Write Dates as YYYY-MM-DD.No Empty Cells. Put Just One Thing in a Cell. Make it a Rectangle" width="100%" /></p>
-<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_1.png" title="Rules for creating metadata continued  (from Broman &amp; Woo, 2017). Create a Data Dictionary. No Calculations in the Raw Data Files. Do Not Use Font Color or Highlighting as Data. Make Backups. Use Data Validation to Avoid Errors" alt="Rules for creating metadata continued  (from Broman &amp; Woo, 2017). Create a Data Dictionary. No Calculations in the Raw Data Files. Do Not Use Font Color or Highlighting as Data. Make Backups. Use Data Validation to Avoid Errors" width="100%" /></p>
+<div id="to-recap" class="section level3 hasAnchor" number="3.3.2">
+<h3><span class="header-section-number">3.3.2</span> To recap:<a href="guidelines-for-good-metadata.html#to-recap" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_52.png" alt="Rules for creating metadata (from Broman &amp; Woo, 2017) Be Consistent. Choose good names for things. Write Dates as YYYY-MM-DD.No Empty Cells. Put Just One Thing in a Cell. Make it a Rectangle" width="100%" /></p>
+<p><img src="resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_1.png" alt="Rules for creating metadata continued  (from Broman &amp; Woo, 2017). Create a Data Dictionary. No Calculations in the Raw Data Files. Do Not Use Font Color or Highlighting as Data. Make Backups. Use Data Validation to Avoid Errors" width="100%" /></p>
 <p>If you are not the person who has the information needed to create metadata, or you believe that another individual already has this information, make sure you get hold of the metadata that correspond to your data. You will need them to do any sort of meaningful analysis!</p>
 
 </div>
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-Ziemann2016" class="csl-entry">
 Ziemann, Mark, Yotam Eren, and Assam El-Osta. 2016. <span>“Gene Name Errors Are Widespread in the Scientific Literature.”</span> <em>Genome Biology</em> 17 (1). <a href="https://doi.org/10.1186/s13059-016-1044-7">https://doi.org/10.1186/s13059-016-1044-7</a>.
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -689,7 +690,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/index.html b/docs/no_toc/index.html
index a08dd4fa..260dc906 100644
--- a/docs/no_toc/index.html
+++ b/docs/no_toc/index.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 
 <link rel="next" href="introduction.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -543,28 +537,35 @@ <h1>
 </div>
 <div id="header">
 <h1 class="title">Choosing Genomics Tools</h1>
-<p class="date"><em>May, 2024</em></p>
+<p class="date"><em>December, 2024</em></p>
 </div>
-<div id="about-this-course" class="section level1 unnumbered">
-<h1>About this Course</h1>
+<div id="about-this-course" class="section level1 unnumbered hasAnchor">
+<h1>About this Course<a href="index.html#about-this-course" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <p>This course is part of a series of courses for the <a href="https://itcr.cancer.gov/">Informatics Technology for Cancer Research (ITCR)</a> called the Informatics Technology for Cancer Research Education Resource. This material was created by the ITCR Training Network (ITN), which is a collaborative effort of researchers around the United States to support cancer informatics and data science training through resources, technology, and events. This initiative is funded by the following grant: <a href="https://www.cancer.gov/">National Cancer Institute (NCI)</a> UE5 CA254170. Our courses feature tools developed by ITCR Investigators and make it easier for principal investigators, scientists, and analysts to integrate cancer informatics into their workflows. Please see our website at <a href="https://www.itcrtraining.org">www.itcrtraining.org</a> for more information.</p>
-<div id="available-course-formats" class="section level2" number="0.1">
-<h2><span class="header-section-number">0.1</span> Available course formats</h2>
+<div id="available-course-formats" class="section level2 hasAnchor" number="0.1">
+<h2><span class="header-section-number">0.1</span> Available course formats<a href="index.html#available-course-formats" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>This course is available in multiple formats, which allows you to take it in the way that best suits your needs. You can take it for a certificate, which may be free or require a fee.</p>
 <ul>
-<li>The material for this course can be viewed without login requirement on this <a href="https://hutchdatascience.org/Choosing_Genomics_Tools/">Bookdown website</a>. This format might be most appropriate for you if you rely on screen-reader technology.</li>
-<li>This course can be taken for <a href="">free certification through Leanpub</a>.</li>
-<li>This course can be taken on <a href="https://www.coursera.org/learn/">Coursera for certification here</a> (but it is not available for free on Coursera).</li>
+<li>The material for this course can be viewed without login requirement on this <a href="https://hutchdatascience.org/Choosing_Genomics_Tools/">Bookdown website</a>. This format might be most appropriate for you if you rely on screen-reader technology.
+<!-- - This course can be taken for [free certification through Leanpub](). --></li>
+<li>This course can be taken on <a href="https://www.coursera.org/specializations/researchers-guide-to-omic-data/">Coursera for certification here</a> (but it is not available for free on Coursera).</li>
 <li>Our courses are open source, you can find the <a href="https://github.com/jhudsl/Choosing_Genomics_Tools">source material for this course on GitHub</a>.</li>
 </ul>
 
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -634,7 +635,7 @@ <h2><span class="header-section-number">0.1</span> Available course formats</h2>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/index.md b/docs/no_toc/index.md
index ed257385..9e7d61eb 100644
--- a/docs/no_toc/index.md
+++ b/docs/no_toc/index.md
@@ -1,6 +1,6 @@
 ---
 title: "Choosing Genomics Tools"
-date: "May, 2024"
+date: "December, 2024"
 site: bookdown::bookdown_site
 documentclass: book
 bibliography: [book.bib]
@@ -22,6 +22,6 @@ This course is part of a series of courses for the [Informatics Technology for C
 This course is available in multiple formats, which allows you to take it in the way that best suits your needs. You can take it for a certificate, which may be free or require a fee.
 
 - The material for this course can be viewed without login requirement on this [Bookdown website](https://hutchdatascience.org/Choosing_Genomics_Tools/). This format might be most appropriate for you if you rely on screen-reader technology.
-- This course can be taken for [free certification through Leanpub]().
-- This course can be taken on [Coursera for certification here](https://www.coursera.org/learn/) (but it is not available for free on Coursera).
+<!-- - This course can be taken for [free certification through Leanpub](). -->
+- This course can be taken on [Coursera for certification here](https://www.coursera.org/specializations/researchers-guide-to-omic-data/) (but it is not available for free on Coursera).
 - Our courses are open source, you can find the [source material for this course on GitHub](https://github.com/jhudsl/Choosing_Genomics_Tools).
diff --git a/docs/no_toc/introduction.html b/docs/no_toc/introduction.html
index 87eec92f..5a758928 100644
--- a/docs/no_toc/introduction.html
+++ b/docs/no_toc/introduction.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 1 Introduction | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 1 Introduction | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="index.html"/>
 <link rel="next" href="a-very-general-genomics-overview.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 1 Introduction | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,13 +535,13 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="introduction" class="section level1" number="1">
-<h1><span class="header-section-number">Chapter 1</span> Introduction</h1>
-<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_0.png" title="Title image: Choosing Genomics Tools Written by: Candace Savonen. Part of the ITN (ITCR training Network) and created through the Johns Hopkins Data Science Lab" alt="Title image: Choosing Genomics Tools Written by: Candace Savonen. Part of the ITN (ITCR training Network) and created through the Johns Hopkins Data Science Lab" width="100%" /></p>
+<div id="introduction" class="section level1 hasAnchor" number="1">
+<h1><span class="header-section-number">Chapter 1</span> Introduction<a href="introduction.html#introduction" class="anchor-section" aria-label="Anchor link to header"></a></h1>
+<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_0.png" alt="Title image: Choosing Genomics Tools Written by: Candace Savonen. Part of the ITN (ITCR training Network) and created through the Johns Hopkins Data Science Lab" width="100%" /></p>
 <p>This is a <em>living</em> course, meaning it is constantly changing and being updated. The goal for this course is to be a “Wikipedia” of omic data.
 If you’d like to contribute, <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">you can file a pull request on GitHub</a> if you are comfortable with that sort of thing or email <code>csavonen@fredhutch.org</code> to ask how to get started.</p>
-<div id="target-audience" class="section level2" number="1.1">
-<h2><span class="header-section-number">1.1</span> Target Audience</h2>
+<div id="target-audience" class="section level2 hasAnchor" number="1.1">
+<h2><span class="header-section-number">1.1</span> Target Audience<a href="introduction.html#target-audience" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The course is intended for students in the biomedical sciences and researchers who have been given data and don’t know what to do with it or would like an overview of the different genomic data types that are out there.</p>
 <p><em>This course is written for individuals who:</em></p>
 <ul>
@@ -555,42 +549,49 @@ <h2><span class="header-section-number">1.1</span> Target Audience</h2>
 <li>Want a basic overview of genomic data types.</li>
 <li>Want to find resources for processing and interpreting genomics data.</li>
 </ul>
-<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g116525eff64_0_96.png" title="For individuals who: Have genomic data and don’t know what to do with it. Want a basic overview of their genomic data type. Want to find resources for processing and interpreting genomics data" alt="For individuals who: Have genomic data and don’t know what to do with it. Want a basic overview of their genomic data type. Want to find resources for processing and interpreting genomics data" width="100%" /></p>
+<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g116525eff64_0_96.png" alt="For individuals who: Have genomic data and don’t know what to do with it. Want a basic overview of their genomic data type. Want to find resources for processing and interpreting genomics data" width="100%" /></p>
 </div>
-<div id="topics-covered" class="section level2" number="1.2">
-<h2><span class="header-section-number">1.2</span> Topics covered:</h2>
-<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g11db7c97851_0_143.png" title=" " alt=" " width="100%" /></p>
+<div id="topics-covered" class="section level2 hasAnchor" number="1.2">
+<h2><span class="header-section-number">1.2</span> Topics covered:<a href="introduction.html#topics-covered" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g11db7c97851_0_143.png" alt=" " width="100%" /></p>
 </div>
-<div id="motivation" class="section level2" number="1.3">
-<h2><span class="header-section-number">1.3</span> Motivation</h2>
+<div id="motivation" class="section level2 hasAnchor" number="1.3">
+<h2><span class="header-section-number">1.3</span> Motivation<a href="introduction.html#motivation" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Cancer datasets are plentiful, complicated, and hold untold amounts of information regarding cancer biology. Cancer researchers are working to apply their expertise to the analysis of these vast amounts of data but training opportunities to properly equip them in these efforts can be sparse. This includes training in reproducible data analysis methods.</p>
 <p>Often students and researchers need to utilize genomic data to reach the next steps of their research but may not have formal training in computational methods or the basics of the genomic data they are attempting to utilize.</p>
 <p>Often researchers receive their genomic data processed from another lab or institution, and although they are excited to gain insights from it to inform the next steps of their research, they may not have a practical understanding of how the data they have received came to be or what needs to be done with it.</p>
-<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1221ea485b7_0_0.png" title="This researcher is very excited because they’ve received their genomic data and are ready to gain insights from it to inform the next steps of their research. An email sent to them says ‘your data are ready’" alt="This researcher is very excited because they’ve received their genomic data and are ready to gain insights from it to inform the next steps of their research. An email sent to them says ‘your data are ready’" width="100%" /></p>
+<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1221ea485b7_0_0.png" alt="This researcher is very excited because they’ve received their genomic data and are ready to gain insights from it to inform the next steps of their research. An email sent to them says ‘your data are ready’" width="100%" /></p>
 <p>As an example, data file formats may not have been covered in their training, and the data they received seems unintelligible and not as straightforward as they hoped.</p>
-<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1221ea485b7_0_13.png" title="The researcher may attempt to open their newly received genomic data and be completely perplexed by the file formats or what these data even represent. The researcher says ‘What is this and what do I do with it’" alt="The researcher may attempt to open their newly received genomic data and be completely perplexed by the file formats or what these data even represent. The researcher says ‘What is this and what do I do with it’" width="100%" /></p>
+<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1221ea485b7_0_13.png" alt="The researcher may attempt to open their newly received genomic data and be completely perplexed by the file formats or what these data even represent. The researcher says ‘What is this and what do I do with it’" width="100%" /></p>
 <p>This course attempts to give this researcher the basic bearings and resources regarding their data, in hopes that they will be equipped and informed about how to obtain the insights for their research they originally aimed to find.</p>
 </div>
-<div id="curriculum" class="section level2" number="1.4">
-<h2><span class="header-section-number">1.4</span> Curriculum</h2>
-<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_10.png" title="Overall Course Learning Objectives. This course will demonstrate how too: Understand the overall workflow associated with processing their genomic data  Be aware of caveats based on their specific type of data. Find tutorials to help them process their genomic data. Choose tools for processing their genomic data. Choose tools for interpreting their genomic data " alt="Overall Course Learning Objectives. This course will demonstrate how too: Understand the overall workflow associated with processing their genomic data  Be aware of caveats based on their specific type of data. Find tutorials to help them process their genomic data. Choose tools for processing their genomic data. Choose tools for interpreting their genomic data " width="100%" /></p>
+<div id="curriculum" class="section level2 hasAnchor" number="1.4">
+<h2><span class="header-section-number">1.4</span> Curriculum<a href="introduction.html#curriculum" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_10.png" alt="Overall Course Learning Objectives. This course will demonstrate how to: Understand the overall workflow associated with processing their genomic data. Be aware of caveats based on their specific type of data. Find tutorials to help them process their genomic data. Choose tools for processing their genomic data. Choose tools for interpreting their genomic data." width="100%" /></p>
 <p><strong>Goal of this course:</strong><br />
 Equip learners with tutorials and resources so they can understand and interpret their genomic data in a way that helps them meet their goals and handle the data properly.
 This includes helping learners formulate questions they will need to ask others about their data.</p>
 <p><strong>What is not the goal</strong><br />
 Teach learners about choosing parameters or about the ins and outs of every genomic tool they might be interested in. This course is meant to connect people to other resources that will help them with the specifics of their genomic data and help learners have more efficient and fruitful discussions about their data with bioinformatic experts.</p>
 </div>
-<div id="how-to-use-the-course" class="section level2" number="1.5">
-<h2><span class="header-section-number">1.5</span> How to use the course</h2>
+<div id="how-to-use-the-course" class="section level2 hasAnchor" number="1.5">
+<h2><span class="header-section-number">1.5</span> How to use the course<a href="introduction.html#how-to-use-the-course" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>This course is designed to be a jumping-off point to more specific resources based on a genomic data type the learner has in mind (or currently has on their computer). We encourage learners to follow the links to resources we provide and to jump around to the chapters that are most useful for them.</p>
 
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -660,7 +661,7 @@ <h2><span class="header-section-number">1.5</span> How to use the course</h2>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/itcr--omic-tool-glossary.html b/docs/no_toc/itcr--omic-tool-glossary.html
index 4284767c..59a382dd 100644
--- a/docs/no_toc/itcr--omic-tool-glossary.html
+++ b/docs/no_toc/itcr--omic-tool-glossary.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 22 ITCR -omic Tool Glossary | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 22 ITCR -omic Tool Glossary | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="microbiome-sequencing.html"/>
 <link rel="next" href="about-the-authors.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 22 ITCR -omic Tool Glossary | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,8 +535,8 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="itcr--omic-tool-glossary" class="section level1" number="22">
-<h1><span class="header-section-number">Chapter 22</span> ITCR -omic Tool Glossary</h1>
+<div id="itcr--omic-tool-glossary" class="section level1 hasAnchor" number="22">
+<h1><span class="header-section-number">Chapter 22</span> ITCR -omic Tool Glossary<a href="itcr--omic-tool-glossary.html#itcr--omic-tool-glossary" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <p>Here are all the tools that have been mentioned in this course or are otherwise recommended for your use. The list is in alphabetical order.</p>
 <ul>
 <li><a href="itcr--omic-tool-glossary.html#archs4">ARCHS4</a></li>
@@ -566,16 +560,16 @@ <h1><span class="header-section-number">Chapter 22</span> ITCR -omic Tool Glossa
 <li><a href="itcr--omic-tool-glossary.html#webmev">WebMeV</a></li>
 <li><a href="itcr--omic-tool-glossary.html#xena">Xena</a></li>
 </ul>
-<div id="archs4" class="section level2" number="22.1">
-<h2><span class="header-section-number">22.1</span> ARCHS4</h2>
+<div id="archs4" class="section level2 hasAnchor" number="22.1">
+<h2><span class="header-section-number">22.1</span> ARCHS4<a href="itcr--omic-tool-glossary.html#archs4" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>All RNA-seq and ChIP-seq sample and signature search (ARCHS4) (<a href="https://maayanlab.cloud/archs4/" class="uri">https://maayanlab.cloud/archs4/</a>) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from GEO and SRA. The ARCHS4 website provides the uniformly processed data for download and programmatic access in H5 format, and as a 3-dimensional interactive viewer and search engine. Users can search and browse the data by metadata enhanced annotations, and can submit their own gene sets for search. Subsets of selected samples can be downloaded as a tab delimited text file that is ready for loading into the R programming environment. To generate the ARCHS4 resource, the kallisto aligner is applied in an efficient parallelized cloud infrastructure. Human and mouse samples are aligned against the most recent Ensembl annotation (Ensembl 107).</p>
 </div>
-<div id="bioconductor" class="section level2" number="22.2">
-<h2><span class="header-section-number">22.2</span> Bioconductor</h2>
+<div id="bioconductor" class="section level2 hasAnchor" number="22.2">
+<h2><span class="header-section-number">22.2</span> Bioconductor<a href="itcr--omic-tool-glossary.html#bioconductor" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The mission of the Bioconductor project is to develop, support, and disseminate free open source software that facilitates rigorous and reproducible analysis of data from current and emerging biological assays. We are dedicated to building a diverse, collaborative, and welcoming community of developers and data scientists.</p>
 <p>Bioconductor uses the R statistical programming language, and is open source and open development. It has two releases each year, and an active user community. Bioconductor is also available as Docker images.</p>
-<div id="notable-bioconductor-genomics-tools" class="section level3" number="22.2.1">
-<h3><span class="header-section-number">22.2.1</span> Notable Bioconductor genomics tools:</h3>
+<div id="notable-bioconductor-genomics-tools" class="section level3 hasAnchor" number="22.2.1">
+<h3><span class="header-section-number">22.2.1</span> Notable Bioconductor genomics tools:<a href="itcr--omic-tool-glossary.html#notable-bioconductor-genomics-tools" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><p><a href="https://bioconductor.org/packages/release/bioc/html/annotatr.html">annotatr</a></p></li>
 <li><p><a href="https://bioconductor.org/packages/release/bioc/html/ensembldb.html">ensembldb</a></p></li>
@@ -595,23 +589,23 @@ <h3><span class="header-section-number">22.2.1</span> Notable Bioconductor genom
 </ul>
 </div>
 </div>
-<div id="cancer-models" class="section level2" number="22.3">
-<h2><span class="header-section-number">22.3</span> Cancer Models</h2>
+<div id="cancer-models" class="section level2 hasAnchor" number="22.3">
+<h2><span class="header-section-number">22.3</span> Cancer Models<a href="itcr--omic-tool-glossary.html#cancer-models" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Patient Derived Cancer Models Finder (www.cancermodels.org) is a cancer research platform that aggregates clinical, genomic and functional data from patient-derived xenografts, organoids and cell lines. The PDCM Finder standardises, harmonises and integrates the complex and diverse data associated with PDCMs for the cancer community.</p>
 <p>Data types used are model metadata and related clinical metadata from the sample from which the model was derived, e.g. molecular and treatment-based. Data are preprocessed, consistently semantically annotated, harmonised and FAIR.</p>
 <p>PDCM Finder contains &gt;6200 models across 13 cancer types, including rare pediatric models (17%) and models from minority ethnic backgrounds (33%), making it the largest free to consumer and open access resource of this kind.
 Get started at www.cancermodels.org to browse and query models by cancer type.</p>
 </div>
-<div id="civic" class="section level2" number="22.4">
-<h2><span class="header-section-number">22.4</span> CIViC</h2>
+<div id="civic" class="section level2 hasAnchor" number="22.4">
+<h2><span class="header-section-number">22.4</span> CIViC<a href="itcr--omic-tool-glossary.html#civic" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://www.civicdb.org">CIViC</a> is a knowledgebase and curation interface for the clinical interpretation of variants in cancer. Evidence is curated from published literature describing the diagnostic, prognostic, predictive, predisposing, oncogenic, or functional role of variants in specific cancer types. Evidence submitted by community curators is revised and moderated by expert editors. Individual evidence is synthesized into gene summaries, variant summaries and variant-disease assertions of specific clinical relevance. Anyone can make use of CIViC knowledge through the open web interface or API. Information on how to use or contribute to CIViC is available in our help docs (docs.civicdb.org). The main distinguishing feature of CIViC compared to similar resources is its total commitment to open data sharing. All data are available in the Public Domain (CC0). The code is available for any use under an MIT license.</p>
 </div>
-<div id="ctat" class="section level2" number="22.5">
-<h2><span class="header-section-number">22.5</span> CTAT</h2>
+<div id="ctat" class="section level2 hasAnchor" number="22.5">
+<h2><span class="header-section-number">22.5</span> CTAT<a href="itcr--omic-tool-glossary.html#ctat" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://github.com/NCIP/Trinity_CTAT/wiki">The Trinity Cancer Transcriptome Analysis Toolkit (CTAT)</a> provides a diverse collection of tools to gain insights into the biology of cancer through the lens of the transcriptome. Using RNA-seq as input, CTAT modules enable detection of mutations, fusion transcripts, copy number aberrations, cancer-specific splicing aberrations, and oncogenic viruses including insertions into the human genome. CTAT uses both read mapping and de novo assembly methods to analyze RNA-seq, leveraging tumor bulk and single cell transcriptomes. CTAT modules provide interactive visualizations as outputs, are easily installed for local execution or run via cloud computing (e.g., Terra), have detailed user guides and tutorials, and are well-supported through user forums.</p>
 </div>
-<div id="deepphe" class="section level2" number="22.6">
-<h2><span class="header-section-number">22.6</span> DeepPhe</h2>
+<div id="deepphe" class="section level2 hasAnchor" number="22.6">
+<h2><span class="header-section-number">22.6</span> DeepPhe<a href="itcr--omic-tool-glossary.html#deepphe" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://deepphe.github.io/">DeepPhe: Natural Language Processing Tools for Cancer Research</a></p>
 <p>Under development since 2014, the DeepPhe suite of software tools aims to extract deep phenotype information from the Electronic Medical Records from patients with cancer. DeepPhe combines:</p>
 <ol style="list-style-type: decimal">
@@ -626,67 +620,67 @@ <h2><span class="header-section-number">22.6</span> DeepPhe</h2>
 </ol>
 <p>DeepPhe tools are available for download and installation from the <a href="https://deepphe.github.io/">DeepPhe website</a> under an open-source license for non-commercial use.</p>
 </div>
-<div id="genetic-cancer-risk-detector-garde" class="section level2" number="22.7">
-<h2><span class="header-section-number">22.7</span> Genetic Cancer Risk Detector (GARDE)</h2>
+<div id="genetic-cancer-risk-detector-garde" class="section level2 hasAnchor" number="22.7">
+<h2><span class="header-section-number">22.7</span> Genetic Cancer Risk Detector (GARDE)<a href="itcr--omic-tool-glossary.html#genetic-cancer-risk-detector-garde" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://reimagineehr.utah.edu/innovations/garde/">Genetic Cancer Risk Detector (GARDE)</a> screens and identifies patients who meet National Comprehensive Cancer Network (NCCN) criteria for genetic evaluation of familial cancer risk based on their family history in the EHR using both structured data and natural language processing of free-text data. Patients identified by GARDE are imported into an EHR’s population health management dashboard (e.g., Epic’s Healthy Planet module) where genetic counseling staff review individual cases, select, and send bulk outreach messages to patients via chatbot and/or through the patient portal.</p>
 <p>GARDE is a population clinical decision support (CDS) platform based on Fast Healthcare Interoperability Resources (FHIR) and CDS Hooks standards to support interoperability and logic sharing beyond single vendor solutions.</p>
 </div>
-<div id="genepattern" class="section level2" number="22.8">
-<h2><span class="header-section-number">22.8</span> GenePattern</h2>
+<div id="genepattern" class="section level2 hasAnchor" number="22.8">
+<h2><span class="header-section-number">22.8</span> GenePattern<a href="itcr--omic-tool-glossary.html#genepattern" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>GenePattern, www.genepattern.org, is an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. Analyses include general machine learning methods, the gene set enrichment analysis suite, ’omics-specific tools for bulk and single-cell gene expression, proteomics, flow cytometry, variant annotation, sequence variation and others, as well as cancer-specific analyses. Also included are data preprocessing and utility tools. A web-based interface provides easy, non-programmatic access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research.</p>
 <p>The GenePattern Notebook interface, notebook.genepattern.org, extends the Jupyter Notebook system to allow users to combine GenePattern analyses with text, graphics, and code to create complete research narratives. It includes many additional features to make notebooks accessible to non-programmers. The online GenePattern Notebook Workspace allows investigators to create, run, and collaborate on notebooks using only a web browser. A library of GenePattern Notebooks implementing common scientific workflows is available for investigators to use as templates and adapt to their own requirements.</p>
 <p>To get started with GenePattern you can go through the GenePattern Quick Start Tutorial, view the GenePattern User Guide, or the videos on our YouTube channel. To learn more about GenePattern Notebook, view the GenePattern Notebook Quick Start, GenePattern Notebook documentation, run through the tutorial notebooks (click the Tutorial button), or view the videos on the GenePattern Notebooks YouTube channel.</p>
 </div>
-<div id="gene-set-enrichment-analysis-gsea" class="section level2" number="22.9">
-<h2><span class="header-section-number">22.9</span> Gene Set Enrichment Analysis (GSEA)</h2>
+<div id="gene-set-enrichment-analysis-gsea" class="section level2 hasAnchor" number="22.9">
+<h2><span class="header-section-number">22.9</span> Gene Set Enrichment Analysis (GSEA)<a href="itcr--omic-tool-glossary.html#gene-set-enrichment-analysis-gsea" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://www.gsea-msigdb.org/gsea/index.jsp">Gene Set Enrichment Analysis (GSEA)</a> is a method to identify the coordinate activation or repression of groups of genes that share common biological functions, pathways, chromosomal locations, or regulation, thereby distinguishing even subtle differences between phenotypes or cellular states. Gene set-based enrichment analysis is now standard practice for interpreting global transcription profiling experiments and elucidating the biological mechanisms associated with disease and other biological phenotypes of interest. The method is more powerful than typical single-gene approaches to comparing phenotypes, as it can identify sets of genes (e.g., perturbation signatures or molecular pathways) that are coordinately up- or downregulated when each gene in the set may not be significantly differentially expressed. The GSEA software provides useful visualizations and reports for the exploration and interpretation of results. GSEA bundles direct access to the Molecular Signatures Database (MSigDB) – a comprehensive curated repository of annotated gene sets representing signatures derived from publications, pathway databases, and other sources of public data; MSigDB can also be used independently.</p>
 <p>The website for the GSEA-MSigDB resource can be found at gsea-msigdb.org. To get started with GSEA you can view the GSEA User Guide, and access the GSEA software through the downloads page or through the GSEA modules available on GenePattern. See the MSigDB section of the website for more information about MSigDB and to interactively explore the gene sets and their annotations. User support for GSEA and MSigDB is available through our help forum.</p>
 </div>
-<div id="integrative-genomics-viewer-igv" class="section level2" number="22.10">
-<h2><span class="header-section-number">22.10</span> Integrative Genomics Viewer (IGV)</h2>
+<div id="integrative-genomics-viewer-igv" class="section level2 hasAnchor" number="22.10">
+<h2><span class="header-section-number">22.10</span> Integrative Genomics Viewer (IGV)<a href="itcr--omic-tool-glossary.html#integrative-genomics-viewer-igv" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The <a href="https://software.broadinstitute.org/software/igv/">Integrative Genomics Viewer (IGV)</a> is a track-based browser for interactively exploring genomic data mapped to a reference genome. IGV supports all the standard genomic data types (aligned reads, variants, signal peaks, genome annotations, copy number variation, etc.) as well as sample information, such as clinical, phenotypic, or other attributes. IGV provides great flexibility in loading data, whether investigator generated or publicly available, directly from multiple disparate sources without the need for any pre-processing. Supported data sources include local file systems; web servers on the user’s intranet or the Internet; commercial cloud providers (Google, Amazon, Azure, Dropbox); web links to data in public repositories. Authentication to access private data on the web is supported with the industry standard OAuth protocol.</p>
 <p>IGV is available in multiple forms, including both end-user applications and versions for use by developers. The IGV website at <a href="https://igv.org" class="uri">https://igv.org</a> provides access to all modalities of IGV. Download and install the IGV Desktop application from the downloads page. To learn about using the application see the tutorial videos on the IGV YouTube channel and the online User Guide. The IGV-Web app is available at <a href="https://igv.org/app" class="uri">https://igv.org/app</a>. To learn about using the app, the Help link in the menu bar provides access to the documentation, and see also the tutorial videos on the YouTube channel. The igv.js JavaScript component is for web developers who wish to embed IGV in their web apps or portals. More information can be found in the Readme file and the Wiki in the igv.js GitHub repository. IGV user support is available through the igv-help online forum and the GitHub repositories.</p>
 </div>
-<div id="ndex" class="section level2" number="22.11">
-<h2><span class="header-section-number">22.11</span> NDEx</h2>
+<div id="ndex" class="section level2 hasAnchor" number="22.11">
+<h2><span class="header-section-number">22.11</span> NDEx<a href="itcr--omic-tool-glossary.html#ndex" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The <a href="https://www.ndexbio.org/#/">Network Data Exchange (NDEx)</a> project provides an open-source framework where scientists and organizations can store, share and publish biological network knowledge. A distinctive feature of NDEx is that it serves as a home for models that are currently available only as figures, tables, or supplementary information, such as networks produced via systematic mining and integration of large-scale molecular data.</p>
 <p>NDEx includes features to support data distribution and access according to FAIR principles. Its full integration with Cytoscape, the popular desktop application for network analysis and visualization, provides the cloud back-end component for data I/O; so, if a network file format can be opened in Cytoscape, it can also be stored in (and retrieved from) NDEx.</p>
 <p>NDEx can be accessed via its web user interface or programmatically, via REST API and client libraries in Python, R, Java. Web applications can interface with NDEx via JavaScript: MSigDB, CRAVAT, cBioPortal and IQuery, are all examples of web applications integrated with NDEx.</p>
 <p>For more information, please review the About NDEx page.
 To get started, visit the NDEx public server: there, you can review the NDEx FAQ, access documentation, contact us, and search or browse thousands of biological network models.</p>
 </div>
-<div id="multiassayexperiment" class="section level2" number="22.12">
-<h2><span class="header-section-number">22.12</span> MultiAssayExperiment</h2>
+<div id="multiassayexperiment" class="section level2 hasAnchor" number="22.12">
+<h2><span class="header-section-number">22.12</span> MultiAssayExperiment<a href="itcr--omic-tool-glossary.html#multiassayexperiment" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://bioconductor.org/packages/MultiAssayExperiment/">MultiAssayExperiment</a> is an R/Bioconductor package that harmonizes data management, manipulation, and subsetting of multiple experimental assays performed on an overlapping set of specimens. It supports on-disk and remote data storage, and provides reshaping tools for adaptability to arbitrary downstream analysis.</p>
 <p>MultiAssayExperiment is distinct from alternative approaches in its focus on multi’omic data management and manipulation and in its integration with the Bioconductor ecosystem: it is used by more than 50 other Bioconductor packages, it provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment while supporting an open-ended mix of data classes for individual assays, and it allows subsetting by genomic ranges, row names, phenotypic data, and assays.</p>
 <p>You can get started with the MultiAssayExperiment Bioconductor package documentation, or start with prebuilt MultiAssayExperiment objects from <a href="https://bioconductor.org/packages/curatedTCGAData/">curatedTCGAData</a>, <a href="https://bioconductor.org/packages/cBioPortalData/">cBioPortalData</a>, or <a href="https://bioconductor.org/packages/SingleCellMultiModal/">SingleCellMultiModal</a>.</p>
 </div>
-<div id="opencravat" class="section level2" number="22.13">
-<h2><span class="header-section-number">22.13</span> OpenCRAVAT</h2>
+<div id="opencravat" class="section level2 hasAnchor" number="22.13">
+<h2><span class="header-section-number">22.13</span> OpenCRAVAT<a href="itcr--omic-tool-glossary.html#opencravat" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://run.opencravat.org">OpenCRAVAT</a> takes variation data in many popular variant file formats as input and outputs variant annotations and visualizations. To get started, go to opencravat.org. You can download and run it on your local machine or on multi-user servers, use it at <a href="https://run.opencravat.org" class="uri">https://run.opencravat.org</a>, or run it in the cloud. We offer a broader selection of annotation tools than comparable software, and results can be explored with an interactive GUI that provides customized filtering options, interactive tables, and widgets. Use it for a single sample or a large cohort, or pull single-variant reports with a structured URL (example: <a href="https://run.opencravat.org/webapps/variantreport/index.html?chrom=chr11&amp;pos=48123823&amp;ref_base=A&amp;alt_base=C" class="uri">https://run.opencravat.org/webapps/variantreport/index.html?chrom=chr11&amp;pos=48123823&amp;ref_base=A&amp;alt_base=C</a>).</p>
 </div>
-<div id="pvactools" class="section level2" number="22.14">
-<h2><span class="header-section-number">22.14</span> pVACtools</h2>
+<div id="pvactools" class="section level2 hasAnchor" number="22.14">
+<h2><span class="header-section-number">22.14</span> pVACtools<a href="itcr--omic-tool-glossary.html#pvactools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Identification of neoantigens is a critical step in predicting response to checkpoint blockade therapy and design of personalized cancer vaccines. We have built a computational framework called pVACtools that, when paired with a well-established genomics pipeline, produces an end-to-end solution for neoantigen characterization. pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions, and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework designed to facilitate the incorporation of additional algorithms. Prioritization of predicted peptides occurs by integrating diverse data, including mutant allele expression, peptide binding affinities, and determination of whether a mutation is clonal or subclonal. Interactive visualization via a Web interface allows clinical users to efficiently generate, review, and interpret results, selecting candidate peptides for individual patient vaccine designs. Additional modules support design choices needed for competing vaccine delivery approaches. One such module optimizes peptide ordering to minimize junctional epitopes in DNA vector vaccines. Downstream analysis commands for synthetic long peptide vaccines are available to assess candidates for factors that influence peptide synthesis. All of the aforementioned steps are executed via a modular workflow consisting of tools for neoantigen prediction from somatic alterations (pVACseq and pVACfuse), prioritization, and selection using a graphical Web-based interface (pVACview), and design of DNA vector–based vaccines (pVACvector) and synthetic long peptide vaccines. pVACtools is available at <a href="http://www.pvactools.org" class="uri">http://www.pvactools.org</a>.</p>
 </div>
-<div id="tumordecon" class="section level2" number="22.15">
-<h2><span class="header-section-number">22.15</span> TumorDecon</h2>
+<div id="tumordecon" class="section level2 hasAnchor" number="22.15">
+<h2><span class="header-section-number">22.15</span> TumorDecon<a href="itcr--omic-tool-glossary.html#tumordecon" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>TumorDecon is the only software that combines the four digital cytometry methods described below in one platform, so that users can compare their results. It is also the only software that includes a method for creating a signature matrix from single-cell gene expression data.</p>
 <p>TumorDecon software includes four deconvolution methods (DeconRNAseq [Gong2013], CIBERSORT [Newman2015], ssGSEA [Şenbabaoğlu2016], Singscore [Foroutan2018]) and several signature matrices of various cell types, including LM22. The input of this software is the gene expression profile of the tumor, and the output is the relative number of each cell type and several visualization plots. Users have an option to choose any of the implemented deconvolution methods and included signature matrices or import their own signature matrix to get the results. Additionally, TumorDecon can be used to generate customized signature matrices from single-cell RNA-sequence profiles.</p>
 <p>In addition to the 3 tutorials provided on GitHub (tutorial.py, sig_matrix_tutorial.py, &amp; full_tutorial.py) there is a User Manual available at: <a href="https://people.math.umass.edu/~aronow/TumorDecon" class="uri">https://people.math.umass.edu/~aronow/TumorDecon</a></p>
 <p>TumorDecon is available on GitHub (<a href="https://github.com/ShahriyariLab/TumorDecon" class="uri">https://github.com/ShahriyariLab/TumorDecon</a>) and PyPI (<a href="https://pypi.org/project/TumorDecon/" class="uri">https://pypi.org/project/TumorDecon/</a>).</p>
 <p>For more info please see: Rachel A. Aronow, Shaya Akbarinejad, Trang Le, Sumeyye Su, Leili Shahriyari, TumorDecon: A digital cytometry software, SoftwareX, Volume 18, 2022, 101072, <a href="https://doi.org/10.1016/j.softx.2022.101072" class="uri">https://doi.org/10.1016/j.softx.2022.101072</a>.</p>
 </div>
-<div id="webmev" class="section level2" number="22.16">
-<h2><span class="header-section-number">22.16</span> WebMeV</h2>
+<div id="webmev" class="section level2 hasAnchor" number="22.16">
+<h2><span class="header-section-number">22.16</span> WebMeV<a href="itcr--omic-tool-glossary.html#webmev" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://webmev.tm4.org">WebMeV</a> is an online tool that facilitates analysis of large-scale RNA-seq and other multi-omic datasets by providing intuitive access to advanced analytical methods and high-performance computing for a wide range of basic, clinical, and translational researchers. WebMeV supports “bulk” RNA-seq data, single-cell RNA-seq, and other types of -omic data, and provides easy access to public data resources such as The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression project (GTEx) as well as user-provided data. WebMeV uniquely provides a user-friendly, intuitive, interactive interface to processed analytical data and uses cloud-computing elasticity for the computationally intensive analyses that are increasingly required for genomic data analysis. WebMeV’s design places an emphasis on user-driven data analysis by providing users the ability to visualize, interact with, and dissect genomic data at each step in the analysis with a “point-and-click” interactive data environment. Although the primary input is normalized “count matrices,” WebMeV does include tools for data normalization and quality control and uses Dropbox and Google Drive as means of easily uploading data. Analytical methods include statistical tests for comparing cohorts, for identifying gene sets, for doing functional enrichment analysis on gene sets (GSEA), and for inferring gene regulatory network models and comparing these networks between phenotypes to understand the drivers of disease. WebMeV also provides a platform to support reproducible research and makes code for the entire system and its component methods available as open-source software.</p>
 </div>
-<div id="xena" class="section level2" number="22.17">
-<h2><span class="header-section-number">22.17</span> Xena</h2>
+<div id="xena" class="section level2 hasAnchor" number="22.17">
+<h2><span class="header-section-number">22.17</span> Xena<a href="itcr--omic-tool-glossary.html#xena" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="http://xena.ucsc.edu/">UCSC Xena</a> is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations.</p>
 <p>Xena showcases seminal cancer genomics datasets from TCGA, the Pan-Cancer Atlas, GDC, PCAWG, ICGC, and more: a total of more than 1500 datasets across 50 cancer types. We support virtually any type of functional genomics data (sometimes known as level 3 or 4 data). This includes SNPs, INDELs, copy number variation, gene expression, ATAC-seq, DNA methylation, exon-, transcript-, miRNA-, and lncRNA-expression, and structural variants. We also support clinical data such as phenotype information, subtype classifications, and biomarkers. All of our data is available for download via Python or R APIs, or through our URL links.</p>
-<div id="questions-xena-can-help-you-answer-include" class="section level3" number="22.17.1">
-<h3><span class="header-section-number">22.17.1</span> Questions Xena can help you answer include:</h3>
+<div id="questions-xena-can-help-you-answer-include" class="section level3 hasAnchor" number="22.17.1">
+<h3><span class="header-section-number">22.17.1</span> Questions Xena can help you answer include:<a href="itcr--omic-tool-glossary.html#questions-xena-can-help-you-answer-include" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li>Is overexpression of this gene associated with better survival?</li>
 <li>What genes are differentially expressed between these two groups of samples?</li>
@@ -699,10 +693,17 @@ <h3><span class="header-section-number">22.17.1</span> Questions Xena can help y
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -772,7 +773,7 @@ <h3><span class="header-section-number">22.17.1</span> Questions Xena can help y
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections-hash.css b/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections-hash.css
new file mode 100644
index 00000000..b563ec97
--- /dev/null
+++ b/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections-hash.css
@@ -0,0 +1,2 @@
+/* Styles for section anchors */
+a.anchor-section::before {content: '#';font-size: 80%;}
diff --git a/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections.css b/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections.css
new file mode 100644
index 00000000..041905f8
--- /dev/null
+++ b/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections.css
@@ -0,0 +1,4 @@
+/* Styles for section anchors */
+a.anchor-section {margin-left: 10px; visibility: hidden; color: inherit;}
+.hasAnchor:hover a.anchor-section {visibility: visible;}
+ul > li > .anchor-section {display: none;}
diff --git a/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections.js b/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections.js
new file mode 100644
index 00000000..fee005d9
--- /dev/null
+++ b/docs/no_toc/libs/anchor-sections-1.1.0/anchor-sections.js
@@ -0,0 +1,11 @@
+document.addEventListener('DOMContentLoaded', function () {
+  // If section divs is used, we need to put the anchor in the child header
+  const headers = document.querySelectorAll("div.hasAnchor.section[class*='level'] > :first-child")
+
+  headers.forEach(function (x) {
+    // Add to the header node
+    if (!x.classList.contains('hasAnchor')) x.classList.add('hasAnchor')
+    // Remove from the section or div created by Pandoc
+    x.parentElement.classList.remove('hasAnchor')
+  })
+})
diff --git a/docs/no_toc/libs/gitbook-2.6.7/css/plugin-highlight.css b/docs/no_toc/libs/gitbook-2.6.7/css/plugin-highlight.css
index 2aabd3de..02c01891 100644
--- a/docs/no_toc/libs/gitbook-2.6.7/css/plugin-highlight.css
+++ b/docs/no_toc/libs/gitbook-2.6.7/css/plugin-highlight.css
@@ -133,7 +133,7 @@
 .book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code {
   /*
 
-Orginal Style from ethanschoonover.com/solarized (c) Jeremy Hull <sourdrums@gmail.com>
+Original Style from ethanschoonover.com/solarized (c) Jeremy Hull <sourdrums@gmail.com>
 
 */
   /* Solarized Green */
diff --git a/docs/no_toc/libs/gitbook-2.6.7/css/style.css b/docs/no_toc/libs/gitbook-2.6.7/css/style.css
index 1b0c622a..cba69b23 100644
--- a/docs/no_toc/libs/gitbook-2.6.7/css/style.css
+++ b/docs/no_toc/libs/gitbook-2.6.7/css/style.css
@@ -1,15 +1,13 @@
-/*! normalize.css v2.1.0 | MIT License | git.io/normalize */img,legend{border:0}*,.fa{-webkit-font-smoothing:antialiased}.fa-ul>li,sub,sup{position:relative}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book-langs-index .inner .languages:after,.buttons:after,.dropdown-menu .buttons:after{clear:both}body,html{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}audio,canvas,video{display:inline-block}.hidden,[hidden]{display:none}audio:not([controls]){display:none;height:0}html{font-family:sans-serif}body,figure{margin:0}a:focus{outline:dotted thin}a:active,a:hover{outline:0}h1{font-size:2em;margin:.67em 0}abbr[title]{border-bottom:1px dotted}b,strong{font-weight:700}dfn{font-style:italic}hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}mark{background:#ff0;color:#000}code,kbd,pre,samp{font-family:monospace,serif;font-size:1em}pre{white-space:pre-wrap}q{quotes:"\201C" "\201D" "\2018" "\2019"}small{font-size:80%}sub,sup{font-size:75%;line-height:0;vertical-align:baseline}sup{top:-.5em}sub{bottom:-.25em}svg:not(:root){overflow:hidden}fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}legend{padding:0}button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}button,input{line-height:normal}button,select{text-transform:none}button,html input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer}button[disabled],html 
input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{box-sizing:border-box;padding:0}input[type=search]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}input[type=search]::-webkit-search-cancel-button{margin-right:10px;}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}textarea{overflow:auto;vertical-align:top}table{border-collapse:collapse;border-spacing:0}/*!
+/*! normalize.css v2.1.0 | MIT License | git.io/normalize */img,legend{border:0}*{-webkit-font-smoothing:antialiased}sub,sup{position:relative}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book-langs-index .inner .languages:after,.buttons:after,.dropdown-menu .buttons:after{clear:both}body,html{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}audio,canvas,video{display:inline-block}.hidden,[hidden]{display:none}audio:not([controls]){display:none;height:0}html{font-family:sans-serif}body,figure{margin:0}a:focus{outline:dotted thin}a:active,a:hover{outline:0}h1{font-size:2em;margin:.67em 0}abbr[title]{border-bottom:1px dotted}b,strong{font-weight:700}dfn{font-style:italic}hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}mark{background:#ff0;color:#000}code,kbd,pre,samp{font-family:monospace,serif;font-size:1em}pre{white-space:pre-wrap}q{quotes:"\201C" "\201D" "\2018" "\2019"}small{font-size:80%}sub,sup{font-size:75%;line-height:0;vertical-align:baseline}sup{top:-.5em}sub{bottom:-.25em}svg:not(:root){overflow:hidden}fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}legend{padding:0}button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}button,input{line-height:normal}button,select{text-transform:none}button,html input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer}button[disabled],html input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{box-sizing:border-box;padding:0}input[type=search]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}input[type=search]::-webkit-search-cancel-button{margin-right:10px;}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}textarea{overflow:auto;vertical-align:top}table{border-collapse:collapse;border-spacing:0}/*!
  * Preboot v2
  *
  * Open sourced under MIT license by @mdo.
  * Some variables and mixins from Bootstrap (Apache 2 license).
- */.link-inherit,.link-inherit:focus,.link-inherit:hover{color:inherit}.fa,.fa-stack{display:inline-block}/*!
+ */.link-inherit,.link-inherit:focus,.link-inherit:hover{color:inherit}/*!
  *  Font Awesome 4.7.0 by @davegandy - http://fontawesome.io - @fontawesome
  *  License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License)
- */@font-face{font-family:FontAwesome;src:url(./fontawesome/fontawesome-webfont.ttf?v=4.7.0) format('truetype');font-weight:400;font-style:normal}.fa{font-family:FontAwesome;font-style:normal;font-weight:400;line-height:1;-moz-osx-font-smoothing:grayscale}.book .book-header,.book .book-summary{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif}.fa-lg{font-size:1.33333333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571429em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14285714em;list-style-type:none}.fa-li{position:absolute;left:-2.14285714em;width:2.14285714em;top:.14285714em;text-align:center}.fa-li.fa-lg{left:-1.85714286em}.fa-border{padding:.2em .25em .15em;border:.08em solid #eee;border-radius:.1em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left{margin-right:.3em}.fa.pull-right{margin-left:.3em}.fa-spin{-webkit-animation:spin 2s infinite linear;-moz-animation:spin 2s infinite linear;-o-animation:spin 2s infinite linear;animation:spin 2s infinite linear}@-moz-keyframes spin{0%{-moz-transform:rotate(0)}100%{-moz-transform:rotate(359deg)}}@-webkit-keyframes spin{0%{-webkit-transform:rotate(0)}100%{-webkit-transform:rotate(359deg)}}@-o-keyframes spin{0%{-o-transform:rotate(0)}100%{-o-transform:rotate(359deg)}}@keyframes 
spin{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=1);-webkit-transform:rotate(90deg);-moz-transform:rotate(90deg);-ms-transform:rotate(90deg);-o-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2);-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=3);-webkit-transform:rotate(270deg);-moz-transform:rotate(270deg);-ms-transform:rotate(270deg);-o-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1);-webkit-transform:scale(-1,1);-moz-transform:scale(-1,1);-ms-transform:scale(-1,1);-o-transform:scale(-1,1);transform:scale(-1,1)}.fa-flip-vertical{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2, 
mirror=1);-webkit-transform:scale(1,-1);-moz-transform:scale(1,-1);-ms-transform:scale(1,-1);-o-transform:scale(1,-1);transform:scale(1,-1)}.fa-stack{position:relative;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:"\f000"}.fa-music:before{content:"\f001"}.fa-search:before{content:"\f002"}.fa-envelope-o:before{content:"\f003"}.fa-heart:before{content:"\f004"}.fa-star:before{content:"\f005"}.fa-star-o:before{content:"\f006"}.fa-user:before{content:"\f007"}.fa-film:before{content:"\f008"}.fa-th-large:before{content:"\f009"}.fa-th:before{content:"\f00a"}.fa-th-list:before{content:"\f00b"}.fa-check:before{content:"\f00c"}.fa-times:before{content:"\f00d"}.fa-search-plus:before{content:"\f00e"}.fa-search-minus:before{content:"\f010"}.fa-power-off:before{content:"\f011"}.fa-signal:before{content:"\f012"}.fa-cog:before,.fa-gear:before{content:"\f013"}.fa-trash-o:before{content:"\f014"}.fa-home:before{content:"\f015"}.fa-file-o:before{content:"\f016"}.fa-clock-o:before{content:"\f017"}.fa-road:before{content:"\f018"}.fa-download:before{content:"\f019"}.fa-arrow-circle-o-down:before{content:"\f01a"}.fa-arrow-circle-o-up:before{content:"\f01b"}.fa-inbox:before{content:"\f01c"}.fa-play-circle-o:before{content:"\f01d"}.fa-repeat:before,.fa-rotate-right:before{content:"\f01e"}.fa-refresh:before{content:"\f021"}.fa-list-alt:before{content:"\f022"}.fa-lock:before{content:"\f023"}.fa-flag:before{content:"\f024"}.fa-headphones:before{content:"\f025"}.fa-volume-off:before{content:"\f026"}.fa-volume-down:before{content:"\f027"}.fa-volume-up:before{content:"\f028"}.fa-qrcode:before{content:"\f029"}.fa-barcode:before{content:"\f02a"}.fa-tag:before{content:"\f02b"}.fa-tags:before{content:"\f02c"}.fa-book:before{content:"\f02d"}.fa-bookmark:before{content:"\f02e"}.fa-print:before{content:"\f02
f"}.fa-camera:before{content:"\f030"}.fa-font:before{content:"\f031"}.fa-bold:before{content:"\f032"}.fa-italic:before{content:"\f033"}.fa-text-height:before{content:"\f034"}.fa-text-width:before{content:"\f035"}.fa-align-left:before{content:"\f036"}.fa-align-center:before{content:"\f037"}.fa-align-right:before{content:"\f038"}.fa-align-justify:before{content:"\f039"}.fa-list:before{content:"\f03a"}.fa-dedent:before,.fa-outdent:before{content:"\f03b"}.fa-indent:before{content:"\f03c"}.fa-video-camera:before{content:"\f03d"}.fa-image:before,.fa-photo:before,.fa-picture-o:before{content:"\f03e"}.fa-pencil:before{content:"\f040"}.fa-map-marker:before{content:"\f041"}.fa-adjust:before{content:"\f042"}.fa-tint:before{content:"\f043"}.fa-edit:before,.fa-pencil-square-o:before{content:"\f044"}.fa-share-square-o:before{content:"\f045"}.fa-check-square-o:before{content:"\f046"}.fa-arrows:before{content:"\f047"}.fa-step-backward:before{content:"\f048"}.fa-fast-backward:before{content:"\f049"}.fa-backward:before{content:"\f04a"}.fa-play:before{content:"\f04b"}.fa-pause:before{content:"\f04c"}.fa-stop:before{content:"\f04d"}.fa-forward:before{content:"\f04e"}.fa-fast-forward:before{content:"\f050"}.fa-step-forward:before{content:"\f051"}.fa-eject:before{content:"\f052"}.fa-chevron-left:before{content:"\f053"}.fa-chevron-right:before{content:"\f054"}.fa-plus-circle:before{content:"\f055"}.fa-minus-circle:before{content:"\f056"}.fa-times-circle:before{content:"\f057"}.fa-check-circle:before{content:"\f058"}.fa-question-circle:before{content:"\f059"}.fa-info-circle:before{content:"\f05a"}.fa-crosshairs:before{content:"\f05b"}.fa-times-circle-o:before{content:"\f05c"}.fa-check-circle-o:before{content:"\f05d"}.fa-ban:before{content:"\f05e"}.fa-arrow-left:before{content:"\f060"}.fa-arrow-right:before{content:"\f061"}.fa-arrow-up:before{content:"\f062"}.fa-arrow-down:before{content:"\f063"}.fa-mail-forward:before,.fa-share:before{content:"\f064"}.fa-expand:before{content:"\f065"}.fa-c
ompress:before{content:"\f066"}.fa-plus:before{content:"\f067"}.fa-minus:before{content:"\f068"}.fa-asterisk:before{content:"\f069"}.fa-exclamation-circle:before{content:"\f06a"}.fa-gift:before{content:"\f06b"}.fa-leaf:before{content:"\f06c"}.fa-fire:before{content:"\f06d"}.fa-eye:before{content:"\f06e"}.fa-eye-slash:before{content:"\f070"}.fa-exclamation-triangle:before,.fa-warning:before{content:"\f071"}.fa-plane:before{content:"\f072"}.fa-calendar:before{content:"\f073"}.fa-random:before{content:"\f074"}.fa-comment:before{content:"\f075"}.fa-magnet:before{content:"\f076"}.fa-chevron-up:before{content:"\f077"}.fa-chevron-down:before{content:"\f078"}.fa-retweet:before{content:"\f079"}.fa-shopping-cart:before{content:"\f07a"}.fa-folder:before{content:"\f07b"}.fa-folder-open:before{content:"\f07c"}.fa-arrows-v:before{content:"\f07d"}.fa-arrows-h:before{content:"\f07e"}.fa-bar-chart-o:before{content:"\f080"}.fa-twitter-square:before{content:"\f081"}.fa-facebook-square:before{content:"\f082"}.fa-camera-retro:before{content:"\f083"}.fa-key:before{content:"\f084"}.fa-cogs:before,.fa-gears:before{content:"\f085"}.fa-comments:before{content:"\f086"}.fa-thumbs-o-up:before{content:"\f087"}.fa-thumbs-o-down:before{content:"\f088"}.fa-star-half:before{content:"\f089"}.fa-heart-o:before{content:"\f08a"}.fa-sign-out:before{content:"\f08b"}.fa-linkedin-square:before{content:"\f08c"}.fa-thumb-tack:before{content:"\f08d"}.fa-external-link:before{content:"\f08e"}.fa-sign-in:before{content:"\f090"}.fa-trophy:before{content:"\f091"}.fa-github-square:before{content:"\f092"}.fa-upload:before{content:"\f093"}.fa-lemon-o:before{content:"\f094"}.fa-phone:before{content:"\f095"}.fa-square-o:before{content:"\f096"}.fa-bookmark-o:before{content:"\f097"}.fa-phone-square:before{content:"\f098"}.fa-twitter:before{content:"\f099"}.fa-facebook:before{content:"\f09a"}.fa-github:before{content:"\f09b"}.fa-unlock:before{content:"\f09c"}.fa-credit-card:before{content:"\f09d"}.fa-rss:before{content:"\f
09e"}.fa-hdd-o:before{content:"\f0a0"}.fa-bullhorn:before{content:"\f0a1"}.fa-bell:before{content:"\f0f3"}.fa-certificate:before{content:"\f0a3"}.fa-hand-o-right:before{content:"\f0a4"}.fa-hand-o-left:before{content:"\f0a5"}.fa-hand-o-up:before{content:"\f0a6"}.fa-hand-o-down:before{content:"\f0a7"}.fa-arrow-circle-left:before{content:"\f0a8"}.fa-arrow-circle-right:before{content:"\f0a9"}.fa-arrow-circle-up:before{content:"\f0aa"}.fa-arrow-circle-down:before{content:"\f0ab"}.fa-globe:before{content:"\f0ac"}.fa-wrench:before{content:"\f0ad"}.fa-tasks:before{content:"\f0ae"}.fa-filter:before{content:"\f0b0"}.fa-briefcase:before{content:"\f0b1"}.fa-arrows-alt:before{content:"\f0b2"}.fa-group:before,.fa-users:before{content:"\f0c0"}.fa-chain:before,.fa-link:before{content:"\f0c1"}.fa-cloud:before{content:"\f0c2"}.fa-flask:before{content:"\f0c3"}.fa-cut:before,.fa-scissors:before{content:"\f0c4"}.fa-copy:before,.fa-files-o:before{content:"\f0c5"}.fa-paperclip:before{content:"\f0c6"}.fa-floppy-o:before,.fa-save:before{content:"\f0c7"}.fa-square:before{content:"\f0c8"}.fa-bars:before,.fa-navicon:before,.fa-reorder:before{content:"\f0c9"}.fa-list-ul:before{content:"\f0ca"}.fa-list-ol:before{content:"\f0cb"}.fa-strikethrough:before{content:"\f0cc"}.fa-underline:before{content:"\f0cd"}.fa-table:before{content:"\f0ce"}.fa-magic:before{content:"\f0d0"}.fa-truck:before{content:"\f0d1"}.fa-pinterest:before{content:"\f0d2"}.fa-pinterest-square:before{content:"\f0d3"}.fa-google-plus-square:before{content:"\f0d4"}.fa-google-plus:before{content:"\f0d5"}.fa-money:before{content:"\f0d6"}.fa-caret-down:before{content:"\f0d7"}.fa-caret-up:before{content:"\f0d8"}.fa-caret-left:before{content:"\f0d9"}.fa-caret-right:before{content:"\f0da"}.fa-columns:before{content:"\f0db"}.fa-sort:before,.fa-unsorted:before{content:"\f0dc"}.fa-sort-desc:before,.fa-sort-down:before{content:"\f0dd"}.fa-sort-asc:before,.fa-sort-up:before{content:"\f0de"}.fa-envelope:before{content:"\f0e0"}.fa-linkedin:before
{content:"\f0e1"}.fa-rotate-left:before,.fa-undo:before{content:"\f0e2"}.fa-gavel:before,.fa-legal:before{content:"\f0e3"}.fa-dashboard:before,.fa-tachometer:before{content:"\f0e4"}.fa-comment-o:before{content:"\f0e5"}.fa-comments-o:before{content:"\f0e6"}.fa-bolt:before,.fa-flash:before{content:"\f0e7"}.fa-sitemap:before{content:"\f0e8"}.fa-umbrella:before{content:"\f0e9"}.fa-clipboard:before,.fa-paste:before{content:"\f0ea"}.fa-lightbulb-o:before{content:"\f0eb"}.fa-exchange:before{content:"\f0ec"}.fa-cloud-download:before{content:"\f0ed"}.fa-cloud-upload:before{content:"\f0ee"}.fa-user-md:before{content:"\f0f0"}.fa-stethoscope:before{content:"\f0f1"}.fa-suitcase:before{content:"\f0f2"}.fa-bell-o:before{content:"\f0a2"}.fa-coffee:before{content:"\f0f4"}.fa-cutlery:before{content:"\f0f5"}.fa-file-text-o:before{content:"\f0f6"}.fa-building-o:before{content:"\f0f7"}.fa-hospital-o:before{content:"\f0f8"}.fa-ambulance:before{content:"\f0f9"}.fa-medkit:before{content:"\f0fa"}.fa-fighter-jet:before{content:"\f0fb"}.fa-beer:before{content:"\f0fc"}.fa-h-square:before{content:"\f0fd"}.fa-plus-square:before{content:"\f0fe"}.fa-angle-double-left:before{content:"\f100"}.fa-angle-double-right:before{content:"\f101"}.fa-angle-double-up:before{content:"\f102"}.fa-angle-double-down:before{content:"\f103"}.fa-angle-left:before{content:"\f104"}.fa-angle-right:before{content:"\f105"}.fa-angle-up:before{content:"\f106"}.fa-angle-down:before{content:"\f107"}.fa-desktop:before{content:"\f108"}.fa-laptop:before{content:"\f109"}.fa-tablet:before{content:"\f10a"}.fa-mobile-phone:before,.fa-mobile:before{content:"\f10b"}.fa-circle-o:before{content:"\f10c"}.fa-quote-left:before{content:"\f10d"}.fa-quote-right:before{content:"\f10e"}.fa-spinner:before{content:"\f110"}.fa-circle:before{content:"\f111"}.fa-mail-reply:before,.fa-reply:before{content:"\f112"}.fa-github-alt:before{content:"\f113"}.fa-folder-o:before{content:"\f114"}.fa-folder-open-o:before{content:"\f115"}.fa-smile-o:before{conten
t:"\f118"}.fa-frown-o:before{content:"\f119"}.fa-meh-o:before{content:"\f11a"}.fa-gamepad:before{content:"\f11b"}.fa-keyboard-o:before{content:"\f11c"}.fa-flag-o:before{content:"\f11d"}.fa-flag-checkered:before{content:"\f11e"}.fa-terminal:before{content:"\f120"}.fa-code:before{content:"\f121"}.fa-mail-reply-all:before,.fa-reply-all:before{content:"\f122"}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:"\f123"}.fa-location-arrow:before{content:"\f124"}.fa-crop:before{content:"\f125"}.fa-code-fork:before{content:"\f126"}.fa-chain-broken:before,.fa-unlink:before{content:"\f127"}.fa-question:before{content:"\f128"}.fa-info:before{content:"\f129"}.fa-exclamation:before{content:"\f12a"}.fa-superscript:before{content:"\f12b"}.fa-subscript:before{content:"\f12c"}.fa-eraser:before{content:"\f12d"}.fa-puzzle-piece:before{content:"\f12e"}.fa-microphone:before{content:"\f130"}.fa-microphone-slash:before{content:"\f131"}.fa-shield:before{content:"\f132"}.fa-calendar-o:before{content:"\f133"}.fa-fire-extinguisher:before{content:"\f134"}.fa-rocket:before{content:"\f135"}.fa-maxcdn:before{content:"\f136"}.fa-chevron-circle-left:before{content:"\f137"}.fa-chevron-circle-right:before{content:"\f138"}.fa-chevron-circle-up:before{content:"\f139"}.fa-chevron-circle-down:before{content:"\f13a"}.fa-html5:before{content:"\f13b"}.fa-css3:before{content:"\f13c"}.fa-anchor:before{content:"\f13d"}.fa-unlock-alt:before{content:"\f13e"}.fa-bullseye:before{content:"\f140"}.fa-ellipsis-h:before{content:"\f141"}.fa-ellipsis-v:before{content:"\f142"}.fa-rss-square:before{content:"\f143"}.fa-play-circle:before{content:"\f144"}.fa-ticket:before{content:"\f145"}.fa-minus-square:before{content:"\f146"}.fa-minus-square-o:before{content:"\f147"}.fa-level-up:before{content:"\f148"}.fa-level-down:before{content:"\f149"}.fa-check-square:before{content:"\f14a"}.fa-pencil-square:before{content:"\f14b"}.fa-external-link-square:before{content:"\f14c"}.fa-share-square:before{c
ontent:"\f14d"}.fa-compass:before{content:"\f14e"}.fa-caret-square-o-down:before,.fa-toggle-down:before{content:"\f150"}.fa-caret-square-o-up:before,.fa-toggle-up:before{content:"\f151"}.fa-caret-square-o-right:before,.fa-toggle-right:before{content:"\f152"}.fa-eur:before,.fa-euro:before{content:"\f153"}.fa-gbp:before{content:"\f154"}.fa-dollar:before,.fa-usd:before{content:"\f155"}.fa-inr:before,.fa-rupee:before{content:"\f156"}.fa-cny:before,.fa-jpy:before,.fa-rmb:before,.fa-yen:before{content:"\f157"}.fa-rouble:before,.fa-rub:before,.fa-ruble:before{content:"\f158"}.fa-krw:before,.fa-won:before{content:"\f159"}.fa-bitcoin:before,.fa-btc:before{content:"\f15a"}.fa-file:before{content:"\f15b"}.fa-file-text:before{content:"\f15c"}.fa-sort-alpha-asc:before{content:"\f15d"}.fa-sort-alpha-desc:before{content:"\f15e"}.fa-sort-amount-asc:before{content:"\f160"}.fa-sort-amount-desc:before{content:"\f161"}.fa-sort-numeric-asc:before{content:"\f162"}.fa-sort-numeric-desc:before{content:"\f163"}.fa-thumbs-up:before{content:"\f164"}.fa-thumbs-down:before{content:"\f165"}.fa-youtube-square:before{content:"\f166"}.fa-youtube:before{content:"\f167"}.fa-xing:before{content:"\f168"}.fa-xing-square:before{content:"\f169"}.fa-youtube-play:before{content:"\f16a"}.fa-dropbox:before{content:"\f16b"}.fa-stack-overflow:before{content:"\f16c"}.fa-instagram:before{content:"\f16d"}.fa-flickr:before{content:"\f16e"}.fa-adn:before{content:"\f170"}.fa-bitbucket:before{content:"\f171"}.fa-bitbucket-square:before{content:"\f172"}.fa-tumblr:before{content:"\f173"}.fa-tumblr-square:before{content:"\f174"}.fa-long-arrow-down:before{content:"\f175"}.fa-long-arrow-up:before{content:"\f176"}.fa-long-arrow-left:before{content:"\f177"}.fa-long-arrow-right:before{content:"\f178"}.fa-apple:before{content:"\f179"}.fa-windows:before{content:"\f17a"}.fa-android:before{content:"\f17b"}.fa-linux:before{content:"\f17c"}.fa-dribbble:before{content:"\f17d"}.fa-skype:before{content:"\f17e"}.fa-foursquare:before{co
ntent:"\f180"}.fa-trello:before{content:"\f181"}.fa-female:before{content:"\f182"}.fa-male:before{content:"\f183"}.fa-gittip:before{content:"\f184"}.fa-sun-o:before{content:"\f185"}.fa-moon-o:before{content:"\f186"}.fa-archive:before{content:"\f187"}.fa-bug:before{content:"\f188"}.fa-vk:before{content:"\f189"}.fa-weibo:before{content:"\f18a"}.fa-renren:before{content:"\f18b"}.fa-pagelines:before{content:"\f18c"}.fa-stack-exchange:before{content:"\f18d"}.fa-arrow-circle-o-right:before{content:"\f18e"}.fa-arrow-circle-o-left:before{content:"\f190"}.fa-caret-square-o-left:before,.fa-toggle-left:before{content:"\f191"}.fa-dot-circle-o:before{content:"\f192"}.fa-wheelchair:before{content:"\f193"}.fa-vimeo-square:before{content:"\f194"}.fa-try:before,.fa-turkish-lira:before{content:"\f195"}.fa-plus-square-o:before{content:"\f196"}.fa-space-shuttle:before{content:"\f197"}.fa-slack:before{content:"\f198"}.fa-envelope-square:before{content:"\f199"}.fa-wordpress:before{content:"\f19a"}.fa-openid:before{content:"\f19b"}.fa-bank:before,.fa-institution:before,.fa-university:before{content:"\f19c"}.fa-graduation-cap:before,.fa-mortar-board:before{content:"\f19d"}.fa-yahoo:before{content:"\f19e"}.fa-google:before{content:"\f1a0"}.fa-reddit:before{content:"\f1a1"}.fa-reddit-square:before{content:"\f1a2"}.fa-stumbleupon-circle:before{content:"\f1a3"}.fa-stumbleupon:before{content:"\f1a4"}.fa-delicious:before{content:"\f1a5"}.fa-digg:before{content:"\f1a6"}.fa-pied-piper-square:before,.fa-pied-piper:before{content:"\f1a7"}.fa-pied-piper-alt:before{content:"\f1a8"}.fa-drupal:before{content:"\f1a9"}.fa-joomla:before{content:"\f1aa"}.fa-language:before{content:"\f1ab"}.fa-fax:before{content:"\f1ac"}.fa-building:before{content:"\f1ad"}.fa-child:before{content:"\f1ae"}.fa-paw:before{content:"\f1b0"}.fa-spoon:before{content:"\f1b1"}.fa-cube:before{content:"\f1b2"}.fa-cubes:before{content:"\f1b3"}.fa-behance:before{content:"\f1b4"}.fa-behance-square:before{content:"\f1b5"}.fa-steam:before{c
ontent:"\f1b6"}.fa-steam-square:before{content:"\f1b7"}.fa-recycle:before{content:"\f1b8"}.fa-automobile:before,.fa-car:before{content:"\f1b9"}.fa-cab:before,.fa-taxi:before{content:"\f1ba"}.fa-tree:before{content:"\f1bb"}.fa-spotify:before{content:"\f1bc"}.fa-deviantart:before{content:"\f1bd"}.fa-soundcloud:before{content:"\f1be"}.fa-database:before{content:"\f1c0"}.fa-file-pdf-o:before{content:"\f1c1"}.fa-file-word-o:before{content:"\f1c2"}.fa-file-excel-o:before{content:"\f1c3"}.fa-file-powerpoint-o:before{content:"\f1c4"}.fa-file-image-o:before,.fa-file-photo-o:before,.fa-file-picture-o:before{content:"\f1c5"}.fa-file-archive-o:before,.fa-file-zip-o:before{content:"\f1c6"}.fa-file-audio-o:before,.fa-file-sound-o:before{content:"\f1c7"}.fa-file-movie-o:before,.fa-file-video-o:before{content:"\f1c8"}.fa-file-code-o:before{content:"\f1c9"}.fa-vine:before{content:"\f1ca"}.fa-codepen:before{content:"\f1cb"}.fa-jsfiddle:before{content:"\f1cc"}.fa-life-bouy:before,.fa-life-ring:before,.fa-life-saver:before,.fa-support:before{content:"\f1cd"}.fa-circle-o-notch:before{content:"\f1ce"}.fa-ra:before,.fa-rebel:before{content:"\f1d0"}.fa-empire:before,.fa-ge:before{content:"\f1d1"}.fa-git-square:before{content:"\f1d2"}.fa-git:before{content:"\f1d3"}.fa-hacker-news:before{content:"\f1d4"}.fa-tencent-weibo:before{content:"\f1d5"}.fa-qq:before{content:"\f1d6"}.fa-wechat:before,.fa-weixin:before{content:"\f1d7"}.fa-paper-plane:before,.fa-send:before{content:"\f1d8"}.fa-paper-plane-o:before,.fa-send-o:before{content:"\f1d9"}.fa-history:before{content:"\f1da"}.fa-circle-thin:before{content:"\f1db"}.fa-header:before{content:"\f1dc"}.fa-paragraph:before{content:"\f1dd"}.fa-sliders:before{content:"\f1de"}.fa-share-alt:before{content:"\f1e0"}.fa-share-alt-square:before{content:"\f1e1"}.fa-bomb:before{content:"\f1e2"}.book-langs-index{width:100%;height:100%;padding:40px 0;margin:0;overflow:auto}@media (max-width:600px){.book-langs-index{padding:0}}.book-langs-index 
.inner{max-width:600px;width:100%;margin:0 auto;padding:30px;background:#fff;border-radius:3px}.book-langs-index .inner h3{margin:0}.book-langs-index .inner .languages{list-style:none;padding:20px 30px;margin-top:20px;border-top:1px solid #eee}.book-langs-index .inner .languages:after,.book-langs-index .inner .languages:before{content:" ";display:table;line-height:0}.book-langs-index .inner .languages li{width:50%;float:left;padding:10px 5px;font-size:16px}@media (max-width:600px){.book-langs-index .inner .languages li{width:100%;max-width:100%}}.book .book-header{overflow:visible;height:50px;padding:0 8px;z-index:2;font-size:.85em;color:#7e888b;background:0 0}.book .book-header .btn{display:block;height:50px;padding:0 15px;border-bottom:none;color:#ccc;text-transform:uppercase;line-height:50px;-webkit-box-shadow:none!important;box-shadow:none!important;position:relative;font-size:14px}.book .book-header .btn:hover{position:relative;text-decoration:none;color:#444;background:0 0}.book .book-header h1{margin:0;font-size:20px;font-weight:200;text-align:center;line-height:50px;opacity:0;padding-left:200px;padding-right:200px;-webkit-transition:opacity .2s ease;-moz-transition:opacity .2s ease;-o-transition:opacity .2s ease;transition:opacity .2s ease;overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.book .book-header h1 a,.book .book-header h1 a:hover{color:inherit;text-decoration:none}@media (max-width:1000px){.book .book-header h1{display:none}}.book .book-header h1 i{display:none}.book .book-header:hover h1{opacity:1}.book.is-loading .book-header h1 i{display:inline-block}.book.is-loading .book-header h1 a{display:none}.dropdown{position:relative}.dropdown-menu{position:absolute;top:100%;left:0;z-index:100;display:none;float:left;min-width:160px;padding:0;margin:2px 0 0;list-style:none;font-size:14px;background-color:#fafafa;border:1px solid rgba(0,0,0,.07);border-radius:1px;-webkit-box-shadow:0 6px 12px rgba(0,0,0,.175);box-shadow:0 6px 12px 
rgba(0,0,0,.175);background-clip:padding-box}.dropdown-menu.open{display:block}.dropdown-menu.dropdown-left{left:auto;right:4%}.dropdown-menu.dropdown-left .dropdown-caret{right:14px;left:auto}.dropdown-menu .dropdown-caret{position:absolute;top:-8px;left:14px;width:18px;height:10px;float:left;overflow:hidden}.dropdown-menu .dropdown-caret .caret-inner,.dropdown-menu .dropdown-caret .caret-outer{display:inline-block;top:0;border-left:9px solid transparent;border-right:9px solid transparent;position:absolute}.dropdown-menu .dropdown-caret .caret-outer{border-bottom:9px solid rgba(0,0,0,.1);height:auto;left:0;width:auto;margin-left:-1px}.dropdown-menu .dropdown-caret .caret-inner{margin-top:-1px;top:1px;border-bottom:9px solid #fafafa}.dropdown-menu .buttons{border-bottom:1px solid rgba(0,0,0,.07)}.dropdown-menu .buttons:after,.dropdown-menu .buttons:before{content:" ";display:table;line-height:0}.dropdown-menu .buttons:last-child{border-bottom:none}.dropdown-menu .buttons .button{border:0;background-color:transparent;color:#a6a6a6;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.alert,.dropdown-menu .buttons .button:hover{color:#444}.dropdown-menu .buttons .button:focus,.dropdown-menu .buttons .button:hover{outline:0}.dropdown-menu .buttons .button.size-2{width:50%}.dropdown-menu .buttons .button.size-3{width:33%}.alert{padding:15px;margin-bottom:20px;background:#eee;border-bottom:5px solid #ddd}.alert-success{background:#dff0d8;border-color:#d6e9c6;color:#3c763d}.alert-info{background:#d9edf7;border-color:#bce8f1;color:#31708f}.alert-danger{background:#f2dede;border-color:#ebccd1;color:#a94442}.alert-warning{background:#fcf8e3;border-color:#faebcc;color:#8a6d3b}.book .book-summary{position:absolute;top:0;left:-300px;bottom:0;z-index:1;width:300px;color:#364149;background:#fafafa;border-right:1px solid rgba(0,0,0,.07);-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms 
ease}.book .book-summary ul.summary{position:absolute;top:0;left:0;right:0;bottom:0;overflow-y:auto;list-style:none;margin:0;padding:0;-webkit-transition:top .5s ease;-moz-transition:top .5s ease;-o-transition:top .5s ease;transition:top .5s ease}.book .book-summary ul.summary li{list-style:none}.book .book-summary ul.summary li.divider{height:1px;margin:7px 0;overflow:hidden;background:rgba(0,0,0,.07)}.book .book-summary ul.summary li i.fa-check{display:none;position:absolute;right:9px;top:16px;font-size:9px;color:#3c3}.book .book-summary ul.summary li.done>a{color:#364149;font-weight:400}.book .book-summary ul.summary li.done>a i{display:inline}.book .book-summary ul.summary li a,.book .book-summary ul.summary li span{display:block;padding:10px 15px;border-bottom:none;color:#364149;background:0 0;text-overflow:ellipsis;overflow:hidden;white-space:nowrap;position:relative}.book .book-summary ul.summary li span{cursor:not-allowed;opacity:.3;filter:alpha(opacity=30)}.book .book-summary ul.summary li a:hover,.book .book-summary ul.summary li.active>a{color:#008cff;background:0 0;text-decoration:none}.book .book-summary ul.summary li ul{padding-left:20px}@media (max-width:600px){.book .book-summary{width:calc(100% - 60px);bottom:0;left:-100%}}.book.with-summary .book-summary{left:0}.book.without-animation .book-summary{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.book{position:relative;width:100%;height:100%}.book .book-body,.book .book-body .body-inner{position:absolute;top:0;left:0;overflow-y:auto;bottom:0;right:0}.book .book-body{color:#000;background:#fff;-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-body .page-wrapper{position:relative;outline:0}.book .book-body .page-wrapper .page-inner{max-width:800px;margin:0 auto;padding:20px 0 40px}.book .book-body .page-wrapper .page-inner 
section{margin:0;padding:5px 15px;background:#fff;border-radius:2px;line-height:1.7;font-size:1.6rem}.book .book-body .page-wrapper .page-inner .btn-group .btn{border-radius:0;background:#eee;border:0}@media (max-width:1240px){.book .book-body{-webkit-transition:-webkit-transform 250ms ease;-moz-transition:-moz-transform 250ms ease;-o-transition:-o-transform 250ms ease;transition:transform 250ms ease;padding-bottom:20px}.book .book-body .body-inner{position:static;min-height:calc(100% - 50px)}}@media (min-width:600px){.book.with-summary .book-body{left:300px}}@media (max-width:600px){.book.with-summary{overflow:hidden}.book.with-summary .book-body{-webkit-transform:translate(calc(100% - 60px),0);-moz-transform:translate(calc(100% - 60px),0);-ms-transform:translate(calc(100% - 60px),0);-o-transform:translate(calc(100% - 60px),0);transform:translate(calc(100% - 60px),0)}}.book.without-animation .book-body{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.buttons:after,.buttons:before{content:" ";display:table;line-height:0}.button{border:0;background:#eee;color:#666;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.button:hover{color:#444}.button:focus,.button:hover{outline:0}.button.size-2{width:50%}.button.size-3{width:33%}.book .book-body .page-wrapper .page-inner section{display:none}.book .book-body .page-wrapper .page-inner section.normal{display:block;word-wrap:break-word;overflow:hidden;color:#333;line-height:1.7;text-size-adjust:100%;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;-moz-text-size-adjust:100%}.book .book-body .page-wrapper .page-inner section.normal *{box-sizing:border-box;-webkit-box-sizing:border-box;}.book .book-body .page-wrapper .page-inner section.normal>:first-child{margin-top:0!important}.book .book-body .page-wrapper .page-inner section.normal>:last-child{margin-bottom:0!important}.book .book-body .page-wrapper .page-inner 
section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal figure,.book .book-body .page-wrapper .page-inner section.normal img,.book .book-body .page-wrapper .page-inner section.normal pre,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal tr{page-break-inside:avoid}.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal p{orphans:3;widows:3}.book .book-body .page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5{page-break-after:avoid}.book .book-body .page-wrapper .page-inner section.normal b,.book .book-body .page-wrapper .page-inner section.normal strong{font-weight:700}.book .book-body .page-wrapper .page-inner section.normal em{font-style:italic}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal dl,.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal p,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal ul{margin-top:0;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal a{color:#4183c4;text-decoration:none;background:0 0}.book .book-body .page-wrapper .page-inner section.normal a:active,.book .book-body .page-wrapper .page-inner section.normal a:focus,.book .book-body .page-wrapper .page-inner 
section.normal a:hover{outline:0;text-decoration:underline}.book .book-body .page-wrapper .page-inner section.normal img{border:0;max-width:100%}.book .book-body .page-wrapper .page-inner section.normal hr{height:4px;padding:0;margin:1.7em 0;overflow:hidden;background-color:#e7e7e7;border:none}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book .book-body .page-wrapper .page-inner section.normal hr:before{display:table;content:" "}.book .book-body .page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal h6{margin-top:1.275em;margin-bottom:.85em;}.book .book-body .page-wrapper .page-inner section.normal h1{font-size:2em}.book .book-body .page-wrapper .page-inner section.normal h2{font-size:1.75em}.book .book-body .page-wrapper .page-inner section.normal h3{font-size:1.5em}.book .book-body .page-wrapper .page-inner section.normal h4{font-size:1.25em}.book .book-body .page-wrapper .page-inner section.normal h5{font-size:1em}.book .book-body .page-wrapper .page-inner section.normal h6{font-size:1em;color:#777}.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal pre{font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;direction:ltr;border:none;color:inherit}.book .book-body .page-wrapper .page-inner section.normal pre{overflow:auto;word-wrap:normal;margin:0 0 1.275em;padding:.85em 1em;background:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal pre>code{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;font-size:.85em;white-space:pre;background:0 0}.book .book-body .page-wrapper .page-inner section.normal pre>code:after,.book 
.book-body .page-wrapper .page-inner section.normal pre>code:before{content:normal}.book .book-body .page-wrapper .page-inner section.normal code{padding:.2em;margin:0;font-size:.85em;background-color:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal code:after,.book .book-body .page-wrapper .page-inner section.normal code:before{letter-spacing:-.2em;content:"\00a0"}.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal ul{padding:0 0 0 2em;margin:0 0 .85em}.book .book-body .page-wrapper .page-inner section.normal ol ol,.book .book-body .page-wrapper .page-inner section.normal ol ul,.book .book-body .page-wrapper .page-inner section.normal ul ol,.book .book-body .page-wrapper .page-inner section.normal ul ul{margin-top:0;margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal ol ol{list-style-type:lower-roman}.book .book-body .page-wrapper .page-inner section.normal blockquote{margin:0 0 .85em;padding:0 15px;opacity:0.75;border-left:4px solid #dcdcdc}.book .book-body .page-wrapper .page-inner section.normal blockquote:first-child{margin-top:0}.book .book-body .page-wrapper .page-inner section.normal blockquote:last-child{margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal dl{padding:0}.book .book-body .page-wrapper .page-inner section.normal dl dt{padding:0;margin-top:.85em;font-style:italic;font-weight:700}.book .book-body .page-wrapper .page-inner section.normal dl dd{padding:0 .85em;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal dd{margin-left:0}.book .book-body .page-wrapper .page-inner section.normal .glossary-term{cursor:help;text-decoration:underline}.book .book-body .navigation{position:absolute;top:50px;bottom:0;margin:0;max-width:150px;min-width:90px;display:flex;justify-content:center;align-content:center;flex-direction:column;font-size:40px;color:#ccc;text-align:center;-webkit-transition:all 350ms 
ease;-moz-transition:all 350ms ease;-o-transition:all 350ms ease;transition:all 350ms ease}.book .book-body .navigation:hover{text-decoration:none;color:#444}.book .book-body .navigation.navigation-next{right:0}.book .book-body .navigation.navigation-prev{left:0}@media (max-width:1240px){.book .book-body .navigation{position:static;top:auto;max-width:50%;width:50%;display:inline-block;float:left}.book .book-body .navigation.navigation-unique{max-width:100%;width:100%}}.book .book-body .page-wrapper .page-inner section.glossary{margin-bottom:40px}.book .book-body .page-wrapper .page-inner section.glossary h2 a,.book .book-body .page-wrapper .page-inner section.glossary h2 a:hover{color:inherit;text-decoration:none}.book .book-body .page-wrapper .page-inner section.glossary .glossary-index{list-style:none;margin:0;padding:0}.book .book-body .page-wrapper .page-inner section.glossary .glossary-index li{display:inline;margin:0 8px;white-space:nowrap}*{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;-webkit-overflow-scrolling:touch;-webkit-tap-highlight-color:transparent;-webkit-text-size-adjust:none;-webkit-touch-callout:none}a{text-decoration:none}body,html{height:100%}html{font-size:62.5%}body{text-rendering:optimizeLegibility;font-smoothing:antialiased;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:14px;letter-spacing:.2px;text-size-adjust:100%}
+ */@font-face{font-family:'FontAwesome';src:url('./fontawesome/fontawesome-webfont.ttf?v=4.7.0') format('truetype');font-weight:normal;font-style:normal}.fa{display:inline-block;font:normal normal normal 14px/1 FontAwesome;font-size:inherit;text-rendering:auto;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.fa-lg{font-size:1.33333333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571429em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14285714em;list-style-type:none}.fa-ul>li{position:relative}.fa-li{position:absolute;left:-2.14285714em;width:2.14285714em;top:.14285714em;text-align:center}.fa-li.fa-lg{left:-1.85714286em}.fa-border{padding:.2em .25em .15em;border:solid .08em #eee;border-radius:.1em}.fa-pull-left{float:left}.fa-pull-right{float:right}.fa.fa-pull-left{margin-right:.3em}.fa.fa-pull-right{margin-left:.3em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left{margin-right:.3em}.fa.pull-right{margin-left:.3em}.fa-spin{-webkit-animation:fa-spin 2s infinite linear;animation:fa-spin 2s infinite linear}.fa-pulse{-webkit-animation:fa-spin 1s infinite steps(8);animation:fa-spin 1s infinite steps(8)}@-webkit-keyframes fa-spin{0%{-webkit-transform:rotate(0deg);transform:rotate(0deg)}100%{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}@keyframes 
fa-spin{0%{-webkit-transform:rotate(0deg);transform:rotate(0deg)}100%{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=1)";-webkit-transform:rotate(90deg);-ms-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2)";-webkit-transform:rotate(180deg);-ms-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=3)";-webkit-transform:rotate(270deg);-ms-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1)";-webkit-transform:scale(-1, 1);-ms-transform:scale(-1, 1);transform:scale(-1, 1)}.fa-flip-vertical{-ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2, mirror=1)";-webkit-transform:scale(1, -1);-ms-transform:scale(1, -1);transform:scale(1, -1)}:root .fa-rotate-90,:root .fa-rotate-180,:root .fa-rotate-270,:root .fa-flip-horizontal,:root 
.fa-flip-vertical{filter:none}.fa-stack{position:relative;display:inline-block;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:"\f000"}.fa-music:before{content:"\f001"}.fa-search:before{content:"\f002"}.fa-envelope-o:before{content:"\f003"}.fa-heart:before{content:"\f004"}.fa-star:before{content:"\f005"}.fa-star-o:before{content:"\f006"}.fa-user:before{content:"\f007"}.fa-film:before{content:"\f008"}.fa-th-large:before{content:"\f009"}.fa-th:before{content:"\f00a"}.fa-th-list:before{content:"\f00b"}.fa-check:before{content:"\f00c"}.fa-remove:before,.fa-close:before,.fa-times:before{content:"\f00d"}.fa-search-plus:before{content:"\f00e"}.fa-search-minus:before{content:"\f010"}.fa-power-off:before{content:"\f011"}.fa-signal:before{content:"\f012"}.fa-gear:before,.fa-cog:before{content:"\f013"}.fa-trash-o:before{content:"\f014"}.fa-home:before{content:"\f015"}.fa-file-o:before{content:"\f016"}.fa-clock-o:before{content:"\f017"}.fa-road:before{content:"\f018"}.fa-download:before{content:"\f019"}.fa-arrow-circle-o-down:before{content:"\f01a"}.fa-arrow-circle-o-up:before{content:"\f01b"}.fa-inbox:before{content:"\f01c"}.fa-play-circle-o:before{content:"\f01d"}.fa-rotate-right:before,.fa-repeat:before{content:"\f01e"}.fa-refresh:before{content:"\f021"}.fa-list-alt:before{content:"\f022"}.fa-lock:before{content:"\f023"}.fa-flag:before{content:"\f024"}.fa-headphones:before{content:"\f025"}.fa-volume-off:before{content:"\f026"}.fa-volume-down:before{content:"\f027"}.fa-volume-up:before{content:"\f028"}.fa-qrcode:before{content:"\f029"}.fa-barcode:before{content:"\f02a"}.fa-tag:before{content:"\f02b"}.fa-tags:before{content:"\f02c"}.fa-book:before{content:"\f02d"}.fa-bookmark:before{content:"\f02e"}.fa-print:before{content:"\f02f"}.fa-camera:before{content:"\f030"}.fa-font:before{c
ontent:"\f031"}.fa-bold:before{content:"\f032"}.fa-italic:before{content:"\f033"}.fa-text-height:before{content:"\f034"}.fa-text-width:before{content:"\f035"}.fa-align-left:before{content:"\f036"}.fa-align-center:before{content:"\f037"}.fa-align-right:before{content:"\f038"}.fa-align-justify:before{content:"\f039"}.fa-list:before{content:"\f03a"}.fa-dedent:before,.fa-outdent:before{content:"\f03b"}.fa-indent:before{content:"\f03c"}.fa-video-camera:before{content:"\f03d"}.fa-photo:before,.fa-image:before,.fa-picture-o:before{content:"\f03e"}.fa-pencil:before{content:"\f040"}.fa-map-marker:before{content:"\f041"}.fa-adjust:before{content:"\f042"}.fa-tint:before{content:"\f043"}.fa-edit:before,.fa-pencil-square-o:before{content:"\f044"}.fa-share-square-o:before{content:"\f045"}.fa-check-square-o:before{content:"\f046"}.fa-arrows:before{content:"\f047"}.fa-step-backward:before{content:"\f048"}.fa-fast-backward:before{content:"\f049"}.fa-backward:before{content:"\f04a"}.fa-play:before{content:"\f04b"}.fa-pause:before{content:"\f04c"}.fa-stop:before{content:"\f04d"}.fa-forward:before{content:"\f04e"}.fa-fast-forward:before{content:"\f050"}.fa-step-forward:before{content:"\f051"}.fa-eject:before{content:"\f052"}.fa-chevron-left:before{content:"\f053"}.fa-chevron-right:before{content:"\f054"}.fa-plus-circle:before{content:"\f055"}.fa-minus-circle:before{content:"\f056"}.fa-times-circle:before{content:"\f057"}.fa-check-circle:before{content:"\f058"}.fa-question-circle:before{content:"\f059"}.fa-info-circle:before{content:"\f05a"}.fa-crosshairs:before{content:"\f05b"}.fa-times-circle-o:before{content:"\f05c"}.fa-check-circle-o:before{content:"\f05d"}.fa-ban:before{content:"\f05e"}.fa-arrow-left:before{content:"\f060"}.fa-arrow-right:before{content:"\f061"}.fa-arrow-up:before{content:"\f062"}.fa-arrow-down:before{content:"\f063"}.fa-mail-forward:before,.fa-share:before{content:"\f064"}.fa-expand:before{content:"\f065"}.fa-compress:before{content:"\f066"}.fa-plus:before{content
:"\f067"}.fa-minus:before{content:"\f068"}.fa-asterisk:before{content:"\f069"}.fa-exclamation-circle:before{content:"\f06a"}.fa-gift:before{content:"\f06b"}.fa-leaf:before{content:"\f06c"}.fa-fire:before{content:"\f06d"}.fa-eye:before{content:"\f06e"}.fa-eye-slash:before{content:"\f070"}.fa-warning:before,.fa-exclamation-triangle:before{content:"\f071"}.fa-plane:before{content:"\f072"}.fa-calendar:before{content:"\f073"}.fa-random:before{content:"\f074"}.fa-comment:before{content:"\f075"}.fa-magnet:before{content:"\f076"}.fa-chevron-up:before{content:"\f077"}.fa-chevron-down:before{content:"\f078"}.fa-retweet:before{content:"\f079"}.fa-shopping-cart:before{content:"\f07a"}.fa-folder:before{content:"\f07b"}.fa-folder-open:before{content:"\f07c"}.fa-arrows-v:before{content:"\f07d"}.fa-arrows-h:before{content:"\f07e"}.fa-bar-chart-o:before,.fa-bar-chart:before{content:"\f080"}.fa-twitter-square:before{content:"\f081"}.fa-facebook-square:before{content:"\f082"}.fa-camera-retro:before{content:"\f083"}.fa-key:before{content:"\f084"}.fa-gears:before,.fa-cogs:before{content:"\f085"}.fa-comments:before{content:"\f086"}.fa-thumbs-o-up:before{content:"\f087"}.fa-thumbs-o-down:before{content:"\f088"}.fa-star-half:before{content:"\f089"}.fa-heart-o:before{content:"\f08a"}.fa-sign-out:before{content:"\f08b"}.fa-linkedin-square:before{content:"\f08c"}.fa-thumb-tack:before{content:"\f08d"}.fa-external-link:before{content:"\f08e"}.fa-sign-in:before{content:"\f090"}.fa-trophy:before{content:"\f091"}.fa-github-square:before{content:"\f092"}.fa-upload:before{content:"\f093"}.fa-lemon-o:before{content:"\f094"}.fa-phone:before{content:"\f095"}.fa-square-o:before{content:"\f096"}.fa-bookmark-o:before{content:"\f097"}.fa-phone-square:before{content:"\f098"}.fa-twitter:before{content:"\f099"}.fa-facebook-f:before,.fa-facebook:before{content:"\f09a"}.fa-github:before{content:"\f09b"}.fa-unlock:before{content:"\f09c"}.fa-credit-card:before{content:"\f09d"}.fa-feed:before,.fa-rss:before{conten
t:"\f09e"}.fa-hdd-o:before{content:"\f0a0"}.fa-bullhorn:before{content:"\f0a1"}.fa-bell:before{content:"\f0f3"}.fa-certificate:before{content:"\f0a3"}.fa-hand-o-right:before{content:"\f0a4"}.fa-hand-o-left:before{content:"\f0a5"}.fa-hand-o-up:before{content:"\f0a6"}.fa-hand-o-down:before{content:"\f0a7"}.fa-arrow-circle-left:before{content:"\f0a8"}.fa-arrow-circle-right:before{content:"\f0a9"}.fa-arrow-circle-up:before{content:"\f0aa"}.fa-arrow-circle-down:before{content:"\f0ab"}.fa-globe:before{content:"\f0ac"}.fa-wrench:before{content:"\f0ad"}.fa-tasks:before{content:"\f0ae"}.fa-filter:before{content:"\f0b0"}.fa-briefcase:before{content:"\f0b1"}.fa-arrows-alt:before{content:"\f0b2"}.fa-group:before,.fa-users:before{content:"\f0c0"}.fa-chain:before,.fa-link:before{content:"\f0c1"}.fa-cloud:before{content:"\f0c2"}.fa-flask:before{content:"\f0c3"}.fa-cut:before,.fa-scissors:before{content:"\f0c4"}.fa-copy:before,.fa-files-o:before{content:"\f0c5"}.fa-paperclip:before{content:"\f0c6"}.fa-save:before,.fa-floppy-o:before{content:"\f0c7"}.fa-square:before{content:"\f0c8"}.fa-navicon:before,.fa-reorder:before,.fa-bars:before{content:"\f0c9"}.fa-list-ul:before{content:"\f0ca"}.fa-list-ol:before{content:"\f0cb"}.fa-strikethrough:before{content:"\f0cc"}.fa-underline:before{content:"\f0cd"}.fa-table:before{content:"\f0ce"}.fa-magic:before{content:"\f0d0"}.fa-truck:before{content:"\f0d1"}.fa-pinterest:before{content:"\f0d2"}.fa-pinterest-square:before{content:"\f0d3"}.fa-google-plus-square:before{content:"\f0d4"}.fa-google-plus:before{content:"\f0d5"}.fa-money:before{content:"\f0d6"}.fa-caret-down:before{content:"\f0d7"}.fa-caret-up:before{content:"\f0d8"}.fa-caret-left:before{content:"\f0d9"}.fa-caret-right:before{content:"\f0da"}.fa-columns:before{content:"\f0db"}.fa-unsorted:before,.fa-sort:before{content:"\f0dc"}.fa-sort-down:before,.fa-sort-desc:before{content:"\f0dd"}.fa-sort-up:before,.fa-sort-asc:before{content:"\f0de"}.fa-envelope:before{content:"\f0e0"}.fa-linkedin:b
efore{content:"\f0e1"}.fa-rotate-left:before,.fa-undo:before{content:"\f0e2"}.fa-legal:before,.fa-gavel:before{content:"\f0e3"}.fa-dashboard:before,.fa-tachometer:before{content:"\f0e4"}.fa-comment-o:before{content:"\f0e5"}.fa-comments-o:before{content:"\f0e6"}.fa-flash:before,.fa-bolt:before{content:"\f0e7"}.fa-sitemap:before{content:"\f0e8"}.fa-umbrella:before{content:"\f0e9"}.fa-paste:before,.fa-clipboard:before{content:"\f0ea"}.fa-lightbulb-o:before{content:"\f0eb"}.fa-exchange:before{content:"\f0ec"}.fa-cloud-download:before{content:"\f0ed"}.fa-cloud-upload:before{content:"\f0ee"}.fa-user-md:before{content:"\f0f0"}.fa-stethoscope:before{content:"\f0f1"}.fa-suitcase:before{content:"\f0f2"}.fa-bell-o:before{content:"\f0a2"}.fa-coffee:before{content:"\f0f4"}.fa-cutlery:before{content:"\f0f5"}.fa-file-text-o:before{content:"\f0f6"}.fa-building-o:before{content:"\f0f7"}.fa-hospital-o:before{content:"\f0f8"}.fa-ambulance:before{content:"\f0f9"}.fa-medkit:before{content:"\f0fa"}.fa-fighter-jet:before{content:"\f0fb"}.fa-beer:before{content:"\f0fc"}.fa-h-square:before{content:"\f0fd"}.fa-plus-square:before{content:"\f0fe"}.fa-angle-double-left:before{content:"\f100"}.fa-angle-double-right:before{content:"\f101"}.fa-angle-double-up:before{content:"\f102"}.fa-angle-double-down:before{content:"\f103"}.fa-angle-left:before{content:"\f104"}.fa-angle-right:before{content:"\f105"}.fa-angle-up:before{content:"\f106"}.fa-angle-down:before{content:"\f107"}.fa-desktop:before{content:"\f108"}.fa-laptop:before{content:"\f109"}.fa-tablet:before{content:"\f10a"}.fa-mobile-phone:before,.fa-mobile:before{content:"\f10b"}.fa-circle-o:before{content:"\f10c"}.fa-quote-left:before{content:"\f10d"}.fa-quote-right:before{content:"\f10e"}.fa-spinner:before{content:"\f110"}.fa-circle:before{content:"\f111"}.fa-mail-reply:before,.fa-reply:before{content:"\f112"}.fa-github-alt:before{content:"\f113"}.fa-folder-o:before{content:"\f114"}.fa-folder-open-o:before{content:"\f115"}.fa-smile-o:before{c
ontent:"\f118"}.fa-frown-o:before{content:"\f119"}.fa-meh-o:before{content:"\f11a"}.fa-gamepad:before{content:"\f11b"}.fa-keyboard-o:before{content:"\f11c"}.fa-flag-o:before{content:"\f11d"}.fa-flag-checkered:before{content:"\f11e"}.fa-terminal:before{content:"\f120"}.fa-code:before{content:"\f121"}.fa-mail-reply-all:before,.fa-reply-all:before{content:"\f122"}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:"\f123"}.fa-location-arrow:before{content:"\f124"}.fa-crop:before{content:"\f125"}.fa-code-fork:before{content:"\f126"}.fa-unlink:before,.fa-chain-broken:before{content:"\f127"}.fa-question:before{content:"\f128"}.fa-info:before{content:"\f129"}.fa-exclamation:before{content:"\f12a"}.fa-superscript:before{content:"\f12b"}.fa-subscript:before{content:"\f12c"}.fa-eraser:before{content:"\f12d"}.fa-puzzle-piece:before{content:"\f12e"}.fa-microphone:before{content:"\f130"}.fa-microphone-slash:before{content:"\f131"}.fa-shield:before{content:"\f132"}.fa-calendar-o:before{content:"\f133"}.fa-fire-extinguisher:before{content:"\f134"}.fa-rocket:before{content:"\f135"}.fa-maxcdn:before{content:"\f136"}.fa-chevron-circle-left:before{content:"\f137"}.fa-chevron-circle-right:before{content:"\f138"}.fa-chevron-circle-up:before{content:"\f139"}.fa-chevron-circle-down:before{content:"\f13a"}.fa-html5:before{content:"\f13b"}.fa-css3:before{content:"\f13c"}.fa-anchor:before{content:"\f13d"}.fa-unlock-alt:before{content:"\f13e"}.fa-bullseye:before{content:"\f140"}.fa-ellipsis-h:before{content:"\f141"}.fa-ellipsis-v:before{content:"\f142"}.fa-rss-square:before{content:"\f143"}.fa-play-circle:before{content:"\f144"}.fa-ticket:before{content:"\f145"}.fa-minus-square:before{content:"\f146"}.fa-minus-square-o:before{content:"\f147"}.fa-level-up:before{content:"\f148"}.fa-level-down:before{content:"\f149"}.fa-check-square:before{content:"\f14a"}.fa-pencil-square:before{content:"\f14b"}.fa-external-link-square:before{content:"\f14c"}.fa-share-square:bef
ore{content:"\f14d"}.fa-compass:before{content:"\f14e"}.fa-toggle-down:before,.fa-caret-square-o-down:before{content:"\f150"}.fa-toggle-up:before,.fa-caret-square-o-up:before{content:"\f151"}.fa-toggle-right:before,.fa-caret-square-o-right:before{content:"\f152"}.fa-euro:before,.fa-eur:before{content:"\f153"}.fa-gbp:before{content:"\f154"}.fa-dollar:before,.fa-usd:before{content:"\f155"}.fa-rupee:before,.fa-inr:before{content:"\f156"}.fa-cny:before,.fa-rmb:before,.fa-yen:before,.fa-jpy:before{content:"\f157"}.fa-ruble:before,.fa-rouble:before,.fa-rub:before{content:"\f158"}.fa-won:before,.fa-krw:before{content:"\f159"}.fa-bitcoin:before,.fa-btc:before{content:"\f15a"}.fa-file:before{content:"\f15b"}.fa-file-text:before{content:"\f15c"}.fa-sort-alpha-asc:before{content:"\f15d"}.fa-sort-alpha-desc:before{content:"\f15e"}.fa-sort-amount-asc:before{content:"\f160"}.fa-sort-amount-desc:before{content:"\f161"}.fa-sort-numeric-asc:before{content:"\f162"}.fa-sort-numeric-desc:before{content:"\f163"}.fa-thumbs-up:before{content:"\f164"}.fa-thumbs-down:before{content:"\f165"}.fa-youtube-square:before{content:"\f166"}.fa-youtube:before{content:"\f167"}.fa-xing:before{content:"\f168"}.fa-xing-square:before{content:"\f169"}.fa-youtube-play:before{content:"\f16a"}.fa-dropbox:before{content:"\f16b"}.fa-stack-overflow:before{content:"\f16c"}.fa-instagram:before{content:"\f16d"}.fa-flickr:before{content:"\f16e"}.fa-adn:before{content:"\f170"}.fa-bitbucket:before{content:"\f171"}.fa-bitbucket-square:before{content:"\f172"}.fa-tumblr:before{content:"\f173"}.fa-tumblr-square:before{content:"\f174"}.fa-long-arrow-down:before{content:"\f175"}.fa-long-arrow-up:before{content:"\f176"}.fa-long-arrow-left:before{content:"\f177"}.fa-long-arrow-right:before{content:"\f178"}.fa-apple:before{content:"\f179"}.fa-windows:before{content:"\f17a"}.fa-android:before{content:"\f17b"}.fa-linux:before{content:"\f17c"}.fa-dribbble:before{content:"\f17d"}.fa-skype:before{content:"\f17e"}.fa-foursquare:befo
re{content:"\f180"}.fa-trello:before{content:"\f181"}.fa-female:before{content:"\f182"}.fa-male:before{content:"\f183"}.fa-gittip:before,.fa-gratipay:before{content:"\f184"}.fa-sun-o:before{content:"\f185"}.fa-moon-o:before{content:"\f186"}.fa-archive:before{content:"\f187"}.fa-bug:before{content:"\f188"}.fa-vk:before{content:"\f189"}.fa-weibo:before{content:"\f18a"}.fa-renren:before{content:"\f18b"}.fa-pagelines:before{content:"\f18c"}.fa-stack-exchange:before{content:"\f18d"}.fa-arrow-circle-o-right:before{content:"\f18e"}.fa-arrow-circle-o-left:before{content:"\f190"}.fa-toggle-left:before,.fa-caret-square-o-left:before{content:"\f191"}.fa-dot-circle-o:before{content:"\f192"}.fa-wheelchair:before{content:"\f193"}.fa-vimeo-square:before{content:"\f194"}.fa-turkish-lira:before,.fa-try:before{content:"\f195"}.fa-plus-square-o:before{content:"\f196"}.fa-space-shuttle:before{content:"\f197"}.fa-slack:before{content:"\f198"}.fa-envelope-square:before{content:"\f199"}.fa-wordpress:before{content:"\f19a"}.fa-openid:before{content:"\f19b"}.fa-institution:before,.fa-bank:before,.fa-university:before{content:"\f19c"}.fa-mortar-board:before,.fa-graduation-cap:before{content:"\f19d"}.fa-yahoo:before{content:"\f19e"}.fa-google:before{content:"\f1a0"}.fa-reddit:before{content:"\f1a1"}.fa-reddit-square:before{content:"\f1a2"}.fa-stumbleupon-circle:before{content:"\f1a3"}.fa-stumbleupon:before{content:"\f1a4"}.fa-delicious:before{content:"\f1a5"}.fa-digg:before{content:"\f1a6"}.fa-pied-piper-pp:before{content:"\f1a7"}.fa-pied-piper-alt:before{content:"\f1a8"}.fa-drupal:before{content:"\f1a9"}.fa-joomla:before{content:"\f1aa"}.fa-language:before{content:"\f1ab"}.fa-fax:before{content:"\f1ac"}.fa-building:before{content:"\f1ad"}.fa-child:before{content:"\f1ae"}.fa-paw:before{content:"\f1b0"}.fa-spoon:before{content:"\f1b1"}.fa-cube:before{content:"\f1b2"}.fa-cubes:before{content:"\f1b3"}.fa-behance:before{content:"\f1b4"}.fa-behance-square:before{content:"\f1b5"}.fa-steam:before{co
ntent:"\f1b6"}.fa-steam-square:before{content:"\f1b7"}.fa-recycle:before{content:"\f1b8"}.fa-automobile:before,.fa-car:before{content:"\f1b9"}.fa-cab:before,.fa-taxi:before{content:"\f1ba"}.fa-tree:before{content:"\f1bb"}.fa-spotify:before{content:"\f1bc"}.fa-deviantart:before{content:"\f1bd"}.fa-soundcloud:before{content:"\f1be"}.fa-database:before{content:"\f1c0"}.fa-file-pdf-o:before{content:"\f1c1"}.fa-file-word-o:before{content:"\f1c2"}.fa-file-excel-o:before{content:"\f1c3"}.fa-file-powerpoint-o:before{content:"\f1c4"}.fa-file-photo-o:before,.fa-file-picture-o:before,.fa-file-image-o:before{content:"\f1c5"}.fa-file-zip-o:before,.fa-file-archive-o:before{content:"\f1c6"}.fa-file-sound-o:before,.fa-file-audio-o:before{content:"\f1c7"}.fa-file-movie-o:before,.fa-file-video-o:before{content:"\f1c8"}.fa-file-code-o:before{content:"\f1c9"}.fa-vine:before{content:"\f1ca"}.fa-codepen:before{content:"\f1cb"}.fa-jsfiddle:before{content:"\f1cc"}.fa-life-bouy:before,.fa-life-buoy:before,.fa-life-saver:before,.fa-support:before,.fa-life-ring:before{content:"\f1cd"}.fa-circle-o-notch:before{content:"\f1ce"}.fa-ra:before,.fa-resistance:before,.fa-rebel:before{content:"\f1d0"}.fa-ge:before,.fa-empire:before{content:"\f1d1"}.fa-git-square:before{content:"\f1d2"}.fa-git:before{content:"\f1d3"}.fa-y-combinator-square:before,.fa-yc-square:before,.fa-hacker-news:before{content:"\f1d4"}.fa-tencent-weibo:before{content:"\f1d5"}.fa-qq:before{content:"\f1d6"}.fa-wechat:before,.fa-weixin:before{content:"\f1d7"}.fa-send:before,.fa-paper-plane:before{content:"\f1d8"}.fa-send-o:before,.fa-paper-plane-o:before{content:"\f1d9"}.fa-history:before{content:"\f1da"}.fa-circle-thin:before{content:"\f1db"}.fa-header:before{content:"\f1dc"}.fa-paragraph:before{content:"\f1dd"}.fa-sliders:before{content:"\f1de"}.fa-share-alt:before{content:"\f1e0"}.fa-share-alt-square:before{content:"\f1e1"}.fa-bomb:before{content:"\f1e2"}.fa-soccer-ball-o:before,.fa-futbol-o:before{content:"\f1e3"}.fa-tty:before{c
ontent:"\f1e4"}.fa-binoculars:before{content:"\f1e5"}.fa-plug:before{content:"\f1e6"}.fa-slideshare:before{content:"\f1e7"}.fa-twitch:before{content:"\f1e8"}.fa-yelp:before{content:"\f1e9"}.fa-newspaper-o:before{content:"\f1ea"}.fa-wifi:before{content:"\f1eb"}.fa-calculator:before{content:"\f1ec"}.fa-paypal:before{content:"\f1ed"}.fa-google-wallet:before{content:"\f1ee"}.fa-cc-visa:before{content:"\f1f0"}.fa-cc-mastercard:before{content:"\f1f1"}.fa-cc-discover:before{content:"\f1f2"}.fa-cc-amex:before{content:"\f1f3"}.fa-cc-paypal:before{content:"\f1f4"}.fa-cc-stripe:before{content:"\f1f5"}.fa-bell-slash:before{content:"\f1f6"}.fa-bell-slash-o:before{content:"\f1f7"}.fa-trash:before{content:"\f1f8"}.fa-copyright:before{content:"\f1f9"}.fa-at:before{content:"\f1fa"}.fa-eyedropper:before{content:"\f1fb"}.fa-paint-brush:before{content:"\f1fc"}.fa-birthday-cake:before{content:"\f1fd"}.fa-area-chart:before{content:"\f1fe"}.fa-pie-chart:before{content:"\f200"}.fa-line-chart:before{content:"\f201"}.fa-lastfm:before{content:"\f202"}.fa-lastfm-square:before{content:"\f203"}.fa-toggle-off:before{content:"\f204"}.fa-toggle-on:before{content:"\f205"}.fa-bicycle:before{content:"\f206"}.fa-bus:before{content:"\f207"}.fa-ioxhost:before{content:"\f208"}.fa-angellist:before{content:"\f209"}.fa-cc:before{content:"\f20a"}.fa-shekel:before,.fa-sheqel:before,.fa-ils:before{content:"\f20b"}.fa-meanpath:before{content:"\f20c"}.fa-buysellads:before{content:"\f20d"}.fa-connectdevelop:before{content:"\f20e"}.fa-dashcube:before{content:"\f210"}.fa-forumbee:before{content:"\f211"}.fa-leanpub:before{content:"\f212"}.fa-sellsy:before{content:"\f213"}.fa-shirtsinbulk:before{content:"\f214"}.fa-simplybuilt:before{content:"\f215"}.fa-skyatlas:before{content:"\f216"}.fa-cart-plus:before{content:"\f217"}.fa-cart-arrow-down:before{content:"\f218"}.fa-diamond:before{content:"\f219"}.fa-ship:before{content:"\f21a"}.fa-user-secret:before{content:"\f21b"}.fa-motorcycle:before{content:"\f21c"}.fa-street-vi
ew:before{content:"\f21d"}.fa-heartbeat:before{content:"\f21e"}.fa-venus:before{content:"\f221"}.fa-mars:before{content:"\f222"}.fa-mercury:before{content:"\f223"}.fa-intersex:before,.fa-transgender:before{content:"\f224"}.fa-transgender-alt:before{content:"\f225"}.fa-venus-double:before{content:"\f226"}.fa-mars-double:before{content:"\f227"}.fa-venus-mars:before{content:"\f228"}.fa-mars-stroke:before{content:"\f229"}.fa-mars-stroke-v:before{content:"\f22a"}.fa-mars-stroke-h:before{content:"\f22b"}.fa-neuter:before{content:"\f22c"}.fa-genderless:before{content:"\f22d"}.fa-facebook-official:before{content:"\f230"}.fa-pinterest-p:before{content:"\f231"}.fa-whatsapp:before{content:"\f232"}.fa-server:before{content:"\f233"}.fa-user-plus:before{content:"\f234"}.fa-user-times:before{content:"\f235"}.fa-hotel:before,.fa-bed:before{content:"\f236"}.fa-viacoin:before{content:"\f237"}.fa-train:before{content:"\f238"}.fa-subway:before{content:"\f239"}.fa-medium:before{content:"\f23a"}.fa-yc:before,.fa-y-combinator:before{content:"\f23b"}.fa-optin-monster:before{content:"\f23c"}.fa-opencart:before{content:"\f23d"}.fa-expeditedssl:before{content:"\f23e"}.fa-battery-4:before,.fa-battery:before,.fa-battery-full:before{content:"\f240"}.fa-battery-3:before,.fa-battery-three-quarters:before{content:"\f241"}.fa-battery-2:before,.fa-battery-half:before{content:"\f242"}.fa-battery-1:before,.fa-battery-quarter:before{content:"\f243"}.fa-battery-0:before,.fa-battery-empty:before{content:"\f244"}.fa-mouse-pointer:before{content:"\f245"}.fa-i-cursor:before{content:"\f246"}.fa-object-group:before{content:"\f247"}.fa-object-ungroup:before{content:"\f248"}.fa-sticky-note:before{content:"\f249"}.fa-sticky-note-o:before{content:"\f24a"}.fa-cc-jcb:before{content:"\f24b"}.fa-cc-diners-club:before{content:"\f24c"}.fa-clone:before{content:"\f24d"}.fa-balance-scale:before{content:"\f24e"}.fa-hourglass-o:before{content:"\f250"}.fa-hourglass-1:before,.fa-hourglass-start:before{content:"\f251"}.fa-hourg
lass-2:before,.fa-hourglass-half:before{content:"\f252"}.fa-hourglass-3:before,.fa-hourglass-end:before{content:"\f253"}.fa-hourglass:before{content:"\f254"}.fa-hand-grab-o:before,.fa-hand-rock-o:before{content:"\f255"}.fa-hand-stop-o:before,.fa-hand-paper-o:before{content:"\f256"}.fa-hand-scissors-o:before{content:"\f257"}.fa-hand-lizard-o:before{content:"\f258"}.fa-hand-spock-o:before{content:"\f259"}.fa-hand-pointer-o:before{content:"\f25a"}.fa-hand-peace-o:before{content:"\f25b"}.fa-trademark:before{content:"\f25c"}.fa-registered:before{content:"\f25d"}.fa-creative-commons:before{content:"\f25e"}.fa-gg:before{content:"\f260"}.fa-gg-circle:before{content:"\f261"}.fa-tripadvisor:before{content:"\f262"}.fa-odnoklassniki:before{content:"\f263"}.fa-odnoklassniki-square:before{content:"\f264"}.fa-get-pocket:before{content:"\f265"}.fa-wikipedia-w:before{content:"\f266"}.fa-safari:before{content:"\f267"}.fa-chrome:before{content:"\f268"}.fa-firefox:before{content:"\f269"}.fa-opera:before{content:"\f26a"}.fa-internet-explorer:before{content:"\f26b"}.fa-tv:before,.fa-television:before{content:"\f26c"}.fa-contao:before{content:"\f26d"}.fa-500px:before{content:"\f26e"}.fa-amazon:before{content:"\f270"}.fa-calendar-plus-o:before{content:"\f271"}.fa-calendar-minus-o:before{content:"\f272"}.fa-calendar-times-o:before{content:"\f273"}.fa-calendar-check-o:before{content:"\f274"}.fa-industry:before{content:"\f275"}.fa-map-pin:before{content:"\f276"}.fa-map-signs:before{content:"\f277"}.fa-map-o:before{content:"\f278"}.fa-map:before{content:"\f279"}.fa-commenting:before{content:"\f27a"}.fa-commenting-o:before{content:"\f27b"}.fa-houzz:before{content:"\f27c"}.fa-vimeo:before{content:"\f27d"}.fa-black-tie:before{content:"\f27e"}.fa-fonticons:before{content:"\f280"}.fa-reddit-alien:before{content:"\f281"}.fa-edge:before{content:"\f282"}.fa-credit-card-alt:before{content:"\f283"}.fa-codiepie:before{content:"\f284"}.fa-modx:before{content:"\f285"}.fa-fort-awesome:before{content:"\f286"
}.fa-usb:before{content:"\f287"}.fa-product-hunt:before{content:"\f288"}.fa-mixcloud:before{content:"\f289"}.fa-scribd:before{content:"\f28a"}.fa-pause-circle:before{content:"\f28b"}.fa-pause-circle-o:before{content:"\f28c"}.fa-stop-circle:before{content:"\f28d"}.fa-stop-circle-o:before{content:"\f28e"}.fa-shopping-bag:before{content:"\f290"}.fa-shopping-basket:before{content:"\f291"}.fa-hashtag:before{content:"\f292"}.fa-bluetooth:before{content:"\f293"}.fa-bluetooth-b:before{content:"\f294"}.fa-percent:before{content:"\f295"}.fa-gitlab:before{content:"\f296"}.fa-wpbeginner:before{content:"\f297"}.fa-wpforms:before{content:"\f298"}.fa-envira:before{content:"\f299"}.fa-universal-access:before{content:"\f29a"}.fa-wheelchair-alt:before{content:"\f29b"}.fa-question-circle-o:before{content:"\f29c"}.fa-blind:before{content:"\f29d"}.fa-audio-description:before{content:"\f29e"}.fa-volume-control-phone:before{content:"\f2a0"}.fa-braille:before{content:"\f2a1"}.fa-assistive-listening-systems:before{content:"\f2a2"}.fa-asl-interpreting:before,.fa-american-sign-language-interpreting:before{content:"\f2a3"}.fa-deafness:before,.fa-hard-of-hearing:before,.fa-deaf:before{content:"\f2a4"}.fa-glide:before{content:"\f2a5"}.fa-glide-g:before{content:"\f2a6"}.fa-signing:before,.fa-sign-language:before{content:"\f2a7"}.fa-low-vision:before{content:"\f2a8"}.fa-viadeo:before{content:"\f2a9"}.fa-viadeo-square:before{content:"\f2aa"}.fa-snapchat:before{content:"\f2ab"}.fa-snapchat-ghost:before{content:"\f2ac"}.fa-snapchat-square:before{content:"\f2ad"}.fa-pied-piper:before{content:"\f2ae"}.fa-first-order:before{content:"\f2b0"}.fa-yoast:before{content:"\f2b1"}.fa-themeisle:before{content:"\f2b2"}.fa-google-plus-circle:before,.fa-google-plus-official:before{content:"\f2b3"}.fa-fa:before,.fa-font-awesome:before{content:"\f2b4"}.fa-handshake-o:before{content:"\f2b5"}.fa-envelope-open:before{content:"\f2b6"}.fa-envelope-open-o:before{content:"\f2b7"}.fa-linode:before{content:"\f2b8"}.fa-address
-book:before{content:"\f2b9"}.fa-address-book-o:before{content:"\f2ba"}.fa-vcard:before,.fa-address-card:before{content:"\f2bb"}.fa-vcard-o:before,.fa-address-card-o:before{content:"\f2bc"}.fa-user-circle:before{content:"\f2bd"}.fa-user-circle-o:before{content:"\f2be"}.fa-user-o:before{content:"\f2c0"}.fa-id-badge:before{content:"\f2c1"}.fa-drivers-license:before,.fa-id-card:before{content:"\f2c2"}.fa-drivers-license-o:before,.fa-id-card-o:before{content:"\f2c3"}.fa-quora:before{content:"\f2c4"}.fa-free-code-camp:before{content:"\f2c5"}.fa-telegram:before{content:"\f2c6"}.fa-thermometer-4:before,.fa-thermometer:before,.fa-thermometer-full:before{content:"\f2c7"}.fa-thermometer-3:before,.fa-thermometer-three-quarters:before{content:"\f2c8"}.fa-thermometer-2:before,.fa-thermometer-half:before{content:"\f2c9"}.fa-thermometer-1:before,.fa-thermometer-quarter:before{content:"\f2ca"}.fa-thermometer-0:before,.fa-thermometer-empty:before{content:"\f2cb"}.fa-shower:before{content:"\f2cc"}.fa-bathtub:before,.fa-s15:before,.fa-bath:before{content:"\f2cd"}.fa-podcast:before{content:"\f2ce"}.fa-window-maximize:before{content:"\f2d0"}.fa-window-minimize:before{content:"\f2d1"}.fa-window-restore:before{content:"\f2d2"}.fa-times-rectangle:before,.fa-window-close:before{content:"\f2d3"}.fa-times-rectangle-o:before,.fa-window-close-o:before{content:"\f2d4"}.fa-bandcamp:before{content:"\f2d5"}.fa-grav:before{content:"\f2d6"}.fa-etsy:before{content:"\f2d7"}.fa-imdb:before{content:"\f2d8"}.fa-ravelry:before{content:"\f2d9"}.fa-eercast:before{content:"\f2da"}.fa-microchip:before{content:"\f2db"}.fa-snowflake-o:before{content:"\f2dc"}.fa-superpowers:before{content:"\f2dd"}.fa-wpexplorer:before{content:"\f2de"}.fa-meetup:before{content:"\f2e0"}.sr-only{position:absolute;width:1px;height:1px;padding:0;margin:-1px;overflow:hidden;clip:rect(0, 0, 0, 0);border:0}.sr-only-focusable:active,.sr-only-focusable:focus{position:static;width:auto;height:auto;margin:0;overflow:visible;clip:auto}
+.book .book-header,.book .book-summary{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif}.book-langs-index{width:100%;height:100%;padding:40px 0;margin:0;overflow:auto}@media (max-width:600px){.book-langs-index{padding:0}}.book-langs-index .inner{max-width:600px;width:100%;margin:0 auto;padding:30px;background:#fff;border-radius:3px}.book-langs-index .inner h3{margin:0}.book-langs-index .inner .languages{list-style:none;padding:20px 30px;margin-top:20px;border-top:1px solid #eee}.book-langs-index .inner .languages:after,.book-langs-index .inner .languages:before{content:" ";display:table;line-height:0}.book-langs-index .inner .languages li{width:50%;float:left;padding:10px 5px;font-size:16px}@media (max-width:600px){.book-langs-index .inner .languages li{width:100%;max-width:100%}}.book .book-header{overflow:visible;height:50px;padding:0 8px;z-index:2;font-size:.85em;color:#7e888b;background:0 0}.book .book-header .btn{display:block;height:50px;padding:0 15px;border-bottom:none;color:#ccc;text-transform:uppercase;line-height:50px;-webkit-box-shadow:none!important;box-shadow:none!important;position:relative;font-size:14px}.book .book-header .btn:hover{position:relative;text-decoration:none;color:#444;background:0 0}.book .book-header h1{margin:0;font-size:20px;font-weight:200;text-align:center;line-height:50px;opacity:0;padding-left:200px;padding-right:200px;-webkit-transition:opacity .2s ease;-moz-transition:opacity .2s ease;-o-transition:opacity .2s ease;transition:opacity .2s ease;overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.book .book-header h1 a,.book .book-header h1 a:hover{color:inherit;text-decoration:none}@media (max-width:1000px){.book .book-header h1{display:none}}.book .book-header h1 i{display:none}.book .book-header:hover h1{opacity:1}.book.is-loading .book-header h1 i{display:inline-block}.book.is-loading .book-header h1 
a{display:none}.dropdown{position:relative}.dropdown-menu{position:absolute;top:100%;left:0;z-index:100;display:none;float:left;min-width:160px;padding:0;margin:2px 0 0;list-style:none;font-size:14px;background-color:#fafafa;border:1px solid rgba(0,0,0,.07);border-radius:1px;-webkit-box-shadow:0 6px 12px rgba(0,0,0,.175);box-shadow:0 6px 12px rgba(0,0,0,.175);background-clip:padding-box}.dropdown-menu.open{display:block}.dropdown-menu.dropdown-left{left:auto;right:4%}.dropdown-menu.dropdown-left .dropdown-caret{right:14px;left:auto}.dropdown-menu .dropdown-caret{position:absolute;top:-8px;left:14px;width:18px;height:10px;float:left;overflow:hidden}.dropdown-menu .dropdown-caret .caret-inner,.dropdown-menu .dropdown-caret .caret-outer{display:inline-block;top:0;border-left:9px solid transparent;border-right:9px solid transparent;position:absolute}.dropdown-menu .dropdown-caret .caret-outer{border-bottom:9px solid rgba(0,0,0,.1);height:auto;left:0;width:auto;margin-left:-1px}.dropdown-menu .dropdown-caret .caret-inner{margin-top:-1px;top:1px;border-bottom:9px solid #fafafa}.dropdown-menu .buttons{border-bottom:1px solid rgba(0,0,0,.07)}.dropdown-menu .buttons:after,.dropdown-menu .buttons:before{content:" ";display:table;line-height:0}.dropdown-menu .buttons:last-child{border-bottom:none}.dropdown-menu .buttons .button{border:0;background-color:transparent;color:#a6a6a6;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.alert,.dropdown-menu .buttons .button:hover{color:#444}.dropdown-menu .buttons .button:focus,.dropdown-menu .buttons .button:hover{outline:0}.dropdown-menu .buttons .button.size-2{width:50%}.dropdown-menu .buttons .button.size-3{width:33%}.alert{padding:15px;margin-bottom:20px;background:#eee;border-bottom:5px solid 
#ddd}.alert-success{background:#dff0d8;border-color:#d6e9c6;color:#3c763d}.alert-info{background:#d9edf7;border-color:#bce8f1;color:#31708f}.alert-danger{background:#f2dede;border-color:#ebccd1;color:#a94442}.alert-warning{background:#fcf8e3;border-color:#faebcc;color:#8a6d3b}.book .book-summary{position:absolute;top:0;left:-300px;bottom:0;z-index:1;width:300px;color:#364149;background:#fafafa;border-right:1px solid rgba(0,0,0,.07);-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-summary ul.summary{position:absolute;top:0;left:0;right:0;bottom:0;overflow-y:auto;list-style:none;margin:0;padding:0;-webkit-transition:top .5s ease;-moz-transition:top .5s ease;-o-transition:top .5s ease;transition:top .5s ease}.book .book-summary ul.summary li{list-style:none}.book .book-summary ul.summary li.divider{height:1px;margin:7px 0;overflow:hidden;background:rgba(0,0,0,.07)}.book .book-summary ul.summary li i.fa-check{display:none;position:absolute;right:9px;top:16px;font-size:9px;color:#3c3}.book .book-summary ul.summary li.done>a{color:#364149;font-weight:400}.book .book-summary ul.summary li.done>a i{display:inline}.book .book-summary ul.summary li a,.book .book-summary ul.summary li span{display:block;padding:10px 15px;border-bottom:none;color:#364149;background:0 0;text-overflow:ellipsis;overflow:hidden;white-space:nowrap;position:relative}.book .book-summary ul.summary li span{cursor:not-allowed;opacity:.3;filter:alpha(opacity=30)}.book .book-summary ul.summary li a:hover,.book .book-summary ul.summary li.active>a{color:#008cff;background:0 0;text-decoration:none}.book .book-summary ul.summary li ul{padding-left:20px}@media (max-width:600px){.book .book-summary{width:calc(100% - 60px);bottom:0;left:-100%}}.book.with-summary .book-summary{left:0}.book.without-animation 
.book-summary{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.book{position:relative;width:100%;height:100%}.book .book-body,.book .book-body .body-inner{position:absolute;top:0;left:0;overflow-y:auto;bottom:0;right:0}.book .book-body{color:#000;background:#fff;-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-body .page-wrapper{position:relative;outline:0}.book .book-body .page-wrapper .page-inner{max-width:800px;margin:0 auto;padding:20px 0 40px}.book .book-body .page-wrapper .page-inner section{margin:0;padding:5px 15px;background:#fff;border-radius:2px;line-height:1.7;font-size:1.6rem}.book .book-body .page-wrapper .page-inner .btn-group .btn{border-radius:0;background:#eee;border:0}@media (max-width:1240px){.book .book-body{-webkit-transition:-webkit-transform 250ms ease;-moz-transition:-moz-transform 250ms ease;-o-transition:-o-transform 250ms ease;transition:transform 250ms ease;padding-bottom:20px}.book .book-body .body-inner{position:static;min-height:calc(100% - 50px)}}@media (min-width:600px){.book.with-summary .book-body{left:300px}}@media (max-width:600px){.book.with-summary{overflow:hidden}.book.with-summary .book-body{-webkit-transform:translate(calc(100% - 60px),0);-moz-transform:translate(calc(100% - 60px),0);-ms-transform:translate(calc(100% - 60px),0);-o-transform:translate(calc(100% - 60px),0);transform:translate(calc(100% - 60px),0)}}.book.without-animation .book-body{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.buttons:after,.buttons:before{content:" ";display:table;line-height:0}.button{border:0;background:#eee;color:#666;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 
4px}.button:hover{color:#444}.button:focus,.button:hover{outline:0}.button.size-2{width:50%}.button.size-3{width:33%}.book .book-body .page-wrapper .page-inner section{display:none}.book .book-body .page-wrapper .page-inner section.normal{display:block;word-wrap:break-word;overflow:hidden;color:#333;line-height:1.7;text-size-adjust:100%;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;-moz-text-size-adjust:100%}.book .book-body .page-wrapper .page-inner section.normal *{box-sizing:border-box;-webkit-box-sizing:border-box;}.book .book-body .page-wrapper .page-inner section.normal>:first-child{margin-top:0!important}.book .book-body .page-wrapper .page-inner section.normal>:last-child{margin-bottom:0!important}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal figure,.book .book-body .page-wrapper .page-inner section.normal img,.book .book-body .page-wrapper .page-inner section.normal pre,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal tr{page-break-inside:avoid}.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal p{orphans:3;widows:3}.book .book-body .page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5{page-break-after:avoid}.book .book-body .page-wrapper .page-inner section.normal b,.book .book-body .page-wrapper .page-inner section.normal strong{font-weight:700}.book 
.book-body .page-wrapper .page-inner section.normal em{font-style:italic}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal dl,.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal p,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal ul{margin-top:0;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal a{color:#4183c4;text-decoration:none;background:0 0}.book .book-body .page-wrapper .page-inner section.normal a:active,.book .book-body .page-wrapper .page-inner section.normal a:focus,.book .book-body .page-wrapper .page-inner section.normal a:hover{outline:0;text-decoration:underline}.book .book-body .page-wrapper .page-inner section.normal img{border:0;max-width:100%}.book .book-body .page-wrapper .page-inner section.normal hr{height:4px;padding:0;margin:1.7em 0;overflow:hidden;background-color:#e7e7e7;border:none}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book .book-body .page-wrapper .page-inner section.normal hr:before{display:table;content:" "}.book .book-body .page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal h6{margin-top:1.275em;margin-bottom:.85em;}.book .book-body .page-wrapper .page-inner section.normal h1{font-size:2em}.book .book-body .page-wrapper .page-inner section.normal h2{font-size:1.75em}.book .book-body .page-wrapper .page-inner section.normal h3{font-size:1.5em}.book .book-body .page-wrapper .page-inner section.normal h4{font-size:1.25em}.book .book-body .page-wrapper .page-inner section.normal 
h5{font-size:1em}.book .book-body .page-wrapper .page-inner section.normal h6{font-size:1em;color:#777}.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal pre{font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;direction:ltr;border:none;color:inherit}.book .book-body .page-wrapper .page-inner section.normal pre{overflow:auto;word-wrap:normal;margin:0 0 1.275em;padding:.85em 1em;background:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal pre>code{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;font-size:.85em;white-space:pre;background:0 0}.book .book-body .page-wrapper .page-inner section.normal pre>code:after,.book .book-body .page-wrapper .page-inner section.normal pre>code:before{content:normal}.book .book-body .page-wrapper .page-inner section.normal code{padding:.2em;margin:0;font-size:.85em;background-color:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal code:after,.book .book-body .page-wrapper .page-inner section.normal code:before{letter-spacing:-.2em;content:"\00a0"}.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal ul{padding:0 0 0 2em;margin:0 0 .85em}.book .book-body .page-wrapper .page-inner section.normal ol ol,.book .book-body .page-wrapper .page-inner section.normal ol ul,.book .book-body .page-wrapper .page-inner section.normal ul ol,.book .book-body .page-wrapper .page-inner section.normal ul ul{margin-top:0;margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal ol ol{list-style-type:lower-roman}.book .book-body .page-wrapper .page-inner section.normal blockquote{margin:0 0 .85em;padding:0 15px;opacity:0.75;border-left:4px solid #dcdcdc}.book .book-body .page-wrapper .page-inner section.normal blockquote:first-child{margin-top:0}.book .book-body .page-wrapper .page-inner section.normal 
blockquote:last-child{margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal dl{padding:0}.book .book-body .page-wrapper .page-inner section.normal dl dt{padding:0;margin-top:.85em;font-style:italic;font-weight:700}.book .book-body .page-wrapper .page-inner section.normal dl dd{padding:0 .85em;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal dd{margin-left:0}.book .book-body .page-wrapper .page-inner section.normal .glossary-term{cursor:help;text-decoration:underline}.book .book-body .navigation{position:absolute;top:50px;bottom:0;margin:0;max-width:150px;min-width:90px;display:flex;justify-content:center;align-content:center;flex-direction:column;font-size:40px;color:#ccc;text-align:center;-webkit-transition:all 350ms ease;-moz-transition:all 350ms ease;-o-transition:all 350ms ease;transition:all 350ms ease}.book .book-body .navigation:hover{text-decoration:none;color:#444}.book .book-body .navigation.navigation-next{right:0}.book .book-body .navigation.navigation-prev{left:0}@media (max-width:1240px){.book .book-body .navigation{position:static;top:auto;max-width:50%;width:50%;display:inline-block;float:left}.book .book-body .navigation.navigation-unique{max-width:100%;width:100%}}.book .book-body .page-wrapper .page-inner section.glossary{margin-bottom:40px}.book .book-body .page-wrapper .page-inner section.glossary h2 a,.book .book-body .page-wrapper .page-inner section.glossary h2 a:hover{color:inherit;text-decoration:none}.book .book-body .page-wrapper .page-inner section.glossary .glossary-index{list-style:none;margin:0;padding:0}.book .book-body .page-wrapper .page-inner section.glossary .glossary-index li{display:inline;margin:0 
8px;white-space:nowrap}*{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;-webkit-overflow-scrolling:auto;-webkit-tap-highlight-color:transparent;-webkit-text-size-adjust:none;-webkit-touch-callout:none}a{text-decoration:none}body,html{height:100%}html{font-size:62.5%}body{text-rendering:optimizeLegibility;font-smoothing:antialiased;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:14px;letter-spacing:.2px;text-size-adjust:100%}
 .book .book-summary ul.summary li a span {display:inline;padding:initial;overflow:visible;cursor:auto;opacity:1;}
 /* show arrow before summary tag as in bootstrap */
 details > summary {display:list-item;cursor:pointer;}
-/*add whatsapp icon from FA 5.1.1
-TODO: remove when updating fontawesome*/
-.fa-whatsapp:before{content:"\f232"}
diff --git a/docs/no_toc/libs/gitbook-2.6.7/js/plugin-clipboard.js b/docs/no_toc/libs/gitbook-2.6.7/js/plugin-clipboard.js
index 9a7d2e75..f0880be6 100644
--- a/docs/no_toc/libs/gitbook-2.6.7/js/plugin-clipboard.js
+++ b/docs/no_toc/libs/gitbook-2.6.7/js/plugin-clipboard.js
@@ -9,7 +9,9 @@ gitbook.require(["gitbook", "jQuery"], function(gitbook, $) {
 
     // the page.change event is thrown twice: before and after the page changes
     if (clipboard) {
-      // clipboard is already defined
+      // clipboard is already defined but we are on the same page
+      if (clipboard._prevPage === window.location.pathname) return;
+      // clipboard is already defined and url path change
       // we can deduce that we are before page changes
       clipboard.destroy(); // destroy the previous events listeners
       clipboard = undefined; // reset the clipboard object
@@ -24,6 +26,8 @@ gitbook.require(["gitbook", "jQuery"], function(gitbook, $) {
       }
     });
 
+    clipboard._prevPage = window.location.pathname
+
   });
 
 });
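The guard added in the hunks above can be read as a small state machine. The following is a minimal sketch of that logic outside GitBook, not the plugin's actual code: `makeClipboard`, `createPageChangeHandler`, `getPath`, and `log` are hypothetical names introduced for illustration, and the early `return` after the teardown is an assumption, since the diff does not show the lines that follow the reset.

```javascript
// Hypothetical stand-in for the ClipboardJS instance the plugin creates.
function makeClipboard(log) {
  return {
    destroy() {
      log.push("destroy"); // record that the listeners were torn down
    },
  };
}

// Builds a handler that mimics the guarded "page.change" callback.
// getPath stands in for reading window.location.pathname.
function createPageChangeHandler(getPath, log) {
  let clipboard;
  return function onPageChange() {
    if (clipboard) {
      // Same page as last time: keep the existing clipboard untouched.
      if (clipboard._prevPage === getPath()) return;
      // Path changed: tear down and wait for the next event (assumed).
      clipboard.destroy();
      clipboard = undefined;
      return;
    }
    clipboard = makeClipboard(log);
    log.push("create");
    // Remember which page this clipboard belongs to.
    clipboard._prevPage = getPath();
  };
}
```

Calling the handler twice on the same path leaves the clipboard in place. Without the `_prevPage` check, the second call would destroy it.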
diff --git a/docs/no_toc/microarray-data.html b/docs/no_toc/microarray-data.html
index 1f9cd48b..e06a560d 100644
--- a/docs/no_toc/microarray-data.html
+++ b/docs/no_toc/microarray-data.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 7 Microarray Data | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 7 Microarray Data | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="sequencing-data.html"/>
 <link rel="next" href="annotating-genomes.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 7 Microarray Data | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,92 +535,92 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="microarray-data" class="section level1" number="7">
-<h1><span class="header-section-number">Chapter 7</span> Microarray Data</h1>
+<div id="microarray-data" class="section level1 hasAnchor" number="7">
+<h1><span class="header-section-number">Chapter 7</span> Microarray Data<a href="microarray-data.html#microarray-data" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-5" class="section level2" number="7.1">
-<h2><span class="header-section-number">7.1</span> Learning Objectives</h2>
-<p><img src="resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_12.png" title="This chapter will demonstrate how to: Understand the very general basics of microarray data collection and processing workflow. Understand the limitations and strengths of microarray data in general." alt="This chapter will demonstrate how to: Understand the very general basics of microarray data collection and processing workflow. Understand the limitations and strengths of microarray data in general." width="100%" /></p>
+<div id="learning-objectives-5" class="section level2 hasAnchor" number="7.1">
+<h2><span class="header-section-number">7.1</span> Learning Objectives<a href="microarray-data.html#learning-objectives-5" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_12.png" alt="This chapter will demonstrate how to: Understand the very general basics of microarray data collection and processing workflow. Understand the limitations and strengths of microarray data in general." width="100%" /></p>
 </div>
-<div id="summary-of-microarrays" class="section level2" number="7.2">
-<h2><span class="header-section-number">7.2</span> Summary of microarrays</h2>
+<div id="summary-of-microarrays" class="section level2 hasAnchor" number="7.2">
+<h2><span class="header-section-number">7.2</span> Summary of microarrays<a href="microarray-data.html#summary-of-microarrays" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Microarrays have been in use since before high throughput sequencing methods became more affordable and widespread, but they can still be an effective and affordable tool for genomic assays. Depending on your goals, a microarray may be a suitable choice for your genomic study.</p>
 </div>
-<div id="how-do-microarrays-work" class="section level2" number="7.3">
-<h2><span class="header-section-number">7.3</span> How do microarrays work?</h2>
+<div id="how-do-microarrays-work" class="section level2 hasAnchor" number="7.3">
+<h2><span class="header-section-number">7.3</span> How do microarrays work?<a href="microarray-data.html#how-do-microarrays-work" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>All microarrays work on hybridization to sets of oligonucleotides on a chip. However, the preparation of the samples, and the oligonucleotides’ hybridization targets vary depending on the assay and goals.</p>
 <p><img src="resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_51.png" width="100%" /></p>
 <p>As a basic principle, oligonucleotide probes are designed for different targets, and sets of probes designed for the same target are placed together. Across the whole chip, these probes are arranged in a grid-like design so that, after a sample is hybridized to them, you can measure how much of each target is present by taking an image and knowing which target each location is designed for.</p>
-<div id="pros" class="section level3" number="7.3.1">
-<h3><span class="header-section-number">7.3.1</span> Pros:</h3>
+<div id="pros" class="section level3 hasAnchor" number="7.3.1">
+<h3><span class="header-section-number">7.3.1</span> Pros:<a href="microarray-data.html#pros" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
-<li>Microarrays are much more affordable than high throughput sequencing which can allow you to run more samples and have more statistical power <span class="citation">(<a href="#ref-Tarca2006" role="doc-biblioref">Tarca, Romero, and Draghici 2006</a>; <a href="#ref-refinebioexamples2019" role="doc-biblioref">ALSF 2019</a>)</span>.</li>
-<li>Microarrays take less time to process than most high throughput sequencing methods<span class="citation">(<a href="#ref-Tarca2006" role="doc-biblioref">Tarca, Romero, and Draghici 2006</a>; <a href="#ref-refinebioexamples2019" role="doc-biblioref">ALSF 2019</a>)</span>.</li>
-<li>Microarrays are generally less computationally intensive to process and you can get your results more quickly<span class="citation">(<a href="#ref-Tarca2006" role="doc-biblioref">Tarca, Romero, and Draghici 2006</a>; <a href="#ref-refinebioexamples2019" role="doc-biblioref">ALSF 2019</a>)</span>.</li>
-<li>Microarrays are generally as good as sequencing methods for detecting clinical endpoints <span class="citation">(<a href="#ref-Zhang2015" role="doc-biblioref">W. Zhang et al. 2015</a>)</span>.</li>
+<li>Microarrays are much more affordable than high throughput sequencing which can allow you to run more samples and have more statistical power <span class="citation">(<a href="#ref-Tarca2006">Tarca, Romero, and Draghici 2006</a>; <a href="#ref-refinebioexamples2019">ALSF 2019</a>)</span>.</li>
+<li>Microarrays take less time to process than most high throughput sequencing methods<span class="citation">(<a href="#ref-Tarca2006">Tarca, Romero, and Draghici 2006</a>; <a href="#ref-refinebioexamples2019">ALSF 2019</a>)</span>.</li>
+<li>Microarrays are generally less computationally intensive to process and you can get your results more quickly<span class="citation">(<a href="#ref-Tarca2006">Tarca, Romero, and Draghici 2006</a>; <a href="#ref-refinebioexamples2019">ALSF 2019</a>)</span>.</li>
+<li>Microarrays are generally as good as sequencing methods for detecting clinical endpoints <span class="citation">(<a href="#ref-Zhang2015">W. Zhang et al. 2015</a>)</span>.</li>
 </ul>
 </div>
-<div id="cons" class="section level3" number="7.3.2">
-<h3><span class="header-section-number">7.3.2</span> Cons:</h3>
+<div id="cons" class="section level3 hasAnchor" number="7.3.2">
+<h3><span class="header-section-number">7.3.2</span> Cons:<a href="microarray-data.html#cons" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
-<li>Microarray chips can only measure the targets they are designed for, and cannot be used for exploratory purposes <span class="citation">(<a href="#ref-Zhang2015" role="doc-biblioref">W. Zhang et al. 2015</a>)</span>.</li>
-<li>Microarrays’ probe designs can only be as up to date as the genome they were designed against at the time <span class="citation">(<a href="#ref-Mantione2014" role="doc-biblioref">Mantione et al. 2014</a>; <a href="#ref-refinebioexamples2019" role="doc-biblioref">ALSF 2019</a>)</span>.</li>
-<li>Microarray does not escape oligonucleotide biases like GC content and sequence composition biases<span class="citation">(<a href="#ref-refinebioexamples2019" role="doc-biblioref">ALSF 2019</a>)</span>.</li>
+<li>Microarray chips can only measure the targets they are designed for, and cannot be used for exploratory purposes <span class="citation">(<a href="#ref-Zhang2015">W. Zhang et al. 2015</a>)</span>.</li>
+<li>Microarrays’ probe designs can only be as up to date as the genome they were designed against at the time <span class="citation">(<a href="#ref-Mantione2014">Mantione et al. 2014</a>; <a href="#ref-refinebioexamples2019">ALSF 2019</a>)</span>.</li>
+<li>Microarray does not escape oligonucleotide biases like GC content and sequence composition biases<span class="citation">(<a href="#ref-refinebioexamples2019">ALSF 2019</a>)</span>.</li>
 </ul>
 </div>
 </div>
-<div id="what-types-of-arrays-are-there" class="section level2" number="7.4">
-<h2><span class="header-section-number">7.4</span> What types of arrays are there?</h2>
-<div id="snp-arrays" class="section level3" number="7.4.1">
-<h3><span class="header-section-number">7.4.1</span> SNP arrays</h3>
+<div id="what-types-of-arrays-are-there" class="section level2 hasAnchor" number="7.4">
+<h2><span class="header-section-number">7.4</span> What types of arrays are there?<a href="microarray-data.html#what-types-of-arrays-are-there" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<div id="snp-arrays" class="section level3 hasAnchor" number="7.4.1">
+<h3><span class="header-section-number">7.4.1</span> SNP arrays<a href="microarray-data.html#snp-arrays" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><img src="resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_16.png" width="100%" /></p>
 <p>Single nucleotide polymorphism arrays are designed to target and measure DNA variants. When the sample is hybridized, the amount of fluorescence detected can be interpreted to indicate the presence of the variant and whether the variant is homozygous or heterozygous. Samples prepped for SNP arrays therefore need to be DNA samples.</p>
-<div id="examples" class="section level4" number="7.4.1.1">
-<h4><span class="header-section-number">7.4.1.1</span> Examples:</h4>
+<div id="examples" class="section level4 hasAnchor" number="7.4.1.1">
+<h4><span class="header-section-number">7.4.1.1</span> Examples:<a href="microarray-data.html#examples" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><a href="https://www.internationalgenome.org/">The 1000 genomes project</a> is a large collection of SNP array data from many populations around the world and is available for download.</li>
 </ul>
 </div>
 </div>
-<div id="gene-expression-arrays" class="section level3" number="7.4.2">
-<h3><span class="header-section-number">7.4.2</span> Gene expression arrays</h3>
+<div id="gene-expression-arrays" class="section level3 hasAnchor" number="7.4.2">
+<h3><span class="header-section-number">7.4.2</span> Gene expression arrays<a href="microarray-data.html#gene-expression-arrays" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><img src="resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142e3de7ce8_0_8.png" width="100%" /></p>
 <p>Gene expression arrays are designed to measure gene expression. Their probes target transcripts and measure relative transcript abundance.</p>
-<div id="examples-1" class="section level4" number="7.4.2.1">
-<h4><span class="header-section-number">7.4.2.1</span> Examples:</h4>
+<div id="examples-1" class="section level4 hasAnchor" number="7.4.2.1">
+<h4><span class="header-section-number">7.4.2.1</span> Examples:<a href="microarray-data.html#examples-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><a href="https://www.refine.bio/">refine.bio</a> is the largest collection of publicly available, already normalized gene expression data (including gene expression microarrays).</li>
-<li><a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000543">Getting started in gene expression microarray analysis</a> <span class="citation">(<a href="#ref-Slonim_Yanai_2009" role="doc-biblioref">Slonim and Yanai 2009</a>)</span>.</li>
-<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3467903/">Microarray and its applications</a> <span class="citation">(<a href="#ref-Govindarajan2012" role="doc-biblioref">2012</a>)</span>.</li>
-<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435252/">Analysis of microarray experiments of gene expression profiling</a> <span class="citation">(<a href="#ref-Tarca2006" role="doc-biblioref">Tarca, Romero, and Draghici 2006</a>)</span>.</li>
+<li><a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000543">Getting started in gene expression microarray analysis</a> <span class="citation">(<a href="#ref-Slonim_Yanai_2009">Slonim and Yanai 2009</a>)</span>.</li>
+<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3467903/">Microarray and its applications</a> <span class="citation">(<a href="#ref-Govindarajan2012">2012</a>)</span>.</li>
+<li><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2435252/">Analysis of microarray experiments of gene expression profiling</a> <span class="citation">(<a href="#ref-Tarca2006">Tarca, Romero, and Draghici 2006</a>)</span>.</li>
 </ul>
 </div>
 </div>
-<div id="dna-methylation-arrays" class="section level3" number="7.4.3">
-<h3><span class="header-section-number">7.4.3</span> DNA methylation arrays</h3>
+<div id="dna-methylation-arrays" class="section level3 hasAnchor" number="7.4.3">
+<h3><span class="header-section-number">7.4.3</span> DNA methylation arrays<a href="microarray-data.html#dna-methylation-arrays" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><img src="resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_1.png" width="100%" /></p>
 <p>DNA methylation can also be measured by microarray. To detect methylated cytosines (5mC), DNA samples are prepped using bisulfite conversion. This converts unmethylated cytosines into uracils and leaves methylated cytosines untouched. Probes are then designed to bind to either the uracil or the cytosine, representing the unmethylated and methylated cytosines respectively.</p>
 <p>A ratio of the fluorescence signal can be used to identify the relative abundance of the methylated and unmethylated versions of the sequence.</p>
-<p>Additionally, 5-hydroxymethylated cytosines (5hmC) can also be detected by oxidative bisulfite bisulfite sequencing <span class="citation">(<a href="#ref-Booth2013" role="doc-biblioref">Booth et al. 2013</a>)</span>. Note that bisulfite conversion alone will not distinguish between 5mC and 5hmC though these often may indicate different biological mechanics.</p>
+<p>Additionally, 5-hydroxymethylated cytosines (5hmC) can be detected by oxidative bisulfite sequencing <span class="citation">(<a href="#ref-Booth2013">Booth et al. 2013</a>)</span>. Note that bisulfite conversion alone will not distinguish between 5mC and 5hmC, though these may often indicate different biological mechanisms.</p>
 </div>
 </div>
-<div id="general-processing-of-microarray-data" class="section level2" number="7.5">
-<h2><span class="header-section-number">7.5</span> General processing of microarray data</h2>
+<div id="general-processing-of-microarray-data" class="section level2 hasAnchor" number="7.5">
+<h2><span class="header-section-number">7.5</span> General processing of microarray data<a href="microarray-data.html#general-processing-of-microarray-data" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><img src="resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15f36710259_0_0.png" width="100%" /></p>
 <p>After scanning, microarray data starts as an image that needs to be quantified, normalized and further corrected and edited based on the most current genome and probe annotation.</p>
 <p>As noted above, microarrays do not escape the base sequence biases that accompany almost all genomic assays. The normalization methods you use should ideally mitigate these sequence biases and also remove probes that may be outdated or bind to multiple places on the genome.</p>
 <p>The tools and methods by which you normalize and correct the microarray data will be dependent not only on the type of microarray assay you are performing (gene expression, SNP, methylation), but most of all what kind of microarray chip design/platform you are using.</p>
-<div id="examples-2" class="section level3" number="7.5.1">
-<h3><span class="header-section-number">7.5.1</span> Examples</h3>
+<div id="examples-2" class="section level3 hasAnchor" number="7.5.1">
+<h3><span class="header-section-number">7.5.1</span> Examples<a href="microarray-data.html#examples-2" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li><a href="https://docs.refine.bio/en/latest/main_text.html#processing-information">Refine.bio describes their processing methods</a>.</li>
 <li><a href="http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp">Brainarray keeps up to date microarray annotation for all kinds of platforms</a></li>
 </ul>
 </div>
-<div id="microarray-platforms" class="section level3" number="7.5.2">
-<h3><span class="header-section-number">7.5.2</span> Microarray Platforms</h3>
+<div id="microarray-platforms" class="section level3 hasAnchor" number="7.5.2">
+<h3><span class="header-section-number">7.5.2</span> Microarray Platforms<a href="microarray-data.html#microarray-platforms" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>There are many microarray chip designs, each built to target different things. Three of the largest commercial manufacturers have ready to use microarrays you can purchase. You can also design microarrays to hit your own targets of interest.</p>
 <p>Here are full lists of platforms that have been published on Gene Expression Omnibus.</p>
 <ul>
@@ -636,8 +630,8 @@ <h3><span class="header-section-number">7.5.2</span> Microarray Platforms</h3>
 </ul>
 </div>
 </div>
-<div id="very-general-microarray-workflow" class="section level2" number="7.6">
-<h2><span class="header-section-number">7.6</span> Very General Microarray Workflow</h2>
+<div id="very-general-microarray-workflow" class="section level2 hasAnchor" number="7.6">
+<h2><span class="header-section-number">7.6</span> Very General Microarray Workflow<a href="microarray-data.html#very-general-microarray-workflow" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>In the data type specific chapters, we will cover the microarray workflow and file formats in more detail. In the most general sense, microarray workflows look like the diagram below. Note that the exact file formats are specific to the chip brand and type you use (e.g. Illumina, Affymetrix, Agilent, etc.):</p>
 <div id="CB1322FC70C4A4B3A1BF2D8AC29A7E9FFF8_81516">
 <div id="CB1322FC70C4A4B3A1BF2D8AC29A7E9FFF8_81516_robot">
@@ -647,36 +641,36 @@ <h2><span class="header-section-number">7.6</span> Very General Microarray Workf
 <script src="https://cloud.smartdraw.com/plugins/html/js/sdjswidget_html.js" type="text/javascript"></script>
 <script type="text/javascript">SDJS_Widget("CB1322FC70C4A4B3A1BF2D8AC29A7E9FFF8",81516,1,"");</script>
 <p><br/></p>
-<div id="microarray-file-formats-1" class="section level3" number="7.6.1">
-<h3><span class="header-section-number">7.6.1</span> Microarray file formats</h3>
-<div id="idat---intensity-data-file-1" class="section level4" number="7.6.1.1">
-<h4><span class="header-section-number">7.6.1.1</span> IDAT - intensity data file</h4>
+<div id="microarray-file-formats-1" class="section level3 hasAnchor" number="7.6.1">
+<h3><span class="header-section-number">7.6.1</span> Microarray file formats<a href="microarray-data.html#microarray-file-formats-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="idat---intensity-data-file-1" class="section level4 hasAnchor" number="7.6.1.1">
+<h4><span class="header-section-number">7.6.1.1</span> IDAT - intensity data file<a href="microarray-data.html#idat---intensity-data-file-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>This is an Illumina microarray specific file that contains the chip image intensity information for each location on the microarray. It is a binary file, which means it will not be readable by double clicking and attempting to open the file directly.</p>
 <p>Currently, Illumina appears to suggest directly converting IDAT files into a GTC format. We advise looking into <a href="https://github.com/freeseek/gtc2vcf">this package to help you do that</a>.</p>
 <p><a href="https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/technote_array_analysis_workflows.pdf">For more on IDAT files</a>.</p>
 </div>
-<div id="dat---data-file-1" class="section level4" number="7.6.1.2">
-<h4><span class="header-section-number">7.6.1.2</span> DAT - data file</h4>
+<div id="dat---data-file-1" class="section level4 hasAnchor" number="7.6.1.2">
+<h4><span class="header-section-number">7.6.1.2</span> DAT - data file<a href="microarray-data.html#dat---data-file-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>This is an Affymetrix microarray specific file, parallel to the IDAT file in that it contains the image intensity information for each location on the microarray. It’s stored as pixels.</p>
 <p><a href="https://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/dat.html">For more on DAT files</a>.</p>
 </div>
-<div id="cel-1" class="section level4" number="7.6.1.3">
-<h4><span class="header-section-number">7.6.1.3</span> CEL</h4>
+<div id="cel-1" class="section level4 hasAnchor" number="7.6.1.3">
+<h4><span class="header-section-number">7.6.1.3</span> CEL<a href="microarray-data.html#cel-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>This is an Affymetrix microarray specific file that is made from a DAT file but translated into numeric values. It is not normalized yet but can be normalized into a CHP file.</p>
 <p><a href="https://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html">For more on CEL files</a></p>
 </div>
-<div id="chp-1" class="section level4" number="7.6.1.4">
-<h4><span class="header-section-number">7.6.1.4</span> CHP</h4>
+<div id="chp-1" class="section level4 hasAnchor" number="7.6.1.4">
+<h4><span class="header-section-number">7.6.1.4</span> CHP<a href="microarray-data.html#chp-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>CHP files contain the gene-level and normalized data from an Affymetrix array chip. CHP files are obtained by normalizing and processing CEL files.</p>
 <p><a href="https://www.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/chp-xda.html">For more about CHP files</a>.</p>
 </div>
 </div>
 </div>
-<div id="general-informatics-files-1" class="section level2" number="7.7">
-<h2><span class="header-section-number">7.7</span> General informatics files</h2>
+<div id="general-informatics-files-1" class="section level2 hasAnchor" number="7.7">
+<h2><span class="header-section-number">7.7</span> General informatics files<a href="microarray-data.html#general-informatics-files-1" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>At various points in your genomics workflows, you may need to use other types of files to help you annotate your data. We’ll also discuss some of these common files that you may encounter:</p>
-<div id="bed---browser-extensible-data-1" class="section level4" number="7.7.0.1">
-<h4><span class="header-section-number">7.7.0.1</span> BED - Browser Extensible Data</h4>
+<div id="bed---browser-extensible-data-1" class="section level4 hasAnchor" number="7.7.0.1">
+<h4><span class="header-section-number">7.7.0.1</span> BED - Browser Extensible Data<a href="microarray-data.html#bed---browser-extensible-data-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A BED file is a text file that has coordinates to genomic regions. The other columns that accompany the genomic coordinates vary depending on the context, but every BED file starts with the <code>chrom</code>, <code>chromStart</code> and <code>chromEnd</code> columns.</p>
 <p>A BED file might look like this:</p>
 <pre><code>chrom   chromStart  chromEnd other_optional_columns
@@ -684,8 +678,8 @@ <h4><span class="header-section-number">7.7.0.1</span> BED - Browser Extensible
 chr2    100    3000  bad</code></pre>
 <p>For <a href="https://en.wikipedia.org/wiki/BED_(file_format)">more on BED files</a>.</p>
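<p>As a rough illustration of how the three required columns can be read, here is a minimal Python sketch. The names <code>BedRecord</code>, <code>parse_bed_line</code>, and <code>region_length</code> are hypothetical helpers made up for this example and are not part of any genomics package. The sketch assumes the UCSC convention that <code>chromStart</code> is 0-based and <code>chromEnd</code> is exclusive.</p>

```python
from typing import List, NamedTuple


class BedRecord(NamedTuple):
    """One line of a BED file (hypothetical helper type for this sketch)."""
    chrom: str
    chrom_start: int   # assumed 0-based, inclusive (UCSC convention)
    chrom_end: int     # assumed exclusive (UCSC convention)
    extra: List[str]   # any optional columns, kept as raw strings


def parse_bed_line(line: str) -> BedRecord:
    """Split one BED line on whitespace and convert the coordinates to integers."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, chromStart and chromEnd")
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), fields[3:])


def region_length(record: BedRecord) -> int:
    """Length of the region under the 0-based, end-exclusive assumption."""
    return record.chrom_end - record.chrom_start


# The last row of the example above
record = parse_bed_line("chr2    100    3000  bad")
print(record.chrom, region_length(record), record.extra)
```

<p>For real analyses, an established BED parser is likely a safer choice than hand-rolled code like this.</p>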
 </div>
-<div id="gffgtf-general-feature-formatgene-transfer-format-1" class="section level4" number="7.7.0.2">
-<h4><span class="header-section-number">7.7.0.2</span> GFF/GTF General Feature Format/Gene Transfer Format</h4>
+<div id="gffgtf-general-feature-formatgene-transfer-format-1" class="section level4 hasAnchor" number="7.7.0.2">
+<h4><span class="header-section-number">7.7.0.2</span> GFF/GTF General Feature Format/Gene Transfer Format<a href="microarray-data.html#gffgtf-general-feature-formatgene-transfer-format-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A GFF file is a tab-delimited file that contains information about genomic features. These files are available from databases, and you can use them to annotate your data.</p>
 <p>You may see GFF2, GFF3, and GTF files. These names refer to different versions and variations of the format, which largely carry the same information. GFF2 is being phased out, so GFF3 is usually the better bet unless the program or package you are using specifies that it needs the older GFF2 version.</p>
 <p>A GFF file may look like this (borrowed example from Ensembl):</p>
@@ -693,27 +687,27 @@ <h4><span class="header-section-number">7.7.0.2</span> GFF/GTF General Feature F
 <p>Note that a file like this is useful for annotating genes with what we know about them.</p>
 <p>For <a href="https://useast.ensembl.org/info/website/upload/gff.html">more about GTF and GFF files</a>.</p>
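<p>To make the tab-delimited layout concrete, here is a minimal Python sketch that splits one GFF/GTF line into its nine standard columns. <code>GffRecord</code> and <code>parse_gff_line</code> are hypothetical names for this example, and the sample line is made up for illustration rather than taken from a real annotation. The attributes column is left as a raw string because its syntax differs between GFF3 and GTF.</p>

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    """The nine tab-delimited columns of a GFF/GTF line (hypothetical helper type)."""
    seqname: str
    source: str
    feature: str
    start: int              # 1-based, inclusive
    end: int                # 1-based, inclusive
    score: Optional[float]  # "." in the file means no score
    strand: str
    frame: str
    attributes: str         # left unparsed; syntax differs between GFF3 and GTF


def parse_gff_line(line: str) -> GffRecord:
    """Split one feature line on tabs and convert the numeric columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-delimited columns, got {len(fields)}")
    score = None if fields[5] == "." else float(fields[5])
    return GffRecord(fields[0], fields[1], fields[2], int(fields[3]),
                     int(fields[4]), score, fields[6], fields[7], fields[8])


# Made-up example line, for illustration only
line = "1\texample_source\tgene\t11869\t14409\t.\t+\t.\tID=gene0001"
gene = parse_gff_line(line)

# GFF coordinates are 1-based and inclusive, so the length is end - start + 1
gene_length = gene.end - gene.start + 1
print(gene.feature, gene.strand, gene_length)
```

<p>Header and comment lines, which begin with <code>#</code>, would need to be skipped before calling a function like this.</p>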
 </div>
-<div id="other-files-2" class="section level3" number="7.7.1">
-<h3><span class="header-section-number">7.7.1</span> Other files</h3>
+<div id="other-files-2" class="section level3 hasAnchor" number="7.7.1">
+<h3><span class="header-section-number">7.7.1</span> Other files<a href="microarray-data.html#other-files-2" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>If you didn’t see the file type you are looking for, take a look at this <a href="https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats">list by the Broad Institute</a>. It may also be covered in the data-type-specific chapters.</p>
 </div>
-<div id="microarray-processing-tutorials" class="section level3" number="7.7.2">
-<h3><span class="header-section-number">7.7.2</span> Microarray processing tutorials:</h3>
+<div id="microarray-processing-tutorials" class="section level3 hasAnchor" number="7.7.2">
+<h3><span class="header-section-number">7.7.2</span> Microarray processing tutorials:<a href="microarray-data.html#microarray-processing-tutorials" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>For the most common microarray platforms, you can see these examples for how to process the data:</p>
-<div id="general-arrays" class="section level4" number="7.7.2.1">
-<h4><span class="header-section-number">7.7.2.1</span> General arrays</h4>
+<div id="general-arrays" class="section level4 hasAnchor" number="7.7.2.1">
+<h4><span class="header-section-number">7.7.2.1</span> General arrays<a href="microarray-data.html#general-arrays" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><a href="https://www.bioconductor.org/packages/devel/workflows/vignettes/arrays/inst/doc/arrays.html">Using Bioconductor for Microarray Analysis</a>.</li>
 </ul>
 </div>
-<div id="gene-expression-arrays-1" class="section level4" number="7.7.2.2">
-<h4><span class="header-section-number">7.7.2.2</span> Gene Expression Arrays</h4>
+<div id="gene-expression-arrays-1" class="section level4 hasAnchor" number="7.7.2.2">
+<h4><span class="header-section-number">7.7.2.2</span> Gene Expression Arrays<a href="microarray-data.html#gene-expression-arrays-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><a href="https://f1000research.com/articles/5-1384">An end to end workflow for differential gene expression using Affymetrix microarrays</a>.</li>
 </ul>
 </div>
-<div id="dna-methylation-arrays-1" class="section level4" number="7.7.2.3">
-<h4><span class="header-section-number">7.7.2.3</span> DNA Methylation Arrays</h4>
+<div id="dna-methylation-arrays-1" class="section level4 hasAnchor" number="7.7.2.3">
+<h4><span class="header-section-number">7.7.2.3</span> DNA Methylation Arrays<a href="microarray-data.html#dna-methylation-arrays-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><a href="https://nbis-workshop-epigenomics.readthedocs.io/en/latest/content/tutorials/methylationArray/Array_Tutorial.html">DNA Methylation array workflow</a>.</li>
 </ul>
@@ -722,7 +716,7 @@ <h4><span class="header-section-number">7.7.2.3</span> DNA Methylation Arrays</h
 </div>
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-refinebioexamples2019" class="csl-entry">
 ALSF, CCDL for. 2019. <span>“Introduction to Microarray Data.”</span> <a href="https://alexslemonade.github.io/refinebio-examples/02-microarray/00-intro-to-microarray.html">https://alexslemonade.github.io/refinebio-examples/02-microarray/00-intro-to-microarray.html</a>.
@@ -747,10 +741,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -820,7 +821,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/microbiome-sequencing.html b/docs/no_toc/microbiome-sequencing.html
index 3572462c..e482fe05 100644
--- a/docs/no_toc/microbiome-sequencing.html
+++ b/docs/no_toc/microbiome-sequencing.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 21 Microbiome Sequencing | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 21 Microbiome Sequencing | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="dna-methylation-sequencing.html"/>
 <link rel="next" href="itcr--omic-tool-glossary.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 21 Microbiome Sequencing | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,15 +535,17 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="microbiome-sequencing" class="section level1" number="21">
-<h1><span class="header-section-number">Chapter 21</span> Microbiome Sequencing</h1>
+<div id="microbiome-sequencing" class="section level1 hasAnchor" number="21">
+<h1><span class="header-section-number">Chapter 21</span> Microbiome Sequencing<a href="microbiome-sequencing.html#microbiome-sequencing" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is incomplete! If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-19" class="section level2" number="21.1">
-<h2><span class="header-section-number">21.1</span> Learning Objectives</h2>
-<p><img src="resources/images/13-microbiome_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_0_0.png" title="Learning Objectives" alt="Learning Objectives" width="100%" />
-## A Brief Introduction to Microbiomes</p>
+<div id="learning-objectives-19" class="section level2 hasAnchor" number="21.1">
+<h2><span class="header-section-number">21.1</span> Learning Objectives<a href="microbiome-sequencing.html#learning-objectives-19" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/13-microbiome_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_0_0.png" alt="Learning Objectives" width="100%" /></p>
+</div>
+<div id="a-brief-introduction-to-microbiomes" class="section level2 hasAnchor" number="21.2">
+<h2><span class="header-section-number">21.2</span> A Brief Introduction to Microbiomes<a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Microbes are everywhere. We have found these tiny organisms in the deepest regions of the ocean and in the upper atmosphere. We have found them in:
 + water that has been solid ice for millennia in the Antarctic
 + boiling water in the geysers of Yellowstone National Park.
@@ -557,15 +553,15 @@ <h2><span class="header-section-number">21.1</span> Learning Objectives</h2>
 + perpetually damp environments, like the intestinal tract of the human body where they are constantly the subject of inspection by our diligent immune cells, and where they impact our health in positive and negative ways that we are only beginning to understand.
 + our nuclear reactors, prompting questions about whether we could harness them as tiny machines to help us remediate environmental disasters of the past, present, and future.</p>
 <p>If we looked hard enough, I think we’d find them on the surface of the moon and Mars, though they are probably microbes who stowed away on our spacecraft and are now patiently waiting for a drop of water that may or may not ever show up. If we ever colonize those worlds, microbes will be an indispensable ally in creating an environment that could sustain us.</p>
-<p><img src="resources/images/13-microbiome_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g26ebab787e9_0_0.png" title="Learning Objectives" alt="Learning Objectives" width="100%" />
-This figure is adapted from <span class="citation">(<a href="#ref-Tignat-Perrier2022" role="doc-biblioref">Tignat-Perrier et al. 2022</a>)</span> under Creative Commons license.</p>
+<p><img src="resources/images/13-microbiome_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g26ebab787e9_0_0.png" alt="Learning Objectives" width="100%" />
+This figure is adapted from <span class="citation">(<a href="#ref-Tignat-Perrier2022">Tignat-Perrier et al. 2022</a>)</span> under Creative Commons license.</p>
 <p>Microbes almost never live alone in the real world (i.e., outside of a laboratory). Rather they exist in communities of different species who are interacting with each other and their environment. Some of these communities will have many different types of organisms, and some will have only a few. Because of the large number of species and individuals involved, no two communities will ever be exactly alike, and quantifying differences between microbial communities is an important area of research at the moment. The types of interactions between organisms are also highly varied. These can include mutualistic relationships, where both organisms benefit from the interaction; parasitic relationships, where one organism exclusively benefits to the detriment of the other; and the full gradient in between.</p>
 <p>Microbiome science is everywhere. There are tens of articles published daily in the scientific literature, and many popular science articles and books present these findings to the world of non-scientists. Understanding the promises and limitations of the methods of microbiome science can help avoid misconceptions about microbiome research, and it’s important for practitioners of microbiome science to understand and convey the promise and limitations of our field. Misconceptions abound, frequently arising from the same sources as high-quality popular science microbiome reporting.</p>
 <pre><code>For example, on 5 Feb 2015 an article appeared in the New York Times noting (almost offhand) that Yersinia pestis, the organism responsible for Bubonic plague, had been found in multiple locations throughout the New York City subway system as part of its normal built environment microbiome. This was rapidly followed up on 6 Feb 2015 with an article noting that there was probably not Bubonic plague on the subway system after all, but rather that the approaches used by the research team are limited in their taxonomic resolution, and that likely a harmless close relative of Y. pestis was observed: “What the researchers probably found, [a spokesman for the university where the study originated] said, was bacteria from an unknown species or from organisms that happened to share some gene sequences with the plague bacterium…”.</code></pre>
 <p>As microbiome services and products are increasingly marketed directly to the public, consumers of microbiome research findings, products, and services need to know how to critically evaluate these offerings and their associated claims. As practitioners in the field, we can help by ensuring that the methods we apply are appropriate and reliable, and that we make our work accessible.</p>
 </div>
-<div id="goals-of-amplicon-analysis" class="section level2" number="21.2">
-<h2><span class="header-section-number">21.2</span> Goals of Amplicon analysis</h2>
+<div id="goals-of-amplicon-analysis" class="section level2 hasAnchor" number="21.3">
+<h2><span class="header-section-number">21.3</span> Goals of Amplicon analysis<a href="microbiome-sequencing.html#goals-of-amplicon-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The technologies that are enabling work in microbiome science are the same that are driving the data revolution in biology. Primarily this work is driven by high-throughput DNA sequencing, which is applied for profiling microbial community composition:</p>
 <ul>
 <li>marker gene profiling (such as 16S or ITS sequencing)</li>
@@ -579,16 +575,16 @@ <h2><span class="header-section-number">21.2</span> Goals of Amplicon analysis</
 </ul>
 <p>As a result, bioinformatics software tools are essential to microbiome research. For many microbiome researchers, bioinformatics is an intimidating and challenging aspect of their projects.</p>
 </div>
-<div id="microbiome-analysis-with-qiime-2" class="section level2" number="21.3">
-<h2><span class="header-section-number">21.3</span> Microbiome Analysis with QIIME 2</h2>
-<p>QIIME 2 is an all in one bioinformatics microbiome analysis platform. This platform allows for users to go from sequenced microbiome data to publication ready visualizations. The original QIIME, now referred to as QIIME 1, was published in 2010 <span class="citation">(<a href="#ref-Caporaso2010" role="doc-biblioref">Caporaso et al. 2010</a>)</span> and has been cited tens of thousands of times in the primary literature. QIIME 2, which was published in July of 2019 <span class="citation">(<a href="#ref-Bolyen2019" role="doc-biblioref">Bolyen et al. 2019</a>)</span>, succeeded QIIME 1 on 1 January 2018. QIIME 2 is better than QIIME 1 in all ways, and QIIME 1 is no longer actively supported. If you have previously used QIIME 1, you should invest time in learning and switching to QIIME 2. If you’re new to QIIME, start with QIIME 2. (When I refer to QIIME in this book, without specifying whether I’m referring to QIIME 1 or QIIME 2, I’m referring to the platform generally.)</p>
+<div id="microbiome-analysis-with-qiime-2" class="section level2 hasAnchor" number="21.4">
+<h2><span class="header-section-number">21.4</span> Microbiome Analysis with QIIME 2<a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p>QIIME 2 is an all-in-one bioinformatics platform for microbiome analysis. It lets users go from sequenced microbiome data to publication-ready visualizations. The original QIIME, now referred to as QIIME 1, was published in 2010 <span class="citation">(<a href="#ref-Caporaso2010">Caporaso et al. 2010</a>)</span> and has been cited tens of thousands of times in the primary literature. QIIME 2, which was published in July of 2019 <span class="citation">(<a href="#ref-Bolyen2019">Bolyen et al. 2019</a>)</span>, succeeded QIIME 1 on 1 January 2018. QIIME 2 is better than QIIME 1 in all ways, and QIIME 1 is no longer actively supported. If you have previously used QIIME 1, you should invest time in learning and switching to QIIME 2. If you’re new to QIIME, start with QIIME 2. (When I refer to QIIME in this book, without specifying whether I’m referring to QIIME 1 or QIIME 2, I’m referring to the platform generally.)</p>
 <p>QIIME 2 has large and growing user and developer communities, and these communities make QIIME 2 possible. The epicenter of the community is the QIIME 2 Forum. The forum is primarily known as a place where users can get technical support with QIIME 2 for no charge. Developers of QIIME 2 moderate the forum, and typically respond to technical support questions within a couple of business days. The forum is also a great place to discuss general topics in microbiome bioinformatics, or microbiome research methods generally. There are many active discussions on these topics on the forum. Keeping up with the discussions on the forum is a great way to learn about current topics in microbiome research methods. There’s also a free job board on the forum - you can use the forum to find jobs, or post your own job ads there to find employees who are well-versed in QIIME 2 and other bioinformatics tools. If you’re not already a member of the QIIME 2 Forum, you should consider joining. It’s a great way for you to get help, and as you develop your QIIME 2 skills, helping others on the forum is a great way to reinforce your learning and to get involved in the community.</p>
 <p><a href="https://gregcaporaso.github.io/q2book/front-matter/preface.html">Here</a> is a high-level introduction to microbiome analysis using QIIME 2. This introduction will go over common methods, metrics and approaches used for microbiome science.
 So grab a cup of your favorite hot beverage and let’s get started! ☕</p>
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-Bolyen2019" class="csl-entry">
 Bolyen, Evan, Jai Ram Rideout, Matthew R Dillon, Nicholas A Bokulich, Christian C Abnet, Gabriel A Al-Ghalith, Harriet Alexander, et al. 2019. <span>“Reproducible, Interactive, Scalable and Extensible Microbiome Data Science Using <span>QIIME</span> 2.”</span> <em>Nat. Biotechnol.</em> 37 (8): 852–57.
@@ -601,10 +597,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -674,7 +677,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/reference-keys.txt b/docs/no_toc/reference-keys.txt
index 22a7b8ae..045521d5 100644
--- a/docs/no_toc/reference-keys.txt
+++ b/docs/no_toc/reference-keys.txt
@@ -356,6 +356,7 @@ analysis-1
 more-resources-2
 microbiome-sequencing
 learning-objectives-19
+a-brief-introduction-to-microbiomes
 goals-of-amplicon-analysis
 microbiome-analysis-with-qiime-2
 itcr--omic-tool-glossary
diff --git a/docs/no_toc/references.html b/docs/no_toc/references.html
index fecdd9ed..224475ec 100644
--- a/docs/no_toc/references.html
+++ b/docs/no_toc/references.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>References | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="References | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="about-the-authors.html"/>
 
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>References | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,8 +535,8 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="references" class="section level1 unnumbered">
-<h1>References</h1>
+<div id="references" class="section level1 unnumbered hasAnchor">
+<h1>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div class="csl-entry">
@@ -893,10 +887,17 @@ <h1>References</h1>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -966,7 +967,7 @@ <h1>References</h1>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g116525eff64_0_96.png b/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g116525eff64_0_96.png
index f686f5e7..317ceac3 100644
Binary files a/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g116525eff64_0_96.png and b/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g116525eff64_0_96.png differ
diff --git a/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g11db7c97851_0_143.png b/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g11db7c97851_0_143.png
index 49ab6da3..36a5546a 100644
Binary files a/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g11db7c97851_0_143.png and b/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g11db7c97851_0_143.png differ
diff --git a/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_0.png b/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_0.png
index ec3812e3..6ff2510f 100644
Binary files a/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_0.png and b/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_0.png differ
diff --git a/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_10.png b/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_10.png
index 9a93847f..128de8d1 100644
Binary files a/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_10.png and b/docs/no_toc/resources/images/01-intro_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_10.png differ
diff --git a/docs/no_toc/resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_20.png b/docs/no_toc/resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_20.png
index 5ae74919..fb373f68 100644
Binary files a/docs/no_toc/resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_20.png and b/docs/no_toc/resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_20.png differ
diff --git a/docs/no_toc/resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_16.png b/docs/no_toc/resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_16.png
index 1020afe9..d9e59eab 100644
Binary files a/docs/no_toc/resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_16.png and b/docs/no_toc/resources/images/02-genomics_overview_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_gd422c5de97_0_16.png differ
diff --git a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_12.png b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_12.png
index 9562ab50..8ceccacf 100644
Binary files a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_12.png and b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_12.png differ
diff --git a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_45.png b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_45.png
index 0ba300e1..76fe3938 100644
Binary files a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_45.png and b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_45.png differ
diff --git a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_52.png b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_52.png
index 1e8fd3fd..a4987e08 100644
Binary files a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_52.png and b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_52.png differ
diff --git a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_70.png b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_70.png
index 767e3a89..5fddfea0 100644
Binary files a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_70.png and b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12709027cba_1_70.png differ
diff --git a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_1.png b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_1.png
index 40bead14..033e60e7 100644
Binary files a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_1.png and b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_1.png differ
diff --git a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13a7f78e577_0_0.png b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13a7f78e577_0_0.png
index fc3c2754..dc9693d0 100644
Binary files a/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13a7f78e577_0_0.png and b/docs/no_toc/resources/images/03-whats-metadata_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13a7f78e577_0_0.png differ
diff --git a/docs/no_toc/resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_0.png b/docs/no_toc/resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_0.png
index 7f0ec18a..988cc37d 100644
Binary files a/docs/no_toc/resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_0.png and b/docs/no_toc/resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_0.png differ
diff --git a/docs/no_toc/resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_5.png b/docs/no_toc/resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_5.png
index feef250f..50a4595d 100644
Binary files a/docs/no_toc/resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_5.png and b/docs/no_toc/resources/images/04-considerations-for-choosing_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g21f6c5d3981_0_5.png differ
diff --git a/docs/no_toc/resources/images/05-general-data-analysis-tools_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g20fbd76736e_0_0.png b/docs/no_toc/resources/images/05-general-data-analysis-tools_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g20fbd76736e_0_0.png
index 6fa5678d..9e3009f4 100644
Binary files a/docs/no_toc/resources/images/05-general-data-analysis-tools_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g20fbd76736e_0_0.png and b/docs/no_toc/resources/images/05-general-data-analysis-tools_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g20fbd76736e_0_0.png differ
diff --git a/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_45.png b/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_45.png
index 84eed6a1..d6a62753 100644
Binary files a/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_45.png and b/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_45.png differ
diff --git a/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_7.png b/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_7.png
index beb1063c..8c1eb5c7 100644
Binary files a/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_7.png and b/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_7.png differ
diff --git a/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_0.png b/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_0.png
index ba43bcf9..ac8cf2af 100644
Binary files a/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_0.png and b/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_0.png differ
diff --git a/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_10.png b/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_10.png
index 55163dbc..a6f9b44f 100644
Binary files a/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_10.png and b/docs/no_toc/resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_10.png differ
diff --git a/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_51.png b/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_51.png
index 1b970ba2..4fd5f8a3 100644
Binary files a/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_51.png and b/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_51.png differ
diff --git a/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_12.png b/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_12.png
index 7f79d81e..f27c830b 100644
Binary files a/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_12.png and b/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_12.png differ
diff --git a/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15f36710259_0_0.png b/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15f36710259_0_0.png
index 9e7d5045..dc2c6b48 100644
Binary files a/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15f36710259_0_0.png and b/docs/no_toc/resources/images/07-microarray-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15f36710259_0_0.png differ
diff --git a/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_23.png b/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_23.png
index 658d5e3b..dd4273c8 100644
Binary files a/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_23.png and b/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_23.png differ
diff --git a/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_28.png b/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_28.png
index 27ff8ea0..42f10b6f 100644
Binary files a/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_28.png and b/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_28.png differ
diff --git a/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_45.png b/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_45.png
index 8faef501..d3fb87e6 100644
Binary files a/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_45.png and b/docs/no_toc/resources/images/08-annotating-genomes_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1b625723c80_0_45.png differ
diff --git a/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_71.png b/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_71.png
index ec44e041..e00dc98a 100644
Binary files a/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_71.png and b/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_71.png differ
diff --git a/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_6.png b/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_6.png
index 3c952005..33d46b89 100644
Binary files a/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_6.png and b/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_6.png differ
diff --git a/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_18.png b/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_18.png
index c126ef8c..44f466db 100644
Binary files a/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_18.png and b/docs/no_toc/resources/images/09-DNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_18.png differ
diff --git a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_51.png b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_51.png
index c550d923..45203c0b 100644
Binary files a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_51.png and b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_51.png differ
diff --git a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_13.png b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_13.png
index 1acea2a9..9437f597 100644
Binary files a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_13.png and b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_13.png differ
diff --git a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_28.png b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_28.png
index 15e6afd8..10ab254d 100644
Binary files a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_28.png and b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_28.png differ
diff --git a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_38.png b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_38.png
index ed3c34ac..73fe5600 100644
Binary files a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_38.png and b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_38.png differ
diff --git a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_43.png b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_43.png
index 89760270..f8062784 100644
Binary files a/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_43.png and b/docs/no_toc/resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_43.png differ
diff --git a/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_76.png b/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_76.png
index 023b08b9..496d3bc9 100644
Binary files a/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_76.png and b/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_76.png differ
diff --git a/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_80.png b/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_80.png
index b9fb15e6..1c54280f 100644
Binary files a/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_80.png and b/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_80.png differ
diff --git a/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_0.png b/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_0.png
index cb172034..75125203 100644
Binary files a/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_0.png and b/docs/no_toc/resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_0.png differ
diff --git a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_56.png b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_56.png
index 101a13f7..e0c5ccfb 100644
Binary files a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_56.png and b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_56.png differ
diff --git a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_47.png b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_47.png
index cb9b715d..8a309e3e 100644
Binary files a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_47.png and b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_47.png differ
diff --git a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_59.png b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_59.png
index 26e44e0b..42004c95 100644
Binary files a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_59.png and b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_59.png differ
diff --git a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_65.png b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_65.png
index f8b1905f..e6fc0859 100644
Binary files a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_65.png and b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_65.png differ
diff --git a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_72.png b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_72.png
index 5f59f49d..da929e46 100644
Binary files a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_72.png and b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_72.png differ
diff --git a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_23.png b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_23.png
index 9546fd17..f63f3e30 100644
Binary files a/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_23.png and b/docs/no_toc/resources/images/10a-bulk-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_23.png differ
diff --git a/docs/no_toc/resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_1.png b/docs/no_toc/resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_1.png
index bbcbc9e2..22098222 100644
Binary files a/docs/no_toc/resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_1.png and b/docs/no_toc/resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_1.png differ
diff --git a/docs/no_toc/resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_0.png b/docs/no_toc/resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_0.png
index 32e0ad8e..98267078 100644
Binary files a/docs/no_toc/resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_0.png and b/docs/no_toc/resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_0.png differ
diff --git a/docs/no_toc/resources/images/10c-spatial-transcriptomics_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g258b14267ad_278_14.png b/docs/no_toc/resources/images/10c-spatial-transcriptomics_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g258b14267ad_278_14.png
index 95c834da..98df09a2 100644
Binary files a/docs/no_toc/resources/images/10c-spatial-transcriptomics_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g258b14267ad_278_14.png and b/docs/no_toc/resources/images/10c-spatial-transcriptomics_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g258b14267ad_278_14.png differ
diff --git a/docs/no_toc/resources/images/11-chromatin_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_0.png b/docs/no_toc/resources/images/11-chromatin_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_0.png
index 02514cc0..579155c2 100644
Binary files a/docs/no_toc/resources/images/11-chromatin_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_0.png and b/docs/no_toc/resources/images/11-chromatin_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_0.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_66.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_66.png
index 71e0e0ac..b27cc152 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_66.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_66.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_23.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_23.png
index e570f830..2774b4ec 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_23.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_23.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_0.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_0.png
index 807f9eef..94f47602 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_0.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_0.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_43.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_43.png
index ecf96491..1dc2a05a 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_43.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_43.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png
index 572e00d9..90377fca 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_50.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_57.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_57.png
index 6ce14d83..0240d2ca 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_57.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_57.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_63.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_63.png
index 396cde37..2d6c406b 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_63.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g1567d947351_0_63.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png
index 53e9c640..4c5deba2 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_18.png differ
diff --git a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_26.png b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_26.png
index ab65234a..fd098aa8 100644
Binary files a/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_26.png and b/docs/no_toc/resources/images/11a-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b5d73942d_0_26.png differ
diff --git a/docs/no_toc/resources/images/11b-sc-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_41.png b/docs/no_toc/resources/images/11b-sc-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_41.png
index 2a3d1d38..09dd36a1 100644
Binary files a/docs/no_toc/resources/images/11b-sc-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_41.png and b/docs/no_toc/resources/images/11b-sc-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_41.png differ
diff --git a/docs/no_toc/resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_61.png b/docs/no_toc/resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_61.png
index 35751837..3adec39b 100644
Binary files a/docs/no_toc/resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_61.png and b/docs/no_toc/resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_61.png differ
diff --git a/docs/no_toc/resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_18.png b/docs/no_toc/resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_18.png
index 619ec49d..ffba6bc2 100644
Binary files a/docs/no_toc/resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_18.png and b/docs/no_toc/resources/images/11c-ChIP-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_18.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a3974d9533_0_10.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a3974d9533_0_10.png
index 1653f826..2869d982 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a3974d9533_0_10.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a3974d9533_0_10.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_0.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_0.png
index 86bd5ca3..8e190f75 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_0.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_0.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_14.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_14.png
index 1349efb2..895cd7f7 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_14.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_14.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_22.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_22.png
index 031046d6..a64ff9ea 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_22.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_22.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_28.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_28.png
index f9e918fd..68df87b3 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_28.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_28.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_35.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_35.png
index 271d59f7..559e0d37 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_35.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_35.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_42.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_42.png
index 5c929aaf..f3779d79 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_42.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_42.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_7.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_7.png
index 125b6a5d..39b40a79 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_7.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6a1c838b1_1_7.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_0.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_0.png
index cee7372b..63ebfdf2 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_0.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_0.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_6.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_6.png
index ff6cc469..af85809a 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_6.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a6c0fa3d72_0_6.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a7dbc78b0e_0_0.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a7dbc78b0e_0_0.png
index 941b9be4..4f1186eb 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a7dbc78b0e_0_0.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2a7dbc78b0e_0_0.png differ
diff --git a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2aeb8138211_0_0.png b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2aeb8138211_0_0.png
index d27fdfed..bb441791 100644
Binary files a/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2aeb8138211_0_0.png and b/docs/no_toc/resources/images/11d-CUT-and-RUN_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2aeb8138211_0_0.png differ
diff --git a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_91.png b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_91.png
index db39a4f0..4efb5c70 100644
Binary files a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_91.png and b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_91.png differ
diff --git a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_10.png b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_10.png
index d6aa7d51..cf3e9487 100644
Binary files a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_10.png and b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_10.png differ
diff --git a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_0.png b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_0.png
index bf354a53..c2a670e1 100644
Binary files a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_0.png and b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_0.png differ
diff --git a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_42.png b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_42.png
index 5dc8a490..2acf0b5c 100644
Binary files a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_42.png and b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_42.png differ
diff --git a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_5.png b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_5.png
index 76fde7ca..ca5d8ac9 100644
Binary files a/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_5.png and b/docs/no_toc/resources/images/12-methylation_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g17e24e1c00a_0_5.png differ
diff --git a/docs/no_toc/resources/images/13-microbiome_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_0_0.png b/docs/no_toc/resources/images/13-microbiome_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_0_0.png
index 57a8b82a..0804178a 100644
Binary files a/docs/no_toc/resources/images/13-microbiome_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_0_0.png and b/docs/no_toc/resources/images/13-microbiome_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_0_0.png differ
diff --git a/docs/no_toc/rna-methods-overview.html b/docs/no_toc/rna-methods-overview.html
index 40524924..222bedc7 100644
--- a/docs/no_toc/rna-methods-overview.html
+++ b/docs/no_toc/rna-methods-overview.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 11 RNA Methods Overview | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 11 RNA Methods Overview | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="whole-genome-or-exome-sequencing.html"/>
 <link rel="next" href="bulk-rna-seq-1.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 11 RNA Methods Overview | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,50 +535,50 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="rna-methods-overview" class="section level1" number="11">
-<h1><span class="header-section-number">Chapter 11</span> RNA Methods Overview</h1>
+<div id="rna-methods-overview" class="section level1 hasAnchor" number="11">
+<h1><span class="header-section-number">Chapter 11</span> RNA Methods Overview<a href="rna-methods-overview.html#rna-methods-overview" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. Some of it has been written with AI tools. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-9" class="section level2" number="11.1">
-<h2><span class="header-section-number">11.1</span> Learning Objectives</h2>
-<p><img src="resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_76.png" title="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection processes for gene expression assays. Compare and contrast the following methods: Bulk RNA-seq, Single cell RNA-seq, Gene expression microarrays" alt="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection processes for gene expression assays. Compare and contrast the following methods: Bulk RNA-seq, Single cell RNA-seq, Gene expression microarrays" width="100%" /></p>
+<div id="learning-objectives-9" class="section level2 hasAnchor" number="11.1">
+<h2><span class="header-section-number">11.1</span> Learning Objectives<a href="rna-methods-overview.html#learning-objectives-9" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_76.png" alt="Learning objectives This chapter will demonstrate how to: Understand the goals and data collection processes for gene expression assays. Compare and contrast the following methods: Bulk RNA-seq, Single cell RNA-seq, Gene expression microarrays" width="100%" /></p>
 </div>
-<div id="what-are-the-goals-of-gene-expression-analysis" class="section level2" number="11.2">
-<h2><span class="header-section-number">11.2</span> What are the goals of gene expression analysis?</h2>
+<div id="what-are-the-goals-of-gene-expression-analysis" class="section level2 hasAnchor" number="11.2">
+<h2><span class="header-section-number">11.2</span> What are the goals of gene expression analysis?<a href="rna-methods-overview.html#what-are-the-goals-of-gene-expression-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The goal of gene expression analysis is to quantify RNAs across the genome. RNA abundance indicates the extent to which each gene is being transcribed in a particular cell, which in turn suggests what activities the cell is carrying out and what it is responding to.</p>
-<p><img src="resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_0.png" title="The goal of gene expression analysis is to quantify RNAs on a genome wide level" alt="The goal of gene expression analysis is to quantify RNAs on a genome wide level" width="100%" /></p>
+<p><img src="resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g142c259a793_0_0.png" alt="The goal of gene expression analysis is to quantify RNAs on a genome wide level" width="100%" /></p>
 </div>
-<div id="comparison-of-rna-methods" class="section level2" number="11.3">
-<h2><span class="header-section-number">11.3</span> Comparison of RNA methods</h2>
+<div id="comparison-of-rna-methods" class="section level2 hasAnchor" number="11.3">
+<h2><span class="header-section-number">11.3</span> Comparison of RNA methods<a href="rna-methods-overview.html#comparison-of-rna-methods" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>We will discuss three general approaches to evaluating gene expression. RNA sequencing (whether bulk or single-cell) detects more targets than gene expression microarrays but is much more costly and computationally intensive. Gene expression microarrays generally have a lower dynamic range than RNA-seq but are much more cost-effective. Spatial transcriptomics is the newest of these methods and can relate gene expression to tissue regions and cell subpopulations.</p>
-<p><img src="resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_80.png" title="Gene expression microarrays are low cost and low computationally intensive. Bulk RNA-seq is higher cost, requires more computational resources but covers more targets than gene expression arrays. Single cell RNA-seq is higher cost, requires more computational resources but as opposed to Bulk RNA-seq gives single cell resolution." alt="Gene expression microarrays are low cost and low computationally intensive. Bulk RNA-seq is higher cost, requires more computational resources but covers more targets than gene expression arrays. Single cell RNA-seq is higher cost, requires more computational resources but as opposed to Bulk RNA-seq gives single cell resolution." width="100%" /></p>
-<div id="single-cell-rna-seq-scrna-seq" class="section level3" number="11.3.1">
-<h3><span class="header-section-number">11.3.1</span> Single-cell RNA-seq (scRNA-seq):</h3>
+<p><img src="resources/images/10-RNA_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g13438a9a5b2_0_80.png" alt="Gene expression microarrays are low cost and low computationally intensive. Bulk RNA-seq is higher cost, requires more computational resources but covers more targets than gene expression arrays. Single cell RNA-seq is higher cost, requires more computational resources but as opposed to Bulk RNA-seq gives single cell resolution." width="100%" /></p>
+<div id="single-cell-rna-seq-scrna-seq" class="section level3 hasAnchor" number="11.3.1">
+<h3><span class="header-section-number">11.3.1</span> Single-cell RNA-seq (scRNA-seq):<a href="rna-methods-overview.html#single-cell-rna-seq-scrna-seq" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li>Cost: scRNA-seq methods can be relatively expensive due to the need for specialized protocols and reagents. Droplet-based methods (e.g., 10x Genomics) are generally more cost-effective than full-length methods (e.g., SMART-seq) because they require fewer sequencing reads per cell.</li>
 <li>Experimental Goals: scRNA-seq is suitable when studying cellular heterogeneity and characterizing gene expression profiles at the single-cell level. It provides insights into cell types, cell states, and cell-cell interactions.</li>
 <li>Specific Requirements: scRNA-seq requires single-cell isolation techniques, and the choice of method depends on the desired cell throughput, desired coverage, and the need for full-length transcript information.</li>
 </ul>
 </div>
-<div id="bulk-rna-seq" class="section level3" number="11.3.2">
-<h3><span class="header-section-number">11.3.2</span> Bulk RNA-seq:</h3>
+<div id="bulk-rna-seq" class="section level3 hasAnchor" number="11.3.2">
+<h3><span class="header-section-number">11.3.2</span> Bulk RNA-seq:<a href="rna-methods-overview.html#bulk-rna-seq" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li>Cost: Bulk RNA-seq is generally more cost-effective compared to scRNA-seq because it requires fewer sequencing reads per sample. The cost primarily depends on the sequencing depth required.</li>
 <li>Experimental Goals: Bulk RNA-seq is appropriate for analyzing average gene expression profiles across a population of cells. It provides information on gene expression levels and can be used for differential gene expression analysis.</li>
 <li>Specific Requirements: Bulk RNA-seq requires a sufficient quantity of RNA from the sample, typically obtained through RNA extraction and purification.</li>
 </ul>
 </div>
-<div id="gene-expression-microarray" class="section level3" number="11.3.3">
-<h3><span class="header-section-number">11.3.3</span> Gene Expression Microarray:</h3>
+<div id="gene-expression-microarray" class="section level3 hasAnchor" number="11.3.3">
+<h3><span class="header-section-number">11.3.3</span> Gene Expression Microarray:<a href="rna-methods-overview.html#gene-expression-microarray" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li>Cost: Gene expression microarrays are usually less expensive compared to RNA-seq methods. The cost includes array production and hybridization.</li>
 <li>Experimental Goals: Microarrays are useful for profiling gene expression levels across a large number of genes in a cost-effective manner. They can be employed for differential gene expression analysis and identification of gene expression patterns.</li>
 <li>Specific Requirements: Microarrays require labeled cDNA or cRNA targets, and they are limited to the detection of known transcripts represented on the array platform.</li>
 </ul>
 </div>
-<div id="spatial-transcriptomics" class="section level3" number="11.3.4">
-<h3><span class="header-section-number">11.3.4</span> Spatial Transcriptomics:</h3>
+<div id="spatial-transcriptomics" class="section level3 hasAnchor" number="11.3.4">
+<h3><span class="header-section-number">11.3.4</span> Spatial Transcriptomics:<a href="rna-methods-overview.html#spatial-transcriptomics" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
 <li>Cost: Spatial transcriptomics methods can vary in cost depending on the technique used. Some methods involve additional steps and specialized equipment, making them relatively more expensive.</li>
 <li>Experimental Goals: Spatial transcriptomics allows the investigation of gene expression patterns within the context of tissue or cellular spatial organization. It provides spatial information on gene expression, enabling the identification of cell types and their interactions.</li>
@@ -596,10 +590,17 @@ <h3><span class="header-section-number">11.3.4</span> Spatial Transcriptomics:</
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -669,7 +670,7 @@ <h3><span class="header-section-number">11.3.4</span> Spatial Transcriptomics:</
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/search_index.json b/docs/no_toc/search_index.json
index 0f68b5de..78a4c151 100644
--- a/docs/no_toc/search_index.json
+++ b/docs/no_toc/search_index.json
@@ -1 +1 @@
-[["index.html", "Choosing Genomics Tools About this Course 0.1 Available course formats", " Choosing Genomics Tools May, 2024 About this Course This course is part of a series of courses for the Informatics Technology for Cancer Research (ITCR) called the Informatics Technology for Cancer Research Education Resource. This material was created by the ITCR Training Network (ITN) which is a collaborative effort of researchers around the United States to support cancer informatics and data science training through resources, technology, and events. This initiative is funded by the following grant: National Cancer Institute (NCI) UE5 CA254170. Our courses feature tools developed by ITCR Investigators and make it easier for principal investigators, scientists, and analysts to integrate cancer informatics into their workflows. Please see our website at www.itcrtraining.org for more information. 0.1 Available course formats This course is available in multiple formats which allows you to take it in the way that best suites your needs. You can take it for certificate which can be for free or fee. The material for this course can be viewed without login requirement on this Bookdown website. This format might be most appropriate for you if you rely on screen-reader technology. This course can be taken for free certification through Leanpub. This course can be taken on Coursera for certification here (but it is not available for free on Coursera). Our courses are open source, you can find the source material for this course on GitHub. "],["introduction.html", "Chapter 1 Introduction 1.1 Target Audience 1.2 Topics covered: 1.3 Motivation 1.4 Curriculum 1.5 How to use the course", " Chapter 1 Introduction This is a living course meaning it is constantly changing and being updated. The goal for this course is to be a “wikipedia” of omic data. 
If you’d like to contribute, you can file a pull request on GitHub if you are comfortable with that sort of thing or email csavonen@fredhutch.org to ask how to get started. 1.1 Target Audience The course is intended for students in the biomedical sciences and researchers who have been given data and don’t know what to do with it or would like an overview of the different genomic data types that are out there. This course is written for individuals who: Have genomic data and don’t know what to do with it. Want a basic overview of genomic data types. Want to find resources for processing and interpreting genomics data. 1.2 Topics covered: 1.3 Motivation Cancer datasets are plentiful, complicated, and hold untold amounts of information regarding cancer biology. Cancer researchers are working to apply their expertise to the analysis of these vast amounts of data but training opportunities to properly equip them in these efforts can be sparse. This includes training in reproducible data analysis methods. Often students and researchers need to utilize genomic data to reach the next steps of their research but may not have formal training in computational methods or the basics of the genomic data they are attempting to utilize. Often researchers receive their genomic data processed from another lab or institution, and although they are excited to gain insights from it to inform the next steps of their research, they may not have a practical understanding of how the data they have received came to be or what needs to be done with it. As an example, data file formats may not have been covered in their training, and the data they received seems unintelligible and not as straightforward as they hoped. This course attempts to give this researcher the basic bearings and resources regarding their data, in hopes that they will be equipped and informed about how to obtain the insights for their researcher they originally aimed to find. 
1.4 Curriculum Goal of this course: Equip learners with tutorials and resources so they can understand and interpret their genomic data in a way that helps them meet their goals and handle the data properly. This includes helping learners formulate questions they will need to ask others about their data What is not the goal Teach learners about choosing parameters or about the ins and outs of every genomic tool they might be interested in. This course is meant to connect people to other resources that will help them with the specifics of their genomic data and help learners have more efficient and fruitful discussions about their data with bioinformatic experts. 1.5 How to use the course This course is designed to be a jumping off point to more specific resources based on a genomic data type the learner has in mind (or currently on their computer). We encourage learners to follow links to resources we provide and feel free to jump around to chapters that are most useful for them. "],["a-very-general-genomics-overview.html", "Chapter 2 A Very General Genomics Overview 2.1 Learning Objectives 2.2 General informatics files", " Chapter 2 A Very General Genomics Overview 2.1 Learning Objectives In this chapter we are going to cover sequencing and microarray workflows at a very general high level overview to give you a first orientation. As we dive into specific data types and experiments, we will get into more specifics. Here we will cover the most common file formats. If you have a file format you are dealing with that you don’t see listed here, it may be specific to your data type and we will discuss that more in that data type’s respective chapter. We still suggest you go through this chapter to give you a basic understanding of commonalities of all genomic data types and workflows 2.1.1 What do genomics workflows look like? In the most general sense, all genomics data when originally collected is raw, it needs to undergo processing to be normalized and ready to use. 
Then normalized data is generally summarized in a way that is ready for it to be further consumed. Lastly, this summarized data is what can be used to make inferences and create plots and results tables. 2.1.2 Basic file formats Before we get into bioinformatic file types, we should establish some general file types that you likely have already worked with on your computer. These file types are used in all kinds of applications and are not specific to bioinformatics. 2.1.2.1 TXT - Text A text file is a very basic file format that contains text! 2.1.2.2 TSV - Tab Separated Values A tab separated values file is a text file that is good for storing a data table. It has rows and columns where each value is separated by (you guessed it), tabs. Most commonly, if your genomics data has been provided to you in a TSV or CSV file, it has been processed and summarized! It will be your job to know how it was processed and summarized. Here the literal ⇥ represents tabs, which are often invisible in your text editor depending on its preference settings. gene_id⇥sample_1⇥sample_2 gene_a⇥12⇥15, gene_b⇥13⇥14 2.1.2.3 CSV - Comma Separated Values A comma separated values file is just like a TSV file but instead of values being separated by tabs they are separated by… (you guessed it), commas! In its raw form, a CSV file might look like our example below (but if you open it with a program for spreadsheets, like Excel or Googlesheets, it will look like a table) gene_id, sample_1, sample_2, gene_a, 12, 15, gene_b, 13, 14 2.1.3 Sequencing file formats 2.1.3.1 SAM - Sequence Alignment Map SAM files are text based files that hold sequence reads and, typically, how each read aligns to a reference. The reads generally have not been quantified yet. For more about SAM files. 2.1.3.2 BAM - Binary Alignment Map BAM files are like SAM files but are compressed (made to take up less space on your computer). This means if you double click on a BAM file to look at it, it will look jumbled and unintelligible. 
You will need to convert it to a SAM file if you want to see it yourself (but this isn’t always necessary). 2.1.3.3 FASTA - “fast A” Fasta files are sequence files that can be either nucleotide or amino acid sequences. They look something like this (the example below illustrating a nucleotide sequence): &gt;SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT For more about fasta files. 2.1.3.4 FASTQ - “Fast q” A Fastq file is like a Fasta file except that it also contains information about the Quality of the read. By quality, we mean, how sure was the sequencing machine that each base was indeed called correctly? @SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT + !&#39;&#39;*((((***+))%%%++)(%%%%).1***-+*&#39;&#39;))**55CCF&gt;&gt;&gt;&gt;&gt;&gt;CCCCCCC65 For more about fastq files. Later in this course we will discuss the importance of examining the quality of your sequencing data and how to do that. If you received your data from a bioinformatics core it is possible that they’ve already done this quality analysis for you. Sequencing data that is not of high enough quality should not be trusted! It may need to be re-run entirely or may need extra processing (trimming) in order to make it more trustworthy. We will discuss this more in later chapters. 2.1.3.5 BCL - binary base call (BCL) sequence file format This type of sequence file is specific to Illumina data. In most cases, you will simply want to convert it to Fastq files for use with non-Illumina programs. More about BCL to Fastq conversion. 2.1.3.6 VCF - Variant Call Format VCF files are a further processed form of data than the sequence files we discussed above. VCF files are specifically for storing only where a particular sample’s sequences differ or are variant from the reference genome or each other. This will only be pertinent to you if you care about DNA variants. We will discuss this in the DNA seq chapter. For more on VCF files. 
2.1.3.7 MAF - Mutation Annotation Format MAF files are aggregated versions of VCF files. So for a group of samples for which each has a VCF file, your entire group of samples’ variants will be summarized in the form of a MAF file. For more on MAF files. 2.1.4 Microarray file formats 2.1.4.1 IDAT - intensity data file This is an Illumina microarray specific file that contains the chip image intensity information for each location on the microarray. It is a binary file, which means it will not be readable by double clicking and attempting to open the file directly. Currently, Illumina appears to suggest directly converting IDAT files into a GTC format. We advise looking into this package to help you do that. For more on IDAT files. 2.1.4.2 DAT - data file This is an Affymetrix microarray specific file parallel to the IDAT file in that it contains the image intensity information for each location on the microarray. It’s stored as pixels. For more on DAT files. 2.1.4.3 CEL This is an Affymetrix microarray specific file that is made from a DAT file but translated into numeric values. It is not normalized yet but can be normalized into a CHP file. For more on CEL files 2.1.4.4 CHP CHP files contain the gene-level and normalized data from an Affymetrix array chip. CHP files are obtained by normalizing and processing CEL files. For more about CHP files. 2.2 General informatics files At various points in your genomics workflows, you may need to use other types of files to help you annotate your data. We’ll also discuss some of these common files that you may encounter: 2.2.0.1 BED - Browser Extensible Data A BED file is a text file that has coordinates of genomic regions. The other columns that accompany the genomic coordinates are variable depending on the context. But every BED file contains the chrom, chromStart and chromEnd columns to start. 
A BED file might look like this: chrom chromStart chromEnd other_optional_columns chr1 0 1000 good chr2 100 3000 bad For more on BED files. 2.2.0.2 GFF/GTF General Feature Format/Gene Transfer Format A GFF file is a tab delimited file that contains information about genomic features. These types of files are available from databases and what you can use to annotate your data. You may see there are GFF2, GFF3, and GTF files. These only refer to different versions and variations. They generally have the same information. In general, GFF2 is being phased out so using GFF3 is generally a better bet unless the program or package you are using specifies it needs an older GFF2 version. A GFF file may look like this (borrowed example from Ensembl): 1 transcribed_unprocessed_pseudogene gene 11869 14409 . + . gene_id &quot;ENSG00000223972&quot;; gene_name &quot;DDX11L1&quot;; gene_source &quot;havana&quot;; gene_biotype &quot;transcribed_unprocessed_pseudogene&quot;; Note that it will be useful for annotating genes and what we know about them. For more about GTF and GFF files. 2.2.1 Other files * If you didn’t see a file type listed you are looking for, take a look at this list by the BROAD. Or, it may be covered in the data type specific chapters. "],["guidelines-for-good-metadata.html", "Chapter 3 Guidelines for Good Metadata 3.1 Learning Objectives 3.2 What are metadata? 3.3 How to create metadata?", " Chapter 3 Guidelines for Good Metadata 3.1 Learning Objectives 3.2 What are metadata? Metadata are critically important descriptive information about your data. Without metadata, the data themselves are useless or at best vastly limited. Metadata describe how your data came to be, what organism or patient the data are from and include any and every relevant piece of information about the samples in your data set. 
Metadata includes but isn’t limited to, the following example categories: At this time it’s important to note that if you work with human data or samples, your metadata will likely contain personal identifiable information (PII) and protected health information (PHI). It’s critical that you protect this information! For more details on this, we encourage you to see our course about data management. 3.3 How to create metadata? Where do these metadata come from? The notes and experimental design from anyone who played a part in collecting or processing the data and its original samples. If this includes you (meaning you have collected data and need to create metadata) let’s discuss how metadata can be made in the most useful and reproducible manner. 3.3.1 The goals in creating your metadata: 3.3.1.1 Goal A: Make it crystal clear and easily readable by both humans and computers! Some examples of how to make your data crystal clear: - Look out for typos and spelling errors! - Don’t use acronyms unless you need to and then if you do need to make sure to explain what the acronym means. - Don’t add extraneous information – perhaps items that are relevant to your lab internally but not meaningful to people outside of your lab. Either explain the significance of such information or leave it out. Make your data tidy. &gt; Tidy data is a standard way of mapping the meaning of a dataset to its structure. A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, variables and types. In tidy data: &gt; - Every column is a variable. &gt; - Every row is an observation. &gt; - Every cell is a single value. 3.3.1.2 Goal B: Avoid introducing errors into your metadata in the future! Toward these two goals, this excellent article by Broman &amp; Woo discusses metadata design rules. We will very briefly cover the major points here but highly suggest you read the original article. 
Be Consistent - Whatever labels and systems you choose, use them universally. This not only means in your metadata spreadsheet but also anywhere you are discussing your metadata variables. Choose good names for things - avoid spaces, special characters, or lab-internal jargon. Write Dates as YYYY-MM-DD - this is a global standard and less likely to be messed up by Microsoft Excel. No Empty Cells - If a particular field is not applicable to a sample, you can put NA but empty cells can lead to formatting errors or just general confusion. Put Just One Thing in a Cell - resist the urge to combine variables into one, you have no limit on the number of metadata variables you can make! Make it a Rectangle - This is the easiest way to read data, for a computer and a human. Have your samples be the rows and variables be columns. Create a Data Dictionary - Have somewhere that you describe what your metadata mean in detailed paragraphs. No Calculations in the Raw Data Files - To avoid mishaps, you should always keep a clean, original, raw version of your metadata that you do not add extra calculations or notes to. Do Not Use Font Color or Highlighting as Data - This only adds confusion for others if they don’t understand your color coding scheme. Instead create a new variable for anything you might be tempted to color code. Make Backups - Metadata are critical, you never want to lose them because of spilled coffee on a computer. Keep the original backed up in multiple places. We recommend writing your metadata in something like GoogleSheets because it is both free and also saved online so that it is safe from computer crashes. Use Data Validation to Avoid Errors - set data types to have googlesheets or excel check that the data in the columns is the type of data it expects for a given variable. Note that it is very dangerous to open gene data with Excel. 
According to Ziemann, Eren, and El-Osta (2016), approximately one-fifth of papers with Excel gene lists have errors. This happens because Excel wants to interpret everything as a date. We strongly caution against opening (and saving afterward) gene data in Excel. 3.3.2 To recap: If you are not the person who has the information needed to create metadata, or you believe that another individual already has this information, make sure you get ahold of the metadata that correspond to your data. It will be critical for you to have to do any sort of meaningful analysis! References "],["considerations-for-choosing-tools.html", "Chapter 4 Considerations for choosing tools 4.1 Learning Objectives 4.2 Overview 4.3 Coming to a decision 4.4 More resources", " Chapter 4 Considerations for choosing tools 4.1 Learning Objectives 4.2 Overview In this course, we will introduce you to the fundamentals of various data types and give you advice about choosing tutorials and tools whenever possible. However, it is critical to note that there is no “one size fits all” when it comes to genomic data decisions. Instead, our goals are to equip you with the knowledge you need as well as the questions you need to ask yourself (or others) when making decisions about your genomics data. We will discuss the following considerations you should gather information and otherwise ponder when comparing one or more tools for your analysis: 4.2.1 Is this tool appropriate for your data type? Certain tools are built for certain kinds of data. In each data-type-specific chapter we will attempt to point you tools that are appropriate for the given data type. However, note that some tools also might require tweaks in parameters for non-standard data collection methods. If you were not sure of the data collection methods used for your data type, be sure to follow the data type specific advice in the chapter to find out the information about your data that you need to know to make an informed decision. 
4.2.2 Is this tool appropriate for your scientific question? Some tools may be appropriate for the general data type, but might mask information you will need to answer your particular scientific question or hypothesis. For example, for RNA-seq if you are interested in splice variants, you may not be able to use certain alignment tools that do not differentiate between splice variants. Be sure to make your goals and scientific questions clear when asking for advice or guidance. Some tools may be applicable to certain scientific questions, but other accommodations or preprocessing may need to be done 4.2.3 Is this tool in an interface or programming language you feel comfortable with? Genomics and informatics tools can be classified into two groups based on how you interact with them. These groups are 1) command line or 2) graphics user interface (GUI). GUIs are tools that you can use by clicking and pointing with your mouse whereas command line tools require input through writing out commands. Command line tools often lend to greater reproducibility of an analysis since a script can have all the steps needed to re-run analysis. This makes it so you could re-run and reproduce your results with one command instead of lots of clicking various buttons in particular order as you would need to do with a GUI based tool. Your level of comfort or willingness/time available to learn a programming language like R or Python will influence what tool options you have. If you are unfamiliar and uncomfortable writing in R, Python, or Bash scripting, this will influence what tools you have available to you or whether you will need to enlist more outside help. If you are interested in learning to use command line, we have many resources and recommendations for you to use for learning in this next chapter. However, if you do not have the bandwidth or motivation to learn how to code, you will want to gravitate toward tools that have GUIs. 4.2.4 How much computing power do you have? 
Some tools require a lot more computing resources (or runtime) than others. Many institutions have cloud computing resources or high powered computing clusters for your use. We refer you to our Computing Course for more information about this. But your computing budget, access, and time allotment may influence what tools you would like to use for a project. For example, for RNA seq data alignment, traditional aligners that use the genome take an order of magnitude greater amount of time to run than quantifying transcripts with pseudo alignment based tools. For many applications pseudoaligners are perfectly appropriate and efficient choices that can be run on a laptop. But if you prefer a traditional aligner because you are interested in something that is not detected by pseudoaligners such as splice variants, then you may want to look into using some computing resources for this task. All these decisions need to be weighed in balance with each other. 4.2.5 Are there benchmarking papers that compare this tool to other options? Some tools and their algorithms have been more thoroughly examined and tested than others. And this doesn’t always align to a tool’s popularity. Seek out the literature and what studies have been done comparing this tool to others like it. Keep in mind the tool developer’s own bias if the paper is coming directly from the group or individual who is the creator of the tool. Developers will be more likely to understand and know how to tweak parameters of their own tool properly, while not necessarily spending as much time testing and adjusting tools made by others. This concept has sometimes been called the “Continental Breakfast Included” concept. 4.2.6 Is the tool well documented and usable? Well documented and usable tools can be very powerful. Poorly documented tools may lead to unknown parameters or other mishandling of the data if proper usage has not been made clear by the tool developers and maintainers. 
Good understanding of what a tool is doing with the data you give it is perhaps more important than using fancy algorithms that are unclear. Not only does documentation and usability increase your ability to use a tool, but your analysis will be more reproducible if others can also understand the tools that you used. The existence of forums and user groups for particular tools, not only makes it a useful resource for you for analysis, troubleshooting and interpretation of your results, but it also indicates a particular drive for the tool to continue to be maintained and developed overtime. 4.2.7 Is the tool well maintained? If a tool is actively being maintained this will aid in the reproducibility of your results. Tools on GitHub (an open-source platform for software) or other repositories often indicate when latest updates to a tool were made. Ideally updates are being made regularly to the tool, but a lack of updates does not speak well for the future existence of the tool. A tool that is not well maintained or supported may deprecate and make it increasingly difficult if not possible to reproduce, re-run or further develop your analysis. 4.2.8 Is the tool generally accepted by the field? While tool popularity should not be the only consideration when choosing a tool, it is an aspect that can influence communication or acceptance of your results. All things being equal, it can be better to choose a tool that is more accepted by the community as tried and true, and well benchmarked as opposed to the bleeding edge technology that may have not been truly scrutinized yet. In an analysis it is perhaps more valuable to know and weigh the known limitations of an older tool than to use a newer tool whose limitations may not have been identified yet (but it certainly will have its own limitations identified in time). 4.3 Coming to a decision It’s important to note that the questions we will discuss here need to be considered in balance of one another. 
Rarely should you make a decision about a tool without considering all of these items together. For example, some tools may have better benchmarking but if it is more computationally costly and you do not have access to the necessary computing resources to run the tool, then you may need to consider other options. 4.4 More resources A longer list of tools and resources can be found here DataTrail curriculum Introduction to Reproducibility Advanced Reproducibility in Cancer Informatics Computing in Cancer Informatics "],["general-data-analysis-tools.html", "Chapter 5 General Data Analysis Tools 5.1 Learning Objectives 5.2 Command Line vs GUI 5.3 More resources", " Chapter 5 General Data Analysis Tools 5.1 Learning Objectives 5.2 Command Line vs GUI When using computers there are two different ways you can tell a computer program what you want it to do. You can use a Graphics User Interface (abbreviated as GUI) where you point and click buttons or you can use a Command Line Interface where you type in commands and write scripts that tell the program what you want it to do. Command Line Interfaces require a bit more time to learn and get used to, but they are generally easier to make reproducible, because every step that you are using in an analysis can be written in a script. Graphics User Interfaces can be more intuitive to pick up quickly, but it can be difficult to repeat an analysis with them in the exact same way. If you know you will be doing the same analysis many times (either with different or the same samples), it is a good use of your time to make sure that you learn how to use Command Line tools. We will discuss some of the most commonly used Command line tools here. 5.2.1 Bash Bash is a command language used by a lot of computers and programs. Many of the same items that you might do every day on your computer by clicking on various items on your desktop and menus, you can also perform using bash. 
On a Mac computer, you can use bash commands by finding your Terminal window. Go to your search bar and search for the Terminal. You may want to keep this application handy. In Windows, you can use bash commands by searching for the Command Prompt application. Go to your search bar and search for Command Prompt. You may want to keep this application handy. 5.2.2 R R is a program commonly used for statistics and data analysis. It’s free and has lots of R packages built for genomics analysis purposes. Many of these packages have been highlighted in this course or otherwise listed in our tool glossary. 5.2.2.1 Resources for learning R 5.2.2.1.1 R and Tidyverse Swirl, an interactive tutorial R for Data Science Tidyverse skills for Data Science by Carrie Wright. Handy R cheatsheets R Cookbook Second Edition Advanced R R for Epidemiology - has generally good R advice O’Reilly books available through Seattle Public Library 5.2.2.1.2 R notebooks R Markdown Tutorial on R, RStudio and R Markdown Handy R cheatsheets R Notebooks tutorial 5.2.2.1.3 R and Genomics Intro to R and Tidyverse course and exercises from the Childhood Cancer Data Lab. Refine.bio examples from the Childhood Cancer Data Lab. Biostar Handbook: A Beginner’s Guide to Bioinformatics 5.2.3 Python Python is a programming language that is also used for data analysis among many other purposes. It can be a very powerful development tool. Some of its packages have been highlighted in this course or are otherwise listed in our tool glossary. 5.2.3.1 Resources for learning python Python Data Science Handbook Python for Biologists 5.3 More resources A longer list of tools and resources can be found here DataTrail curriculum Introduction to Reproducibility Advanced Reproducibility in Cancer Informatics Computing in Cancer Informatics "],["sequencing-data.html", "Chapter 6 Sequencing Data 6.1 Learning Objectives 6.2 How does sequencing work? 
6.3 Sequencing concepts 6.4 Very General Sequencing Workflow", " Chapter 6 Sequencing Data This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 6.1 Learning Objectives In this section, we are going to discuss generalities that apply to all sequencing data. This is meant to be a “primer” for you which data-type specific chapters will build off of to give you more specific and practical steps and advice in regards to your data type. 6.2 How does sequencing work? Sequencing methods, whether they are targeting DNA, transcriptomes, or some other target of the genome, have some commonalities in the steps as well as what types of biases and data generation artifacts to look out for. All sequencing experiments start out with the extraction of the biological material of interest. This biological material will be processed in some way to isolate to the genomic target of interest (we will cover the various techniques for this in more detail in each respective data chapter since it is highly specific to the data type). This set of processing steps will lead up to library generation – adding a way to catalog what molecules came from where. Sometimes for this library prep the sequences need to be fragmented before hand and an adapter bound to them. The resulting sample material is often a very small quantity, which means Polymerase Chain Reaction (PCR) needs to be used to amplify the material to a quantity large enough to be reliably sequenced. We will talk about how this very common method not only amplifies the sequences we want to read but amplifies sequence method biases that we would like to avoid. At the end of this process, base sequences are called for the samples (with varying degrees of confidence), creating huge amounts of data and what hopefully contains valuable research insights. 6.3 Sequencing concepts 6.3.1 Inherent biases Sequences are not all sequenced or amplified at the same rate. 
In a perfect world, we could take a simple snapshot of the genome we are interested in and know exactly what and how many sequences were in a sample. But in reality, sequencing methods and the resulting data always have some biases we have to be aware of and hopefully use methods that attempt to mitigate the biases. 6.3.1.1 GC bias You may recall that with nucleotides: adenine binds with thymine and guanine binds with cytosine. But, the guanine-cytosine bond (GC) has 3 hydrogen bonds whereas the adenine-thymine bond (AT) has only 2 bonds. This means that the GC bond is stickier (to put it scientifically) and needs higher temperatures to unbind. The sequencing and PCR amplification process involves cycling through temperatures and binding and unbinding of sequences which means that if a sequence has a lot of G’s and C’s (high GC content) it will unbind at a different temperatures than a sequence of low GC content. 6.3.1.2 Sequence complexity Nonrepeating sequences are harder to sequence and amplify than repeating sequences. This means that the complexity of a target sequence influences the PCR amplification and detection. 6.3.1.3 Length bias Longer sequences – whether they represent long sequence variants, long transcripts, or etc, are more likely to be identified than shorter ones! So if you are attempting to quantify the presence of a sequence, a longer sequence is much more likely to be counted more often. 6.3.2 PCR Amplification All of the above biases are amplified when the sequences are being amplified! You can picture that if each of these biases have a certain effect for one copy, then as PCR steps copy the sequence exponentially, the error is also being multiplied! PCR amplification is generally a necessary part of the process. But there are tools that allow you to try to combat the biases of PCR amplification in your data analysis. 
These tools will be dependent on the type of sequencing methods you are using and will be something that is discussed in each data type chapter. 6.3.3 Depth of coverage The depth of sequencing refers to how many times on average a particular base is sequenced. Obviously the more times something is sequenced, the more you can be confident that the base call is accurate. However, sequencing at greater depths also takes more time and money. Depending on your sequencing goals and methods there is an appropriate level of depth that is needed. Coverage on the other hand has to do with how much of the target is covered. If you are doing Whole Genome Sequencing, what percentage of the whole genome were you able to sequence? You may realize how depth is related to coverage, in that the greater depth of sequencing you use the more likely you are to also cover more of the genome. As discussed in relation to the biases, some part of the genome are harder to reach than others, so by reading at greater depths some of those “hard to read” parts of the genome will be able to be covered. 6.3.4 Quality controls Sequencing bases involves some error/confidence rate. As mentioned, some parts of the genome are harder to read than others. Or, sometimes your sequencing can be influenced by poor quality sample that has degraded. Before you jump in to further analyzing your data, you will want to investigate the quality of the sequencing data you’ve collected. The most common and well-known method for assessing sequencing quality controls is FASTQC. FASTQC creates an abundance of sequencing quality control reports from fastq files. These reports need to be interpreted within the context of your sequencing methods, samples, and experimental goals. Often bioinformatics cores are good to contact about these reports (they may have already run FASTQC on your data if that is where you obtained your data initially). 
They can help you wade through the flood of quality control reports printed out by FASTQC. FASTQC also has great documentation that can attempt to guide you through report interpretation. This also includes examples of good and bad FASTQC reports. But note that all FASTQC report interpretations must be done relative to the experiment that you have done. In other words, there is not a one size fits all quality control cutoffs for your FASTQC reports. The failure/success icons FASTQC reports back are based on defaults that may not be accurate or applicable to your data, so further investigation and consultation is warranted before you decided to trust or pitch your sequencing data. 6.3.5 Alignment Once you have your reads and you find them reasonably trustworthy through quality control checks, you will want to align them to your reference. The reference you align your sequences to will depend on the data type you have: a reference genome, a reference transcriptome, something else? Traditional aligners - Align your data to a reference using standard alignment algorithms. Can be very computationally intensive. Pseudo aligners - much faster and the trade off for accuracy is often negligible (but again is dependent on the data you are using). TODO: considerations for alignment. 6.3.6 Single End vs Paired End Sequencing can be done single-end or paired-end. Paired end means the primers are going to bind to both sides of a sequence. This can help you avoid some 3’ bias and give you more complete coverage of the area you are sequencing. But, as you may guess, pair-end read sequencing is more expensive than single end. You will want to determine whether your sequencing is paired end or single end. If it is paired end you will likely see file names that indicate this. You should have pairs of files that may or may not be labeled with _1 and _2 or _F and _R. We will discuss file nomenclature more specifically as it pertains to different data types in the upcoming chapters. 
6.4 Very General Sequencing Workflow In the data type specific chapters, we will cover the sequencing data workflows and file formats in more detail. But in the most general sense, sequencing workflows look like this: 6.4.1 Sequencing file formats 6.4.1.1 SAM - Sequence Alignment Map SAM files are text based files that hold sequence reads and, typically, how each read aligns to a reference. The reads generally have not been quantified yet. For more about SAM files. 6.4.1.2 BAM - Binary Alignment Map BAM files are like SAM files but are compressed (made to take up less space on your computer). This means if you double click on a BAM file to look at it, it will look jumbled and unintelligible. You will need to convert it to a SAM file if you want to see it yourself (but this isn’t always necessary). 6.4.1.3 FASTA - “fast A” Fasta files are sequence files that can be either nucleotide or amino acid sequences. They look something like this (the example below illustrating a nucleotide sequence): &gt;SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT For more about fasta files. 6.4.1.4 FASTQ - “Fast q” A Fastq file is like a Fasta file except that it also contains information about the Quality of the read. By quality, we mean, how sure was the sequencing machine that each base was indeed called correctly? @SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT + !&#39;&#39;*((((***+))%%%++)(%%%%).1***-+*&#39;&#39;))**55CCF&gt;&gt;&gt;&gt;&gt;&gt;CCCCCCC65 For more about fastq files. Later in this course we will discuss the importance of examining the quality of your sequencing data and how to do that. If you received your data from a bioinformatics core it is possible that they’ve already done this quality analysis for you. Sequencing data that is not of high enough quality should not be trusted! It may need to be re-run entirely or may need extra processing (trimming) in order to make it more trustworthy. 
We will discuss quality control more in later chapters. 6.4.1.5 BCL - binary base call (BCL) sequence file format This type of sequence file is specific to Illumina data. In most cases, you will simply want to convert it to Fastq files for use with non-Illumina programs. More about BCL to Fastq conversion. 6.4.1.6 VCF - Variant Call Format VCF files are a further processed form of data than the sequence files we discussed above. VCF files specifically store only the positions where a particular sample’s sequences differ from the reference genome or from each other. This will only be pertinent to you if you care about DNA variants. We will discuss this in the DNA seq chapter. For more on VCF files. 6.4.1.7 MAF - Mutation Annotation Format MAF files are aggregated versions of VCF files. So for a group of samples for which each has a VCF file, your entire group of samples’ variants will be summarized in the form of a MAF file. For more on MAF files. 6.4.2 Other files * If you didn’t see the file type you are looking for, take a look at this list by the BROAD. Or, it may be covered in the data type specific chapters. "],["microarray-data.html", "Chapter 7 Microarray Data 7.1 Learning Objectives 7.2 Summary of microarrays 7.3 How do microarrays work? 7.4 What types of arrays are there? 7.5 General processing of microarray data 7.6 Very General Microarray Workflow 7.7 General informatics files", " Chapter 7 Microarray Data This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 7.1 Learning Objectives 7.2 Summary of microarrays Microarrays have been in use since before high throughput sequencing methods became more affordable and widespread, but they still can be an effective and affordable tool for genomic assays. Depending on your goals, a microarray may be a suitable choice for your genomic study. 7.3 How do microarrays work? All microarrays work on hybridization to sets of oligonucleotides on a chip. 
However, the preparation of the samples and the oligonucleotides’ hybridization targets vary depending on the assay and goals. In principle, oligonucleotide probes are designed for different targets, and sets of probes designed for the same target are placed together. On the whole chip, these probes are arranged in a grid-like design so that after a sample is hybridized to them, you can measure how much of each target is present by taking an image and knowing which target each location is designed to detect. 7.3.1 Pros: Microarrays are much more affordable than high throughput sequencing which can allow you to run more samples and have more statistical power (Tarca, Romero, and Draghici 2006; ALSF 2019). Microarrays take less time to process than most high throughput sequencing methods (Tarca, Romero, and Draghici 2006; ALSF 2019). Microarrays are generally less computationally intensive to process and you can get your results more quickly (Tarca, Romero, and Draghici 2006; ALSF 2019). Microarrays are generally as good as sequencing methods for detecting clinical endpoints (W. Zhang et al. 2015). 7.3.2 Cons: Microarray chips can only measure the targets they are designed for, and cannot be used for exploratory purposes (W. Zhang et al. 2015). Microarrays’ probe designs can only be as up to date as the genome they were designed against at the time (Mantione et al. 2014; ALSF 2019). Microarrays do not escape oligonucleotide biases like GC content and sequence composition biases (ALSF 2019). 7.4 What types of arrays are there? 7.4.1 SNP arrays Single nucleotide polymorphism arrays are designed to target and measure DNA variants. When the sample is hybridized, the amount of fluorescence detected can be interpreted to indicate the presence of the variant and whether the variant is homozygous or heterozygous. The samples prepped for SNP arrays therefore need to be DNA samples. 
7.4.1.1 Examples: The 1000 genomes project is a large collection of SNP array data from many populations around the world and is available for download. 7.4.2 Gene expression arrays Gene expression arrays are designed to target and measure relative transcript abundance levels. 7.4.2.1 Examples: refine.bio is the largest collection of publicly available, already normalized gene expression data (including gene expression microarrays). Getting started in gene expression microarray analysis (Slonim and Yanai 2009). Microarray and its applications (2012). Analysis of microarray experiments of gene expression profiling (Tarca, Romero, and Draghici 2006). 7.4.3 DNA methylation arrays DNA methylation can also be measured by microarray. To detect methylated cytosines (5mC), DNA samples are prepped using bisulfite conversion. This converts unmethylated cytosines into uracils and leaves methylated cytosines untouched. Probes are then designed to bind to either the uracil or the cytosine, representing the unmethylated and methylated cytosines respectively. A ratio of the fluorescence signal can be used to identify the relative abundance of the methylated and unmethylated versions of the sequence. Additionally, 5-hydroxymethylated cytosines (5hmC) can be detected by oxidative bisulfite sequencing (Booth et al. 2013). Note that bisulfite conversion alone will not distinguish between 5mC and 5hmC, though these may often indicate different biological mechanisms. 7.5 General processing of microarray data After scanning, microarray data starts as an image that needs to be quantified, normalized and further corrected and edited based on the most current genome and probe annotation. As noted above, microarrays do not escape the base sequence biases that accompany nearly all genomic assays. 
The normalization methods you use ideally will mitigate these sequence biases and also make sure to remove probes that may be outdated or bind to multiple places on the genome. The tools and methods by which you normalize and correct the microarray data will depend not only on the type of microarray assay you are performing (gene expression, SNP, methylation), but most of all on what kind of microarray chip design/platform you are using. 7.5.1 Examples Refine.bio describes their processing methods. Brainarray keeps up to date microarray annotation for all kinds of platforms. 7.5.2 Microarray Platforms There are many microarray chip designs out there designed to target different things. Three of the largest commercial manufacturers have ready to use microarrays you can purchase. You can also design microarrays to hit your own targets of interest. Here are full lists of platforms that have been published on Gene Expression Omnibus. Affymetrix platforms. Agilent platforms. Illumina platforms. 7.6 Very General Microarray Workflow In the data type specific chapters, we will cover the microarray workflow and file formats in more detail. But in the most general sense, microarray workflows look like this. Note that the exact file formats are specific to the chip brand and type you use (e.g. Illumina, Affymetrix, Agilent, etc.): 7.6.1 Microarray file formats 7.6.1.1 IDAT - intensity data file This is an Illumina microarray specific file that contains the chip image intensity information for each location on the microarray. It is a binary file, which means it will not be readable by double clicking and attempting to open the file directly. Currently, Illumina appears to suggest directly converting IDAT files into a GTC format. We advise looking into this package to help you do that. For more on IDAT files. 
7.6.1.2 DAT - data file This is an Affymetrix microarray specific file parallel to the IDAT file in that it contains the image intensity information for each location on the microarray. It’s stored as pixels. For more on DAT files. 7.6.1.3 CEL This is an Affymetrix microarray specific file that is made from a DAT file but translated into numeric values. It is not normalized yet but can be normalized into a CHP file. For more on CEL files. 7.6.1.4 CHP CHP files contain the gene-level and normalized data from an Affymetrix array chip. CHP files are obtained by normalizing and processing CEL files. For more about CHP files. 7.7 General informatics files At various points in your genomics workflows, you may need to use other types of files to help you annotate your data. We’ll also discuss some of these common files that you may encounter: 7.7.0.1 BED - Browser Extensible Data A BED file is a text file that has coordinates to genomic regions. The other columns that accompany the genomic coordinates vary depending on the context. But every BED file starts with the chrom, chromStart and chromEnd columns. A BED file might look like this: chrom chromStart chromEnd other_optional_columns chr1 0 1000 good chr2 100 3000 bad For more on BED files. 7.7.0.2 GFF/GTF General Feature Format/Gene Transfer Format A GFF file is a tab delimited file that contains information about genomic features. These types of files are available from databases and are what you can use to annotate your data. You may see there are GFF2, GFF3, and GTF files. These only refer to different versions and variations. They generally have the same information. In general, GFF2 is being phased out so using GFF3 is generally a better bet unless the program or package you are using specifies it needs an older GFF2 version. A GFF file may look like this (borrowed example from Ensembl): 1 transcribed_unprocessed_pseudogene gene 11869 14409 . + . 
gene_id &quot;ENSG00000223972&quot;; gene_name &quot;DDX11L1&quot;; gene_source &quot;havana&quot;; gene_biotype &quot;transcribed_unprocessed_pseudogene&quot;; Note that it will be useful for annotating genes and what we know about them. For more about GTF and GFF files. 7.7.1 Other files * If you didn’t see the file type you are looking for, take a look at this list by the BROAD. Or, it may be covered in the data type specific chapters. 7.7.2 Microarray processing tutorials: For the most common microarray platforms, you can see these examples for how to process the data: 7.7.2.1 General arrays Using Bioconductor for Microarray Analysis. 7.7.2.2 Gene Expression Arrays An end to end workflow for differential gene expression using Affymetrix microarrays. 7.7.2.3 DNA Methylation Arrays DNA Methylation array workflow. References "],["annotating-genomes.html", "Chapter 8 Annotating Genomes 8.1 Learning Objectives 8.2 What are reference genomes? 8.3 What are genome versions? 8.4 What are the different files? 8.5 Considerations for annotating genomic data 8.6 Resources you will need for annotation!", " Chapter 8 Annotating Genomes This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 8.1 Learning Objectives In this chapter, we are going to discuss a topic that affects every genomic method and may take up the majority of your time as a genomic data analyst: Annotation. We know that the sequencing or array data is not useful on its own – for our human minds to comprehend it and apply it to something we need a tangible piece of information to be attached to it. This is where annotation comes in. At its best, annotation helps you and others interpret genomic data. At its worst, it’s a time consuming activity that, done incorrectly, can lead to erroneous conclusions and labeling. 
Proper annotation requires an understanding of how the annotation data you are using was derived as well as the realization that all annotation data is constantly changing and the confidence for these data is never 100%. Some organisms’ genomes are better annotated than others but nearly all are at least somewhat incomplete. 8.2 What are reference genomes? Every individual organism has its own DNA sequence that is unique to it. So how can we compare organisms to each other? In some studies, sequencing data is obtained and the genome is built de novo (aka from scratch) but this takes a lot of time and computing power. So instead, most genomic studies use the imperfect method of comparing to a reference genome. Reference genomes are built from prior data and available online. They inherently have biases in them. For example, human genomes are generally not made from diverse populations but instead from mostly males of European descent. It is inherently bad for both ethical and scientific reasons to have genome references that are too white. For more on the problems with reference genomes, read this. In summary, reference genomes are used for comparison and as a ‘source of truth’ of sorts, but it’s important to note that this method is biased and better alternatives need to be realized. 8.3 What are genome versions? If you are familiar with software development, or have used any app before, you’re familiar with software updates and releases. Similarly, the genome has updates and releases as continued cloning and assemblies of organisms teach us more. In the image below we are showing an example of what a genome version may be noted as (note that different databases may have different terminology – here we are showing the Genome Reference Consortium). You may also notice on their website it shows the date the genome version was released and what was fixed. The details of how genome versions are fixed and released are not really of concern for your data analysis. 
This is merely to explain that genomes change and what is most important in your analysis is that: You choose one genome version and consistently use it in all your analyses. Choose a genome version that the rest of your field has generally reached a consensus on and is also using. Generally this means sticking with major releases of a genome instead of always going with the latest version. Most databases will try to point you to their major release, so just stick with that. We will point you to where you can find genome annotation for a lot of the major organisms. 8.4 What are the different files? Although we can’t walk you through every organism and database set up, we will walk through the files and structure of one example here. In the above screenshot, from Ensembl, it shows different organisms in the rows, but also a variety of different files across the columns. In this example, DNA refers to the DNA sequence of the organism’s genome, but cDNA refers to complementary DNA – aka DNA that has been reverse transcribed from RNA. If you are working with RNA data you may want to use the cDNA file. Whereas CDS files are referring to only coding sequences and ncRNA files are showing only non coding sequences. Most of these files are FASTA files. Gene sets are also their own annotation files called GTF or GFF files. Ensembl provides more detailed information about what these files contain, but briefly, each row is a feature and has information describing that feature such as genomic locations, the relevant feature type (gene, coding sequence, pseudogene, etc.), and the gene ID or name. For a reminder on what these different file types are see the previous chapter. Depending on the tool you are using, the data file and type you need will vary. Some tools have these data built in or are compatible with other packages that have annotation. 
If a tool automatically includes annotation within it, you will need to ensure that any additional tools you are using are also pulling from the same genome and version. Look into a tool’s documentation to find out what genome versions it is based on. If it doesn’t tell you at all, you don’t want to be using that tool. You cannot assume that cross genome analyses will translate. 8.4.1 How to download annotation files For another database example we’ll look at the human data on ENA’s servers. Note that if you see FTP, that stands for “File Transfer Protocol” and it just means it’s where you can get the files themselves. For more on computing lingo, you can take our Computing in Cancer Informatics course. There are many ways you can download these files and they are described here. In summary: - If you don’t feel comfortable using command line, you can use the browser downloader for ENA here - If you are using command line to write a script, then you can use the wget or curl instructions described here. Be sure to read the README files to understand what it is you are downloading. Also note that if you are working from a high performance computing cluster or other online server, these annotation files may already be available to you. You don’t want to take up more computing resources by downloading extra files, so check with an administrator or informatics expert who also uses the cluster or cloud to check if the annotation files already exist in your workspace. 8.5 Considerations for annotating genomic data 8.5.1 Make sure you have the right file to start! Is the annotation from the right organism? You may think this is a dumb question, but it’s critical that you make sure you have the genome annotation for the organism that matches your data. Indeed the author of this has made this mistake in the past, so double check that you are using the correct organism. Are all analyses utilizing coordinates from the same genome/transcriptome version? 
Genome versions are constantly being updated. Files from older genome versions cannot be used with newer ones (without some sort of liftover conversion). This also goes for transcriptome and genome data. All analyses need to be done using the same genome version to ensure that any chromosomal coordinates can translate between files. For example, it could be that in one genome version a particular gene was said to be at chromosome base pairs 300 - 400, but in the next version it’s now been changed to 305 - 405. This can throw off an analysis if you are not careful. This type of annotation mapping becomes even more complicated when considering different splice variants or non-coding genes or regulatory regions that have even less confidence and annotation about them. 8.5.2 Be consistent in your annotations If at all possible avoid making cross species analyses - unless you are an evolutionary genomics expert and understand what you are doing. But for most applications cross species analyses are wishful thinking at best, so stick to one organism. Avoid mixing genome/transcriptome versions. Yes there is liftover annotation data to help you identify what loci are parallel between releases, but it’s really much simpler to stick with the same version throughout your analyses’ annotations. 8.5.3 Be clear in your write ups! Above all else, no matter what you end up doing, make sure that your steps, what files you use, and what tool versions you use are clear and reproducible! Be sure to clearly link to and state the database files you used and include your code and steps so others can track what you did and reproduce it. For more information on how to create reproducible analyses, you can take our reproducibility in cancer informatics courses: Introduction to Reproducibility and Advanced Reproducibility in Cancer Informatics. 8.6 Resources you will need for annotation! 
8.6.1 Annotation databases Ensembl EMBL-EBI UCSCGenomeBrowser NCBI Genomes download page 8.6.2 GUI based annotation tools UCSCGenomeBrowser BROAD’s IGV Ensembl’s biomart 8.6.3 Command line based tools 8.6.3.1 R-based packages: annotatr ensembldb GenomicRanges - useful for manipulating and identifying sequences. GO.db - Gene ontology annotation org.Hs.eg.db Rsamtools A full list of Bioconductor’s annotation packages - contains annotation for all kinds of species and versions of genomes and transcriptomes. 8.6.3.2 Python-based packages: BioPython genetrack 8.6.4 More resources about genome annotation "],["dna-methods-overview.html", "Chapter 9 DNA Methods Overview 9.1 Learning Objectives 9.2 What are the goals of analyzing DNA sequences? 9.3 Comparison of DNA methods 9.4 How to choose a DNA sequencing method 9.5 Strengths and Weaknesses of different methods", " Chapter 9 DNA Methods Overview This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 9.1 Learning Objectives 9.2 What are the goals of analyzing DNA sequences? There are several larger goals behind DNA sequencing experiments, ranging from assembling whole genomes to identifying variation or performing a functional genomic analysis or comparative genomic study. Each of these has implications when studying disease. Assembling whole genomes: Because an organism’s genome determines how an organism develops and functions (NHGRI 2024), an important task in the genomics field is assembling the genome of an organism from sequencing reads. This assembly process attempts to reconstruct how the sequencing reads overlap or fit together (Schatz, Delcher, and Salzberg 2010; Li and Durbin 2024). 
Recent examples of genome assembly in the genomics field include a complete 3.055 billion-base pair sequence of the human reference genome which was published by the Telomere-to-Telomere (T2T) Consortium (2022), the T2T-CHM13 version (followed not long after by the complete sequence of the human Y chromosome (2023)). A goal of the field is to better capture human genetic diversity by creating a reference pangenome, assembled from multiple donors within the population (2024). Genome assemblies are an important part of genomics beyond human genomics research; there are reference genomes available for most model organisms as well as many plants, animals, and pathogens, with more and more being published at a high frequency (Miller, Zimin, and Gordus 2023; Alonge et al. 2022; Gershman et al. 2023; Sistrom et al. 2016). These reference genomes each act as an extensive compilation of the observed DNA sequence of genes, regulatory elements, etc. and the related coordinate systems for these elements, such that, for the corresponding organism, sequencing reads from other experiments can be mapped or aligned to the reference in order to localize where that read was in the genome. In the case of cancer informatics, a recent approach utilized personalized genome assembly to more accurately detect tumor somatic mutations. This is likely to be an area of future research for application in precision medicine (Xiao et al. 2022; Ermini and Driguez 2024). Identifying variation: Variant caller software is used within the field of genomics to identify places where reads from a DNA sequencing experiment differ from a comparative reference genome sequence (NHGRI 2022). Variants may be as small as single nucleotide differences (single-nucleotide polymorphisms or SNPs) or much larger (50 base pairs or more) structural variation (SVs) such as duplications, deletions, insertions, inversions, translocations (Wong, Hudson, and McPherson 2011). 
(Shorter insertions or deletions are termed indels.) The SVs involving gains or losses in genomic DNA can lead to copy number variations (CNVs). Mutations and structural variants, as well as larger-scale catastrophic genomic rearrangements, are very common in cancer (C.-Z. Zhang and Pellman 2022). Overall, variants may be rare in a population or fairly common (Audano et al. 2019). Further, variants may be somatic or germline variants: germline variants are hereditary and will be passed down from parent to offspring; in the offspring, the variant will be present in every cell, while somatic variants are generally not hereditary and present only in some cells rather than every cell (Frost 2022). Because variation, specifically genetic diversity, is a necessary part of a healthy species (“What Is Genetic Diversity and Why Does It Matter?” n.d.) and because variation, specifically mutations/variants, may cause disease, identifying variation is a common goal in a DNA sequencing workflow. An example of research focusing on studying genetic diversity in humans is the 1000 Genomes Project which recently expanded its resource of sequenced genomes and in doing so discovered even more variation present in the population (Byrska-Bishop et al. 2022). Functional genomic analysis: Genomes contain more than just genes (the coding sequences that will be transcribed and translated into a protein); they also contain functional elements such as promoters, enhancers, or silencers that modulate the expression of genes (Kellis et al. 2014). Further, differential gene expression is the phenomenon by which cells with the same DNA sequence show different patterns of gene expression. Functional genomic analyses aim to better understand differential gene expression and the impact of genetic variation found in functional elements. For example, many human genetic variants associated with common traits and diseases are localized in or near known functional elements (Hindorff et al. 2009). 
These variants may impact gene expression due to either changes in transcription factor binding at that site, or resulting epigenetic changes, which are defined as chemical modifications of chromatin or nucleotides beyond the DNA sequence. Such epigenetic modifications, which include histone marks and DNA methylation, can alter DNA compaction and influence a functional element’s accessibility for transcriptional machinery (e.g., if the element isn’t accessible, transcription may not occur; while previously the element was accessible and the gene could be transcribed). In later sections, methods that study epigenetic modifications like chromatin accessibility, DNA methylation, or binding of specific proteins will be discussed. All of these methods support functional genomic analyses and are important for better understanding differential gene expression and the impact that genetic variants located in functional elements may have on disease occurrence. A somewhat recent and high profile example of a functional genomic analysis centers again on work from the T2T Consortium. Not only did they publish a new, complete reference genome, but they also studied the epigenetic landscape in the newly resolved regions of the genome and pointed to potential newly discovered functional elements in a region previously thought to be transcriptionally inactive (Gershman et al. 2022). Comparative genomics: A common saying in the genomics field is that structure determines function and conserved structure may be constrained such that there is an important function which needs to be conserved (Alföldi and Lindblad-Toh 2013). Further, similarities in structure may be due to shared ancestry through the processes of evolution; therefore, some comparative genomics studies aim to infer homology or an evolutionary relationship from structural similarity (Pearson 2013). 
More pertinent to the topics discussed previously, comparative genomics studies are also useful for identifying functional elements (J. Taylor et al. 2006) and variants associated with disease (e.g., by comparing the genomes of those with the disease and those without it and identifying differences) (Alföldi and Lindblad-Toh 2013; Eichler 2019). 9.3 Comparison of DNA methods There are four DNA sequencing methods discussed in this chapter. The above graph compares WGS, WXS, and Targeted gene sequencing. The last section compares all 4. Whole genome sequencing (WGS) Whole exome sequencing (WXS) Targeted gene sequencing DNA/SNP microarrays Compared to WXS and Targeted Gene Sequencing, WGS is the most expensive but requires the lowest depth of coverage to achieve 95% sensitivity. In other words, WGS requires sequencing each region of the genome (3.2 billion bases) 30 times in order to confidently be able to pick up all possible meaningful variants. (Sims et al. 2014) goes into more depth on how these depths are calculated. Alternatively, WXS is a more cost effective way to study the genome, focusing on places in the genome that have open reading frames – aka generally genes that are able to be expressed. This focuses on enriching for exons and not introns so splicing variants may be missed. In this case, each gene must be sequenced 80-100x for sufficient sensitivity to pick up meaningful variants. In targeted gene sequencing, a panel of 50-500 regions of interest is selected. This technique is very applicable for studying a set of specific genes of interest at great depth to identify all varieties of mutations within those specific genes. These genes must be sequenced at much greater depth (&gt;500x) to confidently identify all meaningful variants. This page from Illumina also provides information regarding sequencing depth considerations for different modalities. Additional references: WGS: (Bentley et al. 2008) WES: (Clark et al. 2011) Targeted: (Bewicke-Copley et al. 
2019) 9.4 How to choose a DNA sequencing method Before starting any sequencing method, you likely have a research question or hypothesis in mind. In order to choose a DNA sequencing method, you will need to weigh a few items against each other: 9.4.1 1. What region(s) of the genome pertain to your research question? Is this unknown? Can it be narrowed down to non-coding or coding regions? Is there an even more specific subset of interest? 9.4.2 2. What does your project budget allow for? Some methods are much more costly than others. Cost is not only a factor for the reagents needed to sequence, but also the computing power needed to process and store the data and people’s compensation for their work on the data. All of these costs increase as the amounts of data that are collected increase. For more information on computing decisions see our Computing in Cancer Informatics course. 9.4.3 3. What is your detection power for these variants? Detecting DNA variants is not simply a matter of yes or no, but a confidence level due to sequencing errors in data collection. Are the variants you are looking for very rare and/or small (single nucleotide or very few copy number differences)? If so you will need more samples and potentially more sequencing depth to detect these variants with confidence. 9.5 Strengths and Weaknesses of different methods Is little known about DNA variants in your organism or disease in question? In this instance you may want to cast a large net to explore more variants by using WGS. If previous research has identified sections of the genome that are of interest to your research question, then it’s highly advisable to not sequence the entire genome with WGS methods. Not only will whole genome sequencing be more costly, but it will decrease your statistical power to discover true positive variants of interest and increase your chances of discovering false positive variants. 
This is because multiple testing correction needs to be applied in instances where many tests are being done concurrently. In this instance, the tests being performed are across the whole genome. If your research question does not pertain to non-coding regions of the genome or splicing, then it’s advisable to use WXS. Recall that only about 1-2% of the genome is coding sequences, meaning that if you are uninterested in noncoding regions but still use WGS then 98-99% of your data will be uninteresting to you and will only serve to increase your chances of finding false positives or cost you a lot of funding. Not only does sequencing more of the genome take more money and time but it will be more costly in time and resources in terms of the computing power needed to analyze it. Furthermore, if you are able to narrow down even further what regions are of interest this would be better in terms of cost and detection abilities. A targeted sequencing panel or DNA microarray is ideal for assaying known groups of targets. DNA microarrays are the least costly of all the methods to identify DNA variants, but with both targeted sequencing and DNA microarray you will need to find or create a custom probe or primer set. Ideally a probe or primer set that hits your regions of interest already exists commercially but if not, then you will have to design your own – which also costs time and money. In these upcoming chapters we will discuss in more detail each of these methods, what the data represent, what you need to consider, and what resources you can consult for analyzing your data. 
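To get a feel for how these choices scale, the short Python sketch below multiplies the size of the region being sequenced by the depth of coverage quoted in this chapter. It assumes the exome is about 1.5% of a 3.2 billion base genome (the middle of the 1-2% range mentioned above) and uses a hypothetical 1 million base targeted panel. These are illustrative numbers, not recommendations, and real costs also depend on library preparation, storage, and analysis time.

```python
# Rough count of bases that must be sequenced per sample: target size times depth.
GENOME_BASES = 3_200_000_000              # human genome size quoted in this chapter
EXOME_BASES = GENOME_BASES * 15 // 1000   # assumption: about 1.5% of the genome is coding
PANEL_BASES = 1_000_000                   # hypothetical targeted panel size

def bases_to_sequence(target_bases, depth):
    '''Return the approximate number of bases to sequence for one sample.'''
    return target_bases * depth

estimates = {
    'WGS at 30x': bases_to_sequence(GENOME_BASES, 30),
    'WXS at 100x': bases_to_sequence(EXOME_BASES, 100),
    'Targeted panel at 500x': bases_to_sequence(PANEL_BASES, 500),
}

for method, bases in estimates.items():
    print(f'{method}: {bases / 1e9:.1f} billion bases per sample')
```

Under these assumptions WGS needs roughly 20 times more sequenced bases per sample than WXS, even though WXS is sequenced much more deeply.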
References "],["whole-genome-or-exome-sequencing.html", "Chapter 10 Whole Genome or Exome Sequencing 10.1 Learning Objectives 10.2 WGS and WXS Overview 10.3 Advantages and Disadvantages of WGS vs WXS 10.4 WGS/WXS Considerations 10.5 DNA Sequencing Pipeline Overview 10.6 Data Pre-processing 10.7 Commonly Used Tools 10.8 Data pre-processing tools 10.9 Tools for somatic and germline variant identification 10.10 Tools for variant calling annotation 10.11 Tools for copy number variation analysis 10.12 Tools for data visualization 10.13 Resources for WGS", " Chapter 10 Whole Genome or Exome Sequencing This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 10.1 Learning Objectives The learning objectives for this course are to explain the use and application of Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES/WXS) for genomics studies, outline the technical steps in generating WGS/WXS data, and detail the processing steps for analyzing and interpreting WGS/WXS data. To familiarize yourself with sequencing methods as a whole, we recommend you read our chapter on sequencing first. 10.2 WGS and WXS Overview The difference between WGS and WXS is whether or not the open reading frames and thus coding regions are targeted in sequencing. WGS attempts to sequence the whole genome, while for WXS only exons with open reading frames are targeted for sequencing. Both of these methods can be massively beneficial for studying rare and complex diseases. Thus, whole genome sequencing is a technique to thoroughly analyze the entire DNA sequence of an organism’s genome. This includes sequencing all genes both coding and non-coding and all mitochondrial DNA. WGS is beneficial for identifying new and previously established variants related to disease and the regulatory elements of the genome including promoters, enhancers, and silencers. 
Increasingly non-coding RNAs have also been identified to play a functional role in biological mechanisms and diseases. In order to learn more about the non-coding regions of the genome, WGS is necessary. Alternatively whole exome sequencing is used to sequence the coding regions of an organism’s genome. Although non-coding regions can sometimes reveal valuable insights, coding regions can be a useful area of the genome to focus sequencing methods on, since changes in a protein coding sequence of the genome generally have more information known about them. Often protein coding sequences can have more clearly functional changes - like if a stop codon is introduced or a codon is changed to a predictable amino acid. This can more easily lead to downstream investigations on the functional implications of the protein affected. 10.3 Advantages and Disadvantages of WGS vs WXS We more thoroughly discuss how to choose DNA sequencing methods here in the previous chapter, but we will briefly cover this here. Alternatives to WGS include Whole Exome Sequencing (WES/WXS), which sequences the open reading frame areas of the genome or Targeted Gene Sequencing where probes have been designed to sequence only regions of interest. The main advantages of WGS include the ability to comprehensively analyze all regions of a genome, the ability to study structural rearrangements, gene copy number alterations, insertions and deletions, single nucleotide polymorphisms (SNPs), and sequencing repeats. Some disadvantages include higher sequencing costs and the necessity for more robust storage and analysis solutions to manage the much larger data output generated from WGS. 10.4 WGS/WXS Considerations Some important considerations for WGS/WXS include: What genome you are studying and the size of this genome. 
Included in this consideration is whether this genome has been sequenced before, so that you will have a “reference” genome to compare your data against, or whether you will have to make a reference genome yourself. This bioinformatics resource provides a great overview of genome alignment. The depth of coverage for sequencing is an important consideration. The typical recommendation for WGS coverage is 30x, but this is on the lower side and many researchers find it does not provide sufficient coverage compared to 50x. Illumina has an infographic that explains this information. The tissue source and whether genetic alterations were introduced during processing are important. Fixation for formalin-fixed paraffin-embedded (FFPE) samples can introduce mutations/genetic changes that will need to be accounted for during data analysis. This page from Beckman addresses many of the questions researchers often have about utilizing FFPE samples for their sequencing studies. The library preparation method of DNA amplification via PCR is very important as PCR can often introduce duplicates that interfere with interpreting whether a mutant gene is truly frequent or just over amplified during sequencing preparation. Illumina provides a comparison of using PCR and PCR-free library preparation methods on their website. 10.4.1 Target enrichment techniques For WXS or other targeted sequencing specifically (so not relevant to WGS data), what methods were used to enrich for the targeted sequences? (The target is the entire exome in the case of general WXS.) These methods are generally summarized into two major categories: Hybridization based and amplicon based enrichment. - [Hybridization based enrichment](https://www.paragongenomics.com/target-enrichment/). 
This includes a variety of widely used methods that we will broadly categorize in two groups: Array-based and In-solution: - [Array-based capture](https://en.wikipedia.org/wiki/Exome_sequencing#:~:text=Target%2Denrichment%20strategies-,Array%2Dbased%20capture,-In%2Dsolution%20capture) uses microarrays that have probes designed to bind to known coding sequences. Fragments that do not bind to these probes are washed away, leaving the sample with known coding sequences bound and ready for PCR amplification [@Hodges2007; @Turner2009]. - [In-solution capture](https://en.wikipedia.org/wiki/Exome_sequencing#In-solution_capture) has become more popular in recent years because it [requires less sample DNA than array-based capture](https://sequencing.roche.com/us/en/products/product-category/target-enrichment.html). To enrich for coding sequences, in-solution capture has a pool of custom probes that are designed to bind to the coding regions in the sample. Attached to these probes are beads which can be physically separated from DNA that is not bound to the probes (this should be the non-coding sequences) [@Mamanova2010]. - [PCR/Amplicon based enrichment](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318977/) requires even less sample than the other two strategies and so is ideal for when the amount of sample is limited or the DNA has been otherwise processed harshly (e.g. with paraffin embedding). Because the other two enrichment methods are done after PCR amplification has been done to the whole genomic DNA sample, it's thought that this method of selective PCR amplification for enrichment can result in more uniformly amplified DNA in the resulting sample. However, this method becomes less suitable the more gene targets you have (like if you truly need to sequence all of the exome) since amplicons need to be designed for each target. Overall it is a much more affordable method. 
There are several variations of this method that are [discussed thoroughly by @Singh2022](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318977/). 10.5 DNA Sequencing Pipeline Overview In order to create WGS/WXS data, DNA is first extracted from a specific sample type (tissue, blood samples, cells, FFPE blocks, etc.). Either traditional methods (involving phenol and chloroform) or commercial kits can be used for this first step. Next, the DNA sequencing libraries are prepared. This involves fragmenting the DNA, adding sequencing adapters, and DNA amplification if the input DNA is not of sufficient quantity. Recall that for WXS After sequencing, data is analyzed by converting and aligning reads to generate a BAM file. Many analysis tools will use the BAM file to identify variants, which then generates a VCF file. More information about sequencing and BAM and VCF file generation can be found here in the sequencing data chapter. 10.6 Data Pre-processing Raw sequencing reads are first transformed into a fastq file (more information about fastq files can be found here in the sequencing data chapter in the Quality Controls section). Then the sequencing reads are aligned to a reference genome to create a BAM file. This data is sorted and merged, and PCR duplicates are identified. The confidence that each read was sequenced correctly is reflected in the base quality score. This score must be recalibrated at this step before variants are called. A final BAM file is thus created. This can be used for future analysis steps including variant or mutation identification, which is outlined on the following slide. 10.7 Commonly Used Tools The following link provides the data analysis pipeline written by researchers in the NCI division of the NIH and provides a helpful overview of the typical steps necessary for WGS analysis. Here are many of the tools and resources used by researchers for analyzing WGS data. 
10.8 Data pre-processing tools In most cases, all of these tools will be used sequentially to prepare the data for downstream mutational and copy number variation (CNV) analysis. Bedtools including the bamtofastq function, which is the first step in converting data off the sequencer to a usable format for downstream analysis Samtools including tools for converting fastq to BAM files while mapping genes to the genome, duplicate read marking, and sorting reads Picard2 including tools to convert fastq to SAM files, filter files, create indices, mark read duplicates, sort files, and merge files GATK is a comprehensive set of tools from the Broad Institute for analyzing many types of sequencing data. For pre-processing, the print read function is very beneficial for writing the reads from a BAM or SAM file that pass specific criteria to a new file 10.9 Tools for somatic and germline variant identification These tools are used to identify either somatic or germline mutations from a sequenced sample. Many researchers use several of these tools together and keep only the variants identified by more than one of these analysis algorithms. All of these mutation calling tools except SvABA can be used on both WGS and WXS data. Mutect2 This is a beneficial variant calling tool with functions including using a “panel of normals” (samples provided by the user of many normal controls) to better compare disease samples to normal and filtering functions for samples with orientation bias artifacts (FFPE samples) called F1R2, which is explained in the link above. Varscan 2 This is a helpful tool that utilizes a heuristic/statistic approach to variant calling. This means that it detects somatic CNAs (SCNAs) as deviations from the log-ratio of sequence coverage depth within a tumor–normal pair, and then quantifies the deviations statistically. This approach is unique because it accounts for differences in read depth between the tumor and normal sample. 
Varscan 2 can also be used for identifying copy number alterations in tumor-normal pairs. MuSE This is a beneficial mutation calling tool when you have both tumor and normal datasets. The Markov Substitution Model for Evolution utilized in this tool models the evolution of the reference allele to the allelic composition of the tumor and normal tissue at each genomic locus. SvABA This tool is especially useful for calling insertions and deletions (indels) because it assembles aberrantly aligned sequence reads that reflect indels or structural variants using a custom String Graph Assembler. Indels can be difficult to detect with standard alignment-based variant callers. Strelka2 This is a small variant caller designed by Illumina. It is used for identifying germline variants in cohorts of samples and somatic variants in tumor/normal sample pairs. SomaticSniper SomaticSniper can be used to identify SNPs in tumor/normal pairs. It calculates the probability that the tumor and normal genotypes are different and reports this probability as a somatic score. Pindel Pindel is a tool that uses a pattern growth approach to detect breakpoints of large deletions, medium-sized insertions and inversions, and tandem duplications. Lancet This is a newer variant calling tool that uses colored de Bruijn graphs to jointly analyze tumor and normal pairs, offering strong indel detection. More information about the processes used in this variant calling tool can be found here. Researchers may want to create a consensus file based on the mutation calls using multiple tools above. OpenPBTA-analysis shows an open source code example of how you might compare and contrast different SNV callers’ results. For researchers who prefer GUI based platforms: Gene Pattern has a great set of variant based tutorials. GenePattern is an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. 
10.10 Tools for variant calling annotation These are beneficial for providing functional meaning to the mutational hits identified above. Annovar This is a helpful tool for annotating, filtering, and combining the output data from the above tools. It can be used for gene-based, region-based, or filter-based annotations. GENCODE This tool can be used to identify and classify gene features in human and mouse genomes. dbSNP This is a resource to look up specific human single nucleotide variations, microsatellites, and small-scale insertions and deletions. Ensembl This resource is a genome browser for annotating genes from a wide variety of species. pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions, and gene fusions. 10.11 Tools for copy number variation analysis Similar to the mutation calling tools, many researchers will use several of these tools and investigate the overlapping hits seen with different copy number variant calling algorithms: GATK GATK has a variety of tools that can be used to study changes in copy numbers of genes. This link provides a tutorial for how to use the tools. AscatNGS These tools (allele-specific copy number analysis of tumors) are specific for WGS copy number variation analysis. They can be used to dissect allele-specific copy numbers of tumors by estimating and adjusting for tumor ploidy and nonaberrant cell admixture. TitanCNA This tool is used to analyze copy number variation and loss of heterozygosity at the subclonal level for both WGS and WXS data in tumors compared to matched normals. It accounts for mixtures of cell populations and estimates the proportion of cells harboring each event. The Ha lab has developed a snakemake pipeline to more easily use this tool. Ha et al. published a paper describing this tool in detail here. gCNV This is a germline CNV calling tool that can be used on both WGS and WXS data. 
This tool has both COHORT and CASE modes. COHORT mode is used when providing a cohort of germline samples, whereas CASE mode is used for individual samples. More details about these modes are described in the link above. BIC-seq2 This tool is used to detect CNVs with or without control samples. The steps involved in this data processing tool include normalization and CNV detection. 10.12 Tools for data visualization These tools are often used in parallel to look at regions of the genome, develop plots, and create other relevant figures: OpenCRAVAT uses variation data in many popular variant file formats and its outputs are variant annotations and visualizations. IGV IGV is an interactive tool used to easily visualize genomic data. It is available as a desktop application, web application, and JavaScript to embed in web pages. This application is very beneficial for visualizing both mutational and CNV data for WGS and WXS. IGV has many tutorials on YouTube that are helpful for using the tool to its full potential. Maftools Maftools is an R package that can be used to create informative plots from your WGS data output. It has tools to import both VCF files and ANNOVAR output for data analysis. Prism Prism is a widely used tool in scientific research for organizing large datasets, generating plots, and creating readable figures. WGS or WXS data regarding mutations and CNV can be used as input for creating plots with this tool. 10.13 Resources for WGS Online tutorials: Galaxy tutorials NCI resources Bioinformaticsdotca tutorial Papers comparing analysis tools: (Hwang et al. 2019) (Naj et al. 2019) (X. He et al. 2020) References "],["rna-methods-overview.html", "Chapter 11 RNA Methods Overview 11.1 Learning Objectives 11.2 What are the goals of gene expression analysis? 11.3 Comparison of RNA methods", " Chapter 11 RNA Methods Overview This chapter is in a beta stage. Some of it has been written with AI tools. 
If you wish to contribute, please go to this form or our GitHub page. 11.1 Learning Objectives 11.2 What are the goals of gene expression analysis? The goal of gene expression analysis is to quantify RNAs across the genome. This can signify the extent to which various RNAs are being transcribed in a particular cell. This can be informative for what kinds of activity a cell is undergoing and responding to. 11.3 Comparison of RNA methods There are three general methods we will discuss for evaluating gene expression. RNA sequencing (whether bulk or single-cell) allows you to catch more targets than gene expression microarrays but is much more costly and computationally intensive. Gene expression microarrays have a lower dynamic range than RNA-seq generally but are much more cost effective. Spatial transcriptomics is the newest method on the block and has the ability to relate gene expression to tissue regions and subpopulations. 11.3.1 Single-cell RNA-seq (scRNA-seq): Cost: scRNA-seq methods can be relatively expensive due to the need for specialized protocols and reagents. Droplet-based methods (e.g., 10x Genomics) are generally more cost-effective than full-length methods (e.g., SMART-seq) because they require fewer sequencing reads per cell. Experimental Goals: scRNA-seq is suitable when studying cellular heterogeneity and characterizing gene expression profiles at the single-cell level. It provides insights into cell types, cell states, and cell-cell interactions. Specific Requirements: scRNA-seq requires single-cell isolation techniques, and the choice of method depends on the desired cell throughput, desired coverage, and the need for full-length transcript information. 11.3.2 Bulk RNA-seq: Cost: Bulk RNA-seq is generally more cost-effective compared to scRNA-seq because it requires fewer sequencing reads per sample. The cost primarily depends on the sequencing depth required. 
Experimental Goals: Bulk RNA-seq is appropriate for analyzing average gene expression profiles across a population of cells. It provides information on gene expression levels and can be used for differential gene expression analysis. Specific Requirements: Bulk RNA-seq requires a sufficient quantity of RNA from the sample, typically obtained through RNA extraction and purification. 11.3.3 Gene Expression Microarray: Cost: Gene expression microarrays are usually less expensive compared to RNA-seq methods. The cost includes array production and hybridization. Experimental Goals: Microarrays are useful for profiling gene expression levels across a large number of genes in a cost-effective manner. They can be employed for differential gene expression analysis and identification of gene expression patterns. Specific Requirements: Microarrays require labeled cDNA or cRNA targets, and they are limited to the detection of known transcripts represented on the array platform. 11.3.4 Spatial Transcriptomics: Cost: Spatial transcriptomics methods can vary in cost depending on the technique used. Some methods involve additional steps and specialized equipment, making them relatively more expensive. Experimental Goals: Spatial transcriptomics allows the investigation of gene expression patterns within the context of tissue or cellular spatial organization. It provides spatial information on gene expression, enabling the identification of cell types and their interactions. Specific Requirements: Spatial transcriptomics requires intact tissue sections or samples, and the choice of method depends on factors such as desired spatial resolution, throughput, and compatibility with downstream analyses. In these upcoming chapters we will discuss in more detail each of these methods, what the data represent, what you need to consider, and what resources you can consult for analyzing your data. 
"],["bulk-rna-seq-1.html", "Chapter 12 Bulk RNA-seq 12.1 Learning Objectives 12.2 Where RNA-seq data comes from 12.3 RNA-seq workflow 12.4 RNA-seq data strengths 12.5 RNA-seq data limitations 12.6 RNA-seq data considerations 12.7 Visualization GUI tools 12.8 RNA-seq data resources 12.9 More reading about RNA-seq data", " Chapter 12 Bulk RNA-seq This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 12.1 Learning Objectives 12.2 Where RNA-seq data comes from 12.3 RNA-seq workflow In a very general sense, RNA-seq workflows involve first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses. In this chapter we will highlight some of the more popular RNA-seq tools that are generally suitable for most experiment data, but there is no “one size fits all” for computational analysis of RNA-seq data (Conesa et al. 2016). You may find tools out there that better suit your needs than the ones we discuss here. 12.4 RNA-seq data strengths RNA-seq can give you an idea of the transcriptional activity of a sample. RNA-seq has a wider dynamic range of quantification than gene expression microarrays. RNA-seq can be used for transcript discovery, unlike gene expression microarrays. 12.5 RNA-seq data limitations RNA-seq suffers from a lot of the common sequence biases which are further worsened by PCR amplification steps. We discussed some of the sequence biases in the previous sequencing chapter. 
These biases are nicely covered in this blog by Mike Love and we’ll summarize them here: Fragment length: Longer transcripts are more likely to be identified than shorter transcripts because there’s more material to pull from. Positional bias: 3’ ends of transcripts are more likely to be sequenced due to faster degradation of the 5’ end. Fragment sequence bias: The complexity and GC content of a sequence influences how often primers will bind to it (which influences PCR amplification steps as well as the sequencing itself). Read start bias: Certain reads are more likely to be bound by random hexamer primers than others. Main Takeaway: When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way. 12.6 RNA-seq data considerations 12.6.1 Ribo minus vs poly A selection Most of the RNA in the cell is not mRNA or noncoding RNAs of interest, but instead loads of ribosomal RNA. So before you can prepare and sequence your data you need to isolate the RNAs you are interested in. There are two major methods to do this: Poly A selection - Keep only RNAs that have poly A tails – remember that mRNAs and some kinds of noncoding RNAs have poly A tails added to them after they are transcribed. A drawback of this method is that transcripts that are not generally polyadenylated (microRNAs, snoRNAs, certain long noncoding RNAs, or immature transcripts) will be discarded. There is also generally a worse 3’ bias with this method since you are selecting based on poly A tails on the 3’ end. Ribo-minus - Subtract all the ribosomal RNA and be left with an RNA pool of interest. A drawback of this method is that you will need to use greater sequencing depths than you would with poly A selection (because there is more material in your resulting transcript pool). This blog by Sitools Biotech does a good summary of the pros and cons of either selection method. 
12.6.2 Transcriptome mapping How do you know which read belongs to which transcript? This is where alignment comes into play for RNA-seq. There are two major approaches we will discuss with examples of tools that employ them. Traditional aligners - Align your data to a reference using standard alignment algorithms. Can be very computationally intensive. Traditional alignment is the original approach to alignment which takes each read and finds where and how in the genome/transcriptome it aligns. If you are interested in identifying the intricacies of different splices and their boundaries, you may need to use one of these traditional alignment methods. But for common quantification purposes, you may want to look into pseudo alignment to save you time. Examples of traditional aligners: STAR HISAT2 This blog compares some of the traditional alignment tools Pseudo aligners - much faster and the trade off for accuracy is often negligible (but as always, this is likely dependent on the data you are using). The biggest drawback to pseudoaligners is that if you care about local alignment (e.g. perhaps where splice boundaries occur) instead of just transcript identification then a traditional alignment may be better for your purposes. These pseudo aligners often include a verification step where they compare their performance on a subset of the data to that of a traditional aligner (and for most purposes they usually perform well). Pseudo aligners can potentially save you hours/days/weeks of processing time as compared to traditional aligners so are worth looking into. Examples of pseudo aligners: Salmon Kallisto Reference free assembly - The first two methods we’ve discussed employ aligning to a reference genome or transcriptome. But alternatively, if you are much more interested in transcript identification or you are working with an organism that doesn’t have a well characterized reference genome/transcriptome, then de novo assembly is another approach to take. 
As you may suspect, this is the most computationally demanding approach and also requires deeper sequencing depth than alignment to a reference. But depending on your goals, this may be your preferred option. These strategies are discussed at greater length in this excellent manuscript by Conesa et al, 2016. 12.6.3 Abundance measures If your RNA-seq data has already been processed, it may have abundance measures reported with it already. But there are various types of abundance measures used – what do they represent? raw counts - this is a raw number of how many times a transcript was counted in a sample. Two considerations to think of: 1. Library sizes: Raw counts do not account for differences between samples’ library sizes. In other words, how many reads were obtained from each sample? Because library sizes are not perfectly equal amongst samples and not necessarily biologically relevant, it's important to account for this if you wish to compare different samples in your set. 2. Gene length: Raw counts also do not account for differences in gene length (remember how we discussed longer transcripts are more likely to be counted). Because of these items, some sort of transformation needs to be done on the raw counts before you can interpret your data. These other abundance measures attempt to account for library sizes and gene length. This blog and video by StatQuest does an excellent job summarizing the differences between these quantifications and we will quote from them: Reads per kilobase million (RPKM) Count up the total reads in a sample and divide that number by 1,000,000 – this is our “per million” scaling factor. Divide the read counts by the “per million” scaling factor. This normalizes for sequencing depth, giving you reads per million (RPM) Divide the RPM values by the length of the gene, in kilobases. This gives you RPKM. Fragments per kilobase million (FPKM) FPKM is very similar to RPKM. 
RPKM was made for single-end RNA-seq, where every read corresponded to a single fragment that was sequenced. FPKM was made for paired-end RNA-seq. With paired-end RNA-seq, two reads can correspond to a single fragment, or, if one read in the pair did not map, one read can correspond to a single fragment. The only difference between RPKM and FPKM is that FPKM takes into account that two reads can map to one fragment (and so it doesn’t count this fragment twice). Transcripts per million (TPM) Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK). Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor. Divide the RPK values by the “per million” scaling factor. This gives you TPM. TPM has gained popularity in recent years because it is more intuitive to understand: When you use TPM, the sum of all TPMs in each sample are the same. This makes it easier to compare the proportion of reads that mapped to a gene in each sample. In contrast, with RPKM and FPKM, the sum of the normalized reads in each sample may be different, and this makes it harder to compare samples directly. 12.6.4 RNA-seq downstream analysis tools ComplexHeatmap is great for visualizations. DESeq2 and edgeR are great for differential expression analyses. CTAT - Using RNA-seq as input, CTAT modules enable detection of mutations, fusion transcripts, copy number aberrations, cancer-specific splicing aberrations, and oncogenic viruses including insertions into the human genome. Gene Set Enrichment Analysis (GSEA) is a method to identify the coordinate activation or repression of groups of genes that share common biological functions, pathways, chromosomal locations, or regulation, thereby distinguishing even subtle differences between phenotypes or cellular states. 
Gene Pattern’s RNA-seq tutorials - an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. 12.7 Visualization GUI tools WebMeV uniquely provides a user-friendly, intuitive, interactive interface to processed analytical data, uses cloud-computing elasticity for computationally intensive analyses, and is compatible with single cell or bulk RNA-seq input data. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with single cell RNA-seq data. Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. Network Data Exchange (NDEx) is a project that provides an open-source framework where scientists and organizations can store, share and publish biological network knowledge. 12.8 RNA-seq data resources ARCHS4 (All RNA-seq and ChIP-seq sample and signature search) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from GEO and SRA. Refine.bio - a repository of uniformly processed and normalized, ready-to-use transcriptome data from publicly available sources. 12.9 More reading about RNA-seq data Refine.bio’s introduction to RNA-seq StatQuest: A gentle introduction to RNA-seq (Starmer 2017). A general background on the wet lab methods of RNA-seq (Hadfield 2016). Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation (M. I. Love, Hogenesch, and Irizarry 2016). Mike Love blog post about sequencing biases (M. Love 2016) Biases in Illumina transcriptome sequencing caused by random hexamer priming (Hansen, Brenner, and Dudoit 2010). Computation for RNA-seq and ChIP-seq studies (Pepke, Wold, and Mortazavi 2009). 
References "],["single-cell-rna-seq.html", "Chapter 13 Single-cell RNA-seq 13.1 Learning Objectives 13.2 Where single-cell RNA-seq data comes from 13.3 Single-cell RNA-seq data types 13.4 Single cell RNA-seq tools 13.5 Quantification and alignment tools 13.6 Downstream tools Pros and Cons 13.7 More scRNA-seq tools and tutorials 13.8 Visualization GUI tools 13.9 Useful tutorials 13.10 Useful readings", " Chapter 13 Single-cell RNA-seq This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 13.1 Learning Objectives 13.2 Where single-cell RNA-seq data comes from As opposed to bulk RNA-seq, which can only tell us about tissue level and within patient variation, single-cell RNA-seq is able to tell us about cell to cell variation in transcriptomics, including intra-tumor heterogeneity. Single cell RNA-seq can give us cell level transcriptional profiles, whereas bulk RNA-seq masks cell to cell heterogeneity. If your research questions require cell-level transcriptional information, single-cell RNA-seq will be of interest to you. 13.3 Single-cell RNA-seq data types There are broadly two categories of single-cell RNA-seq data methods we will discuss. Full length RNA-seq: Individual cells are physically separated and then sequenced. Tag Based RNA-seq: Individual cells are tagged with a barcode and their data is separated computationally. Depending on your goals for your single cell RNA-seq analysis, you may want to choose one method over the other. (Material borrowed from (“Alex’s Lemonade Training Modules” 2022)). 13.3.1 Unique Molecular identifiers Often Tag based single cell RNA-seq methods will include not only a cell barcode for cell identification but will also have a unique molecular identifier (UMI) for original molecule identification. The idea behind UMIs is that they give insight into the original snapshot of the cell and can potentially combat PCR amplification biases. 
13.4 Single cell RNA-seq tools There are a lot of scRNA-seq tools for various steps along the way. In a very general sense, single cell RNA-seq workflows first involve quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting doublets (also known as duplets), and using this information to filter out data that is not trustworthy. Doublets are transcriptome data generated from two cells, and are an undesired technical artifact when single cell RNA-seq workflows want data representing a single cell at a time. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed, with a lot of genes barely or not detected and a few genes that are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses. Each step of this very general representation of a workflow can be conducted by a variety of tools. We will highlight some of the more popular tools here. But, to look through a full list, you can consult the scRNA-tools website. 13.5 Quantification and alignment tools The following pros and cons sections have been written by AI and may need verification by experts. They are meant to give you a basic idea of the pros and cons of these tools but should ultimately be used with your own judgment. STAR (Dobin et al. 2013): Pros: Accurate alignment of RNA-seq reads to the genome. Can handle a wide range of RNA-seq protocols, including scRNA-seq. Provides read counts and gene-level expression values. Cons: Requires a significant amount of memory and computational resources. May be difficult to set up and run for beginners. 
HISAT2 (Kim, Langmead, and Salzberg 2015): Pros: Accurate alignment of RNA-seq reads to the genome. Provides transcript-level expression values. Supports splice-aware alignment. Cons: May require significant computational resources for large datasets. May not be as accurate as some other alignment tools. Kallisto bustools (Bray et al. 2016): Pros: Fast and accurate quantification of RNA-seq reads without the need for alignment. Provides transcript-level expression values. Requires less memory and computational resources than alignment-based methods. Cons: May not be as accurate as alignment-based methods for lowly expressed genes. Cannot provide allele-specific expression estimates. Alevin/Salmon (Patro et al. 2017): Pros: Fast and accurate quantification of RNA-seq reads without the need for alignment. Provides transcript-level expression values. Supports both single-end and paired-end sequencing. Cons: May not be as accurate as alignment-based methods for lowly expressed genes. Cannot provide allele-specific expression estimates. Cell Ranger (Zheng et al. 2017): Pros: Specifically designed for 10x Genomics scRNA-seq data, with optimized workflows for alignment and quantification. Provides read counts and gene-level expression values. Offers a streamlined pipeline with minimal input from the user. Cons: Limited options for customizing parameters or analysis methods. May not be suitable for datasets from other scRNA-seq platforms. 13.6 Downstream tools Pros and Cons Seurat: Pros: Has a wide range of functionalities for preprocessing, clustering, differential expression, and visualization. Can handle multiple modalities, including CITE-seq and ATAC-seq. Has a large and active user community, with extensive documentation and tutorials available. Cons: Can be computationally intensive, especially for large datasets. Requires some knowledge of R programming language. Scanpy: Pros: Written in Python, a widely used programming language in bioinformatics. 
Has a user-friendly interface and extensive documentation. Offers a variety of preprocessing, clustering, and differential expression methods, as well as interactive visualizations. Cons: May not be as feature-rich as some other tools, such as Seurat. Does not yet support multiple modalities. Monocle: Pros: Focuses on trajectory analysis, allowing users to explore developmental trajectories and cell fate decisions. Has a user-friendly interface and extensive documentation. Can handle data from multiple platforms, including Smart-seq2 and Drop-seq. Cons: May not be as feature-rich for clustering or differential expression analysis as some other tools. Requires some knowledge of R programming language. 13.6.1 Doublet Tool Pros and Cons DoubletFinder (McGinnis, Murrow, and Gartner 2020): Pros: Uses a machine learning approach to detect doublets based on transcriptome similarity. Can be used with a variety of scRNA-seq platforms. Offers a user-friendly interface and extensive documentation. Cons: Can be computationally intensive for large datasets. May require some knowledge of R programming language. Scrublet (Wolock, Krishnaswamy, and Huang 2019): Pros: Uses a density-based approach to detect doublets based on barcode sharing. Fast and computationally efficient, making it suitable for large datasets. Offers a user-friendly interface and extensive documentation. Cons: May not be as accurate as other methods, especially for low-quality data. Limited to 10x Genomics data. 
DoubletDecon (De Pasquale and Dudoit 2019): Pros: Uses a statistical approach to identify doublets based on the distribution of the number of unique molecular identifiers (UMIs) per cell. Can be used with different platforms and species. Offers a user-friendly interface and extensive documentation. Cons: May not be as accurate as other methods, especially for data with low sequencing depth or low cell numbers. Requires some knowledge of R programming language. It’s important to note that no doublet detection method is perfect, and it’s often a good idea to combine multiple methods to increase the accuracy of doublet identification. Additionally, manual inspection of the data is always recommended to confirm the presence or absence of doublets. 13.7 More scRNA-seq tools and tutorials AlevinQC Gene Pattern’s single cell RNA-seq tutorials - an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. Single Cell Genome Viewer For normalization scater TumorDecon can be used to generate customized signature matrices from single-cell RNA-sequence profiles. It is available on Github (https://github.com/ShahriyariLab/TumorDecon) and PyPI (https://pypi.org/project/TumorDecon/). 13.8 Visualization GUI tools WebMeV uniquely provides a user-friendly, intuitive, interactive interface to processed analytical data uses cloud-computing elasticity for computationally intensive analyses and is compatible with single cell or bulk RNA-seq input data. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with single cell RNA-seq data. Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. 13.9 Useful tutorials These tutorials cover explicit steps, code, tool recommendations and other considerations for analyzing RNA-seq data. 
Orchestrating Single Cell Analysis with Bioconductor - An excellent tutorial for processing single cell data using Bioconductor. Advanced Single Cell Analysis with Bioconductor - a companion book to the intro version that contains code examples. Alex’s Lemonade scRNA-seq Training module - A cancer based workshop module based in R, with exercise notebooks. Sanger Single Cell Course - a general tutorial based on using R. ASAP: Automated Single-cell Analysis Pipeline is a web server that allows you to process scRNA-seq data. Processing raw 10X Genomics single-cell RNA-seq data (with cellranger) - a tutorial based on using CellRanger. 13.10 Useful readings An Introduction to the Analysis of Single-Cell RNA-Sequencing Data (AlJanahi, Danielsen, and Dunbar 2018). Orchestrating single-cell analysis with Bioconductor (Amezquita et al. 2020). UMIs the problem, the solution and the proof (Smith 2015). Experimental design for single-cell RNA sequencing (Baran-Gale, Chandra, and Kirschner 2018). Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies (Lafzi et al. 2018). Comparative Analysis of Single-Cell RNA Sequencing Methods (Ziegenhain et al. 2017). Comparative Analysis of Droplet-Based Ultra-High-Throughput Single-Cell RNA-Seq Systems (X. Zhang et al. 2019). Single cells make big data: New challenges and opportunities in transcriptomics (Angerer et al. 2017). Comparative Analysis of common alignment tools for single cell RNA sequencing (Brüning et al. 2021). Current best practices in single-cell RNA-seq analysis: a tutorial (Luecken and Theis 2019). References "],["spatial-transcriptomics-1.html", "Chapter 14 Spatial transcriptomics 14.1 Learning objectives 14.2 What are the goals of spatial transcriptomic analysis? 
14.3 Overview of a spatial transcriptomics workflow 14.4 Spatial transcriptomic data strengths: 14.5 Spatial transcriptomic data weaknesses: 14.6 Tools for spatial transcriptomics 14.7 More tools and tutorials regarding spatial transcriptomics", " Chapter 14 Spatial transcriptomics This chapter has currently been written by ChatGPT and has not been verified by experts. We need help writing and reviewing it! If you wish to contribute, please go to this form or our GitHub page. 14.1 Learning objectives 14.2 What are the goals of spatial transcriptomic analysis? Spatial transcriptomics (ST) technologies have been developed as a solution to the lack of spatial context in single cell transcriptomics (scRNA-seq) data (Rao et al. 2021; Ospina, Soupir, and Fridley 2023). There is a diversity of ST methods; however, all have two features in common: multiple measurements of gene expression and the locations within the tissue where those gene expression measurements were taken. Data analysis of ST data requires integration of those two components, and its primary goal is to characterize gene expression patterns within the tissue or cellular context. The ability to quantify gene expression at different locations within the tissue is of tremendous value to understand the functional variation of different tissue regions, domains, or niches. It also places cell-cell communication in the context of cell neighborhoods, which ultimately facilitates a deeper understanding of cell and tissue biology, and also enables practical applications such as discovery of novel drug targets for complex diseases such as cancer (Dries et al. 2021; Williams et al. 2022). 
The following are some of the specific goals that a study using ST could achieve: Describe tissue-specific cellular neighborhoods of cell types and cell type sub-populations: Although scRNA-seq continues to be a powerful method to assign biological identities to a mixture of cells, integrated analysis of ST combined with scRNA-seq adds crucial information to cell phenotypes by describing the neighborhoods where cells occur (Longo et al. 2021). Many methods to phenotype ST data are available, with most of them relying on the availability of a curated (scRNA-seq) cell type reference. Once cell identities have been determined, clustering or spatial statistics can be applied to describe the composition of tissue niches or domains. The explosion of ST data has resulted in novel and comprehensive tissue- or disease-specific atlases, not only describing the cell types within organs, but also the functional cell-cell relationships that result from spatial organization (e.g., Guilliams et al. (2022); Wu et al. (2021)). Uncover spatially regulated biological processes: With ST data, there comes the ability to detect genes or gene pathways that are expressed in specific areas within tissues (i.e., spatially-restricted expression). Detecting genes with spatially-restricted expression is key to achieving further understanding of specific biological processes, such as tissue gradients, cell differentiation, or signaling pathways. For example, cancer researchers are now able to study signaling pathways restricted to the tumor-stroma interface (Hunter et al. 2021), which could lead to the discovery of mechanisms representing cancer vulnerabilities resulting from interactions between the tumor and stroma cells. Investigate cell-cell interactions: From basic to applied tissue biology research, the study of cell-cell interactions is of high interest, especially the interactions that occur via ligand-receptor pairs. 
The construction of comprehensive databases of ligand-receptor interactions has been possible due to the large amounts of single-cell data sets produced by researchers. A major contribution of ST to the study of tissue biology is the addition of the spatial context to previously identified ligand-receptor interactions. Because single-cell RNA-seq requires physical separation of cells, current ligand-receptor databases represent hypotheses which ST can help to address by using models of spatial co-localization, enabling in-situ examination of cell-cell interactions and communication (Raredon et al. 2023; X. Wang, Almet, and Nie 2023). Integrate imaging data: Spatial transcriptomics data has enabled direct integration of gene expression measurements with digital images of the same (or adjacent) tissue. Improved molecular description and/or exploration of tissue niches or domains is now possible. One approach consists of differential expression analysis between histopathology annotations made by an expert on tissue images (e.g., Ravi et al. (2022)). The opposite approach is also possible, which uses unsupervised clustering of ST data assisted by color/intensity information derived from images. Machine learning for integration of ST and imaging data is an active area of development (e.g., Hu et al. (2021); Xu et al. (2022); Tan et al. (2020)). Furthermore, ST data findings can be qualitatively validated by assessing the approximate location of regions such as immune-infiltrated areas or damaged tissue, often resulting from inspection of fluorescence microscopy. Identify biomarkers and drug targets: The use of ST allows the exploration of tissue niche-specific expression patterns and gene pathway analysis. This exploration can lead to generation of hypotheses about potential biomarkers for specific tissue functions or disease states. 
Furthermore, the molecular interactions predicted using scRNA-seq (e.g., ligand-receptor) can now be put in the context of the larger tissue architecture using ST data. The spatial context of these interactions will likely boost the identification of novel drug targets, as well as improve understanding of current therapies (Lyubetskaya et al. 2022; L. Zhang et al. 2022). 14.3 Overview of a spatial transcriptomics workflow There is a large diversity in approaches to spatially profile tissues. Some ST technologies allow profiling at coarse cellular resolution, where regions of interest (ROIs) are usually identified by a pathologist. These ROIs may include from tens of cells up to a few hundred (e.g., GeoMx Bergholtz et al. (2021)). Smaller ROI sizes can be found in other technologies such as Visium, where ROIs of 55 µm in diameter (or “spots”) often contain no more than 10 cells (https://www.10xgenomics.com/resources/analysis-guides/integrating-single-cell-and-visium-spatial-gene-expression-data). For finer cellular resolution, technologies such as MERFISH, SMI, or Xenium, among others, can measure gene expression in individual cells (Yue et al. 2023). In general, there is a trade-off between the cellular resolution and molecular resolution, as the number of quantified genes and RNA molecules is lower in single-cell level spatial technologies compared to those at the ROI or spot level. In single-cell ST, often a panel of hundreds of genes is quantified, while in “mini-bulk” (ROI/spot) ST, it is possible to quantify genes at the whole transcriptome level. In addition to the differences in cellular and molecular resolution, there are fundamental differences in the chemistry used to count the RNA transcripts in the tissue (N. Wang et al. 2021; Yue et al. 2023). Capture or hybridization of RNA followed by sequencing, and fluorescent imaging, are two of the most common techniques used in ST methods. 
Because of the large diversity in resolution and chemical procedures among ST technologies, data collection workflows are equally diverse. Finally, each study poses specific questions that cannot be addressed with traditional scRNA-seq pipelines, requiring customized workflows. Some of the commonalities in the workflows are presented here: Sample preparation: The preparation of a tissue sample will depend largely on the specific ST technology to be used. In general, this involves obtaining the tissue of interest in the form of a thin slice from a fresh frozen biopsy or a paraffin embedded tissue block. Tissue slices are generally about 5 to 10 microns thick. Given the instability of RNA molecules, the samples from which the tissue slices originate should be properly preserved and stabilized to maintain the integrity of RNA molecules. Many ST technologies are compatible with tissue microarrays (TMAs). Capture or hybridization of RNA molecules: In this step, the tissue sample is typically placed on a solid substrate, such as regular positively charged glass slides or vendor-designed slides. The latter category includes spatially barcoded slides (e.g., Visium (Ståhl et al. 2016)), where RNA capture probes are contained in microscopic spots arranged in arrays or grids. Positively charged slides are used in technologies based on in-situ sequencing or imaging; however, capture-based methods like GeoMx also employ this type of slide. Each method entails specific considerations. One example is the optimization of tissue permeabilization in Visium slides to release the RNA molecules. In the case of imaging-based methods, RNA molecules are hybridized with fluorescent probes that uniquely identify each RNA species [e.g., SMI (S. He et al. 2022), MERFISH (M. Zhang et al. 2021)]. RNA quantification: The method used to count the number of captured or hybridized RNA molecules greatly varies from technology to technology. 
Capture methods often involve release of the RNA molecules from the tissue or slide, followed by library preparation, amplification, next generation sequencing, and read mapping to a reference genome. In this case, libraries are spatially multiplexed, whereby barcodes indicate the spatial location from which the captured RNA molecules originate. In imaging-based methods, segmentation is required to delineate the cell borders. Then, coded fluorescent probes are counted within each segmented cell. Data quality control and pre-processing: As with any omics technology, filtering and pre-processing are of paramount importance for downstream analysis. Spatial transcriptomics data typically contain an excess of zeroes and high gene dropout (Zhao et al. 2022). Removing genes expressed in very few spots or cells is often done. Similarly, it is advisable to remove spots with very few counts; however, care needs to be exercised not to remove biological variation due to cellularity (i.e., areas with fewer cells tend to have fewer counts). Mitochondrial or ribosomal genes, if available in the data, can be used to assess the level of tissue necrosis and filter accordingly (Ospina, Soupir, and Fridley 2023). In imaging-based methods, the area of cells can be used to detect “doublets” generated during image segmentation. Once filtering has been performed, gene count normalization and transformation are typically part of pre-processing. Commonly used methods in scRNA-seq, such as library-size normalization and log-transformation, are also commonplace in spatial transcriptomics studies. Methods that attempt technical effect correction such as SCTransform (Hafemeister and Satija 2019) can also be used. Visualization: Similar to scRNA-seq data, dimension reduction methods such as the Uniform Manifold Approximation and Projection (UMAP) are key to visualizing the heterogeneity of the data set. 
Nonetheless, given the additional modality provided by the spatial coordinates, spatial gene expression heatmaps can be generated, which can be compared against the imaging data (e.g., H&amp;E, IHC, mIF) to gain further insights into overall tissue architecture. Clustering and cell/tissue domain phenotyping: There is a plethora of clustering approaches, ranging from those employed in scRNA-seq analysis (e.g., Louvain) to novel neural network classification. Some methods take advantage of the spatial location information and/or tissue image to inform clustering. Compared to clustering, cell/domain phenotyping is an area of even more active development, with the majority of methods relying on the use of a comprehensive single-cell, tissue specific atlas from which cell types (i.e., “labels”) are obtained. Canonical marker-based phenotyping is still widely used, and in many cases unavoidable to identify specific cell populations. In general, it is advisable to use the expert validation of a tissue biologist or pathologist to ascertain if clustering and phenotyping are capturing the tissue architecture adequately. 14.4 Spatial transcriptomic data strengths: Preservation of the spatial context: Spatial transcriptomics allows the investigation of gene expression patterns, cell types, and their interactions within the context of tissue spatial organization. Integration with imaging data: Spatial transcriptomics provides an additional data modality in the form of imaging data, such as histological images or fluorescence microscopy. This integration enhances the interpretation of spatial transcriptomic data by correlating gene expression patterns with tissue morphology and specific cellular structures. Discovery of novel cell-cell interactions and signaling pathways: By examining gene expression profiles in the spatial context, higher accuracy in the identification of novel cell-cell interactions and signaling pathways is obtained. 
Pairs of interacting genes can be identified by studying their level of co-localization (i.e., expressed in the same regions). Exploration of spatially regulated biological processes: Spatial transcriptomics enables the investigation of biological processes, such as spatial expression gradients or developmental processes occurring in specific regions. It provides insights into spatially restricted gene expression patterns associated with tissue patterning, morphogenesis, or cellular differentiation. Hypothesis generation and biomarker discovery: Spatial transcriptomic analysis can help in the generation of hypotheses and the identification of potential biomarkers related to specific tissue functions, regions, or disease states. By linking gene expression patterns to tissue organization and pathology, spatial transcriptomics facilitates the discovery of spatially restricted gene signatures and potential diagnostic or prognostic markers. 14.5 Spatial transcriptomic data weaknesses: Trade-off between spatial resolution and molecular resolution: Spatial transcriptomic techniques that provide whole transcriptome level information measure expression at the “mini-bulk” level (spots or ROIs), with each mini-bulk sample containing a collection of cells. Conversely, single-cell ST provides expression for a panel of genes (hundreds to a few thousand genes). In addition, obtaining fine-grained spatial information may be challenging, especially in complex tissues or samples with high cellular density. Technical variability and experimental artifacts: Spatial transcriptomic analysis involves multiple experimental steps, including tissue processing, capture/hybridization, and sequencing/imaging. Each step introduces technical variability and potential experimental artifacts, which can impact the accuracy and reproducibility of the results. Controlling and minimizing these sources of variation is crucial but can be challenging. 
Zero excess and limited coverage of transcripts: Since most ST techniques use probes to capture or hybridize RNA transcripts, the resulting data may contain biases in the representation of certain RNA molecules. Additionally, spatial transcriptomic methods may have limitations in capturing certain RNA species or low-abundance transcripts, leading to a large portion of genes not being detected and contributing to zero-count excess. Complex Data Analysis: Analyzing spatial transcriptomic data requires advanced computational methods and expertise. The complexity of the data and the need for specialized bioinformatics tools and pipelines can pose challenges, particularly for researchers without extensive computational skills. Validation and integration challenges: Spatial transcriptomic analysis generates hypotheses and provides spatially resolved gene expression information. However, validating the functional significance of identified gene expression patterns or cellular interactions may require additional experimentation. Integrating spatial transcriptomic data with other omics data or imaging modalities can also be complex and may require careful data integration strategies. Cost and time considerations: Spatial transcriptomic analysis can be relatively expensive and time-consuming compared to traditional transcriptomic techniques. The specialized protocols, reagents, and instrumentation required can add to the cost of the analysis. Moreover, the data generation and analysis processes can be time-intensive, which may limit the scalability of studies involving large sample sizes. 14.6 Tools for spatial transcriptomics 14.6.1 Data processing: 14.6.1.1 Space Ranger Pros: Space Ranger is a software package developed by 10x Genomics specifically for processing and analyzing spatial transcriptomics raw data generated by their platform (Visium). 
It provides a streamlined workflow for processing raw data, including image registration, assignment of read counts to spots, and counting transcripts. Outputs from Space Ranger are commonly the input for many other ST analysis tools. Cons: Space Ranger has been designed to process only 10x Genomics data. The software does not provide methods to extract insights, which is accomplished by integration with other analytical suites. Requires knowledge of command line use. 14.6.1.2 GeomxTools Pros: The GeomxTools R package has been designed to take outputs from the GeoMx Digital Spatial Profiler (DSP) platform. The package includes methods to use raw .dcc files and .pkc probe set files to generate count matrices per ROI. Support for normalization and transformation of counts is also included in GeomxTools. Cons: GeomxTools has been designed to process only GeoMx DSP data outputs. Requires knowledge of R programming. 14.6.2 Data exploration: 14.6.2.1 Seurat Pros: Seurat is a widely used R package for single-cell data, with expanded capabilities to analyze ST data from multiple platforms. Seurat features direct integration with outputs from Space Ranger, MERSCOPE, CosMx-SMI, among others. It provides a variety of functions for data pre-processing, dimensionality reduction, clustering, and visualization. Seurat has a large user community, extensive documentation, and tutorials, making it accessible to researchers. Cons: Seurat can be memory-intensive, particularly when working with large data sets. It requires familiarity with R programming and bioinformatics concepts for effective use. Overall, methods in Seurat are the same methods applied to non-spatial scRNA-seq data. 14.6.2.2 Squidpy Pros: Scanpy is a Python-based library specifically designed for single-cell and ST analysis. It offers a range of functionalities for data pre-processing, clustering, trajectory analysis, and visualization. Scanpy is known for its scalability, efficiency, and flexibility. 
It integrates well with other Python libraries and frameworks, making it suitable for integration with other analysis pipelines. Some of the statistical methods in Squidpy implicitly make use of the spatial coordinates to detect patterns. Cons: Similar to Seurat, Scanpy requires some familiarity with Python programming and bioinformatics concepts. Users without prior programming experience may need to invest time in learning Python. 14.6.2.3 Giotto Pros: The analytical suite Giotto is a collection of methods to study spatial gene expression, agnostic to the platform used to generate the data. It allows users to perform data pre-processing, clustering, visualization, detection of spatially variable genes, and expression co-localization analysis. Computationally intensive analysis can be conducted in the cloud via integration with Terra.bio or locally using a Docker container. Some of the statistical methods in Giotto implicitly make use of the spatial coordinates to detect patterns. Cons: Requires some familiarity with R, as well as bioinformatics and spatial statistics concepts. Installation requires setting up Python, as some modules use that language. 14.6.2.4 spatialGE and spatialGE-web Pros: The spatialGE analysis suite allows users to study ST data from multiple platforms, including methods for pre-processing, clustering/domain detection, spatially variable genes, and functional analysis via detection of gene expression gradients and/or gene set enrichment spatial patterns. All the functionality of the R package has been implemented in a point-and-click web application that requires no coding experience and sends email notifications when analyses are completed. Statistical methods in spatialGE implicitly take into account the spatial coordinates during calculations. Cons: Use of the spatialGE R package requires familiarity with the language. The spatialGE web application bypasses the need for R coding; however, computationally-intensive methods can take time to complete. 
14.6.2.5 Loupe Pros: The Loupe browser is a point-and-click tool for exploration of both non-spatial scRNA-seq and ST data. Loupe takes Visium outputs and allows visualization of gene expression, clustering, and detection of differentially expressed genes. The tool also allows for easy registration and comparative analysis of Visium imaging and expression data. Cons: Loupe allows basic exploration of the data. To perform functional-level analysis of ST data, the use of additional tools might be required. 14.6.2.6 ST Pipeline Pros: ST Pipeline is a bioinformatics pipeline developed by the Spatial Transcriptomics consortium. It provides a complete workflow for ST data analysis, including pre-processing, normalization, spot detection, and visualization. ST Pipeline supports various spatial transcriptomic platforms, making it versatile. Cons: ST Pipeline requires familiarity with Python, command-line, and Linux environments. Users may need to invest time in setting up the pipeline and configuring parameters based on their specific datasets and platforms. 14.6.2.7 semla Pros: The semla R package is a bioinformatics pipeline enabling pre-processing, visualization, spatial statistics, and image integration of ST data. The package provides integration with Seurat. Cons: semla requires familiarity with R. 14.6.3 Clustering/tissue domain identification: 14.6.3.1 SpaGCN Pros: The SpaGCN Python package predicts tissue domains, implicitly taking into account the spatial coordinates and optionally assisted by colors in the image data. The gene expression, coordinate, and image data are processed via graph convolutional networks (GCN) to find common patterns between the modalities. Based on predicted domains, SpaGCN can identify genes or collections of genes (meta genes) that are uniquely expressed in the domains. SpaGCN allows analysis of multiple ST technologies. Cons: SpaGCN requires familiarity with Python and basic data frame processing. 
Some understanding of GCNs and parameters involved in calculations is advisable. 14.6.4 Spatially variable gene identification: 14.6.4.1 SpatialDE Pros: SpatialDE is a Python package designed for detecting spatially variable genes from ST data using non-parametric statistics. SpatialDE integrates the spatial coordinates and image data to identify genes or groups of genes showing spatial expression aggregation. The package can analyze data from multiple ST platforms. Cons: SpatialDE requires familiarity with Python programming. 14.6.4.2 SPARK and SPARK-X Pros: The SPARK methods allow scalable detection of genes showing spatial patterns. The tests are performed via generalized linear models and spatial autocorrelation matrix estimation. The SPARK implementation allows scalability and computing efficiency. Cons: The SPARK methods require familiarity with Python programming. Some familiarity with spatial statistics is advisable. 14.6.4.3 SpaceMarkers Pros: The SpaceMarkers approach detects sets of genes with evidence of spatial co-expression. Kernel smoothing is used to model the weight of expression of a gene taking into account neighboring areas. Cons: Requires familiarity with R programming. The method has been tested in Visium data. 14.6.5 Deconvolution/phenotyping: 14.6.5.1 SPOTlight Pros: The SPOTlight algorithm takes advantage of robust non-negative matrix factorization (NMF) to define transcriptomic profiles from an annotated scRNA-seq reference. The transcriptomic profiles are transferred to the spatial transcriptomics data using non-negative least squares regression. Instead of providing a single category for “mini-bulk” data (e.g., Visium), SPOTlight features pie charts to describe the cell type composition within each mini-bulk sample (e.g., spot). Cons: Requires some familiarity with R programming. The method has been tested in Visium data. 
As with most deconvolution methods, accurate identification of cell types highly relies on a well-annotated scRNA reference. 14.6.5.2 STdeconvolve Pros: The STdeconvolve algorithm uses latent dirichlet allocation (LDA) to define transcriptomic profiles or topics on the ST data. The topics are assigned a biological identity (e.g., cell type, tissue domain) using gene set enrichment of marker-based phenotyping. The topics are presented as proportions in “mini-bulk” data (e.g., Visium), where pie charts describe the cell type/domain composition within each mini-bulk sample (e.g., spot). STdeconvolve is one of very few reference-free ST deconvolution methods. Cons: Requires some familiarity with R programming. The method has been mostly tested in Visium data. For MERFISH data, requires aggregation into spots. 14.6.5.3 InSituType Pros: InSituType is a cell phenotyping algorithm designed for CosMx-SMI data but applicable to other single-cell ST data. InSituType can transfer cell types from an annotated scRNA-seq data set, or run reference-free unsupervised clustering to detect cell populations. In addition, immunofluorescence data accompanying SMI data sets can be used to inform gene expression deconvolution. InSituType can phenotype large quantities of cells within reasonable time. Cons: InSituType assumes cell populations can be defined via cluster centroids. Thus, deconvolution can be affected when samples contain cells with intermediate phenotypes or if technical/background noise is prevalent. Requires familiarity with R programming. 14.6.5.4 SpatialDecon Pros: The SpatialDecon algorithm implements log-normal regression to alleviate the effects of ST data skewness in the prediction of cell types. The method is analogous to estimation of cell types proportions in bulk RNAseq to “mini-bulk” ROIs or spots in GeoMx and Visium experiments respectively. Hence, the method assumes cell type heterogeneity within the ROIs or spots. 
In the case of GeoMx experiments, SpatialDecon takes advantage of nuclei counts to provide absolute cell type counts within each ROI. The package includes pre-built cell type signature matrices for several tissue types, but scRNA references can be used to create custom signatures. Cons: Requires familiarity with R programming. 14.6.6 Cell communication: 14.6.6.1 CellChat Pros: CellChat is an algorithm to infer cell communications via ligand-receptor interactions. CellChat was designed for non-spatial scRNA data, however, a recent implementation has been included to account for distances between cells in ST experiments. The package includes a comprehensive ligand-receptor data base which is queried after quantification of probability of interaction between two given cell types. Cons: Requires familiarity with R programming. The spatial implementation of CellChat has been tested on Visium data. 14.7 More tools and tutorials regarding spatial transcriptomics Analysis, visualization, and integration of spatial datasets with Seurat Sheffield Bioinformatics tutorial for spatial transcriptomics Theis Lab SCOG workshop materials for spatial transcriptomics Visualization, domain detection, and spatial heterogeneity with spatialGE References "],["chromatin-methods-overview.html", "Chapter 15 Chromatin Methods Overview 15.1 Learning Objectives 15.2 Why are people interested in chromatin? 15.3 What kinds of questions can chromatin answer? 15.4 Comparison of technologies", " Chapter 15 Chromatin Methods Overview This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. In its existing form, this chapter has been written with AI and still needs further verification by experts. 15.1 Learning Objectives 15.2 Why are people interested in chromatin? Chromatin plays a crucial role in regulating gene expression, which is essential for a wide range of biological processes. 
It is the complex of DNA and proteins that make up the structure of chromosomes in the nucleus of a cell. The DNA in chromatin is packaged around histone proteins in a way that can either promote or inhibit access to the DNA by other proteins that control gene expression. Specifically, chromatin structure can affect the ability of transcription factors and RNA polymerase to bind to and transcribe genes. Changes in chromatin structure can lead to changes in gene expression, which can have profound effects on cell function and development. For example, chromatin remodeling is a key step in cell differentiation, during which cells become specialized and take on specific functions. Dysregulation of chromatin structure can also lead to the development of diseases, such as cancer, in which aberrant gene expression contributes to uncontrolled cell growth and proliferation. Therefore, understanding the mechanisms that regulate chromatin structure and function is crucial for advancing our understanding of cellular processes, disease development, and potential therapies. This is why chromatin research has become a major area of focus in molecular biology and genomics research. 15.3 What kinds of questions can chromatin answer? How are genes turned on and off in response to developmental cues or environmental stimuli? What are the mechanisms by which chromatin structure is altered during cell differentiation and development? How do epigenetic modifications, such as DNA methylation and histone modifications, affect chromatin structure and gene expression? How does chromatin structure influence the binding of transcription factors and other regulatory proteins to specific regions of the genome? How is chromatin structure altered in diseases such as cancer, and how can this knowledge be used to develop new therapies? How can we manipulate chromatin structure to selectively activate or repress specific genes, and what are the potential applications of such approaches? 
15.3.1 Chromatin is involved in a variety of biological processes: Gene expression: Chromatin structure and organization play a crucial role in regulating gene expression. The packaging of DNA around histone proteins can either promote or inhibit access to the DNA by other proteins that control gene expression. DNA replication and repair: Chromatin structure can also affect DNA replication and repair. For example, histone modifications and chromatin remodeling can facilitate access to DNA replication and repair machinery. Epigenetic regulation: Epigenetic modifications, such as DNA methylation and histone modifications, can be stably inherited and play a critical role in the regulation of gene expression. Cell differentiation: Chromatin structure is dynamically regulated during cell differentiation and plays a key role in determining cell fate and function. Development: Chromatin structure also plays an important role in the regulation of developmental processes, such as morphogenesis and organogenesis. Disease: Dysregulation of chromatin structure and function is associated with a wide range of diseases, including cancer, neurodegenerative disorders, and developmental disorders. 15.4 Comparison of technologies 15.4.1 ATAC-seq: ATAC-seq (Assay for Transposase Accessible Chromatin using sequencing) is a technique that uses transposases to fragment DNA and insert sequencing adapters into accessible chromatin regions. The DNA fragments are then sequenced to identify regions of open chromatin. This technique is widely used to study the epigenetic regulation of gene expression. 15.4.1.1 When to use ATAC-seq: When you want to study the epigenetic regulation of gene expression. When you want to identify open chromatin regions associated with regulatory elements such as enhancers and promoters. When you want to study various cell types and tissues, including difficult-to-access cell types. 
15.4.1.2 Advantages: ATAC-seq is a simple and cost-effective technique that requires a low amount of starting material. It allows the identification of open chromatin regions, which are usually associated with regulatory elements such as enhancers and promoters. ATAC-seq can be used to study various cell types and tissues, including difficult-to-access cell types. 15.4.1.3 Disadvantages: ATAC-seq can have high background noise due to non-specific cleavage of chromatin. It may miss lowly accessible regions due to a bias towards highly accessible regions. It is difficult to identify the specific regulatory elements that are associated with open chromatin regions. 15.4.2 Single-cell ATAC-seq: Single-cell ATAC-seq is a technique that combines single-cell sequencing and ATAC-seq to identify open chromatin regions in individual cells. This technique allows the study of epigenetic heterogeneity between cells and the identification of cell-specific regulatory elements. 15.4.2.1 When to use single-cell ATAC-seq: When you want to study the epigenetic heterogeneity between cells and identify cell-specific regulatory elements. When you want to identify rare cell types or rare cell states that may be missed by bulk techniques. When you want to study the epigenetic dynamics of cells in response to environmental changes. 15.4.2.2 Advantages: Single-cell ATAC-seq allows the identification of open chromatin regions in individual cells, which provides cell-specific epigenetic information. It can identify rare cell types and rare cell states that may be missed by bulk techniques. It can be used to study the epigenetic dynamics of cells in response to environmental changes. 15.4.2.3 Disadvantages: Single-cell ATAC-seq can have a higher level of technical noise due to the low amount of starting material. It can be challenging to obtain high-quality single-cell suspensions from tissues. 
It can be difficult to analyze the large amount of data generated by single-cell sequencing techniques. 15.4.3 ChIP-seq: ChIP-seq (Chromatin Immunoprecipitation sequencing) is a technique that uses antibodies to isolate specific DNA-protein complexes, such as transcription factors or histone modifications. The DNA fragments associated with the protein complexes are then sequenced to identify the genomic regions that are bound by the protein. 15.4.3.1 Advantages: ChIP-seq allows the identification of specific protein-DNA interactions, which provides information on the regulation of gene expression. It can be used to study the epigenetic changes associated with specific cellular processes, such as differentiation or development. ChIP-seq can identify the binding sites of transcription factors, which can be used to identify regulatory elements such as enhancers and promoters. 15.4.3.2 Disadvantages: ChIP-seq requires a high amount of starting material and can be costly. It can have a high level of background noise due to non-specific binding of antibodies. It can be challenging to perform 15.4.4 CUT&amp;RUN CUT&amp;RUN (Cleavage Under Targets &amp; Release Using Nuclease) is a relatively new genomic method that involves the targeted cleavage of DNA by a specific antibody or protein of interest, followed by the release and sequencing of the DNA fragments. The CUT&amp;RUN method was developed as a more streamlined alternative to the ChIP-seq (Chromatin Immunoprecipitation sequencing) method, which involves a more complex series of steps Skene and Henikoff (2018). 15.4.4.1 How CUT&amp;RUN works: Cells are permeabilized and incubated with a specific antibody or protein of interest. This antibody or protein is fused to a protein called Protein A-Micrococcal Nuclease (pA-MNase). After incubation, the pA-MNase is activated and cleaves the DNA in the vicinity of the bound antibody or protein of interest. 
The released DNA fragments are then purified and sequenced to identify the genomic regions that were bound by the antibody or protein of interest. CUT&amp;RUN has several advantages over ChIP-seq, including: CUT&amp;RUN requires a lower amount of starting material and can be performed more quickly than ChIP-seq. CUT&amp;RUN produces less background noise, as the DNA is cleaved in situ, rather than being fragmented by sonication or other methods. CUT&amp;RUN can be used to study chromatin-associated proteins that may not be easily solubilized for ChIP-seq. 15.4.5 CUT&amp;Tag CUT&amp;Tag (Cleavage Under Targets and Tagmentation) is similar to CUT&amp;RUN. It was developed as an improvement over CUT&amp;RUN, with the goal of reducing the amount of background noise and improving the efficiency of the method (Kaya-Okur et al. 2019). 15.4.5.1 How CUT&amp;Tag works: Cells are permeabilized and incubated with a specific antibody or protein of interest, which is fused to a protein called Protein A-Tn5 transposase. The Protein A-Tn5 transposase inserts sequencing adapters into the genomic DNA in the vicinity of the bound antibody or protein of interest. The DNA is then released from the chromatin by the Protein A-Tn5 transposase and purified for sequencing. Like CUT&amp;RUN, CUT&amp;Tag allows for the specific cleavage of DNA in the vicinity of a target protein or antibody, but the addition of sequencing adapters in CUT&amp;Tag occurs directly in the nucleus, prior to DNA release. This results in less background noise and more efficient DNA recovery. 15.4.5.2 Advantages: CUT&amp;Tag has a lower level of background noise and higher sensitivity due to the addition of sequencing adapters in situ. CUT&amp;Tag requires less input material than CUT&amp;RUN, which makes it a more efficient method. CUT&amp;Tag can be used to study the binding sites of transcription factors and chromatin-associated proteins. 
Overall, both CUT&amp;RUN and CUT&amp;Tag are powerful genomic methods that allow for the efficient study of protein-DNA interactions and epigenetics. The choice between the two methods may depend on the specific research question and the availability of specific reagents or equipment. 15.4.6 GRO-seq (Global Run-On sequencing) GRO-seq allows for the genome-wide analysis of transcriptional activity by measuring the nascent RNA transcripts that are actively being synthesized by RNA polymerase. GRO-seq is a high-throughput sequencing-based technique that provides a snapshot of the transcriptional landscape of a cell (Park and Won 2018). 15.4.7 How GRO-seq works: Nuclei are isolated from cells and incubated with a biotinylated nucleotide triphosphate, which is incorporated into nascent RNA transcripts by RNA polymerase. The labeled RNA is then selectively captured using streptavidin beads, and the RNA is reverse-transcribed into cDNA. The cDNA is then sequenced to identify the regions of the genome that are actively transcribed. 15.4.7.1 Advantages: GRO-seq can distinguish between the sense and antisense strands of transcribed RNA. It can quantify the level of transcriptional activity in individual genes. It can identify novel transcripts and transcriptional start sites. DNase-seq and MNase-seq are alternative approaches which can be used to identify accessible regions of chromatin. MNase-seq is particularly useful for studying the occupancy of nucleosomes or transcription factors with high resolution. DNase-seq uses DNase I to cleave DNA at hypersensitive sites typically associated with cis-regulatory elements. It is also possible to footprint TF occupancy with base-pair level resolution using DNase-seq, while the quality of ATAC-seq footprinting is still in question. Additionally, although both DNase-seq and MNase-seq also have sequence biases, the sequence preference is different for each enzyme. 
References "],["atac-seq-1.html", "Chapter 16 ATAC-Seq 16.1 Learning Objectives 16.2 What are the goals of ATAC-Seq analysis? 16.3 ATAC-Seq general workflow overview 16.4 ATAC-Seq data strengths: 16.5 ATAC-Seq data limitations: 16.6 ATAC-Seq data considerations 16.7 ATAC-seq analysis tools 16.8 Additional tutorials and tools 16.9 Additional tutorials and tools 16.10 Online Visualization tools 16.11 More resources about ATAC-seq data", " Chapter 16 ATAC-Seq This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. 16.1 Learning Objectives 16.2 What are the goals of ATAC-Seq analysis? The goals of ATAC-seq are to identify the accessible regions of the genome in a particular set of samples. These data allow us to understand the relationships between the chromatin accessibility patterns and cell states, and to understand the mechanistic causes and consequences of these chromatin accessibility patterns. ATAC-seq data is generated by fragmenting the genome with the Tn5 endonuclease and sequencing the shorter DNA fragments. While most of the genome is associated with protein complexes that preclude the digestion of DNA by Tn5, some regions of the genome have accessible chromatin that can be cleaved by Tn5 resulting in short (&lt;500bp) fragments. These regions of the genome are of biological interest as they are likely to harbor transcription factor binding sites and to constitute cis-regulatory elements, genomic regions that are involved in the regulation of gene expression. 16.2.1 What questions can be answered with ATAC-seq? 16.3 ATAC-Seq general workflow overview A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means. 
16.3.1 Data quality metrics: 16.3.1.1 Pre-sequencing QC: 16.3.1.2 Sequencing considerations: 16.3.1.3 Pre-alignment QC: A tool such as FastQC should be used to check GC content, read quality and length, and primer or adapter reads prior to alignment. Trimmomatic is a useful tool for removing primer and adapter sequences if they are present. ATAC-seq experiments should be sequenced with paired-end sequencing, and existing pipelines will expect paired-end input (two files, *_R1.fastq and *_R2.fastq). Use fasterq-dump to download files from the NCBI Sequence Read Archive; this tool will automatically split the reads into multiple files. 16.3.1.4 Number of mapped reads As for all DNA-sequencing based genomics technologies, a sufficient number of mapped reads is required to obtain meaningful results from a sample. You can read more about general sequencing technologies in our previous chapter here. For experiments on human samples this number should be greater than 20 million mapped unique reads. Bowtie2 is commonly used for mapping fragments to the genome. 16.3.1.5 Post-alignment QC: After alignment, check the percentage of matched, unmatched, unpaired, and duplicated reads. Reads which are duplicated or unmatched should be filtered out. Picard is a useful tool for this step. Reads on the + strand should be shifted +4 bp, and reads on the - strand should be shifted -5 bp. 16.3.1.6 Fragment size distribution: ATAC-seq data is often generated using paired end sequencing technologies, which allow for characterization of ATAC-seq fragments. 
Histograms of these distributions using single base pair resolution bins reveal patterns of enrichment relative to the nucleosome scale of 147bp and the DNA-helix scale of ~10.5bp. When comparing ATAC-seq samples, it is important to consider the fragment size distributions of the samples being compared. Differences in the distributions could lead to results that are unrelated to biology. 16.3.1.7 Peak calling: ATAC-seq peak calling typically makes use of analysis tools developed for ChIP-seq. MACS2 is one of the most common choices for a peak calling tool, but HOMER or other common ChIP-seq peak callers are also acceptable. An input sample is not typically generated for ATAC-seq as it would be for a ChIP-seq experiment, so the major requirement for the peak caller is that it does not require the input control to call peaks. Number of peaks: Although the number of accessible chromatin regions can vary from one cell type to another, there are several regions that appear to be constitutively accessible across most cell types. At least 20,000 peaks can be identified in a high quality experiment. The deeper the sequencing, the more peaks will be detected in an ATAC-seq experiment. At a very high sequencing depth some of the statistically significant peaks might not be of biological interest. In an analysis of such data sets the fold enrichment relative to background, or absolute peak signal, ought to be taken into account in addition to statistical significance. 16.3.1.8 FRiP score (fraction of reads in peaks) In high quality ATAC-seq data a large fraction of reads overlap with peaks, while in low quality data a large fraction of fragments map to background regions. Ideally, the FRiP score is greater than 0.3 (30 percent or more of reads overlap with peaks), with a score below 0.2 indicating low-quality data. 16.3.1.9 Overlap with other chromatin accessibility data Thousands of ATAC-seq samples have been produced in human and mouse. 
High quality ATAC-seq data will share a substantial proportion of peaks with many of these datasets. Publicly available ATAC-seq data can be found and comparisons made at the Cistrome Data Browser (http://cistrome.org/db/). 16.3.1.10 Overlap with promoters The promoter regions of many genes are constitutively accessible. Examining peak overlap with regions close to known protein coding gene transcription start sites can be used as a check for data quality. 16.3.2 Information from ATAC-seq analysis: 16.3.2.1 Major approaches: Compare changes in transcription factor motif enrichment in accessible regions between samples. Compare changes in accessibility of regions (differential accessibility) between samples. Footprinting: identify regions where insertion is below the expected level. 16.3.2.2 Differential accessibility analysis: Differential accessibility analysis typically uses packages for RNA-seq differential expression analysis such as DESeq2, edgeR, or limma. All three are available as R packages and can be installed using Bioconductor, a bioinformatics package manager for R. Unfortunately, there are no well-established packages for this analysis in other languages such as Python. Differential accessibility analysis is an approach with high potential, but care must be taken in processing and normalizing the data for accurate results. 16.3.2.3 Motif analysis: Motif analysis in ATAC-seq is more complex than for ChIP-seq because a larger set of TFs are responsible for the emergence of chromatin accessible regions than for the binding sites of a particular TF. Nevertheless, in the analysis of differential ATAC-seq peaks motif analysis can be used to reveal the TFs related to differences between conditions. This type of analysis is most likely to be successful when ATAC-seq data from closely related conditions or cell types are being compared. The MEME suite has a variety of tools for motif analysis available in both web and command-line versions. 
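Two of the data quality metrics described earlier in this chapter, the Tn5 offset applied after alignment and the FRiP score, reduce to simple interval arithmetic. The block below is a minimal sketch in plain Python, not the method of any particular pipeline. The function names, the 0-based half-open coordinates, the choice to shift the whole read interval, and the assumption that peaks on a chromosome do not overlap each other are all illustrative assumptions.

```python
from bisect import bisect_left


def shift_tn5(start, end, strand):
    # Apply the +4 / -5 bp offset to a read interval (hypothetical helper).
    # Assumption: the whole interval is shifted; some pipelines adjust only
    # the 5-prime end of each read instead.
    if strand == '+':
        return start + 4, end + 4
    if strand == '-':
        return start - 5, end - 5
    raise ValueError('strand must be + or -')


def frip(reads, peaks):
    # Fraction of reads overlapping at least one peak.
    # reads and peaks are iterables of (chrom, start, end) tuples.
    # Assumption: peaks on the same chromosome do not overlap each other.
    starts_by_chrom = {}
    ends_by_chrom = {}
    for chrom, start, end in sorted(peaks):
        starts_by_chrom.setdefault(chrom, []).append(start)
        ends_by_chrom.setdefault(chrom, []).append(end)

    total = 0
    in_peaks = 0
    for chrom, start, end in reads:
        total += 1
        starts = starts_by_chrom.get(chrom, [])
        ends = ends_by_chrom.get(chrom, [])
        # Index of the last peak whose start lies before the read end.
        i = bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            in_peaks += 1
    return in_peaks / total if total else 0.0


example_peaks = [('chr1', 100, 200), ('chr1', 500, 600)]
example_reads = [
    ('chr1', 150, 190),  # inside the first peak
    ('chr1', 190, 230),  # partially overlaps the first peak
    ('chr1', 300, 340),  # between peaks
    ('chr2', 100, 140),  # chromosome without peaks
]
example_frip = frip(example_reads, example_peaks)
print(example_frip)
```

With the toy intervals above, two of the four reads overlap a peak. On real data the reads would come from a filtered alignment file and the peaks from a peak caller such as MACS2, and the resulting value can be compared against the 0.3 and 0.2 thresholds mentioned in the FRiP section.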
16.3.2.4 Motif Scanning Motif scanning is an analysis technique which identifies putative transcription factor binding sites (TFBS) which sufficiently match a given TF motif’s position-weight matrix. PWMscan is a straightforward online tool, but not the best option for high throughput. FIMO is an alternative which can be used either on the web or the command line. This approach will identify all sites within the genome which are likely to bind a single transcription factor. 16.3.2.5 Motif discovery: Homer or MEME. These tools identify overrepresented sequences within the accessible peaks, regardless of whether they match a previously defined motif. Once the ATAC-seq peaks are determined, the next step is to search for enriched DNA sequence motifs within these regions. This is accomplished by using motif discovery algorithms such as MEME Suite, HOMER, or DREME. These tools scan the ATAC-seq peaks for overrepresented sequence patterns, which may correspond to binding sites for specific transcription factors or other regulatory elements. The motifs discovered can be compared against existing motif databases, such as JASPAR or TRANSFAC, to annotate the potential transcription factor binding sites. 16.3.2.6 Motif Enrichment: These motif enrichment tools will scan through and identify matches to known motif sequences within accessible sites, and additionally will quantify whether the motif is significantly enriched compared to a control sample (input, uncommon with ATAC-seq) or a shuffled sequence to mimic background. After identifying the enriched motifs, researchers can perform motif enrichment analysis to determine the significance of these motifs in the ATAC-seq peaks. This is often done using statistical tools like Fisher’s exact test or hypergeometric test, which assess the enrichment of specific motifs compared to their background occurrence in the genome. 
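The hypergeometric test mentioned above can be written out directly. The sketch below is a hedged illustration using only the Python standard library. The function name and its four parameters are hypothetical, and it assumes the universe of sequences is the peak set plus a background set, with each sequence simply counted as containing the motif or not.

```python
from math import comb


def motif_enrichment_p(n_total, n_total_with_motif, n_peaks, n_peaks_with_motif):
    # One-sided hypergeometric p-value (hypothetical helper).
    # n_total: all sequences considered (peaks plus background)
    # n_total_with_motif: sequences in that universe containing the motif
    # n_peaks: number of peak sequences (a subset of the universe)
    # n_peaks_with_motif: peak sequences containing the motif
    # Returns P(X >= n_peaks_with_motif) when n_peaks sequences are drawn
    # without replacement from the universe.
    denominator = comb(n_total, n_peaks)
    upper = min(n_peaks, n_total_with_motif)
    numerator = sum(
        comb(n_total_with_motif, k)
        * comb(n_total - n_total_with_motif, n_peaks - k)
        for k in range(n_peaks_with_motif, upper + 1)
    )
    return numerator / denominator


# Toy numbers: 10 sequences, 5 with the motif, 4 peaks, 3 peaks with the motif.
p_value = motif_enrichment_p(10, 5, 4, 3)
print(round(p_value, 4))
```

Exact binomial coefficients keep the example easy to check by hand, but they become slow for genome-scale counts. In practice a statistics library (for example the hypergeometric distribution in SciPy) or the enrichment output of HOMER or the MEME Suite would normally be used instead.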
Additionally, tools like GREAT or HOMER can be employed to perform gene ontology analysis and assess the functional relevance of the identified motifs in biological processes and pathways. Overall, ATAC-seq motif enrichment analysis provides researchers with valuable insights into the regulatory landscape of the genome. By identifying enriched motifs within accessible chromatin regions, researchers can gain a deeper understanding of the transcriptional regulatory networks and potentially uncover novel transcription factors involved in specific biological processes or diseases. This analysis serves as a powerful tool for unraveling the intricacies of gene regulation and can pave the way for further investigations in functional genomics and therapeutic development. HOMER or MEME Suite tools can be used for this analysis. 16.4 ATAC-Seq data strengths: ATAC-seq is easy to adopt and has been used by many laboratories to generate high quality data for characterizing accessible chromatin in cell lines or sorted cells derived from tissues. In principle, ATAC-seq can identify a large proportion of cis-regulatory elements. In contrast to ChIP-seq, ATAC-seq does not require specific antibodies. ATAC-seq is a time-efficient protocol which requires low cell input. In comparison with histone modification ChIP-seq, ATAC-seq provides a higher resolution assessment of the cis-regulatory genomic regions. Histone modification ChIP-seq signal, in contrast, tends to be localized on nucleosomes flanking the site of interest and can spread to nucleosomes beyond the immediate flanking ones. 16.5 ATAC-Seq data limitations: ATAC-seq does not precisely identify the transcription factors or other chromatin associated factors that bind in or around chromatin accessible regions. This type of information needs to be inferred through transcription factor binding motif analysis or ChIP-seq data. 
Whereas ATAC-seq indicates the presence of a putative cis-regulatory element, H3K27ac ChIP-seq is able to separate accessible regions from those that are accessible and active. Accessible regions are not necessarily cis-regulatory regions, although many of them are. The genes that are regulated by cis-regulatory elements cannot be identified conclusively by ATAC-seq alone. ATAC-seq data can be biased and affected by batch effects, like any other genomics data type. When comparing ATAC-seq data, good experimental design principles, like the inclusion of biological replicates and consideration of controls, are needed for a meaningful outcome. 16.6 ATAC-Seq data considerations The nucleosome is the fundamental unit of chromatin packaging in the genome and nucleosomal DNA is far less likely to be cleaved by the Tn5 nuclease than linker DNA. When DNA is fragmented by Tn5, the positions of the endpoints relative to the nucleosomes are an important consideration. When the ends are less than 147bp apart it is likely that both ends originate from the same linker region. Longer fragments can result from cuts on opposite sides of the same nucleosome, or even opposite sides of a genomic interval that encompasses multiple nucleosomes. The short fragments are therefore most likely to be nucleosome free and provide stronger evidence for transcription factor binding sites. As with other genomics protocols, ATAC-seq data is subject to biases introduced in the ATAC-seq protocol and in the sequencing itself. ATAC-seq data generated in different batches, by different laboratories, or using different protocols might not be directly comparable. In addition, the Tn5 endonuclease does have biases in the precise DNA sequences it can cut. This should be taken into consideration when carrying out base pair resolution analyses including footprinting analysis and analysis of the effects of sequence variants on chromatin accessibility. 
Read depth will impact ATAC-seq signal, but enzyme strength and conditions can also alter the distribution of cuts. When using ATAC-seq data to answer biological questions it is important to understand what types of bias could impact the results. To ensure valid results, the analysis needs to use appropriate statistical methods and enough high quality ATAC-seq data, including controls; it may also be necessary to reframe the questions. 16.7 ATAC-seq analysis tools This section has been written by AI and needs verification by experts. This is meant to give you a basic idea of the pros and cons of these tools but should ultimately be used with your own judgment. MACS2 (Y. Zhang et al. 2008): Pros: widely used, handles both paired-end and single-end sequencing data, allows for differential peak calling between different samples. Cons: assumes that all peaks have the same shape, may not be as accurate as other peak-calling tools in some cases. HOMER (Heinz et al. 2010): Pros: includes tools for peak-calling, motif analysis, and annotation of nearby genes, user-friendly interface, handles both paired-end and single-end sequencing data. Cons: may not be as accurate as other peak-calling tools in some cases. ATACseqQC (Schep et al. 2017): Pros: provides several metrics and plots for evaluating data quality, identifies potential issues with data such as batch effects, sequencing depth, and library complexity. Cons: does not perform peak-calling or downstream analysis. deepTools (Ramírez et al. 2016): Pros: includes tools for normalization, visualization, and comparison of ATAC-seq data, generates heatmaps, profiles, and other plots for visualizing chromatin accessibility. Cons: may require some programming skills to use effectively. DFilter (Ghavi-Helm et al. 
2019): Pros: uses a deep learning approach to predict the likelihood of a genomic region being an ATAC-seq peak, can handle both paired-end and single-end sequencing data, has been shown to outperform other peak-calling tools in some cases. Cons: may require more computational resources than other tools. 
16.9 Additional tutorials and tools A Galaxy-based tutorial for ATAC-seq - Galaxy is a good recommendation for those new to informatics who would like a cloud-based GUI option to use for the analysis of their data. MACS - Model-based analysis for ChIP-Seq - A command line tool for the identification of transcription factor binding sites. Can be used with ChIP-seq or ATAC-seq. CHIPS - A Snakemake pipeline for quality control and reproducible processing of chromatin profiling data. This tool will require some Snakemake and coding knowledge. For more recommendations about coding see our later chapter about general data analysis tools. Cistrome DB - a visual tool to allow you to browse your ATAC-seq data. SELMA - Simplex Encoded Linear Model for Accessible Chromatin - SELMA is a Python-based tool for the assessment of biases in chromatin-based data. 16.10 Online Visualization tools Cistrome DB - a visual tool to allow you to browse your ATAC-seq data. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with ATAC-seq data. Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. 16.11 More resources about ATAC-seq data ATAC-seq overview from Galaxy - these slides explain the overarching concepts of ATAC-seq. ATAC seq guidelines from Harvard - this workflow runs through step by step how to analyze ATAC-seq data and what different parameters mean. ATAC-seq review - this paper gives a great overview of ATAC-seq data and step by step what needs to be considered. 
Identifying and mitigating bias in chromatin CHIP Snakemake pipeline for analyzing ChIP-seq and chromatin accessibility data Paper on bias in DNase-seq footprinting analysis and fragment size effects, similar comments apply to ATAC-seq SELMA Method for evaluating footprint bias in ATAC-seq References "],["single-cell-atac-seq-1.html", "Chapter 17 Single cell ATAC-Seq 17.1 Learning Objectives 17.2 What are the goals of scATAC-seq analysis? 17.3 scATAC-seq general workflow overview 17.4 Peak calling 17.5 Dimensionality reduction 17.6 Embedding (visualization) 17.7 Clustering 17.8 Cell type annotation 17.9 scATAC-seq data strengths: 17.10 scATAC-seq data limitations: 17.11 scATAC-seq data considerations 17.12 scATAC-seq analysis tools 17.13 Trajectory analysis 17.14 Motif detection (ex. ChromVar) 17.15 Regulatory network detection 17.16 Tools for data type conversion 17.17 More resources and tutorials about scATAC-seq data", " Chapter 17 Single cell ATAC-Seq This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. 17.1 Learning Objectives 17.2 What are the goals of scATAC-seq analysis? The primary goal of single-cell ATAC-seq is to obtain a high-resolution map of chromatin accessibility at the single-cell level. It is often used for the identification of cell type-specific cis-regulatory elements (CREs) or transcription factor (TF) binding sites because single-cell resolution enables researchers to parse heterogeneous subgroups within a sample. Single-cell ATAC-seq is often applied to questions in developmental biology and cell differentiation. 17.3 scATAC-seq general workflow overview Align reads to genome and assign to cells based on barcodes This step can be performed using Cell Ranger if the data were generated using a 10X Genomics kit (commercially available). 
For other methods, this step largely resembles the alignment step of bulk ATAC-seq analysis, using aligners such as Bowtie2 or BWA, filtering tools such as Picard, and adapter-trimming tools such as Trimmomatic. Prior to adapter trimming, barcodes should be matched to the list of known barcodes generated in the experiment and either assigned to a cell or assigned as ambiguous. At this stage, unique molecular identifiers (UMIs) added to fragments during library preparation are also extracted and associated with each read to allow for PCR deduplication. Quality control The most important considerations for single-cell ATAC-seq are the number of unique fragments per cell, the transcription start site (TSS) enrichment score and detection of doublets. The number of unique fragments in a cell is a critical quality control metric for single-cell ATAC-seq. Cells with a low fragment count do not provide enough information to draw conclusions about their characteristics, and cells with extremely high fragment counts are likely to be doublets containing reads from multiple cells. To determine the number of unique reads per cell, short random barcodes termed unique molecular identifiers (UMIs) are added to the fragments during library preparation. After the reads have been aligned to the genome and grouped by their cell barcodes, the UMIs can be used to remove PCR duplicates by retaining only one copy of reads with the same UMI and genomic location. The resulting UMI counts can be used as a more accurate measure of chromatin accessibility at specific genomic regions in individual cells. An additional step is typically taken to filter out reads mapping to the mitochondrial genome, so that the final unique fragment counts consist of only unique reads corresponding to nuclear DNA. The TSS enrichment score in ATAC-seq measures the preferential accessibility of chromatin regions near gene promoters. 
This approach was established in pipelines for bulk ATAC-seq, such as the ENCODE pipeline (cite), and is also applicable to single-cell ATAC-seq. In brief, the TSS enrichment score quantifies the enrichment of open chromatin regions at TSSs versus a non-TSS background (e.g. +/-2000 bp beyond TSSs). A high TSS enrichment score therefore indicates that the number of accessible regions at TSSs, where high accessibility is expected, is significantly higher than background (cite), while a low TSS enrichment score indicates that the data quality is not high enough to distinguish accessible regions from background insertion patterns. Doublet detection is any approach that attempts to computationally identify cell barcodes which contain reads from a mixture of single cells. Although an extremely high fragment count may indicate that a cell is in fact a doublet, doublet detection provides a more targeted approach by assigning a score or a probability that each cell is a doublet. These approaches may compare cells to simulated doublets generated randomly from the data, or may rely on the fact that the number of ATAC-seq reads at any given locus in a single cell is limited to two for diploid organisms. This step is not as common in scATAC-seq analysis as it is in single cell RNA-seq analysis owing to the difficulty of estimating doublets from the highly sparse data, but can be done for additional rigor or if there is particular concern that the dataset contains a high number of doublets. Additionally, the fragment size distribution of the library should exhibit nucleosomal periodicity, where fragments are enriched at ~147 bp intervals corresponding to the length of nucleosome-bound DNA that is refractory to Tn5 insertion. 17.4 Peak calling Peak calling in scATAC-seq is performed in a similar manner to bulk ATAC-seq [ref bulk chapter]. Importantly, it should be performed by treating data from all cells within a cluster as a pseudo-bulk replicate. 
This is because scATAC-seq data is highly sparse and any individual cell only has enough information to convey whether a region is accessible or inaccessible, due to the maximum of 2 reads per locus per cell. Peak calling is commonly performed using MACS2, but other peak callers suitable for ATAC-seq could be used as well, as described in our chapter on bulk ATAC-seq (reference). 17.5 Dimensionality reduction As scATAC-seq data is extremely high dimensional, with counts for hundreds of thousands of peaks in thousands of cells, dimensionality reduction must be performed to represent the data in a way which reflects the major sources of variation while allowing for efficient computation. Many of the most popular dimensionality reduction approaches for scATAC-seq are borrowed from natural language processing, including latent semantic indexing (LSI) as well as probabilistic approaches such as latent Dirichlet allocation (LDA) and probabilistic LSI (pLSI). LSI and its variations are commonly used and are a simple, efficient approach based on singular value decomposition (SVD), which is closely related to PCA. Probabilistic approaches calculate the probability of information in a dataset being related to specific ‘topics’ identified by the statistical model. They are more mathematically complex than LSI but attempt to more accurately reconstruct the latent (not observable) structure in the data. 17.6 Embedding (visualization) Embedding is the process of representing the high-dimensional scATAC-seq dataset in two (or occasionally three) dimensions for visualization. First, dimensionality reduction must have been performed using one of the methods described in the section above. Then, the result of dimensionality reduction can be provided as input to the chosen embedding approach. The most common method for generating scATAC-seq embeddings is UMAP (Uniform Manifold Approximation and Projection), but other methods, such as force-directed graph layouts or t-SNE (t-distributed Stochastic Neighbor Embedding), can also be used. 
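As a rough illustration of the LSI approach described in the dimensionality reduction section, the sketch below binarizes a small cells-by-peaks count matrix, applies a TF-IDF weighting, and keeps the leading components of an SVD. The function name lsi_embedding, the toy matrix, and the particular log-scaled IDF are assumptions made for illustration; packages such as ArchR or Signac use their own variants, and this is not their implementation.

```python
import numpy as np

def lsi_embedding(counts, n_components=2):
    # counts: cells x peaks array of fragment counts (hypothetical toy input)
    # Binarize, since single-cell accessibility is close to present/absent
    binary = (np.asarray(counts) > 0).astype(float)

    # Term frequency: scale each cell by its number of accessible peaks
    cell_totals = binary.sum(axis=1, keepdims=True)
    tf = binary / np.maximum(cell_totals, 1.0)

    # Inverse document frequency: down-weight peaks open in many cells
    # (log1p scaling is one common choice, assumed here)
    n_cells = binary.shape[0]
    peak_totals = binary.sum(axis=0)
    idf = np.log1p(n_cells / np.maximum(peak_totals, 1.0))

    tfidf = tf * idf

    # SVD of the weighted matrix; keep the leading components
    u, s, _ = np.linalg.svd(tfidf, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy example: two cells open at the first three peaks, two at the last three
toy_counts = np.array([
    [1, 1, 1, 0, 0, 0],
    [2, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 3, 1],
])
embedding = lsi_embedding(toy_counts, n_components=2)
```

The reduced matrix returned here (one row per cell) is the kind of input that would then be passed to an embedding method such as UMAP or to a clustering step.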
17.7 Clustering Clustering is the process of computationally detecting populations of cells with similar characteristics - in this case, cells with similar accessibility profiles. Leiden clustering, which uses the similarity of cells to their neighbors to group cells into clusters, is a common choice for identifying clusters in scATAC-seq data. 17.8 Cell type annotation Cell type annotation on scATAC-seq data alone can be performed based on the enrichment of cell-type-specific CREs, or alternatively can be performed based on gene expression patterns observed in integrated scRNA-seq data. Gene scores are a measure of the accessibility of a gene locus and putative CREs within a defined window of the gene. Gene scores significantly above the expected background suggest a gene is active in a given cell type, and these scores can be used to identify markers for cell type annotation. Integration with scRNA-seq data can allow for identification of cell types which may be difficult to distinguish based on ATAC-seq profiles alone (ref), but requires an scRNA-seq dataset of a comparable population of cells. Trajectory analysis, which is used to infer and visualize the developmental or differentiation paths of individual cells within a population, can be performed on processed single-cell ATAC-seq data using tools developed for single-cell RNA-seq data. These approaches aim to reconstruct the temporal progression and identify the key intermediate states or cell fate decisions during biological processes such as embryonic development, tissue regeneration, or disease progression. Trajectory inference algorithms such as Monocle (Qiu et al. 2017), Palantir (Setty et al. 2019), and PAGA (Wolf et al. 2019) are commonly used to reconstruct the developmental trajectories and order the cells along these trajectories. 
The resulting trajectory models provide valuable insights into the underlying regulatory dynamics, lineage relationships, and critical regulatory genes or pathways governing cellular differentiation and development. As with peak calling, it is not possible to obtain enough information from individual cells to perform differential accessibility analysis at the single cell level. Because of this limitation, differential accessibility analysis is performed in a similar manner to bulk ATAC-seq analysis using pseudo-bulk data at the cluster or cell type level, where counts from many single cells are aggregated together and treated as though they are a single sample generated from a bulk experiment. Common tools for differential accessibility analysis include DESeq2 and edgeR, which were both developed for differential gene expression analysis. 17.9 scATAC-seq data strengths: scATAC-seq is the gold standard for showing heterogeneity in chromatin accessibility between populations of cells and within tissues because single-cell resolution enables analysis of subpopulations that are challenging to isolate experimentally. scATAC-seq can be paired with scRNA-seq to obtain transcriptome and chromatin accessibility measurements from the same cells. This is a powerful approach for gaining understanding of how specific patterns of chromatin accessibility affect gene expression. scATAC-seq is also a relatively high throughput technique, particularly with droplet based techniques. A single dataset can cover thousands of cells. 17.10 scATAC-seq data limitations: scATAC-seq has very high sparsity compared to single-cell RNA-seq since there are only two copies of each locus in a diploid cell compared to many copies of mRNAs. This results in the data essentially being binary at the single cell level - a region either has reads and is considered accessible in that cell or has no reads. 
As in bulk ATAC-seq, the Tn5 transposase has a sequence bias, so regions with a preferred sequence will undergo higher levels of transposition. Highly accessible regions of DNA will also be overrepresented in the final library. Single-cell ATAC-seq is an expensive technique regardless of the experimental approach chosen. Plate-based methods are generally cheaper but have lower throughput, while droplet-based methods are higher throughput but extremely costly and reliant on proprietary technology. Large datasets require significant investment and often the use of droplet-based techniques. Many scATAC-seq datasets have low cell numbers due to the cost and technical difficulty of the assay. This presents a challenge for analysis since the data is highly sparse and noisy, which in combination with a small dataset can lead to difficulty interpreting the data. 17.11 scATAC-seq data considerations scATAC-seq will always be sequenced with paired-end reads. There are two major experimental approaches for generating single-cell ATAC-seq data: droplet-based methods, such as the commercially available 10X Chromium platform, where nuclei are separated into individual droplets, and plate-based methods, which use multiple pooling and barcoding steps to tag each cell with a unique combination of barcodes (with a level of expected barcode collisions). The procedure for demultiplexing the reads will depend on the method used to generate the data. Data generated using 10X platforms can be demultiplexed and aligned using the Cell Ranger software, while plate-based approaches typically use an alignment and peak-calling approach similar to that used for bulk ATAC-seq, with the additional step of matching the barcodes in each read to the known set of combinatorial barcodes. Correctly matching the reads to cells and filtering reads with non-matching barcodes is a critical step for scATAC-seq analysis. 
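To make the barcode-matching step above more concrete, here is a minimal sketch of assigning an observed barcode to a known barcode list while tolerating a single sequencing error. The function names, the one-mismatch threshold, and the toy barcodes are all assumptions for illustration; real demultiplexing tools apply their own correction rules, often using base qualities as well.

```python
def hamming(a, b):
    # Number of mismatching positions between two equal-length strings
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, known_barcodes, max_mismatches=1):
    # Returns the matched barcode, the string 'ambiguous' when several
    # known barcodes are equally plausible, or None when nothing matches.
    if observed in known_barcodes:
        return observed
    candidates = [
        bc for bc in known_barcodes
        if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches
    ]
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) > 1:
        return 'ambiguous'
    return None

# Hypothetical toy barcode list and observed read barcodes
known = {'AAAA', 'CCCC', 'AACC'}
observed_reads = ['AAAA', 'AAAT', 'AAAC', 'GGGG']

reads_per_cell = {}
for read_barcode in observed_reads:
    cell = assign_barcode(read_barcode, known)
    # Keep only reads that map to exactly one known barcode
    if cell is not None and cell != 'ambiguous':
        reads_per_cell[cell] = reads_per_cell.get(cell, 0) + 1
```

In this toy run, the exact match and the single-mismatch read are both counted toward the same cell, while the ambiguous and unmatched reads are filtered out.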
17.12 scATAC-seq analysis tools Cell Ranger ATAC is a popular preprocessing tool specifically designed for scATAC-seq data generated using the 10x Genomics platform. It performs essential steps such as demultiplexing, barcode processing, read alignment, and filtering, providing a streamlined workflow for 10x-generated scATAC-seq data. However, it cannot be used for data generated by other methods. Bowtie2, Picard tools, and Trimmomatic: These tools are commonly used for preprocessing scATAC-seq data generated using plate-based or combinatorial indexing approaches. Bowtie2 is a fast and widely used aligner for mapping sequencing reads to a reference genome, while Picard provides a suite of command-line tools for manipulating and analyzing BAM files and Trimmomatic can remove adapter sequences from reads. These tools can be utilized for aligning reads, removing duplicates, sorting, and filtering the data to obtain the necessary inputs for downstream analysis. ArchR is a comprehensive scATAC-seq preprocessing tool implemented in R. It accepts both 10x fragment files and BAM files as input, making it suitable for data generated using different protocols. ArchR performs quality control, peak calling, peak annotation, normalization, and data transformation steps. It is one of the most popular tools for analyzing standalone scATAC-seq data and provides a user-friendly interface for exploratory data analysis. Scanpy is a Python-based tool widely used for visualizing and manipulating single-cell omics data, including scATAC-seq. After processing scATAC-seq data with tools like ArchR, the output can be exported as a matrix (data) or CSV (metadata) and formatted into a Scanpy data object. Scanpy offers various analytical functionalities, including dimensionality reduction, clustering, trajectory inference, differential accessibility analysis, and visualization. It is the tool of choice if you plan to perform your analysis primarily in Python. 
Seurat is an R-based tool that is extensively used for analyzing and visualizing single-cell omics data, including scATAC-seq. Similar to Scanpy, after preprocessing the data with tools like ArchR, Seurat can be employed for downstream analysis. It provides a wide range of functions for quality control, dimensionality reduction, clustering, differential accessibility analysis, cell type identification, and visualization. Seurat integrates well with other existing R-based tools for single-cell data analysis, offering flexibility and compatibility. This is a useful core tool to use if you plan to perform your analysis in R. Signac is an R package specifically designed for the analysis of single-cell epigenomics data, including scATAC-seq. It offers a comprehensive set of functions for preprocessing, quality control, dimensionality reduction, clustering, trajectory analysis, differential accessibility, and visualization. Signac integrates well with Seurat, providing an additional tool for exploring and analyzing scATAC-seq data. Additional quality checking tools: Quality checking and filtering steps in scATAC-seq analysis can be performed using various tools depending on the workflow and programming language. Some commonly used tools with QC capabilities useful for examining library quality measures such as GC bias, overrepresented sequences, and quality scores include FastQC and deepTools. 17.12.0.1 Doublet detection ArchR has a tool for doublet detection - it generates synthetic doublets from combinations of cells in the dataset and uses the similarity of cells in the dataset to these synthetic doublets to identify doublets. This is a common approach, and variations of it are used by most doublet detection algorithms. Many are specifically designed to expect transcriptomic data (such as the commonly used Scrublet) and identify barcodes with mixed transcriptional signatures of multiple clusters/cell types, and these methods do not accept scATAC-seq input. 
Some transcription-based tools can be given modified input to detect doublets in scATAC-seq data, as described in documentation from the Demuxafy project. There are also tools like AMULET which leverage the fact that the number of ATAC-seq reads at any locus in a single cell is limited by the number of copies of a chromosome to detect doublets. Overall, doublet detection is not as common a step in scATAC-seq analysis as it is in scRNA-seq analysis, owing to the limited tools available and the difficulty of performing this analysis on extremely sparse data. 17.12.0.2 Visualization Scanpy (Python) and Seurat (R) are the most commonly used tools for visualizing scATAC-seq data. These tools allow you to plot the accessibility of specific peaks or gene scores, as well as metadata such as cell type, clusters, etc. on the UMAP (or other) embedding at the single-cell level. Both packages include built-in functions to perform this plotting in a streamlined manner and to manipulate the data objects for additional quantification and visualization using general plotting packages such as matplotlib or ggplot. The choice between these tools is primarily determined by the programming language you choose for your analysis, as they share many of the same core features. Additionally, tools such as deepTools or EnrichedHeatmap may be useful for visualizing heatmaps of pseudo-bulk data, and bedGraph or BigWig representations of pseudo-bulk data can be visualized using genome browsers such as IGV or UCSC genome browser. pyGenomeBrowser is a package which allows more customizable visualization of browser tracks and may be useful for generating publication-quality figures. 17.13 Trajectory analysis Several tools are available for single-cell trajectory analysis. 
These approaches are primarily distinguished by variations used in their mathematical approaches for calculating trajectories, but most make use of graph-based approaches which model the similarity or connections between cells in a dataset. The distinct approaches of the tools discussed here lead to varying levels of performance on different types of data, and extensive benchmarking has been performed (here) and (here) on synthetic datasets to determine the accuracy of different approaches. The most important consideration here is whether there are any cyclic trajectories expected in the dataset, where the end of the trajectory would connect back to the start, or disconnected trajectories, where not all trajectories originate from the same starting state. Not all approaches can reconstruct these trajectories accurately. Most popular methods expect a tree-like structure, with a single starting point and branches which lead toward terminal cell fates. Monocle is a popular choice that offers a comprehensive workflow for trajectory inference, visualization of trajectory analysis, pseudotime ordering of cells, and identification of differentially expressed genes along trajectories. Another commonly used tool is Slingshot, which utilizes a graph-based approach to infer trajectories, compute pseudotime ordering, and generate smooth curves to visualize trajectories. Additionally, it has the ability to infer multiple disconnected trajectories within a single dataset. PAGA (Partition-based Graph Abstraction) uses a distinct strategy with the goal of maintaining connections between similar groups of cells as well as the overall structure of the data. Palantir is a tool which uses a probabilistic approach to assign cell fate probabilities to each cell in a dataset, which can be used to define cells belonging to a specific trajectory. 17.14 Motif detection (ex. 
ChromVar) Single-cell chromVAR analysis is a computational approach used to assess cell-to-cell variation in chromatin accessibility profiles across a population of single cells. It aims to identify TF activity differences between cell types or states and elucidate the underlying regulatory dynamics. Single-cell chromVAR leverages the concept of TF motif enrichment or depletion within cell-specific accessible regions to infer TF activity. It compares the chromatin accessibility profiles of individual cells to a background model derived from the aggregate accessibility profiles of all cells, enabling the detection of cell-specific TF binding patterns. By quantifying the enrichment or depletion of TF motifs within accessible regions, single-cell chromVAR provides insights into TF activity variation, potential regulatory networks, and cell-type-specific transcriptional regulation. It serves as a valuable tool for understanding the contribution of TFs to cellular heterogeneity and regulatory processes in single-cell chromatin accessibility data. 17.15 Regulatory network detection CisTopic is a computational tool used for the analysis of single-cell chromatin accessibility data to identify and characterize cell subpopulations with distinct regulatory patterns. It employs a topic modeling approach to capture the variability in chromatin accessibility profiles across cells and identifies the major regulatory patterns driving cell heterogeneity. CisTopic assigns cells to topics based on the similarity of their accessibility landscapes. By analyzing the differential accessibility of genomic regions within each topic, CisTopic facilitates the discovery of transcription factor binding motifs and CREs associated with specific cell subpopulations. 17.16 Tools for data type conversion A comprehensive explanation of packages to convert between single-cell data object types used by Python and R packages is found here. 
The most common data types for processed scATAC-seq data are SingleCellExperiment, Seurat/h5Seurat, and AnnData objects. h5Seurat objects can be converted to AnnData objects using SeuratDisk. 17.17 More resources and tutorials about scATAC-seq data Galaxy tutorial for sc-ATAC-seq analysis Signac scATAC-seq tutorial with pbmcs sc ATAC-seq chapter - Intro to Bioinformatics and Comp Bio Single Cell ATAC-seq youtube video Comprehensive analysis of single cell ATAC-seq data with SnapATAC References "],["chip-seq-1.html", "Chapter 18 ChIP-Seq 18.1 Learning Objectives 18.2 What are the goals of ChIP-Seq analysis? 18.3 ChIP-Seq general workflow overview 18.4 ChIP-Seq data strengths: 18.5 ChIP-Seq data limitations: 18.6 ChIP-Seq data considerations 18.7 ChiP-seq analysis tools 18.8 More resources about ChiP-seq data", " Chapter 18 ChIP-Seq This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 18.1 Learning Objectives 18.2 What are the goals of ChIP-Seq analysis? ChIP-Seq (chromatin immunoprecipitation sequencing) and related approaches are used to identify genome-wide binding sites of specific proteins or protein complexes. Given the diversity of interactions at the DNA-protein interface, sequencing-based methods for targeted chromatin capture have evolved to meet precise research needs and improve the quality of the results. Specifically, ChIP-Seq builds on protein immunoprecipitation techniques (IP) by applying next generation sequencing to a pulldown product. IP followed by sequencing can be applied to any nucleic-acid binding protein for which an antibody is available, including known or putative transcription factors (TFs), chromatin remodelers, modified histones, or other DNA- or chromatin-specific factors. 
ChIP-Seq approaches have been honed to increase signal-to-noise, reduce input material, and more specifically map protein-DNA interactions, for example by treating the IP product with an exonuclease that chews back unprotected DNA ends (e.g. ChIP-exo). The main goals of analysis for ChIP-Seq approaches are: Identify the genomic regions where a specific protein or protein complex binds. This can be achieved by sequencing both the IP input and product, and then calculating the enrichment in the product sample over the input. Annotate binding sites via comparison to other datasets and genome annotations. This may include transcription start sites (TSSs) or gene-regulatory regions. Oftentimes it is best to validate your data against previous profiling of similar epitopes. Comparison of binding sites: Many ChIP-Seq experiments compare changes in protein-DNA interactions across different conditions. This type of analysis can leverage statistical tools for pairwise comparison and multiple hypothesis testing. Identification of co-occurring motifs: Many chromatin proteins exhibit a sequence-specific binding pattern that is shaped by evolutionary forces. These sequence patterns, or motifs, are thought to capture contacts between specific base pairs and the DNA-binding domain of a protein and are often represented as a position weight matrix (PWM) for computational analysis. Statistical tools have been developed for de novo motif discovery within a given set of genomic intervals, like a ChIP-seq peak list. The list of discovered motifs can be meaningfully interpreted by cross-referencing with a motif database, and recovery of known motifs represents another means of data validation. Integration with other -omics data: Given the expansive repositories of publicly available sequencing data, creating a comprehensive narrative from a ChIP-Seq experiment usually involves comparison with other types of sequencing data. 
Just like how a ChIP-Seq peak list can be interpreted through existing genome annotations, other sequencing data can be interpreted through the binding sites identified from a given ChIP-Seq experiment. For example, a sequence variant might be enriched for or against in protein binding sites versus previously identified motifs. This would suggest that a mutation would alter DNA-protein interactions. Binding of a specific gene-regulatory element might also correlate with changes in gene expression. 18.3 ChIP-Seq general workflow overview &lt;TODO: add data formats in a graphical format&gt; A key contribution of large consortia, such as the ENCODE consortium, are standardized processing workflows to facilitate the integration of ChIP-seq data generated in different labs. While the exact data processing needs of any given experiment may vary, established pipelines provide a helpful starting point. In choosing a data processing workflow, it is essential to note the input data format. For example, the read length should be considered, as well as the sequencing paradigm (i.e. whether the data is single-end or paired-end). The most generic steps for processing ChIP-Seq data are: Quality control: The first step in ChIP-Seq data processing is to perform quality control checks on the raw sequencing data to assess its quality and identify any potential issues, such as poor sequencing quality or adapter contamination which can be assessed via FASTQC. Read alignment: The next step is to align the ChIP-Seq reads to a reference genome using a suitable alignment tool such as Bowtie or BWA. Notably, many publicly available ChIP-Seq datasets are single-ended and it is important to use the correct alignment parameters for a given sequencing approach. In the case of ChIP-seq approaches that include exonuclease treatment, such as ChIP-exo and ChIP-nexus, a paired-end sequencing approach is often taken and then insert size can be useful for validating alignment. 
For example, profiling of a histone modification should yield nucleosome-sized fragments, ranging up from 120 bp for mononucleosomes, whereas TFs should yield smaller, sub-nucleosomal fragments and polymerase is in between at 20-50 bp (PMID: 30030442). Peak calling: After the reads have been aligned to the genome, the next step is to identify the genomic regions where the protein or protein complex of interest is bound. This is done using peak-calling algorithms, such as MACS2, SICER, or HOMER, which can calculate enrichment as fold change over the input control with statistical testing. Quality control of peaks: Once the peaks have been called, it is important to perform quality control checks to ensure that the peaks are of high quality and biologically relevant. This can be done by assessing the number of peaks, the fraction of reads in peaks (FRiP), enrichment of the peaks in specific genomic regions, comparing the peaks to known gene annotations, or performing motif analysis. Often, peaks will be merged across replicates to create a consensus peak set. Peaks should be assessed visually with tools like IGV or the UCSC Genome Browser to ensure they overlap regions of high coverage. The Cistrome Data Browser is another useful resource for comparing with published ChIP-seq, DNase-seq and ATAC-seq data. Differential binding analysis: If the ChIP-Seq experiment involves comparing the binding of the protein or protein complex in different conditions or cell types, statistical testing can be performed to identify the regions of the genome where the protein or protein complex binds differentially. Tools developed for multiple comparison testing, like limma, DESeq2, and edgeR, are useful for this type of comparative analysis. Integrative analysis: Finally, integrative analysis with other -omics data can be performed to gain biological insights into the ChIP-Seq data. 
This can involve interpreting ChIP-Seq data through existing annotations by looking at signal enrichment in different genomic regions, like transcription start sites (TSSs), gene bodies, and previously identified cis-regulatory elements (CREs). ChIP-Seq data can even be interpreted through other ChIP-seq data to see whether features overlap, with statistical testing for similarity, using packages like BEDTools and BEDOPS. 18.4 ChIP-Seq data strengths: ChIP-Seq (chromatin immunoprecipitation sequencing) is a powerful tool for understanding the genomic locations where a specific protein or protein complex binds. ChIP-Seq is particularly good at showing or illustrating: Identification of regulatory elements: ChIP-Seq can be used to identify the genomic regions where a protein or protein complex binds to regulatory elements, such as promoters, enhancers, and silencers. For example, certain histone modifications characterize active promoters and enhancers, such as H3K4 methylation and H3K27 acetylation. Characterization of protein-protein interactions: ChIP-Seq can be used to identify the genomic regions where multiple proteins bind. In this way, cobinding can be inferred to provide insight into the protein-protein interactions that are involved in regulating gene expression. Identification of binding site motifs: ChIP-Seq can be used to identify the DNA motifs that are enriched in the binding sites of a protein or protein complex. This information can be used to identify other transcription factors or cofactors that are involved in the same regulatory network. Databases of known TF binding motifs include JASPAR, CIS-BP, and HOCOMOCO. Differential binding analysis: ChIP-Seq can be used to compare the binding of a protein or protein complex in different conditions or cell types, which can provide insight into the mechanisms that regulate protein binding and the impact of different cellular states on the regulatory networks. 
18.5 ChIP-Seq data limitations: ChIP-Seq (chromatin immunoprecipitation sequencing) is a powerful technique, but there are several biases, caveats, and problems that can arise when analyzing ChIP-Seq data. Some of the most common biases, caveats, and problems are: Accessibility bias: ChIP-Seq relies on fragmentation of chromatin prior to immunoprecipitation, which is observed to enrich for genomic regions that are highly accessible to TFs in general . Antibody specificity and cross-reactivity: The specificity of the antibody used in ChIP-Seq is crucial for the accuracy of the results. Finding an antibody for specific epitopes can pose a challenge because antibodies can have cross-reactivity with other epitopes, which can result in false positives or misinterpretation of the data. DNA fragmentation bias: The length and quality of the DNA fragments used in ChIP-Seq can impact the results. Shorter fragments are often located in regions with more highly accessible chromatin, especially nucleosome linker regions and promoters of active genes. Sequencing depth bias: The amount of sequencing depth can impact the results of ChIP-Seq analysis. Insufficient sequencing depth can result in false negatives or miss important binding sites. Reproducibility and sample variation: ChIP-Seq experiments can be highly variable, and reproducibility between replicates can be an issue. Additionally, the composition and quality of the sample can also impact the results. Peak-calling algorithm choice: The choice of peak-calling algorithm can impact the results of ChIP-Seq analysis, as different algorithms have different strengths and weaknesses. Interpretation of binding sites: Finally, the interpretation of binding sites identified by ChIP-Seq can be complex and requires additional validation to confirm their biological relevance and function. Notably, ChIP-Seq cannot distinguish direct protein-DNA interaction from indirect binding (e.g. 
where a protein may bind another protein that binds to DNA). 18.6 ChIP-Seq data considerations As a general guideline, a minimum sequencing depth of 20 million reads is recommended for ChIP-seq experiments in Drosophila, whereas 40–50 million reads is a practical minimum for most marks in human tissue (PMID: 24598259). However, this depth may not be sufficient for some analyses, particularly for studies that require high resolution or have a low signal-to-noise ratio. In such cases, deeper sequencing may be necessary to achieve the desired level of sensitivity and specificity. In general, epitopes that cover large sequence space (e.g. repressive histone modifications such as H3K27me3) require greater sequencing depth than epitopes confined to more narrow genomic regions (e.g. active histone modifications such as H3K4 methylation and H3K27ac). ChIP-seq for TFs may require even less sequencing depth; however, low antibody specificity may necessitate deeper sequencing due to low signal-to-noise. In practice, the depth of sequencing required for ChIP-seq experiments can vary widely depending on the specific experimental design and research question. It is important to perform a pilot study or use appropriate statistical methods to estimate the necessary sequencing depth for a given experiment. Choosing a specific antibody is essential; otherwise even deep sequencing may not recover signal over high background. Sequencing depth should also account for genome size (e.g. a larger genome requires deeper sequencing). 18.7 ChiP-seq analysis tools 18.7.1 Tools for quality checks FastQC is a widely used tool for assessing the quality of sequencing data. It analyzes the raw sequencing data and generates a report that provides an overview of various metrics such as base quality, sequence length distribution, and GC content. 
Picard tools and SAMtools: Picard tools and SAMtools are two collections of command-line tools that are used to manipulate and analyze high-throughput sequencing data. They can be used to check the quality of the data, remove duplicates, and generate summary statistics. MACS2 (Model-based Analysis of ChIP-Seq) is a software tool that is specifically designed for the analysis of ChIP-Seq data. It is used to identify regions of the genome that are enriched for DNA-protein interactions. ENCODE Uniform Processing Pipelines: The ENCODE (Encyclopedia of DNA Elements) Uniform Processing Pipelines are a set of standardized protocols and tools that are used to process and analyze ChIP-Seq data. They ensure that the data generated by different labs are consistent and can be easily compared. These tools are just a few examples of the many quality control tools available for ChIP-Seq analysis. The choice of tool(s) to use will depend on the specific analysis being performed and the preferences of the user. 18.7.2 Tools for Peak calling: MACS2 (Model-based Analysis of ChIP-Seq) is a widely used tool for peak calling in ChIP-Seq data. It uses a Poisson distribution to model the local noise and identifies peaks based on the fold enrichment over the background noise. SICER: Spatial Clustering for Identification of ChIP-Enriched Regions (SICER) is a peak caller that takes into account the spatial clustering of enriched regions in ChIP-Seq data. It uses a clustering algorithm to identify peaks based on the local density of enriched regions. HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools that includes a peak caller for ChIP-Seq data. It uses a sliding window approach to identify peaks based on the local enrichment of reads. PeakSeq is a peak caller that uses a Bayesian approach to identify enriched regions in ChIP-Seq data. 
It models the relationship between the read counts and the signal-to-noise ratio and identifies peaks based on the posterior probability of enrichment. 18.7.3 Tools for Differential Analysis DESeq2: This is a widely used R package for differential analysis of sequencing count data, including ChIP-seq. It uses a negative binomial model to normalize and test for differential enrichment of ChIP-seq peaks. edgeR: Another popular R package for differential expression analysis of RNA-seq data, edgeR can also be used for differential analysis of ChIP-seq data. It uses a generalized linear model to estimate differential enrichment and has been shown to be effective for ChIP-seq data with low read counts. Annotation ChIPseeker: This R package can be used for annotating ChIP-seq peaks with genomic features such as gene annotation, gene ontology, and pathway analysis. It can also generate plots and heatmaps for visualization. HOMER: This suite of tools includes several programs for motif discovery, peak annotation, and visualization. The annotatePeaks.pl program can be used for assigning genomic regions to specific functional categories, including promoter, exon, intron, intergenic, and enhancer regions. GREAT: This web-based tool can be used for annotating genomic regions with functional annotations such as gene ontology terms and regulatory domains. It uses a statistical approach to associate genomic regions with biological functions. Cistrome-GO: A web-based tool for determining the gene ontologies of genes likely to be regulated by regions discovered through TF ChIP-seq. GenomicRanges: This R package provides a framework for working with genomic ranges, including intersection, overlap, and annotation of genomic regions with functional categories. It can be used in conjunction with other R packages for ChIP-seq analysis, such as ChIPseeker and DiffBind. 
ChIP-Enrich: This web-based tool can be used for annotating ChIP-seq peaks with functional categories such as gene ontology, pathway analysis, and transcription factor binding sites. It uses a hypergeometric test to identify overrepresented functional categories. Cistrome DB: The website allows users to upload their enriched regions, returning TF ChIP-seq, DNase-seq or ATAC-seq samples with similar profiles. 18.7.4 Motif Analysis MEME Suite: The MEME Suite is a comprehensive suite of tools for motif analysis, including motif discovery and motif-based sequence analysis. It includes tools for discovering de novo motifs from ChIP-Seq data and for searching for known motifs in the regions bound by the protein of interest. HOMER is a suite of tools for motif discovery and analysis. It includes tools for identifying de novo motifs from ChIP-Seq data, as well as for searching for known motifs in the regions bound by the protein of interest. HOMER also provides tools for performing gene ontology analysis and pathway analysis based on the identified motifs. MEME-ChIP is a specialized version of the MEME Suite that is specifically designed for motif analysis in ChIP-Seq data. It includes tools for discovering de novo motifs from ChIP-Seq data, as well as for searching for known motifs in the regions bound by the protein of interest. CentriMo is a tool for identifying enriched motifs in ChIP-Seq data based on the position of the motif relative to the peak summit. It can be used to identify motifs that are enriched at the center of the peak, as well as those that are enriched near the edges of the peak. 18.7.5 Tools for preprocessing Trimmomatic is a widely used tool for trimming and filtering Illumina sequencing data. It is often used to remove low-quality reads, adapter sequences, and other artifacts that can affect downstream analysis. Cutadapt is another popular tool for trimming adapter sequences from high-throughput sequencing data. 
It is particularly useful for removing adapters that contain degenerate nucleotides or that have been ligated with variable lengths. Bowtie2 is a fast and memory-efficient tool for aligning sequencing reads to a reference genome. It is often used to map ChIP-Seq reads to the genome prior to peak calling. SAMtools is a suite of tools for manipulating SAM/BAM files, which are commonly used to store alignment data from high-throughput sequencing experiments. It can be used for filtering and sorting reads, as well as for generating summary statistics. BEDTools is a powerful suite of tools for working with genomic intervals, such as those generated by ChIP-Seq peak calling. It can be used for operations such as intersecting, merging, and subtracting intervals. 18.7.6 Tools for making visualizations Integrative Genomics Viewer (IGV) is a popular genome browser that is widely used for the visualization of genomic data, including ChIP-Seq data. It provides a user-friendly interface for exploring genomic data at different levels of resolution, from the whole-genome level down to individual nucleotides. The UCSC Genome Browser is another widely used genome browser that can be used to visualize ChIP-Seq data. It provides an intuitive interface for navigating and visualizing genomic data, including the ability to zoom in and out and to overlay multiple data tracks. Genome Visualization Tool (GViz) is a package for the R statistical computing environment that provides functions for generating publication-quality visualizations of genomic data, including ChIP-Seq data. It offers a high degree of flexibility and customization, allowing users to create complex and informative plots that convey the relevant information in a clear and concise manner. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with ChIP-seq data. 
Cistrome-Explorer is a web-based visualization of compendia of ATAC-seq and histone modification ChIP-seq data for diverse samples, represented as a heatmap. Users can upload their ChIP-seq peak sets to assess the tissue specificity of their regions on the genome. 18.7.7 Tools for making heatmaps Deeptools is a widely used package for analyzing ChIP-seq data, and it includes a tool called “plotHeatmap” that can generate heatmaps from ChIP-seq data. Integrative Genomics Viewer (IGV) is a popular tool for visualizing and exploring genomic data. It includes a heatmap function that can be used to generate heatmaps from ChIP-seq data. EnrichedHeatmap is an R package for making heatmaps that visualize the enrichment of genomic signals on specific target regions. SeqMonk is a software package designed for the visualization and analysis of large-scale genomic data. It includes a heatmap function that can generate heatmaps from ChIP-seq data. ngs.plot is a tool that can generate different types of plots, including heatmaps, from NGS data. It includes a ChIP-seq specific mode that can be used to generate heatmaps from ChIP-seq data. ChAsE: ChAsE (ChIP-seq Analysis Engine) is a web-based platform for ChIP-seq analysis that includes a heatmap function that can generate heatmaps from ChIP-seq data. These tools allow users to generate heatmaps of ChIP-seq data, which can be used to identify enriched regions of binding and to visualize patterns of binding across genomic regions. The Cistrome Project has a large collection of human and mouse ChIP-seq, DNase-seq and ATAC-seq data, as well as tools for analyzing user-generated ChIP-seq data with publicly available samples. These tools include the Cistrome Data Browser toolkit function that can find publicly available datasets that are similar to a ChIP-Seq peak set, and Cistrome-GO for gene ontology analysis of TF ChIP-seq target genes. 
18.8 More resources about ChiP-seq data &lt;TODO: Put links to any resources and tutorials that are useful for ChIP-Seq data&gt; Shirley Liu’s Computational biology course Galaxy ChIP-seq tutorial ENCODE ChiP-seq tutorial Crazyhottommy’s ChIp-seq tutorial Harvard CUT&amp;RUN tutorial 4DN CUT&amp;RUN tutorial Henikoff Lab CUT&amp;Tag tutorial ARCHS4 (All RNA-seq and ChIP-seq sample and signature search) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from GEO and SRA. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with ChIP-seq data. Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. "],["cutrun-and-cuttag.html", "Chapter 19 CUT&amp;RUN and CUT&amp;Tag 19.1 Learning Objectives 19.2 Technologies 19.3 Advantages of CUT&amp;RUN and CUT&amp;Tag over the Traditional ChIP-seq Technology 19.4 Differences between CUT&amp;RUN and CUT&amp;Tag 19.5 Limitation of CUT&amp;RUN and CUT&amp;Tag 19.6 General Data Analysis Workflow 19.7 More resources about CUT&amp;RUN and CUT&amp;Tag data analysis", " Chapter 19 CUT&amp;RUN and CUT&amp;Tag This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 19.1 Learning Objectives 19.2 Technologies 19.3 Advantages of CUT&amp;RUN and CUT&amp;Tag over the Traditional ChIP-seq Technology Lower Cell Number and Less Starting Material Requirement: CUT&amp;RUN and CUT&amp;Tag can be performed with much lower cell number than ChIP-seq. This is particularly beneficial when working with rare cell types or limited biological samples. The CUT&amp;RUN and CUT&amp;Tag techniques involve less sample manipulation compared to ChIP-seq. This minimizes the risk of losing material and potential artifacts from extensive sample handling and processing. 
Higher Resolution and Specificity: CUT&amp;RUN and CUT&amp;Tag provide higher resolution and greater specificity in identifying protein-DNA interactions. This results from the method’s direct targeting and cleavage of DNA at the binding sites, reducing background noise. Reduced Background Noise: CUT&amp;RUN and CUT&amp;Tag typically result in lower background noise due to the direct tagging of DNA at the site of the protein-DNA interaction, enhancing the clarity and quality of the results. The sensitivity of sequencing depends on the depth of the sequencing run (i.e., the number of mapped sequence tags), the size of the genome, and the distribution of the target factor. The sequencing depth is directly correlated with cost and negatively correlated with background. Therefore, low-background CUT&amp;RUN and CUT&amp;Tag will waste less sequencing resources on profiling the background and hence is inherently more cost-effective than high-background ChIP-seq. Cost-Effectiveness: In addition to high efficiency in sequencing the target region, due to the lower requirement for reagents and enzymes, CUT&amp;RUN and CUT&amp;Tag can be more cost-effective, especially in high-throughput settings. More Efficient Protocol Workflow and Faster Turnaround Time: The protocol for CUT&amp;RUN and CUT&amp;Tag is more streamlined and less labor-intensive than ChIP-seq. It eliminates the need for sonication, DNA purification, and ligation steps, simplifying the procedure. The overall protocols of CUT&amp;RUN and CUT&amp;Tag are generally quicker and more straightforward than ChIP-seq, leading to faster experiment turnaround times. 19.3.1 CUT&amp;RUN Cleavage Under Targets and Release Using Nuclease, CUT&amp;RUN for short, is an antibody-targeted chromatin profiling method to measure the histone modification enrichment or transcription factor binding. 
This is a more advanced technology for epigenomic landscape profiling compared to the traditional ChIP-seq technology and is known for its easy implementation and low cost. The procedure is carried out in situ, where micrococcal nuclease tethered to protein A binds to an antibody of choice and cuts immediately adjacent DNA, releasing DNA bound to the antibody target. Therefore, CUT&amp;RUN produces precise transcription factor or histone modification profiles while avoiding crosslinking and solubilization issues. Extremely low backgrounds make profiling possible with typically one-tenth of the sequencing depth required for ChIP-seq and permit profiling using low cell numbers (i.e., a few hundred cells) without losing quality. Publications: An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife. 2017 Targeted in situ genome-wide profiling with high efficiency for low cell numbers. Nature Protocols. 2018 Improved CUT&amp;RUN chromatin profiling tools. eLife. 2019 Protocols: CUT&amp;RUN: Targeted in situ genome-wide profiling with high efficiency for low cell numbers (Version 3) CUT&amp;RUN with Drosophila tissues (Version 1) 19.3.1.1 AutoCUT&amp;RUN CUT&amp;RUN has been automated using a Beckman Biomek FX liquid-handling robot so that a 96-well format can be used to profile chromatin for high-throughput samples, such as in a clinical setting. DNA end polishing and direct ligation of adapters permit sample-to-Illumina library processing of 96 samples in two days. AutoCUT&amp;RUN can be used for cell-type specific gene activity and enhancer profiling based on histone modifications and transcription factors, including in frozen tissue samples of tumor xenografts. Publication: Automated in situ chromatin profiling efficiently resolves cell types and gene regulatory programs. Epigenetics &amp; Chromatin. 
2018 Protocol: AutoCUT&amp;RUN: genome-wide profiling of chromatin proteins in a 96 well format on a Biomek (Version 1) 19.3.2 CUT&amp;Tag Cleavage Under Targets and Tagmentation, CUT&amp;Tag for short, is an enzyme tethering approach to profiling chromatin proteins, including histone marks and RNA Pol II. CUT&amp;Tag generates sequence-ready libraries without the need for end polishing and adapter ligation. It uses a proteinA-Tn5 fusion to tether Tn5 transposase near the site of an antibody to a chromatin protein of interest. A secondary antibody, such as guinea pig anti-rabbit antibody, is used to increase the efficiency of tethering the pA-Tn5 to the target primary antibody. The pA-Tn5 complex is pre-loaded with sequencing adapters that insert into adjacent DNA upon activation with magnesium. CUT&amp;Tag has a very low background and can be performed in a single tube in as little as a day, though primary antibodies are typically incubated overnight. It can also be used with the ICELL8 nano dispensation system to profile single cells. A streamlined CUT&amp;Tag protocol was introduced by the Henikoff Lab that suppresses DNA accessibility artifacts to ensure high-fidelity mapping of the antibody-targeted protein and improves the signal-to-noise ratio over current chromatin profiling methods. Streamlined CUT&amp;Tag can be performed in a single PCR tube, from cells to amplified libraries, providing low-cost genome-wide chromatin maps. By simplifying library preparation, CUT&amp;Tag-direct requires less than a day at the bench, from live cells to sequencing-ready barcoded libraries. As a result of low background levels, barcoded and pooled CUT&amp;Tag libraries can be sequenced for as little as $25 per sample. This enables routine genome-wide profiling of chromatin proteins and modifications and requires no special skills or equipment. Publication: CUT&amp;Tag for efficient epigenomic profiling of small samples and single cells. Nature Communications. 
2019 Efficient low-cost chromatin profiling with CUT&amp;Tag. Nature Protocols. 2020 Scalable single-cell profiling of chromatin modifications with sciCUT&amp;Tag. Nature Protocols. 2023 Protocol: Bench top CUT&amp;Tag (Version 3) 3XFlag-pATn5 Protein Purification and MEDS-loading (5x scale, 2L volume, Version 1) CUT&amp;Tag with Drosophila tissues (Version 1) 19.3.2.1 AutoCUT&amp;Tag CUT&amp;Tag has been automated using a Beckman Coulter Biomek FX liquid handling robot so that a 96-well format can be used to profile chromatin for high-throughput samples, such as in a clinical setting. AutoCUT&amp;Tag can be used to profile the gene targets of fusions of the KMT2A lysine methyltransferase to other chromatin proteins, which characterize lymphoid, myeloid, and mixed lineage leukemias, uncovering heterogeneities that may underlie lineage plasticity. Publication: Automated CUT&amp;Tag profiling of chromatin heterogeneity in mixed-lineage leukemia. Nature Genetics. 2021 Simplified Epigenome Profiling Using Antibody-tethered Tagmentation Epigenomic analysis of formalin-fixed paraffin-embedded samples by CUT&amp;Tag Protocol: AutoCUT&amp;Tag: streamlined genome-wide profiling of chromatin proteins on a liquid handling robot (Version 1) 19.3.2.2 CUTAC Cleavage Under Targeted Accessible Chromatin, CUTAC, for short, is a simple modification of the Tn5 transposase-mediated antibody-directed CUT&amp;Tag method that provides high-quality accessibility mapping in parallel with mapping of specific components of the chromatin landscape. Findings imply that regulatory sites detected by hyperaccessibility mapping are coupled to the initiation of RNA Polymerase II transcription via H3K4 methylation. CUTAC requires few resources and is sufficiently simple that it can be performed from nuclei to purified sequencing-ready libraries in single PCR tubes on a home workbench. Publication: Efficient chromatin accessibility mapping in situ by nucleosome-tethered tagmentation. eLife. 
2020 Protocol: CUT&amp;Tag-direct for whole cells with CUTAC (Version 4) 19.4 Differences between CUT&amp;RUN and CUT&amp;Tag CUT&amp;RUN is more suitable than CUT&amp;Tag for transcription factor (TF) profiling because the high-salt incubation used in CUT&amp;Tag competes with TF binding to DNA. A TF, depending on its motif affinity, binds only a few DNA base pairs, so TF binding can be weak and outcompeted by salt. As demonstrated by Kaya-Okur et al. 2019, the CUT&amp;Tag signal of CTCF, one of the strongest binding factors, can be observed but becomes relatively weak. Therefore, it can be challenging for the peak caller to detect the enrichment of CTCF profiled by CUT&amp;Tag, and in practice it can also be hard to find the motif pattern. CUT&amp;Tag is more suitable for histone modification and RNA polymerase profiling because DNA wraps around histones and the RNA polymerase structure inserts into and grabs the DNA. DNA binding is strong for both histone modification marks and Pol II. CUT&amp;Tag for histone modification also showed moderately higher signals compared to CUT&amp;RUN throughout the list of sites in Kaya-Okur et al. 2019. CUT&amp;RUN must be followed by DNA end polishing and adapter ligation to prepare sequencing libraries, which increases the time, cost, and effort of the overall procedure. Moreover, the release of MNase-cleaved fragments into the supernatant with CUT&amp;RUN is not well-suited for application to single-cell platforms. 19.5 Limitation of CUT&amp;RUN and CUT&amp;Tag Dependency on Antibody Quality: Similar to ChIP-seq, CUT&amp;RUN and CUT&amp;Tag’s success heavily relies on the quality and specificity of the antibodies used. High-quality, highly specific antibodies are essential for reliable results, and the lack of such antibodies can limit the application of this technique. 
Likelihood of Over-digestion of DNA: Due to inappropriate timing of the calcium-dependent MNase reaction in CUT&amp;RUN or the magnesium-dependent Tn5 reaction in CUT&amp;Tag, DNA can be over-cut. A similar limitation exists for contemporary ChIP-Seq protocols, where enzymatic or sonication-based DNA shearing must be optimized. GC Bias: For CUT&amp;Tag, as with other techniques using Tn5, the library preparation has a strong GC bias and has poor sensitivity in low GC regions or genomes with high variance in GC content. Not Suitable for All Epitopes: CUT&amp;RUN and CUT&amp;Tag may not work efficiently for all protein-DNA interactions, especially if the epitope recognized by the antibody is obscured or altered in the chromatin context. However, companies are testing antibodies thoroughly, so this issue is decreasing with time. Challenges in Detecting Low Abundance TFs: While CUT&amp;RUN and CUT&amp;Tag are more sensitive than ChIP-seq, they can still face challenges in detecting TFs present in very low abundance in the cell. 19.6 General Data Analysis Workflow CUT&amp;RUN and CUT&amp;Tag data analysis share a very similar strategy. Data analysis generally involves raw sequencing data alignment, quality control, normalization, peak calling, visualization, differential analysis, and other specific analyses for target scientific discoveries. A detailed data processing and analysis tutorial with reproducible code and demo data can be found at CUT&amp;Tag Data Processing and Analysis Tutorial. 19.6.1 Adapter Trimming If the read length is long, adapter trimming may be needed for more accurate alignment results. However, for CUT&amp;RUN and CUT&amp;Tag, if the read length is short (i.e., 25bp per end), the aligner can use a “soft-match” style algorithm to handle the remaining adapter at the end of the read. Therefore, adapter trimming is not necessary in that scenario. Cutadapt: Cutadapt finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequences from your high-throughput sequencing reads. 
It can remove a wide range of adapter sequences and is not limited to Illumina-specific adapters. Users can specify multiple adapter sequences. Cutadapt supports quality trimming, though with less granularity than Trimmomatic. It can be used for both paired-end and single-end reads and allows for filtering based on length after trimming. For instance, with Illumina’s NextSeq 2000 machine and 50 base pairs paired-end reads, the adapters clipped by cutadapt 4.1 with parameters: -j 8 --nextseq-trim 20 -m 20 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -Z Trimmomatic: A flexible trimmer for Illumina Sequence Data. It trims low-quality bases from the start and end of the reads and scans the read with a sliding window to trim based on average quality. Trimmomatic can also remove Illumina-specific adapters with an option to specify custom adapter sequences. It is known for its high precision and flexibility. It can handle paired-end and single-end data. 19.6.2 Alignment Bowtie2: Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100 characters to relatively large (e.g., mammalian) genomes. When aligning paired-end reads to the reference genome, filter and keep read pairs whose fragment lengths are between 10bp and 1000bp. Detailed recommended parameters can be found in the [tutorial]. The alignment of the 50 base pairs paired-end reads out of Illumina’s NextSeq 2000 machine by Bowtie2 version 2.4.4 to reference sequence with parameters: --very-sensitive-local --soft-clipped-unmapped-tlen --dovetail --no-mixed --no-discordant -q --phred33 -I 10 -X 1000 BWA: BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. 
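The trimming and alignment parameters quoted in sections 19.6.1 and 19.6.2 can be collected into argument lists, which makes them easier to review and reuse. The following is a minimal sketch, not the tutorial's own script: the flags are the ones given in the text, while the file names, the index prefix, and the function names are hypothetical placeholders. The output and input options (-o, -p, -x, -1, -2, -S) are the standard ones for paired-end runs of each tool.

```python
# Sketch only: file names, index prefix and function names are hypothetical.
# The trimming and alignment flags are copied from the text above.

def cutadapt_command(read1, read2, trimmed1, trimmed2):
    '''Argument list for paired-end adapter trimming with cutadapt.'''
    return [
        'cutadapt', '-j', '8', '--nextseq-trim', '20', '-m', '20',
        '-a', 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCA',
        '-A', 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT',
        '-Z',
        '-o', trimmed1, '-p', trimmed2,
        read1, read2,
    ]


def bowtie2_command(index_prefix, read1, read2, sam_out):
    '''Argument list for paired-end alignment with Bowtie2.'''
    return [
        'bowtie2', '--very-sensitive-local', '--soft-clipped-unmapped-tlen',
        '--dovetail', '--no-mixed', '--no-discordant', '-q', '--phred33',
        '-I', '10', '-X', '1000',  # keep fragments between 10 bp and 1000 bp
        '-x', index_prefix, '-1', read1, '-2', read2, '-S', sam_out,
    ]


if __name__ == '__main__':
    trim = cutadapt_command('s_R1.fastq.gz', 's_R2.fastq.gz',
                            's_R1.trim.fastq.gz', 's_R2.trim.fastq.gz')
    align = bowtie2_command('hg38_index', 's_R1.trim.fastq.gz',
                            's_R2.trim.fastq.gz', 's.sam')
    # Print the commands for inspection instead of running them.
    print(' '.join(trim))
    print(' '.join(align))
```

Each list can be handed to subprocess.run with check=True once the tools are installed and the placeholder paths are replaced with real ones.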
19.6.3 Quality control The quality of the aligned data can be evaluated from the following aspects: Sequencing depth: Check the number of reads mapped to the genome to see if it matches the expected sequencing depth. CUT&amp;RUN/CUT&amp;Tag data typically has very low backgrounds, so as few as 1 million mapped fragments can give robust profiles for a histone modification in the human genome. Alignment rate: Alignment frequencies are expected to be &gt;80% for high-quality data. Duplication rate: Duplication rate is the percentage of duplicated reads, and Picard is widely used to detect duplicates. PCR duplicates are reads with the same start and end coordinates; they are created during library amplification and are not biological duplicates. Generally, the duplication rate is expected to be &lt;20% for high-quality data. However, as long as the duplication rate is lower than 80-90%, meaning the sequencing is not completely saturated, duplicates should be kept for downstream analysis. Even for samples with a relatively high duplication rate (e.g., 50%), PCR duplicates tend to occur more often in the signal regions, so removing duplicates will favor the background noise. In other words, keeping the duplicates can help us locate the peak regions. When the sequencing depth is not saturated, the duplication rate is linearly correlated with the sequencing depth. Therefore, normalization that removes the sequencing depth variations across samples can take care of the duplication rate simultaneously. Estimated library size: Estimated library size is the estimated number of unique molecules in the library based on PE duplication calculated by Picard. The estimated library sizes are proportional to the abundance of the targeted epitope and the quality of the antibody used, while the estimated library sizes of IgG samples are expected to be very low. 
Suppose users follow the sequencing depth tradition for the ChIP-seq data and sequence 100+ million reads but end up with only 1-2 million estimated library size. In that case, it is expected to have an ultra-high duplication rate. In that case, the sequencing depth is too high, and the sequencing is saturated. Duplicates are expected to be removed for downstream analysis. Fragment length distribution: CUT&amp;RUN and CUT&amp;Tag targeting at a histone modification predominantly result in nucleosomal fragments (~180 bp) or multiples of that length. Therefore, the fragment length density distribution usually has several peaks whose modes are 180bp apart, matching the nucleosomal length. CUT&amp;RUN/CUT&amp;Tag targeting transcription factors predominantly produce nucleosome-sized fragments and variable amounts of shorter fragments from neighboring nucleosomes and the factor-bound site, respectively. Moreover, tagmentation of DNA on the surface of nucleosomes also occurs, and plotting fragment length distribution with single-basepair resolution reveals a 10-bp sawtooth periodicity, which is typical of successful CUT&amp;Tag experiments. Such 10 bp periodic cleavage preferences match the 10 bp/turn periodicity of B-form DNA, which suggests that the DNA on either side of these bound TFs is spatially oriented such that tethered MNase has preferential access to one face of the DNA double helix. The presence of this 10 bp periodicity is a good indicator that the experiment has specifically targeted nucleosomal DNA or proteins in close association with it. If this pattern is absent, it might suggest non-specific binding or other technical issues. 19.6.4 Normalization 19.6.4.1 Spike-in Scaling E. coli DNA is carried along with bacterially-produced pA-Tn5 protein and gets tagmented non-specifically during the reaction. 
The fraction of total reads that map to the E. coli genome depends on the yield of epitope-targeted CUT&amp;Tag and so depends on the number of cells used and the abundance of that epitope in chromatin. Since a constant amount of pA-Tn5 is added to CUT&amp;Tag reactions and brings along a fixed amount of E. coli DNA, E. coli reads can be used to normalize epitope abundance across experiments. The underlying assumption is that the ratio of fragments mapped to the primary genome to the E. coli genome (or other added DNA sequences if pA-Tn5 is purified and E. coli DNA is no longer available) is the same for a series of samples, each using the same number of cells. Because of this assumption, we do not normalize between experiments or batches of pA-Tn5, which can have very different amounts of carry-over E. coli DNA. Using a constant C to avoid small fractions in normalized data, we define a scaling factor S as \\(S = \\frac{C}{\\text{Fragments Mapped To E. coli Genome}}\\) \\(\\text{Normalized Coverage} = \\text{Primary Genome Coverage} \\times S\\) The scaling can be done using the bedtools genomecov function with the “-scale” parameter. 19.6.4.2 Sequencing depth and coverage normalization Without a spike-in, normalization to eliminate the sequencing depth and coverage variations can be done by the following formula: \\(\\text{Normalized Count} = \\frac{\\text{Raw Count}}{\\text{Sum of Fragments Coverage}} \\times \\text{Genome Size}\\) Sum of Fragments Coverage = sum of all fragment lengths. Namely, Sum of Fragments Coverage includes both the sequencing depth and coverage information. Note that only fragments that are within 1bp~1000bp are considered. 19.6.5 Peak Calling 19.6.5.1 SEACR The Sparse Enrichment Analysis for CUT&amp;RUN, SEACR for short, is an R package designed to call peaks and enriched regions from chromatin profiling data with very low backgrounds (i.e., regions with no read coverage) that are typical for CUT&amp;Tag chromatin profiling experiments. 
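The two normalizations defined in section 19.6.4 come down to a few lines of arithmetic. The sketch below is illustrative only: the function names and the example numbers are made up, and the default value of the constant C is an assumption, since the text only says that C is chosen to avoid small fractions.

```python
# Sketch only: function names, example values and the default C are assumptions.

def spike_in_scale_factor(ecoli_fragments, constant=10000):
    '''S = C / (fragments mapped to the E. coli genome).'''
    if ecoli_fragments == 0:
        raise ValueError('no E. coli fragments: spike-in scaling is undefined')
    return constant / ecoli_fragments


def depth_normalized_count(raw_count, fragment_lengths, genome_size):
    '''Raw Count / Sum of Fragments Coverage * Genome Size.'''
    # Only fragments of 1 bp to 1000 bp are considered, as stated in the text.
    kept = [n for n in fragment_lengths if n in range(1, 1001)]
    total_coverage = sum(kept)
    if total_coverage == 0:
        raise ValueError('no fragments within 1 bp to 1000 bp')
    # Multiply before dividing; algebraically the same as the formula.
    return raw_count * genome_size / total_coverage


if __name__ == '__main__':
    print(spike_in_scale_factor(ecoli_fragments=5000))
    # The 2000 bp fragment is dropped by the length filter.
    print(depth_normalized_count(10, [100, 200, 2000], 3000))
```

The value returned by spike_in_scale_factor is the number that would be passed to the bedtools genomecov -scale parameter mentioned above.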
SEACR requires bedGraph files from paired-end sequencing as input and defines peaks as contiguous blocks of basepair coverage that do not overlap with blocks of background signal delineated in the IgG control dataset. If IgG control is available, use the IgG sample as the “control sample” and choose the “norm stringent” setting. If IgG is unavailable, users can use the “top *% peaks” by only providing the target marker sample. Web server: Peak calling by Sparse Enrichment Analysis for CUT&amp;RUN (SEACR) Web Interface 19.6.5.2 MACS2 The Model-based Analysis of ChIP-Seq version 2, MACS2 for short, is widely used for identifying transcription factor binding sites and histone modification regions in ChIP-Seq data. MACS2 has been widely adapted to analyze the CUT&amp;RUN/CUT&amp;Tag data. Installation details can be found at https://github.com/taoliu/MACS/wiki. 19.6.5.3 SEACR vs MACS2 SEACR is better suited for datasets with broad signal enrichment, such as H3K27me3, where peaks are broader and can continuously cover a large genomic region. MACS2 excels in datasets with sharp peaks, such as H3K4me3, where peaks are concentrated and isolated from the background and adjacent peaks. SEACR uses a straightforward thresholding approach, which can be more intuitive but may miss some nuances in the data. MACS2 uses a more complex statistical model to identify peaks, offering potentially greater accuracy but at the cost of computational complexity. SEACR offers more flexibility in handling different types of CUT&amp;RUN/CUT&amp;Tag data, especially in the absence of control samples or the control samples are of low quality. MACS2 generally requires high-quality control samples for best performance and is less flexible in this regard. 19.6.5.4 FRagment proportion in Peaks regions (FRiPs) Fragment proportion in Peak Regions, FRiPs for short, is also a critical signal-to-noise measurement. 
Although sequencing depths for CUT&amp;Tag are typically only 1-5 million reads, the low background of the method usually results in high FRiP scores. In other words, FRiP measures the percentage of sequencing resources accurately allocated to the target epitope regions. Note that the number of peaks and FRiPs typically increase with the sequencing depth and mappable fragment number; comparisons should therefore be done by downsampling samples to the same number of fragments. For example, the comparison across technologies in Efficient chromatin accessibility mapping in situ by nucleosome-tethered tagmentation Figure 5A: 19.6.6 Visualization Integrative Genomics Viewer: IGV visualizes the chromatin landscape in regions using a genome browser. It provides a web app version and a local desktop version that is easy to use. UCSC Genome Browser: UCSC Genome Browser provides the most comprehensive supplementary genome information. deepTools: deepTools is a suite of Python tools particularly developed for efficiently analyzing high-throughput sequencing data. It is particularly helpful to check chromatin features at a list of annotated sites. For example, we can use it to check the histone modification enrichment/absence signals around transcription starting sites or the peak center. We can use the “computeMatrix” and “plotHeatmap” functions from deepTools to generate the following heatmap. 19.6.7 Differential Analysis chromVAR - getCounts. The “getCounts” function in the chromVAR R package can convert an aligned bam file into a region by sample matrix, where the region can be genomic binning or peaks. The differential detection analysis can be performed on the region by sample matrix. DESeq2: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2 DESeq2 estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution. 
DESeq2 can also be utilized to detect the differentially enriched region using the region by sample matrix from the CUT&amp;RUN/CUT&amp;Tag data. Limma: limma powers differential expression analyses for RNA-sequencing and microarray studies Limma is an R package for analyzing gene expression microarray data, especially using linear models for analyzing designed experiments and assessing differential expression. Limma provides the ability to analyze comparisons between many RNA targets simultaneously in arbitrary, complicated designed experiments. Empirical Bayesian methods are used to provide stable results even when the number of arrays is small. Limma can be extended to study differential fragment enrichment analysis within peak regions. Notably, limma can deal with both the fixed effect model and random effect model. edgeR: Differential Expression Analysis of Multifactor RNA-Seq Experiments With Respect to Biological Variation Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodologies based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, and quasi-likelihood tests. As well as RNA-seq, it is applied to the differential signal analysis of other types of genomic data that produce read counts, including CUT&amp;RUN/CUT&amp;Tag, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, and CAGE. edgeR can deal with multifactor problems. 19.7 More resources about CUT&amp;RUN and CUT&amp;Tag data analysis CUT&amp;RUNTools: a flexible pipeline for CUT&amp;RUN processing and footprint analysis. CUT&amp;RUNTools is a flexible and general pipeline for facilitating the identification of chromatin-associated protein binding and genomic footprinting analysis from antibody-targeted CUT&amp;RUN primary cleavage data. 
CUT&amp;RUNTools extracts endonuclease cut site information from sequences of short-read fragments and produces single-locus binding estimates, aggregate motif footprints, and informative visualizations to support the high-resolution mapping capability of CUT&amp;RUN. CUT&amp;RUNTools 2.0: a pipeline for single-cell and bulk-level CUT&amp;RUN and CUT&amp;Tag data analysis. CUT&amp;RUNTools 2.0 is a major update of CUT&amp;RUNTools, including a set of new features specially designed for CUT&amp;RUN and CUT&amp;Tag experiments. Both the bulk and single-cell data can be processed, analyzed, and interpreted using CUT&amp;RUNTools 2.0. Nextflow Analysis Pipeline for CUT&amp;RUN and CUT&amp;TAG Experiments: nf-core/cutandrun is a best-practice bioinformatic analysis pipeline for CUT&amp;RUN, CUT&amp;Tag, and TIPseq experimental protocols that were developed to study protein-DNA interactions and epigenomic profiling. GoPeaks: histone modification peak calling for CUT&amp;Tag. GoPeaks is a peak caller designed for CUT&amp;TAG/CUT&amp;RUN sequencing data. GoPeaks, by default, works best with narrow peaks such as H3K4me3 and transcription factors. However, broad epigenetic marks like H3K27Ac/H3K4me1 require different step, slide, and minwidth parameters. "],["dna-methylation-sequencing.html", "Chapter 20 DNA Methylation Sequencing 20.1 Learning Objectives 20.2 What are the goals of analyzing DNA methylation? 20.3 Methylation data considerations 20.4 Methylation data workflow 20.5 Methylation Tools Pros and Cons 20.6 More resources", " Chapter 20 DNA Methylation Sequencing This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. 20.1 Learning Objectives 20.2 What are the goals of analyzing DNA methylation? To detect methylated cytosines (5mC), DNA samples are prepped using bisulfite (BS) conversion. This converts unmethylated cytosines into uracils and leaves methylated cytosines untouched. 
Probes are then designed to bind to either the uracil or the cytosine, representing the unmethylated and methylated cytosines respectively. For a given sample, you will obtain a fraction, known as the Beta value, that indicates the relative abundance of the methylated and unmethylated versions of the sequence. Beta values therefore lie on a scale of 0 to 1, where 0 indicates none of this particular base is methylated in the sample and 1 indicates all are methylated. Note that bisulfite conversion alone will not distinguish between 5mC and 5hmC, though these often may indicate different biological mechanisms. 5-hydroxymethylated cytosines (5hmC) can be detected with the help of oxidative bisulfite sequencing (oxBS) (Booth et al. 2013). Standard bisulfite conversion measures 5mC and 5hmC together, whereas oxidative bisulfite conversion measures 5mC only. If you want to identify 5hmC bases you either have to pair oxBS data with BS data OR you have to use Tet-assisted bisulfite (TAB) sequencing, which will exclusively tag 5hmC bases (Yu et al. 2012). 20.3 Methylation data considerations 20.3.1 Beta values binomially distributed Because beta values are a ratio, by their nature, they are not normally distributed data and should be treated appropriately. This means data models (like those used by the limma package) built for RNA-seq data should not be used on methylation data. More accurately, Beta values follow a binomial distribution. This generally involves applying a generalized linear model. 20.3.2 Measuring 5mC and/or 5hmC If your data and questions are interested in both 5mC and 5hmC, you will have separate sequencing datasets for each sample for both the BS and oxBS processed samples. 5mC is often a step toward 5hmC conversion and therefore the 5mC and 5hmC measurements are, by nature, not independent from each other. In theory, 5mC, 5hmC and unmethylated cytosines should add up to 1. 
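To make the Beta value and the 5mC/5hmC bookkeeping concrete, here is a minimal sketch. It assumes that the BS Beta value reflects 5mC plus 5hmC and that the oxBS Beta value reflects 5mC alone (Booth et al. 2013); the function names and example numbers are hypothetical, and the simple subtraction is only an illustration, not the joint model discussed in this section.

```python
# Sketch only: function names and example values are hypothetical.

def beta_value(methylated, unmethylated):
    '''Fraction of signal that is methylated, between 0 and 1.'''
    total = methylated + unmethylated
    if total == 0:
        raise ValueError('no coverage at this site')
    return methylated / total


def partition_cytosine_states(bs_beta, oxbs_beta):
    '''Naive split of one site into 5mC, 5hmC and unmethylated proportions.

    Assumes BS reports 5mC + 5hmC and oxBS reports 5mC only.
    '''
    return {
        '5mC': oxbs_beta,
        '5hmC': bs_beta - oxbs_beta,  # can be negative with noisy data
        'unmethylated': 1 - bs_beta,
    }


if __name__ == '__main__':
    bs = beta_value(methylated=30, unmethylated=10)
    oxbs = beta_value(methylated=20, unmethylated=20)
    states = partition_cytosine_states(bs, oxbs)
    print(states)
    print(sum(states.values()))
```

With noisy data the subtraction can give a negative 5hmC estimate, which is one practical consequence of the three proportions being tied together by the constraint that they sum to 1.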
Because of this, it has been proposed that the most appropriate way to model these data is to combine them in a single model (Kochmanski, Savonen, and Bernstein 2019). 20.4 Methylation data workflow Like other sequencing methods, you will first need to start with quality control checks. Next, you will need to align your sequences to the genome. Then, using the base calls, you will need to make methylation calls, that is, determine which cytosines are methylated and which are not. The details of this step depend on whether you are measuring 5mC and/or 5hmC. Lastly, you will likely want to use your methylation calls as a whole to identify differentially methylated regions of interest. 20.5 Methylation Tools Pros and Cons The following pros and cons sections were written by AI and may need verification by experts. They are meant to give you a basic idea of the pros and cons of these tools, but should ultimately be used with your own judgment. 20.5.1 Quality control: FastQC: A popular tool for evaluating the quality of sequencing reads, generating various quality control plots and statistics. It is fast, easy to use and has a simple user interface (Andrews, n.d.). Pros: Fast and easy to use. Very commonly used. Provides various quality control metrics and plots. Can generate reports that can be easily shared with collaborators. Cons: Does not perform any trimming or filtering of low-quality reads. Not specifically designed for bisulfite sequencing data. Trim Galore!: A wrapper tool for Cutadapt and FastQC that provides a simple way to trim adapters and low-quality reads. It also has built-in support for bisulfite sequencing data (Krueger and Andrews, n.d.). Pros: Easy to use, with a simple command line interface. Automatically trims adapters and low-quality reads. Specifically designed for bisulfite sequencing data. Cons: Limited flexibility in terms of the trimming and filtering options. 
Does not provide quality control metrics or plots 20.5.2 Analysis: Bismark: A widely used tool for aligning bisulfite sequencing reads to a reference genome. It allows for paired-end and single-end reads, provides many options for handling sequencing errors and can output methylation calls in various formats (Liu et al. 2019). Pros: Performs alignment, quantification and methylation calling in a single tool. Can output methylation calls in various formats. Provides many options for handling sequencing errors and optimizing methylation calling parameters Cons:Can be computationally intensive for large datasets. Requires a pre-built bisulfite-converted reference genome Bowtie2: A fast and efficient aligner that can be used for bisulfite sequencing data, and can align reads to bisulfite-converted genomes or to an unconverted genome with a pre-built bisulfite index (Langmead and Salzberg 2012). Pros: Very fast and efficient, making it suitable for large datasets. Can align reads to either a bisulfite-converted genome or to an unconverted genome with a pre-built bisulfite index. Provides options for handling sequencing errors and optimizing alignment parameters Cons: Does not perform methylation calling or quantification 20.5.3 Methylation calling: Bismark: As well as performing alignment, Bismark can also be used to call methylation from aligned reads. It reports the percentage of cytosines methylated at each site (Liu et al. 2019). Pros: Performs both alignment and methylation calling in a single tool. Can output methylation calls in various formats. Provides many options for handling sequencing errors and optimizing methylation calling parameters Cons:Can be computationally intensive for large datasets. Requires a pre-built bisulfite-converted reference genome MethylDackel: A fast and efficient tool for methylation calling from bisulfite sequencing data. It can output methylation calls in various formats, including a methylation bedGraph. 
Pros: Very fast and efficient, making it suitable for large datasets. Provides options for handling sequencing errors and optimizing methylation calling parameters. Can output methylation calls in various formats, including a methylation bedGraph Cons:Does not perform alignment or methylation quantification 20.5.4 Methylation quantification: MethylKit: A popular tool for quantifying methylation levels from bisulfite sequencing data. It can handle various types of data and provides options for filtering out low-quality data and detecting differentially methylated regions (Akalin et al. 2012). Pros: Provides various options for filtering out low-quality data and detecting differentially methylated regions. Can handle various types of data, including bisulfite sequencing and reduced representation bisulfite sequencing. Provides many visualization tools for analyzing methylation data Cons: Can be computationally intensive for large datasets. Requires some knowledge of R programming language to use effectively Bismark: As well as methylation calling, Bismark can also quantify methylation levels at each cytosine site. It reports the number of methylated and unmethylated reads, as well as the percentage of methylation (Liu et al. 2019). 20.5.5 Analysis: DSS: A popular tool for identifying differentially methylated regions (DMRs) between groups of samples. It uses a statistical model to detect significant changes in methylation levels and reports DMRs with associated p-values (Feng and Conneely 2016). Pros: Uses a statistical model to identify differentially methylated regions between groups of samples. Provides various options for controlling false discovery rate and adjusting for multiple comparisons. Suitable for large datasets. Cons: Requires some knowledge of statistical methods and programming language to use effectively. May not be suitable for smaller datasets or datasets with low coverage. 
MethylKit: As well as methylation quantification, MethylKit can also be used for downstream analysis, such as clustering samples based on methylation patterns and performing functional annotation of differentially methylated regions (Akalin et al. 2012). 20.6 More resources DNA methylation analysis with Galaxy tutorial The mint pipeline for analyzing methylation and hydroxymethylation data. Book chapter about finding methylation regions of interest References "],["microbiome-sequencing.html", "Chapter 21 Microbiome Sequencing 21.1 Learning Objectives 21.2 Goals of Amplicon analysis 21.3 Microbiome Analysis with QIIME 2", " Chapter 21 Microbiome Sequencing This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. 21.1 Learning Objectives ## A Brief Introduction to Microbiomes Microbes are everywhere. We have found these tiny organisms in the deepest regions of the ocean and in the upper atmosphere. We have found them in: + water that has been solid ice for millennia in the Antarctic + boiling water in the geysers of Yellowstone National Park. + the driest natural environments on Earth, including the Atacama Desert in Chile, where desiccation resistant microbes hide in the soil sometimes waiting ten years for the drop of rain that will jump start their metabolism long enough for them to reproduce before they return to dormancy. + perpetually damp environments, like the intestinal tract of the human body where they are constantly the subject of inspection by our diligent immune cells, and where they impact our health in positive and negative ways that we are only beginning to understand. + our nuclear reactors, prompting questions about whether we could harness them as tiny machines to help us remediate environmental disasters of the past, present, and future. 
If we looked hard enough, I think we’d find them on the surface of the moon and Mars, though they are probably microbes who stowed away on our spacecraft and are now patiently waiting for a drop of water that may or may not ever show up. If we ever colonize those worlds, microbes will be an indispensable ally in creating an environment that could sustain us. This figure is adapted from (Tignat-Perrier et al. 2022) under Creative Commons license. Microbes almost never live alone in the real world (i.e., outside of a laboratory). Rather they exist in communities of different species who are interacting with each other and their environment. Some of these communities will have many different types of organisms, and some will have only a few. Because of the large number of species and individuals involved, no two communities will ever be exactly alike, and quantifying differences between microbial communities is an important area of research at the moment. The types of interactions between organisms are also highly varied. These can include mutualistic relationships, where both organisms benefit from the interaction; parasitic relationships, where one organism exclusively benefits to the detriment of the other; and the full gradient in between. Microbiome science is everywhere. There are tens of articles published daily in the scientific literature, and many popular science articles and books present these findings to the world of non-scientists. Understanding the promises and limitations of the methods of microbiome science can help avoid misconceptions about microbiome research, and it’s important for practitioners of microbiome science to understand and convey the promise and limitations of our field. Misconceptions abound, frequently arising from the same sources as high-quality popular science microbiome reporting. 
For example, on 5 Feb 2015 an article appeared in the New York Times noting (almost offhand) that Yersinia pestis, the organism responsible for Bubonic plague, had been found in multiple locations throughout the New York City subway system as part of its normal built environment microbiome. This was rapidly followed up on 6 Feb 2015 with an article noting that there was probably not Bubonic plague on the subway system after all, but rather that the approaches used by the research team are limited in their taxonomic resolution, and that likely a harmless close relative of Y. pestis was observed: “What the researchers probably found, [a spokesman for the university where the study originated] said, was bacteria from an unknown species or from organisms that happened to share some gene sequences with the plague bacterium…”. As microbiome services and products are increasingly marketed directly to the public, consumers of microbiome research findings, products, and services need to know how to critically evaluate these offerings and their associated claims. As practitioners in the field, we can help by ensuring that the methods we apply are appropriate and reliable, and that we make our work accessible. 21.2 Goals of Amplicon analysis The technologies that are enabling work in microbiome science are the same that are driving the data revolution in biology. Primarily this work is driven by high-throughput DNA sequencing, which is applied for profiling microbial community composition: marker gene profiling (such as 16S or ITS sequencing) functional potential (such as shotgun metagenomic sequencing) functional activity (such as metatranscriptome sequencing) Other “omics” technologies are now playing an increasing role in microbiome research, such as: mass-spectrometry-based metabolomics, which provides profiles of small molecule metabolites in an environment. 
metaproteomics which provides more detailed descriptions of functional activities of microbes (and their hosts, if applicable). As a result, bioinformatics software tools are essential to microbiome research. For many microbiome researchers, bioinformatics is an intimidating and challenging aspect of their projects. 21.3 Microbiome Analysis with QIIME 2 QIIME 2 is an all in one bioinformatics microbiome analysis platform. This platform allows for users to go from sequenced microbiome data to publication ready visualizations. The original QIIME, now referred to as QIIME 1, was published in 2010 (Caporaso et al. 2010) and has been cited tens of thousands of times in the primary literature. QIIME 2, which was published in July of 2019 (Bolyen et al. 2019), succeeded QIIME 1 on 1 January 2018. QIIME 2 is better than QIIME 1 in all ways, and QIIME 1 is no longer actively supported. If you have previously used QIIME 1, you should invest time in learning and switching to QIIME 2. If you’re new to QIIME, start with QIIME 2. (When I refer to QIIME in this book, without specifying whether I’m referring to QIIME 1 or QIIME 2, I’m referring to the platform generally.) QIIME 2 has large and growing user and developer communities, and these communities make QIIME 2 possible. The epicenter of the community is the QIIME 2 Forum. The forum is primarily known as a place where users can get technical support with QIIME 2 for no charge. Developers of QIIME 2 moderate the forum, and typically respond to technical support questions within a couple of business days. The forum is also a great place to discuss general topics in microbiome bioinformatics, or microbiome research methods generally. There are many active discussions on these topics on the forum. Keeping up with the discussions on the forum is a great way to learn about current topics in microbiome research methods. 
There’s also a free job board on the forum - you can use the forum to find jobs, or post your own job ads there to find employees who are well-versed in QIIME 2 and other bioinformatics tools. If you’re not already a member of the QIIME 2 Forum, you should consider joining. It’s a great way for you to get help, and as you develop your QIIME 2 skills, helping others on the forum is a great way to reinforce your learning and to get involved in the community. Here is a high-level introduction to microbiome analysis using QIIME 2. This introduction will go over common methods, metrics and approaches used for microbiome science. So grab a cup of your favorite hot beverage and let’s get started! ☕ References "],["itcr--omic-tool-glossary.html", "Chapter 22 ITCR -omic Tool Glossary 22.1 ARCHS4 22.2 Bioconductor 22.3 Cancer Models 22.4 CIViC 22.5 CTAT 22.6 DeepPhe 22.7 Genetic Cancer Risk Detector (GARDE) 22.8 GenePattern 22.9 Gene Set Enrichment Analysis (GSEA) 22.10 Integrative Genomics Viewer (IGV) 22.11 NDEx 22.12 MultiAssayExperiment 22.13 OpenCRAVAT 22.14 pVACtools 22.15 TumorDecon 22.16 WebMeV 22.17 Xena", " Chapter 22 ITCR -omic Tool Glossary Here are all the tools that have been mentioned in this course or are otherwise recommended for your use. The list is in alphabetical order. ARCHS4 Bioconductor Notable Bioconductor genomics tools: Cancer Models CIViC CTAT DeepPhe Genetic Cancer Risk Detector (GARDE) GenePattern Gene Set Enrichment Analysis (GSEA) Integrative Genomics Viewer (IGV) NDEx MultiAssayExperiment OpenCRAVAT pVACtools TumorDecon WebMeV Xena 22.1 ARCHS4 All RNA-seq and ChIP-seq sample and signature search (ARCHS4) (https://maayanlab.cloud/archs4/) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from GEO and SRA. 
The ARCHS4 website provides the uniformly processed data for download and programmatic access in H5 format, and as a 3-dimensional interactive viewer and search engine. Users can search and browse the data by metadata enhanced annotations, and can submit their own gene sets for search. Subsets of selected samples can be downloaded as a tab delimited text file that is ready for loading into the R programming environment. To generate the ARCHS4 resource, the kallisto aligner is applied in an efficient parallelized cloud infrastructure. Human and mouse samples are aligned against the most recent Ensembl annotation (Ensembl 107). 22.2 Bioconductor The mission of the Bioconductor project is to develop, support, and disseminate free open source software that facilitates rigorous and reproducible analysis of data from current and emerging biological assays. We are dedicated to building a diverse, collaborative, and welcoming community of developers and data scientists. Bioconductor uses the R statistical programming language, and is open source and open development. It has two releases each year, and an active user community. Bioconductor is also available as Docker images. 22.2.1 Notable Bioconductor genomics tools: annotatr ensembldb GenomicRanges - useful for manipulating and identifying sequences. GO.db - Gene ontology annotation org.Hs.eg.db Rsamtools A full list of Bioconductor’s annotation packages - contains annotation for all kinds of species and versions of genomes and transcriptomes. ComplexHeatmap MultiAssayExperiment limma DESeq2 edgeR curatedTCGAData cBioPortalData SingleCellMultiModal 22.3 Cancer Models Patient Derived Cancer Models Finder (www.cancermodels.org) is a cancer research platform that aggregates clinical, genomic and functional data from patient-derived xenografts, organoids and cell lines. The PDCM Finder standardises, harmonises and integrates the complex and diverse data associated with PDCMs for the cancer community. 
Data types used are model metadata, related clinical metadata from the sample from which the model was derived, e.g. molecular and treatment-based. Data are preprocessed, consistently semantically annotated, harmonised and FAIR. PDCM Finder contains &gt;6200 models across 13 cancer types, including rare pediatric models (17%) and models from minority ethnic backgrounds (33%), making it the largest free to consumer and open access resource of this kind. Get started at www.cancermodels.org to browse and query models by cancer type. 22.4 CIViC CIViC is a knowledgebase and curation interface for the clinical interpretation of variants in cancer. Evidence is curated from published literature describing the diagnostic, prognostic, predictive, predisposing, oncogenic, or functional role of variants in specific cancer types. Evidence submitted by community curators is revised and moderated by expert editors. Individual evidence is synthesized into gene summaries, variant summaries and variant-disease assertions of specific clinical relevance. Anyone can make use of CIViC knowledge through the open web interface or API. Information on how to use or contribute to CIViC is available in our help docs (docs.civicdb.org). The main distinguishing feature of CIViC compared to similar resources is its total commitment to open data sharing. All data are available in the Public Domain (CC0). The code is available for any use under an MIT license. 22.5 CTAT The Trinity Cancer Transcriptome Analysis Toolkit (CTAT) provides a diverse collection of tools to gain insights into the biology of cancer through the lens of the transcriptome. Using RNA-seq as input, CTAT modules enable detection of mutations, fusion transcripts, copy number aberrations, cancer-specific splicing aberrations, and oncogenic viruses including insertions into the human genome. CTAT uses both read mapping and de novo assembly methods to analyze RNA-seq, leveraging tumor bulk and single cell transcriptomes. 
CTAT modules provide interactive visualizations as outputs, are easily installed for local execution or run via cloud computing (e.g., Terra), have detailed user guides and tutorials, and are well-supported through user forums. 22.6 DeepPhe DeepPhe: Natural Language Processing Tools for Cancer Research Under development since 2014, the DeepPhe suite of software tools aims to extract deep phenotype information from the Electronic Medical Records of patients with cancer. DeepPhe combines: multiple natural language processing (NLP) techniques based on cTAKES, a structured cancer information model including concepts from the NCIT and the HemOnc ontology, a graph data model supporting persistence of extracted details including links between patient data enabling semantically informed interpretation, aggregation, and disaggregation of key attributes, visual analytics tools supporting patient- and cohort-level displays of extracted data including identification of patients matching key research criteria and the examination of individual patient records such as exploration of links between summary items and supporting text mentions, and multiple strategies for use, including containerized REST services and GUIs for installation and pipeline execution. DeepPhe tools are available for download and installation from the DeepPhe website under an open-source license for non-commercial use. 22.7 Genetic Cancer Risk Detector (GARDE) Genetic Cancer Risk Detector (GARDE) screens and identifies patients who meet National Comprehensive Cancer Network (NCCN) criteria for genetic evaluation of familial cancer risk based on their family history in the EHR using both structured data and natural language processing of free-text data. 
Patients identified by GARDE are imported into an EHR’s population health management dashboard (e.g., Epic’s Healthy Planet module) where genetic counseling staff review individual cases, select, and send bulk outreach messages to patients via chatbot and/or through the patient portal. GARDE is a population clinical decision support (CDS) platform based on Fast Healthcare Interoperability Resources (FHIR) and CDS Hooks standards to support interoperability and logic sharing beyond single vendor solutions. 22.8 GenePattern GenePattern, www.genepattern.org, is an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. Analyses include general machine learning methods, the gene set enrichment analysis suite, ’omics-specific tools for bulk and single-cell gene expression, proteomics, flow cytometry, variant annotation, sequence variation and others, as well as cancer-specific analyses. Also included are data preprocessing and utility tools. A web-based interface provides easy, non-programmatic access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research. The GenePattern Notebook interface, notebook.genepattern.org, extends the Jupyter Notebook system to allow users to combine GenePattern analyses with text, graphics, and code to create complete research narratives. It includes many additional features to make notebooks accessible to non-programmers. The online GenePattern Notebook Workspace allows investigators to create, run, and collaborate on notebooks using only a web browser. A library of GenePattern Notebooks implementing common scientific workflows is available for investigators to use as templates and adapt to their own requirements. To get started with GenePattern you can go through the GenePattern Quick Start Tutorial, view the GenePattern User Guide, or the videos on our YouTube channel. 
To learn more about GenePattern Notebook, view the GenePattern Notebook Quick Start, GenePattern Notebook documentation, run through the tutorial notebooks (click the Tutorial button), or view the videos on the GenePattern Notebooks YouTube channel. 22.9 Gene Set Enrichment Analysis (GSEA) Gene Set Enrichment Analysis (GSEA) is a method to identify the coordinate activation or repression of groups of genes that share common biological functions, pathways, chromosomal locations, or regulation, thereby distinguishing even subtle differences between phenotypes or cellular states. Gene set-based enrichment analysis is now standard practice for interpreting global transcription profiling experiments and elucidating the biological mechanisms associated with disease and other biological phenotypes of interest. The method is more powerful than typical single-gene approaches to comparing phenotypes, as it can identify sets of genes (e.g., perturbation signatures or molecular pathways) that are coordinately up- or downregulated when each gene in the set may not be significantly differentially expressed. The GSEA software provides useful visualizations and reports for the exploration and interpretation of results. GSEA bundles direct access to the Molecular Signatures Database (MSigDB) – a comprehensive curated repository of annotated gene sets representing signatures derived from publications, pathway databases, and other sources of public data; MSigDB can also be used independently. The website for the GSEA-MSigDB resource can be found at gsea-msigdb.org. To get started with GSEA you can view the GSEA User Guide, and access the GSEA software through the downloads page or through the GSEA modules available on GenePattern. See the MSigDB section of the website for more information about MSigDB and to interactively explore the gene sets and their annotations. User support for GSEA and MSigDB is available through our help forum. 
22.10 Integrative Genomics Viewer (IGV) The Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. IGV supports all the standard genomic data types (aligned reads, variants, signal peaks, genome annotations, copy number variation, etc.) as well as sample information, such as clinical, phenotypic, or other attributes. IGV provides great flexibility in loading data, whether investigator generated or publicly available, directly from multiple disparate sources without the need for any pre-processing. Supported data sources include local file systems; web servers on the user’s intranet or the Internet; commercial cloud providers (Google, Amazon, Azure, Dropbox); web links to data in public repositories. Authentication to access private data on the web is supported with the industry standard OAuth protocol. IGV is available in multiple forms, including both end-user applications and versions for use by developers. The IGV website at https://igv.org provides access to all modalities of IGV. Download and install the IGV Desktop application from the downloads page. To learn about using the application see the tutorial videos on the IGV YouTube channel and the online User Guide. The IGV-Web app is available at https://igv.org/app. To learn about using the app, the Help link in the menu bar provides access to the documentation, and see also the tutorial videos on the YouTube channel. The igv.js JavaScript component is for web developers who wish to embed IGV in their web apps or portals. More information can be found in the Readme file and the Wiki in the igv.js GitHub repository. IGV user support is available through the igv-help online forum and the GitHub repositories. 22.11 NDEx The Network Data Exchange (NDEx) project provides an open-source framework where scientists and organizations can store, share and publish biological network knowledge. 
A distinctive feature of NDEx is that it serves as a home for models that are currently available only as figures, tables, or supplementary information, such as networks produced via systematic mining and integration of large-scale molecular data. NDEx includes features to support data distribution and access according to FAIR principles. Its full integration with Cytoscape, the popular desktop application for network analysis and visualization, provides the cloud back-end component for data I/O; so, if a network file format can be opened in Cytoscape, it can also be stored in (and retrieved from) NDEx. NDEx can be accessed via its web user interface or programmatically, via REST API and client libraries in Python, R, Java. Web applications can interface with NDEx via JavaScript: MSigDB, CRAVAT, cBioPortal and IQuery, are all examples of web applications integrated with NDEx. For more information, please review the About NDEx page. To get started, visit the NDEx public server: there, you can review the NDEx FAQ, access documentation, contact us, and search or browse thousands of biological network models. 22.12 MultiAssayExperiment MultiAssayExperiment is an R/Bioconductor package that harmonizes data management, manipulation, and subsetting of multiple experimental assays performed on an overlapping set of specimens. It supports on-disk and remote data storage, and provides reshaping tools for adaptability to arbitrary downstream analysis. MultiAssayExperiment is distinct from alternative approaches in its focus on multi’omic data management and manipulation and in its integration with the Bioconductor ecosystem: it is used by more than 50 other Bioconductor packages, it provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment while supporting an open-ended mix of data classes for individual assays, and it allows subsetting by genomic ranges, row names, phenotypic data, and assays. 
You can get started with the MultiAssayExperiment Bioconductor package documentation, or start with prebuilt MultiAssayExperiments objects from curatedTCGAData, cBioPortalData, or SingleCellMultiModal. 22.13 OpenCRAVAT OpenCRAVAT uses variation data in many popular variant file formats and its outputs are variant annotations and visualizations. To get started go to opencravat.org. Download and run on your local machine, multi-user servers, at https://run.opencravat.org or in the cloud. We offer a broader selection of annotation tools than comparable software and results can be explored with an interactive GUI that provides customized filtering options, interactive tables and widgets. Use it for a single sample or a large cohort, or pull single variant reports with a structured url (Example: https://run.opencravat.org/webapps/variantreport/index.html?chrom=chr11&amp;pos=48123823&amp;ref_base=A&amp;alt_base=C ) 22.14 pVACtools Identification of neoantigens is a critical step in predicting response to checkpoint blockade therapy and design of personalized cancer vaccines. We have built a computational framework called pVACtools that, when paired with a well-established genomics pipeline, produces an end-to-end solution for neoantigen characterization. pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions, and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework designed to facilitate the incorporation of additional algorithms. Prioritization of predicted peptides occurs by integrating diverse data, including mutant allele expression, peptide binding affinities, and determination whether a mutation is clonal or subclonal. 
Interactive visualization via a Web interface allows clinical users to efficiently generate, review, and interpret results, selecting candidate peptides for individual patient vaccine designs. Additional modules support design choices needed for competing vaccine delivery approaches. One such module optimizes peptide ordering to minimize junctional epitopes in DNA vector vaccines. Downstream analysis commands for synthetic long peptide vaccines are available to assess candidates for factors that influence peptide synthesis. All of the aforementioned steps are executed via a modular workflow consisting of tools for neoantigen prediction from somatic alterations (pVACseq and pVACfuse), prioritization, and selection using a graphical Web-based interface (pVACview), and design of DNA vector–based vaccines (pVACvector) and synthetic long peptide vaccines. pVACtools is available at http://www.pvactools.org. 22.15 TumorDecon It is the only software that includes these four digital cytometry methods in one platform, so that users can compare the results of these methods. It is the only software that includes a method for creating a signature matrix from single cell gene expression data. TumorDecon software includes four deconvolution methods (DeconRNAseq [Gong2013], CIBERSORT [Newman2015], ssGSEA [Şenbabaoğlu2016], Singscore [Foroutan2018]) and several signature matrices of various cell types, including LM22. The input of this software is the gene expression profile of the tumor, and the output is the relative number of each cell type and several visualization plots. Users have an option to choose any of the implemented deconvolution methods and included signature matrices or import their own signature matrix to get the results. Additionally, TumorDecon can be used to generate customized signature matrices from single-cell RNA-sequence profiles. 
In addition to the 3 tutorials provided on GitHub (tutorial.py, sig_matrix_tutorial.py, &amp; full_tutorial.py) there is a User Manual available at: https://people.math.umass.edu/~aronow/TumorDecon TumorDecon is available on GitHub (https://github.com/ShahriyariLab/TumorDecon) and PyPI (https://pypi.org/project/TumorDecon/). For more info please see: Rachel A. Aronow, Shaya Akbarinejad, Trang Le, Sumeyye Su, Leili Shahriyari, TumorDecon: A digital cytometry software, SoftwareX, Volume 18, 2022, 101072, https://doi.org/10.1016/j.softx.2022.101072. 22.16 WebMeV WebMeV is an online tool that facilitates analysis of large-scale RNA-seq and other multi-omic datasets by providing intuitive access to advanced analytical methods and high-performance computing for a wide range of basic, clinical, and translational researchers. WebMeV provides support for “bulk” RNA-seq data, single-cell RNA-seq, and other types of -omic data and provides easy access to public data resources such as The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression project (GTEx), as well as user-provided data. WebMeV uniquely provides a user-friendly, intuitive, interactive interface to processed analytical data and uses cloud-computing elasticity for the computationally intensive analyses that are increasingly required for genomic data analysis. WebMeV’s design places an emphasis on user-driven data analysis by providing users the ability to visualize, interact with, and dissect genomic data at each step in the analysis with a “point-and-click” interactive data environment. Although the primary input is normalized “count matrices,” WebMeV does include tools for data normalization and quality control and uses Dropbox and Google Drive as means of easily uploading data. 
Analytical methods include statistical tests for comparing cohorts, for identifying gene sets, for doing functional enrichment analysis on gene sets (GSEA), and for inferring gene regulatory network models and comparing these networks between phenotypes to understand the drivers of disease. WebMeV also provides a platform to support reproducible research and makes code for the entire system and its component methods available as open-source software code. 22.17 Xena UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. Xena showcases seminal cancer genomics datasets from TCGA, the Pan-Cancer Atlas, GDC, PCAWG, ICGC, and more; a total of more than 1500 datasets across 50 cancer types. We support virtually any type of functional genomics data (sometimes known as level 3 or 4 data). This includes SNPs, INDELs, copy number variation, gene expression, ATAC-seq, DNA methylation, exon-, transcript-, miRNA-, lncRNA-expression and structural variants. We also support clinical data such as phenotype information, subtype classifications and biomarkers. All of our data is available for download via Python or R APIs, or through our URL links. 22.17.1 Questions Xena can help you answer include: Is overexpression of this gene associated with better survival? What genes are differentially expressed between these two groups of samples? What is the relationship between mutation, copy number, expression, etc. for this gene? Our tool differentiates itself by its ability to visualize more uncommon data types, such as DNA methylation, its visual integration of multiple types of genomic data side-by-side, and its ability to easily and privately visualize your own data. Get started with our tutorials: https://ucsc-xena.gitbook.io/project/tutorials. 
If you use us please cite us: https://www.nature.com/articles/s41587-020-0546-8 "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor(s) Candace Savonen Lecturer(s) Candace Savonen Content Contributor(s) Cailin Jordan - sc-ATAC-Seq Carrie Wright Claire Mills - Whole Genome Sequencing Jacob Greene - ChIP-seq Kate Isaac - Goals of DNA Methods Oscar Ospina - Spatial transcriptomics Ye Zheng - CUTRUN/CUTTag Content Directors Jeff Leek Content Consultants Carrie Wright Cliff Meyer - ATAC-seq Frederick Tan Content Editors/Reviewers Kate Isaac Acknowledgments Technical Course Publishing Engineer Candace Savonen Template Publishing Engineers Candace Savonen, Carrie Wright Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Funding Funder National Cancer Institute (NCI) UE5 CA254170 Funding Staff Sandy Ormbrek, Shasta Nicholson   ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.2 (2020-06-22) ## os Ubuntu 20.04.5 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2024-05-23 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib source ## askpass 1.1 2019-01-13 [1] RSPM (R 4.0.3) ## assertthat 0.2.1 2019-03-21 [1] RSPM (R 4.0.5) ## bookdown 0.24 2024-03-13 [1] Github (rstudio/bookdown@88bc4ea) ## bslib 0.6.1 2023-11-28 [1] CRAN (R 4.0.2) ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.0.2) ## callr 3.5.0 2020-10-08 [1] RSPM (R 4.0.2) ## cli 3.6.2 2023-12-11 [1] CRAN (R 4.0.2) ## crayon 1.3.4 2017-09-16 [1] RSPM (R 4.0.0) ## desc 1.2.0 2018-05-01 [1] RSPM (R 4.0.3) ## devtools 2.3.2 2020-09-18 [1] RSPM (R 4.0.3) 
## digest 0.6.25 2020-02-23 [1] RSPM (R 4.0.0) ## ellipsis 0.3.1 2020-05-15 [1] RSPM (R 4.0.3) ## evaluate 0.23 2023-11-01 [1] CRAN (R 4.0.2) ## fansi 0.4.1 2020-01-08 [1] RSPM (R 4.0.0) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.0.2) ## fs 1.5.0 2020-07-31 [1] RSPM (R 4.0.3) ## glue 1.4.2 2020-08-27 [1] RSPM (R 4.0.5) ## hms 0.5.3 2020-01-08 [1] RSPM (R 4.0.0) ## htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.0.2) ## httr 1.4.2 2020-07-20 [1] RSPM (R 4.0.3) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] RSPM (R 4.0.2) ## knitr 1.33 2024-03-13 [1] Github (yihui/knitr@a1052d1) ## lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.0.2) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.0.2) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.0.2) ## openssl 1.4.3 2020-09-18 [1] RSPM (R 4.0.3) ## ottrpal 1.2.1 2024-03-13 [1] Github (jhudsl/ottrpal@48e8c44) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.0.2) ## pkgbuild 1.1.0 2020-07-13 [1] RSPM (R 4.0.2) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.0.3) ## pkgload 1.1.0 2020-05-29 [1] RSPM (R 4.0.3) ## prettyunits 1.1.1 2020-01-24 [1] RSPM (R 4.0.3) ## processx 3.4.4 2020-09-03 [1] RSPM (R 4.0.2) ## ps 1.4.0 2020-10-07 [1] RSPM (R 4.0.2) ## R6 2.4.1 2019-11-12 [1] RSPM (R 4.0.0) ## readr 1.4.0 2020-10-05 [1] RSPM (R 4.0.2) ## remotes 2.2.0 2020-07-21 [1] RSPM (R 4.0.3) ## rlang 1.1.3 2024-01-10 [1] CRAN (R 4.0.2) ## rmarkdown 2.10 2024-03-13 [1] Github (rstudio/rmarkdown@02d3c25) ## rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.0.2) ## sass 0.4.8 2023-12-06 [1] CRAN (R 4.0.2) ## sessioninfo 1.1.1 2018-11-05 [1] RSPM (R 4.0.3) ## stringi 1.5.3 2020-09-09 [1] RSPM (R 4.0.3) ## stringr 1.4.0 2019-02-10 [1] RSPM (R 4.0.3) ## testthat 3.0.1 2024-03-13 [1] Github (R-lib/testthat@e99155a) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.0.2) ## usethis 1.6.3 2020-09-17 [1] RSPM (R 4.0.2) ## utf8 1.1.4 2018-05-24 [1] RSPM (R 4.0.3) ## vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.0.2) ## withr 2.3.0 2020-09-22 [1] RSPM (R 4.0.2) ## xfun 0.26 2024-03-13 
[1] Github (yihui/xfun@74c2a66) ## xml2 1.3.2 2020-04-23 [1] RSPM (R 4.0.3) ## yaml 2.2.1 2020-02-01 [1] RSPM (R 4.0.3) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library "],["references.html", "References", " References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]
+[["index.html", "Choosing Genomics Tools About this Course 0.1 Available course formats", " Choosing Genomics Tools December, 2024 About this Course This course is part of a series of courses for the Informatics Technology for Cancer Research (ITCR) called the Informatics Technology for Cancer Research Education Resource. This material was created by the ITCR Training Network (ITN) which is a collaborative effort of researchers around the United States to support cancer informatics and data science training through resources, technology, and events. This initiative is funded by the following grant: National Cancer Institute (NCI) UE5 CA254170. Our courses feature tools developed by ITCR Investigators and make it easier for principal investigators, scientists, and analysts to integrate cancer informatics into their workflows. Please see our website at www.itcrtraining.org for more information. 0.1 Available course formats This course is available in multiple formats, which allows you to take it in the way that best suits your needs. You can take it for a certificate, which may be free or require a fee. The material for this course can be viewed without login requirement on this Bookdown website. This format might be most appropriate for you if you rely on screen-reader technology. This course can be taken on Coursera for certification here (but it is not available for free on Coursera). Our courses are open source; you can find the source material for this course on GitHub. "],["introduction.html", "Chapter 1 Introduction 1.1 Target Audience 1.2 Topics covered: 1.3 Motivation 1.4 Curriculum 1.5 How to use the course", " Chapter 1 Introduction This is a living course, meaning it is constantly changing and being updated. The goal for this course is to be a “wikipedia” of omic data. If you’d like to contribute, you can file a pull request on GitHub if you are comfortable with that sort of thing or email csavonen@fredhutch.org to ask how to get started. 
1.1 Target Audience The course is intended for students in the biomedical sciences and researchers who have been given data and don’t know what to do with it or would like an overview of the different genomic data types that are out there. This course is written for individuals who: Have genomic data and don’t know what to do with it. Want a basic overview of genomic data types. Want to find resources for processing and interpreting genomics data. 1.2 Topics covered: 1.3 Motivation Cancer datasets are plentiful, complicated, and hold untold amounts of information regarding cancer biology. Cancer researchers are working to apply their expertise to the analysis of these vast amounts of data but training opportunities to properly equip them in these efforts can be sparse. This includes training in reproducible data analysis methods. Often students and researchers need to utilize genomic data to reach the next steps of their research but may not have formal training in computational methods or the basics of the genomic data they are attempting to utilize. Often researchers receive their genomic data processed from another lab or institution, and although they are excited to gain insights from it to inform the next steps of their research, they may not have a practical understanding of how the data they have received came to be or what needs to be done with it. As an example, data file formats may not have been covered in their training, and the data they received seems unintelligible and not as straightforward as they hoped. This course attempts to give these researchers the basic bearings and resources regarding their data, in hopes that they will be equipped and informed about how to obtain the insights for their research that they originally aimed to find. 1.4 Curriculum Goal of this course: Equip learners with tutorials and resources so they can understand and interpret their genomic data in a way that helps them meet their goals and handle the data properly. 
This includes helping learners formulate questions they will need to ask others about their data What is not the goal Teach learners about choosing parameters or about the ins and outs of every genomic tool they might be interested in. This course is meant to connect people to other resources that will help them with the specifics of their genomic data and help learners have more efficient and fruitful discussions about their data with bioinformatic experts. 1.5 How to use the course This course is designed to be a jumping off point to more specific resources based on a genomic data type the learner has in mind (or currently on their computer). We encourage learners to follow links to resources we provide and feel free to jump around to chapters that are most useful for them. "],["a-very-general-genomics-overview.html", "Chapter 2 A Very General Genomics Overview 2.1 Learning Objectives 2.2 General informatics files", " Chapter 2 A Very General Genomics Overview 2.1 Learning Objectives In this chapter we are going to cover sequencing and microarray workflows at a very general high level overview to give you a first orientation. As we dive into specific data types and experiments, we will get into more specifics. Here we will cover the most common file formats. If you have a file format you are dealing with that you don’t see listed here, it may be specific to your data type and we will discuss that more in that data type’s respective chapter. We still suggest you go through this chapter to give you a basic understanding of commonalities of all genomic data types and workflows 2.1.1 What do genomics workflows look like? In the most general sense, all genomics data when originally collected is raw, it needs to undergo processing to be normalized and ready to use. Then normalized data is generally summarized in a way that is ready for it to be further consumed. Lastly, this summarized data is what can be used to make inferences and create plots and results tables. 
2.1.2 Basic file formats Before we get into bioinformatic file types, we should establish some general file types that you likely have already worked with on your computer. These file types are used in all kinds of applications and not specific to bioinformatics. 2.1.2.1 TXT - Text A text file is a very basic file format that contains text! 2.1.2.2 TSV - Tab Separated Values A tab separated values file is a text file that is good for storing a data table. It has rows and columns where each value is separated by (you guessed it), tabs. Most commonly, if your genomics data has been provided to you in a TSV or CSV file, it has been processed and summarized! It will be your job to know how it was processed and summarized. Here the literal ⇥ represents tabs which often may show up invisible in your text editor’s preference settings. gene_id⇥sample_1⇥sample_2 gene_a⇥12⇥15, gene_b⇥13⇥14 2.1.2.3 CSV - Comma Separated Values A comma separated values file is just like a TSV file, but instead of values being separated by tabs they are separated by… (you guessed it), commas! In its raw form, a CSV file might look like our example below (but if you open it with a program for spreadsheets, like Excel or Google Sheets, it will look like a table) gene_id, sample_1, sample_2, gene_a, 12, 15, gene_b, 13, 14 2.1.3 Sequencing file formats 2.1.3.1 SAM - Sequence Alignment Map SAM files are text based files that contain sequence reads and, typically, information about where each read aligns to a reference genome. The reads generally have not yet been quantified. For more about SAM files. 2.1.3.2 BAM - Binary Alignment Map BAM files are like SAM files but are compressed (made to take up less space on your computer). This means if you double click on a BAM file to look at it, it will look jumbled and unintelligible. You will need to convert it to a SAM file if you want to see it yourself (but this isn’t always necessary). 2.1.3.3 FASTA - “fast A” Fasta files are sequence files that can be either nucleotide or amino acid sequences. 
They look something like this (the example below illustrates a nucleotide sequence): &gt;SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT For more about fasta files. 2.1.3.4 FASTQ - “Fast q” A Fastq file is like a Fasta file except that it also contains information about the quality of the read. By quality, we mean: how sure was the sequencing machine that each base was called correctly? @SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT + !&#39;&#39;*((((***+))%%%++)(%%%%).1***-+*&#39;&#39;))**55CCF&gt;&gt;&gt;&gt;&gt;&gt;CCCCCCC65 For more about fastq files. Later in this course we will discuss the importance of examining the quality of your sequencing data and how to do that. If you received your data from a bioinformatics core it is possible that they’ve already done this quality analysis for you. Sequencing data that is not of high enough quality should not be trusted! It may need to be re-run entirely or may need extra processing (trimming) in order to make it more trustworthy. We will discuss this more in later chapters. 2.1.3.5 BCL - binary base call (BCL) sequence file format This type of sequence file is specific to Illumina data. In most cases, you will simply want to convert it to Fastq files for use with non-Illumina programs. More about BCL to Fastq conversion. 2.1.3.6 VCF - Variant Call Format VCF files are a further processed form of data than the sequence files we discussed above. VCF files are specifically for storing only where a particular sample’s sequences differ from the reference genome or from each other. This will only be pertinent to you if you care about DNA variants. We will discuss this in the DNA seq chapter. For more on VCF files. 2.1.3.7 MAF - Mutation Annotation Format MAF files are aggregated versions of VCF files. So for a group of samples for which each has a VCF file, your entire group of samples’ variants will be summarized in the form of a MAF file. 
For more on MAF files. 2.1.4 Microarray file formats 2.1.4.1 IDAT - intensity data file This is an Illumina microarray specific file that contains the chip image intensity information for each location on the microarray. It is a binary file, which means it will not be readable by double clicking and attempting to open the file directly. Currently, Illumina appears to suggest directly converting IDAT files into a GTC format. We advise looking into this package to help you do that. For more on IDAT files. 2.1.4.2 DAT - data file This is an Affymetrix microarray specific file parallel to the IDAT file in that it contains the image intensity information for each location on the microarray. It’s stored as pixels. For more on DAT files. 2.1.4.3 CEL This is an Affymetrix microarray specific file that is made from a DAT file but translated into numeric values. It is not normalized yet but can be normalized into a CHP file. For more on CEL files. 2.1.4.4 CHP CHP files contain the gene-level and normalized data from an Affymetrix array chip. CHP files are obtained by normalizing and processing CEL files. For more about CHP files. 2.2 General informatics files At various points in your genomics workflows, you may need to use other types of files to help you annotate your data. We’ll also discuss some of these common files that you may encounter: 2.2.0.1 BED - Browser Extensible Data A BED file is a text file that has coordinates to genomic regions. The other columns that accompany the genomic coordinates vary depending on the context. But every BED file starts with the chrom, chromStart and chromEnd columns. A BED file might look like this: chrom chromStart chromEnd other_optional_columns chr1 0 1000 good chr2 100 3000 bad For more on BED files. 2.2.0.2 GFF/GTF General Feature Format/Gene Transfer Format A GFF file is a tab delimited file that contains information about genomic features. 
These types of files are available from databases and what you can use to annotate your data. You may see there are GFF2, GFF3, and GTF files. These only refer to different versions and variations. They generally have the same information. In general, GFF2 is being phased out so using GFF3 is generally a better bet unless the program or package you are using specifies it needs an older GFF2 version. A GFF file may look like this (borrowed example from Ensembl): 1 transcribed_unprocessed_pseudogene gene 11869 14409 . + . gene_id &quot;ENSG00000223972&quot;; gene_name &quot;DDX11L1&quot;; gene_source &quot;havana&quot;; gene_biotype &quot;transcribed_unprocessed_pseudogene&quot;; Note that it will be useful for annotating genes and what we know about them. For more about GTF and GFF files. 2.2.1 Other files * If you didn’t see a file type listed you are looking for, take a look at this list by the BROAD. Or, it may be covered in the data type specific chapters. "],["guidelines-for-good-metadata.html", "Chapter 3 Guidelines for Good Metadata 3.1 Learning Objectives 3.2 What are metadata? 3.3 How to create metadata?", " Chapter 3 Guidelines for Good Metadata 3.1 Learning Objectives 3.2 What are metadata? Metadata are critically important descriptive information about your data. Without metadata, the data themselves are useless or at best vastly limited. Metadata describe how your data came to be, what organism or patient the data are from and include any and every relevant piece of information about the samples in your data set. Metadata includes but isn’t limited to, the following example categories: At this time it’s important to note that if you work with human data or samples, your metadata will likely contain personal identifiable information (PII) and protected health information (PHI). It’s critical that you protect this information! For more details on this, we encourage you to see our course about data management. 3.3 How to create metadata? 
Where do these metadata come from? The notes and experimental design from anyone who played a part in collecting or processing the data and its original samples. If this includes you (meaning you have collected data and need to create metadata) let’s discuss how metadata can be made in the most useful and reproducible manner. 3.3.1 The goals in creating your metadata: 3.3.1.1 Goal A: Make it crystal clear and easily readable by both humans and computers! Some examples of how to make your data crystal clear: - Look out for typos and spelling errors! - Don’t use acronyms unless you need to and then if you do need to make sure to explain what the acronym means. - Don’t add extraneous information – perhaps items that are relevant to your lab internally but not meaningful to people outside of your lab. Either explain the significance of such information or leave it out. Make your data tidy. &gt; Tidy data is a standard way of mapping the meaning of a dataset to its structure. A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, variables and types. In tidy data: &gt; - Every column is a variable. &gt; - Every row is an observation. &gt; - Every cell is a single value. 3.3.1.2 Goal B: Avoid introducing errors into your metadata in the future! Toward these two goals, this excellent article by Broman &amp; Woo discusses metadata design rules. We will very briefly cover the major points here but highly suggest you read the original article. Be Consistent - Whatever labels and systems you choose, use it universally. This not only means in your metadata spreadsheet but also anywhere you are discussing your metadata variables. Choose good names for things - avoid spaces, special characters, or within the lab jargon. Write Dates as YYYY-MM-DD - this is a global standard and less likely to be messed up by Microsoft Excel. 
No Empty Cells - If a particular field is not applicable to a sample, you can put NA, but empty cells can lead to formatting errors or just general confusion. Put Just One Thing in a Cell - resist the urge to combine variables into one; you have no limit on the number of metadata variables you can make! Make it a Rectangle - This is the easiest way to read data, for a computer and a human. Have your samples be the rows and variables be columns. Create a Data Dictionary - Have somewhere that you describe what your metadata mean in detailed paragraphs. No Calculations in the Raw Data Files - To avoid mishaps, you should always keep a clean, original, raw version of your metadata that you do not add extra calculations or notes to. Do Not Use Font Color or Highlighting as Data - This only adds confusion for others if they don’t understand your color coding scheme. Instead create a new variable for anything you might be tempted to color code. Make Backups - Metadata are critical; you never want to lose them because of spilled coffee on a computer. Keep the original backed up in multiple places. We recommend writing your metadata in something like Google Sheets because it is both free and saved online, so it is safe from computer crashes. Use Data Validation to Avoid Errors - set data types to have Google Sheets or Excel check that the data in each column is the type of data expected for a given variable. Note that it is very dangerous to open gene data with Excel. According to Ziemann, Eren, and El-Osta (2016), approximately one-fifth of papers with Excel gene lists have errors. This happens because Excel automatically converts some gene names into dates or numbers. We strongly caution against opening (and saving afterward) gene data in Excel. 3.3.2 To recap: If you are not the person who has the information needed to create metadata, or you believe that another individual already has this information, make sure you get ahold of the metadata that correspond to your data. 
It will be critical to have in order to do any sort of meaningful analysis! References "],["considerations-for-choosing-tools.html", "Chapter 4 Considerations for choosing tools 4.1 Learning Objectives 4.2 Overview 4.3 Coming to a decision 4.4 More resources", " Chapter 4 Considerations for choosing tools 4.1 Learning Objectives 4.2 Overview In this course, we will introduce you to the fundamentals of various data types and give you advice about choosing tutorials and tools whenever possible. However, it is critical to note that there is no “one size fits all” when it comes to genomic data decisions. Instead, our goals are to equip you with the knowledge you need as well as the questions you need to ask yourself (or others) when making decisions about your genomics data. We will discuss the following considerations, which you should gather information about and otherwise ponder when comparing tools for your analysis: 4.2.1 Is this tool appropriate for your data type? Certain tools are built for certain kinds of data. In each data-type-specific chapter we will attempt to point you to tools that are appropriate for the given data type. However, note that some tools also might require tweaks in parameters for non-standard data collection methods. If you are not sure of the data collection methods used for your data type, be sure to follow the data type specific advice in the chapter to find out the information about your data that you need to know to make an informed decision. 4.2.2 Is this tool appropriate for your scientific question? Some tools may be appropriate for the general data type, but might mask information you will need to answer your particular scientific question or hypothesis. For example, for RNA-seq if you are interested in splice variants, you may not be able to use certain alignment tools that do not differentiate between splice variants. Be sure to make your goals and scientific questions clear when asking for advice or guidance. 
Some tools may be applicable to certain scientific questions, but other accommodations or preprocessing may need to be done 4.2.3 Is this tool in an interface or programming language you feel comfortable with? Genomics and informatics tools can be classified into two groups based on how you interact with them. These groups are 1) command line or 2) graphics user interface (GUI). GUIs are tools that you can use by clicking and pointing with your mouse whereas command line tools require input through writing out commands. Command line tools often lend to greater reproducibility of an analysis since a script can have all the steps needed to re-run analysis. This makes it so you could re-run and reproduce your results with one command instead of lots of clicking various buttons in particular order as you would need to do with a GUI based tool. Your level of comfort or willingness/time available to learn a programming language like R or Python will influence what tool options you have. If you are unfamiliar and uncomfortable writing in R, Python, or Bash scripting, this will influence what tools you have available to you or whether you will need to enlist more outside help. If you are interested in learning to use command line, we have many resources and recommendations for you to use for learning in this next chapter. However, if you do not have the bandwidth or motivation to learn how to code, you will want to gravitate toward tools that have GUIs. 4.2.4 How much computing power do you have? Some tools require a lot more computing resources (or runtime) than others. Many institutions have cloud computing resources or high powered computing clusters for your use. We’ll recommend you to our Computing Course for more information about this. But your computing budget access, and time allotment, may influence what tools you would like to use for a project. 
For example, for RNA seq data alignment, traditional aligners that use the genome take an order of magnitude more time to run than quantifying transcripts with pseudoalignment based tools. For many applications pseudoaligners are perfectly appropriate and efficient choices that can be run on a laptop. But if you prefer a traditional aligner because you are interested in something that is not detected by pseudoaligners, such as splice variants, then you may want to look into using some computing resources for this task. All these decisions need to be weighed in balance with each other. 4.2.5 Are there benchmarking papers that compare this tool to other options? Some tools and their algorithms have been more thoroughly examined and tested than others. And this doesn’t always align with a tool’s popularity. Seek out the literature and what studies have been done comparing this tool to others like it. Keep in mind the tool developer’s own bias if the paper is coming directly from the group or individual who is the creator of the tool. Developers will be more likely to understand and know how to tweak parameters of their own tool properly, while not necessarily spending as much time testing and adjusting tools made by others. This concept has sometimes been called the “Continental Breakfast Included” concept. 4.2.6 Is the tool well documented and usable? Well documented and usable tools can be very powerful. Poorly documented tools may lead to unknown parameter settings or other mishandling of the data if their behavior has not been made clear by the tool developers and maintainers. A good understanding of what a tool is doing with the data you give it is perhaps more important than using fancy algorithms that are unclear. Not only do documentation and usability increase your ability to use a tool, but your analysis will be more reproducible if others can also understand the tools that you used. 
The existence of forums and user groups for a particular tool not only gives you a useful resource for analysis, troubleshooting and interpretation of your results, but also indicates a particular drive for the tool to continue to be maintained and developed over time. 4.2.7 Is the tool well maintained? If a tool is actively being maintained this will aid in the reproducibility of your results. Tools on GitHub (a platform for hosting software, much of it open source) or other repositories often indicate when the latest updates to a tool were made. Ideally updates are being made regularly to the tool; a lack of updates does not speak well for the future existence of the tool. A tool that is not well maintained or supported may become deprecated and make it increasingly difficult, if not impossible, to reproduce, re-run or further develop your analysis. 4.2.8 Is the tool generally accepted by the field? While tool popularity should not be the only consideration when choosing a tool, it is an aspect that can influence communication or acceptance of your results. All things being equal, it can be better to choose a tool that is more accepted by the community as tried and true, and well benchmarked, as opposed to bleeding edge technology that may not have been truly scrutinized yet. In an analysis it is perhaps more valuable to know and weigh the known limitations of an older tool than to use a newer tool whose limitations may not have been identified yet (but it certainly will have its own limitations identified in time). 4.3 Coming to a decision It’s important to note that the questions we discuss here need to be considered in balance with one another. Rarely should you make a decision about a tool without considering all of these items together. For example, a tool may have better benchmarking, but if it is more computationally costly and you do not have access to the necessary computing resources to run it, then you may need to consider other options. 
4.4 More resources A longer list of tools and resources can be found here DataTrail curriculum Introduction to Reproducibility Advanced Reproducibility in Cancer Informatics Computing in Cancer Informatics "],["general-data-analysis-tools.html", "Chapter 5 General Data Analysis Tools 5.1 Learning Objectives 5.2 Command Line vs GUI 5.3 More resources", " Chapter 5 General Data Analysis Tools 5.1 Learning Objectives 5.2 Command Line vs GUI When using computers there are two different ways you can tell a computer program what you want it to do. You can use a Graphics User Interface (abbreviated as GUI) where you point and click buttons or you can use a Command Line Interface where you type in commands and write scripts that tell the program what you want it to do. Command Line Interfaces require a bit more time to learn and get used to, but they generally make an analysis easier to reproduce, because every step of the analysis can be written in a script. Graphics User Interfaces can be more intuitive and quicker to pick up, but it can be difficult to repeat an analysis with them in the exact same way. If you know you will be doing the same analysis many times (either with different or the same samples), it is a good use of your time to make sure that you learn how to use Command Line tools. We will discuss some of the most commonly used Command line tools here. 5.2.1 Bash Bash is a command language used by a lot of computers and programs. Many of the same things that you might do every day on your computer by clicking on various items on your desktop and menus, you can also perform using bash. On a Mac computer, you can use bash commands by finding your Terminal window. Go to your search bar and search for the Terminal. You may want to keep this application handy. In Windows, you can type commands by searching for the Command Prompt application. Go to your search bar and search for Command Prompt. Note that Command Prompt uses its own command language rather than bash, so to run bash commands on Windows you will need something like Windows Subsystem for Linux or Git Bash. You may want to keep this application handy. 
5.2.2 R R is a program commonly used for statistics and data analysis. It’s free and has lots of R packages built for genomics analysis purposes. Many of these packages have been highlighted in this course or otherwise listed in our tool glossary. 5.2.2.1 Resources for learning R 5.2.2.1.1 R and Tidyverse Swirl, an interactive tutorial R for Data Science Tidyverse skills for Data Science by Carrie Wright. Handy R cheatsheets R Cookbook Second Edition Advanced R R for Epidemiology - has generally good R advice O’Reilly books available through Seattle Public Library 5.2.2.1.2 R notebooks R Markdown Tutorial on R, RStudio and R Markdown Handy R cheatsheets R Notebooks tutorial 5.2.2.1.3 R and Genomics Intro to R and Tidyverse course and exercises from the Childhood Cancer Data Lab. Refine.bio examples from the Childhood Cancer Data Lab. Biostar Handbook: A Beginner’s Guide to Bioinformatics 5.2.3 Python Python is a program that also is used for data analysis among many other items. It can be a very powerful development tool. Some of the packages that have been highlighted in this course or otherwise are listed in our tool glossary. 5.2.3.1 Resources for learning python Python Data Science Handbook Python for Biologists 5.3 More resources A longer list of tools and resources can be found here DataTrail curriculum Introduction to Reproducibility Advanced Reproducibility in Cancer Informatics Computing in Cancer Informatics "],["sequencing-data.html", "Chapter 6 Sequencing Data 6.1 Learning Objectives 6.2 How does sequencing work? 6.3 Sequencing concepts 6.4 Very General Sequencing Workflow", " Chapter 6 Sequencing Data This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 6.1 Learning Objectives In this section, we are going to discuss generalities that apply to all sequencing data. 
This is meant to be a “primer” for you which data-type specific chapters will build off of to give you more specific and practical steps and advice in regards to your data type. 6.2 How does sequencing work? Sequencing methods, whether they are targeting DNA, transcriptomes, or some other target of the genome, have some commonalities in the steps as well as what types of biases and data generation artifacts to look out for. All sequencing experiments start out with the extraction of the biological material of interest. This biological material will be processed in some way to isolate to the genomic target of interest (we will cover the various techniques for this in more detail in each respective data chapter since it is highly specific to the data type). This set of processing steps will lead up to library generation – adding a way to catalog what molecules came from where. Sometimes for this library prep the sequences need to be fragmented before hand and an adapter bound to them. The resulting sample material is often a very small quantity, which means Polymerase Chain Reaction (PCR) needs to be used to amplify the material to a quantity large enough to be reliably sequenced. We will talk about how this very common method not only amplifies the sequences we want to read but amplifies sequence method biases that we would like to avoid. At the end of this process, base sequences are called for the samples (with varying degrees of confidence), creating huge amounts of data and what hopefully contains valuable research insights. 6.3 Sequencing concepts 6.3.1 Inherent biases Sequences are not all sequenced or amplified at the same rate. In a perfect world, we could take a simple snapshot of the genome we are interested in and know exactly what and how many sequences were in a sample. But in reality, sequencing methods and the resulting data always have some biases we have to be aware of and hopefully use methods that attempt to mitigate the biases. 
6.3.1.1 GC bias You may recall that with nucleotides: adenine binds with thymine and guanine binds with cytosine. But the guanine-cytosine pair (GC) has 3 hydrogen bonds whereas the adenine-thymine pair (AT) has only 2. This means that the GC bond is stickier (to put it scientifically) and needs higher temperatures to unbind. The sequencing and PCR amplification process involves cycling through temperatures and binding and unbinding of sequences, which means that a sequence with a lot of G’s and C’s (high GC content) will unbind at a different temperature than a sequence of low GC content. 6.3.1.2 Sequence complexity Nonrepeating sequences are harder to sequence and amplify than repeating sequences. This means that the complexity of a target sequence influences the PCR amplification and detection. 6.3.1.3 Length bias Longer sequences, whether they represent long sequence variants, long transcripts, or something else, are more likely to be identified than shorter ones! So if you are attempting to quantify the presence of a sequence, a longer sequence is much more likely to be counted more often. 6.3.2 PCR Amplification All of the above biases are amplified when the sequences are being amplified! You can picture that if each of these biases has a certain effect for one copy, then as PCR steps copy the sequence exponentially, the error is also being multiplied! PCR amplification is generally a necessary part of the process. But there are tools that allow you to try to combat the biases of PCR amplification in your data analysis. These tools will depend on the type of sequencing methods you are using and will be discussed in each data type chapter. 6.3.3 Depth of coverage The depth of sequencing refers to how many times on average a particular base is sequenced. Obviously the more times something is sequenced, the more you can be confident that the base call is accurate. However, sequencing at greater depths also takes more time and money. 
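As a rough sketch of the averaging described above, the expected average depth can be estimated as the number of reads times the read length, divided by the size of the target. The function name `average_depth` and all of the numbers below are hypothetical and only meant to illustrate the arithmetic; actual depth varies along the genome.

```python
# Average depth = (number of reads * read length) / size of the target.
# All numbers below are hypothetical and chosen only for illustration.

def average_depth(n_reads, read_length, target_size):
    '''Return the expected average number of times each base is sequenced.'''
    if target_size <= 0:
        raise ValueError('target_size must be positive')
    return n_reads * read_length / target_size

# 600 million reads of 150 bases spread over a 3 billion base genome
depth = average_depth(600_000_000, 150, 3_000_000_000)
print(depth)  # 30.0
```

This is only an average: because of the biases discussed earlier, some regions will be read far more often than this and others far less.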
Depending on your sequencing goals and methods there is an appropriate level of depth that is needed. Coverage on the other hand has to do with how much of the target is covered. If you are doing Whole Genome Sequencing, what percentage of the whole genome were you able to sequence? You may realize how depth is related to coverage, in that the greater depth of sequencing you use the more likely you are to also cover more of the genome. As discussed in relation to the biases, some parts of the genome are harder to reach than others, so by reading at greater depths some of those “hard to read” parts of the genome will be able to be covered. 6.3.4 Quality controls Sequencing bases involves some error/confidence rate. As mentioned, some parts of the genome are harder to read than others. Or, sometimes your sequencing can be influenced by a poor quality sample that has degraded. Before you jump in to further analyzing your data, you will want to investigate the quality of the sequencing data you’ve collected. The most common and well-known method for assessing sequencing quality controls is FASTQC. FASTQC creates an abundance of sequencing quality control reports from fastq files. These reports need to be interpreted within the context of your sequencing methods, samples, and experimental goals. Often bioinformatics cores are good to contact about these reports (they may have already run FASTQC on your data if that is where you obtained your data initially). They can help you wade through the flood of quality control reports printed out by FASTQC. FASTQC also has great documentation that can attempt to guide you through report interpretation. This also includes examples of good and bad FASTQC reports. But note that all FASTQC report interpretations must be done relative to the experiment that you have done. In other words, there are no one size fits all quality control cutoffs for your FASTQC reports. 
The failure/success icons FASTQC reports back are based on defaults that may not be accurate or applicable to your data, so further investigation and consultation is warranted before you decide to trust or pitch your sequencing data. 6.3.5 Alignment Once you have your reads and you find them reasonably trustworthy through quality control checks, you will want to align them to your reference. The reference you align your sequences to will depend on the data type you have: a reference genome, a reference transcriptome, something else? Traditional aligners - Align your data to a reference using standard alignment algorithms. Can be very computationally intensive. Pseudo aligners - much faster and the trade off for accuracy is often negligible (but again is dependent on the data you are using). 6.3.6 Single End vs Paired End Sequencing can be done single-end or paired-end. Paired-end means the primers are going to bind to both sides of a sequence. This can help you avoid some 3’ bias and give you more complete coverage of the area you are sequencing. But, as you may guess, paired-end sequencing is more expensive than single-end. You will want to determine whether your sequencing is paired-end or single-end. If it is paired-end you will likely see file names that indicate this. You should have pairs of files that may or may not be labeled with _1 and _2 or _F and _R. We will discuss file nomenclature more specifically as it pertains to different data types in the upcoming chapters. 6.4 Very General Sequencing Workflow In the data type specific chapters, we will cover the sequencing data workflows and file formats in more detail. But in the most general sense, sequencing workflows look like this: 6.4.1 Sequencing file formats 6.4.1.1 SAM - Sequence Alignment Map SAM files are text based files that store sequencing reads along with information about where each read aligns to a reference. The reads have generally not been quantified or summarized yet. For more about SAM files. 
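As a rough illustration of the text based layout: in the SAM specification, each alignment record is a single tab separated line whose first eleven columns are mandatory, and header lines start with @. The sketch below splits one such line into named fields. The helper name `parse_sam_record` and the example record are made up for illustration; in practice, dedicated tools such as samtools are normally used instead of hand parsing.

```python
# The eleven mandatory SAM columns, in order, as named in the SAM specification.
SAM_FIELDS = ['QNAME', 'FLAG', 'RNAME', 'POS', 'MAPQ', 'CIGAR',
              'RNEXT', 'PNEXT', 'TLEN', 'SEQ', 'QUAL']

def parse_sam_record(line):
    '''Split one SAM alignment line into a dict of its mandatory fields.'''
    if line.startswith('@'):
        raise ValueError('header lines start with @ and are not alignment records')
    columns = line.rstrip('\n').split('\t')
    if len(columns) < len(SAM_FIELDS):
        raise ValueError('expected at least 11 tab separated columns')
    # Optional tag columns after the first eleven are ignored in this sketch.
    return dict(zip(SAM_FIELDS, columns[:len(SAM_FIELDS)]))

# Hypothetical record, made up for illustration only.
example = '\t'.join(['read_1', '0', 'chr1', '100', '60', '8M',
                     '*', '0', '0', 'GATTTGGG', 'IIIIIIII'])
record = parse_sam_record(example)
print(record['RNAME'], record['POS'], record['CIGAR'])  # chr1 100 8M
```

Every value is kept as text here; converting columns such as POS or MAPQ to integers is left out to keep the sketch short.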
6.4.1.2 BAM - Binary Alignment Map BAM files are like SAM files but are compressed (made to take up less space on your computer). This means if you double click on a BAM file to look at it, it will look jumbled and unintelligible. You will need to convert it to a SAM file if you want to read it yourself (but this usually isn’t necessary). 6.4.1.3 FASTA - “fast A” Fasta files are sequence files that can be either nucleotide or amino acid sequences. They look something like this (the example below illustrates a nucleotide sequence): &gt;SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT For more about fasta files. 6.4.1.4 FASTQ - “Fast q” A Fastq file is like a Fasta file except that it also contains information about the quality of the read. By quality, we mean: how sure was the sequencing machine that each base was called correctly? @SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT + !&#39;&#39;*((((***+))%%%++)(%%%%).1***-+*&#39;&#39;))**55CCF&gt;&gt;&gt;&gt;&gt;&gt;CCCCCCC65 For more about fastq files. Later in this course we will discuss the importance of examining the quality of your sequencing data and how to do that. If you received your data from a bioinformatics core it is possible that they’ve already done this quality analysis for you. Sequencing data that is not of high enough quality should not be trusted! It may need to be re-run entirely or may need extra processing (trimming) in order to make it more trustworthy. We will discuss this more in later chapters. 6.4.1.5 BCL - binary base call (BCL) sequence file format This type of sequence file is specific to Illumina data. In most cases, you will simply want to convert it to Fastq files for use with non-Illumina programs. More about BCL to Fastq conversion. 6.4.1.6 VCF - Variant Call Format VCF files are a further processed form of data than the sequence files we discussed above. 
VCF files are specifically for storing only where a particular sample’s sequences differ from the reference genome or from each other. This will only be pertinent to you if you care about DNA variants. We will discuss this in the DNA seq chapter. For more on VCF files. 6.4.1.7 MAF - Mutation Annotation Format MAF files are aggregated versions of VCF files. So for a group of samples for which each has a VCF file, your entire group of samples’ variants will be summarized in the form of a MAF file. For more on MAF files. 6.4.2 Other files * If you didn’t see the file type you are looking for listed, take a look at this list by the BROAD. Or, it may be covered in the data type specific chapters. "],["microarray-data.html", "Chapter 7 Microarray Data 7.1 Learning Objectives 7.2 Summary of microarrays 7.3 How do microarrays work? 7.4 What types of arrays are there? 7.5 General processing of microarray data 7.6 Very General Microarray Workflow 7.7 General informatics files", " Chapter 7 Microarray Data This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 7.1 Learning Objectives 7.2 Summary of microarrays Microarrays have been in use since before high throughput sequencing methods became more affordable and widespread, but they still can be an effective and affordable tool for genomic assays. Depending on your goals, a microarray may be a suitable choice for your genomic study. 7.3 How do microarrays work? All microarrays work on hybridization to sets of oligonucleotides on a chip. However, the preparation of the samples and the oligonucleotides’ hybridization targets vary depending on the assay and goals. As a basic principle, oligonucleotide probes are designed for different targets, and sets of probes designed for the same target are put together. 
On the whole chip, these probes are arranged in a grid-like design so that after a sample is hybridized to them, you can measure how much of each target is present by taking an image and knowing which target each location is designed for. 7.3.1 Pros: Microarrays are much more affordable than high throughput sequencing which can allow you to run more samples and have more statistical power (Tarca, Romero, and Draghici 2006; ALSF 2019). Microarrays take less time to process than most high throughput sequencing methods (Tarca, Romero, and Draghici 2006; ALSF 2019). Microarrays are generally less computationally intensive to process and you can get your results more quickly (Tarca, Romero, and Draghici 2006; ALSF 2019). Microarrays are generally as good as sequencing methods for detecting clinical endpoints (W. Zhang et al. 2015). 7.3.2 Cons: Microarray chips can only measure the targets they are designed for, and cannot be used for exploratory purposes (W. Zhang et al. 2015). Microarrays’ probe designs can only be as up to date as the genome they were designed against at the time (Mantione et al. 2014; ALSF 2019). Microarrays do not escape oligonucleotide biases like GC content and sequence composition biases (ALSF 2019). 7.4 What types of arrays are there? 7.4.1 SNP arrays Single nucleotide polymorphism (SNP) arrays are designed to target and measure DNA variants. When the sample is hybridized, the amount of fluorescence detected can be interpreted to indicate the presence of the variant and whether the variant is homozygous or heterozygous. The samples prepped for SNP arrays therefore need to be DNA samples. 7.4.1.1 Examples: The 1000 genomes project is a large collection of SNP array data from many populations around the world and is available for download. 7.4.2 Gene expression arrays Gene expression arrays are designed to measure gene expression. They are designed to target and measure relative transcript abundance levels. 
7.4.2.1 Examples: refine.bio is the largest collection of publicly available, already normalized gene expression data (including gene expression microarrays). Getting started in gene expression microarray analysis (Slonim and Yanai 2009). Microarray and its applications (2012). Analysis of microarray experiments of gene expression profiling (Tarca, Romero, and Draghici 2006). 7.4.3 DNA methylation arrays DNA methylation can also be measured by microarray. To detect methylated cytosines (5mC), DNA samples are prepped using bisulfite conversion. This converts unmethylated cytosines into uracils and leaves methylated cytosines untouched. Probes are then designed to bind to either the uracil or the cytosine, representing the unmethylated and methylated cytosines respectively. A ratio of the fluorescence signal can be used to identify the relative abundance of the methylated and unmethylated versions of the sequence. Additionally, 5-hydroxymethylated cytosines (5hmC) can be detected by oxidative bisulfite sequencing (Booth et al. 2013). Note that bisulfite conversion alone will not distinguish between 5mC and 5hmC, though these often may indicate different biological mechanisms. 7.5 General processing of microarray data After scanning, microarray data starts as an image that needs to be quantified, normalized and further corrected and edited based on the most current genome and probe annotation. As noted above, microarrays do not escape the base sequence biases that accompany almost all genomic assays. The normalization methods you use ideally will mitigate these sequence biases and also make sure to remove probes that may be outdated or bind to multiple places on the genome. The tools and methods by which you normalize and correct the microarray data will depend not only on the type of microarray assay you are performing (gene expression, SNP, methylation), but most of all on what kind of microarray chip design/platform you are using. 
7.5.1 Examples Refine.bio describes their processing methods. Brainarray keeps up to date microarray annotation for all kinds of platforms. 7.5.2 Microarray Platforms There are many microarray chip designs out there, designed to target different things. Three of the largest commercial manufacturers have ready to use microarrays you can purchase. You can also design microarrays to hit your own targets of interest. Here are full lists of platforms that have been published on Gene Expression Omnibus. Affymetrix platforms. Agilent platforms. Illumina platforms. 7.6 Very General Microarray Workflow In the data type specific chapters, we will cover the microarray workflow and file formats in more detail. But in the most general sense, microarray workflows look like this (note that the exact file formats are specific to the chip brand and type you use, e.g. Illumina, Affymetrix, Agilent, etc.): 7.6.1 Microarray file formats 7.6.1.1 IDAT - intensity data file This is an Illumina microarray specific file that contains the chip image intensity information for each location on the microarray. It is a binary file, which means it will not be readable by double clicking and attempting to open the file directly. Currently, Illumina appears to suggest directly converting IDAT files into a GTC format. We advise looking into this package to help you do that. For more on IDAT files. 7.6.1.2 DAT - data file This is an Affymetrix microarray specific file parallel to the IDAT file in that it contains the image intensity information for each location on the microarray. It’s stored as pixels. For more on DAT files. 7.6.1.3 CEL This is an Affymetrix microarray specific file that is made from a DAT file but translated into numeric values. It is not normalized yet but can be normalized into a CHP file. For more on CEL files. 7.6.1.4 CHP CHP files contain the gene-level and normalized data from an Affymetrix array chip. CHP files are obtained by normalizing and processing CEL files. 
For more about CHP files. 7.7 General informatics files At various points in your genomics workflows, you may need to use other types of files to help you annotate your data. We’ll also discuss some of these common files that you may encounter: 7.7.0.1 BED - Browser Extensible Data A BED file is a text file that has coordinates to genomic regions. The other columns that accompany the genomic coordinates are variable depending on the context. But every BED file contains the chrom, chromStart and chromEnd columns to start. A BED file might look like this: chrom chromStart chromEnd other_optional_columns chr1 0 1000 good chr2 100 3000 bad For more on BED files. 7.7.0.2 GFF/GTF General Feature Format/Gene Transfer Format A GFF file is a tab delimited file that contains information about genomic features. These types of files are available from databases and are what you can use to annotate your data. You may see there are GFF2, GFF3, and GTF files. These only refer to different versions and variations. They generally have the same information. In general, GFF2 is being phased out so using GFF3 is generally a better bet unless the program or package you are using specifies it needs an older GFF2 version. A GFF file may look like this (borrowed example from Ensembl): 1 transcribed_unprocessed_pseudogene gene 11869 14409 . + . gene_id &quot;ENSG00000223972&quot;; gene_name &quot;DDX11L1&quot;; gene_source &quot;havana&quot;; gene_biotype &quot;transcribed_unprocessed_pseudogene&quot;; Note that it will be useful for annotating genes and what we know about them. For more about GTF and GFF files. 7.7.1 Other files * If you didn’t see the file type you are looking for listed, take a look at this list by the BROAD. Or, it may be covered in the data type specific chapters. 7.7.2 Microarray processing tutorials: For the most common microarray platforms, you can see these examples for how to process the data: 7.7.2.1 General arrays Using Bioconductor for Microarray Analysis. 
7.7.2.2 Gene Expression Arrays An end to end workflow for differential gene expression using Affymetrix microarrays. 7.7.2.3 DNA Methylation Arrays DNA Methylation array workflow. References "],["annotating-genomes.html", "Chapter 8 Annotating Genomes 8.1 Learning Objectives 8.2 What are reference genomes? 8.3 What are genome versions? 8.4 What are the different files? 8.5 Considerations for annotating genomic data 8.6 Resources you will need for annotation!", " Chapter 8 Annotating Genomes This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 8.1 Learning Objectives In this chapter, we are going to discuss methods that affect every genomic method and may take up the majority of your time as a genomic data analyst: Annotation. We know that the sequencing or array data is not useful on its own – for our human minds to comprehend it and apply it to something we need a tangible piece of information to be attached to it. This is where annotation comes in. At its best, annotation helps you and others interpret genomic data. At its worst, it’s a time consuming activity that, done incorrectly, can lead to erroneous conclusions and labeling. Proper annotation requires an understanding of how the annotation data you are using was derived as well as the realization that all annotation data is constantly changing and the confidence for these data is never 100%. Some organisms’ genomes are better annotated than others but nearly all are at least somewhat incomplete. 8.2 What are reference genomes? Every individual organism has its own DNA sequence that is unique to it. So how can we compare organisms to each other? In some studies, sequencing data is obtained and the genome is built de novo (aka from scratch) but this takes a lot of time and computing power. So instead, most genomic studies use the imperfect method of comparing to a reference genome. Reference genomes are built from prior data and available online. 
They inherently have biases in them. For example, human genomes are generally not made from diverse populations but instead from mostly males of European descent. It is inherently bad for both ethical and scientific reasons to have genome references that are too white. For more on the problems with reference genomes, read this. In summary, reference genomes are used for comparison and as a ‘source of truth’ of sorts, but it’s important to note that this method is biased and better alternatives need to be realized. 8.3 What are genome versions? If you are familiar with software development, or have used any app before, you’re familiar with software updates and releases. Similarly, the genome has updates and releases as continued cloning and assemblies of organisms teach us more. In the image below we are showing an example of what a genome version may be noted as (note that different databases may have different terminology – here we are showing the Genome Reference Consortium). You may also notice on their website it shows the date the genome version was released and what was fixed. The details of how genome versions are fixed and released are not really of concern for your data analysis. This is merely to explain that genomes change and what is most important in your analysis is that: You choose one genome version and consistently use it in all your analyses. Choose a genome version that the rest of your field has generally had a consensus on and is also using. Generally this means sticking with major releases of a genome instead of always going with the latest version. Most databases will try to point you to their major release, so just stick with that. We will point you where you can find genome annotation for a lot of the major organisms. 8.4 What are the different files? Although we can’t walk you through every organism and database set up, we will walk through the files and structure of one example here. 
In the above screenshot, from Ensembl, it shows different organisms in the rows, but also a variety of different files across the columns. In this example, DNA refers to the DNA sequence of the organism’s genome, but cDNA refers to complementary DNA – aka DNA that has been reverse transcribed from RNA. If you are working with RNA data you may want to use the cDNA file. Whereas CDS files are referring to only coding sequences and ncRNA files are showing only non-coding sequences. Most of these files are FASTA files. Gene sets are also their own annotation files called GTF or GFF files. Ensembl provides more detailed information about what these files contain, but briefly, each row is a feature and has information describing that feature such as genomic locations, the relevant feature type (gene, coding sequence, pseudogene, etc.), and the gene ID or name. For a reminder on what these different file types are see the previous chapter. Depending on the tool you are using, the data file and type you need will vary. Some tools have these data built in or are compatible with other packages that have annotation. If a tool automatically includes annotation within it, you will need to ensure that any additional tools you are using are also pulling from the same genome and version. Look into a tool’s documentation to find out what genome versions it is based on. If it doesn’t tell you at all, you don’t want to be using that tool. You cannot assume that cross genome analyses will translate. 8.4.1 How to download annotation files For another database example we’ll look at the human data on ENA’s servers. Note that if you see FTP, that stands for “File Transfer Protocol” and it just means it’s where you can get the files themselves. For more on computing lingo, you can take our Computing in Cancer Informatics course. There are many ways you can download these files and they are described here. 
In summary: - If you don’t feel comfortable using command line, you can use the browser downloader for ENA here - If you are using command line to write a script, then you can use the wget or curl instructions described here. Be sure to read the README files to understand what it is you are downloading. Also note that if you are working from a high performance computing cluster or other online server, these annotation files may already be available to you. You don’t want to take up more computing resources by downloading extra files, so check with an administrator or informatics expert who also uses the cluster or cloud to check if the annotation files already exist in your workspace. 8.5 Considerations for annotating genomic data 8.5.1 Make sure you have the right file to start! Is the annotation from the right organism? You may think this is a dumb question, but it’s critical that you make sure you have the genome annotation for the organism that matches your data. Indeed the author of this has made this mistake in the past, so double check that you are using the correct organism. Are all analyses utilizing coordinates from the same genome/transcriptome version? Genome versions are constantly being updated. Files from older genome versions cannot be used with newer ones (without some sort of liftover conversion). This also goes for transcriptome and genome data. All analyses need to be done using the same genome version to ensure that chromosomal coordinates translate between files. For example, in one genome version a particular gene might be located at chromosome base pairs 300 - 400, but in the next version it has been changed to 305 - 405. This can throw off an analysis if you are not careful. This type of annotation mapping becomes even more complicated when considering different splice variants or non-coding genes or regulatory regions that have even less confidence and annotation about them. 
8.5.2 Be consistent in your annotations If at all possible avoid making cross species analyses - unless you are an evolutionary genomics expert and understand what you are doing. But for most applications cross species analyses are wishful thinking at best, so stick to one organism. Avoid mixing genome/transcriptome versions. Yes there is liftover annotation data to help you identify what loci are parallel between releases, but it’s really much simpler to stick with the same version throughout your analyses’ annotations. 8.5.3 Be clear in your write ups! Above all else, no matter what you end up doing, make sure that your steps, what files you use, and what tool versions you use are clear and reproducible! Be sure to clearly link to and state the database files you used and include your code and steps so others can track what you did and reproduce it. For more information on how to create reproducible analyses, you can take our reproducibility in cancer informatics courses: Introduction to Reproducibility and Advanced Reproducibility in Cancer Informatics. 8.6 Resources you will need for annotation! 8.6.1 Annotation databases Ensembl EMBL-EBI UCSC Genome Browser NCBI Genomes download page 8.6.2 GUI based annotation tools UCSC Genome Browser BROAD’s IGV Ensembl’s biomart 8.6.3 Command line based tools 8.6.3.1 R-based packages: annotatr ensembldb GenomicRanges - useful for manipulating and identifying sequences. GO.db - Gene ontology annotation org.Hs.eg.db Rsamtools A full list of Bioconductor’s annotation packages - contains annotation for all kinds of species and versions of genomes and transcriptomes. 8.6.3.2 Python-based packages: BioPython genetrack 8.6.4 More resources about genome annotation "],["dna-methods-overview.html", "Chapter 9 DNA Methods Overview 9.1 Learning Objectives 9.2 What are the goals of analyzing DNA sequences? 
9.3 Comparison of DNA methods 9.4 How to choose a DNA sequencing method 9.5 Strengths and Weaknesses of different methods", " Chapter 9 DNA Methods Overview This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 9.1 Learning Objectives 9.2 What are the goals of analyzing DNA sequences? There are several larger goals behind DNA sequencing experiments ranging from assembling whole genomes, to identifying variation or performing a functional genomic analysis or comparative genomic study. Each of these has implications when studying disease. Assembling whole genomes: Because an organism’s genome determines how an organism develops and functions (NHGRI 2024), an important task in the genomics field is assembling the genome of an organism from sequencing reads. This assembly process attempts to reconstruct how the sequencing reads overlap or fit together (Schatz, Delcher, and Salzberg 2010; Li and Durbin 2024). Recent examples of genome assembly in the genomics field include a complete 3.055 billion-base pair sequence of the human reference genome which was published by the Telomere-to-Telomere (T2T) Consortium (2022), the T2T-CHM13 version (followed not long after by the complete sequence of the human Y chromosome (2023)). A goal of the field is to better capture human genetic diversity by creating a reference pangenome, assembled from multiple donors within the population (2024). Genome assemblies are an important part of genomics beyond human genomics research; there are reference genomes available for most model organisms as well as many plants, animals, and pathogens, with more and more being published at a high frequency (Miller, Zimin, and Gordus 2023; Alonge et al. 2022; Gershman et al. 2023; Sistrom et al. 2016). These reference genomes each act as an extensive compilation of the observed DNA sequence of genes, regulatory elements, etc. 
and the related coordinate systems for these elements, such that, for the corresponding organism, sequencing reads from other experiments can be mapped or aligned to the reference in order to localize where that read was in the genome. In the case of cancer informatics, a recent approach utilized personalized genome assembly to more accurately detect tumor somatic mutations. This is likely to be an area of future research for application in precision medicine (Xiao et al. 2022; Ermini and Driguez 2024). Identifying variation: Variant caller software is used within the field of genomics to identify places where reads from a DNA sequencing experiment differ from a comparative reference genome sequence (NHGRI 2022). Variants may be as small as single nucleotide differences (single-nucleotide polymorphisms or SNPs) or much larger (50 base pairs or more) structural variation (SVs) such as duplications, deletions, insertions, inversions, translocations (Wong, Hudson, and McPherson 2011). (Shorter insertions or deletions are termed indels.) The SVs involving gains or losses in genomic DNA can lead to copy number variations (CNVs). Mutation and structural variants are very common in cancer as well as larger-scale catastrophic genomic rearrangements (C.-Z. Zhang and Pellman 2022). Overall, variants may be rare in a population or fairly common (Audano et al. 2019). Further, variants may be somatic or germline variants: germline variants are hereditary and will be passed down from parent to offspring; in the offspring, the variant will be present in every cell, while somatic variants are generally not hereditary and present only in some cells rather than every cell (Frost 2022). Because variation, specifically genetic diversity is a necessary part of a healthy species (“What Is Genetic Diversity and Why Does It Matter?” n.d.) and because variation, specifically mutations/variants may cause disease, identifying variation is a common goal in a DNA sequencing workflow. 
An example of research focusing on studying genetic diversity in humans is the 1000 Genomes Project which recently expanded its resource of sequenced genomes and in doing so discovered even more variation present in the population (Byrska-Bishop et al. 2022). Functional genomic analysis: Genomes contain more than just genes (the coding sequences that will be transcribed and translated into a protein); they also contain functional elements such as promoters, enhancers, or silencers that modulate the expression of genes (Kellis et al. 2014). Further, differential gene expression is the phenomenon by which cells with the same DNA sequence show different patterns of gene expression. Functional genomic analyses aim to better understand differential gene expression and the impact of genetic variation found in functional elements. For example, many human genetic variants associated with common traits and diseases are localized in or near known functional elements (Hindorff et al. 2009). These variants may impact gene expression due to either changes in transcription factor binding at that site, or resulting epigenetic changes, which are defined as chemical modifications of chromatin or nucleotides beyond the DNA sequence. Such epigenetic modifications, which include histone marks and DNA methylation, can alter DNA compaction and influence a functional element’s accessibility for transcriptional machinery (e.g., if the element isn’t accessible, transcription may not occur; while previously the element was accessible and the gene could be transcribed). In later sections, methods that study epigenetic modifications like chromatin accessibility, DNA methylation, or binding of specific proteins will be discussed. All of these methods support functional genomic analyses and are important for better understanding differential gene expression and the impact that genetic variants located in functional elements may have on disease occurrence. 
A somewhat recent and high profile example of a functional genomic analysis centers again on work from the T2T Consortium. Not only did they publish a new, complete reference genome, but they also studied the epigenetic landscape in the newly resolved regions of the genome and pointed to potential newly discovered functional elements in a region previously thought to be transcriptionally inactive (Gershman et al. 2022). Comparative genomics: A common saying in the genomics field is that structure determines function and conserved structure may be constrained such that there is an important function which needs to be conserved (Alföldi and Lindblad-Toh 2013). Further, similarities in structure may be due to shared ancestry through the processes of evolution; therefore, some comparative genomics studies aim to infer homology or an evolutionary relationship from structural similarity (Pearson 2013). More pertinent to the topics discussed previously, comparative genomics studies are also useful for identifying functional elements (J. Taylor et al. 2006) and variants associated with disease (e.g., by comparing the genomes of those with the disease and those without it and identifying differences) (Alföldi and Lindblad-Toh 2013; Eichler 2019). 9.3 Comparison of DNA methods There are four DNA sequencing methods discussed in this chapter. The above graph compares WGS, WXS, and Targeted gene sequencing. The last section compares all 4. Whole genome sequencing (WGS) Whole exome sequencing (WXS) Targeted gene sequencing DNA/SNP microarrays Compared to WXS and Targeted Gene Sequencing, WGS is the most expensive but requires the lowest depth of coverage to achieve 95% sensitivity. In other words, WGS requires sequencing each region of the genome (3.2 billion bases) 30 times in order to confidently be able to pick up all possible meaningful variants. (Sims et al. 2014) goes into more depth on how these depths are calculated. 
Alternatively, WXS is a more cost effective way to study the genome, focusing on places in the genome that have open reading frames – aka generally genes that are able to be expressed. This focuses on enriching for exons and not introns so splicing variants may be missed. In this case, each gene must be sequenced 80-100x for sufficient sensitivity to pick up meaningful variants. In targeted gene sequencing, a panel of 50-500 regions of interest is selected. This technique is very applicable for studying a set of specific genes of interest at great depth to identify all varieties of mutations within those specific genes. These genes must be sequenced at much greater depth (&gt;500x) to confidently identify all meaningful variants. This page from Illumina also provides information regarding sequencing depth considerations for different modalities. Additional references: WGS: (Bentley et al. 2008) WES: (Clark et al. 2011) Targeted: (Bewicke-Copley et al. 2019) 9.4 How to choose a DNA sequencing method Before starting any sequencing method, you likely have a research question or hypothesis in mind. In order to choose a DNA sequencing method, you will need to consider a few items in balance of each other: 9.4.1 1. What region(s) of the genome pertain to your research question? Is this unknown? Can it be narrowed down to non-coding or coding regions? Is there an even more specific subset of interest? 9.4.2 2. What does your project budget allow for? Some methods are much more costly than others. Cost is not only a factor for the reagents needed to sequence, but also the computing power needed to process and store the data and people’s compensation for their work on the data. All of these costs increase as the amounts of data that are collected increase. For more information on computing decisions see our Computing in Cancer Informatics course. 9.4.3 3. What is your detection power for these variants? 
Detecting DNA variants is not simply a matter of yes or no, but a confidence level due to sequencing errors in data collection. Are the variants you are looking for very rare and/or small (single nucleotide or very few copy number differences)? If so you will need more samples and potentially more sequencing depth to detect these variants with confidence. 9.5 Strengths and Weaknesses of different methods Is little known about DNA variants in your organism or disease in question? In this instance you may want to cast a large net to explore more variants by using WGS. If previous research has identified sections of the genome that are of interest to your research question, then it’s highly advisable to not sequence the entire genome with WGS methods. Not only will whole genome sequencing be more costly, but it will decrease your statistical power to discover true positive variants of interest and increase your chances of discovering false positive variants. This is because multiple testing correction needs to be applied in instances where many tests are being done concurrently. In this instance, the tests being performed are across the whole genome. If your research question does not pertain to non-coding regions of the genome or splicing, then it’s advisable to use WXS. Recall that only about 1-2% of the genome is coding sequences meaning that if you are uninterested in noncoding regions but still use WGS then 98-99% of your data will be uninteresting to you and will only serve to increase your chances of finding false positives or cost you a lot of funding. Not only does sequencing more of the genome take more money and time but it will be more costly in time and resources in terms of the computing power needed to analyze it. Furthermore, if you are able to narrow down even further what regions are of interest this would be better in terms of cost and detection abilities. A targeted sequencing panel or DNA microarray is ideal for assaying known groups of targets. 
DNA microarrays are the least costly of all the methods to identify DNA variants, but with both targeted sequencing and DNA microarray you will need to find or create a custom probe or primer set. Ideally a probe or primer set that hits your regions of interest already exists commercially but if not, then you will have to design your own – which also costs time and money. In these upcoming chapters we will discuss in more detail each of these methods, what the data represent, what you need to consider, and what resources you can consult for analyzing your data. References "],["whole-genome-or-exome-sequencing.html", "Chapter 10 Whole Genome or Exome Sequencing 10.1 Learning Objectives 10.2 WGS and WXS Overview 10.3 Advantages and Disadvantages of WGS vs WXS 10.4 WGS/WXS Considerations 10.5 DNA Sequencing Pipeline Overview 10.6 Data Pre-processing 10.7 Commonly Used Tools 10.8 Data pre-processing tools 10.9 Tools for somatic and germline variant identification 10.10 Tools for variant calling annotation 10.11 Tools for copy number variation analysis 10.12 Tools for data visualization 10.13 Resources for WGS", " Chapter 10 Whole Genome or Exome Sequencing This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 10.1 Learning Objectives The learning objectives for this course are to explain the use and application of Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES/WXS) for genomics studies, outline the technical steps in generating WGS/WXS data, and detail the processing steps for analyzing and interpreting WGS/WXS data. To familiarize yourself with sequencing methods as a whole, we recommend you read our chapter on sequencing first. 10.2 WGS and WXS Overview The difference between WGS and WXS sequencing is whether or not the open reading frames and thus coding regions are targeted in sequencing. 
WGS attempts to sequence the whole genome, while for WXS only exons with open reading frames are targeted for sequencing. Both of these methods can be massively beneficial for studying rare and complex diseases. Thus, whole genome sequencing is a technique to thoroughly analyze the entire DNA sequence of an organism’s genome. This includes sequencing all genes both coding and non-coding and all mitochondrial DNA. WGS is beneficial for identifying new and previously established variants related to disease and the regulatory elements of the genome including promoters, enhancers, and silencers. Increasingly non-coding RNAs have also been identified to play a functional role in biological mechanisms and diseases. In order to learn more about the non-coding regions of the genome, WGS is necessary. Alternatively whole exome sequencing is used to sequence the coding regions of an organism’s genome. Although non-coding regions can sometimes reveal valuable insights, coding regions can be a useful area of the genome to focus sequencing methods on, since changes in a protein coding sequence of the genome generally have more information known about them. Often protein coding sequences can have more clearly functional changes - like if a stop codon is introduced or a codon is changed to a predictable amino acid. This can more easily lead to downstream investigations on the functional implications of the protein affected. 10.3 Advantages and Disadvantages of WGS vs WXS We more thoroughly discuss how to choose DNA sequencing methods here in the previous chapter, but we will briefly cover this here. Alternatives to WGS include Whole Exome Sequencing (WES/WXS), which sequences the open reading frame areas of the genome or Targeted Gene Sequencing where probes have been designed to sequence only regions of interest. 
The main advantages of WGS include the ability to comprehensively analyze all regions of a genome and the ability to study structural rearrangements, gene copy number alterations, insertions and deletions, single nucleotide polymorphisms (SNPs), and sequence repeats. Some disadvantages include higher sequencing costs and the necessity for more robust storage and analysis solutions to manage the much larger data output generated from WGS. 10.4 WGS/WXS Considerations Some important considerations for WGS/WXS include: What genome you are studying and the size of this genome. This consideration includes whether this genome has been sequenced before, so that you will have a “reference” genome to compare your data against, or whether you will have to make a reference genome yourself. This bioinformatics resource provides a great overview of genome alignment. The depth of coverage for sequencing is an important consideration. The typical recommendation for WGS coverage is 30x, but this is on the lower side and many researchers find it does not provide sufficient coverage compared to 50x. Illumina has an infographic that explains this information. The tissue source and whether genetic alterations were introduced during processing are important. Fixation of formalin-fixed paraffin-embedded (FFPE) samples can introduce mutations/genetic changes that will need to be accounted for during data analysis. This page from Beckman addresses many of the questions researchers often have about utilizing FFPE samples for their sequencing studies. The library preparation method of DNA amplification via PCR is very important, as PCR can often introduce duplicates that interfere with interpreting whether a mutant gene is truly frequent or just over-amplified during sequencing preparation. Illumina provides a comparison of using PCR and PCR-free library preparation methods on their website. 
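To make the depth-of-coverage consideration above more concrete, here is a minimal sketch of the usual back-of-the-envelope estimate: average coverage is roughly the total number of sequenced bases divided by the genome size. The genome size (3.1 Gb) and read length (150 bp) are assumed example values, and the function names are made up for illustration. This estimate ignores duplicates, unmapped reads, and uneven coverage, so treat the result as a lower bound for planning rather than a guarantee.

```python
import math


def mean_coverage(num_reads, read_length_bp, genome_size_bp):
    # Expected average depth: total sequenced bases divided by genome size.
    return num_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    # Smallest whole number of reads that reaches the target average depth.
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Assumed example values (not from the course): a 3.1 Gb genome, 150 bp reads.
GENOME_SIZE_BP = 3_100_000_000
READ_LENGTH_BP = 150

for depth in (30, 50):
    n = reads_needed(depth, READ_LENGTH_BP, GENOME_SIZE_BP)
    print(f'{depth}x needs roughly {n:,} reads of {READ_LENGTH_BP} bp')
```

For paired-end data, each read in a pair counts separately in this estimate.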
10.4.1 Target enrichment techniques For WXS or other targeted sequencing specifically (so not relevant to WGS data), what methods were used to enrich for the targeted sequences (which is the entire exome in the case of general WXS)? These methods are generally summarized into two major categories: hybridization-based and amplicon-based enrichment. - [Hybridization based enrichment](https://www.paragongenomics.com/target-enrichment/). This includes a variety of widely used methods that we will broadly categorize into two groups: array-based and in-solution: - [Array-based capture](https://en.wikipedia.org/wiki/Exome_sequencing#:~:text=Target%2Denrichment%20strategies-,Array%2Dbased%20capture,-In%2Dsolution%20capture) uses microarrays that have probes designed to bind to known coding sequences. Fragments that do not bind to these probes are washed away, leaving the sample with known coding sequences bound and ready for PCR amplification [@Hodges2007; @Turner2009]. - [In-solution capture](https://en.wikipedia.org/wiki/Exome_sequencing#In-solution_capture) has become more popular in recent years because it [requires less sample DNA than array-based capture](https://sequencing.roche.com/us/en/products/product-category/target-enrichment.html). To enrich for coding sequences, in-solution capture uses a pool of custom probes that are designed to bind to the coding regions in the sample. Attached to these probes are beads, which allow the probe-bound DNA to be physically separated from DNA that is not bound to the probes (this should be the non-coding sequences) [@Mamanova2010]. - [PCR/Amplicon based enrichment](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318977/) requires even less sample than the other two strategies and so is ideal when the amount of sample is limited or the DNA has been otherwise processed harshly (e.g. with paraffin embedding). 
Because the other two enrichment methods are done after PCR amplification has been done to the whole genomic DNA sample, it is thought that this method of selective PCR amplification for enrichment can result in more uniformly amplified DNA in the resulting sample. However, this method is less suitable the more gene targets you have (for example, if you truly need to sequence all of the exome), since amplicons need to be designed for each target. Overall, it is a much more affordable method. There are several variations of this method that are [discussed thoroughly by @Singh2022](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318977/). 10.5 DNA Sequencing Pipeline Overview In order to create WGS/WXS data, DNA is first extracted from a specific sample type (tissue, blood samples, cells, FFPE blocks, etc.). Either traditional methods (involving phenol and chloroform) or commercial kits can be used for this first step. Next, the DNA sequencing libraries are prepared. This involves fragmenting the DNA, adding sequencing adapters, and DNA amplification if the input DNA is not of sufficient quantity. Recall that for WXS, the library also goes through one of the target enrichment techniques described above. After sequencing, data is analyzed by converting and aligning reads to generate a BAM file. Many analysis tools will use the BAM file to identify variants, which then generates a VCF file. More information about sequencing and BAM and VCF file generation can be found here in the sequencing data chapter. 10.6 Data Pre-processing Raw sequencing reads are first transformed into a fastq file (more information about fastq files can be found here in the sequencing data chapter in the Quality Controls section). Then the sequencing reads are aligned to a reference genome to create a BAM file. This data is sorted and merged, and PCR duplicates are identified. The confidence that each base was sequenced correctly is reflected in the base quality score. This score must be recalibrated at this step before variants are called. A final BAM file is thus created. 
This can be used for future analysis steps, including variant or mutation identification, which is outlined on the following slide. 10.7 Commonly Used Tools The following link provides the data analysis pipeline written by researchers in the NCI division of the NIH and provides a helpful overview of the typical steps necessary for WGS analysis. Here are many of the tools and resources used by researchers for analyzing WGS data. 10.8 Data pre-processing tools In most cases, all of these tools will be used sequentially to prepare the data for downstream mutational and copy number variation (CNV) analysis. Bedtools, including the bamtofastq function, which is the first step in converting data off the sequencer to a usable format for downstream analysis. Samtools, including tools for converting fastq to BAM files while mapping genes to the genome, duplicate read marking, and sorting reads. Picard2, including tools to convert fastq to SAM files, filter files, create indices, mark read duplicates, sort files, and merge files. GATK is a comprehensive set of tools from the Broad Institute for analyzing many types of sequencing data. For pre-processing, the print read function is very beneficial for writing the reads from a BAM or SAM file that pass specific criteria to a new file. 10.9 Tools for somatic and germline variant identification These tools are used to identify either somatic or germline mutations from a sequenced sample. Many researchers use several of these tools together and keep only the variants that are identified by more than one of these analysis algorithms. All of these mutation calling tools except SvABA can be used on both WGS and WXS data. 
Mutect2 This variant calling tool can use a “panel of normals” (many normal control samples provided by the user) to better compare disease samples to normal, and it has filtering functions, called F1R2, for samples with orientation bias artifacts (such as FFPE samples); these are explained in the link above. Varscan 2 This is a helpful tool that utilizes a heuristic/statistical approach to variant calling. This means that it detects somatic CNAs (SCNAs) as deviations from the log-ratio of sequence coverage depth within a tumor–normal pair, and then quantifies the deviations statistically. This approach is unique because it accounts for differences in read depth between the tumor and normal sample. Varscan 2 can also be used for identifying copy number alterations in tumor-normal pairs. MuSE This is a beneficial mutation calling tool when you have both tumor and normal datasets. The Markov Substitution Model for Evolution utilized in this tool models the evolution of the reference allele to the allelic composition of the tumor and normal tissue at each genomic locus. SvABA This tool is especially useful for calling insertions and deletions (indels) because it assembles aberrantly aligned sequence reads that reflect indels or structural variants using a custom String Graph Assembler. Indels can be difficult to detect with standard alignment-based variant callers. Strelka2 This is a small variant caller designed by Illumina. It is used for identifying germline variants in cohorts of samples and somatic variants in tumor/normal sample pairs. SomaticSniper SomaticSniper can be used to identify SNPs in tumor/normal pairs. It calculates the probability that the tumor and normal genotypes are different and reports this probability as a somatic score. Pindel Pindel is a tool that uses a pattern growth approach to detect breakpoints of large deletions, medium-sized insertions and inversions, and tandem duplications. 
Lancet This is a newer variant calling tool that uses colored de Bruijn graphs to jointly analyze tumor and normal pairs, offering strong indel detection. More information about the processes used in this variant calling tool can be found here. Researchers may want to create a consensus file based on the mutation calls from several of the tools above. OpenPBTA-analysis shows an open source code example of how you might compare and contrast different SNV callers’ results. For researchers who prefer GUI based platforms: GenePattern has a great set of variant based tutorials. GenePattern is an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. 10.10 Tools for variant calling annotation These are beneficial for providing functional meaning to the mutational hits identified above. Annovar This is a helpful tool for annotating, filtering, and combining the output data from the above tools. It can be used for gene-based, region-based, or filter-based annotations. GENCODE This tool can be used to identify and classify gene features in human and mouse genomes. dbSNP This is a resource to look up specific human single nucleotide variations, microsatellites, and small-scale insertions and deletions. Ensembl This resource is a genome browser for annotating genes from a wide variety of species. pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions, and gene fusions. 10.11 Tools for copy number variation analysis Similar to the mutation calling tools, many researchers will use several of these tools and investigate the overlapping hits seen with different copy number variant calling algorithms: GATK GATK has a variety of tools that can be used to study changes in copy numbers of genes. This link provides a tutorial for how to use the tools. 
AscatNGS These tools (allele-specific copy number analysis of tumors) are specific for WGS copy number variation analysis. They can be used to dissect allele-specific copy numbers of tumors by estimating and adjusting for tumor ploidy and nonaberrant cell admixture. TitanCNA This tool is used to analyze copy number variation and loss of heterozygosity at the subclonal level for both WGS and WXS data in tumors compared to matched normals. It accounts for mixtures of cell populations and estimates the proportion of cells harboring each event. The Ha lab has developed a snakemake pipeline to more easily use this tool. Ha et al. published a paper describing this tool in detail here. gCNV This is a germline CNV calling tool that can be used on both WGS and WXS data. This tool has both COHORT and CASE modes. COHORT mode is used when providing a cohort of germline samples, whereas CASE mode is used for individual samples. More details about these modes are described in the link above. BIC-seq2 This tool is used to detect CNVs with or without control samples. The steps involved in this data processing tool include normalization and CNV detection. 10.12 Tools for data visualization These tools are often used in parallel to look at regions of the genome, develop plots, and create other relevant figures: OpenCRAVAT takes variation data in many popular variant file formats as input, and its outputs are variant annotations and visualizations. IGV IGV is an interactive tool used to easily visualize genomic data. It is available as a desktop application, web application, and JavaScript to embed in web pages. This application is very beneficial for visualizing both mutational and CNV data for WGS and WXS. IGV has many tutorials on YouTube that are helpful for using the tool to its full potential. Maftools Maftools is an R package that can be used to create informative plots from your WGS data output. It has tools to import both VCF files and ANNOVAR output for data analysis. 
Prism Prism is a widely used tool in scientific research for organizing large datasets, generating plots, and creating readable figures. WGS or WXS data regarding mutations and CNV can be used as input for creating plots with this tool. 10.13 Resources for WGS Online tutorials: Galaxy tutorials NCI resources Bioinformaticsdotca tutorial Papers comparing analysis tools: (Hwang et al. 2019) (Naj et al. 2019) (X. He et al. 2020) References "],["rna-methods-overview.html", "Chapter 11 RNA Methods Overview 11.1 Learning Objectives 11.2 What are the goals of gene expression analysis? 11.3 Comparison of RNA methods", " Chapter 11 RNA Methods Overview This chapter is in a beta stage. Some of it has been written with AI tools. If you wish to contribute, please go to this form or our GitHub page. 11.1 Learning Objectives 11.2 What are the goals of gene expression analysis? The goal of gene expression analysis is to quantify RNAs across the genome. This can signify the extent to which various RNAs are being transcribed in a particular cell. This can be informative for what kinds of activity a cell is undergoing and responding to. 11.3 Comparison of RNA methods There are three general methods we will discuss for evaluating gene expression. RNA sequencing (whether bulk or single-cell) allows you to catch more targets than gene expression microarrays but is much more costly and computationally intensive. Gene expression microarrays have a lower dynamic range than RNA-seq generally but are much more cost effective. Spatial transcriptomics is the newest method on the block and has the ability to relate gene expression to tissue regions and subpopulations. 11.3.1 Single-cell RNA-seq (scRNA-seq): Cost: scRNA-seq methods can be relatively expensive due to the need for specialized protocols and reagents. Droplet-based methods (e.g., 10x Genomics) are generally more cost-effective than full-length methods (e.g., SMART-seq) because they require fewer sequencing reads per cell. 
Experimental Goals: scRNA-seq is suitable when studying cellular heterogeneity and characterizing gene expression profiles at the single-cell level. It provides insights into cell types, cell states, and cell-cell interactions. Specific Requirements: scRNA-seq requires single-cell isolation techniques, and the choice of method depends on the desired cell throughput, desired coverage, and the need for full-length transcript information. 11.3.2 Bulk RNA-seq: Cost: Bulk RNA-seq is generally more cost-effective compared to scRNA-seq because it requires fewer sequencing reads per sample. The cost primarily depends on the sequencing depth required. Experimental Goals: Bulk RNA-seq is appropriate for analyzing average gene expression profiles across a population of cells. It provides information on gene expression levels and can be used for differential gene expression analysis. Specific Requirements: Bulk RNA-seq requires a sufficient quantity of RNA from the sample, typically obtained through RNA extraction and purification. 11.3.3 Gene Expression Microarray: Cost: Gene expression microarrays are usually less expensive compared to RNA-seq methods. The cost includes array production and hybridization. Experimental Goals: Microarrays are useful for profiling gene expression levels across a large number of genes in a cost-effective manner. They can be employed for differential gene expression analysis and identification of gene expression patterns. Specific Requirements: Microarrays require labeled cDNA or cRNA targets, and they are limited to the detection of known transcripts represented on the array platform. 11.3.4 Spatial Transcriptomics: Cost: Spatial transcriptomics methods can vary in cost depending on the technique used. Some methods involve additional steps and specialized equipment, making them relatively more expensive. 
Experimental Goals: Spatial transcriptomics allows the investigation of gene expression patterns within the context of tissue or cellular spatial organization. It provides spatial information on gene expression, enabling the identification of cell types and their interactions. Specific Requirements: Spatial transcriptomics requires intact tissue sections or samples, and the choice of method depends on factors such as desired spatial resolution, throughput, and compatibility with downstream analyses. In these upcoming chapters we will discuss in more detail each of these methods, what the data represent, what you need to consider, and what resources you can consult for analyzing your data. "],["bulk-rna-seq-1.html", "Chapter 12 Bulk RNA-seq 12.1 Learning Objectives 12.2 Where RNA-seq data comes from 12.3 RNA-seq workflow 12.4 RNA-seq data strengths 12.5 RNA-seq data limitations 12.6 RNA-seq data considerations 12.7 Visualization GUI tools 12.8 RNA-seq data resources 12.9 More reading about RNA-seq data", " Chapter 12 Bulk RNA-seq This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 12.1 Learning Objectives 12.2 Where RNA-seq data comes from 12.3 RNA-seq workflow In a very general sense, RNA-seq workflows involve first quantification/alignment. You will also need to conduct quality control steps that check the quality of the sequencing done. You may also want to trim and filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, differential expression, or any number of other analyses. 
In this chapter we will highlight some of the more popular RNA-seq tools that are generally suitable for most experimental data, but there is no “one size fits all” for computational analysis of RNA-seq data (Conesa et al. 2016). You may find tools out there that better suit your needs than the ones we discuss here. 12.4 RNA-seq data strengths RNA-seq can give you an idea of the transcriptional activity of a sample. RNA-seq has a wider dynamic range of quantification than gene expression microarrays. RNA-seq can be used for transcript discovery, unlike gene expression microarrays. 12.5 RNA-seq data limitations RNA-seq suffers from a lot of the common sequence biases, which are further worsened by PCR amplification steps. We discussed some of the sequence biases in the previous sequencing chapter. These biases are nicely covered in this blog by Mike Love and we’ll summarize them here: Fragment length: Longer transcripts are more likely to be identified than shorter transcripts because there’s more material to pull from. Positional bias: 3’ ends of transcripts are more likely to be sequenced due to faster degradation of the 5’ end. Fragment sequence bias: The complexity and GC content of a sequence influences how often primers will bind to it (which influences PCR amplification steps as well as the sequencing itself). Read start bias: Certain reads are more likely to be bound by random hexamer primers than others. Main Takeaway: When looking for tools, you will want to see if the algorithms or options available attempt to account for these biases in some way. 12.6 RNA-seq data considerations 12.6.1 Ribo minus vs poly A selection Most of the RNA in the cell is not mRNA or noncoding RNAs of interest, but instead ribosomal RNA. So before you can prepare and sequence your samples, you need to isolate the RNAs you are interested in. 
There are two major methods to do this: Poly A selection - Keep only RNAs that have poly A tails – remember that mRNAs and some kinds of noncoding RNAs have poly A tails added to them after they are transcribed. A drawback of this method is that transcripts that are not generally polyadenylated (microRNAs, snoRNAs, certain long noncoding RNAs, or immature transcripts) will be discarded. There is also generally a worse 3’ bias with this method since you are selecting based on poly A tails on the 3’ end. Ribo-minus - Subtract all the ribosomal RNA and be left with an RNA pool of interest. A drawback of this method is that you will need to use greater sequencing depths than you would with poly A selection (because there is more material in your resulting transcript pool). This blog by Sitools Biotech gives a good summary of the pros and cons of either selection method. 12.6.2 Transcriptome mapping How do you know which read belongs to which transcript? This is where alignment comes into play for RNA-seq. There are two major approaches we will discuss with examples of tools that employ them. Traditional aligners - Align your data to a reference using standard alignment algorithms. Can be very computationally intensive. Traditional alignment is the original approach to alignment, which takes each read and finds where and how in the genome/transcriptome it aligns. If you are interested in identifying the intricacies of different splices and their boundaries, you may need to use one of these traditional alignment methods. But for common quantification purposes, you may want to look into pseudo alignment to save you time. Examples of traditional aligners: STAR HISAT2 This blog compares some of the traditional alignment tools. Pseudo aligners - much faster and the trade off for accuracy is often negligible (but as always, this is likely dependent on the data you are using). The biggest drawback to pseudoaligners is that if you care about local alignment (e.g. 
perhaps where splice boundaries occur) instead of just transcript identification then a traditional alignment may be better for your purposes. These pseudo aligners often include a verification step where they compare their performance on a subset of the data to that of a traditional aligner (and for most purposes they usually perform well). Pseudo aligners can potentially save you hours/days/weeks of processing time as compared to traditional aligners so are worth looking into. Examples of pseudo aligners: Salmon Kallisto Reference free assembly - The first two methods we’ve discussed employ aligning to a reference genome or transcriptome. But alternatively, if you are much more interested in transcript identification or you are working with a model organism that doesn’t have a well characterized reference genome/transcriptome, then de novo assembly is another approach to take. As you may suspect, this is the most computationally demanding approach and also requires deeper sequencing depth than alignment to a reference. But depending on your goals, this may be your preferred option. These strategies are discussed at greater length in this excellent manuscript by Conesa et al, 2016. 12.6.3 Abundance measures If your RNA-seq data has already been processed, it may have abundance measures reported with it already. But there are various types of abundance measures used – what do they represent? raw counts - this is a raw number of how many times a transcript was counted in a sample. Two considerations to think of: 1. Library sizes: Raw counts do not account for differences between samples’ library sizes. In other words, how many reads were obtained from each sample? Because library sizes are not perfectly equal amongst samples and not necessarily biologically relevant, it’s important to account for this if you wish to compare different samples in your set. 2. 
Gene length: Raw counts also do not account for differences in gene length (remember how we discussed longer transcripts are more likely to be counted). Because of these items, some sort of transformation needs to be done on the raw counts before you can interpret your data. These other abundance measures attempt to account for library sizes and gene length. This blog and video by StatQuest does an excellent job summarizing the differences between these quantifications and we will quote from them: Reads per kilobase million (RPKM) Count up the total reads in a sample and divide that number by 1,000,000 – this is our “per million” scaling factor. Divide the read counts by the “per million” scaling factor. This normalizes for sequencing depth, giving you reads per million (RPM) Divide the RPM values by the length of the gene, in kilobases. This gives you RPKM. Fragments per kilobase million (FPKM) FPKM is very similar to RPKM. RPKM was made for single-end RNA-seq, where every read corresponded to a single fragment that was sequenced. FPKM was made for paired-end RNA-seq. With paired-end RNA-seq, two reads can correspond to a single fragment, or, if one read in the pair did not map, one read can correspond to a single fragment. The only difference between RPKM and FPKM is that FPKM takes into account that two reads can map to one fragment (and so it doesn’t count this fragment twice). Transcripts per million (TPM) Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK). Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor. Divide the RPK values by the “per million” scaling factor. This gives you TPM. TPM has gained popularity in recent years because it is more intuitive to understand: When you use TPM, the sum of all TPMs in each sample are the same. This makes it easier to compare the proportion of reads that mapped to a gene in each sample. 
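The RPKM and TPM steps quoted above can be sketched in a few lines of plain Python. This is only an illustration: the function names, gene lengths, and read counts are made-up toy values, and real analyses would normally rely on the abundance estimates reported by a quantification tool. The sketch shows the order of operations for each measure and prints the per-sample totals so they can be compared.

```python
def rpkm(counts, lengths_bp):
    # Step 1: per-million scaling factor from the total reads in the sample.
    per_million = sum(counts) / 1_000_000
    # Step 2: reads per million (RPM), then divide by gene length in kilobases.
    return [(c / per_million) / (length / 1000)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    # Step 1: reads per kilobase (RPK) for each gene.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Step 2: per-million scaling factor from the summed RPK values.
    per_million = sum(rpk) / 1_000_000
    # Step 3: scale each RPK value.
    return [r / per_million for r in rpk]


# Hypothetical toy data: three genes, two samples.
gene_lengths_bp = [2000, 4000, 1000]
sample_a = [10, 20, 5]
sample_b = [12, 25, 8]

for name, counts in (('sample_a', sample_a), ('sample_b', sample_b)):
    total_tpm = sum(tpm(counts, gene_lengths_bp))
    total_rpkm = sum(rpkm(counts, gene_lengths_bp))
    print(f'{name}: TPM total = {total_tpm:.1f}, RPKM total = {total_rpkm:.1f}')
```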
In contrast, with RPKM and FPKM, the sum of the normalized reads in each sample may be different, and this makes it harder to compare samples directly. 12.6.4 RNA-seq downstream analysis tools ComplexHeatmap is great for visualizations. DESeq2 and edgeR are great for differential expression analyses. CTAT - Using RNA-seq as input, CTAT modules enable detection of mutations, fusion transcripts, copy number aberrations, cancer-specific splicing aberrations, and oncogenic viruses including insertions into the human genome. Gene Set Enrichment Analysis (GSEA) is a method to identify the coordinate activation or repression of groups of genes that share common biological functions, pathways, chromosomal locations, or regulation, thereby distinguishing even subtle differences between phenotypes or cellular states. GenePattern’s RNA-seq tutorials - an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. 12.7 Visualization GUI tools WebMeV provides a user-friendly, intuitive, interactive interface to processed analytical data, uses cloud-computing elasticity for computationally intensive analyses, and is compatible with single cell or bulk RNA-seq input data. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with single cell RNA-seq data. Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. Network Data Exchange (NDEx) is a project that provides an open-source framework where scientists and organizations can store, share and publish biological network knowledge. 12.8 RNA-seq data resources ARCHS4 (All RNA-seq and ChIP-seq sample and signature search) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from GEO and SRA. 
Refine.bio - a repository of uniformly processed and normalized, ready-to-use transcriptome data from publicly available sources. 12.9 More reading about RNA-seq data Refine.bio’s introduction to RNA-seq StatQuest: A gentle introduction to RNA-seq (Starmer 2017). A general background on the wet lab methods of RNA-seq (Hadfield 2016). Modeling of RNA-seq fragment sequence bias reduces systematic errors in transcript abundance estimation (M. I. Love, Hogenesch, and Irizarry 2016). Mike Love blog post about sequencing biases (M. Love 2016) Biases in Illumina transcriptome sequencing caused by random hexamer priming (Hansen, Brenner, and Dudoit 2010). Computation for RNA-seq and ChIP-seq studies (Pepke, Wold, and Mortazavi 2009). References "],["single-cell-rna-seq.html", "Chapter 13 Single-cell RNA-seq 13.1 Learning Objectives 13.2 Where single-cell RNA-seq data comes from 13.3 Single-cell RNA-seq data types 13.4 Single cell RNA-seq tools 13.5 Quantification and alignment tools 13.6 Downstream tools Pros and Cons 13.7 More scRNA-seq tools and tutorials 13.8 Visualization GUI tools 13.9 Useful tutorials 13.10 Useful readings", " Chapter 13 Single-cell RNA-seq This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 13.1 Learning Objectives 13.2 Where single-cell RNA-seq data comes from As opposed to bulk RNA-seq, which can only tell us about tissue level and within patient variation, single-cell RNA-seq can tell us about cell to cell variation in transcriptomics, including intra-tumor heterogeneity. Single cell RNA-seq can give us cell level transcriptional profiles, whereas bulk RNA-seq masks cell to cell heterogeneity. If your research questions require cell-level transcriptional information, single-cell RNA-seq will be of interest to you. 13.3 Single-cell RNA-seq data types There are broadly two categories of single-cell RNA-seq data methods we will discuss. 
Full length RNA-seq: Individual cells are physically separated and then sequenced. Tag Based RNA-seq: Individual cells are tagged with a barcode and their data is separated computationally. Depending on your goals for your single cell RNA-seq analysis, you may want to choose one method over the other. (Material borrowed from (“Alex’s Lemonade Training Modules” 2022)). 13.3.1 Unique Molecular identifiers Often Tag based single cell RNA-seq methods will include not only a cell barcode for cell identification but will also have a unique molecular identifier (UMI) for original molecule identification. The idea behind UMIs is that they give insight into the original snapshot of the cell and can potentially combat PCR amplification biases. 13.4 Single cell RNA-seq tools There are a lot of scRNA-seq tools for various steps along the way. In a very general sense, single cell RNA-seq workflows involve first quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting doublets (also known as duplets), and using this information to filter out data that is not trustworthy. Doublets are transcriptome data generated from two cells, and they are an undesired technical artifact because single cell RNA-seq workflows want data representing a single cell at a time. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed - a lot of genes are barely detected or not detected at all, and a few genes are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses. Each step of this very general representation of a workflow can be conducted by a variety of tools. We will highlight some of the more popular tools here. 
To look through a full list, you can consult the scRNA-tools website. 13.5 Quantification and alignment tools The following pros and cons sections have been written by AI and may need verification by experts. They are meant to give you a basic idea of the pros and cons of these tools but should ultimately be used with your own judgment. STAR (Dobin et al. 2013): Pros: Accurate alignment of RNA-seq reads to the genome. Can handle a wide range of RNA-seq protocols, including scRNA-seq. Provides read counts and gene-level expression values. Cons: Requires a significant amount of memory and computational resources. May be difficult to set up and run for beginners. HISAT2 (Kim, Langmead, and Salzberg 2015): Pros: Accurate alignment of RNA-seq reads to the genome. Provides transcript-level expression values. Supports splice-aware alignment. Cons: May require significant computational resources for large datasets. May not be as accurate as some other alignment tools. Kallisto bustools (Bray et al. 2016): Pros: Fast and accurate quantification of RNA-seq reads without the need for alignment. Provides transcript-level expression values. Requires less memory and computational resources than alignment-based methods. Cons: May not be as accurate as alignment-based methods for lowly expressed genes. Cannot provide allele-specific expression estimates. Alevin/Salmon (Patro et al. 2017): Pros: Fast and accurate quantification of RNA-seq reads without the need for alignment. Provides transcript-level expression values. Supports both single-end and paired-end sequencing. Cons: May not be as accurate as alignment-based methods for lowly expressed genes. Cannot provide allele-specific expression estimates. Cell Ranger (Zheng et al. 2017): Pros: Specifically designed for 10x Genomics scRNA-seq data, with optimized workflows for alignment and quantification. Provides read counts and gene-level expression values. Offers a streamlined pipeline with minimal input from the user. 
Cons: Limited options for customizing parameters or analysis methods. May not be suitable for datasets from other scRNA-seq platforms. 13.6 Downstream tools Pros and Cons Seurat: Pros: Has a wide range of functionalities for preprocessing, clustering, differential expression, and visualization. Can handle multiple modalities, including CITE-seq and ATAC-seq. Has a large and active user community, with extensive documentation and tutorials available. Cons: Can be computationally intensive, especially for large datasets. Requires some knowledge of the R programming language. Scanpy: Pros: Written in Python, a widely used programming language in bioinformatics. Has a user-friendly interface and extensive documentation. Offers a variety of preprocessing, clustering, and differential expression methods, as well as interactive visualizations. Cons: May not be as feature-rich as some other tools, such as Seurat. Does not yet support multiple modalities. Monocle: Pros: Focuses on trajectory analysis, allowing users to explore developmental trajectories and cell fate decisions. Has a user-friendly interface and extensive documentation. Can handle data from multiple platforms, including Smart-seq2 and Drop-seq. Cons: May not be as feature-rich for clustering or differential expression analysis as some other tools. Requires some knowledge of the R programming language. 13.6.1 Doublet Tool Pros and Cons DoubletFinder (McGinnis, Murrow, and Gartner 2020): Pros: Uses a machine learning approach to detect doublets based on transcriptome similarity. 
Can be used with a variety of scRNA-seq platforms. Offers a user-friendly interface and extensive documentation. Cons: Can be computationally intensive for large datasets. May require some knowledge of the R programming language. Scrublet (Wolock, Krishnaswamy, and Huang 2019): Pros: Uses a density-based approach to detect doublets based on barcode sharing. Fast and computationally efficient, making it suitable for large datasets. Offers a user-friendly interface and extensive documentation. Cons: May not be as accurate as other methods, especially for low-quality data. Limited to 10x Genomics data. DoubletDecon (De Pasquale and Dudoit 2019): Pros: Uses a statistical approach to identify doublets based on the distribution of the number of unique molecular identifiers (UMIs) per cell. Can be used with different platforms and species. Offers a user-friendly interface and extensive documentation. Cons: May not be as accurate as other methods, especially for data with low sequencing depth or low cell numbers. Requires some knowledge of the R programming language. It’s important to note that no doublet detection method is perfect, and it’s often a good idea to combine multiple methods to increase the accuracy of doublet identification. Additionally, manual inspection of the data is always recommended to confirm the presence or absence of doublets. 13.7 More scRNA-seq tools and tutorials AlevinQC. Gene Pattern’s single cell RNA-seq tutorials - an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. Single Cell Genome Viewer. For normalization: scater. TumorDecon can be used to generate customized signature matrices from single-cell RNA-sequence profiles. It is available on Github (https://github.com/ShahriyariLab/TumorDecon) and PyPI (https://pypi.org/project/TumorDecon/). 
13.8 Visualization GUI tools WebMeV provides a user-friendly, intuitive, interactive interface to processed analytical data, uses cloud-computing elasticity for computationally intensive analyses, and is compatible with single cell or bulk RNA-seq input data. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with single cell RNA-seq data. Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. 13.9 Useful tutorials These tutorials cover explicit steps, code, tool recommendations and other considerations for analyzing single cell RNA-seq data. Orchestrating Single Cell Analysis with Bioconductor - An excellent tutorial for processing single cell data using Bioconductor. Advanced Single Cell Analysis with Bioconductor - a companion book to the intro version that contains code examples. Alex’s Lemonade scRNA-seq Training module - A cancer based workshop module based in R, with exercise notebooks. Sanger Single Cell Course - a general tutorial based on using R. ASAP: Automated Single-cell Analysis Pipeline is a web server that allows you to process scRNA-seq data. Processing raw 10X Genomics single-cell RNA-seq data (with cellranger) - a tutorial based on using CellRanger. 13.10 Useful readings An Introduction to the Analysis of Single-Cell RNA-Sequencing Data (AlJanahi, Danielsen, and Dunbar 2018). Orchestrating single-cell analysis with Bioconductor (Amezquita et al. 2020). UMIs the problem, the solution and the proof (Smith 2015). Experimental design for single-cell RNA sequencing (Baran-Gale, Chandra, and Kirschner 2018). Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies (Lafzi et al. 2018). Comparative Analysis of Single-Cell RNA Sequencing Methods (Ziegenhain et al. 2017). Comparative Analysis of Droplet-Based Ultra-High-Throughput Single-Cell RNA-Seq Systems (X. Zhang et al. 
2019). Single cells make big data: New challenges and opportunities in transcriptomics (Angerer et al. 2017). Comparative Analysis of common alignment tools for single cell RNA sequencing (Brüning et al. 2021). Current best practices in single-cell RNA-seq analysis: a tutorial (Luecken and Theis 2019). References "],["spatial-transcriptomics-1.html", "Chapter 14 Spatial transcriptomics 14.1 Learning objectives 14.2 What are the goals of spatial transcriptomic analysis? 14.3 Overview of a spatial transcriptomics workflow 14.4 Spatial transcriptomic data strengths: 14.5 Spatial transcriptomic data weaknesses: 14.6 Tools for spatial transcriptomics 14.7 More tools and tutorials regarding spatial transcriptomics", " Chapter 14 Spatial transcriptomics 14.1 Learning objectives 14.2 What are the goals of spatial transcriptomic analysis? Spatial transcriptomics (ST) technologies have been developed as a solution to the lack of spatial context in single cell transcriptomics (scRNA-seq) data (Rao et al. 2021; Ospina, Soupir, and Fridley 2023). ST methods are diverse; however, all have two features in common: multiple measurements of gene expression and the locations within the tissue where those gene expression measurements were taken. Data analysis of ST data requires integration of those two components, and its primary goal is to characterize gene expression patterns within the tissue or cellular context. The ability to quantify gene expression at different locations within the tissue is of tremendous value for understanding the functional variation of different tissue regions, domains, or niches. It also places cell-cell communication in the context of cell neighborhoods, which ultimately facilitates a deeper understanding of cell and tissue biology and enables practical applications such as discovery of novel drug targets for complex diseases such as cancer (Dries et al. 2021; Williams et al. 2022). 
The following are some of the specific goals that a study using ST could achieve: Describe tissue-specific cellular neighborhoods of cell types and cell type sub-populations: Although scRNA-seq continues to be a powerful method to assign biological identities to a mixture of cells, integrated analysis of ST combined with scRNA-seq adds crucial information to cell phenotypes by describing the neighborhoods where cells occur (Longo et al. 2021). Many methods to phenotype ST data are available, with most of them relying on the availability of a curated (scRNA-seq) cell type reference. Once cell identities have been determined, clustering or spatial statistics can be applied to describe the composition of tissue niches or domains. The explosion of ST data has resulted in novel and comprehensive tissue- or disease-specific atlases, not only describing the cell types within organs, but also the functional cell-cell relationships that result from spatial organization (e.g., Guilliams et al. (2022); Wu et al. (2021)). Uncover spatially regulated biological processes: With ST data comes the ability to detect genes or gene pathways that are expressed in specific areas within tissues (i.e., spatially-restricted expression). Detecting genes with spatially-restricted expression is key to achieving further understanding of specific biological processes, such as tissue gradients, cell differentiation, or signaling pathways. For example, cancer researchers are now able to study signaling pathways restricted to the tumor-stroma interface (Hunter et al. 2021), which could lead to the discovery of mechanisms representing cancer vulnerabilities resulting from interactions between the tumor and stroma cells. Investigate cell-cell interactions: From basic to applied tissue biology research, the study of cell-cell interactions is of high interest, especially the interactions that occur via ligand-receptor pairs. 
The construction of comprehensive databases of ligand-receptor interactions has been possible due to the large number of single-cell data sets produced by researchers. A major contribution of ST to the study of tissue biology is the addition of the spatial context to previously identified ligand-receptor interactions. Because single-cell RNA-seq requires physical separation of cells, current ligand-receptor databases represent hypotheses which ST can help to address by using models of spatial co-localization, enabling in-situ examination of cell-cell interactions and communication (Raredon et al. 2023; X. Wang, Almet, and Nie 2023). Integrate imaging data: Spatial transcriptomics data has enabled direct integration of gene expression measurements with digital images of the same (or adjacent) tissue. Improved molecular description and/or exploration of tissue niches or domains is now possible. One approach consists of differential expression analysis between histopathology annotations made by an expert on tissue images (e.g., Ravi et al. (2022)). The opposite approach is also possible, which uses unsupervised clustering of ST data assisted by color/intensity information derived from images. Machine learning for integration of ST and imaging data is an active area of development (e.g., Hu et al. (2021); Xu et al. (2022); Tan et al. (2020)). Furthermore, ST data findings can be qualitatively validated by assessing the approximate location of regions such as immune-infiltrated areas or damaged tissue, often resulting from inspection of fluorescence microscopy. Identify biomarkers and drug targets: The use of ST allows the exploration of tissue niche-specific expression patterns and gene pathway analysis. This exploration can lead to generation of hypotheses about potential biomarkers for specific tissue functions or disease states. 
Furthermore, the molecular interactions predicted using scRNA-seq (e.g., ligand-receptor) can now be put in the context of the larger tissue architecture using ST data. The spatial context of these interactions will likely boost the identification of novel drug targets, as well as improve understanding of current therapies (Lyubetskaya et al. 2022; L. Zhang et al. 2022). 14.3 Overview of a spatial transcriptomics workflow There is a large diversity in approaches to spatially profile tissues. Some ST technologies allow profiling at coarse cellular resolution, where regions of interest (ROIs) are usually identified by a pathologist. These ROIs may include tens of cells up to a few hundred (e.g., GeoMx, Bergholtz et al. (2021)). Smaller ROI sizes can be found in other technologies such as Visium, where ROIs 55 µm in diameter (or “spots”) often contain no more than 10 cells (https://www.10xgenomics.com/resources/analysis-guides/integrating-single-cell-and-visium-spatial-gene-expression-data). For finer cellular resolution, technologies such as MERFISH, SMI, or Xenium, among others, can measure gene expression in individual cells (Yue et al. 2023). In general, there is a trade-off between cellular resolution and molecular resolution, as the number of quantified genes and RNA molecules is lower in single-cell level spatial technologies compared to those at the ROI or spot level. In single-cell ST, often a panel of hundreds of genes is quantified, while in “mini-bulk” (ROI/spot) ST, it is possible to quantify genes at the whole transcriptome level. In addition to the differences in cellular and molecular resolution, there are fundamental differences in the chemistry used to count the RNA transcripts in the tissue (N. Wang et al. 2021; Yue et al. 2023). Capture or hybridization of RNA followed by sequencing, and fluorescent imaging, are two of the most common techniques used in ST methods. 
Because of the large diversity in resolution and chemical procedures among ST technologies, data collection workflows are equally diverse. Finally, each study poses specific questions that cannot be addressed with traditional scRNA-seq pipelines, requiring customized workflows. Some of the commonalities in the workflows are presented here: Sample preparation: The preparation of a tissue sample will depend largely on the specific ST technology to be used. In general, this involves obtaining the tissue of interest in the form of a thin slice from a fresh frozen biopsy or a paraffin embedded tissue block. Tissue slices are generally about 5 to 10 microns thick. Given the instability of RNA molecules, the samples from which the tissue slices originate should be properly preserved and stabilized to maintain the integrity of RNA molecules. Many ST technologies are compatible with tissue microarrays (TMAs). Capture or hybridization of RNA molecules: In this step, the tissue sample is typically placed on a solid substrate, such as regular positively charged glass slides or vendor-designed slides. The latter category includes spatially barcoded slides (e.g., Visium (Ståhl et al. 2016)), where RNA capture probes are contained in microscopic spots arranged in arrays or grids. Positively charged slides are used in technologies based on in-situ sequencing or imaging; however, capture-based methods like GeoMx also employ this type of slide. Each method entails specific considerations. One example is the optimization of tissue permeabilization in Visium slides to release the RNA molecules. In the case of imaging-based methods, RNA molecules are hybridized with fluorescent probes that uniquely identify each RNA species [e.g., SMI (S. He et al. 2022), MERFISH (M. Zhang et al. 2021)]. RNA quantification: The method used to count the number of captured or hybridized RNA molecules varies greatly from technology to technology. 
Capture methods often involve release of the RNA molecules from the tissue or slide, followed by library preparation, amplification, next generation sequencing, and read mapping to a reference genome. In this case, libraries are spatially multiplexed, whereby barcodes indicate the spatial location from which the captured RNA molecules originate. In imaging-based methods, segmentation is required to delineate the cell borders. Then, coded fluorescent probes are counted within each segmented cell. Data quality control and pre-processing: As with any omics technology, filtering and pre-processing are of paramount importance for downstream analysis. Spatial transcriptomics data typically contain an excess of zeroes and high gene dropout (Zhao et al. 2022). Genes expressed in very few spots or cells are often removed. Similarly, it is advisable to remove spots with very few counts; however, care needs to be exercised to not remove biological variation due to cellularity (i.e., areas with fewer cells tend to have fewer counts). Mitochondrial or ribosomal genes, if available in the data, can be used to assess the level of tissue necrosis and filter accordingly (Ospina, Soupir, and Fridley 2023). In imaging-based methods, the area of cells can be used to detect “doublets” generated during image segmentation. Once filtering has been performed, gene count normalization and transformation are typically part of pre-processing. Methods commonly used in scRNA-seq, such as library-size normalization and log-transformation, are also commonplace in spatial transcriptomics studies. Methods that attempt technical effect correction, such as SCTransform (Hafemeister and Satija 2019), can also be used. Visualization: Similar to scRNA-seq data, dimension reduction methods such as the Uniform Manifold Approximation and Projection (UMAP) are key to visualize the heterogeneity of the data set. 
Nonetheless, given the additional modality provided by the spatial coordinates, spatial gene expression heatmaps can be generated, which can be compared against the imaging data (e.g., H&amp;E, IHC, mIF) to gain further insights into overall tissue architecture. Clustering and cell/tissue domain phenotyping: There is a plethora of clustering approaches, ranging from those employed in scRNA-seq analysis (e.g., Louvain) to novel neural network classification. Some methods take advantage of the spatial location information and/or tissue image to inform clustering. Compared to clustering, cell/domain phenotyping is an area of even more active development, with the majority of methods relying on the use of a comprehensive single-cell, tissue specific atlas from which cell types (i.e., “labels”) are obtained. Canonical marker-based phenotyping is still widely used, and in many cases unavoidable to identify specific cell populations. In general, it is advisable to use the expert validation of a tissue biologist or pathologist to ascertain if clustering and phenotyping are capturing the tissue architecture adequately. 14.4 Spatial transcriptomic data strengths: Preservation of the spatial context: Spatial transcriptomics allows the investigation of gene expression patterns, cell types, and their interactions within the context of tissue spatial organization. Integration with imaging data: Spatial transcriptomics provides an additional data modality in the form of imaging data, such as histological images or fluorescence microscopy. This integration enhances the interpretation of spatial transcriptomic data by correlating gene expression patterns with tissue morphology and specific cellular structures. Discovery of novel cell-cell interactions and signaling pathways: By examining gene expression profiles in the spatial context, higher accuracy in the identification of novel cell-cell interactions and signaling pathways is obtained. 
Pairs of interacting genes can be identified by studying their level of co-localization (i.e., expressed in the same regions). Exploration of spatially regulated biological processes: Spatial transcriptomics enables the investigation of biological processes, such as spatial expression gradients or developmental processes occurring in specific regions. It provides insights into spatially restricted gene expression patterns associated with tissue patterning, morphogenesis, or cellular differentiation. Hypothesis generation and biomarker discovery: Spatial transcriptomic analysis can help in the generation of hypotheses and the identification of potential biomarkers related to specific tissue functions, regions, or disease states. By linking gene expression patterns to tissue organization and pathology, spatial transcriptomics facilitates the discovery of spatially restricted gene signatures and potential diagnostic or prognostic markers. 14.5 Spatial transcriptomic data weaknesses: Trade-off between spatial resolution and molecular resolution: Spatial transcriptomic techniques that provide whole transcriptome level information measure expression at the “mini-bulk” level (spots or ROIs), with each mini-bulk sample containing a collection of cells. Conversely, single-cell ST methods provide expression for a panel of genes (hundreds to a few thousand genes). In addition, obtaining fine-grained spatial information may be challenging, especially in complex tissues or samples with high cellular density. Technical variability and experimental artifacts: Spatial transcriptomic analysis involves multiple experimental steps, including tissue processing, capture/hybridization, and sequencing/imaging. Each step introduces technical variability and potential experimental artifacts, which can impact the accuracy and reproducibility of the results. Controlling and minimizing these sources of variation is crucial but can be challenging. 
Zero excess and limited coverage of transcripts: Since most ST techniques use probes to capture or hybridize RNA transcripts, the resulting data may contain biases in the representation of certain RNA molecules. Additionally, spatial transcriptomic methods may have limitations in capturing certain RNA species or low-abundance transcripts, leading to a large portion of genes not being detected and contributing to zero-count excess. Complex Data Analysis: Analyzing spatial transcriptomic data requires advanced computational methods and expertise. The complexity of the data and the need for specialized bioinformatics tools and pipelines can pose challenges, particularly for researchers without extensive computational skills. Validation and integration challenges: Spatial transcriptomic analysis generates hypotheses and provides spatially resolved gene expression information. However, validating the functional significance of identified gene expression patterns or cellular interactions may require additional experimentation. Integrating spatial transcriptomic data with other omics data or imaging modalities can also be complex and may require careful data integration strategies. Cost and time considerations: Spatial transcriptomic analysis can be relatively expensive and time-consuming compared to traditional transcriptomic techniques. The specialized protocols, reagents, and instrumentation required can add to the cost of the analysis. Moreover, the data generation and analysis processes can be time-intensive, which may limit the scalability of studies involving large sample sizes. 14.6 Tools for spatial transcriptomics 14.6.1 Data processing: 14.6.1.1 Space Ranger Pros: Space Ranger is a software package developed by 10x Genomics specifically for processing and analyzing spatial transcriptomics raw data generated by their platform (Visium). 
It provides a streamlined workflow for processing raw data, including image registration, assignment of read counts to spots, and counting transcripts. Outputs from Space Ranger are commonly the input of many other ST analysis tools. Cons: Space Ranger has been designed to process only 10x Genomics data. The software does not provide methods to extract insights, which is accomplished by integration with other analytical suites. Requires knowledge of command line use. 14.6.1.2 GeomxTools Pros: The GeomxTools R package has been designed to take outputs from the GeoMx Digital Spatial Profiler (DSP) platform. The package includes methods to use raw .dcc files and .pkc probe set files to generate count matrices per ROI. Support for normalization and transformation of counts is also included in GeomxTools. Cons: GeomxTools has been designed to process only GeoMx DSP data outputs. Requires knowledge of R programming. 14.6.2 Data exploration: 14.6.2.1 Seurat Pros: Seurat is a widely used R package for single-cell data analysis, with expanded capabilities to analyze ST data from multiple platforms. Seurat features direct integration with outputs from Space Ranger, MERSCOPE, and CosMx-SMI, among others. It provides a variety of functions for data pre-processing, dimensionality reduction, clustering, and visualization. Seurat has a large user community, extensive documentation, and tutorials, making it accessible to researchers. Cons: Seurat can be memory-intensive, particularly when working with large data sets. It requires familiarity with R programming and bioinformatics concepts for effective use. Overall, methods in Seurat are the same methods applied to non-spatial scRNA-seq data. 14.6.2.2 Squidpy Pros: Scanpy is a Python-based library specifically designed for single-cell and ST analysis. It offers a range of functionalities for data pre-processing, clustering, trajectory analysis, and visualization. Scanpy is known for its scalability, efficiency, and flexibility. 
It integrates well with other Python libraries and frameworks, making it suitable for integration with other analysis pipelines. Some of the statistical methods in Squidpy implicitly make use of the spatial coordinates to detect patterns. Cons: Similar to Seurat, Scanpy requires some familiarity with Python programming and bioinformatics concepts. Users without prior programming experience may need to invest time in learning Python. 14.6.2.3 Giotto Pros: The analytical suite Giotto is a collection of methods to study spatial gene expression, agnostic to the platform used to generate the data. It allows users to perform data pre-processing, clustering, visualization, detection of spatially variable genes, and expression co-localization analysis. Computationally intensive analysis can be conducted in the cloud via integration with Terra.bio or locally using a Docker container. Some of the statistical methods in Giotto implicitly make use of the spatial coordinates to detect patterns. Cons: Requires some familiarity with R, as well as bioinformatics and spatial statistics concepts. Installation requires setting up Python, as some modules use that language. 14.6.2.4 spatialGE and spatialGE-web Pros: The spatialGE analysis suite allows users to study ST data from multiple platforms, including methods for pre-processing, clustering/domain detection, spatially variable genes, and functional analysis via detection of gene expression gradients and/or gene set enrichment spatial patterns. All the functionality of the R package has been implemented in a point-and-click web application that requires no coding experience and sends email notifications when analyses are completed. Statistical methods in spatialGE implicitly take into account the spatial coordinates during calculations. Cons: Use of the spatialGE R package requires familiarity with the language. The spatialGE web application bypasses the need for R coding; however, computationally-intensive methods can take time to complete. 
14.6.2.5 Loupe Pros: The Loupe browser is a point-and-click tool for exploration of both non-spatial scRNA-seq and ST data. Loupe takes Visium outputs and allows visualization of gene expression, clustering, and detection of differentially expressed genes. The tool also allows for easy registration and comparative analysis of Visium imaging and expression data. Cons: Loupe allows only basic exploration of the data. To perform functional-level analysis of ST data, the use of additional tools might be required. 14.6.2.6 ST Pipeline Pros: ST Pipeline is a bioinformatics pipeline developed by the Spatial Transcriptomics consortium. It provides a complete workflow for ST data analysis, including pre-processing, normalization, spot detection, and visualization. ST Pipeline supports various spatial transcriptomic platforms, making it versatile. Cons: ST Pipeline requires familiarity with Python, command-line, and Linux environments. Users may need to invest time in setting up the pipeline and configuring parameters based on their specific datasets and platforms. 14.6.2.7 semla Pros: The semla R package is a bioinformatics pipeline enabling pre-processing, visualization, spatial statistics, and image integration of ST data. The package provides integration with Seurat. Cons: semla requires familiarity with R. 14.6.3 Clustering/tissue domain identification: 14.6.3.1 SpaGCN Pros: The SpaGCN Python package performs prediction of tissue domains implicitly taking into account the spatial coordinates and optionally assisted by colors in the image data. The gene expression, coordinate, and image data are processed via graph convolutional networks (GCN) to find common patterns between the modalities. Based on predicted domains, SpaGCN can identify genes or collections of genes (meta genes) that are uniquely expressed in the domains. SpaGCN allows analysis of multiple ST technologies. Cons: SpaGCN requires familiarity with Python and basic data frame processing. 
Some understanding of GCNs and the parameters involved in calculations is advisable. 14.6.4 Spatially variable gene identification: 14.6.4.1 SpatialDE Pros: SpatialDE is a Python package designed for detecting spatially variable genes from ST data using non-parametric statistics. SpatialDE integrates the spatial coordinates and image data to identify genes or groups of genes showing spatial expression aggregation. The package can analyze data from multiple ST platforms. Cons: SpatialDE requires familiarity with Python programming. 14.6.4.2 SPARK and SPARK-X Pros: The SPARK methods allow scalable detection of genes showing spatial patterns. The tests are performed via generalized linear models and spatial autocorrelation matrix estimation. The SPARK implementation allows scalability and computing efficiency. Cons: The SPARK methods require familiarity with R programming. Some familiarity with spatial statistics is advisable. 14.6.4.3 SpaceMarkers Pros: The SpaceMarkers approach detects sets of genes with evidence of spatial co-expression. Kernel smoothing is used to model the weight of expression of a gene taking into account neighboring areas. Cons: Requires familiarity with R programming. The method has been tested in Visium data. 14.6.5 Deconvolution/phenotyping: 14.6.5.1 SPOTlight Pros: The SPOTlight algorithm takes advantage of robust non-negative matrix factorization (NMF) to define transcriptomic profiles from an annotated scRNA-seq reference. The transcriptomic profiles are transferred to the spatial transcriptomics data using non-negative least squares regression. Instead of providing a single category for “mini-bulk” data (e.g., Visium), SPOTlight features pie charts to describe the cell type composition within each mini-bulk sample (e.g., spot). Cons: Requires some familiarity with R programming. The method has been tested in Visium data. 
As with most deconvolution methods, accurate identification of cell types highly relies on a well-annotated scRNA reference. 14.6.5.2 STdeconvolve Pros: The STdeconvolve algorithm uses latent Dirichlet allocation (LDA) to define transcriptomic profiles or topics on the ST data. The topics are assigned a biological identity (e.g., cell type, tissue domain) using gene set enrichment or marker-based phenotyping. The topics are presented as proportions in “mini-bulk” data (e.g., Visium), where pie charts describe the cell type/domain composition within each mini-bulk sample (e.g., spot). STdeconvolve is one of very few reference-free ST deconvolution methods. Cons: Requires some familiarity with R programming. The method has been mostly tested in Visium data. For MERFISH data, requires aggregation into spots. 14.6.5.3 InSituType Pros: InSituType is a cell phenotyping algorithm designed for CosMx-SMI data but applicable to other single-cell ST data. InSituType can transfer cell types from an annotated scRNA-seq data set, or run reference-free unsupervised clustering to detect cell populations. In addition, immunofluorescence data accompanying SMI data sets can be used to inform gene expression deconvolution. InSituType can phenotype large quantities of cells within reasonable time. Cons: InSituType assumes cell populations can be defined via cluster centroids. Thus, deconvolution can be affected when samples contain cells with intermediate phenotypes or if technical/background noise is prevalent. Requires familiarity with R programming. 14.6.5.4 SpatialDecon Pros: The SpatialDecon algorithm implements log-normal regression to alleviate the effects of ST data skewness in the prediction of cell types. The method is analogous to estimation of cell type proportions in bulk RNAseq, applied to “mini-bulk” ROIs or spots in GeoMx and Visium experiments, respectively. Hence, the method assumes cell type heterogeneity within the ROIs or spots. 
In the case of GeoMx experiments, SpatialDecon takes advantage of nuclei counts to provide absolute cell type counts within each ROI. The package includes pre-built cell type signature matrices for several tissue types, but scRNA references can be used to create custom signatures. Cons: Requires familiarity with R programming. 14.6.6 Cell communication: 14.6.6.1 CellChat Pros: CellChat is an algorithm to infer cell communications via ligand-receptor interactions. CellChat was designed for non-spatial scRNA data; however, a recent implementation has been included to account for distances between cells in ST experiments. The package includes a comprehensive ligand-receptor database which is queried after quantification of the probability of interaction between two given cell types. Cons: Requires familiarity with R programming. The spatial implementation of CellChat has been tested on Visium data. 14.7 More tools and tutorials regarding spatial transcriptomics Analysis, visualization, and integration of spatial datasets with Seurat Sheffield Bioinformatics tutorial for spatial transcriptomics Theis Lab SCOG workshop materials for spatial transcriptomics Visualization, domain detection, and spatial heterogeneity with spatialGE References "],["chromatin-methods-overview.html", "Chapter 15 Chromatin Methods Overview 15.1 Learning Objectives 15.2 Why are people interested in chromatin? 15.3 What kinds of questions can chromatin answer? 15.4 Comparison of technologies", " Chapter 15 Chromatin Methods Overview This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. In its existing form, this chapter has been written with AI and still needs further verification by experts. 15.1 Learning Objectives 15.2 Why are people interested in chromatin? Chromatin plays a crucial role in regulating gene expression, which is essential for a wide range of biological processes. 
It is the complex of DNA and proteins that make up the structure of chromosomes in the nucleus of a cell. The DNA in chromatin is packaged around histone proteins in a way that can either promote or inhibit access to the DNA by other proteins that control gene expression. Specifically, chromatin structure can affect the ability of transcription factors and RNA polymerase to bind to and transcribe genes. Changes in chromatin structure can lead to changes in gene expression, which can have profound effects on cell function and development. For example, chromatin remodeling is a key step in cell differentiation, during which cells become specialized and take on specific functions. Dysregulation of chromatin structure can also lead to the development of diseases, such as cancer, in which aberrant gene expression contributes to uncontrolled cell growth and proliferation. Therefore, understanding the mechanisms that regulate chromatin structure and function is crucial for advancing our understanding of cellular processes, disease development, and potential therapies. This is why chromatin research has become a major area of focus in molecular biology and genomics research. 15.3 What kinds of questions can chromatin answer? How are genes turned on and off in response to developmental cues or environmental stimuli? What are the mechanisms by which chromatin structure is altered during cell differentiation and development? How do epigenetic modifications, such as DNA methylation and histone modifications, affect chromatin structure and gene expression? How does chromatin structure influence the binding of transcription factors and other regulatory proteins to specific regions of the genome? How is chromatin structure altered in diseases such as cancer, and how can this knowledge be used to develop new therapies? How can we manipulate chromatin structure to selectively activate or repress specific genes, and what are the potential applications of such approaches? 
15.3.1 Chromatin is involved in a variety of biological processes: Gene expression: Chromatin structure and organization play a crucial role in regulating gene expression. The packaging of DNA around histone proteins can either promote or inhibit access to the DNA by other proteins that control gene expression. DNA replication and repair: Chromatin structure can also affect DNA replication and repair. For example, histone modifications and chromatin remodeling can facilitate access to DNA replication and repair machinery. Epigenetic regulation: Epigenetic modifications, such as DNA methylation and histone modifications, can be stably inherited and play a critical role in the regulation of gene expression. Cell differentiation: Chromatin structure is dynamically regulated during cell differentiation and plays a key role in determining cell fate and function. Development: Chromatin structure also plays an important role in the regulation of developmental processes, such as morphogenesis and organogenesis. Disease: Dysregulation of chromatin structure and function is associated with a wide range of diseases, including cancer, neurodegenerative disorders, and developmental disorders. 15.4 Comparison of technologies 15.4.1 ATAC-seq: ATAC-seq (Assay for Transposase Accessible Chromatin using sequencing) is a technique that uses transposases to fragment DNA and insert sequencing adapters into accessible chromatin regions. The DNA fragments are then sequenced to identify regions of open chromatin. This technique is widely used to study the epigenetic regulation of gene expression. 15.4.1.1 When to use ATAC-seq: When you want to study the epigenetic regulation of gene expression. When you want to identify open chromatin regions associated with regulatory elements such as enhancers and promoters. When you want to study various cell types and tissues, including difficult-to-access cell types. 
15.4.1.2 Advantages: ATAC-seq is a simple and cost-effective technique that requires a low amount of starting material. It allows the identification of open chromatin regions, which are usually associated with regulatory elements such as enhancers and promoters. ATAC-seq can be used to study various cell types and tissues, including difficult-to-access cell types. 15.4.1.3 Disadvantages: ATAC-seq can have high background noise due to non-specific cleavage of chromatin. It may miss lowly accessible regions due to a bias towards highly accessible regions. It is difficult to identify the specific regulatory elements that are associated with open chromatin regions. 15.4.2 Single-cell ATAC-seq: Single-cell ATAC-seq is a technique that combines single-cell sequencing and ATAC-seq to identify open chromatin regions in individual cells. This technique allows the study of epigenetic heterogeneity between cells and the identification of cell-specific regulatory elements. 15.4.2.1 When to use single-cell ATAC-seq: When you want to study the epigenetic heterogeneity between cells and identify cell-specific regulatory elements. When you want to identify rare cell types or rare cell states that may be missed by bulk techniques. When you want to study the epigenetic dynamics of cells in response to environmental changes. 15.4.2.2 Advantages: Single-cell ATAC-seq allows the identification of open chromatin regions in individual cells, which provides cell-specific epigenetic information. It can identify rare cell types and rare cell states that may be missed by bulk techniques. It can be used to study the epigenetic dynamics of cells in response to environmental changes. 15.4.2.3 Disadvantages: Single-cell ATAC-seq can have a higher level of technical noise due to the low amount of starting material. It can be challenging to obtain high-quality single-cell suspensions from tissues. 
It can be difficult to analyze the large amount of data generated by single-cell sequencing techniques. 15.4.3 ChIP-seq: ChIP-seq (Chromatin Immunoprecipitation sequencing) is a technique that uses antibodies to isolate specific DNA-protein complexes, such as transcription factors or histone modifications. The DNA fragments associated with the protein complexes are then sequenced to identify the genomic regions that are bound by the protein. 15.4.3.1 Advantages: ChIP-seq allows the identification of specific protein-DNA interactions, which provides information on the regulation of gene expression. It can be used to study the epigenetic changes associated with specific cellular processes, such as differentiation or development. ChIP-seq can identify the binding sites of transcription factors, which can be used to identify regulatory elements such as enhancers and promoters. 15.4.3.2 Disadvantages: ChIP-seq requires a high amount of starting material and can be costly. It can have a high level of background noise due to non-specific binding of antibodies. It can be challenging to perform. 15.4.4 CUT&amp;RUN CUT&amp;RUN (Cleavage Under Targets &amp; Release Using Nuclease) is a relatively new genomic method that involves the targeted cleavage of DNA near the binding sites of a protein of interest, guided by a specific antibody, followed by the release and sequencing of the DNA fragments. The CUT&amp;RUN method was developed as a more streamlined alternative to the ChIP-seq (Chromatin Immunoprecipitation sequencing) method, which involves a more complex series of steps (Skene and Henikoff 2018). 15.4.4.1 How CUT&amp;RUN works: Cells are permeabilized and incubated with a specific antibody against the protein of interest. The antibody is then bound by a fusion protein called Protein A-Micrococcal Nuclease (pA-MNase). After incubation, the pA-MNase is activated and cleaves the DNA in the vicinity of the bound antibody. 
The released DNA fragments are then purified and sequenced to identify the genomic regions that were bound by the antibody or protein of interest. CUT&amp;RUN has several advantages over ChIP-seq, including: CUT&amp;RUN requires a lower amount of starting material and can be performed more quickly than ChIP-seq. CUT&amp;RUN produces less background noise, as the DNA is cleaved in situ, rather than being fragmented by sonication or other methods. CUT&amp;RUN can be used to study chromatin-associated proteins that may not be easily solubilized for ChIP-seq. 15.4.5 CUT&amp;Tag CUT&amp;Tag (Cleavage Under Targets and Tagmentation) is similar to CUT&amp;RUN. It was developed as an improvement over CUT&amp;RUN, with the goal of reducing the amount of background noise and improving the efficiency of the method (Kaya-Okur et al. 2019). 15.4.5.1 How CUT&amp;Tag works: Cells are permeabilized and incubated with a specific antibody against the protein of interest, which is then bound by a fusion protein called Protein A-Tn5 transposase. The Protein A-Tn5 transposase inserts sequencing adapters into the genomic DNA in the vicinity of the bound antibody. The tagged DNA fragments are then released from the chromatin and purified for sequencing. Like CUT&amp;RUN, CUT&amp;Tag allows for the specific cleavage of DNA in the vicinity of a target protein or antibody, but the addition of sequencing adapters in CUT&amp;Tag occurs directly in the nucleus, prior to DNA release. This results in less background noise and more efficient DNA recovery. 15.4.5.2 Advantages: CUT&amp;Tag has a lower level of background noise and higher sensitivity due to the addition of sequencing adapters in situ. CUT&amp;Tag requires less input material than CUT&amp;RUN, which makes it a more efficient method. CUT&amp;Tag can be used to study the binding sites of transcription factors and chromatin-associated proteins. 
Overall, both CUT&amp;RUN and CUT&amp;Tag are powerful genomic methods that allow for the efficient study of protein-DNA interactions and epigenetics. The choice between the two methods may depend on the specific research question and the availability of specific reagents or equipment. 15.4.6 GRO-seq (Global Run-On sequencing) GRO-seq allows for the genome-wide analysis of transcriptional activity by measuring the nascent RNA transcripts that are actively being synthesized by RNA polymerase. GRO-seq is a high-throughput sequencing-based technique that provides a snapshot of the transcriptional landscape of a cell (Park and Won 2018). 15.4.7 How GRO-seq works: Nuclei are isolated from cells and incubated with a biotinylated nucleotide triphosphate, which is incorporated into nascent RNA transcripts by RNA polymerase. The labeled RNA is then selectively captured using streptavidin beads, and the RNA is reverse-transcribed into cDNA. The cDNA is then sequenced to identify the regions of the genome that are actively transcribed. 15.4.7.1 Advantages: Its ability to distinguish between the sense and antisense strands of transcribed RNA. Its ability to quantify the level of transcriptional activity in individual genes. Its ability to identify novel transcripts and transcriptional start sites. DNase-seq and MNase-seq are alternative approaches which can be used to identify accessible regions of chromatin. MNase-seq is particularly useful for studying the occupancy of nucleosomes or transcription factors with high resolution. DNase-seq uses DNase I to cleave DNA at hypersensitive sites typically associated with cis-regulatory elements. It is also possible to footprint TF occupancy with base-pair level resolution using DNase-seq, while the quality of ATAC-seq footprinting is still in question. Additionally, although both DNase-seq and MNase-seq have sequence biases as well, the sequence preference is different for each enzyme. 
References "],["atac-seq-1.html", "Chapter 16 ATAC-Seq 16.1 Learning Objectives 16.2 What are the goals of ATAC-Seq analysis? 16.3 ATAC-Seq general workflow overview 16.4 ATAC-Seq data strengths: 16.5 ATAC-Seq data limitations: 16.6 ATAC-Seq data considerations 16.7 ATAC-seq analysis tools 16.8 Additional tutorials and tools 16.9 Additional tutorials and tools 16.10 Online Visualization tools 16.11 More resources about ATAC-seq data", " Chapter 16 ATAC-Seq This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. 16.1 Learning Objectives 16.2 What are the goals of ATAC-Seq analysis? The goal of ATAC-seq is to identify the accessible regions of the genome in a particular set of samples. These data allow us to understand the relationships between the chromatin accessibility patterns and cell states, and to understand the mechanistic causes and consequences of these chromatin accessibility patterns. ATAC-seq data is generated by fragmenting the genome with the Tn5 transposase and sequencing the shorter DNA fragments. While most of the genome is associated with protein complexes that preclude the digestion of DNA by Tn5, some regions of the genome have accessible chromatin that can be cleaved by Tn5, resulting in short (&lt;500bp) fragments. These regions of the genome are of biological interest as they are likely to harbor transcription factor binding sites and to constitute cis-regulatory elements, genomic regions that are involved in the regulation of gene expression. 16.2.1 What questions can be answered with ATAC-seq? 16.3 ATAC-Seq general workflow overview A basic ATAC-seq workflow involves mapping sequence reads to the genome, identifying peaks, assessing data quality, and identifying patterns of interest through clustering or identification of differentially accessible regions or other statistical means. 
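The data-quality step of this workflow can be sketched in Python as a check of one sample's summary counts against the thresholds this chapter gives in its data quality metrics: more than 20 million uniquely mapped reads for human samples, at least 20,000 peaks, and a FRiP score above 0.3, with a score below 0.2 indicating low-quality data. This is only a minimal sketch. The names AtacQcMetrics, frip_score and qc_report are hypothetical, the label 'borderline' for scores between 0.2 and 0.3 is an assumption, and the counts themselves are assumed to come from whichever alignment and peak calling tools were used.

```python
from dataclasses import dataclass


@dataclass
class AtacQcMetrics:
    '''Summary counts for one ATAC-seq sample (hypothetical container).'''
    unique_mapped_reads: int  # uniquely mapped reads after filtering
    reads_in_peaks: int       # mapped reads that overlap called peaks
    n_peaks: int              # number of peaks called for the sample


def frip_score(metrics):
    '''Fraction of reads in peaks: reads overlapping peaks / mapped reads.'''
    if metrics.unique_mapped_reads == 0:
        return 0.0
    return metrics.reads_in_peaks / metrics.unique_mapped_reads


def qc_report(metrics, min_reads=20_000_000, min_peaks=20_000):
    '''Compare one sample against the thresholds quoted in this chapter.'''
    frip = frip_score(metrics)
    if frip > 0.3:
        frip_quality = 'good'
    elif frip >= 0.2:
        frip_quality = 'borderline'  # assumed label, not from the chapter
    else:
        frip_quality = 'low'
    return {
        'enough_reads': metrics.unique_mapped_reads > min_reads,
        'enough_peaks': metrics.n_peaks >= min_peaks,
        'frip': frip,
        'frip_quality': frip_quality,
    }


# Made-up counts for illustration only.
sample = AtacQcMetrics(unique_mapped_reads=25_000_000,
                       reads_in_peaks=10_000_000,
                       n_peaks=45_000)
report = qc_report(sample)
print(report)
```

The thresholds are passed as keyword arguments so they can be adjusted for other organisms or sequencing depths; the defaults simply restate the numbers given for human samples.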
16.3.1 Data quality metrics: 16.3.1.1 Pre-sequencing QC: 16.3.1.2 Sequencing considerations: 16.3.1.3 Pre-alignment QC: A tool like FastQC or similar should be used to check for GC content, read quality and length, and primer or adapter reads prior to alignment. Trimmomatic is a useful tool for removing primer and adapter sequences if they are present. ATAC-seq experiments should be sequenced with paired-end sequencing, and existing pipelines will expect paired-end data (2 files: *_R1.fastq and *_R2.fastq). Use fasterq-dump to download files from the NCBI Sequence Read Archive; this tool will automatically split the reads into multiple files. 16.3.1.4 Number of mapped reads As for all DNA-sequencing based genomics technologies, a sufficient number of mapped reads is required to obtain meaningful results from a sample. You can read more about general sequencing technologies in our previous chapter here. For experiments on human samples this number should be greater than 20 million mapped unique reads. Bowtie2 is commonly used for mapping fragments to the genome. 16.3.1.5 Post-alignment QC: Post alignment: check the percentage of matched, unmatched, unpaired and duplicated reads. Reads which are duplicated or unmatched should be filtered out. Picard is a useful tool for this step. Reads on the + strand should be shifted +4 bp, and reads on the - strand should be shifted -5 bp. 16.3.1.6 Fragment size distribution: ATAC-seq data is often generated using paired end sequencing technologies, which allow for characterization of ATAC-seq fragments. 
Histograms of these distributions using single base pair resolution bins reveal patterns of enrichment relative to the nucleosome scale of 147bp and the DNA-helix scale of ~10.5bp. When comparing ATAC-seq samples, it is important to consider the fragment size distributions of the samples being compared. Differences in the distributions could lead to results that are unrelated to biology. 16.3.1.7 Peak calling: ATAC-seq peak calling typically makes use of analysis tools developed for ChIP-seq. MACS2 is one of the most common choices for a peak calling tool, but HOMER or other common ChIP-seq peak callers are also acceptable. An input sample is not typically generated for ATAC-seq as it would be for a ChIP-seq experiment, so the major requirement for the peak caller is that it does not require the input control to call peaks. Number of peaks: Although the number of accessible chromatin regions can vary from one cell type to another, there are several regions that appear to be constitutively accessible across most cell types. At least 20,000 peaks can be identified in a high quality experiment. The deeper the sequencing, the more peaks will be detected in an ATAC-seq experiment. At a very high sequencing depth some of the statistically significant peaks might not be of biological interest. In an analysis of such data sets the fold enrichment relative to background, or absolute peak signal, in addition to statistical significance, ought to be taken into account. 16.3.1.8 FRiP score (fraction of reads in peaks) In high quality ATAC-seq data a large fraction of reads overlap with peaks, while in low quality data there is a high level of fragments that map to background regions. Ideally, the FRiP score is greater than 0.3 (30 percent or more of reads overlap with peaks), with a score below 0.2 indicating low-quality data. 16.3.1.9 Overlap with other chromatin accessibility data Thousands of ATAC-seq samples have been produced in human and mouse. 
High quality ATAC-seq data will share a substantial proportion of peaks with many of these datasets. Publicly available ATAC-seq data can be found and comparisons made at the Cistrome Data Browser [http://cistrome.org/db/]. 16.3.1.10 Overlap with promoters The promoter regions of many genes are constitutively accessible. Examining peak overlap with regions close to known protein coding gene transcription start sites can be used as a check for data quality. 16.3.2 Information from ATAC-seq analysis: 16.3.2.1 Major approaches: Compare changes in transcription factor motif enrichment in accessible regions between samples. Compare changes in accessibility of regions (differential accessibility) between samples. Footprinting - identify regions where insertion is below the expected level. 16.3.2.2 Differential accessibility analysis: Differential accessibility analysis typically uses packages for RNA-seq differential expression analysis such as DESeq2, edgeR, or limma. All three are available as R packages and can be installed using Bioconductor, a bioinformatics package manager for R. Unfortunately, there are no well-established packages for this analysis in other languages such as Python. Differential accessibility analysis is an approach with high potential, but care must be taken in processing and normalizing the data for accurate results. 16.3.2.3 Motif analysis: Motif analysis in ATAC-seq is more complex than for ChIP-seq because a larger set of TFs are responsible for the emergence of chromatin accessible regions than for the binding sites of a particular TF. Nevertheless, in the analysis of differential ATAC-seq peaks, motif analysis can be used to reveal the TFs related to differences between conditions. This type of analysis is most likely to be successful when ATAC-seq data from closely related conditions or cell types are being compared. The MEME suite has a variety of tools for motif analysis available in both web and command-line versions. 
16.3.2.4 Motif Scanning Motif scanning is an analysis technique which identifies putative transcription factor binding sites (TFBS) which sufficiently match a given TF motif’s position-weight matrix. PWMscan is a straightforward online tool, but not the best option for high throughput. FIMO is an alternative which can be used either on the web or the command line. This approach will identify all sites within the genome which are likely to bind a single transcription factor. 16.3.2.5 Motif discovery: Homer or MEME. These tools identify overrepresented sequences within the accessible peaks, regardless of whether they match a previously defined motif. Once the ATAC-seq peaks are determined, the next step is to search for enriched DNA sequence motifs within these regions. This is accomplished by using motif discovery algorithms such as MEME Suite, HOMER, or DREME. These tools scan the ATAC-seq peaks for overrepresented sequence patterns, which may correspond to binding sites for specific transcription factors or other regulatory elements. The motifs discovered can be compared against existing motif databases, such as JASPAR or TRANSFAC, to annotate the potential transcription factor binding sites. 16.3.2.6 Motif Enrichment: These motif enrichment tools will scan through and identify matches to known motif sequences within accessible sites, and additionally will quantify whether the motif is significantly enriched compared to a control sample (input, uncommon with ATAC-seq) or a shuffled sequence to mimic background. After identifying the enriched motifs, researchers can perform motif enrichment analysis to determine the significance of these motifs in the ATAC-seq peaks. This is often done using statistical tools like Fisher’s exact test or hypergeometric test, which assess the enrichment of specific motifs compared to their background occurrence in the genome. 
Additionally, tools like GREAT or HOMER can be employed to perform gene ontology analysis and assess the functional relevance of the identified motifs in biological processes and pathways. Overall, ATAC-seq motif enrichment analysis provides researchers with valuable insights into the regulatory landscape of the genome. By identifying enriched motifs within accessible chromatin regions, researchers can gain a deeper understanding of the transcriptional regulatory networks and potentially uncover novel transcription factors involved in specific biological processes or diseases. This analysis serves as a powerful tool for unraveling the intricacies of gene regulation and can pave the way for further investigations in functional genomics and therapeutic development. HOMER or MEME Suite tools can be used for this analysis. 16.4 ATAC-Seq data strengths: ATAC-seq is easy to adopt and has been used by many laboratories to generate high quality data for characterizing accessible chromatin in cell lines or sorted cells derived from tissues. In principle, ATAC-seq can identify a large proportion of cis-regulatory elements. In contrast to ChIP-seq, ATAC-seq does not require specific antibodies. ATAC-seq is a time-efficient protocol which requires low cell input. In comparison with histone modification ChIP-seq, ATAC-seq provides a higher resolution assessment of the cis-regulatory genomic regions. Histone modification ChIP-seq signal, in contrast, tends to be localized on nucleosomes flanking the site of interest and can spread to nucleosomes beyond the immediate flanking ones. 16.5 ATAC-Seq data limitations: ATAC-seq does not precisely identify the transcription factors or other chromatin associated factors that bind in or around chromatin accessible regions. This type of information needs to be inferred through transcription factor binding motif analysis or ChIP-seq data. 
Whereas ATAC-seq indicates the presence of a putative cis-regulatory element, H3K27ac ChIP-seq is able to separate accessible regions from those that are accessible and active. Accessible regions are not necessarily cis-regulatory regions, although many of them are. The genes that are regulated by cis-regulatory elements cannot be identified conclusively by ATAC-seq alone. ATAC-seq data can be biased and affected by batch effects, like any other genomics data type. When comparing ATAC-seq data, good experimental design principles, like the inclusion of biological replicates and consideration of controls, are needed for a meaningful outcome. 16.6 ATAC-Seq data considerations The nucleosome is the fundamental unit of chromatin packaging in the genome and nucleosomal DNA is far less likely to be cleaved by the Tn5 transposase than linker DNA. When DNA is fragmented by Tn5 the positions of the endpoints relative to the nucleosomes are an important consideration. When the ends are less than 147bp apart it is likely that both ends originate from the same linker region. Longer fragments can result from cuts on opposite sides of the same nucleosome, or even opposite sides of a genomic interval that encompasses multiple nucleosomes. The short fragments are therefore most likely to be nucleosome free and provide stronger evidence for transcription factor binding sites. As with other genomics protocols, ATAC-seq data is subject to biases introduced in the ATAC-seq protocol and in the sequencing itself. ATAC-seq data generated in different batches, by different laboratories or using different protocols might not be directly comparable. In addition, the Tn5 transposase does have biases in the precise DNA sequences it can cut. This should be taken into consideration when carrying out base pair resolution analyses including footprinting analysis and analysis of the effects of sequence variants on chromatin accessibility. 
Read depth will impact ATAC-seq signal, but enzyme strength and conditions can also alter the distribution of cuts. When using ATAC-seq data to answer biological questions it is important to understand what types of bias could impact the results. To ensure valid results, the analysis needs to use appropriate statistical methods and enough high quality ATAC-seq data, including controls; it may also be necessary to reframe the questions. 16.7 ATAC-seq analysis tools This section has been written by AI and needs verification by experts. This is meant to give you a basic idea of the pros and cons of these tools but should ultimately be used with your own judgment. MACS2 (Y. Zhang et al. 2008): Pros: widely used, handles both paired-end and single-end sequencing data, allows for differential peak calling between different samples. Cons: assumes that all peaks have the same shape, may not be as accurate as other peak-calling tools in some cases. HOMER (Heinz et al. 2010): Pros: includes tools for peak-calling, motif analysis, and annotation of nearby genes, user-friendly interface, handles both paired-end and single-end sequencing data. Cons: may not be as accurate as other peak-calling tools in some cases. ATACseqQC (Schep et al. 2017): Pros: provides several metrics and plots for evaluating data quality, identifies potential issues with data such as batch effects, sequencing depth, and library complexity. Cons: does not perform peak-calling or downstream analysis. deeptools (Ramírez et al. 2016): Pros: includes tools for normalization, visualization, and comparison of ATAC-seq data, generates heatmaps, profiles, and other plots for visualizing chromatin accessibility. Cons: may require some programming skills to use effectively. DFilter (Ghavi-Helm et al. 
2019): Pros: uses a deep learning approach to predict the likelihood of a genomic region being an ATAC-seq peak, can handle both paired-end and single-end sequencing data, has been shown to outperform other peak-calling tools in some cases. Cons: may require more computational resources than other tools. 16.8 Additional tutorials and tools 
16.9 Additional tutorials and tools A Galaxy based tutorial for ATAC-seq - Galaxy is a good recommendation for those new to informatics who would like a cloud-based GUI option to use for the analysis of their data. MACS - Model-based analysis for ChIP-Seq - A command line tool for the identification of transcription factor binding sites. Can be used with ChIP-seq or ATAC-seq. CHIPS - A Snakemake pipeline for quality control and reproducible processing of chromatin profiling data. This tool will require some Snakemake and coding knowledge. For more recommendations about coding see our later chapter about general data analysis tools. Cistrome DB - a visual tool to allow you to browse your ATAC-seq data. SELMA - Simplex Encoded Linear Model for Accessible Chromatin - SELMA is a Python-based tool for the assessment of biases in chromatin-based data. 16.10 Online Visualization tools Cistrome DB - a visual tool to allow you to browse your ATAC-seq data. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with ATAC-seq data. Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. 16.11 More resources about ATAC-seq data ATAC-seq overview from Galaxy - these slides explain the overarching concepts of ATAC-seq. ATAC seq guidelines from Harvard - this workflow runs through, step by step, how to analyze ATAC-seq data and what different parameters mean. ATAC-seq review - this paper gives a great overview of ATAC-seq data and, step by step, what needs to be considered. 
Identifying and mitigating bias in chromatin CHIP Snakemake pipeline for analyzing ChIP-seq and chromatin accessibility data Paper on bias in DNase-seq footprinting analysis and fragment size effects, similar comments apply to ATAC-seq SELMA Method for evaluating footprint bias in ATAC-seq References "],["single-cell-atac-seq-1.html", "Chapter 17 Single cell ATAC-Seq 17.1 Learning Objectives 17.2 What are the goals of scATAC-seq analysis? 17.3 scATAC-seq general workflow overview 17.4 Peak calling 17.5 Dimensionality reduction 17.6 Embedding (visualization) 17.7 Clustering 17.8 Cell type annotation 17.9 scATAC-seq data strengths: 17.10 scATAC-seq data limitations: 17.11 scATAC-seq data considerations 17.12 scATAC-seq analysis tools 17.13 Trajectory analysis 17.14 Motif detection (ex. ChromVar) 17.15 Regulatory network detection 17.16 Tools for data type conversion 17.17 More resources and tutorials about scATAC-seq data", " Chapter 17 Single cell ATAC-Seq This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. 17.1 Learning Objectives 17.2 What are the goals of scATAC-seq analysis? The primary goal of single-cell ATAC-seq is to obtain a high-resolution map of chromatin accessibility at the single-cell level. It is often used for the identification of cell type-specific cis-regulatory elements (CREs) or transcription factor (TF) binding sites because single-cell resolution enables researchers to parse heterogeneous subgroups within a sample. Single-cell ATAC-seq is often applied to questions in developmental biology and cell differentiation. 17.3 scATAC-seq general workflow overview Align reads to genome and assign to cells based on barcodes This step can be performed using Cell Ranger if the data were generated using a 10X Genomics kit (commercially available). 
For other methods, this step largely resembles the alignment step of bulk ATAC-seq analysis, using aligners such as Bowtie2 or BWA, filtering tools such as Picard, and adapter-trimming tools such as Trimmomatic. Prior to adapter trimming, barcodes should be matched to the list of known barcodes generated in the experiment and either assigned to a cell or assigned as ambiguous. At this stage, unique molecular identifiers (UMIs) added to fragments during library preparation are also extracted and associated with each read to allow for PCR deduplication. Quality control The most important considerations for single-cell ATAC-seq are the number of unique fragments per cell, the transcription start site (TSS) enrichment score, and detection of doublets. The number of unique fragments in a cell is a critical quality control metric for single-cell ATAC-seq. Cells with a low fragment count do not provide enough information to draw conclusions about their characteristics, and cells with extremely high fragment counts are likely to be doublets containing reads from multiple cells. To determine the number of unique reads per cell, short random barcodes termed unique molecular identifiers (UMIs) are added to the fragments during library preparation. After the reads have been aligned to the genome and grouped by their cell barcodes, the UMIs can be used to remove PCR duplicates by retaining only one copy of reads with the same UMI and genomic location. The resulting UMI counts can be used as a more accurate measure of chromatin accessibility at specific genomic regions in individual cells. An additional step is typically taken to filter out reads mapping to the mitochondrial genome, so that the final unique fragment counts consist of only unique reads corresponding to nuclear DNA. The TSS enrichment score in ATAC-seq measures the preferential accessibility of chromatin regions near gene promoters. 
This approach was established in pipelines for bulk ATAC-seq, such as the ENCODE pipeline (cite), and is also applicable to single-cell ATAC-seq. In brief, the TSS enrichment score quantifies the enrichment of open chromatin regions at TSSs versus a non-TSS background (e.g. +/-2000 bp beyond TSSs). A high TSS enrichment score therefore indicates that the number of accessible regions at TSSs, where high accessibility is expected, is significantly higher than background (cite), while a low TSS enrichment score indicates that the data quality is not high enough to distinguish accessible regions from background insertion patterns. Doublet detection is any approach that attempts to computationally identify cell barcodes which contain reads from a mixture of single cells. Although an extremely high number of fragment counts may indicate that a cell is in fact a doublet, doublet detection provides a more targeted approach by assigning a score or a probability that each cell is a doublet. These approaches may compare cells to simulated doublets generated randomly from the data, or may rely on the fact that the number of ATAC-seq reads at any locus in a single cell is limited to two for diploid organisms. This step is not as common in scATAC-seq analysis as it is in single cell RNA-seq analysis owing to the difficulty of estimating doublets from the highly sparse data, but can be done for additional rigor or if there is particular concern that the dataset contains a high number of doublets. Additionally, the fragment size distribution of the library should exhibit nucleosomal periodicity, where fragments are enriched at ~147 bp intervals corresponding to the length of nucleosome-bound DNA that is refractory to Tn5 insertion. 17.4 Peak calling Peak calling in scATAC-seq is performed in a similar manner to bulk ATAC-seq [ref bulk chapter]. Importantly, it should be performed by treating data from all cells within a cluster as a pseudo-bulk replicate. 
This is because scATAC-seq data is highly sparse and any individual cell only has enough information to convey whether a region is accessible or inaccessible, due to the maximum of 2 reads per locus per cell. Peak calling is commonly performed using MACS2, but other peak callers suitable for ATAC-seq could be used as well, as described in our chapter on bulk ATAC-seq (reference). 17.5 Dimensionality reduction As scATAC-seq data is extremely high dimensional, with counts for hundreds of thousands of peaks in thousands of cells, dimensionality reduction must be performed to represent the data in a way which reflects the major sources of variation while allowing for efficient computation. Many of the most popular dimensionality reduction approaches for ATAC-seq are borrowed from natural language processing, including latent semantic indexing (LSI) as well as probabilistic approaches such as latent Dirichlet allocation (LDA) and probabilistic LSI (pLSI). LSI and its variations are commonly used and are a simple, efficient approach based on singular value decomposition (SVD), a technique closely related to PCA. Probabilistic approaches calculate the probability of information in a dataset being related to specific ‘topics’ identified by the statistical model. They are more mathematically complex than LSI but attempt to more accurately reconstruct the latent (not observable) structure in the data. 17.6 Embedding (visualization) Embedding is the process of representing the high-dimensional scATAC-seq dataset in two (or occasionally three) dimensions for visualization. First, dimensionality reduction must have been performed using one of the methods described in the section above. Then, the result of dimensionality reduction can be provided as input to the chosen embedding approach. The most common method for generating ATAC-seq embeddings is UMAP (Uniform Manifold Approximation and Projection), but other methods, such as force-directed graph layouts or t-SNE (t-distributed Stochastic Neighbor Embedding), can also be used. 
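As a rough illustration of the LSI approach named in the dimensionality reduction section, the sketch below applies a TF-IDF weighting to a binarized cell-by-peak matrix and then takes a truncated SVD. The function name `lsi`, the exact TF-IDF formula, and the toy matrix are assumptions made for illustration; real tools differ in their weighting and scaling details, so this should not be read as the method of any specific package.

```python
import numpy as np

def lsi(counts, n_components=2):
    # counts: cells x peaks matrix (hypothetical toy input)
    binary = (np.asarray(counts, dtype=float) > 0).astype(float)
    # Term frequency: each accessible peak divided by the cell's total accessible peaks
    cell_totals = binary.sum(axis=1, keepdims=True)
    tf = binary / np.maximum(cell_totals, 1.0)
    # Inverse document frequency: peaks seen in fewer cells get a larger weight
    n_cells = binary.shape[0]
    cells_per_peak = binary.sum(axis=0)
    idf = np.log(1.0 + n_cells / np.maximum(cells_per_peak, 1.0))
    tfidf = tf * idf
    # SVD of the weighted matrix; keep the leading components as cell coordinates
    u, s, _ = np.linalg.svd(tfidf, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

rng = np.random.default_rng(0)
toy = rng.random((6, 20)) < 0.3   # 6 cells, 20 peaks, sparse binary accessibility
embedding = lsi(toy, n_components=2)
print(embedding.shape)  # (6, 2)
```

The resulting low-dimensional coordinates are the kind of input that an embedding method such as UMAP or t-SNE would then take.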
17.7 Clustering Clustering is the process of computationally detecting populations of cells with similar characteristics - in this case, cells with similar accessibility profiles. Leiden clustering, which uses the similarity of cells to their neighbors to group cells into clusters, is a common choice for identifying clusters in scATAC-seq data. 17.8 Cell type annotation Cell type annotation on scATAC-seq data alone can be performed based on the enrichment of cell-type-specific CREs, or alternatively can be performed based on gene expression patterns observed in integrated scRNA-seq data. Gene scores are a measure of the accessibility of a gene locus and putative CREs within a defined window of the gene. Gene scores significantly above the expected background suggest a gene is active in a given cell type, and these scores can be used to identify markers for cell type annotation. Integration with scRNA-seq data can allow for identification of cell types which may be difficult to distinguish based on ATAC-seq profiles alone(ref), but requires an scRNA-seq dataset of a comparable population of cells. Trajectory analysis, which is used to infer and visualize the developmental or differentiation paths of individual cells within a population, can be performed on processed single-cell ATAC-seq data using tools developed for single-cell RNA-seq data. These approaches aim to reconstruct the temporal progression and identify the key intermediate states or cell fate decisions during biological processes such as embryonic development, tissue regeneration, or disease progression. Trajectory inference algorithms, such as: Monocle Qiu et al. (2017) Palantir Setty et al. (2019) PAGA Wolf et al. (2019) These are commonly used to reconstruct the developmental trajectories and order the cells along these trajectories. 
The resulting trajectory models provide valuable insights into the underlying regulatory dynamics, lineage relationships, and critical regulatory genes or pathways governing cellular differentiation and development. Much like peak calling, it is not possible to obtain enough information from individual cells to perform differential accessibility analysis at the single cell level. Because of this limitation, differential accessibility analysis is performed in a similar manner to bulk ATAC-seq analysis using pseudo-bulk data at the cluster or cell type level, where counts from many single cells are aggregated together and treated as though they are a single sample generated from a bulk experiment. Common tools for differential accessibility analysis include DESeq2 and edgeR, which were both developed for differential gene expression analysis. 17.9 scATAC-seq data strengths: scATAC-seq is the gold standard for showing heterogeneity in chromatin accessibility between populations of cells and within tissues because single-cell resolution enables analysis of subpopulations that are challenging to isolate experimentally. scATAC-seq can be paired with scRNA-seq to obtain transcriptome and chromatin accessibility measurements from the same cells. This is a powerful approach for gaining understanding of how specific patterns of chromatin accessibility affect gene expression. scATAC-seq is also a relatively high throughput technique, particularly with droplet based techniques. A single dataset can cover thousands of cells. 17.10 scATAC-seq data limitations: scATAC-seq has very high sparsity compared to single-cell RNA-seq since there are only two copies of each locus in a diploid cell compared to many copies of mRNAs. This results in the data essentially being binary at the single cell level - a region either has reads and is considered accessible in that cell or has no reads. 
Like bulk ATAC-seq, the Tn5 transposase has a sequence bias, so regions with a preferred sequence will undergo higher levels of transposition. Highly accessible regions of DNA will also be overrepresented in the final library. Single-cell ATAC-seq is an expensive technique regardless of the experimental approach chosen. Plate-based methods are generally cheaper but have lower throughput, while droplet-based methods are higher throughput but extremely costly and reliant on proprietary technology. Large datasets require significant investment and often use of droplet-based techniques. Many scATAC-seq datasets have low cell numbers due to the cost and technical difficulty of the assay. This presents a challenge for analysis since the data is highly sparse and noisy, which in combination with a small dataset can lead to difficulty interpreting the data. 17.11 scATAC-seq data considerations scATAC-seq will always be sequenced with paired-end reads. There are two major experimental approaches for generating single-cell ATAC-seq data: droplet based methods, such as the commercially available 10X Chromium platform, where nuclei are separated into individual droplets, and plate-based methods, which use multiple pooling and barcoding steps to tag each cell with a unique combination of barcodes (with a level of expected barcode collisions). The procedure for demultiplexing the reads will depend on the method used to generate the data. Data generated using 10X platforms can be de-multiplexed and aligned using the Cell Ranger software, while plate-based approaches typically use an alignment and peak-calling approach similar to that used for bulk ATAC-seq, with the additional step of matching the barcodes in each read to the known set of combinatorial barcodes. Correctly matching the reads to cells and filtering reads with non-matching barcodes is a critical step for scATAC-seq analysis. 
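The barcode matching step described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical whitelist of known barcodes and a tolerance of one mismatch (the tolerance is an assumption, not something the text specifies). Reads whose barcode matches no known barcode, or matches more than one equally well, are left unassigned.

```python
def hamming(a, b):
    # Number of positions at which two equal-length barcodes differ
    return sum(x != y for x, y in zip(a, b))

def assign_barcode(observed, whitelist, max_mismatches=1):
    # Exact matches are assigned directly
    if observed in whitelist:
        return observed
    # Otherwise look for known barcodes within the allowed mismatch tolerance
    candidates = [
        bc for bc in sorted(whitelist)
        if len(bc) == len(observed) and hamming(observed, bc) <= max_mismatches
    ]
    # Assign only when the correction is unique; ambiguous or unmatched reads get None
    return candidates[0] if len(candidates) == 1 else None

known = {'ACGT', 'ACGA', 'TTGA'}       # hypothetical known barcodes
print(assign_barcode('ACGT', known))   # ACGT (exact match)
print(assign_barcode('TTGC', known))   # TTGA (one mismatch, unique)
print(assign_barcode('ACGC', known))   # None (ambiguous between ACGT and ACGA)
print(assign_barcode('GGGG', known))   # None (no match)
```

For combinatorial indexing data, the same check would be applied to each barcode segment in turn, with a read kept only if every segment resolves to a known barcode.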
17.12 scATAC-seq analysis tools Cellranger is a popular preprocessing tool specifically designed for scATAC-seq data generated using the 10x Genomics platform. It performs essential steps such as demultiplexing, barcode processing, read alignment, and filtering, providing a streamlined workflow for 10x-generated scATAC-seq data. However, it cannot be used for data generated by other methods. Bowtie2, Picard tools, and Trimmomatic: These tools are commonly used for preprocessing scATAC-seq data generated using plate-based or combinatorial indexing approaches. Bowtie is a fast and widely used aligner for mapping sequencing reads to a reference genome, while Picard provides a suite of command-line tools for manipulating and analyzing BAM files and Trimmomatic can remove adapter sequences from reads. These tools can be utilized for aligning reads, removing duplicates, sorting, and filtering the data to obtain the necessary inputs for downstream analysis. ArchR is a comprehensive scATAC-seq preprocessing tool implemented in R. It accepts both 10x fragment files and BAM files as input, making it suitable for data generated using different protocols. ArchR performs quality control, peak calling, peak annotation, normalization, and data transformation steps. It is one of the most popular tools for analyzing standalone scATAC-seq data and provides a user-friendly interface for exploratory data analysis. Scanpy is a Python-based tool widely used for visualizing and manipulating single-cell omics data, including scATAC-seq. After processing scATAC-seq data with tools like ArchR, the output can be exported as a matrix (data) or CSV (metadata) and formatted into a Scanpy data object. Scanpy offers various analytical functionalities, including dimensionality reduction, clustering, trajectory inference, differential accessibility analysis, and visualization. This tool is the tool of choice if you plan to perform your analysis primarily in Python. 
Seurat is an R-based tool that is extensively used for analyzing and visualizing single-cell omics data, including scATAC-seq. Similar to Scanpy, after preprocessing the data with tools like ArchR, Seurat can be employed for downstream analysis. It provides a wide range of functions for quality control, dimensionality reduction, clustering, differential accessibility analysis, cell type identification, and visualization. Seurat integrates well with other existing R-based tools for single-cell data analysis, offering flexibility and compatibility. This is a useful core tool to use if you plan to perform your analysis in R. Signac is an R package specifically designed for the analysis of single-cell epigenomics data, including scATAC-seq. It offers a comprehensive set of functions for preprocessing, quality control, dimensionality reduction, clustering, trajectory analysis, differential accessibility, and visualization. Signac integrates well with Seurat, providing an additional tool for exploring and analyzing scATAC-seq data. Additional quality checking tools: Quality checking and filtering steps in scATAC-seq analysis can be performed using various tools depending on the workflow and programming language. Some commonly used tools with QC capabilities useful for examining library quality measures such as GC bias, overrepresented sequences, and quality scores include FastQC and deepTools. 17.12.0.1 Doublet detection ArchR has a tool for doublet detection - it generates synthetic doublets from combinations of cells in the dataset and uses the similarity of cells in the dataset to these synthetic doublets to identify doublets. This is a common approach, and variations of it are used by most doublet detection algorithms. Many are specifically designed to expect transcriptomic data (such as the commonly used Scrublet) and identify barcodes with mixed transcriptional signatures of multiple clusters/cell types, and these methods do not accept scATAC-seq input. 
Some transcription-based tools can be given modified input to detect doublets in scATAC-seq data, as described in documentation from the Demuxafy project. There are also tools like AMULET which leverage the fact that the number of ATAC-seq reads at any locus in a single cell is limited by the number of copies of a chromosome to detect doublets. Overall, doublet detection is not as common a step in scATAC-seq analysis as it is in scRNA-seq analysis, owing to the limited tools available and the difficulty of performing this analysis on extremely sparse data. 17.12.0.2 Visualization Scanpy (Python) and Seurat (R) are the most commonly used tools for visualizing scATAC-seq data. These tools allow you to plot the accessibility of specific peaks or gene scores, as well as metadata such as cell type, clusters, etc. on the UMAP (or other) embedding at the single-cell level. Both packages include built-in functions to perform this plotting in a streamlined manner and to manipulate the data objects for additional quantification and visualization using general plotting packages such as matplotlib or ggplot. The choice between these tools is primarily determined by the programming language you choose for your analysis, as they share many of the same core features. Additionally, tools such as deepTools or enrichedHeatmap may be useful for visualizing heatmaps of pseudo-bulk data, and bedGraph or BigWig representations of pseudo-bulk data can be visualized using genome browsers such as IGV or the UCSC genome browser. pyGenomeBrowser is a package which allows more customizable visualization of browser tracks and may be useful for generating publication-quality figures. 17.13 Trajectory analysis Several tools are available for single-cell trajectory analysis. 
These approaches are primarily distinguished by variations used in their mathematical approaches for calculating trajectories, but most make use of graph-based approaches which model the similarity or connections between cells in a dataset. The distinct approaches of the tools discussed here lead to varying levels of performance on different types of data, and extensive benchmarking has been performed (here) and (here) on synthetic datasets to determine the accuracy of different approaches. The most important consideration here is whether there are any cyclic trajectories expected in the dataset, where the end of the trajectory would connect back to the start, or disconnected trajectories, where not all trajectories originate from the same starting state. Not all approaches can reconstruct these trajectories accurately. Most popular methods expect a tree-like structure, with a single starting point and branches which lead toward terminal cell fates. Monocle is a popular choice that offers a comprehensive workflow for trajectory inference, visualization of trajectory analysis, pseudotime ordering of cells, and identification of differentially expressed genes along trajectories. Another commonly used tool is Slingshot, which utilizes a graph-based approach to infer trajectories, compute pseudotime ordering, and generate smooth curves to visualize trajectories. Additionally, it has the ability to infer multiple disconnected trajectories within a single dataset. PAGA (Partition-based Graph Abstraction) uses a distinct strategy with the goal of maintaining connections between similar groups of cells as well as the overall structure of the data. Palantir is a tool which uses a probabilistic approach to assign cell fate probabilities to each cell in a dataset, which can be used to define cells belonging to a specific trajectory. 17.14 Motif detection (ex. 
ChromVar) Single-cell chromVAR analysis is a computational approach used to assess cell-to-cell variation in chromatin accessibility profiles across a population of single cells. It aims to identify TF activity differences between cell types or states and elucidate the underlying regulatory dynamics. Single-cell chromVAR leverages the concept of TF motif enrichment or depletion within cell-specific accessible regions to infer TF activity. It compares the chromatin accessibility profiles of individual cells to a background model derived from the aggregate accessibility profiles of all cells, enabling the detection of cell-specific TF binding patterns. By quantifying the enrichment or depletion of TF motifs within accessible regions, single-cell chromVAR provides insights into TF activity variation, potential regulatory networks, and cell-type-specific transcriptional regulation. It serves as a valuable tool for understanding the contribution of TFs to cellular heterogeneity and regulatory processes in single-cell chromatin accessibility data. 17.15 Regulatory network detection CisTopic is a computational tool used for the analysis of single-cell chromatin accessibility data to identify and characterize cell subpopulations with distinct regulatory patterns. It employs a topic modeling approach to capture the variability in chromatin accessibility profiles across cells and identifies the major regulatory patterns driving cell heterogeneity. CisTopic assigns cells to topics based on the similarity of their accessibility landscapes. By analyzing the differential accessibility of genomic regions within each topic, CisTopic facilitates the discovery of transcription factor binding motifs and CREs associated with specific cell subpopulations. 17.16 Tools for data type conversion A comprehensive explanation of packages to convert between single-cell data object types used by Python and R packages is found here. 
The most common data types for processed scATAC-seq data are: SingleCellExperiment Seurat/h5Seurat annData objects H5seurat objects can be converted to annData objects using SeuratDisk. 17.17 More resources and tutorials about scATAC-seq data Galaxy tutorial for sc-ATAC-seq analysis Signac scATAC-seq tutorial with pbmcs sc ATAC-seq chapter - Intro to Bioinformatics and Comp Bio Single Cell ATAC-seq youtube video Comprehensive analysis of single cell ATAC-seq data with SnapATAC References "],["chip-seq-1.html", "Chapter 18 ChIP-Seq 18.1 Learning Objectives 18.2 What are the goals of ChIP-Seq analysis? 18.3 ChIP-Seq general workflow overview 18.4 ChIP-Seq data strengths: 18.5 ChIP-Seq data limitations: 18.6 ChIP-Seq data considerations 18.7 ChiP-seq analysis tools 18.8 More resources about ChiP-seq data", " Chapter 18 ChIP-Seq This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 18.1 Learning Objectives 18.2 What are the goals of ChIP-Seq analysis? ChIP-Seq (chromatin immunoprecipitation sequencing) and related approaches are used to identify genome-wide binding sites of specific proteins or protein complexes. Given the diversity of interactions at the DNA-protein interface, sequencing-based methods for targeted chromatin capture have evolved to meet precise research needs and improve the quality of the results. Specifically, ChIP-Seq builds on protein immunoprecipitation techniques (IP) by applying next generation sequencing to a pulldown product. IP followed by sequencing can be applied to any nucleic-acid binding protein for which an antibody is available, including a known or putative transcription factor (TF), chromatin remodeler or histone modifications, or other DNA- or chromatin-specific factors. 
ChIP-Seq approaches have been honed to increase signal-to-noise, reduce input material, and more specifically map protein-DNA interactions, for example by treating the IP product with an exonuclease that chews back unprotected DNA ends (e.g. ChIP-exo). The main goals of analysis for ChIP-Seq approaches are: Identify the genomic regions where a specific protein or protein complex binds. This can be achieved by sequencing both the IP input and product, and then calculating the enrichment in the product sample over the input. Annotate binding sites via comparison to other datasets and genome annotations. This may include transcription start sites (TSSs) or gene-regulatory regions. Oftentimes it is best to validate your data against previous profiling of similar epitopes. Comparison of binding sites: Many ChIP-Seq experiments compare changes in protein-DNA interactions across different conditions. This type of analysis can leverage statistical tools for pairwise comparison and multiple hypothesis testing. Identification of co-occurring motifs: Many chromatin proteins exhibit a sequence-specific binding pattern that is shaped by evolutionary forces. These sequence patterns, or motifs, are thought to capture contacts between specific base pairs and the DNA-binding domain of a protein and are often represented as a position weight matrix (PWM) for computational analysis. Statistical tools have been developed for de novo motif discovery within a given set of genomic intervals, like a ChIP-seq peak list. The list of discovered motifs can be meaningfully interpreted by cross-referencing with a motif database, and recovery of known motifs represents another means of data validation. Integration with other -omics data: Given the expansive repositories of publicly available sequencing data, creating a comprehensive narrative from a ChIP-Seq experiment usually involves comparison with other types of sequencing data. 
Just as a ChIP-Seq peak list can be interpreted through existing genome annotations, other sequencing data can be interpreted through the binding sites identified from a given ChIP-Seq experiment. For example, a sequence variant might be enriched or depleted in protein binding sites versus previously identified motifs. This would suggest that a mutation would alter DNA-protein interactions. Binding of a specific gene-regulatory element might also correlate with changes in gene expression. 18.3 ChIP-Seq general workflow overview &lt;TODO: add data formats in a graphical format&gt; A key contribution of large consortia, such as the ENCODE consortium, is the development of standardized processing workflows that facilitate the integration of ChIP-seq data generated in different labs. While the exact data processing needs of any given experiment may vary, established pipelines provide a helpful starting point. In choosing a data processing workflow, it is essential to note the input data format. For example, the read length should be considered, as well as the sequencing paradigm (i.e. whether the data is single-end or paired-end). The most generic steps for processing ChIP-Seq data are: Quality control: The first step in ChIP-Seq data processing is to perform quality control checks on the raw sequencing data to assess its quality and identify any potential issues, such as poor sequencing quality or adapter contamination, which can be assessed via FastQC. Read alignment: The next step is to align the ChIP-Seq reads to a reference genome using a suitable alignment tool such as Bowtie or BWA. Notably, many publicly available ChIP-Seq datasets are single-end and it is important to use the correct alignment parameters for a given sequencing approach. In the case of ChIP-seq approaches that include exonuclease treatment, such as ChIP-exo and ChIP-nexus, a paired-end sequencing approach is often taken and then insert size can be useful for validating alignment. 
For example, profiling of a histone modification should yield nucleosome-sized fragments, ranging up from 120 bp for mononucleosomes, whereas TFs should yield smaller, sub-nucleosomal fragments and polymerase is in between at 20-50 bp (PMID: 30030442). Peak calling: After the reads have been aligned to the genome, the next step is to identify the genomic regions where the protein or protein complex of interest is bound. This is done using peak-calling algorithms, such as MACS2, SICER, or HOMER, which can calculate enrichment as fold change over the input control with statistical testing. Quality control of peaks: Once the peaks have been called, it is important to perform quality control checks to ensure that the peaks are of high quality and biologically relevant. This can be done by assessing the number of peaks, fraction of reads in peaks (FRiP), enrichment of the peaks in specific genomic regions, comparing the peaks to known gene annotations, or performing motif analysis. Often, peaks will be merged across replicates to create a consensus peak set. Peaks should be assessed visually with tools like IGV or the UCSC genome browser to ensure they overlap regions of high coverage. The Cistrome Data Browser is another useful resource for comparing with published ChIP-seq, DNase-seq and ATAC-seq data. Differential binding analysis: If the ChIP-Seq experiment involves comparing the binding of the protein or protein complex in different conditions or cell types, statistical testing can be performed to identify the regions of the genome where the protein or protein complex binds differentially. Tools developed for multiple comparison testing, like limma, DESeq2, and edgeR, are useful for this type of comparative analysis. Integrative analysis: Finally, integrative analysis with other -omics data can be performed to gain biological insights into the ChIP-Seq data. 
This can involve interpreting ChIP-Seq data through existing annotations by looking at signal enrichment in different genomic regions, like transcription start sites (TSSs), gene bodies, and previously identified cis-regulatory elements (CREs). ChIP-Seq data can even be interpreted through other ChIP-seq data to see whether features overlap, with statistical testing for similarity, using packages like BEDTools and BEDOPS. 18.4 ChIP-Seq data strengths: ChIP-Seq (chromatin immunoprecipitation sequencing) is a powerful tool for understanding the genomic locations where a specific protein or protein complex binds. ChIP-Seq is particularly well suited to the following: Identification of regulatory elements: ChIP-Seq can be used to identify the genomic regions where a protein or protein complex binds to regulatory elements, such as promoters, enhancers, and silencers. For example, certain histone modifications characterize active promoters and enhancers, such as H3K4 methylation and H3K27 acetylation. Characterization of protein-protein interactions: ChIP-Seq can be used to identify the genomic regions where multiple proteins bind. In this way, co-binding can be inferred to provide insight into the protein-protein interactions that are involved in regulating gene expression. Identification of binding site motifs: ChIP-Seq can be used to identify the DNA motifs that are enriched in the binding sites of a protein or protein complex. This information can be used to identify other transcription factors or cofactors that are involved in the same regulatory network. Databases of known TF binding motifs include JASPAR, Cis-BP, and HOCOMOCO. Differential binding analysis: ChIP-Seq can be used to compare the binding of a protein or protein complex in different conditions or cell types, which can provide insight into the mechanisms that regulate protein binding and the impact of different cellular states on the regulatory networks. 
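The idea of inferring co-binding from overlapping peaks, mentioned above, can be sketched in a few lines of Python. This is only a minimal illustration, assuming peaks are given as (chromosome, start, end) tuples in half-open, BED-style coordinates. The function name overlapping_peaks and the example peaks are hypothetical and made up for this sketch; a real analysis would more likely rely on BEDTools or BEDOPS, which also scale to full peak sets.

```python
from collections import defaultdict


def overlapping_peaks(peaks_a, peaks_b):
    '''Return the peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates,
    as in BED files. Hypothetical helper, for illustration only.
    '''
    # Group the second peak set by chromosome so only same-chromosome
    # intervals are compared.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))

    shared = []
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, [])):
            shared.append((chrom, start, end))
    return shared


# Made-up peak sets for two hypothetical factors.
factor_1 = [('chr1', 100, 300), ('chr1', 1000, 1200), ('chr2', 50, 150)]
factor_2 = [('chr1', 250, 400), ('chr2', 500, 600)]

shared = overlapping_peaks(factor_1, factor_2)
fraction_shared = len(shared) / len(factor_1)
print(shared, round(fraction_shared, 2))
```

A high shared fraction is only suggestive of co-binding; whether it exceeds what is expected by chance still requires a statistical test.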
18.5 ChIP-Seq data limitations: ChIP-Seq (chromatin immunoprecipitation sequencing) is a powerful technique, but there are several biases, caveats, and problems that can arise when analyzing ChIP-Seq data. Some of the most common are: Accessibility bias: ChIP-Seq relies on fragmentation of chromatin prior to immunoprecipitation, which is observed to enrich for genomic regions that are highly accessible to TFs in general. Antibody specificity and cross-reactivity: The specificity of the antibody used in ChIP-Seq is crucial for the accuracy of the results. Finding an antibody for specific epitopes can pose a challenge because antibodies can have cross-reactivity with other epitopes, which can result in false positives or misinterpretation of the data. DNA fragmentation bias: The length and quality of the DNA fragments used in ChIP-Seq can impact the results. Shorter fragments are often located in regions with more highly accessible chromatin, especially nucleosome linker regions and promoters of active genes. Sequencing depth bias: The sequencing depth can impact the results of ChIP-Seq analysis. Insufficient sequencing depth can result in false negatives, that is, missed binding sites. Reproducibility and sample variation: ChIP-Seq experiments can be highly variable, and reproducibility between replicates can be an issue. Additionally, the composition and quality of the sample can also impact the results. Peak-calling algorithm choice: The choice of peak-calling algorithm can impact the results of ChIP-Seq analysis, as different algorithms have different strengths and weaknesses. Interpretation of binding sites: Finally, the interpretation of binding sites identified by ChIP-Seq can be complex and requires additional validation to confirm their biological relevance and function. Notably, ChIP-Seq cannot distinguish direct protein-DNA interaction from indirect binding (e.g. 
where a protein may bind another protein that binds to DNA). 18.6 ChIP-Seq data considerations As a general guideline, a minimum sequencing depth of 20 million reads is recommended for ChIP-seq experiments in Drosophila, whereas 40–50 million reads is a practical minimum for most marks in human tissue (PMID: 24598259). However, this depth may not be sufficient for some analyses, particularly for studies that require high resolution or have a low signal-to-noise ratio. In such cases, deeper sequencing may be necessary to achieve the desired level of sensitivity and specificity. In general, epitopes that cover large sequence space (e.g. repressive histone modifications such as H3K27me3) require greater sequencing depth than epitopes confined to narrower genomic regions (e.g. active histone modifications such as H3K4 methylation and H3K27ac). ChIP-seq for TFs may require even less sequencing depth; however, low antibody specificity may necessitate deeper sequencing due to low signal-to-noise. In practice, the depth of sequencing required for ChIP-seq experiments can vary widely depending on the specific experimental design and research question. It is important to perform a pilot study or use appropriate statistical methods to estimate the necessary sequencing depth for a given experiment. Choosing a specific antibody is essential; otherwise, even deep sequencing may not recover signal over high background. Sequencing depth should also account for genome size (e.g. larger genomes require deeper sequencing). 18.7 ChIP-seq analysis tools 18.7.1 Tools for quality checks FastQC is a widely used tool for assessing the quality of sequencing data. It analyzes the raw sequencing data and generates a report that provides an overview of various metrics such as base quality, sequence length distribution, and GC content. 
Picard tools and SAMtools: Picard tools and SAMtools are two collections of command-line tools that are used to manipulate and analyze high-throughput sequencing data. They can be used to check the quality of the data, remove duplicates, and generate summary statistics. MACS2 (Model-based Analysis of ChIP-Seq) is a software tool that is specifically designed for the analysis of ChIP-Seq data. It is used to identify regions of the genome that are enriched for DNA-protein interactions. ENCODE Uniform Processing Pipelines: The ENCODE (Encyclopedia of DNA Elements) Uniform Processing Pipelines are a set of standardized protocols and tools that are used to process and analyze ChIP-Seq data. They ensure that the data generated by different labs are consistent and can be easily compared. These tools are just a few examples of the many quality control tools available for ChIP-Seq analysis. The choice of tool(s) to use will depend on the specific analysis being performed and the preferences of the user. 18.7.2 Tools for Peak calling: MACS2 (Model-based Analysis of ChIP-Seq) is a widely used tool for peak calling in ChIP-Seq data. It uses a Poisson distribution to model the local noise and identifies peaks based on the fold enrichment over the background noise. SICER: Spatial Clustering for Identification of ChIP-Enriched Regions (SICER) is a peak caller that takes into account the spatial clustering of enriched regions in ChIP-Seq data. It uses a clustering algorithm to identify peaks based on the local density of enriched regions. HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools that includes a peak caller for ChIP-Seq data. It uses a sliding window approach to identify peaks based on the local enrichment of reads. PeakSeq is a peak caller that uses a Bayesian approach to identify enriched regions in ChIP-Seq data. 
It models the relationship between the read counts and the signal-to-noise ratio and identifies peaks based on the posterior probability of enrichment. 18.7.3 Tools for Differential Analysis DESeq2: This is a widely used R package for differential analysis of sequencing count data, including ChIP-seq. It uses a negative binomial model to normalize and test for differential enrichment of ChIP-seq peaks. edgeR: Another popular R package for differential expression analysis of RNA-seq data, edgeR can also be used for differential analysis of ChIP-seq data. It uses a generalized linear model to estimate differential enrichment and has been shown to be effective for ChIP-seq data with low read counts. Annotation ChIPseeker: This R package can be used for annotating ChIP-seq peaks with genomic features such as gene annotation, gene ontology, and pathway analysis. It can also generate plots and heatmaps for visualization. HOMER: This suite of tools includes several programs for motif discovery, peak annotation, and visualization. The annotatePeaks.pl program can be used for assigning genomic regions to specific functional categories, including promoter, exon, intron, intergenic, and enhancer regions. GREAT: This web-based tool can be used for annotating genomic regions with functional annotations such as gene ontology terms and regulatory domains. It uses a statistical approach to associate genomic regions with biological functions. Cistrome-GO: A web-based tool for determining the gene ontologies of genes likely to be regulated by regions discovered through TF ChIP-seq. GenomicRanges: This R package provides a framework for working with genomic ranges, including intersection, overlap, and annotation of genomic regions with functional categories. It can be used in conjunction with other R packages for ChIP-seq analysis, such as ChIPseeker and DiffBind. 
ChIP-Enrich: This web-based tool can be used for annotating ChIP-seq peaks with functional categories such as gene ontology, pathway analysis, and transcription factor binding sites. It uses a hypergeometric test to identify overrepresented functional categories. Cistrome DB: The website allows users to upload their enriched regions, returning TF ChIP-seq, DNase-seq or ATAC-seq samples with similar profiles. 18.7.4 Motif Analysis MEME Suite: The MEME Suite is a comprehensive suite of tools for motif analysis, including motif discovery and motif-based sequence analysis. It includes tools for discovering de novo motifs from ChIP-Seq data and for searching for known motifs in the regions bound by the protein of interest. HOMER is a suite of tools for motif discovery and analysis. It includes tools for identifying de novo motifs from ChIP-Seq data, as well as for searching for known motifs in the regions bound by the protein of interest. HOMER also provides tools for performing gene ontology analysis and pathway analysis based on the identified motifs. MEME-ChIP is a specialized version of the MEME Suite that is specifically designed for motif analysis in ChIP-Seq data. It includes tools for discovering de novo motifs from ChIP-Seq data, as well as for searching for known motifs in the regions bound by the protein of interest. CentriMo is a tool for identifying enriched motifs in ChIP-Seq data based on the position of the motif relative to the peak summit. It can be used to identify motifs that are enriched at the center of the peak, as well as those that are enriched near the edges of the peak. 18.7.5 Tools for preprocessing Trimmomatic is a widely used tool for trimming and filtering Illumina sequencing data. It is often used to remove low-quality reads, adapter sequences, and other artifacts that can affect downstream analysis. Cutadapt is another popular tool for trimming adapter sequences from high-throughput sequencing data. 
It is particularly useful for removing adapters that contain degenerate nucleotides or that have been ligated with variable lengths. Bowtie2 is a fast and memory-efficient tool for aligning sequencing reads to a reference genome. It is often used to map ChIP-Seq reads to the genome prior to peak calling. SAMtools is a suite of tools for manipulating SAM/BAM files, which are commonly used to store alignment data from high-throughput sequencing experiments. It can be used for filtering and sorting reads, as well as for generating summary statistics. BEDTools is a powerful suite of tools for working with genomic intervals, such as those generated by ChIP-Seq peak calling. It can be used for operations such as intersecting, merging, and subtracting intervals. 18.7.6 Tools for making visualizations Integrative Genomics Viewer (IGV) is a popular genome browser that is widely used for the visualization of genomic data, including ChIP-Seq data. It provides a user-friendly interface for exploring genomic data at different levels of resolution, from the whole-genome level down to individual nucleotides. The UCSC Genome Browser is another widely used genome browser that can be used to visualize ChIP-Seq data. It provides an intuitive interface for navigating and visualizing genomic data, including the ability to zoom in and out and to overlay multiple data tracks. Genome Visualization Tool (GViz) is a package for the R statistical computing environment that provides functions for generating publication-quality visualizations of genomic data, including ChIP-Seq data. It offers a high degree of flexibility and customization, allowing users to create complex and informative plots that convey the relevant information in a clear and concise manner. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with ChIP-seq data. 
Cistrome-Explorer is a web-based visualization of compendia of ATAC-seq and histone modification ChIP-seq data for diverse samples, represented as a heatmap. Users can upload their ChIP-seq peak sets to assess the tissue specificity of their regions on the genome. 18.7.7 Tools for making heatmaps deepTools is a widely used package for analyzing ChIP-seq data, and it includes a tool called “plotHeatmap” that can generate heatmaps from ChIP-seq data. Integrative Genomics Viewer (IGV) is a popular tool for visualizing and exploring genomic data. It includes a heatmap function that can be used to generate heatmaps from ChIP-seq data. EnrichedHeatmap is an R package for making heatmaps that visualize the enrichment of genomic signals on specific target regions. SeqMonk is a software package designed for the visualization and analysis of large-scale genomic data. It includes a heatmap function that can generate heatmaps from ChIP-seq data. ngs.plot is a tool that can generate different types of plots, including heatmaps, from NGS data. It includes a ChIP-seq specific mode that can be used to generate heatmaps from ChIP-seq data. ChAsE: ChAsE (ChIP-seq Analysis Engine) is a web-based platform for ChIP-seq analysis that includes a heatmap function that can generate heatmaps from ChIP-seq data. These tools allow users to generate heatmaps of ChIP-seq data, which can be used to identify enriched regions of binding and to visualize patterns of binding across genomic regions. The Cistrome Project has a large collection of human and mouse ChIP-seq, DNase-seq and ATAC-seq data, as well as tools for analyzing user-generated ChIP-seq data alongside publicly available samples. These tools include the Cistrome Data Browser toolkit function, which can find publicly available datasets that are similar to a ChIP-Seq peak set, and Cistrome-GO for gene ontology analysis of TF ChIP-seq target genes. 
18.8 More resources about ChiP-seq data &lt;TODO: Put links to any resources and tutorials that are useful for ChIP-Seq data&gt; Shirley Liu’s Computational biology course Galaxy ChIP-seq tutorial ENCODE ChiP-seq tutorial Crazyhottommy’s ChIp-seq tutorial Harvard CUT&amp;RUN tutorial 4DN CUT&amp;RUN tutorial Henikoff Lab CUT&amp;Tag tutorial ARCHS4 (All RNA-seq and ChIP-seq sample and signature search) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from GEO and SRA. UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with ChIP-seq data. Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. "],["cutrun-and-cuttag.html", "Chapter 19 CUT&amp;RUN and CUT&amp;Tag 19.1 Learning Objectives 19.2 Technologies 19.3 Advantages of CUT&amp;RUN and CUT&amp;Tag over the Traditional ChIP-seq Technology 19.4 Differences between CUT&amp;RUN and CUT&amp;Tag 19.5 Limitation of CUT&amp;RUN and CUT&amp;Tag 19.6 General Data Analysis Workflow 19.7 More resources about CUT&amp;RUN and CUT&amp;Tag data analysis", " Chapter 19 CUT&amp;RUN and CUT&amp;Tag This chapter is in a beta stage. If you wish to contribute, please go to this form or our GitHub page. 19.1 Learning Objectives 19.2 Technologies 19.3 Advantages of CUT&amp;RUN and CUT&amp;Tag over the Traditional ChIP-seq Technology Lower Cell Number and Less Starting Material Requirement: CUT&amp;RUN and CUT&amp;Tag can be performed with much lower cell number than ChIP-seq. This is particularly beneficial when working with rare cell types or limited biological samples. The CUT&amp;RUN and CUT&amp;Tag techniques involve less sample manipulation compared to ChIP-seq. This minimizes the risk of losing material and potential artifacts from extensive sample handling and processing. 
Higher Resolution and Specificity: CUT&amp;RUN and CUT&amp;Tag provide higher resolution and greater specificity in identifying protein-DNA interactions. This results from the methods’ direct targeting and cleavage of DNA at the binding sites, reducing background noise. Reduced Background Noise: CUT&amp;RUN and CUT&amp;Tag typically result in lower background noise due to the direct tagging of DNA at the site of the protein-DNA interaction, enhancing the clarity and quality of the results. The sensitivity of sequencing depends on the depth of the sequencing run (i.e., the number of mapped sequence tags), the size of the genome, and the distribution of the target factor. The sequencing depth is directly correlated with cost and negatively correlated with background. Therefore, low-background CUT&amp;RUN and CUT&amp;Tag waste fewer sequencing resources on profiling the background and hence are inherently more cost-effective than high-background ChIP-seq. Cost-Effectiveness: In addition to high efficiency in sequencing the target region, CUT&amp;RUN and CUT&amp;Tag require fewer reagents and enzymes and can therefore be more cost-effective, especially in high-throughput settings. More Efficient Protocol Workflow and Faster Turnaround Time: The protocol for CUT&amp;RUN and CUT&amp;Tag is more streamlined and less labor-intensive than ChIP-seq. It eliminates the need for sonication, DNA purification, and ligation steps, simplifying the procedure. The overall protocols of CUT&amp;RUN and CUT&amp;Tag are generally quicker and more straightforward than ChIP-seq, leading to faster experiment turnaround times. 19.3.1 CUT&amp;RUN Cleavage Under Targets and Release Using Nuclease, CUT&amp;RUN for short, is an antibody-targeted chromatin profiling method to measure the histone modification enrichment or transcription factor binding. 
This is a more advanced technology for epigenomic landscape profiling compared to the traditional ChIP-seq technology and is known for its easy implementation and low cost. The procedure is carried out in situ: micrococcal nuclease tethered to protein A binds to an antibody of choice and cuts immediately adjacent DNA, releasing DNA bound to the antibody target. Therefore, CUT&amp;RUN produces precise transcription factor or histone modification profiles while avoiding crosslinking and solubilization issues. Extremely low backgrounds make profiling possible with typically one-tenth of the sequencing depth required for ChIP-seq and permit profiling using low cell numbers (i.e., a few hundred cells) without losing quality. Publications: An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife. 2017 Targeted in situ genome-wide profiling with high efficiency for low cell numbers. Nature Protocols. 2018 Improved CUT&amp;RUN chromatin profiling tools. eLife. 2019 Protocols: CUT&amp;RUN: Targeted in situ genome-wide profiling with high efficiency for low cell numbers (Version 3) CUT&amp;RUN with Drosophila tissues (Version 1) 19.3.1.1 AutoCUT&amp;RUN CUT&amp;RUN has been automated using a Beckman Biomek FX liquid-handling robot so that a 96-well format can be used to profile chromatin for high-throughput samples, such as in a clinical setting. DNA end polishing and direct ligation of adapters permit sample-to-Illumina library processing of 96 samples in two days. AutoCUT&amp;RUN can be used for cell-type specific gene activity and enhancer profiling based on histone modifications and transcription factors, including in frozen tissue samples of tumor xenografts. Publication: Automated in situ chromatin profiling efficiently resolves cell types and gene regulatory programs. Epigenetics &amp; Chromatin. 
2018 Protocol: AutoCUT&amp;RUN: genome-wide profiling of chromatin proteins in a 96 well format on a Biomek (Version 1) 19.3.2 CUT&amp;Tag Cleavage Under Targets and Tagmentation, CUT&amp;Tag for short, is an enzyme tethering approach to profiling chromatin proteins, including histone marks and RNA Pol II. CUT&amp;Tag generates sequence-ready libraries without the need for end polishing and adapter ligation. It uses a proteinA-Tn5 fusion to tether Tn5 transposase near the site of an antibody to a chromatin protein of interest. A secondary antibody, such as guinea pig anti-rabbit antibody, is used to increase the efficiency of tethering the pA-Tn5 to the target primary antibody. The pA-Tn5 complex is pre-loaded with sequencing adapters that insert into adjacent DNA upon activation with magnesium. CUT&amp;Tag has a very low background and can be performed in a single tube in as little as a day, though primary antibodies are typically incubated overnight. It can also be used with the ICELL8 nano dispensation system to profile single cells. A streamlined CUT&amp;Tag protocol was introduced by the Henikoff Lab that suppresses DNA accessibility artifacts to ensure high-fidelity mapping of the antibody-targeted protein and improves the signal-to-noise ratio over current chromatin profiling methods. Streamlined CUT&amp;Tag can be performed in a single PCR tube, from cells to amplified libraries, providing low-cost genome-wide chromatin maps. By simplifying library preparation, CUT&amp;Tag-direct requires less than a day at the bench, from live cells to sequencing-ready barcoded libraries. As a result of low background levels, barcoded and pooled CUT&amp;Tag libraries can be sequenced for as little as $25 per sample. This enables routine genome-wide profiling of chromatin proteins and modifications and requires no special skills or equipment. Publication: CUT&amp;Tag for efficient epigenomic profiling of small samples and single cells. Nature Communications. 
2019 Efficient low-cost chromatin profiling with CUT&amp;Tag. Nature Protocols. 2020 Scalable single-cell profiling of chromatin modifications with sciCUT&amp;Tag. Nature Protocols. 2023 Protocol: Bench top CUT&amp;Tag (Version 3) 3XFlag-pATn5 Protein Purification and MEDS-loading (5x scale, 2L volume, Version 1) CUT&amp;Tag with Drosophila tissues (Version 1) 19.3.2.1 AutoCUT&amp;Tag CUT&amp;Tag has been automated using a Beckman Coulter Biomek FX liquid handling robot so that a 96-well format can be used to profile chromatin for high-throughput samples, such as in a clinical setting. AutoCUT&amp;Tag can be used to profile the gene targets of fusions of the KMT2A lysine methyltransferase to other chromatin proteins, which characterize lymphoid, myeloid, and mixed lineage leukemias, uncovering heterogeneities that may underlie lineage plasticity. Publication: Automated CUT&amp;Tag profiling of chromatin heterogeneity in mixed-lineage leukemia. Nature Genetics. 2021 Simplified Epigenome Profiling Using Antibody-tethered Tagmentation Epigenomic analysis of formalin-fixed paraffin-embedded samples by CUT&amp;Tag Protocol: AutoCUT&amp;Tag: streamlined genome-wide profiling of chromatin proteins on a liquid handling robot (Version 1) 19.3.2.2 CUTAC Cleavage Under Targeted Accessible Chromatin, CUTAC, for short, is a simple modification of the Tn5 transposase-mediated antibody-directed CUT&amp;Tag method that provides high-quality accessibility mapping in parallel with mapping of specific components of the chromatin landscape. Findings imply that regulatory sites detected by hyperaccessibility mapping are coupled to the initiation of RNA Polymerase II transcription via H3K4 methylation. CUTAC requires few resources and is sufficiently simple that it can be performed from nuclei to purified sequencing-ready libraries in single PCR tubes on a home workbench. Publication: Efficient chromatin accessibility mapping in situ by nucleosome-tethered tagmentation. eLife. 
2020 Protocol: CUT&amp;Tag-direct for whole cells with CUTAC (Version 4) 19.4 Differences between CUT&amp;RUN and CUT&amp;Tag CUT&amp;RUN is more suitable than CUT&amp;Tag for transcription factor (TF) profiling because the salt competes with TF binding to DNA during the high-salt incubation. A TF, depending on its motif affinity, binds only a few DNA base pairs, so TF binding can be weak and can be disrupted by salt. As demonstrated by Kaya-Okur et al. 2019, the CUT&amp;Tag signal of CTCF, one of the strongest binding factors, can be observed but becomes relatively weak. Therefore, it can be challenging for the peak caller to detect the enrichment of CTCF profiled by CUT&amp;Tag, and hence it can also be hard to find the motif pattern in practice. CUT&amp;Tag is more suitable for histone modification and RNA polymerase profiling because DNA wraps around the histones, and the RNA polymerase structure inserts into and grips the DNA. The DNA binding of both histone modification marks and Pol II is strong. CUT&amp;Tag for histone modification also showed moderately higher signals compared to CUT&amp;RUN throughout the list of sites in Kaya-Okur et al. 2019. CUT&amp;RUN must be followed by DNA end polishing and adapter ligation to prepare sequencing libraries, which increases the time, cost, and effort of the overall procedure. Moreover, the release of MNase-cleaved fragments into the supernatant with CUT&amp;RUN is not well-suited for application to single-cell platforms. 19.5 Limitation of CUT&amp;RUN and CUT&amp;Tag Dependency on Antibody Quality: Similar to ChIP-seq, the success of CUT&amp;RUN and CUT&amp;Tag relies heavily on the quality and specificity of the antibodies used. High-quality, highly specific antibodies are essential for reliable results, and the lack of such antibodies can limit the application of these techniques. 
Likelihood of Over-digestion of DNA: If the enzymatic cleavage step (the MNase digestion in CUT&amp;RUN or the magnesium-dependent Tn5 reaction in CUT&amp;Tag) is timed inappropriately, DNA can be over-cut. A similar limitation exists for contemporary ChIP-Seq protocols, where enzymatic or sonication-based DNA shearing must be optimized. GC Bias: For CUT&amp;Tag, as with other techniques using Tn5, the library preparation has a strong GC bias and has poor sensitivity in low GC regions or genomes with high variance in GC content. Not Suitable for All Epitopes: CUT&amp;RUN and CUT&amp;Tag may not work efficiently for all protein-DNA interactions, especially if the epitope recognized by the antibody is obscured or altered in the chromatin context. However, companies are testing antibodies thoroughly, so this issue is decreasing with time. Challenges in Detecting Low Abundance TFs: While CUT&amp;RUN and CUT&amp;Tag are more sensitive than ChIP-seq, they can still face challenges in detecting TFs present in very low abundance in the cell. 19.6 General Data Analysis Workflow CUT&amp;RUN and CUT&amp;Tag data analysis share a very similar strategy. Data analysis generally involves raw sequencing data alignment, quality control, normalization, peak calling, visualization, differential analysis, and other specific analyses for target scientific discoveries. A detailed data processing and analysis tutorial with reproducible code and demo data can be found at CUT&amp;Tag Data Processing and Analysis Tutorial. 19.6.1 Adapter Trimming If the read length is long, adapter trimming may be needed for more accurate alignment results. However, for CUT&amp;RUN and CUT&amp;Tag, if the read length is short (i.e., 25bp per end), the aligner can use a “soft-match” style algorithm to handle the remaining adapter at the end of the read. Therefore, adapter trimming is not necessary in that scenario. Cutadapt: Cutadapt finds and removes adapter sequences, primers, poly-A tails, and other types of unwanted sequences from your high-throughput sequencing reads. 
It can remove a wide range of adapter sequences and is not limited to Illumina-specific adapters. Users can specify multiple adapter sequences. Cutadapt supports quality trimming, though with less granularity than Trimmomatic. It can be used for both paired-end and single-end reads and allows for filtering based on length after trimming. For instance, for 50 bp paired-end reads from Illumina’s NextSeq 2000 machine, the adapters can be clipped by cutadapt 4.1 with the parameters: -j 8 --nextseq-trim 20 -m 20 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -Z Trimmomatic: A flexible trimmer for Illumina Sequence Data. It trims low-quality bases from the start and end of the reads and scans the read with a sliding window to trim based on average quality. Trimmomatic can also remove Illumina-specific adapters with an option to specify custom adapter sequences. It is known for its high precision and flexibility. It can handle paired-end and single-end data. 19.6.2 Alignment Bowtie2: Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100 characters to relatively large (e.g., mammalian) genomes. When aligning paired-end reads to the reference genome, filter and keep read pairs whose fragment lengths are between 10bp and 1000bp. Detailed recommended parameters can be found in the tutorial. For example, 50 bp paired-end reads from Illumina’s NextSeq 2000 machine can be aligned to the reference sequence by Bowtie2 version 2.4.4 with the parameters: --very-sensitive-local --soft-clipped-unmapped-tlen --dovetail --no-mixed --no-discordant -q --phred33 -I 10 -X 1000 BWA: BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. 
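To show how the Bowtie2 parameters quoted above fit into a complete paired-end command, here is a minimal Python sketch that only assembles and prints the command line. The index prefix, FASTQ file names, output name, and thread count are hypothetical placeholders rather than values from the tutorial, and the helper name bowtie2_command is made up for illustration; the -x, -1, -2, -S and -p options are standard Bowtie2 options for the index, the two mate files, the SAM output, and the number of threads.

```python
import shlex


def bowtie2_command(index_prefix, fastq_1, fastq_2, sam_out, threads=8):
    '''Assemble a Bowtie2 paired-end command using the parameters quoted above.

    All file names passed in are placeholders chosen by the caller.
    '''
    return [
        'bowtie2',
        '--very-sensitive-local',
        '--soft-clipped-unmapped-tlen',
        '--dovetail',
        '--no-mixed',
        '--no-discordant',
        '-q', '--phred33',
        '-I', '10',      # minimum fragment length kept (bp)
        '-X', '1000',    # maximum fragment length kept (bp)
        '-p', str(threads),
        '-x', index_prefix,
        '-1', fastq_1,
        '-2', fastq_2,
        '-S', sam_out,
    ]


# Hypothetical inputs; replace with real paths before running Bowtie2.
cmd = bowtie2_command('hg38_index', 'sample_R1.fastq.gz',
                      'sample_R2.fastq.gz', 'sample.sam')
print(shlex.join(cmd))
```

Keeping the command as a list of arguments means it can be passed to subprocess.run(cmd, check=True) without shell quoting problems once Bowtie2 and the input files are available.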
19.6.3 Quality control The quality of the aligned data can be evaluated from the following aspects: Sequencing depth: Check the number of reads mapped to the genome to see if it matches the expected sequencing depth. CUT&amp;RUN/CUT&amp;Tag data typically has very low backgrounds, so as few as 1 million mapped fragments can give robust profiles for a histone modification in the human genome. Alignment rate: Alignment frequencies are expected to be &gt;80% for high-quality data. Duplication rate: Duplication rate is the percentage of duplicated reads, and Picard is widely used to detect duplicates. PCR duplicates are reads with the same start and end coordinates that are not biological duplicates; they are created during library amplification. Generally, the duplication rate is expected to be &lt;20% for high-quality data. However, as long as the duplication rate is lower than 80-90%, meaning the sequencing is not completely saturated, duplicates should be kept for downstream analysis. Even for samples with a relatively high duplication rate (e.g., 50%), PCR duplicates tend to occur more in the signal regions, so removing duplicates will favor the background noise. In other words, keeping the duplicates can help locate the peak regions. When the sequencing depth is not saturated, the duplication rate is linearly correlated with the sequencing depth. Therefore, normalization that removes the sequencing depth variations across samples can take care of the duplication rate simultaneously. Estimated library size: Estimated library size is the estimated number of unique molecules in the library based on PE duplication calculated by Picard. The estimated library sizes are proportional to the abundance of the targeted epitope and the quality of the antibody used, while the estimated library sizes of IgG samples are expected to be very low. 
Suppose users follow the sequencing depth tradition for ChIP-seq data and sequence 100+ million reads but end up with an estimated library size of only 1-2 million. The duplication rate is then expected to be ultra-high: the sequencing depth is too high, and the sequencing is saturated. In that case, duplicates are expected to be removed for downstream analysis. Fragment length distribution: CUT&amp;RUN and CUT&amp;Tag targeting a histone modification predominantly result in nucleosomal fragments (~180 bp) or multiples of that length. Therefore, the fragment length density distribution usually has several peaks whose modes are 180 bp apart, matching the nucleosomal length. CUT&amp;RUN/CUT&amp;Tag targeting transcription factors predominantly produce nucleosome-sized fragments and variable amounts of shorter fragments from neighboring nucleosomes and the factor-bound site, respectively. Moreover, tagmentation of DNA on the surface of nucleosomes also occurs, and plotting fragment length distribution with single-basepair resolution reveals a 10-bp sawtooth periodicity, which is typical of successful CUT&amp;Tag experiments. Such 10 bp periodic cleavage preferences match the 10 bp/turn periodicity of B-form DNA, which suggests that the DNA on either side of these bound TFs is spatially oriented such that tethered MNase has preferential access to one face of the DNA double helix. The presence of this 10 bp periodicity is a good indicator that the experiment has specifically targeted nucleosomal DNA or proteins in close association with it. If this pattern is absent, it might suggest non-specific binding or other technical issues. 19.6.4 Normalization 19.6.4.1 Spike-in Scaling E. coli DNA is carried along with bacterially-produced pA-Tn5 protein and gets tagmented non-specifically during the reaction. 
The fraction of total reads that map to the E. coli genome depends on the yield of epitope-targeted CUT&amp;Tag, and so depends on the number of cells used and the abundance of that epitope in chromatin. Since a constant amount of pA-Tn5 is added to CUT&amp;Tag reactions and brings along a fixed amount of E. coli DNA, E. coli reads can be used to normalize epitope abundance across experiments. The underlying assumption is that the ratio of fragments mapped to the primary genome to the E. coli genome (or other added DNA sequences if pA-Tn5 is purified and E. coli DNA is no longer available) is the same for a series of samples, each using the same number of cells. Because of this assumption, we do not normalize between experiments or batches of pA-Tn5, which can have very different amounts of carry-over E. coli DNA. Using a constant C to avoid small fractions in normalized data, we define a scaling factor S as \\(S = \\frac{C}{\\text{Fragments Mapped To E. coli Genome}}\\) \\(\\text{Normalized Coverage} = \\text{Primary Genome Coverage} \\times S\\) The scaling can be done using the bedtools genomecov function with the “-scale” parameter. 19.6.4.2 Sequencing depth and coverage normalization Without a spike-in, normalization to eliminate the sequencing depth and coverage variations can be done by the following formula: \\(\\text{Normalized Count} = \\frac{\\text{Raw Count}}{\\text{Sum of Fragments Coverage}} \\times \\text{Genome Size}\\) Here, Sum of Fragments Coverage is the sum of all fragment lengths, so it captures both the sequencing depth and coverage information. Note that only fragments between 1 bp and 1000 bp long are considered. 19.6.5 Peak Calling 19.6.5.1 SEACR The Sparse Enrichment Analysis for CUT&amp;RUN, SEACR for short, is an R package designed to call peaks and enriched regions from chromatin profiling data with very low backgrounds (i.e., regions with no read coverage) that are typical for CUT&amp;Tag chromatin profiling experiments. 
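Before going further into peak calling, the two normalization formulas from section 19.6.4 can be made concrete with a short sketch. It is illustrative only: the function names are hypothetical, coverage and counts are plain Python lists rather than bedGraph files, and the default constant C = 10000 is an arbitrary example value, not one prescribed by the text.

```python
def spike_in_scale_factor(ecoli_fragments, constant=10000):
    # S = C / (fragments mapped to the E. coli genome).
    # The default C is an arbitrary example value.
    if ecoli_fragments <= 0:
        raise ValueError('need at least one E. coli fragment')
    return constant / ecoli_fragments


def spike_in_normalize(coverage, ecoli_fragments, constant=10000):
    # Normalized coverage = primary genome coverage * S.
    scale = spike_in_scale_factor(ecoli_fragments, constant)
    return [value * scale for value in coverage]


def depth_normalize(raw_counts, fragment_lengths, genome_size,
                    min_len=1, max_len=1000):
    # Normalized count = raw count / sum of fragment coverage * genome size,
    # where only fragments between min_len and max_len bp are counted.
    kept = [n for n in fragment_lengths if min_len <= n <= max_len]
    total_coverage = sum(kept)
    if total_coverage == 0:
        raise ValueError('no fragments within the allowed length range')
    return [count * genome_size / total_coverage for count in raw_counts]


if __name__ == '__main__':
    # Toy numbers, for illustration only.
    print(spike_in_scale_factor(5000))
    print(spike_in_normalize([1, 2, 3], 5000))
    print(depth_normalize([2, 4], [100, 200, 1500], 600))
```

In practice the scale factor computed this way would be the value passed to the scaling option of the coverage tool, with one factor per sample within a single experiment and pA-Tn5 batch.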
SEACR requires bedGraph files from paired-end sequencing as input and defines peaks as contiguous blocks of basepair coverage that do not overlap with blocks of background signal delineated in the IgG control dataset. If IgG control is available, use the IgG sample as the “control sample” and choose the “norm stringent” setting. If IgG is unavailable, users can use the “top *% peaks” setting by providing only the target marker sample. Web server: Peak calling by Sparse Enrichment Analysis for CUT&amp;RUN (SEACR) Web Interface. 19.6.5.2 MACS2 The Model-based Analysis of ChIP-Seq version 2, MACS2 for short, is widely used for identifying transcription factor binding sites and histone modification regions in ChIP-Seq data. MACS2 has been widely adopted to analyze CUT&amp;RUN/CUT&amp;Tag data. Installation details can be found at https://github.com/taoliu/MACS/wiki. 19.6.5.3 SEACR vs MACS2 SEACR is better suited for datasets with broad signal enrichment, such as H3K27me3, where peaks are broader and can continuously cover a large genomic region. MACS2 excels in datasets with sharp peaks, such as H3K4me3, where peaks are concentrated and isolated from the background and adjacent peaks. SEACR uses a straightforward thresholding approach, which can be more intuitive but may miss some nuances in the data. MACS2 uses a more complex statistical model to identify peaks, offering potentially greater accuracy but at the cost of computational complexity. SEACR offers more flexibility in handling different types of CUT&amp;RUN/CUT&amp;Tag data, especially when control samples are absent or of low quality. MACS2 generally requires high-quality control samples for best performance and is less flexible in this regard. 19.6.5.4 FRagment proportion in Peaks regions (FRiPs) Fragment proportion in Peak Regions, FRiPs for short, is also a critical signal-to-noise measurement. 
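As a rough illustration of how a FRiP score can be computed, the sketch below counts the fraction of fragments that overlap at least one called peak. The data layout (tuples of chromosome, start, end with half-open coordinates), the one-base-pair overlap criterion, and the function names are assumptions made for this example; real pipelines typically work on BED or BAM files with dedicated interval tools.

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    # Half-open intervals [start, end) overlap if each starts before the other ends.
    return a_start < b_end and b_start < a_end


def frip(fragments, peaks):
    # fragments and peaks are lists of (chrom, start, end) tuples.
    if not fragments:
        return 0.0
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))
    in_peaks = 0
    for chrom, start, end in fragments:
        chrom_peaks = peaks_by_chrom.get(chrom, [])
        if any(intervals_overlap(start, end, p_start, p_end)
               for p_start, p_end in chrom_peaks):
            in_peaks += 1
    return in_peaks / len(fragments)


if __name__ == '__main__':
    # Toy data: two of the four fragments fall in peaks.
    toy_fragments = [('chr1', 100, 250), ('chr1', 900, 1000),
                     ('chr2', 50, 200), ('chr2', 500, 650)]
    toy_peaks = [('chr1', 200, 400), ('chr2', 0, 100)]
    print(frip(toy_fragments, toy_peaks))
```

The nested scan is quadratic in the worst case, which is fine for a toy example but too slow for millions of fragments; sorted intervals or an interval tree would be the usual replacement.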
Although sequencing depths for CUT&amp;Tag are typically only 1-5 million reads, the low background of the method usually results in high FRiP scores. In other words, it measures the percentage of sequencing resources accurately allocated to the target epitope regions. Note that the number of peaks and FRiPs typically increase with the sequencing depth and mappable fragment number; therefore comparisons should be done by downsampling samples to the same number of fragments. For an example, see the comparison across technologies in Figure 5A of “Efficient chromatin accessibility mapping in situ by nucleosome-tethered tagmentation”. 19.6.6 Visualization Integrative Genomics Viewer: IGV visualizes the chromatin landscape in regions using a genome browser. It provides a web app version and a local desktop version that is easy to use. UCSC Genome Browser: UCSC Genome Browser provides the most comprehensive supplementary genome information. deepTools: deepTools is a suite of Python tools particularly developed for efficiently analyzing high-throughput sequencing data. It is particularly helpful to check chromatin features at a list of annotated sites. For example, we can use it to check the histone modification enrichment/absence signals around transcription starting sites or the peak center. We can use the “computeMatrix” and “plotHeatmap” functions from deepTools to generate a heatmap of these signals. 19.6.7 Differential Analysis chromVAR - getCounts. The “getCounts” function in the chromVAR R package can convert an aligned bam file into a region by sample matrix, where the region can be genomic binning or peaks. The differential detection analysis can be performed on the region by sample matrix. DESeq2: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. DESeq2 estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution. 
DESeq2 can also be utilized to detect the differentially enriched region using the region by sample matrix from the CUT&amp;RUN/CUT&amp;Tag data. Limma: limma powers differential expression analyses for RNA-sequencing and microarray studies. Limma is an R package for analyzing gene expression microarray data, especially using linear models for analyzing designed experiments and assessing differential expression. Limma provides the ability to analyze comparisons between many RNA targets simultaneously in arbitrary, complicated designed experiments. Empirical Bayesian methods are used to provide stable results even when the number of arrays is small. Limma can be extended to study differential fragment enrichment analysis within peak regions. Notably, limma can deal with both the fixed effect model and random effect model. edgeR: Differential Expression Analysis of Multifactor RNA-Seq Experiments With Respect to Biological Variation. edgeR performs differential expression analysis of RNA-seq expression profiles with biological replication. It implements a range of statistical methodologies based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, and quasi-likelihood tests. As well as RNA-seq, it is applied to the differential signal analysis of other types of genomic data that produce read counts, including CUT&amp;RUN/CUT&amp;Tag, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, and CAGE. edgeR can deal with multifactor problems. 19.7 More resources about CUT&amp;RUN and CUT&amp;Tag data analysis CUT&amp;RUNTools: a flexible pipeline for CUT&amp;RUN processing and footprint analysis. CUT&amp;RUNTools is a flexible and general pipeline for facilitating the identification of chromatin-associated protein binding and genomic footprinting analysis from antibody-targeted CUT&amp;RUN primary cleavage data. 
CUT&amp;RUNTools extracts endonuclease cut site information from sequences of short-read fragments and produces single-locus binding estimates, aggregate motif footprints, and informative visualizations to support the high-resolution mapping capability of CUT&amp;RUN. CUT&amp;RUNTools 2.0: a pipeline for single-cell and bulk-level CUT&amp;RUN and CUT&amp;Tag data analysis. CUT&amp;RUNTools 2.0 is a major update of CUT&amp;RUNTools, including a set of new features specially designed for CUT&amp;RUN and CUT&amp;Tag experiments. Both the bulk and single-cell data can be processed, analyzed, and interpreted using CUT&amp;RUNTools 2.0. Nextflow Analysis Pipeline for CUT&amp;RUN and CUT&amp;TAG Experiments: nf-core/cutandrun is a best-practice bioinformatic analysis pipeline for CUT&amp;RUN, CUT&amp;Tag, and TIPseq experimental protocols that were developed to study protein-DNA interactions and epigenomic profiling. GoPeaks: histone modification peak calling for CUT&amp;Tag. GoPeaks is a peak caller designed for CUT&amp;TAG/CUT&amp;RUN sequencing data. GoPeaks, by default, works best with narrow peaks such as H3K4me3 and transcription factors. However, broad epigenetic marks like H3K27Ac/H3K4me1 require different step, slide, and minwidth parameters. "],["dna-methylation-sequencing.html", "Chapter 20 DNA Methylation Sequencing 20.1 Learning Objectives 20.2 What are the goals of analyzing DNA methylation? 20.3 Methylation data considerations 20.4 Methylation data workflow 20.5 Methylation Tools Pros and Cons 20.6 More resources", " Chapter 20 DNA Methylation Sequencing This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. 20.1 Learning Objectives 20.2 What are the goals of analyzing DNA methylation? To detect methylated cytosines (5mC), DNA samples are prepped using bisulfite (BS) conversion. This converts unmethylated cytosines into uracils and leaves methylated cytosines untouched. 
Probes are then designed to bind to either the uracil or the cytosine, representing the unmethylated and methylated cytosines respectively. For a given sample, you will obtain a fraction, known as the Beta value, that indicates the relative abundance of the methylated and unmethylated versions of the sequence. Beta values therefore lie on a scale of 0 to 1, where 0 indicates that none of this particular base is methylated in the sample and 1 indicates that all are methylated. Note that bisulfite conversion alone will not distinguish between 5mC and 5hmC, though these often indicate different biological mechanisms. 5-hydroxymethylated cytosines (5hmC) can be detected with the help of oxidative bisulfite sequencing (oxBS) (Booth et al. 2013). Standard bisulfite conversion measures 5mC and 5hmC together, whereas oxidative bisulfite conversion measures 5mC alone. If you want to identify 5hmC bases you either have to pair oxBS data with BS data OR you have to use Tet-assisted bisulfite (TAB) sequencing, which will exclusively tag 5hmC bases (Yu et al. 2012). 20.3 Methylation data considerations 20.3.1 Beta values binomially distributed Because beta values are a ratio, by their nature, they are not normally distributed data and should be treated appropriately. This means data models (like those used by the limma package) built for RNA-seq data should not be used on methylation data. More accurately, Beta values follow a binomial distribution. This generally involves applying a generalized linear model. 20.3.2 Measuring 5mC and/or 5hmC If your questions involve both 5mC and 5hmC, you will have separate sequencing datasets for each sample, one from BS processing and one from oxBS processing. 5mC is often a step toward 5hmC conversion and therefore the 5mC and 5hmC measurements are, by nature, not independent from each other. In theory, 5mC, 5hmC and unmethylated cytosines should add up to 1. 
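To make the Beta value and the add-up-to-1 relationship concrete, here is a minimal sketch. It assumes the usual interpretation that the BS signal reflects 5mC plus 5hmC while the oxBS signal reflects 5mC alone, so that 5hmC is estimated by naive subtraction; the function names and input numbers are hypothetical and this is not the modeling approach of any particular package.

```python
def beta_value(methylated, unmethylated):
    # Beta = methylated signal / (methylated + unmethylated signal).
    total = methylated + unmethylated
    if total == 0:
        raise ValueError('no signal at this site')
    return methylated / total


def partition_cytosine_states(bs_beta, oxbs_beta):
    # Assumption: BS beta = 5mC + 5hmC, oxBS beta = 5mC only.
    five_mc = oxbs_beta
    five_hmc = bs_beta - oxbs_beta   # naive estimate; may be negative with noisy data
    unmethylated = 1.0 - bs_beta
    return five_mc, five_hmc, unmethylated


if __name__ == '__main__':
    # Toy read counts, for illustration only.
    bs = beta_value(80, 20)
    oxbs = beta_value(50, 50)
    print(partition_cytosine_states(bs, oxbs))
```

Because the three proportions are tied together by construction, an error in either measurement propagates into the others, and measurement noise can even push the naive 5hmC estimate below zero.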
Because these measurements are not independent, it’s been proposed that the most appropriate way to model these data is to combine them together in a model (Kochmanski, Savonen, and Bernstein 2019). 20.4 Methylation data workflow Like other sequencing methods, you will first need to start with quality control checks. Next, you will also need to align your sequences to the genome. Then, using the base calls, you will need to make methylation calls: which bases are methylated and which are not. The details of this step depend on whether you are making 5mC and/or 5hmC methylation calls. Lastly, you will likely want to use your methylation calls as a whole to identify differentially methylated regions of interest. 20.5 Methylation Tools Pros and Cons The following pros and cons sections have been written by AI and may need verification by experts. They are meant to give you a basic idea of the pros and cons of these tools but should ultimately be used with your own judgment. 20.5.1 Quality control: FastQC: A popular tool for evaluating the quality of sequencing reads, generating various quality control plots and statistics. It is fast, easy to use and has a simple user interface (Andrews, n.d.). Pros: Fast and easy to use. Very commonly used. Provides various quality control metrics and plots. Can generate reports that can be easily shared with collaborators. Cons: Does not perform any trimming or filtering of low-quality reads. Not specifically designed for bisulfite sequencing data. Trim Galore!: A wrapper tool for Cutadapt and FastQC that provides a simple way to trim adapters and low-quality reads. It also has built-in support for bisulfite sequencing data (Krueger and Andrews, n.d.). Pros: Easy to use, with a simple command line interface. Automatically trims adapters and low-quality reads. Specifically designed for bisulfite sequencing data. Cons: Limited flexibility in terms of the trimming and filtering options. 
Does not provide quality control metrics or plots. 20.5.2 Analysis: Bismark: A widely used tool for aligning bisulfite sequencing reads to a reference genome. It allows for paired-end and single-end reads, provides many options for handling sequencing errors and can output methylation calls in various formats (Liu et al. 2019). Pros: Performs alignment, quantification and methylation calling in a single tool. Can output methylation calls in various formats. Provides many options for handling sequencing errors and optimizing methylation calling parameters. Cons: Can be computationally intensive for large datasets. Requires a pre-built bisulfite-converted reference genome. Bowtie2: A fast and efficient aligner that can be used for bisulfite sequencing data, and can align reads to bisulfite-converted genomes or to an unconverted genome with a pre-built bisulfite index (Langmead and Salzberg 2012). Pros: Very fast and efficient, making it suitable for large datasets. Can align reads to either a bisulfite-converted genome or to an unconverted genome with a pre-built bisulfite index. Provides options for handling sequencing errors and optimizing alignment parameters. Cons: Does not perform methylation calling or quantification. 20.5.3 Methylation calling: Bismark: As well as performing alignment, Bismark can also be used to call methylation from aligned reads. It reports the percentage of cytosines methylated at each site (Liu et al. 2019). Pros: Performs both alignment and methylation calling in a single tool. Can output methylation calls in various formats. Provides many options for handling sequencing errors and optimizing methylation calling parameters. Cons: Can be computationally intensive for large datasets. Requires a pre-built bisulfite-converted reference genome. MethylDackel: A fast and efficient tool for methylation calling from bisulfite sequencing data. It can output methylation calls in various formats, including a methylation bedGraph. 
Pros: Very fast and efficient, making it suitable for large datasets. Provides options for handling sequencing errors and optimizing methylation calling parameters. Can output methylation calls in various formats, including a methylation bedGraph. Cons: Does not perform alignment or methylation quantification. 20.5.4 Methylation quantification: MethylKit: A popular tool for quantifying methylation levels from bisulfite sequencing data. It can handle various types of data and provides options for filtering out low-quality data and detecting differentially methylated regions (Akalin et al. 2012). Pros: Provides various options for filtering out low-quality data and detecting differentially methylated regions. Can handle various types of data, including bisulfite sequencing and reduced representation bisulfite sequencing. Provides many visualization tools for analyzing methylation data. Cons: Can be computationally intensive for large datasets. Requires some knowledge of the R programming language to use effectively. Bismark: As well as methylation calling, Bismark can also quantify methylation levels at each cytosine site. It reports the number of methylated and unmethylated reads, as well as the percentage of methylation (Liu et al. 2019). 20.5.5 Analysis: DSS: A popular tool for identifying differentially methylated regions (DMRs) between groups of samples. It uses a statistical model to detect significant changes in methylation levels and reports DMRs with associated p-values (Feng and Conneely 2016). Pros: Uses a statistical model to identify differentially methylated regions between groups of samples. Provides various options for controlling false discovery rate and adjusting for multiple comparisons. Suitable for large datasets. Cons: Requires some knowledge of statistical methods and a programming language to use effectively. May not be suitable for smaller datasets or datasets with low coverage. 
MethylKit: As well as methylation quantification, MethylKit can also be used for downstream analysis, such as clustering samples based on methylation patterns and performing functional annotation of differentially methylated regions (Akalin et al. 2012). 20.6 More resources DNA methylation analysis with Galaxy tutorial The mint pipeline for analyzing methylation and hydroxymethylation data. Book chapter about finding methylation regions of interest References "],["microbiome-sequencing.html", "Chapter 21 Microbiome Sequencing 21.1 Learning Objectives 21.2 A Brief Introduction to Microbiomes 21.3 Goals of Amplicon analysis 21.4 Microbiome Analysis with QIIME 2", " Chapter 21 Microbiome Sequencing This chapter is incomplete! If you wish to contribute, please go to this form or our GitHub page. 21.1 Learning Objectives 21.2 A Brief Introduction to Microbiomes Microbes are everywhere. We have found these tiny organisms in the deepest regions of the ocean and in the upper atmosphere. We have found them in: + water that has been solid ice for millennia in the Antarctic + boiling water in the geysers of Yellowstone National Park. + the driest natural environments on Earth, including the Atacama Desert in Chile, where desiccation resistant microbes hide in the soil sometimes waiting ten years for the drop of rain that will jump start their metabolism long enough for them to reproduce before they return to dormancy. + perpetually damp environments, like the intestinal tract of the human body where they are constantly the subject of inspection by our diligent immune cells, and where they impact our health in positive and negative ways that we are only beginning to understand. + our nuclear reactors, prompting questions about whether we could harness them as tiny machines to help us remediate environmental disasters of the past, present, and future. 
If we looked hard enough, I think we’d find them on the surface of the moon and Mars, though they are probably microbes who stowed away on our spacecraft and are now patiently waiting for a drop of water that may or may not ever show up. If we ever colonize those worlds, microbes will be an indispensable ally in creating an environment that could sustain us. This figure is adapted from (Tignat-Perrier et al. 2022) under Creative Commons license. Microbes almost never live alone in the real world (i.e., outside of a laboratory). Rather they exist in communities of different species who are interacting with each other and their environment. Some of these communities will have many different types of organisms, and some will have only a few. Because of the large number of species and individuals involved, no two communities will ever be exactly alike, and quantifying differences between microbial communities is an important area of research at the moment. The types of interactions between organisms are also highly varied. These can include mutualistic relationships, where both organisms benefit from the interaction; parasitic relationships, where one organism exclusively benefits to the detriment of the other; and the full gradient in between. Microbiome science is everywhere. There are tens of articles published daily in the scientific literature, and many popular science articles and books present these findings to the world of non-scientists. Understanding the promises and limitations of the methods of microbiome science can help avoid misconceptions about microbiome research, and it’s important for practitioners of microbiome science to understand and convey the promise and limitations of our field. Misconceptions abound, frequently arising from the same sources as high-quality popular science microbiome reporting. 
For example, on 5 Feb 2015 an article appeared in the New York Times noting (almost offhand) that Yersinia pestis, the organism responsible for Bubonic plague, had been found in multiple locations throughout the New York City subway system as part of its normal built environment microbiome. This was rapidly followed up on 6 Feb 2015 with an article noting that there was probably not Bubonic plague on the subway system after all, but rather that the approaches used by the research team are limited in their taxonomic resolution, and that likely a harmless close relative of Y. pestis was observed: “What the researchers probably found, [a spokesman for the university where the study originated] said, was bacteria from an unknown species or from organisms that happened to share some gene sequences with the plague bacterium…”. As microbiome services and products are increasingly marketed directly to the public, consumers of microbiome research findings, products, and services need to know how to critically evaluate these offerings and their associated claims. As practitioners in the field, we can help by ensuring that the methods we apply are appropriate and reliable, and that we make our work accessible. 21.3 Goals of Amplicon analysis The technologies that are enabling work in microbiome science are the same that are driving the data revolution in biology. Primarily this work is driven by high-throughput DNA sequencing, which is applied for profiling microbial community composition: marker gene profiling (such as 16S or ITS sequencing) functional potential (such as shotgun metagenomic sequencing) functional activity (such as metatranscriptome sequencing) Other “omics” technologies are now playing an increasing role in microbiome research, such as: mass-spectrometry-based metabolomics, which provides profiles of small molecule metabolites in an environment. 
metaproteomics which provides more detailed descriptions of functional activities of microbes (and their hosts, if applicable). As a result, bioinformatics software tools are essential to microbiome research. For many microbiome researchers, bioinformatics is an intimidating and challenging aspect of their projects. 21.4 Microbiome Analysis with QIIME 2 QIIME 2 is an all in one bioinformatics microbiome analysis platform. This platform allows for users to go from sequenced microbiome data to publication ready visualizations. The original QIIME, now referred to as QIIME 1, was published in 2010 (Caporaso et al. 2010) and has been cited tens of thousands of times in the primary literature. QIIME 2, which was published in July of 2019 (Bolyen et al. 2019), succeeded QIIME 1 on 1 January 2018. QIIME 2 is better than QIIME 1 in all ways, and QIIME 1 is no longer actively supported. If you have previously used QIIME 1, you should invest time in learning and switching to QIIME 2. If you’re new to QIIME, start with QIIME 2. (When I refer to QIIME in this book, without specifying whether I’m referring to QIIME 1 or QIIME 2, I’m referring to the platform generally.) QIIME 2 has large and growing user and developer communities, and these communities make QIIME 2 possible. The epicenter of the community is the QIIME 2 Forum. The forum is primarily known as a place where users can get technical support with QIIME 2 for no charge. Developers of QIIME 2 moderate the forum, and typically respond to technical support questions within a couple of business days. The forum is also a great place to discuss general topics in microbiome bioinformatics, or microbiome research methods generally. There are many active discussions on these topics on the forum. Keeping up with the discussions on the forum is a great way to learn about current topics in microbiome research methods. 
There’s also a free job board on the forum - you can use the forum to find jobs, or post your own job ads there to find employees who are well-versed in QIIME 2 and other bioinformatics tools. If you’re not already a member of the QIIME 2 Forum, you should consider joining. It’s a great way for you to get help, and as you develop your QIIME 2 skills helping others on the forum is a great way to reenforce your learning and to get involved in the community. Here is a high-level introduction to microbiome analysis using QIIME 2. This introduction will go over common methods, metrics and approaches used for microbiome science. So grab a cup of your favorite hot beverage and let’s get started! ☕ References "],["itcr--omic-tool-glossary.html", "Chapter 22 ITCR -omic Tool Glossary 22.1 ARCHS4 22.2 Bioconductor 22.3 Cancer Models 22.4 CIViC 22.5 CTAT 22.6 DeepPhe 22.7 Genetic Cancer Risk Detector (GARDE) 22.8 GenePattern 22.9 Gene Set Enrichment Analysis (GSEA) 22.10 Integrative Genomics Viewer (IGV) 22.11 NDEx 22.12 MultiAssayExperiment 22.13 OpenCRAVAT 22.14 pVACtools 22.15 TumorDecon 22.16 WebMeV 22.17 Xena", " Chapter 22 ITCR -omic Tool Glossary Here’s all the tools that have been mentioned in this course or are otherwise recommended for your use. The list is in alphabetical order. ARCHS4 Bioconductor Notable Bioconductor genomics tools: Cancer Models CIViC CTAT DeepPhe Genetic Cancer Risk Detector (GARDE) GenePattern Gene Set Enrichment Analysis (GSEA) Integrative Genomics Viewer (IGV) NDEx MultiAssayExperiment OpenCRAVAT pVACtools TumorDecon WebMeV Xena 22.1 ARCHS4 All RNA-seq and ChIP-seq sample and signature search (ARCHS4) (https://maayanlab.cloud/archs4/) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from GEO and SRA. 
The ARCHS4 website provides the uniformly processed data for download and programmatic access in H5 format, and as a 3-dimensional interactive viewer and search engine. Users can search and browse the data by metadata enhanced annotations, and can submit their own gene sets for search. Subsets of selected samples can be downloaded as a tab delimited text file that is ready for loading into the R programming environment. To generate the ARCHS4 resource, the kallisto aligner is applied in an efficient parallelized cloud infrastructure. Human and mouse samples are aligned against the most recent Ensembl annotation (Ensembl 107). 22.2 Bioconductor The mission of the Bioconductor project is to develop, support, and disseminate free open source software that facilitates rigorous and reproducible analysis of data from current and emerging biological assays. We are dedicated to building a diverse, collaborative, and welcoming community of developers and data scientists. Bioconductor uses the R statistical programming language, and is open source and open development. It has two releases each year, and an active user community. Bioconductor is also available as Docker images. 22.2.1 Notable Bioconductor genomics tools: annotatr ensembldb GenomicRanges - useful for manipulating and identifying sequences. GO.db - Gene ontology annotation org.Hs.eg.db Rsamtools A full list of Bioconductor’s annotation packages - contains annotation for all kinds of species and versions of genomes and transcriptomes. ComplexHeatmap MultiAssayExperiment limma DESeq2 edgeR curatedTCGAData cBioPortalData SingleCellMultiModal 22.3 Cancer Models Patient Derived Cancer Models Finder (www.cancermodels.org) is a cancer research platform that aggregates clinical, genomic and functional data from patient-derived xenografts, organoids and cell lines. The PDCM Finder standardises, harmonises and integrates the complex and diverse data associated with PDCMs for the cancer community. 
Data types used are model metadata and related clinical metadata from the sample from which the model was derived (e.g., molecular and treatment-based data). Data are preprocessed, consistently semantically annotated, harmonised and FAIR. PDCM Finder contains &gt;6200 models across 13 cancer types, including rare pediatric models (17%) and models from minority ethnic backgrounds (33%), making it the largest free-to-consumer, open-access resource of this kind. Get started at www.cancermodels.org to browse and query models by cancer type. 22.4 CIViC CIViC is a knowledgebase and curation interface for the clinical interpretation of variants in cancer. Evidence is curated from published literature describing the diagnostic, prognostic, predictive, predisposing, oncogenic, or functional role of variants in specific cancer types. Evidence submitted by community curators is revised and moderated by expert editors. Individual evidence is synthesized into gene summaries, variant summaries and variant-disease assertions of specific clinical relevance. Anyone can make use of CIViC knowledge through the open web interface or API. Information on how to use or contribute to CIViC is available in our help docs (docs.civicdb.org). The main distinguishing feature of CIViC compared to similar resources is its total commitment to open data sharing. All data are available in the Public Domain (CC0). The code is available for any use under an MIT license. 22.5 CTAT The Trinity Cancer Transcriptome Analysis Toolkit (CTAT) provides a diverse collection of tools to gain insights into the biology of cancer through the lens of the transcriptome. Using RNA-seq as input, CTAT modules enable detection of mutations, fusion transcripts, copy number aberrations, cancer-specific splicing aberrations, and oncogenic viruses including insertions into the human genome. CTAT uses both read mapping and de novo assembly methods to analyze RNA-seq, leveraging tumor bulk and single cell transcriptomes. 
CTAT modules provide interactive visualizations as outputs, are easily installed for local execution or run via cloud computing (e.g., Terra), have detailed user guides and tutorials, and are well-supported through user forums. 22.6 DeepPhe DeepPhe: Natural Language Processing Tools for Cancer Research Under development since 2014, the DeepPhe suite of software tools aims to extract deep phenotype information from the Electronic Medical Records of patients with cancer. DeepPhe combines: multiple natural language processing (NLP) techniques based on cTAKES, a structured cancer information model including concepts from the NCIT and the HemOnc ontology, a graph data model supporting persistence of extracted details including links between patient data enabling semantically informed interpretation, aggregation, and disaggregation of key attributes, visual analytics tools supporting patient- and cohort-level displays of extracted data including identification of patients matching key research criteria and the examination of individual patient records such as exploration of links between summary items and supporting text mentions, and multiple strategies for use, including containerized REST services and GUIs for installation and pipeline execution. DeepPhe tools are available for download and installation from the DeepPhe website under an open-source license for non-commercial use. 22.7 Genetic Cancer Risk Detector (GARDE) Genetic Cancer Risk Detector (GARDE) screens and identifies patients who meet National Comprehensive Cancer Network (NCCN) criteria for genetic evaluation of familial cancer risk based on their family history in the EHR using both structured data and natural language processing of free-text data. 
Patients identified by GARDE are imported into an EHR’s population health management dashboard (e.g., Epic’s Healthy Planet module) where genetic counseling staff review individual cases, select, and send bulk outreach messages to patients via chatbot and/or through the patient portal. GARDE is a population clinical decision support (CDS) platform based on Fast Healthcare Interoperability Resources (FHIR) and CDS Hooks standards to support interoperability and logic sharing beyond single vendor solutions. 22.8 GenePattern GenePattern, www.genepattern.org, is an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data. Analyses include general machine learning methods, the gene set enrichment analysis suite, ’omics-specific tools for bulk and single-cell gene expression, proteomics, flow cytometry, variant annotation, sequence variation and others, as well as cancer-specific analyses. Also included are data preprocessing and utility tools. A web-based interface provides easy, non-programmatic access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research. The GenePattern Notebook interface, notebook.genepattern.org, extends the Jupyter Notebook system to allow users to combine GenePattern analyses with text, graphics, and code to create complete research narratives. It includes many additional features to make notebooks accessible to non-programmers. The online GenePattern Notebook Workspace allows investigators to create, run, and collaborate on notebooks using only a web browser. A library of GenePattern Notebooks implementing common scientific workflows is available for investigators to use as templates and adapt to their own requirements. To get started with GenePattern you can go through the GenePattern Quick Start Tutorial, view the GenePattern User Guide, or the videos on our YouTube channel. 
To learn more about GenePattern Notebook, view the GenePattern Notebook Quick Start, GenePattern Notebook documentation, run through the tutorial notebooks (click the Tutorial button), or view the videos on the GenePattern Notebooks YouTube channel. 22.9 Gene Set Enrichment Analysis (GSEA) Gene Set Enrichment Analysis (GSEA) is a method to identify the coordinate activation or repression of groups of genes that share common biological functions, pathways, chromosomal locations, or regulation, thereby distinguishing even subtle differences between phenotypes or cellular states. Gene set-based enrichment analysis is now standard practice for interpreting global transcription profiling experiments and elucidating the biological mechanisms associated with disease and other biological phenotypes of interest. The method is more powerful than typical single-gene approaches to comparing phenotypes, as it can identify sets of genes (e.g., perturbation signatures or molecular pathways) that are coordinately up- or downregulated when each gene in the set may not be significantly differentially expressed. The GSEA software provides useful visualizations and reports for the exploration and interpretation of results. GSEA bundles direct access to the Molecular Signatures Database (MSigDB) – a comprehensive curated repository of annotated gene sets representing signatures derived from publications, pathway databases, and other sources of public data; MSigDB can also be used independently. The website for the GSEA-MSigDB resource can be found at gsea-msigdb.org. To get started with GSEA you can view the GSEA User Guide, and access the GSEA software through the downloads page or through the GSEA modules available on GenePattern. See the MSigDB section of the website for more information about MSigDB and to interactively explore the gene sets and their annotations. User support for GSEA and MSigDB is available through our help forum. 
22.10 Integrative Genomics Viewer (IGV) The Integrative Genomics Viewer (IGV) is a track-based browser for interactively exploring genomic data mapped to a reference genome. IGV supports all the standard genomic data types (aligned reads, variants, signal peaks, genome annotations, copy number variation, etc.) as well as sample information, such as clinical, phenotypic, or other attributes. IGV provides great flexibility in loading data, whether investigator generated or publicly available, directly from multiple disparate sources without the need for any pre-processing. Supported data sources include local file systems; web servers on the user’s intranet or the Internet; commercial cloud providers (Google, Amazon, Azure, Dropbox); web links to data in public repositories. Authentication to access private data on the web is supported with the industry standard OAuth protocol. IGV is available in multiple forms, including both end-user applications and versions for use by developers. The IGV website at https://igv.org provides access to all modalities of IGV. Download and install the IGV Desktop application from the downloads page. To learn about using the application see the tutorial videos on the IGV YouTube channel and the online User Guide. The IGV-Web app is available at https://igv.org/app. To learn about using the app, the Help link in the menu bar provides access to the documentation, and see also the tutorial videos on the YouTube channel. The igv.js JavaScript component is for web developers who wish to embed IGV in their web apps or portals. More information can be found in the Readme file and the Wiki in the igv.js GitHub repository. IGV user support is available through the igv-help online forum and the GitHub repositories. 22.11 NDEx The Network Data Exchange (NDEx) project provides an open-source framework where scientists and organizations can store, share and publish biological network knowledge. 
A distinctive feature of NDEx is that it serves as a home for models that are currently available only as figures, tables, or supplementary information, such as networks produced via systematic mining and integration of large-scale molecular data. NDEx includes features to support data distribution and access according to FAIR principles. Its full integration with Cytoscape, the popular desktop application for network analysis and visualization, provides the cloud back-end component for data I/O; so, if a network file format can be opened in Cytoscape, it can also be stored in (and retrieved from) NDEx. NDEx can be accessed via its web user interface or programmatically, via REST API and client libraries in Python, R, Java. Web applications can interface with NDEx via JavaScript: MSigDB, CRAVAT, cBioPortal and IQuery, are all examples of web applications integrated with NDEx. For more information, please review the About NDEx page. To get started, visit the NDEx public server: there, you can review the NDEx FAQ, access documentation, contact us, and search or browse thousands of biological network models. 22.12 MultiAssayExperiment MultiAssayExperiment is an R/Bioconductor package that harmonizes data management, manipulation, and subsetting of multiple experimental assays performed on an overlapping set of specimens. It supports on-disk and remote data storage, and provides reshaping tools for adaptability to arbitrary downstream analysis. MultiAssayExperiment is distinct from alternative approaches in its focus on multi’omic data management and manipulation and in its integration with the Bioconductor ecosystem: it is used by more than 50 other Bioconductor packages, it provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment while supporting an open-ended mix of data classes for individual assays, and it allows subsetting by genomic ranges, row names, phenotypic data, and assays. 
You can get started with the MultiAssayExperiment Bioconductor package documentation, or start with prebuilt MultiAssayExperiments objects from curatedTCGAData, cBioPortalData, or SingleCellMultiModal. 22.13 OpenCRAVAT OpenCRAVAT uses variation data in many popular variant file formats and its outputs are variant annotations and visualizations. To get started go to opencravat.org. Download and run on your local machine, multi-user servers, at https://run.opencravat.org or in the cloud. We offer a broader selection of annotation tools than comparable software and results can be explored with an interactive GUI that provides customized filtering options, interactive tables and widgets. Use it for a single sample or a large cohort, or pull single variant reports with a structured url (Example: https://run.opencravat.org/webapps/variantreport/index.html?chrom=chr11&amp;pos=48123823&amp;ref_base=A&amp;alt_base=C ) 22.14 pVACtools Identification of neoantigens is a critical step in predicting response to checkpoint blockade therapy and design of personalized cancer vaccines. We have built a computational framework called pVACtools that, when paired with a well-established genomics pipeline, produces an end-to-end solution for neoantigen characterization. pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions, and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework designed to facilitate the incorporation of additional algorithms. Prioritization of predicted peptides occurs by integrating diverse data, including mutant allele expression, peptide binding affinities, and determination whether a mutation is clonal or subclonal. 
Interactive visualization via a Web interface allows clinical users to efficiently generate, review, and interpret results, selecting candidate peptides for individual patient vaccine designs. Additional modules support design choices needed for competing vaccine delivery approaches. One such module optimizes peptide ordering to minimize junctional epitopes in DNA vector vaccines. Downstream analysis commands for synthetic long peptide vaccines are available to assess candidates for factors that influence peptide synthesis. All of the aforementioned steps are executed via a modular workflow consisting of tools for neoantigen prediction from somatic alterations (pVACseq and pVACfuse), prioritization, and selection using a graphical Web-based interface (pVACview), and design of DNA vector–based vaccines (pVACvector) and synthetic long peptide vaccines. pVACtools is available at http://www.pvactools.org. 22.15 TumorDecon It is the only software that includes these four digital cytometry methods in one platform, so that users can compare the results of these methods. It is the only software that includes a method for creating a signature matrix from single cell gene expression data. TumorDecon software includes four deconvolution methods (DeconRNAseq [Gong2013], CIBERSORT [Newman2015], ssGSEA [Şenbabaoğlu2016], Singscore [Foroutan2018]) and several signature matrices of various cell types, including LM22. The input of this software is the gene expression profile of the tumor, and the output is the relative number of each cell type and several visualization plots. Users have an option to choose any of the implemented deconvolution methods and included signature matrices or import their own signature matrix to get the results. Additionally, TumorDecon can be used to generate customized signature matrices from single-cell RNA-sequence profiles. 
In addition to the 3 tutorials provided on GitHub (tutorial.py, sig_matrix_tutorial.py, &amp; full_tutorial.py) there is a User Manual available at: https://people.math.umass.edu/~aronow/TumorDecon TumorDecon is available on GitHub (https://github.com/ShahriyariLab/TumorDecon) and PyPI (https://pypi.org/project/TumorDecon/). For more info please see: Rachel A. Aronow, Shaya Akbarinejad, Trang Le, Sumeyye Su, Leili Shahriyari, TumorDecon: A digital cytometry software, SoftwareX, Volume 18, 2022, 101072, https://doi.org/10.1016/j.softx.2022.101072. 22.16 WebMeV WebMeV is an online tool that facilitates analysis of large-scale RNA-seq and other multi-omic datasets by providing intuitive access to advanced analytical methods and high-performance computing for a wide range of basic, clinical, and translational researchers. WebMeV provides support for “bulk” RNA-seq data, single-cell RNA-seq, and other types of -omic data, and provides easy access to public data resources such as The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression project (GTEx), as well as user-provided data. WebMeV uniquely provides a user-friendly, intuitive, interactive interface to processed analytical data and uses cloud-computing elasticity for computationally intensive analyses that are increasingly required for genomic data analysis. WebMeV’s design places an emphasis on user-driven data analysis by providing users the ability to visualize, interact with, and dissect genomic data at each step in the analysis with a “point-and-click” interactive data environment. Although the primary input is normalized “count matrices,” WebMeV does include tools for data normalization and quality control and uses Dropbox and Google Drive as means of easily uploading data. 
Analytical methods include statistical tests for comparing cohorts, for identifying gene sets, for doing functional enrichment analysis on gene sets (GSEA), and for inferring gene regulatory network models and comparing these networks between phenotypes to understand the drivers of disease. WebMeV also provides a platform to support reproducible research and makes code for the entire system and its component methods available as open-source software code. 22.17 Xena UCSC Xena is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. Xena showcases seminal cancer genomics datasets from TCGA, the Pan-Cancer Atlas, GDC, PCAWG, ICGC, and more; a total of more than 1500 datasets across 50 cancer types. We support virtually any type of functional genomics data (sometimes known as level 3 or 4 data). This includes SNPs, INDELs, copy number variation, gene expression, ATAC-seq, DNA methylation, exon-, transcript-, miRNA-, lncRNA-expression and structural variants. We also support clinical data such as phenotype information, subtype classifications and biomarkers. All of our data is available for download via Python or R APIs, or through our URL links. 22.17.1 Questions Xena can help you answer include: Is overexpression of this gene associated with better survival? What genes are differentially expressed between these two groups of samples? What is the relationship between mutation, copy number, expression, etc. for this gene? Our tool differentiates itself by its ability to visualize more uncommon data types, such as DNA methylation, its visual integration of multiple types of genomic data side-by-side, and its ability to easily privately visualize your own data. Get started with our tutorials: https://ucsc-xena.gitbook.io/project/tutorials. 
If you use us please cite us: https://www.nature.com/articles/s41587-020-0546-8 "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor(s) Candace Savonen Lecturer(s) Candace Savonen Content Contributor(s) Cailin Jordan - sc-ATAC-Seq Carrie Wright Claire Mills - Whole Genome Sequencing Jacob Greene - ChIP-seq Kate Isaac - Goals of DNA Methods Oscar Ospina - Spatial transcriptomics Ye Zheng - CUTRUN/CUTTag Content Directors Jeff Leek Content Consultants Carrie Wright Cliff Meyer - ATAC-seq Frederick Tan Content Editors/Reviewers Kate Isaac Acknowledgments Technical Course Publishing Engineer Candace Savonen Template Publishing Engineers Candace Savonen, Carrie Wright Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Funding Funder National Cancer Institute (NCI) UE5 CA254170 Funding Staff Sandy Ormbrek, Shasta Nicholson   ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.2 (2023-10-31) ## os Ubuntu 22.04.4 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2024-12-11 ## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown) ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date (UTC) lib source ## askpass 1.2.0 2023-09-03 [1] RSPM (R 4.3.0) ## bookdown 0.41 2024-10-16 [1] CRAN (R 4.3.2) ## bslib 0.6.1 2023-11-28 [1] RSPM (R 4.3.0) ## cachem 1.0.8 2023-05-01 [1] RSPM (R 4.3.0) ## chromote 0.3.1 2024-08-30 [1] CRAN (R 4.3.2) ## cli 3.6.2 2023-12-11 [1] RSPM (R 4.3.0) ## devtools 2.4.5 2022-10-11 [1] RSPM (R 4.3.0) ## digest 0.6.34 2024-01-11 [1] RSPM (R 4.3.0) ## dplyr 1.1.4 2023-11-17 [1] RSPM (R 4.3.0) ## 
ellipsis 0.3.2 2021-04-29 [1] RSPM (R 4.3.0) ## evaluate 0.23 2023-11-01 [1] RSPM (R 4.3.0) ## fansi 1.0.6 2023-12-08 [1] RSPM (R 4.3.0) ## fastmap 1.1.1 2023-02-24 [1] RSPM (R 4.3.0) ## fs 1.6.3 2023-07-20 [1] RSPM (R 4.3.0) ## generics 0.1.3 2022-07-05 [1] RSPM (R 4.3.0) ## glue 1.7.0 2024-01-09 [1] RSPM (R 4.3.0) ## hms 1.1.3 2023-03-21 [1] RSPM (R 4.3.0) ## htmltools 0.5.7 2023-11-03 [1] RSPM (R 4.3.0) ## htmlwidgets 1.6.4 2023-12-06 [1] RSPM (R 4.3.0) ## httpuv 1.6.14 2024-01-26 [1] RSPM (R 4.3.0) ## httr 1.4.7 2023-08-15 [1] RSPM (R 4.3.0) ## janitor 2.2.0 2023-02-02 [1] RSPM (R 4.3.0) ## jquerylib 0.1.4 2021-04-26 [1] RSPM (R 4.3.0) ## jsonlite 1.8.8 2023-12-04 [1] RSPM (R 4.3.0) ## knitr 1.48 2024-07-07 [1] CRAN (R 4.3.2) ## later 1.3.2 2023-12-06 [1] RSPM (R 4.3.0) ## lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.3.0) ## lubridate 1.9.3 2023-09-27 [1] RSPM (R 4.3.0) ## magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.3.0) ## memoise 2.0.1 2021-11-26 [1] RSPM (R 4.3.0) ## mime 0.12 2021-09-28 [1] RSPM (R 4.3.0) ## miniUI 0.1.1.1 2018-05-18 [1] RSPM (R 4.3.0) ## openssl 2.1.1 2023-09-25 [1] RSPM (R 4.3.0) ## ottrpal 1.3.0 2024-10-23 [1] Github (jhudsl/ottrpal@2e19782) ## pillar 1.9.0 2023-03-22 [1] RSPM (R 4.3.0) ## pkgbuild 1.4.3 2023-12-10 [1] RSPM (R 4.3.0) ## pkgconfig 2.0.3 2019-09-22 [1] RSPM (R 4.3.0) ## pkgload 1.3.4 2024-01-16 [1] RSPM (R 4.3.0) ## processx 3.8.3 2023-12-10 [1] RSPM (R 4.3.0) ## profvis 0.3.8 2023-05-02 [1] RSPM (R 4.3.0) ## promises 1.2.1 2023-08-10 [1] RSPM (R 4.3.0) ## ps 1.7.6 2024-01-18 [1] RSPM (R 4.3.0) ## purrr 1.0.2 2023-08-10 [1] RSPM (R 4.3.0) ## R6 2.5.1 2021-08-19 [1] RSPM (R 4.3.0) ## Rcpp 1.0.12 2024-01-09 [1] RSPM (R 4.3.0) ## readr 2.1.5 2024-01-10 [1] RSPM (R 4.3.0) ## remotes 2.4.2.1 2023-07-18 [1] RSPM (R 4.3.0) ## rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.2) ## rmarkdown 2.25 2023-09-18 [1] RSPM (R 4.3.0) ## rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.2) ## sass 0.4.8 2023-12-06 [1] RSPM (R 4.3.0) ## sessioninfo 1.2.2 2021-12-06 
[1] RSPM (R 4.3.0) ## shiny 1.8.0 2023-11-17 [1] RSPM (R 4.3.0) ## snakecase 0.11.1 2023-08-27 [1] RSPM (R 4.3.0) ## stringi 1.8.3 2023-12-11 [1] RSPM (R 4.3.0) ## stringr 1.5.1 2023-11-14 [1] RSPM (R 4.3.0) ## tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.2) ## tidyselect 1.2.0 2022-10-10 [1] RSPM (R 4.3.0) ## timechange 0.3.0 2024-01-18 [1] RSPM (R 4.3.0) ## tzdb 0.4.0 2023-05-12 [1] RSPM (R 4.3.0) ## urlchecker 1.0.1 2021-11-30 [1] RSPM (R 4.3.0) ## usethis 2.2.3 2024-02-19 [1] RSPM (R 4.3.0) ## utf8 1.2.4 2023-10-22 [1] RSPM (R 4.3.0) ## vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.3.0) ## webshot2 0.1.1 2023-08-11 [1] CRAN (R 4.3.2) ## websocket 1.4.2 2024-07-22 [1] CRAN (R 4.3.2) ## xfun 0.48 2024-10-03 [1] CRAN (R 4.3.2) ## xml2 1.3.6 2023-12-04 [1] RSPM (R 4.3.0) ## xtable 1.8-4 2019-04-21 [1] RSPM (R 4.3.0) ## yaml 2.3.8 2023-12-11 [1] RSPM (R 4.3.0) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library ## ## ────────────────────────────────────────────────────────────────────────────── "],["references.html", "References", " References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]
diff --git a/docs/no_toc/sequencing-data.html b/docs/no_toc/sequencing-data.html
index 8e48d182..d784b7a3 100644
--- a/docs/no_toc/sequencing-data.html
+++ b/docs/no_toc/sequencing-data.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 6 Sequencing Data | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 6 Sequencing Data | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="general-data-analysis-tools.html"/>
 <link rel="next" href="microarray-data.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 6 Sequencing Data | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,18 +535,18 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="sequencing-data" class="section level1" number="6">
-<h1><span class="header-section-number">Chapter 6</span> Sequencing Data</h1>
+<div id="sequencing-data" class="section level1 hasAnchor" number="6">
+<h1><span class="header-section-number">Chapter 6</span> Sequencing Data<a href="sequencing-data.html#sequencing-data" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-4" class="section level2" number="6.1">
-<h2><span class="header-section-number">6.1</span> Learning Objectives</h2>
-<p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_7.png" title="This chapter will demonstrate how to: Understand the very general basics of sequencing data collection and processing workflow. Understand the limitations and strengths of sequencing data in general." alt="This chapter will demonstrate how to: Understand the very general basics of sequencing data collection and processing workflow. Understand the limitations and strengths of sequencing data in general." width="100%" /></p>
+<div id="learning-objectives-4" class="section level2 hasAnchor" number="6.1">
+<h2><span class="header-section-number">6.1</span> Learning Objectives<a href="sequencing-data.html#learning-objectives-4" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g144bc8d6a68_0_7.png" alt="This chapter will demonstrate how to: Understand the very general basics of sequencing data collection and processing workflow. Understand the limitations and strengths of sequencing data in general." width="100%" /></p>
 <p>In this section, we are going to discuss generalities that apply to all sequencing data. This is meant to be a “primer” that the data-type-specific chapters will build on to give you more specific and practical steps and advice for your data type.</p>
 </div>
-<div id="how-does-sequencing-work" class="section level2" number="6.2">
-<h2><span class="header-section-number">6.2</span> How does sequencing work?</h2>
+<div id="how-does-sequencing-work" class="section level2 hasAnchor" number="6.2">
+<h2><span class="header-section-number">6.2</span> How does sequencing work?<a href="sequencing-data.html#how-does-sequencing-work" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g14492c87338_0_45.png" width="100%" /></p>
 <p>Sequencing methods, whether they are targeting DNA, transcriptomes, or some other target of the genome, have some commonalities in the steps as well as what types of biases and data generation artifacts to look out for.</p>
 <p>All sequencing experiments start out with the extraction of the biological material of interest. This biological material will be processed in some way to isolate the genomic target of interest (we will cover the various techniques for this in more detail in each respective data chapter since it is highly specific to the data type).</p>
@@ -560,44 +554,44 @@ <h2><span class="header-section-number">6.2</span> How does sequencing work?</h2
 <p>The resulting sample material is often a very small quantity, which means Polymerase Chain Reaction (PCR) needs to be used to amplify the material to a quantity large enough to be reliably sequenced. We will talk about how this very common method not only amplifies the sequences we want to read but also amplifies sequencing biases that we would like to avoid.</p>
 <p>At the end of this process, base sequences are called for the samples (with varying degrees of confidence), creating huge amounts of data that hopefully contain valuable research insights.</p>
 </div>
-<div id="sequencing-concepts" class="section level2" number="6.3">
-<h2><span class="header-section-number">6.3</span> Sequencing concepts</h2>
-<div id="inherent-biases" class="section level3" number="6.3.1">
-<h3><span class="header-section-number">6.3.1</span> Inherent biases</h3>
-<p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_5.png" title="Sequence related biases GC bias - guanine and cytosine bond melts at higher temp - if a sequence has a lot of G’s and C’s Sequence complexity - certain sequences more likely to have primers bound to them (and more likely to be sequenced). Length bias - longer targets are more likely to be amplified or sequenced. These biases are worsened by PCR amplification!" alt="Sequence related biases GC bias - guanine and cytosine bond melts at higher temp - if a sequence has a lot of G’s and C’s Sequence complexity - certain sequences more likely to have primers bound to them (and more likely to be sequenced). Length bias - longer targets are more likely to be amplified or sequenced. These biases are worsened by PCR amplification!" width="100%" /></p>
+<div id="sequencing-concepts" class="section level2 hasAnchor" number="6.3">
+<h2><span class="header-section-number">6.3</span> Sequencing concepts<a href="sequencing-data.html#sequencing-concepts" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<div id="inherent-biases" class="section level3 hasAnchor" number="6.3.1">
+<h3><span class="header-section-number">6.3.1</span> Inherent biases<a href="sequencing-data.html#inherent-biases" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_5.png" alt="Sequence related biases GC bias - guanine and cytosine bond melts at higher temp - if a sequence has a lot of G’s and C’s Sequence complexity - certain sequences more likely to have primers bound to them (and more likely to be sequenced). Length bias - longer targets are more likely to be amplified or sequenced. These biases are worsened by PCR amplification!" width="100%" /></p>
 <p>Sequences are not all sequenced or amplified at the same rate. In a perfect world, we could take a simple snapshot of the genome we are interested in and know exactly what and how many sequences were in a sample. But in reality, sequencing methods and the resulting data always have some biases that we have to be aware of and, where possible, mitigate with appropriate methods.</p>
-<div id="gc-bias" class="section level4" number="6.3.1.1">
-<h4><span class="header-section-number">6.3.1.1</span> GC bias</h4>
+<div id="gc-bias" class="section level4 hasAnchor" number="6.3.1.1">
+<h4><span class="header-section-number">6.3.1.1</span> GC bias<a href="sequencing-data.html#gc-bias" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>You may recall that with nucleotides: adenine binds with thymine and guanine binds with cytosine. But, the guanine-cytosine bond (GC) has 3 hydrogen bonds whereas the adenine-thymine bond (AT) has only 2 bonds. This means that the GC bond is stickier (to put it scientifically) and needs higher temperatures to unbind. The sequencing and PCR amplification process involves cycling through temperatures and binding and unbinding of sequences, which means that if a sequence has a lot of G’s and C’s (high GC content) it will unbind at a different temperature than a sequence of low GC content.</p>
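 <p>As a rough illustration of what “GC content” means, the sketch below computes the fraction of G and C bases in a sequence. The function name <code>gc_content</code> and the two example sequences are made up for this illustration and do not come from any particular tool.</p>

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of bases in `sequence` that are G or C."""
    bases = sequence.upper()
    if not bases:
        raise ValueError("sequence is empty")
    gc_count = sum(1 for base in bases if base in "GC")
    return gc_count / len(bases)


# Two hypothetical sequences of equal length with different GC content.
high_gc = "GCGCGGCCGC"
low_gc = "ATATTAGCAT"
print(f"high GC example: {gc_content(high_gc):.0%}")
print(f"low GC example:  {gc_content(low_gc):.0%}")
```

 <p>Sequences at the two extremes of this measure are the ones most likely to behave differently during temperature cycling.</p>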
 </div>
-<div id="sequence-complexity" class="section level4" number="6.3.1.2">
-<h4><span class="header-section-number">6.3.1.2</span> Sequence complexity</h4>
+<div id="sequence-complexity" class="section level4 hasAnchor" number="6.3.1.2">
+<h4><span class="header-section-number">6.3.1.2</span> Sequence complexity<a href="sequencing-data.html#sequence-complexity" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Nonrepeating sequences are harder to sequence and amplify than repeating sequences. This means that the complexity of a target sequence influences the PCR amplification and detection.</p>
 </div>
-<div id="length-bias" class="section level4" number="6.3.1.3">
-<h4><span class="header-section-number">6.3.1.3</span> Length bias</h4>
+<div id="length-bias" class="section level4 hasAnchor" number="6.3.1.3">
+<h4><span class="header-section-number">6.3.1.3</span> Length bias<a href="sequencing-data.html#length-bias" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Longer sequences, whether they represent long sequence variants, long transcripts, or something else, are more likely to be identified than shorter ones! So if you are attempting to quantify the presence of a sequence, a longer sequence is likely to be counted more often than a shorter one.</p>
 </div>
 </div>
-<div id="pcr-amplification" class="section level3" number="6.3.2">
-<h3><span class="header-section-number">6.3.2</span> PCR Amplification</h3>
+<div id="pcr-amplification" class="section level3 hasAnchor" number="6.3.2">
+<h3><span class="header-section-number">6.3.2</span> PCR Amplification<a href="sequencing-data.html#pcr-amplification" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>All of the above biases are amplified when the sequences are being amplified! You can picture that if each of these biases has a certain effect for one copy, then as PCR steps copy the sequence exponentially, the error is also being multiplied! PCR amplification is generally a necessary part of the process. But there are tools that allow you to try to combat the biases of PCR amplification in your data analysis. These tools depend on the type of sequencing methods you are using and will be discussed in each data type chapter.</p>
 </div>
-<div id="depth-of-coverage" class="section level3" number="6.3.3">
-<h3><span class="header-section-number">6.3.3</span> Depth of coverage</h3>
+<div id="depth-of-coverage" class="section level3 hasAnchor" number="6.3.3">
+<h3><span class="header-section-number">6.3.3</span> Depth of coverage<a href="sequencing-data.html#depth-of-coverage" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_10.png" width="100%" /></p>
 <p>The depth of sequencing refers to how many times on average a particular base is sequenced. Obviously the more times something is sequenced, the more you can be confident that the base call is accurate. However, sequencing at greater depths also takes more time and money. Depending on your sequencing goals and methods there is an appropriate level of depth that is needed.</p>
 <p>Coverage, on the other hand, has to do with how much of the target is covered. If you are doing Whole Genome Sequencing, what percentage of the whole genome were you able to sequence? You may realize how depth is related to coverage, in that the greater the depth of sequencing you use, the more likely you are to also cover more of the genome. As discussed in relation to the biases, some parts of the genome are harder to reach than others, so by reading at greater depths some of those “hard to read” parts of the genome will be able to be covered.</p>
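 <p>To make the distinction concrete, here is a minimal sketch that computes both numbers from a list of per-base read counts. The function name <code>depth_and_coverage</code> and the ten read counts are hypothetical; real per-base depths would come from your aligned data.</p>

```python
from __future__ import annotations


def depth_and_coverage(per_base_depth: list[int]) -> tuple[float, float]:
    """Return (mean depth, fraction of positions sequenced at least once)."""
    if not per_base_depth:
        raise ValueError("no positions supplied")
    n_positions = len(per_base_depth)
    mean_depth = sum(per_base_depth) / n_positions
    covered = sum(1 for depth in per_base_depth if depth > 0)
    return mean_depth, covered / n_positions


# Hypothetical read counts at ten positions of a target region.
depths = [12, 15, 9, 0, 0, 3, 20, 18, 11, 12]
mean_depth, coverage = depth_and_coverage(depths)
print(f"mean depth: {mean_depth:.1f}x, coverage: {coverage:.0%}")
```

 <p>In this made-up region the average depth looks healthy, yet two positions were never read at all, which is why both numbers are worth checking.</p>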
 </div>
-<div id="quality-controls" class="section level3" number="6.3.4">
-<h3><span class="header-section-number">6.3.4</span> Quality controls</h3>
+<div id="quality-controls" class="section level3 hasAnchor" number="6.3.4">
+<h3><span class="header-section-number">6.3.4</span> Quality controls<a href="sequencing-data.html#quality-controls" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_0.png" width="100%" /></p>
 <p>Sequencing bases involves some error/confidence rate. As mentioned, some parts of the genome are harder to read than others. Or, sometimes your sequencing can be influenced by a poor quality sample that has degraded. Before you jump into further analyzing your data, you will want to investigate the quality of the sequencing data you’ve collected.</p>
 <p>The most common and well-known method for assessing sequencing quality controls is <a href="https://www.bioinformatics.babraham.ac.uk/projects/fastqc/">FASTQC</a>. FASTQC creates an abundance of sequencing quality control reports from fastq files. These reports need to be interpreted within the context of your sequencing methods, samples, and experimental goals. Often bioinformatics cores are good to contact about these reports (they may have already run FASTQC on your data if that is where you obtained your data initially). They can help you wade through the flood of quality control reports printed out by FASTQC.</p>
 <p>FASTQC also has great documentation that can attempt to guide you through report interpretation. This also includes examples of good and bad FASTQC reports. But note that all FASTQC report interpretations must be done relative to the experiment that you have done. In other words, there are no one-size-fits-all quality control cutoffs for your FASTQC reports. The failure/success icons FASTQC reports back are based on defaults that may not be accurate or applicable to your data, so further investigation and consultation is warranted before you decide to trust or pitch your sequencing data.</p>
 </div>
-<div id="alignment" class="section level3" number="6.3.5">
-<h3><span class="header-section-number">6.3.5</span> Alignment</h3>
+<div id="alignment" class="section level3 hasAnchor" number="6.3.5">
+<h3><span class="header-section-number">6.3.5</span> Alignment<a href="sequencing-data.html#alignment" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_15.png" width="100%" /></p>
 <p>Once you have your reads and you find them reasonably trustworthy through quality control checks, you will want to align them to your reference. The reference you align your sequences to will depend on the data type you have: a reference genome, a reference transcriptome, something else?</p>
 <ul>
@@ -606,43 +600,43 @@ <h3><span class="header-section-number">6.3.5</span> Alignment</h3>
 </ul>
 <p>TODO: considerations for alignment.</p>
 </div>
-<div id="single-end-vs-paired-end" class="section level3" number="6.3.6">
-<h3><span class="header-section-number">6.3.6</span> Single End vs Paired End</h3>
+<div id="single-end-vs-paired-end" class="section level3 hasAnchor" number="6.3.6">
+<h3><span class="header-section-number">6.3.6</span> Single End vs Paired End<a href="sequencing-data.html#single-end-vs-paired-end" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p><img src="resources/images/06-sequencing-data_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15b8dbcbef7_0_20.png" width="100%" /></p>
 <p>Sequencing can be done single-end or paired-end. Paired-end means the primers are going to bind to both sides of a sequence. This can help you avoid some 3’ bias and give you more complete coverage of the area you are sequencing. But, as you may guess, paired-end sequencing is more expensive than single-end.</p>
 <p>You will want to determine whether your sequencing is paired end or single end. If it is paired end you will likely see file names that indicate this. You should have pairs of files that may or may not be labeled with <code>_1</code> and <code>_2</code> or <code>_F</code> and <code>_R</code>. We will discuss file nomenclature more specifically as it pertains to different data types in the upcoming chapters.</p>
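 <p>A quick way to check a folder of files is to group them by these suffixes. The sketch below assumes the suffix sits directly before a <code>.fastq</code>, <code>.fq</code>, or gzipped extension; the pattern, the function names, and the example file names are all hypothetical and would need adjusting to your own naming scheme.</p>

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_<mate>.<extension>, where the mate label is
# one of the suffixes mentioned above (_1/_2 or _F/_R).
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12FR])\.f(ast)?q(\.gz)?$")


def group_mates(file_names):
    """Group file names by sample; return {sample: sorted mate labels}."""
    groups = defaultdict(list)
    for name in file_names:
        match = MATE_PATTERN.match(name)
        if match:
            groups[match.group("sample")].append(match.group("mate"))
        else:
            # No recognizable mate suffix, so this may be a single-end file.
            groups[name].append("")
    return {sample: sorted(mates) for sample, mates in groups.items()}


def looks_paired(mates):
    """True if the mate labels form a complete _1/_2 or _F/_R pair."""
    return mates in (["1", "2"], ["F", "R"])


files = ["sampleA_1.fastq.gz", "sampleA_2.fastq.gz", "sampleB.fastq"]
for sample, mates in group_mates(files).items():
    kind = "paired end" if looks_paired(mates) else "possibly single end"
    print(f"{sample}: {kind}")
```

 <p>File names are only a hint; if in doubt, ask whoever generated the data.</p>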
 </div>
 </div>
-<div id="very-general-sequencing-workflow" class="section level2" number="6.4">
-<h2><span class="header-section-number">6.4</span> Very General Sequencing Workflow</h2>
+<div id="very-general-sequencing-workflow" class="section level2 hasAnchor" number="6.4">
+<h2><span class="header-section-number">6.4</span> Very General Sequencing Workflow<a href="sequencing-data.html#very-general-sequencing-workflow" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>In the data type specific chapters, we will cover the sequencing data workflows and file formats in more detail. But in the most general sense, sequencing workflows look like this:</p>
-<div id="04D1599E33D92E370515664A58E78BE1EEB_69816">
-<div id="04D1599E33D92E370515664A58E78BE1EEB_69816_robot">
+<div id="id_04D1599E33D92E370515664A58E78BE1EEB_69816">
+<div id="id_04D1599E33D92E370515664A58E78BE1EEB_69816_robot">
 <a href="https://cloud.smartdraw.com/share.aspx/?pubDocShare=04D1599E33D92E370515664A58E78BE1EEB" target="_blank"><img src="https://cloud.smartdraw.com/cloudstorage/04D1599E33D92E370515664A58E78BE1EEB/preview2.png"></a>
 </div>
 </div>
 <script src="https://cloud.smartdraw.com/plugins/html/js/sdjswidget_html.js" type="text/javascript"></script>
 <script type="text/javascript">SDJS_Widget("04D1599E33D92E370515664A58E78BE1EEB",69816,1,"");</script>
 <p><br/></p>
-<div id="sequencing-file-formats-1" class="section level3" number="6.4.1">
-<h3><span class="header-section-number">6.4.1</span> Sequencing file formats</h3>
-<div id="sam---sequence-alignment-map-1" class="section level4" number="6.4.1.1">
-<h4><span class="header-section-number">6.4.1.1</span> SAM - Sequence Alignment Map</h4>
+<div id="sequencing-file-formats-1" class="section level3 hasAnchor" number="6.4.1">
+<h3><span class="header-section-number">6.4.1</span> Sequencing file formats<a href="sequencing-data.html#sequencing-file-formats-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="sam---sequence-alignment-map-1" class="section level4 hasAnchor" number="6.4.1.1">
+<h4><span class="header-section-number">6.4.1.1</span> SAM - Sequence Alignment Map<a href="sequencing-data.html#sam---sequence-alignment-map-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>SAM files are text-based files that store sequencing reads along with information about where each read aligns to a reference. The reads have generally not been quantified yet. <a href="https://samtools.github.io/hts-specs/SAMv1.pdf">For more about SAM files</a>.</p>
 </div>
-<div id="bam---binary-alignment-map-1" class="section level4" number="6.4.1.2">
-<h4><span class="header-section-number">6.4.1.2</span> BAM - Binary Alignment Map</h4>
+<div id="bam---binary-alignment-map-1" class="section level4 hasAnchor" number="6.4.1.2">
+<h4><span class="header-section-number">6.4.1.2</span> BAM - Binary Alignment Map<a href="sequencing-data.html#bam---binary-alignment-map-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>BAM files are like SAM files but are compressed (made to take up less space on your computer). This means if you double click on a BAM file to look at it, it will look jumbled and unintelligible. You will need to convert it to a SAM file if you want to see it yourself (but this usually isn’t necessary).</p>
 </div>
-<div id="fasta---fast-a-1" class="section level4" number="6.4.1.3">
-<h4><span class="header-section-number">6.4.1.3</span> FASTA - “fast A”</h4>
+<div id="fasta---fast-a-1" class="section level4 hasAnchor" number="6.4.1.3">
+<h4><span class="header-section-number">6.4.1.3</span> FASTA - “fast A”<a href="sequencing-data.html#fasta---fast-a-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>Fasta files are sequence files that can be either nucleotide or amino acid sequences. They look something like this (the example below illustrates a nucleotide sequence):</p>
 <pre><code>&gt;SEQ_ID
 GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT</code></pre>
 <p>For <a href="https://en.wikipedia.org/wiki/FASTA_format">more about fasta files</a>.</p>
 </div>
-<div id="fastq---fast-q-1" class="section level4" number="6.4.1.4">
-<h4><span class="header-section-number">6.4.1.4</span> FASTQ - “Fast q”</h4>
+<div id="fastq---fast-q-1" class="section level4 hasAnchor" number="6.4.1.4">
+<h4><span class="header-section-number">6.4.1.4</span> FASTQ - “Fast q”<a href="sequencing-data.html#fastq---fast-q-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>A Fastq file is like a Fasta file except that it also contains information about the <strong>Q</strong>uality of the read. By quality, we mean: how sure was the sequencing machine that each nucleotide was called correctly?</p>
 <pre><code>@SEQ_ID
 GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
@@ -652,35 +646,42 @@ <h4><span class="header-section-number">6.4.1.4</span> FASTQ - “Fast q”</h4>
 <p>Later in this course we will discuss the importance of examining the quality of your sequencing data and how to do that. If you received your data from a bioinformatics core it is possible that they’ve already done this quality analysis for you.</p>
 <p><em>Sequencing data that is not of high enough quality should not be trusted!</em> It may need to be re-run entirely or may need extra processing (trimming) in order to make it more trustworthy. We will discuss this more in later chapters.</p>
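 <p>For a sense of what the quality line in a Fastq record encodes, the sketch below decodes a quality string into numeric scores. It assumes the common Phred+33 encoding, where each character’s ASCII code minus 33 is a score Q and the estimated error probability is 10<sup>-Q/10</sup>; older data may use a different offset. The function names and the four-character quality string are made up for illustration.</p>

```python
from __future__ import annotations


def phred_scores(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a Fastq quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(score: int) -> float:
    """Estimated probability that a base call with this score is wrong."""
    return 10 ** (-score / 10)


# Hypothetical quality string: "!" is the lowest score, "I" a high one.
quality = "!5?I"
for char, score in zip(quality, phred_scores(quality)):
    print(f"{char!r}: Q{score}, error probability {error_probability(score):.4f}")
```

 <p>Trimming tools use scores like these to decide which bases or reads to discard.</p>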
 </div>
-<div id="bcl---binary-base-call-bcl-sequence-file-format-1" class="section level4" number="6.4.1.5">
-<h4><span class="header-section-number">6.4.1.5</span> BCL - binary base call (BCL) sequence file format</h4>
+<div id="bcl---binary-base-call-bcl-sequence-file-format-1" class="section level4 hasAnchor" number="6.4.1.5">
+<h4><span class="header-section-number">6.4.1.5</span> BCL - binary base call (BCL) sequence file format<a href="sequencing-data.html#bcl---binary-base-call-bcl-sequence-file-format-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>This type of sequence file is specific to Illumina data. In most cases, you will simply want to convert it to Fastq files for use with non-Illumina programs.</p>
 <p><a href="https://medium.com/@marija190396/bcl-to-fastq-conversion-e289852823d0">More about BCL to Fastq conversion</a>.</p>
 </div>
-<div id="vcf---variant-call-format-1" class="section level4" number="6.4.1.6">
-<h4><span class="header-section-number">6.4.1.6</span> VCF - Variant Call Format</h4>
+<div id="vcf---variant-call-format-1" class="section level4 hasAnchor" number="6.4.1.6">
+<h4><span class="header-section-number">6.4.1.6</span> VCF - Variant Call Format<a href="sequencing-data.html#vcf---variant-call-format-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>VCF files are a further processed form of data than the sequence files we discussed above. VCF files are specifically for storing only where a particular sample’s sequences differ or are <em>variant</em> from the reference genome or each other.</p>
 <p>This will only be pertinent to you if you care about DNA variants. We will discuss this in the DNA seq chapter.</p>
 <p>For <a href="https://en.wikipedia.org/wiki/Variant_Call_Format">more on VCF files</a>.</p>
 </div>
-<div id="maf---mutation-annotation-format-1" class="section level4" number="6.4.1.7">
-<h4><span class="header-section-number">6.4.1.7</span> MAF - Mutation Annotation Format</h4>
+<div id="maf---mutation-annotation-format-1" class="section level4 hasAnchor" number="6.4.1.7">
+<h4><span class="header-section-number">6.4.1.7</span> MAF - Mutation Annotation Format<a href="sequencing-data.html#maf---mutation-annotation-format-1" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p>MAF files are aggregated versions of VCF files. So for a group of samples for which each has a VCF file, your entire group of samples’ variants will be summarized in the form of a MAF file.</p>
 <p>For <a href="https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/#:~:text=Mutation%20Annotation%20Format%20(MAF)%20is,(or%20open%2Daccess).">more on MAF files</a>.</p>
 </div>
 </div>
-<div id="other-files-1" class="section level3" number="6.4.2">
-<h3><span class="header-section-number">6.4.2</span> Other files</h3>
+<div id="other-files-1" class="section level3 hasAnchor" number="6.4.2">
+<h3><span class="header-section-number">6.4.2</span> Other files<a href="sequencing-data.html#other-files-1" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>If you didn’t see the file type you are looking for listed here, take a look at this <a href="https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats">list by the Broad Institute</a>. Or, it may be covered in the data type specific chapters.</p>
 
 </div>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -750,7 +751,7 @@ <h3><span class="header-section-number">6.4.2</span> Other files</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/single-cell-atac-seq-1.html b/docs/no_toc/single-cell-atac-seq-1.html
index 94327008..1039a200 100644
--- a/docs/no_toc/single-cell-atac-seq-1.html
+++ b/docs/no_toc/single-cell-atac-seq-1.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 17 Single cell ATAC-Seq | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 17 Single cell ATAC-Seq | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="atac-seq-1.html"/>
 <link rel="next" href="chip-seq-1.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 17 Single cell ATAC-Seq | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,21 +535,21 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="single-cell-atac-seq-1" class="section level1" number="17">
-<h1><span class="header-section-number">Chapter 17</span> Single cell ATAC-Seq</h1>
+<div id="single-cell-atac-seq-1" class="section level1 hasAnchor" number="17">
+<h1><span class="header-section-number">Chapter 17</span> Single cell ATAC-Seq<a href="single-cell-atac-seq-1.html#single-cell-atac-seq-1" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is incomplete! If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-15" class="section level2" number="17.1">
-<h2><span class="header-section-number">17.1</span> Learning Objectives</h2>
-<p><img src="resources/images/11b-sc-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_41.png" title="Learning objectives This chapter will demonstrate how to: Understand the basics of single cell ATAC-Seq data collection and processing workflow Identify the next steps for your particular single cell ATAC-Seq data. Formulate questions to ask about your single cell ATAC-Seq data" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of single cell ATAC-Seq data collection and processing workflow Identify the next steps for your particular single cell ATAC-Seq data. Formulate questions to ask about your single cell ATAC-Seq data" width="100%" /></p>
+<div id="learning-objectives-15" class="section level2 hasAnchor" number="17.1">
+<h2><span class="header-section-number">17.1</span> Learning Objectives<a href="single-cell-atac-seq-1.html#learning-objectives-15" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/11b-sc-ATAC-Seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g227d7dd1e08_0_41.png" alt="Learning objectives This chapter will demonstrate how to: Understand the basics of single cell ATAC-Seq data collection and processing workflow Identify the next steps for your particular single cell ATAC-Seq data. Formulate questions to ask about your single cell ATAC-Seq data" width="100%" /></p>
 </div>
-<div id="what-are-the-goals-of-scatac-seq-analysis" class="section level2" number="17.2">
-<h2><span class="header-section-number">17.2</span> What are the goals of scATAC-seq analysis?</h2>
+<div id="what-are-the-goals-of-scatac-seq-analysis" class="section level2 hasAnchor" number="17.2">
+<h2><span class="header-section-number">17.2</span> What are the goals of scATAC-seq analysis?<a href="single-cell-atac-seq-1.html#what-are-the-goals-of-scatac-seq-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>The primary goal of single-cell ATAC-seq is to obtain a high-resolution map of chromatin accessibility at the single-cell level. It is often used for the identification of cell type-specific cis-regulatory elements (CREs) or transcription factor (TF) binding sites because single-cell resolution enables researchers to parse heterogeneous subgroups within a sample. Single-cell ATAC-seq is often applied to questions in developmental biology and cell differentiation.</p>
 </div>
-<div id="scatac-seq-general-workflow-overview" class="section level2" number="17.3">
-<h2><span class="header-section-number">17.3</span> scATAC-seq general workflow overview</h2>
+<div id="scatac-seq-general-workflow-overview" class="section level2 hasAnchor" number="17.3">
+<h2><span class="header-section-number">17.3</span> scATAC-seq general workflow overview<a href="single-cell-atac-seq-1.html#scatac-seq-general-workflow-overview" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><strong>Align reads to genome and assign to cells based on barcodes.</strong>
 This step can be performed using Cell Ranger if the data were generated using a 10X Genomics kit (commercially available). For other methods, this step largely resembles the alignment step of bulk ATAC-seq analysis, using aligners such as Bowtie2 or BWA, filtering tools such as Picard, and adapter-trimming tools such as Trimmomatic. Prior to adapter trimming, barcodes should be matched to the list of known barcodes generated in the experiment and either assigned to a cell or assigned as ambiguous. At this stage, unique molecular identifiers (UMIs) added to fragments during library preparation are also extracted and associated with each read to allow for PCR deduplication.
 <strong>Quality control</strong></p>
@@ -565,45 +559,45 @@ <h2><span class="header-section-number">17.3</span> scATAC-seq general workflow
 <p>Doublet detection is any approach that attempts to computationally identify cell barcodes that contain reads from a mixture of single cells. Although an extremely high number of fragment counts may indicate that a cell is in fact a doublet, doublet detection provides a more targeted approach by assigning a score or a probability that each cell is a doublet. These approaches may compare cells to simulated doublets generated randomly from the data, or may rely on the fact that the number of ATAC-seq reads at any one locus is limited to two per cell for diploid organisms. This step is not as common in scATAC-seq analysis as it is in single cell RNA-seq analysis owing to the difficulty of estimating doublets from the highly sparse data, but can be done for additional rigor or if there is particular concern that the dataset contains a high number of doublets.</p>
 <p>Additionally, the fragment size distribution of the library should exhibit nucleosomal periodicity, where fragments are enriched at ~147 bp intervals corresponding to the length of nucleosome-bound DNA, which is refractory to Tn5 insertion.</p>
 </div>
-<div id="peak-calling-1" class="section level2" number="17.4">
-<h2><span class="header-section-number">17.4</span> Peak calling</h2>
+<div id="peak-calling-1" class="section level2 hasAnchor" number="17.4">
+<h2><span class="header-section-number">17.4</span> Peak calling<a href="single-cell-atac-seq-1.html#peak-calling-1" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Peak calling in scATAC-seq is performed in a similar manner to bulk ATAC-seq [ref bulk chapter]. Importantly, it should be performed by treating data from all cells within a cluster as a pseudo-bulk replicate. This is because scATAC-seq data is highly sparse and any individual cell only has enough information to convey whether a region is accessible or inaccessible, due to the maximum of 2 reads per locus per cell. Peak calling is commonly performed using MACS2, but other peak callers suitable for ATAC-seq could be used as well, as described in our chapter on bulk ATAC-seq (reference).</p>
 </div>
-<div id="dimensionality-reduction" class="section level2" number="17.5">
-<h2><span class="header-section-number">17.5</span> Dimensionality reduction</h2>
+<div id="dimensionality-reduction" class="section level2 hasAnchor" number="17.5">
+<h2><span class="header-section-number">17.5</span> Dimensionality reduction<a href="single-cell-atac-seq-1.html#dimensionality-reduction" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Because scATAC-seq data is extremely high dimensional, with counts for hundreds of thousands of peaks in thousands of cells, dimensionality reduction must be performed to represent the data in a way which reflects the major sources of variation while allowing for efficient computation. Many of the most popular dimensionality reduction approaches for scATAC-seq are borrowed from natural language processing, including latent semantic indexing (LSI) as well as probabilistic approaches such as latent Dirichlet allocation (LDA) and probabilistic LSI (pLSI). LSI and its variations are commonly used and provide a simple, efficient approach based on singular value decomposition (SVD), a technique closely related to PCA. Probabilistic approaches calculate the probability of information in a dataset being related to specific ‘topics’ identified by the statistical model. They are more mathematically complex than LSI but attempt to more accurately reconstruct the latent (not observable) structure in the data.</p>
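<p>As a rough illustration of the first step of LSI, the sketch below applies a TF-IDF weighting to a small binary cell-by-peak matrix. Several TF-IDF variants are in use; the formula here (term frequency times <code>log(1 + N / n_j)</code>) is one assumed variant, and the function name is hypothetical. The subsequent SVD step, usually done with a numerical library, is not shown.</p>

```python
import math

def tf_idf(matrix):
    """Weight a binary cell-by-peak matrix (list of rows of 0/1 values).

    tf  = value / number of accessible peaks in the cell
    idf = log(1 + n_cells / number of cells in which the peak is accessible)
    This is one of several TF-IDF variants, chosen only for illustration.
    """
    n_cells = len(matrix)
    n_peaks = len(matrix[0])
    # Number of cells in which each peak is accessible.
    peak_counts = [sum(row[j] for row in matrix) for j in range(n_peaks)]
    weighted = []
    for row in matrix:
        total = sum(row)
        out = []
        for j, value in enumerate(row):
            if value == 0 or total == 0:
                out.append(0.0)
                continue
            tf = value / total
            idf = math.log(1 + n_cells / peak_counts[j])
            out.append(tf * idf)
        weighted.append(out)
    return weighted

# Toy matrix: 3 cells x 3 peaks. Peak 0 is open everywhere, peaks 1 and 2 are rare.
matrix = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
weighted = tf_idf(matrix)
```

<p>In this toy example the rare peaks receive a higher weight than the peak that is open in every cell, which is the intended effect of the weighting before SVD.</p>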
 </div>
-<div id="embedding-visualization" class="section level2" number="17.6">
-<h2><span class="header-section-number">17.6</span> Embedding (visualization)</h2>
+<div id="embedding-visualization" class="section level2 hasAnchor" number="17.6">
+<h2><span class="header-section-number">17.6</span> Embedding (visualization)<a href="single-cell-atac-seq-1.html#embedding-visualization" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Embedding is the process of representing the high-dimensional scATAC-seq dataset in two (or occasionally three) dimensions for visualization. First, dimensionality reduction must have been performed using one of the methods described in the section above. Then, the result of dimensionality reduction can be provided as input to the chosen embedding approach. The most common method for generating scATAC-seq embeddings is UMAP (Uniform Manifold Approximation and Projection), but other methods, such as force-directed graph layouts or t-SNE (t-distributed Stochastic Neighbor Embedding), can also be used.</p>
 </div>
-<div id="clustering" class="section level2" number="17.7">
-<h2><span class="header-section-number">17.7</span> Clustering</h2>
+<div id="clustering" class="section level2 hasAnchor" number="17.7">
+<h2><span class="header-section-number">17.7</span> Clustering<a href="single-cell-atac-seq-1.html#clustering" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Clustering is the process of computationally detecting populations of cells with similar characteristics - in this case, cells with similar accessibility profiles. Leiden clustering, which uses the similarity of cells to their neighbors to group cells into clusters, is a common choice for identifying clusters in scATAC-seq data.</p>
 </div>
-<div id="cell-type-annotation" class="section level2" number="17.8">
-<h2><span class="header-section-number">17.8</span> Cell type annotation</h2>
+<div id="cell-type-annotation" class="section level2 hasAnchor" number="17.8">
+<h2><span class="header-section-number">17.8</span> Cell type annotation<a href="single-cell-atac-seq-1.html#cell-type-annotation" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Cell type annotation on scATAC-seq data alone can be performed based on the enrichment of cell-type-specific CREs, or alternatively can be performed based on gene expression patterns observed in integrated scRNA-seq data. Gene scores are a measure of the accessibility of a gene locus and putative CREs within a defined window of the gene. Gene scores significantly above the expected background suggest a gene is active in a given cell type, and these scores can be used to identify markers for cell type annotation. Integration with scRNA-seq data can allow for identification of cell types which may be difficult to distinguish based on ATAC-seq profiles alone (ref), but requires an scRNA-seq dataset of a comparable population of cells.</p>
 <p>Trajectory analysis, which is used to infer and visualize the developmental or differentiation paths of individual cells within a population, can be performed on processed single-cell ATAC-seq data using tools developed for single-cell RNA-seq data. These approaches aim to reconstruct the temporal progression and identify the key intermediate states or cell fate decisions during biological processes such as embryonic development, tissue regeneration, or disease progression.</p>
 <p>Trajectory inference algorithms, such as:</p>
 <ul>
-<li><a href="https://cole-trapnell-lab.github.io/monocle3/docs/trajectories/">Monocle</a> <span class="citation">Qiu et al. (<a href="#ref-Qiu2017" role="doc-biblioref">2017</a>)</span></li>
-<li><a href="https://github.com/dpeerlab/Palantir">Palantir</a> <span class="citation">Setty et al. (<a href="#ref-Setty2019" role="doc-biblioref">2019</a>)</span></li>
-<li><a href="https://github.com/theislab/paga">PAGA</a> <span class="citation">Wolf et al. (<a href="#ref-Wolf2019" role="doc-biblioref">2019</a>)</span></li>
+<li><a href="https://cole-trapnell-lab.github.io/monocle3/docs/trajectories/">Monocle</a> <span class="citation">Qiu et al. (<a href="#ref-Qiu2017">2017</a>)</span></li>
+<li><a href="https://github.com/dpeerlab/Palantir">Palantir</a> <span class="citation">Setty et al. (<a href="#ref-Setty2019">2019</a>)</span></li>
+<li><a href="https://github.com/theislab/paga">PAGA</a> <span class="citation">Wolf et al. (<a href="#ref-Wolf2019">2019</a>)</span></li>
 </ul>
 <p>These are commonly used to reconstruct the developmental trajectories and order the cells along these trajectories. The resulting trajectory models provide valuable insights into the underlying regulatory dynamics, lineage relationships, and critical regulatory genes or pathways governing cellular differentiation and development.</p>
 <p>Much like peak calling, it is not possible to obtain enough information from individual cells to perform differential accessibility analysis at the single-cell level. Because of this limitation, differential accessibility analysis is performed in a similar manner to bulk ATAC-seq analysis using pseudo-bulk data at the cluster or cell type level, where counts from many single cells are aggregated together and treated as though they are a single sample generated from a bulk experiment. Common tools for differential accessibility analysis include DESeq2 and edgeR, which were both developed for differential gene expression analysis.</p>
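<p>The pseudo-bulk aggregation described above can be sketched as follows. The input layout (a dictionary of per-cell peak counts and a dictionary of cluster labels) and the function name are assumptions made for illustration; real workflows usually operate on sparse matrices and often aggregate per cluster and per biological sample so that replicates are available.</p>

```python
def pseudobulk(counts, clusters):
    """Sum per-peak counts over all cells that share a cluster label.

    counts:   dict mapping cell barcode -> list of per-peak counts
    clusters: dict mapping cell barcode -> cluster label
    Returns a dict mapping cluster label -> summed per-peak counts.
    """
    totals = {}
    for cell, row in counts.items():
        label = clusters[cell]
        if label not in totals:
            totals[label] = [0] * len(row)
        totals[label] = [a + b for a, b in zip(totals[label], row)]
    return totals

# Hypothetical toy data: three cells, three peaks, two clusters.
counts = {
    "c1": [1, 0, 1],
    "c2": [0, 0, 1],
    "c3": [1, 1, 0],
}
clusters = {"c1": "T", "c2": "T", "c3": "B"}
bulk = pseudobulk(counts, clusters)
```

<p>The resulting cluster-level count vectors are the kind of input that bulk-style differential analysis tools expect.</p>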
 </div>
-<div id="scatac-seq-data-strengths" class="section level2" number="17.9">
-<h2><span class="header-section-number">17.9</span> scATAC-seq data <strong>strengths</strong>:</h2>
+<div id="scatac-seq-data-strengths" class="section level2 hasAnchor" number="17.9">
+<h2><span class="header-section-number">17.9</span> scATAC-seq data <strong>strengths</strong>:<a href="single-cell-atac-seq-1.html#scatac-seq-data-strengths" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li>scATAC-seq is the gold standard for showing heterogeneity in chromatin accessibility between populations of cells and within tissues because single-cell resolution enables analysis of subpopulations that are challenging to isolate experimentally.</li>
 <li>scATAC-seq can be paired with scRNA-seq to obtain transcriptome and chromatin accessibility measurements from the same cells. This is a powerful approach for gaining understanding of how specific patterns of chromatin accessibility affect gene expression.</li>
 <li>scATAC-seq is also a relatively high-throughput technique, particularly with droplet-based techniques. A single dataset can cover thousands of cells.</li>
 </ul>
 </div>
-<div id="scatac-seq-data-limitations" class="section level2" number="17.10">
-<h2><span class="header-section-number">17.10</span> scATAC-seq data <strong>limitations</strong>:</h2>
+<div id="scatac-seq-data-limitations" class="section level2 hasAnchor" number="17.10">
+<h2><span class="header-section-number">17.10</span> scATAC-seq data <strong>limitations</strong>:<a href="single-cell-atac-seq-1.html#scatac-seq-data-limitations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li>scATAC-seq has very high sparsity compared to single-cell RNA-seq since there are only two copies of each locus in a diploid cell compared to many copies of mRNAs. This results in the data essentially being binary at the single-cell level - a region either has reads and is considered accessible in that cell or has no reads.</li>
 <li>As in bulk ATAC-seq, the Tn5 transposase has a sequence bias, so regions with a preferred sequence will undergo higher levels of transposition. Highly accessible regions of DNA will also be overrepresented in the final library.</li>
@@ -611,13 +605,13 @@ <h2><span class="header-section-number">17.10</span> scATAC-seq data <strong>lim
 <li>Many scATAC-seq datasets have low cell numbers due to the cost and technical difficulty of the assay. This presents a challenge for analysis since the data is highly sparse and noisy, which in combination with a small dataset can lead to difficulty interpreting the data.</li>
 </ul>
 </div>
-<div id="scatac-seq-data-considerations" class="section level2" number="17.11">
-<h2><span class="header-section-number">17.11</span> scATAC-seq data considerations</h2>
+<div id="scatac-seq-data-considerations" class="section level2 hasAnchor" number="17.11">
+<h2><span class="header-section-number">17.11</span> scATAC-seq data considerations<a href="single-cell-atac-seq-1.html#scatac-seq-data-considerations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>scATAC-seq libraries are always sequenced with paired-end reads. There are two major experimental approaches for generating single-cell ATAC-seq data: droplet-based methods, such as the commercially available <a href="https://www.10xgenomics.com/products/single-cell-atac">10X Chromium platform</a>, where nuclei are separated into individual droplets, and plate-based methods, which use multiple pooling and barcoding steps to tag each cell with a unique combination of barcodes (with a level of expected barcode collisions).</p>
 <p>The procedure for demultiplexing the reads will depend on the method used to generate the data. Data generated using 10X platforms can be demultiplexed and aligned using the Cell Ranger software, while plate-based approaches typically use an alignment and peak-calling approach similar to that used for bulk ATAC-seq, with the additional step of matching the barcodes in each read to the known set of combinatorial barcodes. Correctly matching the reads to cells and filtering reads with non-matching barcodes is a critical step for scATAC-seq analysis.</p>
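<p>The barcode matching and filtering step can be sketched in a few lines. This is a simplified illustration that assumes reads arrive as <code>(barcode, read_name)</code> pairs and uses exact matching only; the function name and input layout are hypothetical, and real pipelines may also correct barcodes that contain sequencing errors, which is not attempted here.</p>

```python
def assign_reads(reads, whitelist):
    """Group reads by cell barcode, discarding barcodes not in the whitelist.

    reads:     iterable of (barcode, read_name) pairs
    whitelist: iterable of known valid barcodes
    Returns (dict barcode -> list of read names, number of discarded reads).
    """
    allowed = set(whitelist)  # set lookup keeps matching fast
    by_cell = {}
    discarded = 0
    for barcode, read_name in reads:
        if barcode in allowed:
            by_cell.setdefault(barcode, []).append(read_name)
        else:
            discarded += 1  # exact match failed; the read is filtered out
    return by_cell, discarded

# Hypothetical toy input: one read carries a barcode outside the known set.
whitelist = ["AAAC", "CCGT"]
reads = [
    ("AAAC", "read1"),
    ("CCGT", "read2"),
    ("AAAC", "read3"),
    ("GGGG", "read4"),
]
by_cell, discarded = assign_reads(reads, whitelist)
```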
 </div>
-<div id="scatac-seq-analysis-tools" class="section level2" number="17.12">
-<h2><span class="header-section-number">17.12</span> scATAC-seq analysis tools</h2>
+<div id="scatac-seq-analysis-tools" class="section level2 hasAnchor" number="17.12">
+<h2><span class="header-section-number">17.12</span> scATAC-seq analysis tools<a href="single-cell-atac-seq-1.html#scatac-seq-analysis-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger">Cellranger</a> is a popular preprocessing tool specifically designed for scATAC-seq data generated using the 10x Genomics platform. It performs essential steps such as demultiplexing, barcode processing, read alignment, and filtering, providing a streamlined workflow for 10x-generated scATAC-seq data. However, it cannot be used for data generated by other methods.</li>
 <li><a href="https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml">Bowtie2</a>, <a href="https://broadinstitute.github.io/picard/">Picard tools</a>, and <a href="http://www.usadellab.org/cms/?page=trimmomatic">Trimmomatic</a>: These tools are commonly used for preprocessing scATAC-seq data generated using plate-based or combinatorial indexing approaches. Bowtie2 is a fast and widely used aligner for mapping sequencing reads to a reference genome, Picard provides a suite of command-line tools for manipulating and analyzing BAM files, and Trimmomatic can remove adapter sequences from reads. These tools can be utilized for aligning reads, removing duplicates, sorting, and filtering the data to obtain the necessary inputs for downstream analysis.</li>
@@ -627,31 +621,31 @@ <h2><span class="header-section-number">17.12</span> scATAC-seq analysis tools</
 <li><a href="https://stuartlab.org/signac/">Signac</a> is an R package specifically designed for the analysis of single-cell epigenomics data, including scATAC-seq. It offers a comprehensive set of functions for preprocessing, quality control, dimensionality reduction, clustering, trajectory analysis, differential accessibility, and visualization. Signac integrates well with Seurat, providing an additional tool for exploring and analyzing scATAC-seq data.</li>
 </ul>
 <p>Additional quality checking tools: Quality checking and filtering steps in scATAC-seq analysis can be performed using various tools depending on the workflow and programming language. Some commonly used tools with QC capabilities useful for examining library quality measures such as GC bias, overrepresented sequences, and quality scores include FastQC and deepTools.</p>
-<div id="doublet-detection" class="section level4" number="17.12.0.1">
-<h4><span class="header-section-number">17.12.0.1</span> Doublet detection</h4>
+<div id="doublet-detection" class="section level4 hasAnchor" number="17.12.0.1">
+<h4><span class="header-section-number">17.12.0.1</span> Doublet detection<a href="single-cell-atac-seq-1.html#doublet-detection" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p><a href="https://www.archrproject.com/">ArchR</a> has a tool for doublet detection - it generates synthetic doublets from combinations of cells in the dataset and uses the similarity of cells in the dataset to these synthetic doublets to identify doublets. This is a common approach, and variations of it are used by most doublet detection algorithms. Many are specifically designed to expect transcriptomic data (such as the commonly used Scrublet) and identify barcodes with mixed transcriptional signatures of multiple clusters/cell types, and these methods do not accept scATAC-seq input. Some transcription-based tools can be given modified input to detect doublets in scATAC-seq data, as described in documentation from the Demuxafy project. There are also tools like AMULET which leverage the fact that the number of ATAC-seq reads at any locus in a single cell is limited by the number of copies of a chromosome to detect doublets. Overall, doublet detection is not as common a step in scATAC-seq analysis as it is in scRNA-seq analysis, owing to the limited tools available and the difficulty of performing this analysis on extremely sparse data.</p>
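<p>The synthetic doublet idea can be sketched as follows. This is a minimal illustration, assuming binary per-cell accessibility profiles, and it is not ArchR's implementation: it only generates synthetic doublets by taking the union of two randomly chosen cells. The function name and the seeded random generator are assumptions, and the scoring step (comparing real cells to the synthetic doublets) is omitted.</p>

```python
import random

def simulate_doublets(profiles, n_doublets, seed=0):
    """Create synthetic doublets from binary accessibility profiles.

    profiles:   list of equal-length lists of 0/1 values, one per cell
    n_doublets: number of synthetic doublets to generate
    Each doublet is the element-wise union of two distinct random cells.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(profiles, 2)  # two distinct cells
        doublets.append([max(x, y) for x, y in zip(a, b)])
    return doublets

# Hypothetical toy profiles: three cells, four peaks.
profiles = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
doublets = simulate_doublets(profiles, 5)
```

<p>Real cells that sit close to many such synthetic profiles in the reduced-dimension space would then receive a higher doublet score.</p>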
 </div>
-<div id="visualization" class="section level4" number="17.12.0.2">
-<h4><span class="header-section-number">17.12.0.2</span> Visualization</h4>
+<div id="visualization" class="section level4 hasAnchor" number="17.12.0.2">
+<h4><span class="header-section-number">17.12.0.2</span> Visualization<a href="single-cell-atac-seq-1.html#visualization" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <p><a href="https://scanpy.readthedocs.io/en/stable/">Scanpy</a> (Python) and <a href="https://satijalab.org/seurat/">Seurat</a> (R) are the most commonly used tools for visualizing scATAC-seq data. These tools allow you to plot the accessibility of specific peaks or gene scores, as well as metadata such as cell type or cluster, on the UMAP (or other) embedding at the single-cell level. Both packages include built-in functions to perform this plotting in a streamlined manner and to manipulate the data objects for additional quantification and visualization using general plotting packages such as <a href="https://matplotlib.org/">matplotlib</a> or <a href="https://ggplot2.tidyverse.org/">ggplot2</a>. The choice between these tools is primarily determined by the programming language you choose for your analysis, as they share many of the same core features.
 Additionally, tools such as deepTools or EnrichedHeatmap may be useful for visualizing heatmaps of pseudo-bulk data, and bedGraph or BigWig representations of pseudo-bulk data can be visualized using genome browsers such as IGV or the UCSC Genome Browser. pyGenomeBrowser is a package which allows more customizable visualization of browser tracks and may be useful for generating publication-quality figures.</p>
 </div>
 </div>
-<div id="trajectory-analysis" class="section level2" number="17.13">
-<h2><span class="header-section-number">17.13</span> Trajectory analysis</h2>
+<div id="trajectory-analysis" class="section level2 hasAnchor" number="17.13">
+<h2><span class="header-section-number">17.13</span> Trajectory analysis<a href="single-cell-atac-seq-1.html#trajectory-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Several tools are available for single-cell trajectory analysis. These approaches are primarily distinguished by variations used in their mathematical approaches for calculating trajectories, but most make use of graph-based approaches which model the similarity or connections between cells in a dataset. The distinct approaches of the tools discussed here lead to varying levels of performance on different types of data, and extensive benchmarking has been performed (here) and (here) on synthetic datasets to determine the accuracy of different approaches. The most important consideration here is whether there are any cyclic trajectories expected in the dataset, where the end of the trajectory would connect back to the start, or disconnected trajectories, where not all trajectories originate from the same starting state. Not all approaches can reconstruct these trajectories accurately. Most popular methods expect a tree-like structure, with a single starting point and branches which lead toward terminal cell fates.</p>
 <p><a href="http://cole-trapnell-lab.github.io/monocle-release/">Monocle</a> is a popular choice that offers a comprehensive workflow for trajectory inference, visualization of trajectory analysis, pseudotime ordering of cells, and identification of differentially expressed genes along trajectories. Another commonly used tool is Slingshot, which utilizes a graph-based approach to infer trajectories, compute pseudotime ordering, and generate smooth curves to visualize trajectories. Additionally, it has the ability to infer multiple disconnected trajectories within a single dataset. PAGA (Partition-based Graph Abstraction) uses a distinct strategy with the goal of maintaining connections between similar groups of cells as well as the overall structure of the data. <a href="https://github.com/dpeerlab/Palantir">Palantir</a> is a tool which uses a probabilistic approach to assign cell fate probabilities to each cell in a dataset, which can be used to define cells belonging to a specific trajectory.</p>
 </div>
-<div id="motif-detection-ex.-chromvar" class="section level2" number="17.14">
-<h2><span class="header-section-number">17.14</span> Motif detection (ex. ChromVar)</h2>
+<div id="motif-detection-ex.-chromvar" class="section level2 hasAnchor" number="17.14">
+<h2><span class="header-section-number">17.14</span> Motif detection (ex. ChromVar)<a href="single-cell-atac-seq-1.html#motif-detection-ex.-chromvar" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://bioconductor.org/packages/release/bioc/html/chromVAR.html">Single-cell chromVAR analysis</a> is a computational approach used to assess cell-to-cell variation in chromatin accessibility profiles across a population of single cells. It aims to identify TF activity differences between cell types or states and elucidate the underlying regulatory dynamics. Single-cell chromVAR leverages the concept of TF motif enrichment or depletion within cell-specific accessible regions to infer TF activity. It compares the chromatin accessibility profiles of individual cells to a background model derived from the aggregate accessibility profiles of all cells, enabling the detection of cell-specific TF binding patterns. By quantifying the enrichment or depletion of TF motifs within accessible regions, single-cell chromVAR provides insights into TF activity variation, potential regulatory networks, and cell-type-specific transcriptional regulation. It serves as a valuable tool for understanding the contribution of TFs to cellular heterogeneity and regulatory processes in single-cell chromatin accessibility data.</p>
 </div>
-<div id="regulatory-network-detection" class="section level2" number="17.15">
-<h2><span class="header-section-number">17.15</span> Regulatory network detection</h2>
+<div id="regulatory-network-detection" class="section level2 hasAnchor" number="17.15">
+<h2><span class="header-section-number">17.15</span> Regulatory network detection<a href="single-cell-atac-seq-1.html#regulatory-network-detection" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p><a href="https://github.com/aertslab/cisTopic">CisTopic</a> is a computational tool used for the analysis of single-cell chromatin accessibility data to identify and characterize cell subpopulations with distinct regulatory patterns. It employs a topic modeling approach to capture the variability in chromatin accessibility profiles across cells and identifies the major regulatory patterns driving cell heterogeneity. CisTopic assigns cells to topics based on the similarity of their accessibility landscapes. By analyzing the differential accessibility of genomic regions within each topic, CisTopic facilitates the discovery of transcription factor binding motifs and CREs associated with specific cell subpopulations.</p>
 </div>
-<div id="tools-for-data-type-conversion" class="section level2" number="17.16">
-<h2><span class="header-section-number">17.16</span> Tools for data type conversion</h2>
+<div id="tools-for-data-type-conversion" class="section level2 hasAnchor" number="17.16">
+<h2><span class="header-section-number">17.16</span> Tools for data type conversion<a href="single-cell-atac-seq-1.html#tools-for-data-type-conversion" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>A comprehensive explanation of packages to convert between single-cell data object types used by Python and R packages is found here.</p>
 <p>The most common data types for processed scATAC-seq data are:</p>
 <ul>
@@ -661,8 +655,8 @@ <h2><span class="header-section-number">17.16</span> Tools for data type convers
 </ul>
 <p>h5Seurat objects can be <a href="https://mojaveazure.github.io/seurat-disk/articles/convert-anndata.html">converted to AnnData objects using SeuratDisk</a>.</p>
 </div>
-<div id="more-resources-and-tutorials-about-scatac-seq-data" class="section level2" number="17.17">
-<h2><span class="header-section-number">17.17</span> More resources and tutorials about scATAC-seq data</h2>
+<div id="more-resources-and-tutorials-about-scatac-seq-data" class="section level2 hasAnchor" number="17.17">
+<h2><span class="header-section-number">17.17</span> More resources and tutorials about scATAC-seq data<a href="single-cell-atac-seq-1.html#more-resources-and-tutorials-about-scatac-seq-data" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://training.galaxyproject.org/training-material/topics/single-cell/tutorials/scatac-preprocessing-tenx/tutorial.html">Galaxy tutorial for sc-ATAC-seq analysis</a></li>
 <li><a href="https://stuartlab.org/signac/articles/pbmc_vignette.html">Signac scATAC-seq tutorial with PBMCs</a></li>
@@ -673,7 +667,7 @@ <h2><span class="header-section-number">17.17</span> More resources and tutorial
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-Qiu2017" class="csl-entry">
 Qiu, Xiaojie, Qi Mao, Ying Tang, Li Wang, Raghav Chawla, Hannah A. Pliner, and Cole Trapnell. 2017. <span>“Reversed Graph Embedding Resolves Complex Single-Cell Trajectories.”</span> <em>Nature Methods</em> 14 (10): 979–82. <a href="https://doi.org/10.1038/nmeth.4402">https://doi.org/10.1038/nmeth.4402</a>.
@@ -686,10 +680,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -759,7 +760,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/single-cell-rna-seq.html b/docs/no_toc/single-cell-rna-seq.html
index 4d96054e..2ad335fc 100644
--- a/docs/no_toc/single-cell-rna-seq.html
+++ b/docs/no_toc/single-cell-rna-seq.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 13 Single-cell RNA-seq | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 13 Single-cell RNA-seq | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="bulk-rna-seq-1.html"/>
 <link rel="next" href="spatial-transcriptomics-1.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 13 Single-cell RNA-seq | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,81 +535,81 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="single-cell-rna-seq" class="section level1" number="13">
-<h1><span class="header-section-number">Chapter 13</span> Single-cell RNA-seq</h1>
+<div id="single-cell-rna-seq" class="section level1 hasAnchor" number="13">
+<h1><span class="header-section-number">Chapter 13</span> Single-cell RNA-seq<a href="single-cell-rna-seq.html#single-cell-rna-seq" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-11" class="section level2" number="13.1">
-<h2><span class="header-section-number">13.1</span> Learning Objectives</h2>
-<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_1.png" title="This chapter will demonstrate how to: Understand the basics of single cell RNA-Seq data collection and processing workflow. Identify the next steps for your particular single cell RNA-seq data. Formulate questions to ask about your single cell RNA-seq data" alt="This chapter will demonstrate how to: Understand the basics of single cell RNA-Seq data collection and processing workflow. Identify the next steps for your particular single cell RNA-seq data. Formulate questions to ask about your single cell RNA-seq data" width="100%" /></p>
+<div id="learning-objectives-11" class="section level2 hasAnchor" number="13.1">
+<h2><span class="header-section-number">13.1</span> Learning Objectives<a href="single-cell-rna-seq.html#learning-objectives-11" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_1.png" alt="This chapter will demonstrate how to: Understand the basics of single cell RNA-Seq data collection and processing workflow. Identify the next steps for your particular single cell RNA-seq data. Formulate questions to ask about your single cell RNA-seq data" width="100%" /></p>
 </div>
-<div id="where-single-cell-rna-seq-data-comes-from" class="section level2" number="13.2">
-<h2><span class="header-section-number">13.2</span> Where single-cell RNA-seq data comes from</h2>
-<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_6.png" title="As opposed to bulk RNA-seq which can only tell us about tissue level and within patient variation, single-cell RNA-seq is able to tell us cell to cell variation in transcriptomics including intra-tumor heterogeneity" alt="As opposed to bulk RNA-seq which can only tell us about tissue level and within patient variation, single-cell RNA-seq is able to tell us cell to cell variation in transcriptomics including intra-tumor heterogeneity" width="100%" /></p>
+<div id="where-single-cell-rna-seq-data-comes-from" class="section level2 hasAnchor" number="13.2">
+<h2><span class="header-section-number">13.2</span> Where single-cell RNA-seq data comes from<a href="single-cell-rna-seq.html#where-single-cell-rna-seq-data-comes-from" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_6.png" alt="As opposed to bulk RNA-seq which can only tell us about tissue level and within patient variation, single-cell RNA-seq is able to tell us cell to cell variation in transcriptomics including intra-tumor heterogeneity" width="100%" /></p>
 <p>Unlike bulk RNA-seq, which can only tell us about tissue-level and within-patient variation, single-cell RNA-seq can tell us about cell-to-cell variation in transcriptomics, including intra-tumor heterogeneity.</p>
 <p>Single-cell RNA-seq gives us cell-level transcriptional profiles, whereas bulk RNA-seq masks cell-to-cell heterogeneity. If your research questions require cell-level transcriptional information, single-cell RNA-seq will be of interest to you.</p>
-<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_11.png" title="Single cell RNA-seq can give us cell level transcriptional profiles. Whereas bulk RNA-seq masks cell to cell heterogeneity." alt="Single cell RNA-seq can give us cell level transcriptional profiles. Whereas bulk RNA-seq masks cell to cell heterogeneity." width="100%" /></p>
+<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_11.png" alt="Single cell RNA-seq can give us cell level transcriptional profiles. Whereas bulk RNA-seq masks cell to cell heterogeneity." width="100%" /></p>
 </div>
-<div id="single-cell-rna-seq-data-types" class="section level2" number="13.3">
-<h2><span class="header-section-number">13.3</span> Single-cell RNA-seq data types</h2>
+<div id="single-cell-rna-seq-data-types" class="section level2 hasAnchor" number="13.3">
+<h2><span class="header-section-number">13.3</span> Single-cell RNA-seq data types<a href="single-cell-rna-seq.html#single-cell-rna-seq-data-types" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>There are broadly two categories of single-cell RNA-seq data methods we will discuss.</p>
 <ul>
 <li><strong>Full length RNA-seq</strong>: Individual cells are physically separated and then sequenced.</li>
 <li><strong>Tag Based RNA-seq</strong>: Individual cells are tagged with a barcode and their data is separated computationally.</li>
 </ul>
 <p>Depending on your goals for your single cell RNA-seq analysis, you may want to choose one method over the other.</p>
-<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_25.png" title="Full length single cell RNA-seq **Pros**: Can be paired end sequencing which has less 3' bias. More complete coverage of transcripts which may be better for transcript discovery purposes. Cons: Is not very efficient (96 wells per plate). Takes longer to run days/weeks depending on the sample size. Expensive." alt="Full length single cell RNA-seq **Pros**: Can be paired end sequencing which has less 3' bias. More complete coverage of transcripts which may be better for transcript discovery purposes. Cons: Is not very efficient (96 wells per plate). Takes longer to run days/weeks depending on the sample size. Expensive." width="100%" /></p>
-<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_30.png" title="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" alt="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" width="100%" /></p>
-<p>(Material borrowed from <span class="citation">(<a href="#ref-AlexsLemonade2022" role="doc-biblioref"><span>“Alex’s Lemonade Training Modules”</span> 2022</a>)</span>).</p>
-<div id="unique-molecular-identifiers" class="section level3" number="13.3.1">
-<h3><span class="header-section-number">13.3.1</span> Unique Molecular identifiers</h3>
+<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_25.png" alt="Full length single cell RNA-seq **Pros**: Can be paired end sequencing which has less 3' bias. More complete coverage of transcripts which may be better for transcript discovery purposes. Cons: Is not very efficient (96 wells per plate). Takes longer to run days/weeks depending on the sample size. Expensive." width="100%" /></p>
+<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_30.png" alt="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" width="100%" /></p>
+<p>(Material borrowed from <span class="citation">(<a href="#ref-AlexsLemonade2022"><span>“Alex’s Lemonade Training Modules”</span> 2022</a>)</span>).</p>
+<div id="unique-molecular-identifiers" class="section level3 hasAnchor" number="13.3.1">
+<h3><span class="header-section-number">13.3.1</span> Unique Molecular identifiers<a href="single-cell-rna-seq.html#unique-molecular-identifiers" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>Tag-based single-cell RNA-seq methods often include not only a cell barcode, which identifies the cell, but also a unique molecular identifier (UMI), which identifies the original molecule. Because reads that carry the same UMI are copies of the same original molecule, UMIs give insight into the original snapshot of the cell and can help combat PCR amplification biases.</p>
-<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_41.png" title="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" alt="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" width="100%" /></p>
+<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g15bed4cad37_396_41.png" alt="Tag based single cell RNA-seq. Pros: Can profile up to millions of cells. Takes less computing power. File storage requirements are smaller. Much less expensive. Cons: More intense 3' bias. Coverage is not as deep as full length single cell RNA-seq" width="100%" /></p>
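 <p>The idea behind UMI counting can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular tool: the function name <code>count_molecules</code>, the barcodes, and the gene names are all made up for the example, and it assumes each read has already been assigned a cell barcode, a UMI, and a gene. Real quantification tools typically also correct for sequencing errors in barcodes and UMIs, which this sketch ignores.</p>

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples.
    Reads sharing all three values are treated as PCR copies of
    one original molecule and are counted only once.
    """
    # Collapse duplicate reads: identical (barcode, UMI, gene) -> one molecule.
    unique_molecules = set(reads)

    # Tally the surviving molecules for each cell and gene.
    counts = defaultdict(int)
    for cell_barcode, _umi, gene in unique_molecules:
        counts[(cell_barcode, gene)] += 1
    return dict(counts)


# Hypothetical reads: (cell barcode, UMI, gene)
reads = [
    ("AAAC", "TTGCA", "GeneA"),
    ("AAAC", "TTGCA", "GeneA"),  # PCR duplicate of the read above
    ("AAAC", "GGCAT", "GeneA"),  # different UMI, so a different molecule
    ("AAAC", "CCATG", "GeneB"),
    ("CCGT", "TTGCA", "GeneA"),  # same UMI but a different cell
]

umi_counts = count_molecules(reads)
for (cell, gene), n in sorted(umi_counts.items()):
    print(cell, gene, n)
```

 <p>In this toy input, five reads collapse to four molecules, because the two identical reads from cell <code>AAAC</code> are counted once.</p>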
 </div>
 </div>
-<div id="single-cell-rna-seq-tools" class="section level2" number="13.4">
-<h2><span class="header-section-number">13.4</span> Single cell RNA-seq tools</h2>
+<div id="single-cell-rna-seq-tools" class="section level2 hasAnchor" number="13.4">
+<h2><span class="header-section-number">13.4</span> Single cell RNA-seq tools<a href="single-cell-rna-seq.html#single-cell-rna-seq-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>There are a lot of scRNA-seq tools for various steps along the way.</p>
-<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_0.png" title="In a very general sense, single cell RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting duplets, and using this information to filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed - a lot of genes barely or not detected and a few genes that are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses." alt="In a very general sense, single cell RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting duplets, and using this information to filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed - a lot of genes barely or not detected and a few genes that are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses." width="100%" /></p>
+<p><img src="resources/images/10b-single-cell-RNA-seq_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g161687fdf93_0_0.png" alt="In a very general sense, single cell RNA-seq workflows involves first quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check for what’s detected, detecting duplets, and using this information to filter out data that is not trustworthy. After you have a set of reliable data, you need to normalize your data. Single cell data is highly skewed - a lot of genes barely or not detected and a few genes that are detected a lot. After data has been normalized you are ready to conduct your downstream analyses. This will be highly dependent on the original goals and questions of your experiment. It may include dimension reduction, cell classification, differential expression, detecting cell trajectories or any number of other analyses." width="100%" /></p>
 <p>In a very general sense, single-cell RNA-seq workflows first involve quantification/alignment. You will also need to conduct quality control steps that may involve using UMIs to check what has been detected, detecting doublets (also known as duplets), and using this information to filter out data that is not trustworthy. <a href="https://bioconductor.org/books/3.15/OSCA.advanced/doublet-detection.html">Doublets are transcriptome data generated from two cells</a>, and they are an undesired technical artifact because single-cell RNA-seq workflows are meant to produce data representing a single cell at a time. After you have a set of reliable data, you need to normalize your data. Single-cell data is highly skewed: many genes are barely detected or not detected at all, and a few genes are detected at high levels. After the data has been normalized, you are ready to conduct your downstream analyses. These will be highly dependent on the original goals and questions of your experiment. They may include dimension reduction, cell classification, differential expression, detecting cell trajectories, or any number of other analyses.</p>
 <p>Each step of this very general representation of a workflow can be conducted by a variety of tools. We will highlight some of the more popular tools here. But, to look through a full list, you can consult the <a href="https://www.scrna-tools.org/table">scRNA-tools website</a>.</p>
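 <p>To make the normalization step described above more concrete, here is a minimal sketch of one simple approach: scale each cell's counts by its total count, then apply a log transform. The function name <code>log_normalize</code>, the toy count matrix, and the scale factor of 10,000 are assumptions chosen for illustration. Dedicated tools offer more sophisticated normalization methods, so treat this as a way to see the idea rather than as a recommended pipeline.</p>

```python
import math


def log_normalize(counts, scale=10_000):
    """Library-size normalize and log-transform a cells x genes matrix.

    `counts` is a list of per-cell lists of raw counts. Each count is
    divided by the cell's total, multiplied by `scale`, and passed
    through log(1 + x) to reduce the skew of the distribution.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell has nothing to scale; keep it as zeros.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized


# Hypothetical raw counts: rows are cells, columns are genes.
raw_counts = [
    [0, 3, 1, 96],     # shallowly sequenced cell
    [0, 30, 10, 960],  # same composition, ten times the depth
    [0, 0, 0, 0],      # empty cell (would normally be filtered in QC)
]

norm = log_normalize(raw_counts)
for row in norm:
    print([round(value, 3) for value in row])
```

 <p>The first two cells have the same relative composition but different sequencing depth, so they end up with the same normalized values. Removing this kind of depth difference is the purpose of library-size normalization.</p>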
 </div>
-<div id="quantification-and-alignment-tools" class="section level2" number="13.5">
-<h2><span class="header-section-number">13.5</span> Quantification and alignment tools</h2>
+<div id="quantification-and-alignment-tools" class="section level2 hasAnchor" number="13.5">
+<h2><span class="header-section-number">13.5</span> Quantification and alignment tools<a href="single-cell-rna-seq.html#quantification-and-alignment-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <div class="warning">
 <p>The following pros and cons sections were written by AI and may need verification by experts. They are meant to give you a basic idea of the pros and cons of these tools, but ultimately you should use your own judgment.</p>
 </div>
 <ul>
-<li><a href="https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/03_alignment.html">STAR</a> <span class="citation">(<a href="#ref-dobin2013star" role="doc-biblioref">Dobin et al. 2013</a>)</span>:
+<li><a href="https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/lessons/03_alignment.html">STAR</a> <span class="citation">(<a href="#ref-dobin2013star">Dobin et al. 2013</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: Accurate alignment of RNA-seq reads to the genome. Can handle a wide range of RNA-seq protocols, including scRNA-seq. Provides read counts and gene-level expression values.</li>
 <li><strong>Cons</strong>: Requires a significant amount of memory and computational resources. May be difficult to set up and run for beginners.</li>
 </ul></li>
-<li><a href="http://daehwankimlab.github.io/hisat2/">HISAT2</a> <span class="citation">(<a href="#ref-kim2015hisat" role="doc-biblioref">Kim, Langmead, and Salzberg 2015</a>)</span>:
+<li><a href="http://daehwankimlab.github.io/hisat2/">HISAT2</a> <span class="citation">(<a href="#ref-kim2015hisat">Kim, Langmead, and Salzberg 2015</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: Accurate alignment of RNA-seq reads to the genome. Provides transcript-level expression values. Supports splice-aware alignment.</li>
 <li><strong>Cons</strong>: May require significant computational resources for large datasets. May not be as accurate as some other alignment tools.</li>
 </ul></li>
-<li><a href="https://www.kallistobus.tools/">Kallisto bustools</a> <span class="citation">(<a href="#ref-bray2016near" role="doc-biblioref">Bray et al. 2016</a>)</span>:
+<li><a href="https://www.kallistobus.tools/">Kallisto bustools</a> <span class="citation">(<a href="#ref-bray2016near">Bray et al. 2016</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: Fast and accurate quantification of RNA-seq reads without the need for alignment. Provides transcript-level expression values. Requires less memory and computational resources than alignment-based methods.</li>
 <li><strong>Cons</strong>: May not be as accurate as alignment-based methods for lowly expressed genes. Cannot provide allele-specific expression estimates.</li>
 </ul></li>
-<li><a href="https://salmon.readthedocs.io/en/latest/alevin.html">Alevin/Salmon</a> <span class="citation">(<a href="#ref-patro2017salmon" role="doc-biblioref">Patro et al. 2017</a>)</span>:
+<li><a href="https://salmon.readthedocs.io/en/latest/alevin.html">Alevin/Salmon</a> <span class="citation">(<a href="#ref-patro2017salmon">Patro et al. 2017</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: Fast and accurate quantification of RNA-seq reads without the need for alignment. Provides transcript-level expression values. Supports both single-end and paired-end sequencing.</li>
 <li><strong>Cons</strong>: May not be as accurate as alignment-based methods for lowly expressed genes. Cannot provide allele-specific expression estimates.</li>
 </ul></li>
-<li><a href="https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger">Cell Ranger</a> <span class="citation">(<a href="#ref-zheng2017massively" role="doc-biblioref">Zheng et al. 2017</a>)</span>:
+<li><a href="https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger">Cell Ranger</a> <span class="citation">(<a href="#ref-zheng2017massively">Zheng et al. 2017</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: Specifically designed for 10x Genomics scRNA-seq data, with optimized workflows for alignment and quantification. Provides read counts and gene-level expression values. Offers a streamlined pipeline with minimal input from the user.</li>
 <li><strong>Cons</strong>: Limited options for customizing parameters or analysis methods. May not be suitable for datasets from other scRNA-seq platforms.</li>
 </ul></li>
 </ul>
 </div>
-<div id="downstream-tools-pros-and-cons" class="section level2" number="13.6">
-<h2><span class="header-section-number">13.6</span> Downstream tools Pros and Cons</h2>
+<div id="downstream-tools-pros-and-cons" class="section level2 hasAnchor" number="13.6">
+<h2><span class="header-section-number">13.6</span> Downstream tools Pros and Cons<a href="single-cell-rna-seq.html#downstream-tools-pros-and-cons" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://satijalab.org/seurat/">Seurat</a>:
 <ul>
@@ -638,20 +632,20 @@ <h2><span class="header-section-number">13.6</span> Downstream tools Pros and Co
 <li><strong>Cons</strong>: May not be as feature-rich for clustering or differential expression analysis as some other tools. Requires some knowledge of R programming language.</li>
 </ul></li>
 </ul>
-<div id="doublet-tool-pros-and-cons" class="section level3" number="13.6.1">
-<h3><span class="header-section-number">13.6.1</span> Doublet Tool Pros and Cons</h3>
+<div id="doublet-tool-pros-and-cons" class="section level3 hasAnchor" number="13.6.1">
+<h3><span class="header-section-number">13.6.1</span> Doublet Tool Pros and Cons<a href="single-cell-rna-seq.html#doublet-tool-pros-and-cons" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <ul>
-<li><a href="https://github.com/chris-mcginnis-ucsf/DoubletFinder">DoubletFinder</a><span class="citation">(<a href="#ref-mcginnis2020doubletfinder" role="doc-biblioref">McGinnis, Murrow, and Gartner 2020</a>)</span>:
+<li><a href="https://github.com/chris-mcginnis-ucsf/DoubletFinder">DoubletFinder</a><span class="citation">(<a href="#ref-mcginnis2020doubletfinder">McGinnis, Murrow, and Gartner 2020</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: Uses a machine learning approach to detect doublets based on transcriptome similarity. Can be used with a variety of scRNA-seq platforms. Offers a user-friendly interface and extensive documentation.</li>
 <li><strong>Cons</strong>: Can be computationally intensive for large datasets. May require some knowledge of R programming language.</li>
 </ul></li>
-<li><a href="https://github.com/swolock/scrublet">Scrublet</a> <span class="citation">(<a href="#ref-wolock2019scrublet" role="doc-biblioref">Wolock, Krishnaswamy, and Huang 2019</a>)</span>:
+<li><a href="https://github.com/swolock/scrublet">Scrublet</a> <span class="citation">(<a href="#ref-wolock2019scrublet">Wolock, Krishnaswamy, and Huang 2019</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: Simulates doublets from the observed data and uses a nearest-neighbor classifier to score each cell. Fast and computationally efficient, making it suitable for large datasets. Offers a user-friendly interface and extensive documentation.</li>
 <li><strong>Cons</strong>: May not be as accurate as other methods, especially for low-quality data. Limited to 10x Genomics data.</li>
 </ul></li>
-<li><a href="https://github.com/EDePasquale/DoubletDecon">DoubletDecon</a> <span class="citation">(<a href="#ref-de2019doubletdecon" role="doc-biblioref">De Pasquale and Dudoit 2019</a>)</span>:
+<li><a href="https://github.com/EDePasquale/DoubletDecon">DoubletDecon</a> <span class="citation">(<a href="#ref-de2019doubletdecon">De Pasquale and Dudoit 2019</a>)</span>:
 <ul>
 <li><strong>Pros</strong>: Uses a statistical approach to identify doublets based on the distribution of the number of unique molecular identifiers (UMIs) per cell. Can be used with different platforms and species. Offers a user-friendly interface and extensive documentation.</li>
 <li><strong>Cons</strong>: May not be as accurate as other methods, especially for data with low sequencing depth or low cell numbers. Requires some knowledge of R programming language.</li>
@@ -660,8 +654,8 @@ <h3><span class="header-section-number">13.6.1</span> Doublet Tool Pros and Cons
 <p>It’s important to note that no doublet detection method is perfect, and it’s often a good idea to combine multiple methods to increase the accuracy of doublet identification. Additionally, manual inspection of the data is always recommended to confirm the presence or absence of doublets.</p>
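 <p>One common idea in doublet detection is to simulate artificial doublets by adding together pairs of observed profiles, and then to ask how many of each cell's nearest neighbors are simulated doublets. The toy sketch below illustrates that idea only. It is not the implementation of any tool listed above, and the function names, the example counts, and the choice of <code>k</code> are all assumptions made for illustration.</p>

```python
import math
import random


def to_fractions(profile):
    """Convert raw counts to per-cell fractions so depth does not dominate."""
    total = sum(profile)
    return [x / total for x in profile] if total else list(profile)


def simulate_doublets(cells, n_doublets, rng):
    """Create artificial doublets by summing random pairs of observed cells."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(cells, 2)
        doublets.append([x + y for x, y in zip(a, b)])
    return doublets


def doublet_scores(cells, n_doublets=None, k=3, seed=0):
    """Score each cell by the fraction of its k nearest neighbors
    that are simulated doublets (higher = more doublet-like)."""
    rng = random.Random(seed)
    n_doublets = n_doublets or len(cells)
    simulated = simulate_doublets(cells, n_doublets, rng)

    observed = [to_fractions(c) for c in cells]
    points = [(p, False) for p in observed]
    points += [(to_fractions(d), True) for d in simulated]

    scores = []
    for i, cell in enumerate(observed):
        # Distance to every other point, excluding the cell itself.
        neighbors = sorted(
            (math.dist(cell, p), is_sim)
            for j, (p, is_sim) in enumerate(points)
            if j != i
        )[:k]
        scores.append(sum(is_sim for _, is_sim in neighbors) / k)
    return scores


# Hypothetical counts: two cell types plus one cell that looks like a mix.
cells = [
    [20, 1, 0, 0], [18, 2, 0, 1], [22, 0, 1, 0],  # type 1
    [0, 1, 19, 2], [1, 0, 21, 1], [0, 2, 18, 0],  # type 2
    [10, 1, 10, 1],                               # possible doublet
]

scores = doublet_scores(cells)
for profile, score in zip(cells, scores):
    print(profile, round(score, 2))
```

 <p>Scores from a sketch like this depend on the random pairs that are drawn and on the value of <code>k</code>, which is one reason manual inspection and comparison across methods remain useful.</p>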
 </div>
 </div>
-<div id="more-scrna-seq-tools-and-tutorials" class="section level2" number="13.7">
-<h2><span class="header-section-number">13.7</span> More scRNA-seq tools and tutorials</h2>
+<div id="more-scrna-seq-tools-and-tutorials" class="section level2 hasAnchor" number="13.7">
+<h2><span class="header-section-number">13.7</span> More scRNA-seq tools and tutorials<a href="single-cell-rna-seq.html#more-scrna-seq-tools-and-tutorials" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://bioconductor.org/packages/release/bioc/html/alevinQC.html">AlevinQC</a></li>
 <li><a href="https://notebook.genepattern.org/single-cell/">Gene Pattern’s single cell RNA-seq tutorials</a> - an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data.</li>
@@ -670,16 +664,16 @@ <h2><span class="header-section-number">13.7</span> More scRNA-seq tools and tut
 <li><a href="https://people.math.umass.edu/~aronow/TumorDecon">TumorDecon</a> can be used to generate customized signature matrices from single-cell RNA-sequence profiles. It is available on Github (<a href="https://github.com/ShahriyariLab/TumorDecon" class="uri">https://github.com/ShahriyariLab/TumorDecon</a>) and PyPI (<a href="https://pypi.org/project/TumorDecon/" class="uri">https://pypi.org/project/TumorDecon/</a>).</li>
 </ul>
 </div>
-<div id="visualization-gui-tools-1" class="section level2" number="13.8">
-<h2><span class="header-section-number">13.8</span> Visualization GUI tools</h2>
+<div id="visualization-gui-tools-1" class="section level2 hasAnchor" number="13.8">
+<h2><span class="header-section-number">13.8</span> Visualization GUI tools<a href="single-cell-rna-seq.html#visualization-gui-tools-1" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://webmev.tm4.org">WebMeV</a> provides a user-friendly, intuitive, interactive interface to processed analytical data, uses cloud-computing elasticity for computationally intensive analyses, and is compatible with single-cell or bulk RNA-seq input data.</li>
 <li><a href="http://xena.ucsc.edu/">UCSC Xena</a> is a web-based visualization tool for multi-omic data and associated clinical and phenotypic annotations. It can be used with single cell RNA-seq data.</li>
 <li><a href="https://software.broadinstitute.org/software/igv/">Integrative Genomics Viewer (IGV)</a> is a track-based browser for interactively exploring genomic data mapped to a reference genome.</li>
 </ul>
 </div>
-<div id="useful-tutorials" class="section level2" number="13.9">
-<h2><span class="header-section-number">13.9</span> Useful tutorials</h2>
+<div id="useful-tutorials" class="section level2 hasAnchor" number="13.9">
+<h2><span class="header-section-number">13.9</span> Useful tutorials<a href="single-cell-rna-seq.html#useful-tutorials" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>These tutorials cover explicit steps, code, tool recommendations and other considerations for analyzing RNA-seq data.</p>
 <ul>
 <li><a href="http://bioconductor.org/books/3.15/OSCA.intro/">Orchestrating Single Cell Analysis with Bioconductor</a> - An excellent tutorial for processing single cell data using Bioconductor.</li>
@@ -690,24 +684,24 @@ <h2><span class="header-section-number">13.9</span> Useful tutorials</h2>
 <li><a href="https://swaruplab.bio.uci.edu/tutorial/cellranger/cellranger-rna.html">Processing raw 10X Genomics single-cell RNA-seq data (with cellranger)</a> - a tutorial based on using CellRanger.</li>
 </ul>
 </div>
-<div id="useful-readings" class="section level2" number="13.10">
-<h2><span class="header-section-number">13.10</span> Useful readings</h2>
-<ul>
-<li><a href="https://doi.org/10.1016/j.omtm.2018.07.003">An Introduction to the Analysis of Single-Cell RNA-Sequencing Data</a> <span class="citation">(<a href="#ref-Aljanahi2018" role="doc-biblioref">AlJanahi, Danielsen, and Dunbar 2018</a>)</span>.</li>
-<li><a href="https://www.nature.com/articles/s41592-019-0654-x">Orchestrating single-cell analysis with Bioconductor</a> <span class="citation">(<a href="#ref-Amezquita2020" role="doc-biblioref">Amezquita et al. 2020</a>)</span>.</li>
-<li><a href="https://cgatoxford.wordpress.com/2015/08/14/unique-molecular-identifiers-the-problem-the-solution-and-the-proof/">UMIs the problem, the solution and the proof</a> <span class="citation">(<a href="#ref-Smith2015" role="doc-biblioref">Smith 2015</a>)</span>.</li>
-<li><a href="https://doi.org/10.1093/bfgp/elx035">Experimental design for single-cell RNA sequencing</a> <span class="citation">(<a href="#ref-BaranGale2018" role="doc-biblioref">Baran-Gale, Chandra, and Kirschner 2018</a>)</span>.</li>
-<li><a href="https://doi.org/10.1038/s41596-018-0073-y">Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies</a> <span class="citation">(<a href="#ref-Lafzi2018" role="doc-biblioref">Lafzi et al. 2018</a>)</span>.</li>
-<li><a href="http://dx.doi.org/10.1016/j.molcel.2017.01.023">Comparative Analysis of Single-Cell RNA Sequencing Methods</a> <span class="citation">(<a href="#ref-Ziegenhain2017" role="doc-biblioref">Ziegenhain et al. 2017</a>)</span>.</li>
-<li><a href="https://doi.org/10.1016/j.molcel.2018.10.020">Comparative Analysis of Droplet-Based Ultra-High-Throughput Single-Cell RNA-Seq Systems</a> <span class="citation">(<a href="#ref-Zhang2019" role="doc-biblioref">X. Zhang et al. 2019</a>)</span>.</li>
-<li><a href="http://dx.doi.org/10.1016/j.coisb.2017.07.004">Single cells make big data: New challenges and opportunities in transcriptomics</a> <span class="citation">(<a href="#ref-Angerer2017" role="doc-biblioref">Angerer et al. 2017</a>)</span>.</li>
-<li><a href="https://www.biorxiv.org/content/10.1101/2021.02.15.430948v2">Comparative Analysis of common alignment tools for single cell RNA sequencing</a> <span class="citation">(<a href="#ref-Bruning2021" role="doc-biblioref">Brüning et al. 2021</a>)</span>.</li>
-<li><a href="https://doi.org/10.15252/msb.20188746">Current best practices in single-cell RNA-seq analysis: a tutorial</a> <span class="citation">(<a href="#ref-Luecken2019" role="doc-biblioref">Luecken and Theis 2019</a>)</span>.</li>
+<div id="useful-readings" class="section level2 hasAnchor" number="13.10">
+<h2><span class="header-section-number">13.10</span> Useful readings<a href="single-cell-rna-seq.html#useful-readings" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<ul>
+<li><a href="https://doi.org/10.1016/j.omtm.2018.07.003">An Introduction to the Analysis of Single-Cell RNA-Sequencing Data</a> <span class="citation">(<a href="#ref-Aljanahi2018">AlJanahi, Danielsen, and Dunbar 2018</a>)</span>.</li>
+<li><a href="https://www.nature.com/articles/s41592-019-0654-x">Orchestrating single-cell analysis with Bioconductor</a> <span class="citation">(<a href="#ref-Amezquita2020">Amezquita et al. 2020</a>)</span>.</li>
+<li><a href="https://cgatoxford.wordpress.com/2015/08/14/unique-molecular-identifiers-the-problem-the-solution-and-the-proof/">UMIs the problem, the solution and the proof</a> <span class="citation">(<a href="#ref-Smith2015">Smith 2015</a>)</span>.</li>
+<li><a href="https://doi.org/10.1093/bfgp/elx035">Experimental design for single-cell RNA sequencing</a> <span class="citation">(<a href="#ref-BaranGale2018">Baran-Gale, Chandra, and Kirschner 2018</a>)</span>.</li>
+<li><a href="https://doi.org/10.1038/s41596-018-0073-y">Tutorial: guidelines for the experimental design of single-cell RNA sequencing studies</a> <span class="citation">(<a href="#ref-Lafzi2018">Lafzi et al. 2018</a>)</span>.</li>
+<li><a href="http://dx.doi.org/10.1016/j.molcel.2017.01.023">Comparative Analysis of Single-Cell RNA Sequencing Methods</a> <span class="citation">(<a href="#ref-Ziegenhain2017">Ziegenhain et al. 2017</a>)</span>.</li>
+<li><a href="https://doi.org/10.1016/j.molcel.2018.10.020">Comparative Analysis of Droplet-Based Ultra-High-Throughput Single-Cell RNA-Seq Systems</a> <span class="citation">(<a href="#ref-Zhang2019">X. Zhang et al. 2019</a>)</span>.</li>
+<li><a href="http://dx.doi.org/10.1016/j.coisb.2017.07.004">Single cells make big data: New challenges and opportunities in transcriptomics</a> <span class="citation">(<a href="#ref-Angerer2017">Angerer et al. 2017</a>)</span>.</li>
+<li><a href="https://www.biorxiv.org/content/10.1101/2021.02.15.430948v2">Comparative Analysis of common alignment tools for single cell RNA sequencing</a> <span class="citation">(<a href="#ref-Bruning2021">Brüning et al. 2021</a>)</span>.</li>
+<li><a href="https://doi.org/10.15252/msb.20188746">Current best practices in single-cell RNA-seq analysis: a tutorial</a> <span class="citation">(<a href="#ref-Luecken2019">Luecken and Theis 2019</a>)</span>.</li>
 </ul>
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-AlexsLemonade2022" class="csl-entry">
 <span>“Alex’s Lemonade Training Modules.”</span> 2022. <a href="https://github.com/AlexsLemonade/training-modules">https://github.com/AlexsLemonade/training-modules</a>.
@@ -768,10 +762,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -841,7 +842,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/spatial-transcriptomics-1.html b/docs/no_toc/spatial-transcriptomics-1.html
index 0b971a8f..e25ef234 100644
--- a/docs/no_toc/spatial-transcriptomics-1.html
+++ b/docs/no_toc/spatial-transcriptomics-1.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 14 Spatial transcriptomics | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 14 Spatial transcriptomics | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="single-cell-rna-seq.html"/>
 <link rel="next" href="chromatin-methods-overview.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 14 Spatial transcriptomics | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,43 +535,40 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="spatial-transcriptomics-1" class="section level1" number="14">
-<h1><span class="header-section-number">Chapter 14</span> Spatial transcriptomics</h1>
-<div class="warning">
-<p>This chapter has currently been written by ChatGPT and has not been verified by experts. We need help writing and reviewing it! If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
-</div>
-<div id="learning-objectives-12" class="section level2" number="14.1">
-<h2><span class="header-section-number">14.1</span> Learning objectives</h2>
-<p><img src="resources/images/10c-spatial-transcriptomics_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g258b14267ad_278_14.png" title="This chapter will demonstrate how to: Approach collection of spatial transcriptomics data and design a typical analysis pipeline. Adjust your analysis pipeline to the research question, opportunities, and limitations concerning you spatial transcriptomics project. Learn about the questions that can be addressed with spatial transcriptomics data" alt="This chapter will demonstrate how to: Approach collection of spatial transcriptomics data and design a typical analysis pipeline. Adjust your analysis pipeline to the research question, opportunities, and limitations concerning you spatial transcriptomics project. Learn about the questions that can be addressed with spatial transcriptomics data" width="100%" /></p>
+<div id="spatial-transcriptomics-1" class="section level1 hasAnchor" number="14">
+<h1><span class="header-section-number">Chapter 14</span> Spatial transcriptomics<a href="spatial-transcriptomics-1.html#spatial-transcriptomics-1" class="anchor-section" aria-label="Anchor link to header"></a></h1>
+<div id="learning-objectives-12" class="section level2 hasAnchor" number="14.1">
+<h2><span class="header-section-number">14.1</span> Learning objectives<a href="spatial-transcriptomics-1.html#learning-objectives-12" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/10c-spatial-transcriptomics_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g258b14267ad_278_14.png" alt="This chapter will demonstrate how to: Approach collection of spatial transcriptomics data and design a typical analysis pipeline. Adjust your analysis pipeline to the research question, opportunities, and limitations concerning your spatial transcriptomics project. Learn about the questions that can be addressed with spatial transcriptomics data" width="100%" /></p>
 </div>
-<div id="what-are-the-goals-of-spatial-transcriptomic-analysis" class="section level2" number="14.2">
-<h2><span class="header-section-number">14.2</span> What are the goals of spatial transcriptomic analysis?</h2>
-<p>Spatial transcriptomics (ST) technologies have been developed as a solution to the lack of spatial context in single cell transcriptomics (scRNA-seq) data <span class="citation">(<a href="#ref-rao2021exploring" role="doc-biblioref">Rao et al. 2021</a>; <a href="#ref-ospina2023primer" role="doc-biblioref">Ospina, Soupir, and Fridley 2023</a>)</span>. There is a diversity of ST methods, however all have in common two features: Multiple measurements of gene expression and the locations within the tissue where those gene expression measurements were taken. Data analysis of ST data requires integration of those two components, and it’s primary goal is to characterize gene expression patterns within the tissue or cellular context. The ability to quantify gene expression at different locations within the tissue is of tremendous value to understand the functional variation of different tissue regions, domains, or niches. It also places cell-cell communication in the context of cell neighborhoods, which ultimately facilitates a deeper understanding of cell and tissue biology, but also enables practical applications such as discovery of novel drug targets for complex diseases such as cancer <span class="citation">(<a href="#ref-dries2021advances" role="doc-biblioref">Dries et al. 2021</a>; <a href="#ref-williams2022introduction" role="doc-biblioref">Williams et al. 2022</a>)</span>. Following, are some of the specific goals that a study using ST could achieve:</p>
+<div id="what-are-the-goals-of-spatial-transcriptomic-analysis" class="section level2 hasAnchor" number="14.2">
+<h2><span class="header-section-number">14.2</span> What are the goals of spatial transcriptomic analysis?<a href="spatial-transcriptomics-1.html#what-are-the-goals-of-spatial-transcriptomic-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p>Spatial transcriptomics (ST) technologies have been developed as a solution to the lack of spatial context in single cell transcriptomics (scRNA-seq) data <span class="citation">(<a href="#ref-rao2021exploring">Rao et al. 2021</a>; <a href="#ref-ospina2023primer">Ospina, Soupir, and Fridley 2023</a>)</span>. There is a diversity of ST methods, but all share two features: multiple measurements of gene expression and the locations within the tissue where those measurements were taken. Analysis of ST data requires integration of those two components, and its primary goal is to characterize gene expression patterns within the tissue or cellular context. The ability to quantify gene expression at different locations within the tissue is of tremendous value for understanding the functional variation of different tissue regions, domains, or niches. It also places cell-cell communication in the context of cell neighborhoods, which facilitates a deeper understanding of cell and tissue biology and enables practical applications such as discovery of novel drug targets for complex diseases such as cancer <span class="citation">(<a href="#ref-dries2021advances">Dries et al. 2021</a>; <a href="#ref-williams2022introduction">Williams et al. 2022</a>)</span>. The following are some of the specific goals that a study using ST could achieve:</p>
 <ol style="list-style-type: decimal">
-<li><strong>Describe tissue-specific cellular neighborhoods of cell types and cell type sub-populations:</strong> Although scRNA-seq continues to be a powerful method to assign biological identities to a mixture of cells, integrated analysis of ST combined with scRNA-seq adds crucial information to cell phenotypes by describing the neighborhoods where cells occur <span class="citation">(<a href="#ref-longo2021integrating" role="doc-biblioref">Longo et al. 2021</a>)</span>. Many methods to phenotype ST data are available, with most of them relying on the availability of a curated (scRNA-seq) cell type reference. Once cell identities have been determined, clustering or spatial statistics can be applied to describe the composition of tissue niches or domains. The explosion of ST data has resulted on novel and comprehensive tissue- or disease-specific atlases, not only describing the cell types within organs, but also the functional cell-cell relationships that result from spatial organization (e.g., <span class="citation">Guilliams et al. (<a href="#ref-guilliams2022spatial" role="doc-biblioref">2022</a>)</span>; <span class="citation">Wu et al. (<a href="#ref-wu2021single" role="doc-biblioref">2021</a>)</span>).</li>
-<li><strong>Uncover spatially regulated biological processes:</strong> With ST data, there comes the ability to detect genes or gene pathways that are expressed in specific areas within tissues (i.e., spatially-restricted expression). Detecting genes with spatially-restricted expression is key to achieve further understanding of specific biological processes, such as tissue gradients, cell differentiation, or signaling pathways. For example, cancer researchers are now able to study signaling pathways restricted to the tumor-stroma interface <span class="citation">(<a href="#ref-hunter2021spatially" role="doc-biblioref">Hunter et al. 2021</a>)</span>, which could lead to the discovery of mechanisms representing cancer vulnerabilities resulting from interactions between the tumor and stroma cells.</li>
-<li><strong>Investigate cell-cell interactions:</strong> From basic to applied tissue biology research, the study of cell-cell interactions is of high interest, especially the interactions that occur via ligand-receptor pairs. The construction of comprehensive databases of ligand-receptor interactions has been possible due the large amounts of single-cell data sets produced by researchers. A major contribution of ST to the study of tissue biology is the addition of the spatial context to previously identified ligand-receptor interactions. Because single-cell RNA-seq requires physical separation of cells, current ligand-receptor databases represent hypotheses which ST can help to address by using models of spatial co-localization, enabling in-situ examination of cell-cell interactions and communication <span class="citation">(<a href="#ref-raredon2023comprehensive" role="doc-biblioref">Raredon et al. 2023</a>; <a href="#ref-wang2023promising" role="doc-biblioref">X. Wang, Almet, and Nie 2023</a>)</span>.</li>
-<li><strong>Integrate imaging data:</strong> Spatial transcriptomics data has enabled direct integration of gene expression measurements with digital images of the same (or adjacent) tissue. Improved molecular description and/or exploration of tissue niches or domains is now possible. One approach consists on differential expression of histopathology annotations done by an expert on tissue images (e.g., <span class="citation">Ravi et al. (<a href="#ref-ravi2022spatially" role="doc-biblioref">2022</a>)</span>). The opposite approach is possible, which uses unsupervised clustering of ST data assisted by color/intensity information derived from images. Machine learning for integration of ST and imaging data is an active area of development (e.g., <span class="citation">Hu et al. (<a href="#ref-hu2021spagcn" role="doc-biblioref">2021</a>)</span>; <span class="citation">Xu et al. (<a href="#ref-xu2022deepst" role="doc-biblioref">2022</a>)</span>; <span class="citation">Tan et al. (<a href="#ref-tan2020spacell" role="doc-biblioref">2020</a>)</span>). Furthermore, ST data findings can be qualitatively validated by assessing the approximate location of regions such as immune-infiltrated areas or damaged tissue, often resulting from inspection of fluorescence microscopy.</li>
-<li><strong>Identify biomarkers and drug targets:</strong> The use of ST allows the exploration of tissue niche-specific expression patterns and gene pathway analysis. This exploration can lead to generation of hypotheses about potential biomarkers for specific tissue functions or disease states. Furthermore, the molecular interactions predicted using scRNA-seq (e.g., ligand-receptor), can now be put in context of the larger tissue architecture using ST data. The spatial context of these interactions will likely boost the identification of novel drug targets, as well as improved understanding of current therapies <span class="citation">(<a href="#ref-lyubetskaya2022assessment" role="doc-biblioref">Lyubetskaya et al. 2022</a>; <a href="#ref-zhang2022clinical" role="doc-biblioref">L. Zhang et al. 2022</a>)</span>.</li>
+<li><strong>Describe tissue-specific cellular neighborhoods of cell types and cell type sub-populations:</strong> Although scRNA-seq continues to be a powerful method to assign biological identities to a mixture of cells, integrated analysis of ST combined with scRNA-seq adds crucial information to cell phenotypes by describing the neighborhoods where cells occur <span class="citation">(<a href="#ref-longo2021integrating">Longo et al. 2021</a>)</span>. Many methods to phenotype ST data are available, most of them relying on the availability of a curated (scRNA-seq) cell type reference. Once cell identities have been determined, clustering or spatial statistics can be applied to describe the composition of tissue niches or domains. The explosion of ST data has resulted in novel and comprehensive tissue- or disease-specific atlases, describing not only the cell types within organs but also the functional cell-cell relationships that result from spatial organization (e.g., <span class="citation">Guilliams et al. (<a href="#ref-guilliams2022spatial">2022</a>)</span>; <span class="citation">Wu et al. (<a href="#ref-wu2021single">2021</a>)</span>).</li>
+<li><strong>Uncover spatially regulated biological processes:</strong> ST data make it possible to detect genes or gene pathways that are expressed in specific areas within tissues (i.e., spatially-restricted expression). Detecting genes with spatially-restricted expression is key to understanding specific biological processes, such as tissue gradients, cell differentiation, or signaling pathways. For example, cancer researchers are now able to study signaling pathways restricted to the tumor-stroma interface <span class="citation">(<a href="#ref-hunter2021spatially">Hunter et al. 2021</a>)</span>, which could lead to the discovery of mechanisms that represent cancer vulnerabilities arising from interactions between tumor and stromal cells.</li>
+<li><strong>Investigate cell-cell interactions:</strong> From basic to applied tissue biology research, the study of cell-cell interactions is of high interest, especially the interactions that occur via ligand-receptor pairs. The construction of comprehensive databases of ligand-receptor interactions has been possible due to the large number of single-cell data sets produced by researchers. A major contribution of ST to the study of tissue biology is the addition of spatial context to previously identified ligand-receptor interactions. Because single-cell RNA-seq requires physical separation of cells, current ligand-receptor databases represent hypotheses that ST can help to test by using models of spatial co-localization, enabling in-situ examination of cell-cell interactions and communication <span class="citation">(<a href="#ref-raredon2023comprehensive">Raredon et al. 2023</a>; <a href="#ref-wang2023promising">X. Wang, Almet, and Nie 2023</a>)</span>.</li>
+<li><strong>Integrate imaging data:</strong> Spatial transcriptomics data has enabled direct integration of gene expression measurements with digital images of the same (or adjacent) tissue. Improved molecular description and/or exploration of tissue niches or domains is now possible. One approach consists of differential expression analysis across histopathology annotations made by an expert on tissue images (e.g., <span class="citation">Ravi et al. (<a href="#ref-ravi2022spatially">2022</a>)</span>). The opposite approach is also possible: unsupervised clustering of ST data assisted by color/intensity information derived from images. Machine learning for integration of ST and imaging data is an active area of development (e.g., <span class="citation">Hu et al. (<a href="#ref-hu2021spagcn">2021</a>)</span>; <span class="citation">Xu et al. (<a href="#ref-xu2022deepst">2022</a>)</span>; <span class="citation">Tan et al. (<a href="#ref-tan2020spacell">2020</a>)</span>). Furthermore, ST data findings can be qualitatively validated by assessing the approximate location of regions such as immune-infiltrated areas or damaged tissue, which are often identified by inspection of fluorescence microscopy images.</li>
+<li><strong>Identify biomarkers and drug targets:</strong> The use of ST allows the exploration of tissue niche-specific expression patterns and gene pathway analysis. This exploration can lead to generation of hypotheses about potential biomarkers for specific tissue functions or disease states. Furthermore, the molecular interactions predicted using scRNA-seq (e.g., ligand-receptor) can now be put in the context of the larger tissue architecture using ST data. The spatial context of these interactions will likely boost the identification of novel drug targets, as well as improve understanding of current therapies <span class="citation">(<a href="#ref-lyubetskaya2022assessment">Lyubetskaya et al. 2022</a>; <a href="#ref-zhang2022clinical">L. Zhang et al. 2022</a>)</span>.</li>
 </ol>
 </div>
-<div id="overview-of-a-spatial-transcriptomics-workflow" class="section level2" number="14.3">
-<h2><span class="header-section-number">14.3</span> Overview of a spatial transcriptomics workflow</h2>
-<p>There is a large diversity in approaches to spatially profile tissues. Some ST technologies allow profiling at coarse cellular resolution, where regions of interest (ROIs) are usually identified by a pathologist. These ROIs may include tens of cells up to few hundreds (e.g., GeoMx <span class="citation">Bergholtz et al. (<a href="#ref-bergholtz2021best" role="doc-biblioref">2021</a>)</span>). Smaller ROI sizes can be found in other technologies such as Visium, where ROIs of 55uM of diameter (or “spots”) often contain no more than 10 cells (<a href="https://www.10xgenomics.com/resources/analysis-guides/integrating-single-cell-and-visium-spatial-gene-expression-data" class="uri">https://www.10xgenomics.com/resources/analysis-guides/integrating-single-cell-and-visium-spatial-gene-expression-data</a>). For finer cellular resolution, technologies such as MERFISH, SMI, or Xenium, among others, can measure gene expression at individual cells <span class="citation">(<a href="#ref-yue2023guidebook" role="doc-biblioref">Yue et al. 2023</a>)</span>. In general, there is a trade-off between the cellular resolution and molecular resolution, as the number of quantified genes and RNA molecules is lower in single-cell level spatial technologies compared to those at the ROI or spot level. In single-cell ST, often a panel of hundreds of genes is quantified, while in “mini-bulk” (ROI/spot) ST, it is possible to genes at the whole transcriptome level.</p>
-<p><img src="resources/images/10c-spatial-transcriptomics_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_461_0.png" title="A trade-off exists between the cellular resolution and molecular resolution in spatial transcriptomics." alt="A trade-off exists between the cellular resolution and molecular resolution in spatial transcriptomics." width="100%" /></p>
-<p>In addition to the differences in cellular and molecular, there are fundamental differences in the chemistry used to count the RNA transcripts in the tissue <span class="citation">(<a href="#ref-wang2021spatial" role="doc-biblioref">N. Wang et al. 2021</a>; <a href="#ref-yue2023guidebook" role="doc-biblioref">Yue et al. 2023</a>)</span>. Capture or hybridization of RNA followed by sequencing, or fluorescent imaging are two of the most common techniques used in ST methods. Because of large diversity in resolution and chemical procedures among ST technologies, data collection workflows are equally diverse. Finally, each study poses specific questions that cannot be addressed with traditional scRNA-seq pipelines, requiring customized workflows.</p>
+<div id="overview-of-a-spatial-transcriptomics-workflow" class="section level2 hasAnchor" number="14.3">
+<h2><span class="header-section-number">14.3</span> Overview of a spatial transcriptomics workflow<a href="spatial-transcriptomics-1.html#overview-of-a-spatial-transcriptomics-workflow" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p>There is a large diversity in approaches to spatially profile tissues. Some ST technologies allow profiling at coarse cellular resolution, where regions of interest (ROIs) are usually identified by a pathologist. These ROIs may include from tens of cells up to a few hundred (e.g., GeoMx <span class="citation">Bergholtz et al. (<a href="#ref-bergholtz2021best">2021</a>)</span>). Smaller ROI sizes can be found in other technologies such as Visium, where ROIs 55 µm in diameter (or “spots”) often contain no more than 10 cells (<a href="https://www.10xgenomics.com/resources/analysis-guides/integrating-single-cell-and-visium-spatial-gene-expression-data" class="uri">https://www.10xgenomics.com/resources/analysis-guides/integrating-single-cell-and-visium-spatial-gene-expression-data</a>). For finer cellular resolution, technologies such as MERFISH, SMI, or Xenium, among others, can measure gene expression in individual cells <span class="citation">(<a href="#ref-yue2023guidebook">Yue et al. 2023</a>)</span>. In general, there is a trade-off between cellular resolution and molecular resolution, as the number of quantified genes and RNA molecules is lower in single-cell level spatial technologies compared to those at the ROI or spot level. In single-cell ST, often a panel of hundreds of genes is quantified, while in “mini-bulk” (ROI/spot) ST, it is possible to quantify genes at the whole transcriptome level.</p>
+<p><img src="resources/images/10c-spatial-transcriptomics_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g2668d07d0b9_461_0.png" alt="A trade-off exists between the cellular resolution and molecular resolution in spatial transcriptomics." width="100%" /></p>
+<p>In addition to the differences in cellular and molecular resolution, there are fundamental differences in the chemistry used to count the RNA transcripts in the tissue <span class="citation">(<a href="#ref-wang2021spatial">N. Wang et al. 2021</a>; <a href="#ref-yue2023guidebook">Yue et al. 2023</a>)</span>. Capture or hybridization of RNA followed by sequencing, and fluorescent imaging, are two of the most common techniques used in ST methods. Because of the large diversity in resolution and chemical procedures among ST technologies, data collection workflows are equally diverse. Finally, each study poses specific questions that cannot be addressed with traditional scRNA-seq pipelines, requiring customized workflows.</p>
 <p>Some of the commonalities in the workflows are presented here:</p>
 <ol style="list-style-type: decimal">
 <li><p><strong>Sample preparation:</strong> The preparation of a tissue sample will depend largely on the specific ST technology to be used. In general, this involves obtaining the tissue of interest in the form of a thin slice from a fresh frozen biopsy or a paraffin embedded tissue block. Tissue slices are generally about 5 to 10 microns thick. Given the instability of RNA molecules, the samples from which the tissue slices originate should be properly preserved and stabilized to maintain the integrity of RNA molecules. Many ST technologies are compatible with tissue microarrays (TMAs).</p></li>
-<li><p><strong>Capture or hybridization of RNA molecules:</strong> In this step, the tissue sample is typically placed on a solid substrate, such as regular positively charged glass slides or vendor-designed slides. The latter category include spatially barcoded slides. (e.g., Visium <span class="citation">(<a href="#ref-staahl2016visualization" role="doc-biblioref">Ståhl et al. 2016</a>)</span> ), where RNA capture probes are contained in microscopic spots arranged in arrays or grids. The use of positively charged slides are used in technologies using <em>in-situ</em> sequencing or imaging-based methods, however, capture-based methods like GeoMx also employ this type of slide. Each method entails specific considerations. An example of these considerations include optimization of tissue permeabilization in Visium slides to release the RNA molecules. In the case of imaging-based methods, RNA molecules are hybridized with fluorescent probes that uniquely identify each RNA species [e.g., SMI <span class="citation">(<a href="#ref-he2022high" role="doc-biblioref">S. He et al. 2022</a>)</span>, MERFISH <span class="citation">(<a href="#ref-zhang2021spatially" role="doc-biblioref">M. Zhang et al. 2021</a>)</span> ].</p></li>
+<li><p><strong>Capture or hybridization of RNA molecules:</strong> In this step, the tissue sample is typically placed on a solid substrate, such as regular positively charged glass slides or vendor-designed slides. The latter category includes spatially barcoded slides (e.g., Visium <span class="citation">(<a href="#ref-staahl2016visualization">Ståhl et al. 2016</a>)</span>), where RNA capture probes are contained in microscopic spots arranged in arrays or grids. Positively charged slides are used in technologies based on <em>in-situ</em> sequencing or imaging; however, capture-based methods like GeoMx also employ this type of slide. Each method entails specific considerations. One example is the optimization of tissue permeabilization in Visium slides to release the RNA molecules. In the case of imaging-based methods, RNA molecules are hybridized with fluorescent probes that uniquely identify each RNA species [e.g., SMI <span class="citation">(<a href="#ref-he2022high">S. He et al. 2022</a>)</span>, MERFISH <span class="citation">(<a href="#ref-zhang2021spatially">M. Zhang et al. 2021</a>)</span>].</p></li>
 <li><p><strong>RNA quantification:</strong> The method used to count the number of captured or hybridized RNA molecules varies greatly from technology to technology. Capture methods often involve release of the RNA molecules from the tissue or slide, followed by library preparation, amplification, next generation sequencing, and read mapping to a reference genome. In this case, libraries are spatially multiplexed, whereby barcodes indicate the spatial location from which the captured RNA molecules originate. In imaging-based methods, segmentation is required to delineate the cell borders. Then, coded fluorescent probes are counted within each segmented cell.</p></li>
-<li><p><strong>Data quality control and pre-processing:</strong> As with any omics technology, filtering and pre-processing is of paramount importance for downstream analysis. Spatial transcriptomics data typically contain an excess of zeroes and high gene dropout <span class="citation">(<a href="#ref-zhao2022modeling" role="doc-biblioref">Zhao et al. 2022</a>)</span>. Removing genes expressed in very few spots or cells is often done. Similarly, it is advisable to remove spots with very few counts, however, care needs to exercised to not remove biological variation due to cellularity (i.e., areas with fewer cells tend to have less counts). Mitochondrial or ribosomal genes if available in the data, can be used to assess the level of tissue necrosis and filter accordingly <span class="citation">(<a href="#ref-ospina2023primer" role="doc-biblioref">Ospina, Soupir, and Fridley 2023</a>)</span>. In imaging-based methods, the area of cells can be used to detect “doublets” generated during image segmentation. Once filtering has been performed, gene count normalization and transformation is typically a part of pre-processing. Commonly used methods in scRNA-seq such as library-size normalization and log-transformation, are also commonplace in spatial transcriptomics studies. Methods that attempt technical effect correction such as SCTransform <span class="citation">(<a href="#ref-hafemeister2019normalization" role="doc-biblioref">Hafemeister and Satija 2019</a>)</span> can be also used.</p></li>
+<li><p><strong>Data quality control and pre-processing:</strong> As with any omics technology, filtering and pre-processing are of paramount importance for downstream analysis. Spatial transcriptomics data typically contain an excess of zeroes and high gene dropout <span class="citation">(<a href="#ref-zhao2022modeling">Zhao et al. 2022</a>)</span>. Genes expressed in very few spots or cells are often removed. Similarly, it is advisable to remove spots with very few counts; however, care needs to be exercised not to remove biological variation due to cellularity (i.e., areas with fewer cells tend to have fewer counts). Mitochondrial or ribosomal genes, if available in the data, can be used to assess the level of tissue necrosis and filter accordingly <span class="citation">(<a href="#ref-ospina2023primer">Ospina, Soupir, and Fridley 2023</a>)</span>. In imaging-based methods, the area of cells can be used to detect “doublets” generated during image segmentation. Once filtering has been performed, gene count normalization and transformation are typically part of pre-processing. Methods commonly used in scRNA-seq, such as library-size normalization and log-transformation, are also commonplace in spatial transcriptomics studies. Methods that attempt technical effect correction, such as SCTransform <span class="citation">(<a href="#ref-hafemeister2019normalization">Hafemeister and Satija 2019</a>)</span>, can also be used.</p></li>
 <li><p><strong>Visualization:</strong> Similar to scRNA-seq data, dimension reduction methods such as the Uniform Manifold Approximation and Projection (UMAP) are key to visualizing the heterogeneity of the data set. In addition, because the spatial coordinates provide an additional modality, spatial gene expression heatmaps can be generated and compared against the imaging data (e.g., H&amp;E, IHC, mIF) to gain further insights into overall tissue architecture.</p></li>
 <li><p><strong>Clustering and cell/tissue domain phenotyping:</strong> There is a plethora of clustering approaches, ranging from those employed in scRNA-seq analysis (e.g., Louvain) to novel neural network classification. Some methods take advantage of the spatial location information and/or tissue image to inform clustering. Compared to clustering, cell/domain phenotyping is an area of even more active development, with the majority of methods relying on a comprehensive, tissue-specific single-cell atlas from which cell types (i.e., “labels”) are obtained. Canonical marker-based phenotyping is still widely used, and in many cases unavoidable, to identify specific cell populations. In general, it is advisable to seek expert validation from a tissue biologist or pathologist to ascertain whether clustering and phenotyping are capturing the tissue architecture adequately.</p></li>
 </ol>
 </div>
-<div id="spatial-transcriptomic-data-strengths" class="section level2" number="14.4">
-<h2><span class="header-section-number">14.4</span> Spatial transcriptomic data <strong>strengths</strong>:</h2>
+<div id="spatial-transcriptomic-data-strengths" class="section level2 hasAnchor" number="14.4">
+<h2><span class="header-section-number">14.4</span> Spatial transcriptomic data <strong>strengths</strong>:<a href="spatial-transcriptomics-1.html#spatial-transcriptomic-data-strengths" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><strong>Preservation of the spatial context:</strong> Spatial transcriptomics allows the investigation of gene expression patterns, cell types, and their interactions within the context of tissue spatial organization.</li>
 <li><strong>Integration with imaging data:</strong> Spatial transcriptomics provides an additional data modality in the form of imaging data, such as histological images or fluorescence microscopy. This integration enhances the interpretation of spatial transcriptomic data by correlating gene expression patterns with tissue morphology and specific cellular structures.</li>
@@ -586,8 +577,8 @@ <h2><span class="header-section-number">14.4</span> Spatial transcriptomic data
 <li><strong>Hypothesis generation and biomarker discovery:</strong> Spatial transcriptomic analysis can help in the generation of hypotheses and the identification of potential biomarkers related to specific tissue functions, regions, or disease states. By linking gene expression patterns to tissue organization and pathology, spatial transcriptomics facilitates the discovery of spatially restricted gene signatures and potential diagnostic or prognostic markers.</li>
 </ul>
 </div>
-<div id="spatial-transcriptomic-data-weaknesses" class="section level2" number="14.5">
-<h2><span class="header-section-number">14.5</span> Spatial transcriptomic data <strong>weaknesses</strong>:</h2>
+<div id="spatial-transcriptomic-data-weaknesses" class="section level2 hasAnchor" number="14.5">
+<h2><span class="header-section-number">14.5</span> Spatial transcriptomic data <strong>weaknesses</strong>:<a href="spatial-transcriptomics-1.html#spatial-transcriptomic-data-weaknesses" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><strong>Trade-off between spatial resolution and molecular resolution:</strong> Spatial transcriptomic techniques that provide whole-transcriptome information measure expression at the “mini-bulk” level (spots or ROIs), with each mini-bulk sample containing a collection of cells. Conversely, single-cell ST methods provide expression for a panel of genes (hundreds to a few thousand genes). In addition, obtaining fine-grained spatial information may be challenging, especially in complex tissues or samples with high cellular density.</li>
 <li><strong>Technical variability and experimental artifacts:</strong> Spatial transcriptomic analysis involves multiple experimental steps, including tissue processing, capture/hybridization, and sequencing/imaging. Each step introduces technical variability and potential experimental artifacts, which can impact the accuracy and reproducibility of the results. Controlling and minimizing these sources of variation is crucial but can be challenging.</li>
@@ -597,146 +588,146 @@ <h2><span class="header-section-number">14.5</span> Spatial transcriptomic data
 <li><strong>Cost and time considerations:</strong> Spatial transcriptomic analysis can be relatively expensive and time-consuming compared to traditional transcriptomic techniques. The specialized protocols, reagents, and instrumentation required can add to the cost of the analysis. Moreover, the data generation and analysis processes can be time-intensive, which may limit the scalability of studies involving large sample sizes.</li>
 </ul>
 </div>
-<div id="tools-for-spatial-transcriptomics" class="section level2" number="14.6">
-<h2><span class="header-section-number">14.6</span> Tools for spatial transcriptomics</h2>
-<div id="data-processing" class="section level3" number="14.6.1">
-<h3><span class="header-section-number">14.6.1</span> Data processing:</h3>
-<div id="space-ranger" class="section level4" number="14.6.1.1">
-<h4><span class="header-section-number">14.6.1.1</span> <a href="https://www.10xgenomics.com/support/software/space-ranger/downloads">Space Ranger</a></h4>
+<div id="tools-for-spatial-transcriptomics" class="section level2 hasAnchor" number="14.6">
+<h2><span class="header-section-number">14.6</span> Tools for spatial transcriptomics<a href="spatial-transcriptomics-1.html#tools-for-spatial-transcriptomics" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<div id="data-processing" class="section level3 hasAnchor" number="14.6.1">
+<h3><span class="header-section-number">14.6.1</span> Data processing:<a href="spatial-transcriptomics-1.html#data-processing" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="space-ranger" class="section level4 hasAnchor" number="14.6.1.1">
+<h4><span class="header-section-number">14.6.1.1</span> <a href="https://www.10xgenomics.com/support/software/space-ranger/downloads">Space Ranger</a><a href="spatial-transcriptomics-1.html#space-ranger" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> Space Ranger is a software package developed by 10x Genomics specifically for processing and analyzing raw spatial transcriptomics data generated by their platform (Visium). It provides a streamlined workflow for processing raw data, including image registration, assignment of read counts to spots, and counting transcripts. Outputs from Space Ranger are commonly used as input to many other ST analysis tools.</li>
 <li><strong>Cons:</strong> Space Ranger has been designed to process only 10x Genomics data. The software does not provide methods to extract insights, which is accomplished by integration with other analytical suites. Requires knowledge of command line use.</li>
 </ul>
 </div>
-<div id="geomxtools" class="section level4" number="14.6.1.2">
-<h4><span class="header-section-number">14.6.1.2</span> <a href="https://www.bioconductor.org/packages/release/bioc/html/GeomxTools.html">GeomxTools</a></h4>
+<div id="geomxtools" class="section level4 hasAnchor" number="14.6.1.2">
+<h4><span class="header-section-number">14.6.1.2</span> <a href="https://www.bioconductor.org/packages/release/bioc/html/GeomxTools.html">GeomxTools</a><a href="spatial-transcriptomics-1.html#geomxtools" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The GeomxTools R package has been designed to take outputs from the GeoMx Digital Spatial Profiler (DSP) platform. The package includes methods to use raw .dcc files and .pkc probe set files to generate count matrices per ROI. Support for normalization and transformation of counts is also included in GeomxTools.</li>
 <li><strong>Cons:</strong> GeomxTools has been designed to process GeoMx DSP data outputs. Requires knowledge of R programming.</li>
 </ul>
 </div>
 </div>
-<div id="data-exploration" class="section level3" number="14.6.2">
-<h3><span class="header-section-number">14.6.2</span> Data exploration:</h3>
-<div id="seurat" class="section level4" number="14.6.2.1">
-<h4><span class="header-section-number">14.6.2.1</span> <a href="https://satijalab.org/seurat/">Seurat</a></h4>
+<div id="data-exploration" class="section level3 hasAnchor" number="14.6.2">
+<h3><span class="header-section-number">14.6.2</span> Data exploration:<a href="spatial-transcriptomics-1.html#data-exploration" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="seurat" class="section level4 hasAnchor" number="14.6.2.1">
+<h4><span class="header-section-number">14.6.2.1</span> <a href="https://satijalab.org/seurat/">Seurat</a><a href="spatial-transcriptomics-1.html#seurat" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> Seurat is a widely used R package for single-cell data analysis, with expanded capabilities to analyze ST data from multiple platforms. Seurat features direct integration with outputs from Space Ranger, MERSCOPE, and CosMx-SMI, among others. It provides a variety of functions for data pre-processing, dimensionality reduction, clustering, and visualization. Seurat has a large user community, extensive documentation, and tutorials, making it accessible to researchers.</li>
 <li><strong>Cons:</strong> Seurat can be memory-intensive, particularly when working with large data sets. It requires familiarity with R programming and bioinformatics concepts for effective use. Overall, methods in Seurat are the same methods applied to non-spatial scRNA-seq data.</li>
 </ul>
 </div>
-<div id="squidpy" class="section level4" number="14.6.2.2">
-<h4><span class="header-section-number">14.6.2.2</span> <a href="https://squidpy.readthedocs.io/en/stable/">Squidpy</a></h4>
+<div id="squidpy" class="section level4 hasAnchor" number="14.6.2.2">
+<h4><span class="header-section-number">14.6.2.2</span> <a href="https://squidpy.readthedocs.io/en/stable/">Squidpy</a><a href="spatial-transcriptomics-1.html#squidpy" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> Squidpy is a Python library for ST analysis that builds on Scanpy, a Python-based library designed for single-cell analysis. Scanpy offers a range of functionalities for data pre-processing, clustering, trajectory analysis, and visualization, and is known for its scalability, efficiency, and flexibility. It integrates well with other Python libraries and frameworks, making it suitable for use within other analysis pipelines. Some of the statistical methods in Squidpy implicitly make use of the spatial coordinates to detect patterns.</li>
 <li><strong>Cons:</strong> Much as Seurat requires R, Squidpy and Scanpy require some familiarity with Python programming and bioinformatics concepts. Users without prior programming experience may need to invest time in learning Python.</li>
 </ul>
 </div>
-<div id="giotto" class="section level4" number="14.6.2.3">
-<h4><span class="header-section-number">14.6.2.3</span> <a href="https://giottosuite.readthedocs.io/en/latest/index.html">Giotto</a></h4>
+<div id="giotto" class="section level4 hasAnchor" number="14.6.2.3">
+<h4><span class="header-section-number">14.6.2.3</span> <a href="https://giottosuite.readthedocs.io/en/latest/index.html">Giotto</a><a href="spatial-transcriptomics-1.html#giotto" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The analytical suite Giotto is a collection of methods to study spatial gene expression, agnostic to the platform used to generate the data. It allows users to perform data pre-processing, clustering, visualization, detection of spatially variable genes, and expression co-localization analysis. Computationally intensive analysis can be conducted in the cloud via integration with Terra.bio or locally using a Docker container. Some of the statistical methods in Giotto implicitly make use of the spatial coordinates to detect patterns.</li>
 <li><strong>Cons:</strong> Requires some familiarity with R, as well as bioinformatics and spatial statistics concepts. Installation requires setting up Python, as some modules use that language.</li>
 </ul>
 </div>
-<div id="spatialge-and-spatialge-web" class="section level4" number="14.6.2.4">
-<h4><span class="header-section-number">14.6.2.4</span> <a href="https://fridleylab.github.io/spatialGE/">spatialGE</a> and <a href="https://spatialge.moffitt.org/">spatialGE-web</a></h4>
+<div id="spatialge-and-spatialge-web" class="section level4 hasAnchor" number="14.6.2.4">
+<h4><span class="header-section-number">14.6.2.4</span> <a href="https://fridleylab.github.io/spatialGE/">spatialGE</a> and <a href="https://spatialge.moffitt.org/">spatialGE-web</a><a href="spatial-transcriptomics-1.html#spatialge-and-spatialge-web" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The spatialGE analysis suite allows users to study ST data from multiple platforms, including methods for pre-processing, clustering/domain detection, spatially variable genes, and functional analysis via detection of gene expression gradients and/or gene set enrichment spatial patterns. All the functionality of the R package has been implemented in a point-and-click web application that requires no coding experience and sends email notifications when analyses are completed. Statistical methods in spatialGE implicitly take into account the spatial coordinates during calculations.</li>
 <li><strong>Cons:</strong> Use of the spatialGE R package requires familiarity with the language. The spatialGE web application bypasses the need for R coding; however, computationally intensive methods can take time to complete.</li>
 </ul>
 </div>
-<div id="loupe" class="section level4" number="14.6.2.5">
-<h4><span class="header-section-number">14.6.2.5</span> <a href="https://support.10xgenomics.com/spatial-gene-expression/software/visualization/latest/what-is-loupe-browser">Loupe</a></h4>
+<div id="loupe" class="section level4 hasAnchor" number="14.6.2.5">
+<h4><span class="header-section-number">14.6.2.5</span> <a href="https://support.10xgenomics.com/spatial-gene-expression/software/visualization/latest/what-is-loupe-browser">Loupe</a><a href="spatial-transcriptomics-1.html#loupe" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The Loupe browser is a point-and-click tool for exploration of both non-spatial scRNA-seq and ST. Loupe takes Visium outputs and allows visualization of gene expression, clustering, and detection of differentially expressed genes. The tool also allows for easy registration and comparative analysis of Visium imaging and expression data.</li>
 <li><strong>Cons:</strong> Loupe allows basic exploration of the data. To perform functional-level analysis of ST data, the use of additional tools might be required.</li>
 </ul>
 </div>
-<div id="st-pipeline" class="section level4" number="14.6.2.6">
-<h4><span class="header-section-number">14.6.2.6</span> <a href="https://pypi.org/project/stpipeline/">ST Pipeline</a></h4>
+<div id="st-pipeline" class="section level4 hasAnchor" number="14.6.2.6">
+<h4><span class="header-section-number">14.6.2.6</span> <a href="https://pypi.org/project/stpipeline/">ST Pipeline</a><a href="spatial-transcriptomics-1.html#st-pipeline" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> ST Pipeline is a bioinformatics pipeline developed by the Spatial Transcriptomics consortium. It provides a complete workflow for ST data analysis, including pre-processing, normalization, spot detection, and visualization. ST Pipeline supports various spatial transcriptomic platforms, making it versatile.</li>
 <li><strong>Cons:</strong> ST Pipeline requires familiarity with Python, command-line, and Linux environments. Users may need to invest time in setting up the pipeline and configuring parameters based on their specific datasets and platforms.</li>
 </ul>
 </div>
-<div id="semla" class="section level4" number="14.6.2.7">
-<h4><span class="header-section-number">14.6.2.7</span> <a href="https://ludvigla.github.io/semla/index.html">semla</a></h4>
+<div id="semla" class="section level4 hasAnchor" number="14.6.2.7">
+<h4><span class="header-section-number">14.6.2.7</span> <a href="https://ludvigla.github.io/semla/index.html">semla</a><a href="spatial-transcriptomics-1.html#semla" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The semla R package is a bioinformatics pipeline enabling pre-processing, visualization, spatial statistics, and image integration of ST data. The package provides integration with Seurat.</li>
 <li><strong>Cons:</strong> semla requires familiarity with R.</li>
 </ul>
 </div>
 </div>
-<div id="clusteringtissue-domain-identification" class="section level3" number="14.6.3">
-<h3><span class="header-section-number">14.6.3</span> Clustering/tissue domain identification:</h3>
-<div id="spagcn" class="section level4" number="14.6.3.1">
-<h4><span class="header-section-number">14.6.3.1</span> <a href="https://github.com/jianhuupenn/SpaGCN/tree/master">SpaGCN</a></h4>
+<div id="clusteringtissue-domain-identification" class="section level3 hasAnchor" number="14.6.3">
+<h3><span class="header-section-number">14.6.3</span> Clustering/tissue domain identification:<a href="spatial-transcriptomics-1.html#clusteringtissue-domain-identification" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="spagcn" class="section level4 hasAnchor" number="14.6.3.1">
+<h4><span class="header-section-number">14.6.3.1</span> <a href="https://github.com/jianhuupenn/SpaGCN/tree/master">SpaGCN</a><a href="spatial-transcriptomics-1.html#spagcn" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The SpaGCN Python package predicts tissue domains, implicitly taking into account the spatial coordinates and optionally assisted by colors in the image data. The gene expression, coordinate, and image data are processed via graph convolutional networks (GCN) to find common patterns between the modalities. Based on predicted domains, SpaGCN can identify genes or collections of genes (meta genes) that are uniquely expressed in the domains. SpaGCN allows analysis of multiple ST technologies.</li>
 <li><strong>Cons:</strong> SpaGCN requires familiarity with Python and basic data frame processing. Some understanding of GCNs and parameters involved in calculations is advisable.</li>
 </ul>
 </div>
 </div>
-<div id="spatially-variable-gene-identification" class="section level3" number="14.6.4">
-<h3><span class="header-section-number">14.6.4</span> Spatially variable gene identification:</h3>
-<div id="spatialde" class="section level4" number="14.6.4.1">
-<h4><span class="header-section-number">14.6.4.1</span> <a href="https://github.com/Teichlab/SpatialDE">SpatialDE</a></h4>
+<div id="spatially-variable-gene-identification" class="section level3 hasAnchor" number="14.6.4">
+<h3><span class="header-section-number">14.6.4</span> Spatially variable gene identification:<a href="spatial-transcriptomics-1.html#spatially-variable-gene-identification" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="spatialde" class="section level4 hasAnchor" number="14.6.4.1">
+<h4><span class="header-section-number">14.6.4.1</span> <a href="https://github.com/Teichlab/SpatialDE">SpatialDE</a><a href="spatial-transcriptomics-1.html#spatialde" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> SpatialDE is a Python package designed for detecting spatially variable genes from ST data using non-parametric statistics. SpatialDE integrates the spatial coordinates and image data to identify genes or groups of genes showing spatial expression aggregation. The package can analyze data from multiple ST platforms.</li>
 <li><strong>Cons:</strong> SpatialDE requires familiarity with Python programming.</li>
 </ul>
 </div>
-<div id="spark-and-spark-x" class="section level4" number="14.6.4.2">
-<h4><span class="header-section-number">14.6.4.2</span> <a href="https://xzhoulab.github.io/SPARK/">SPARK and SPARK-X</a></h4>
+<div id="spark-and-spark-x" class="section level4 hasAnchor" number="14.6.4.2">
+<h4><span class="header-section-number">14.6.4.2</span> <a href="https://xzhoulab.github.io/SPARK/">SPARK and SPARK-X</a><a href="spatial-transcriptomics-1.html#spark-and-spark-x" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The SPARK methods allow scalable detection of genes showing spatial patterns. The tests are performed via generalized linear models and spatial autocorrelation matrix estimation. The SPARK implementation allows scalability and computing efficiency.</li>
 <li><strong>Cons:</strong> The SPARK methods are distributed as an R package and require familiarity with R programming. Some familiarity with spatial statistics is advisable.</li>
 </ul>
 </div>
-<div id="spacemarkers" class="section level4" number="14.6.4.3">
-<h4><span class="header-section-number">14.6.4.3</span> <a href="https://github.com/FertigLab/SpaceMarkers">SpaceMarkers</a></h4>
+<div id="spacemarkers" class="section level4 hasAnchor" number="14.6.4.3">
+<h4><span class="header-section-number">14.6.4.3</span> <a href="https://github.com/FertigLab/SpaceMarkers">SpaceMarkers</a><a href="spatial-transcriptomics-1.html#spacemarkers" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The SpaceMarkers approach detects sets of genes with evidence of spatial co-expression. Kernel smoothing is used to model the weight of expression of a gene taking into account neighboring areas.</li>
 <li><strong>Cons:</strong> Requires familiarity with R programming. The method has been tested in Visium data.</li>
 </ul>
 </div>
 </div>
-<div id="deconvolutionphenotyping" class="section level3" number="14.6.5">
-<h3><span class="header-section-number">14.6.5</span> Deconvolution/phenotyping:</h3>
-<div id="spotlight" class="section level4" number="14.6.5.1">
-<h4><span class="header-section-number">14.6.5.1</span> <a href="https://www.bioconductor.org/packages/release/bioc/html/SPOTlight.html">SPOTlight</a></h4>
+<div id="deconvolutionphenotyping" class="section level3 hasAnchor" number="14.6.5">
+<h3><span class="header-section-number">14.6.5</span> Deconvolution/phenotyping:<a href="spatial-transcriptomics-1.html#deconvolutionphenotyping" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="spotlight" class="section level4 hasAnchor" number="14.6.5.1">
+<h4><span class="header-section-number">14.6.5.1</span> <a href="https://www.bioconductor.org/packages/release/bioc/html/SPOTlight.html">SPOTlight</a><a href="spatial-transcriptomics-1.html#spotlight" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The SPOTlight algorithm takes advantage of robust non-negative matrix factorization (NMF) to define transcriptomic profiles from an annotated scRNA-seq reference. The transcriptomic profiles are transferred to the spatial transcriptomics data using non-negative least squares regression. Instead of providing a single category for “mini-bulk” data (e.g., Visium), SPOTlight features pie charts to describe the cell type composition within each mini-bulk sample (e.g., spot).</li>
 <li><strong>Cons:</strong> Requires some familiarity with R programming. The method has been tested on Visium data. As with most deconvolution methods, accurate identification of cell types relies heavily on a well-annotated scRNA-seq reference.</li>
 </ul>
 </div>
-<div id="stdeconvolve" class="section level4" number="14.6.5.2">
-<h4><span class="header-section-number">14.6.5.2</span> <a href="https://jef.works/STdeconvolve/">STdeconvolve</a></h4>
+<div id="stdeconvolve" class="section level4 hasAnchor" number="14.6.5.2">
+<h4><span class="header-section-number">14.6.5.2</span> <a href="https://jef.works/STdeconvolve/">STdeconvolve</a><a href="spatial-transcriptomics-1.html#stdeconvolve" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The STdeconvolve algorithm uses latent Dirichlet allocation (LDA) to define transcriptomic profiles or topics on the ST data. The topics are assigned a biological identity (e.g., cell type, tissue domain) using gene set enrichment or marker-based phenotyping. The topics are presented as proportions in “mini-bulk” data (e.g., Visium), where pie charts describe the cell type/domain composition within each mini-bulk sample (e.g., spot). STdeconvolve is one of very few reference-free ST deconvolution methods.</li>
 <li><strong>Cons:</strong> Requires some familiarity with R programming. The method has been mostly tested on Visium data. MERFISH data must first be aggregated into spots.</li>
 </ul>
 </div>
-<div id="insitutype" class="section level4" number="14.6.5.3">
-<h4><span class="header-section-number">14.6.5.3</span> <a href="https://github.com/Nanostring-Biostats/InSituType">InSituType</a></h4>
+<div id="insitutype" class="section level4 hasAnchor" number="14.6.5.3">
+<h4><span class="header-section-number">14.6.5.3</span> <a href="https://github.com/Nanostring-Biostats/InSituType">InSituType</a><a href="spatial-transcriptomics-1.html#insitutype" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> InSituType is a cell phenotyping algorithm designed for CosMx-SMI data but applicable to other single-cell ST data. InSituType can transfer cell types from an annotated scRNA-seq data set, or run reference-free unsupervised clustering to detect cell populations. In addition, immunofluorescence data accompanying SMI data sets can be used to inform gene expression deconvolution. InSituType can phenotype large quantities of cells within reasonable time.</li>
 <li><strong>Cons:</strong> InSituType assumes cell populations can be defined via cluster centroids. Thus, deconvolution can be affected when samples contain cells with intermediate phenotypes or if technical/background noise is prevalent. Requires familiarity with R programming.</li>
 </ul>
 </div>
-<div id="spatialdecon" class="section level4" number="14.6.5.4">
-<h4><span class="header-section-number">14.6.5.4</span> <a href="https://bioconductor.org/packages/release/bioc/vignettes/SpatialDecon/inst/doc/SpatialDecon_vignette_NSCLC.html">SpatialDecon</a></h4>
+<div id="spatialdecon" class="section level4 hasAnchor" number="14.6.5.4">
+<h4><span class="header-section-number">14.6.5.4</span> <a href="https://bioconductor.org/packages/release/bioc/vignettes/SpatialDecon/inst/doc/SpatialDecon_vignette_NSCLC.html">SpatialDecon</a><a href="spatial-transcriptomics-1.html#spatialdecon" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> The SpatialDecon algorithm implements log-normal regression to alleviate the effects of ST data skewness in the prediction of cell types. The method applies the estimation of cell type proportions, as done in bulk RNA-seq, to “mini-bulk” ROIs or spots in GeoMx and Visium experiments, respectively. Hence, the method assumes cell type heterogeneity within the ROIs or spots. In the case of GeoMx experiments, SpatialDecon takes advantage of nuclei counts to provide absolute cell type counts within each ROI. The package includes pre-built cell type signature matrices for several tissue types, but scRNA-seq references can be used to create custom signatures.</li>
 <li><strong>Cons:</strong> Requires familiarity with R programming.</li>
 </ul>
 </div>
 </div>
-<div id="cell-communication" class="section level3" number="14.6.6">
-<h3><span class="header-section-number">14.6.6</span> Cell communication:</h3>
-<div id="cellchat" class="section level4" number="14.6.6.1">
-<h4><span class="header-section-number">14.6.6.1</span> <a href="https://htmlpreview.github.io/?https://github.com/sqjin/CellChat/blob/master/tutorial/CellChat_analysis_of_spatial_imaging_data.html">CellChat</a></h4>
+<div id="cell-communication" class="section level3 hasAnchor" number="14.6.6">
+<h3><span class="header-section-number">14.6.6</span> Cell communication:<a href="spatial-transcriptomics-1.html#cell-communication" class="anchor-section" aria-label="Anchor link to header"></a></h3>
+<div id="cellchat" class="section level4 hasAnchor" number="14.6.6.1">
+<h4><span class="header-section-number">14.6.6.1</span> <a href="https://htmlpreview.github.io/?https://github.com/sqjin/CellChat/blob/master/tutorial/CellChat_analysis_of_spatial_imaging_data.html">CellChat</a><a href="spatial-transcriptomics-1.html#cellchat" class="anchor-section" aria-label="Anchor link to header"></a></h4>
 <ul>
 <li><strong>Pros:</strong> CellChat is an algorithm to infer cell communications via ligand-receptor interactions. CellChat was designed for non-spatial scRNA-seq data; however, a recent implementation accounts for distances between cells in ST experiments. The package includes a comprehensive ligand-receptor database, which is queried after quantifying the probability of interaction between two given cell types.</li>
 <li><strong>Cons:</strong> Requires familiarity with R programming. The spatial implementation of CellChat has been tested on Visium data.</li>
@@ -744,8 +735,8 @@ <h4><span class="header-section-number">14.6.6.1</span> <a href="https://htmlpre
 </div>
 </div>
 </div>
-<div id="more-tools-and-tutorials-regarding-spatial-transcriptomics" class="section level2" number="14.7">
-<h2><span class="header-section-number">14.7</span> More tools and tutorials regarding spatial transcriptomics</h2>
+<div id="more-tools-and-tutorials-regarding-spatial-transcriptomics" class="section level2 hasAnchor" number="14.7">
+<h2><span class="header-section-number">14.7</span> More tools and tutorials regarding spatial transcriptomics<a href="spatial-transcriptomics-1.html#more-tools-and-tutorials-regarding-spatial-transcriptomics" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <ul>
 <li><a href="https://satijalab.org/seurat/articles/spatial_vignette.html">Analysis, visualization, and integration of spatial datasets with Seurat</a></li>
 <li><a href="https://github.com/sheffield-bioinformatics-core/spatial_transcriptomics_tutorial">Sheffield Bioinformatics tutorial for spatial transcriptomics</a></li>
@@ -755,7 +746,7 @@ <h2><span class="header-section-number">14.7</span> More tools and tutorials reg
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-bergholtz2021best" class="csl-entry">
 Bergholtz, Helga, Jodi M Carter, Alessandra Cesano, Maggie Chon U Cheang, Sarah E Church, Prajan Divakar, Christopher A Fuhrman, et al. 2021. <span>“Best Practices for Spatial Profiling for Breast Cancer Research with the GeoMx<span></span> Digital Spatial Profiler.”</span> <em>Cancers</em> 13 (17): 4456.
@@ -831,10 +822,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -904,7 +902,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');
diff --git a/docs/no_toc/whole-genome-or-exome-sequencing.html b/docs/no_toc/whole-genome-or-exome-sequencing.html
index dae5ede3..677011ba 100644
--- a/docs/no_toc/whole-genome-or-exome-sequencing.html
+++ b/docs/no_toc/whole-genome-or-exome-sequencing.html
@@ -6,12 +6,11 @@
   <meta http-equiv="X-UA-Compatible" content="IE=edge" />
   <title>Chapter 10 Whole Genome or Exome Sequencing | Choosing Genomics Tools</title>
   <meta name="description" content="Description about Course/Book." />
-  <meta name="generator" content="bookdown 0.24 and GitBook 2.6.7" />
+  <meta name="generator" content="bookdown 0.41 and GitBook 2.6.7" />
 
   <meta property="og:title" content="Chapter 10 Whole Genome or Exome Sequencing | Choosing Genomics Tools" />
   <meta property="og:type" content="book" />
   
-  
   <meta property="og:description" content="Description about Course/Book." />
   
 
@@ -31,7 +30,6 @@
   <link rel="shortcut icon" href="assets/ITN_favicon.ico" type="image/x-icon" />
 <link rel="prev" href="dna-methods-overview.html"/>
 <link rel="next" href="rna-methods-overview.html"/>
-<script src="libs/header-attrs-2.10/header-attrs.js"></script>
 <script src="libs/jquery-3.6.0/jquery-3.6.0.min.js"></script>
 <script src="https://cdn.jsdelivr.net/npm/fuse.js@6.4.6/dist/fuse.min.js"></script>
 <link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
@@ -49,31 +47,26 @@
 
 
 
-<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
-<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
-  <html>
-  
-  <head>
-  <title>Chapter 10 Whole Genome or Exome Sequencing | Title</title>
-  </head>
-  
-  <body>
-  
+<link href="libs/anchor-sections-1.1.0/anchor-sections.css" rel="stylesheet" />
+<link href="libs/anchor-sections-1.1.0/anchor-sections-hash.css" rel="stylesheet" />
+<script src="libs/anchor-sections-1.1.0/anchor-sections.js"></script>
+
   <!-- Global site tag (gtag.js) - Google Analytics -->
   <script async src="https://www.googletagmanager.com/gtag/js?id=G-QWJXTLJBQ7"></script>
   <script>
     window.dataLayer = window.dataLayer || [];
     function gtag(){dataLayer.push(arguments);}
     gtag('js', new Date());
-    
+
     gtag('config', 'G-QWJXTLJBQ7');
   </script>
-      
-  </body>
-  </html>
 
 
 
+<style type="text/css">
+  
+  div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
+</style>
 <style type="text/css">
 /* Used with Pandoc 2.11+ new --citeproc when CSL is used */
 div.csl-bib-body { }
@@ -475,8 +468,9 @@
 <li class="chapter" data-level="21" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html"><i class="fa fa-check"></i><b>21</b> Microbiome Sequencing</a>
 <ul>
 <li class="chapter" data-level="21.1" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#learning-objectives-19"><i class="fa fa-check"></i><b>21.1</b> Learning Objectives</a></li>
-<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.2</b> Goals of Amplicon analysis</a></li>
-<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.3</b> Microbiome Analysis with QIIME 2</a></li>
+<li class="chapter" data-level="21.2" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#a-brief-introduction-to-microbiomes"><i class="fa fa-check"></i><b>21.2</b> A Brief Introduction to Microbiomes</a></li>
+<li class="chapter" data-level="21.3" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#goals-of-amplicon-analysis"><i class="fa fa-check"></i><b>21.3</b> Goals of Amplicon analysis</a></li>
+<li class="chapter" data-level="21.4" data-path="microbiome-sequencing.html"><a href="microbiome-sequencing.html#microbiome-analysis-with-qiime-2"><i class="fa fa-check"></i><b>21.4</b> Microbiome Analysis with QIIME 2</a></li>
 </ul></li>
 <li class="chapter" data-level="22" data-path="itcr--omic-tool-glossary.html"><a href="itcr--omic-tool-glossary.html"><i class="fa fa-check"></i><b>22</b> ITCR -omic Tool Glossary</a>
 <ul>
@@ -541,33 +535,33 @@ <h1>
 <div class="hero-image-container"> 
   <img class= "hero-image" src="assets/itcr_main_image.png">
 </div>
-<div id="whole-genome-or-exome-sequencing" class="section level1" number="10">
-<h1><span class="header-section-number">Chapter 10</span> Whole Genome or Exome Sequencing</h1>
+<div id="whole-genome-or-exome-sequencing" class="section level1 hasAnchor" number="10">
+<h1><span class="header-section-number">Chapter 10</span> Whole Genome or Exome Sequencing<a href="whole-genome-or-exome-sequencing.html#whole-genome-or-exome-sequencing" class="anchor-section" aria-label="Anchor link to header"></a></h1>
 <div class="warning">
 <p>This chapter is in a beta stage. If you wish to contribute, please <a href="https://forms.gle/dqYgmKH8XXE2ohwD9">go to this form</a> or our <a href="https://github.com/fhdsl/Choosing_Genomics_Tools">GitHub page</a>.</p>
 </div>
-<div id="learning-objectives-8" class="section level2" number="10.1">
-<h2><span class="header-section-number">10.1</span> Learning Objectives</h2>
-<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_51.png" title="The learning objectives for this course are to: 1 Define the uses and applications of WGS/WXS 2 Describe the steps for generating WGS/WXS data 3 Understand the data analysis workflow for WGS/WXS" alt="The learning objectives for this course are to: 1 Define the uses and applications of WGS/WXS 2 Describe the steps for generating WGS/WXS data 3 Understand the data analysis workflow for WGS/WXS" width="100%" />
+<div id="learning-objectives-8" class="section level2 hasAnchor" number="10.1">
+<h2><span class="header-section-number">10.1</span> Learning Objectives<a href="whole-genome-or-exome-sequencing.html#learning-objectives-8" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g12890ae15d7_0_51.png" alt="The learning objectives for this course are to: 1 Define the uses and applications of WGS/WXS 2 Describe the steps for generating WGS/WXS data 3 Understand the data analysis workflow for WGS/WXS" width="100%" />
 The learning objectives for this course are to explain the use and application of Whole Genome Sequencing (WGS) and Whole Exome Sequencing (WES/WXS) for genomics studies, outline the technical steps in generating WGS/WXS data, and detail the processing steps for analyzing and interpreting WGS/WXS data.</p>
 <p><strong>To familiarize yourself with sequencing methods as a whole, we recommend you read our <a href="http://hutchdatascience.org/Choosing_Genomics_Tools/sequencing-data.html">chapter on sequencing first</a>.</strong></p>
 </div>
-<div id="wgs-and-wgs-overview" class="section level2" number="10.2">
-<h2><span class="header-section-number">10.2</span> WGS and WGS Overview</h2>
-<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_8.png" title="Whole genome sequencing overview, Process of determining entirety of DNA sequence of organism’s genome at single time. Includes sequencing all chromosomal data and DNA from mitochondria. Used to identify functional variants associated with disease" alt="Whole genome sequencing overview, Process of determining entirety of DNA sequence of organism’s genome at single time. Includes sequencing all chromosomal data and DNA from mitochondria. Used to identify functional variants associated with disease" width="100%" />
+<div id="wgs-and-wgs-overview" class="section level2 hasAnchor" number="10.2">
+<h2><span class="header-section-number">10.2</span> WGS and WXS Overview<a href="whole-genome-or-exome-sequencing.html#wgs-and-wgs-overview" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_8.png" alt="Whole genome sequencing overview, Process of determining entirety of DNA sequence of organism’s genome at single time. Includes sequencing all chromosomal data and DNA from mitochondria. Used to identify functional variants associated with disease" width="100%" />
 The difference between WGS and WXS sequencing is whether or not the open reading frames and thus coding regions are targeted in sequencing. WGS attempts to sequence the whole genome, while for WXS only exons with open reading frames are targeted for sequencing. Both of these methods can be massively beneficial for studying rare and complex diseases.</p>
 <p>Thus, whole genome sequencing is a technique to thoroughly analyze the entire DNA sequence of an organism’s genome. This includes sequencing all genes, both coding and non-coding, and all mitochondrial DNA. WGS is beneficial for identifying new and previously established variants related to disease and the regulatory elements of the genome, including promoters, enhancers, and silencers. Increasingly, non-coding RNAs have also been found to play a functional role in biological mechanisms and diseases. In order to learn more about the non-coding regions of the genome, WGS is necessary.</p>
 <p>Alternatively, whole exome sequencing is used to sequence the coding regions of an organism’s genome. Although non-coding regions can sometimes reveal valuable insights, coding regions can be a useful area of the genome to focus sequencing on, since more is generally known about changes in protein-coding sequences. Changes in protein-coding sequences often have clearer functional consequences, such as when a stop codon is introduced or a codon is changed to encode a different, predictable amino acid. This can more easily lead to downstream investigations of the functional implications for the affected protein.</p>
 </div>
-<div id="advantages-and-disadvantages-of-wgs-vs-wxs" class="section level2" number="10.3">
-<h2><span class="header-section-number">10.3</span> Advantages and Disadvantages of WGS vs WXS</h2>
-<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_13.png" title="Advantages and Disadvantages of WGS as opposed to WXS: Most complete account of individual variation, Ability to study: Structural rearrangements, Copy number variations, Insertion-Deletions, SNPs, Sequencing repeats, Coding, non-coding, and mitochondrial genome coverage, allows for discovery - identify causative variants; Disadvantages include higher cost and more resources for storing and analyzing data" alt="Advantages and Disadvantages of WGS as opposed to WXS: Most complete account of individual variation, Ability to study: Structural rearrangements, Copy number variations, Insertion-Deletions, SNPs, Sequencing repeats, Coding, non-coding, and mitochondrial genome coverage, allows for discovery - identify causative variants; Disadvantages include higher cost and more resources for storing and analyzing data" width="100%" /></p>
+<div id="advantages-and-disadvantages-of-wgs-vs-wxs" class="section level2 hasAnchor" number="10.3">
+<h2><span class="header-section-number">10.3</span> Advantages and Disadvantages of WGS vs WXS<a href="whole-genome-or-exome-sequencing.html#advantages-and-disadvantages-of-wgs-vs-wxs" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_13.png" alt="Advantages and Disadvantages of WGS as opposed to WXS: Most complete account of individual variation, Ability to study: Structural rearrangements, Copy number variations, Insertion-Deletions, SNPs, Sequencing repeats, Coding, non-coding, and mitochondrial genome coverage, allows for discovery - identify causative variants; Disadvantages include higher cost and more resources for storing and analyzing data" width="100%" /></p>
 <p>We more thoroughly discuss how to choose DNA sequencing methods <a href="http://hutchdatascience.org/Choosing_Genomics_Tools/dna-methods.html">here in the previous chapter</a>, but we will briefly cover this here. Alternatives to WGS include Whole Exome Sequencing (WES/WXS), which sequences the open reading frame areas of the genome or Targeted Gene Sequencing where probes have been designed to sequence only regions of interest.
 The main advantages of WGS include the ability to comprehensively analyze all regions of a genome, the ability to study structural rearrangements, gene copy number alterations, insertions and deletions, single nucleotide polymorphisms (SNPs), and sequencing repeats. Some disadvantages include higher sequencing costs and the necessity for more robust storage and analysis solutions to manage the much larger data output generated from WGS.</p>
 </div>
-<div id="wgswxs-considerations" class="section level2" number="10.4">
-<h2><span class="header-section-number">10.4</span> WGS/WXS Considerations</h2>
-<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_28.png" title="WGS/WXS Considerations , Genome type/size, Coverage requirements, Tissue source: fresh tissue, FFPE, blood, Library preparation protocol: PCR vs PCR-free" alt="WGS/WXS Considerations , Genome type/size, Coverage requirements, Tissue source: fresh tissue, FFPE, blood, Library preparation protocol: PCR vs PCR-free" width="100%" />
+<div id="wgswxs-considerations" class="section level2 hasAnchor" number="10.4">
+<h2><span class="header-section-number">10.4</span> WGS/WXS Considerations<a href="whole-genome-or-exome-sequencing.html#wgswxs-considerations" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_28.png" alt="WGS/WXS Considerations , Genome type/size, Coverage requirements, Tissue source: fresh tissue, FFPE, blood, Library preparation protocol: PCR vs PCR-free" width="100%" />
 Some important considerations for WGS/WXS include:</p>
 <ul>
 <li>What genome you are studying and the size of this genome. Included in this consideration is whether this genome has been sequenced before, so that you will have a “reference” genome to compare your data against, or whether you will have to make a reference genome yourself. <a href="https://eriqande.github.io/eca-bioinf-handbook/alignment-of-sequence-data-to-a-reference-genome-and-associated-steps.html">This bioinformatics resource</a> provides a great overview of genome alignment.</li>
@@ -575,8 +569,8 @@ <h2><span class="header-section-number">10.4</span> WGS/WXS Considerations</h2>
 <li>The tissue source and whether genetic alterations were introduced during processing are important. Fixation for formalin-fixed paraffin embedded (FFPE) can introduce mutations/genetic changes that will need to be accounted for during data analysis. <a href="https://www.beckman.com/resources/technologies/next-generation-sequencing/challenges-with-ffpe-tissue-samples">This page from Beckman</a> addresses many of the questions researchers often have about utilizing FFPE samples for their sequencing studies.</li>
 <li>The library preparation method is very important: DNA amplification via PCR can often introduce duplicates that interfere with interpreting whether a mutant gene is truly frequent or just over-amplified during sequencing preparation. <a href="https://www.illumina.com/products/by-type/sequencing-kits/library-prep-kits/dna-pcr-free-prep.html">Illumina</a> provides a comparison of PCR and PCR-free library preparation methods on their website.</li>
 </ul>
-<div id="target-enrichment-techniques" class="section level3" number="10.4.1">
-<h3><span class="header-section-number">10.4.1</span> Target enrichment techniques</h3>
+<div id="target-enrichment-techniques" class="section level3 hasAnchor" number="10.4.1">
+<h3><span class="header-section-number">10.4.1</span> Target enrichment techniques<a href="whole-genome-or-exome-sequencing.html#target-enrichment-techniques" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <p>For WXS or other targeted sequencing specifically (so not relevant to WGS data), what methods were used to enrich for the targeted sequences (the entire exome, in the case of general WXS)? These methods are generally summarized into two major categories: hybridization-based and amplicon-based enrichment.</p>
 <pre><code>- [Hybridization based enrichment](https://www.paragongenomics.com/target-enrichment/). This includes a variety of widely used methods that we will broadly categorize in two groups: Array-based and In-solution:
   - [Array-based capture](https://en.wikipedia.org/wiki/Exome_sequencing#:~:text=Target%2Denrichment%20strategies-,Array%2Dbased%20capture,-In%2Dsolution%20capture) uses microarrays that have probes designed to bind to known coding sequences. Fragments that do not bind to these probes are washed away, leaving the sample with known coding sequences bound and ready for PCR amplification [@Hodges2007; @Turner2009].
@@ -584,24 +578,24 @@ <h3><span class="header-section-number">10.4.1</span> Target enrichment techniqu
 - [PCR/Amplicon based enrichment](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318977/) requires even less sample than the other two strategies and so is ideal when the amount of sample is limited or the DNA has been otherwise processed harshly (e.g. with paraffin embedding). Because the other two enrichment methods are done after PCR amplification has been applied to the whole genomic DNA sample, it is thought that this method of selective PCR amplification for enrichment can result in more uniformly amplified DNA in the resulting sample. However, this method becomes less suitable the more gene targets you have (for example, if you truly need to sequence all of the exome), since amplicons need to be designed for each target. Overall, it is a much more affordable method. There are several variations of this method that are [discussed thoroughly by @Singh2022](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9318977/).</code></pre>
 </div>
 </div>
-<div id="dna-sequencing-pipeline-overview" class="section level2" number="10.5">
-<h2><span class="header-section-number">10.5</span> DNA Sequencing Pipeline Overview</h2>
-<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_33.png" title="Pipeline overview: Step 1: DNA extraction from sample, Step 2: library preparation, Step 3: Sequencing, Step 4: Analysis including data processing from Fastq, aligning reads to generate a BAM file, identifying variants to create a final VCF file" alt="Pipeline overview: Step 1: DNA extraction from sample, Step 2: library preparation, Step 3: Sequencing, Step 4: Analysis including data processing from Fastq, aligning reads to generate a BAM file, identifying variants to create a final VCF file" width="100%" />
+<div id="dna-sequencing-pipeline-overview" class="section level2 hasAnchor" number="10.5">
+<h2><span class="header-section-number">10.5</span> DNA Sequencing Pipeline Overview<a href="whole-genome-or-exome-sequencing.html#dna-sequencing-pipeline-overview" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_33.png" alt="Pipeline overview: Step 1: DNA extraction from sample, Step 2: library preparation, Step 3: Sequencing, Step 4: Analysis including data processing from Fastq, aligning reads to generate a BAM file, identifying variants to create a final VCF file" width="100%" />
 In order to create WGS/WXS data, DNA is first extracted from a specific sample type (tissue, blood samples, cells, FFPE blocks, etc.). Either traditional methods (involving phenol and chloroform) or commercial kits can be used for this first step. Next, the DNA sequencing libraries are prepared. This involves fragmenting the DNA, adding sequencing adapters, and DNA amplification if the input DNA is not of sufficient quantity. Recall that for WXS, library preparation also includes enriching for the targeted sequences. After sequencing, data is analyzed by converting and aligning reads to generate a BAM file. Many analysis tools will use the BAM file to identify variants, which then generates a VCF file. More information about sequencing and BAM and VCF file generation can be found <a href="http://hutchdatascience.org/Choosing_Genomics_Tools/sequencing-data.html">here</a> in the sequencing data chapter.</p>
 </div>
-<div id="data-pre-processing" class="section level2" number="10.6">
-<h2><span class="header-section-number">10.6</span> Data Pre-processing</h2>
-<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_38.png" title="Data pre-processing pipeline overview: Raw data from sequencing is transformed into a Fastq file, reads are aligned and a Bam file is created, the data is sorted and merged, duplicates are identified, and the base quality score is recalibrated to create a final BAM file " alt="Data pre-processing pipeline overview: Raw data from sequencing is transformed into a Fastq file, reads are aligned and a Bam file is created, the data is sorted and merged, duplicates are identified, and the base quality score is recalibrated to create a final BAM file " width="100%" />
+<div id="data-pre-processing" class="section level2 hasAnchor" number="10.6">
+<h2><span class="header-section-number">10.6</span> Data Pre-processing<a href="whole-genome-or-exome-sequencing.html#data-pre-processing" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_38.png" alt="Data pre-processing pipeline overview: Raw data from sequencing is transformed into a Fastq file, reads are aligned and a Bam file is created, the data is sorted and merged, duplicates are identified, and the base quality score is recalibrated to create a final BAM file " width="100%" />
 Raw sequencing reads are first transformed into a fastq file (more information about fastq files can be found <a href="http://hutchdatascience.org/Choosing_Genomics_Tools/sequencing-data.html">here</a> in the sequencing data chapter in the Quality Controls section). Then the sequencing reads are aligned to a reference genome to create a BAM file. This data is sorted and merged, and PCR duplicates are identified. The confidence that each read was sequenced correctly is reflected in the base quality score. This score must be recalibrated at this step before variants are called. A final BAM file is thus created. This can be used for future analysis steps, including variant or mutation identification, which is outlined on the following slide.</p>
 </div>
-<div id="commonly-used-tools" class="section level2" number="10.7">
-<h2><span class="header-section-number">10.7</span> Commonly Used Tools</h2>
-<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_43.png" title="Tools commonly used in WGS data analysis" alt="Tools commonly used in WGS data analysis" width="100%" />
+<div id="commonly-used-tools" class="section level2 hasAnchor" number="10.7">
+<h2><span class="header-section-number">10.7</span> Commonly Used Tools<a href="whole-genome-or-exome-sequencing.html#commonly-used-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
+<p><img src="resources/images/09a-WGS-and-WXS_files/figure-html/1YwxXy2rnUgbx_7B7ENH9wpDX-j6JpJz6lGVzOkjo0qY_g138a6ce16b7_35_43.png" alt="Tools commonly used in WGS data analysis" width="100%" />
 The following link provides the data analysis pipeline written by researchers in the NCI division of the NIH and provides a helpful overview of the typical steps necessary for <a href="https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/">WGS analysis</a>.</p>
 <p>Here are many of the tools and resources used by researchers for analyzing WGS data.</p>
 </div>
-<div id="data-pre-processing-tools" class="section level2" number="10.8">
-<h2><span class="header-section-number">10.8</span> Data pre-processing tools</h2>
+<div id="data-pre-processing-tools" class="section level2 hasAnchor" number="10.8">
+<h2><span class="header-section-number">10.8</span> Data pre-processing tools<a href="whole-genome-or-exome-sequencing.html#data-pre-processing-tools" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>In most cases, all of these tools will be used sequentially to prepare the data for downstream mutational and copy number variation (CNV) analysis.</p>
 <ul>
 <li><a href="https://bedtools.readthedocs.io/en/latest/">Bedtools</a> including the bamtofastq function, which is the first step in converting data off the sequencer to a usable format for downstream analysis</li>
@@ -610,8 +604,8 @@ <h2><span class="header-section-number">10.8</span> Data pre-processing tools</h
 <li><a href="https://gatk.broadinstitute.org/hc/en-us">GATK</a> is a comprehensive set of tools from the Broad Institute for analyzing many types of sequencing data. For pre-processing, the PrintReads tool is very beneficial for writing the reads from a BAM or SAM file that pass specific criteria to a new file.</li>
 </ul>
 </div>
-<div id="tools-for-somatic-and-germline-variant-identification" class="section level2" number="10.9">
-<h2><span class="header-section-number">10.9</span> Tools for somatic and germline variant identification</h2>
+<div id="tools-for-somatic-and-germline-variant-identification" class="section level2 hasAnchor" number="10.9">
+<h2><span class="header-section-number">10.9</span> Tools for somatic and germline variant identification<a href="whole-genome-or-exome-sequencing.html#tools-for-somatic-and-germline-variant-identification" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>These tools are used to identify either somatic or germline mutations from a sequenced sample. Many researchers use several of these tools together and keep only the variants that are identified by a combination of these analysis algorithms. All of these mutation calling tools except SvABA can be used on both WGS and WXS data.</p>
 <ul>
 <li><a href="https://gatk.broadinstitute.org/hc/en-us/articles/9570422171291-Mutect2">Mutect2</a> This is a beneficial variant calling tool with functions including using a “panel of normals” (samples provided by the user of many normal controls) to better compare disease samples to normal and filtering functions for samples with orientation bias artifacts (FFPE samples) called F1R2, which is explained in the link above.</li>
@@ -626,8 +620,8 @@ <h2><span class="header-section-number">10.9</span> Tools for somatic and germli
 <p>Researchers may want to create a consensus file based on the mutation calls from multiple tools above. <a href="https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/snv-callers">OpenPBTA-analysis</a> shows an open source code example of how you might compare and contrast different SNV callers’ results.</p>
 <p>For researchers who prefer GUI-based platforms: <a href="https://www.genepattern.org/variant-and-copy-number-analysis#gsc.tab=0">GenePattern has a great set of variant-based tutorials</a>. GenePattern is an open software environment providing access to hundreds of tools for the analysis and visualization of genomic data.</p>
 </div>
-<div id="tools-for-variant-calling-annotation" class="section level2" number="10.10">
-<h2><span class="header-section-number">10.10</span> Tools for variant calling annotation</h2>
+<div id="tools-for-variant-calling-annotation" class="section level2 hasAnchor" number="10.10">
+<h2><span class="header-section-number">10.10</span> Tools for variant calling annotation<a href="whole-genome-or-exome-sequencing.html#tools-for-variant-calling-annotation" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>These are beneficial for providing functional meaning to the mutational hits identified above.</p>
 <ul>
 <li><a href="https://annovar.openbioinformatics.org/en/latest/">Annovar</a> This is a helpful tool for annotating, filtering, and combining the output data from the above tools. It can be used for gene-based, region-based, or filter-based annotations.</li>
@@ -637,8 +631,8 @@ <h2><span class="header-section-number">10.10</span> Tools for variant calling a
 <li><a href="http://www.pvactools.org">pVACtools</a> supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions, and gene fusions.</li>
 </ul>
 </div>
-<div id="tools-for-copy-number-variation-analysis" class="section level2" number="10.11">
-<h2><span class="header-section-number">10.11</span> Tools for copy number variation analysis</h2>
+<div id="tools-for-copy-number-variation-analysis" class="section level2 hasAnchor" number="10.11">
+<h2><span class="header-section-number">10.11</span> Tools for copy number variation analysis<a href="whole-genome-or-exome-sequencing.html#tools-for-copy-number-variation-analysis" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Similar to the mutation calling tools, many researchers will use several of these tools and investigate the overlapping hits seen with different copy number variant calling algorithms:</p>
 <ul>
 <li><a href="https://gatk.broadinstitute.org/hc/en-us/articles/360035531092--How-to-part-I-Sensitively-detect-copy-ratio-alterations-and-allelic-segments">GATK</a> GATK has a variety of tools that can be used to study changes in copy numbers of genes. This link provides a tutorial for how to use the tools.</li>
@@ -651,8 +645,8 @@ <h2><span class="header-section-number">10.11</span> Tools for copy number varia
 <li><a href="http://compbio.med.harvard.edu/BIC-seq/">BIC-seq2</a> This tool is used to detect CNVs with or without control samples. The steps involved in this data processing tool include normalization and CNV detection.</li>
 </ul>
 </div>
-<div id="tools-for-data-visualization" class="section level2" number="10.12">
-<h2><span class="header-section-number">10.12</span> Tools for data visualization</h2>
+<div id="tools-for-data-visualization" class="section level2 hasAnchor" number="10.12">
+<h2><span class="header-section-number">10.12</span> Tools for data visualization<a href="whole-genome-or-exome-sequencing.html#tools-for-data-visualization" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>These tools are often used in parallel to look at regions of the genome, develop plots, and create other relevant figures:</p>
 <ul>
 <li><a href="https://run.opencravat.org">OpenCRAVAT</a> uses variation data in many popular variant file formats and its outputs are variant annotations and visualizations.</li>
@@ -661,8 +655,8 @@ <h2><span class="header-section-number">10.12</span> Tools for data visualizatio
 <li><a href="https://www.graphpad.com/scientific-software/prism/">Prism</a> is a widely used tool in scientific research for organizing large datasets, generating plots, and creating readable figures. WGS or WXS data on mutations and CNVs can be used as input for creating plots with this tool.</li>
 </ul>
 </div>
-<div id="resources-for-wgs" class="section level2" number="10.13">
-<h2><span class="header-section-number">10.13</span> Resources for WGS</h2>
+<div id="resources-for-wgs" class="section level2 hasAnchor" number="10.13">
+<h2><span class="header-section-number">10.13</span> Resources for WGS<a href="whole-genome-or-exome-sequencing.html#resources-for-wgs" class="anchor-section" aria-label="Anchor link to header"></a></h2>
 <p>Online tutorials:</p>
 <ul>
 <li><a href="https://training.galaxyproject.org/training-material/topics/sequence-analysis/">Galaxy tutorials</a></li>
@@ -671,14 +665,14 @@ <h2><span class="header-section-number">10.13</span> Resources for WGS</h2>
 </ul>
 <p>Papers comparing analysis tools:</p>
 <ul>
-<li><span class="citation">(<a href="#ref-Hwang2019" role="doc-biblioref">Hwang et al. 2019</a>)</span></li>
-<li><span class="citation">(<a href="#ref-Naj2019" role="doc-biblioref">Naj et al. 2019</a>)</span></li>
-<li><span class="citation">(<a href="#ref-He2020" role="doc-biblioref">X. He et al. 2020</a>)</span></li>
+<li><span class="citation">(<a href="#ref-Hwang2019">Hwang et al. 2019</a>)</span></li>
+<li><span class="citation">(<a href="#ref-Naj2019">Naj et al. 2019</a>)</span></li>
+<li><span class="citation">(<a href="#ref-He2020">X. He et al. 2020</a>)</span></li>
 </ul>
 
 </div>
 </div>
-<h3>References</h3>
+<h3>References<a href="references.html#references" class="anchor-section" aria-label="Anchor link to header"></a></h3>
 <div id="refs" class="references csl-bib-body hanging-indent">
 <div id="ref-He2020" class="csl-entry">
 He, Xiaoyu, Shanyu Chen, Ruilin Li, Xinyin Han, Zhipeng He, Danyang Yuan, Shuying Zhang, Xiaohong Duan, and Beifang Niu. 2020. <span>“Comprehensive Fundamental Somatic Variant Calling and Quality Management Strategies for Human Cancer Genomes.”</span> <em>Briefings in Bioinformatics</em> 22 (3). <a href="https://doi.org/10.1093/bib/bbaa083">https://doi.org/10.1093/bib/bbaa083</a>.
@@ -691,10 +685,17 @@ <h3>References</h3>
 </div>
 </div>
 <hr>
-<center> 
+<center>
+<div class="container">
+  <iframe class="responsive-iframe" src="https://c-savonen.shinyapps.io/widget-survey/?course_name=choosing_genomics" style="width: 400px; height: 220px; overflow: auto;"></iframe>
+</div>
+  </div>
+</center>
+
+<hr>
+<center>
   <div class="footer">
       All illustrations <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY. </a>
-      <br>
       All other materials <a href= "https://creativecommons.org/licenses/by/4.0/"> CC-BY </a> unless noted otherwise.
   </div>
 </center>
@@ -764,7 +765,7 @@ <h3>References</h3>
     var script = document.createElement("script");
     script.type = "text/javascript";
     var src = "true";
-    if (src === "" || src === "true") src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-MML-AM_CHTML";
+    if (src === "" || src === "true") src = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.9/latest.js?config=TeX-MML-AM_CHTML";
     if (location.protocol !== "file:")
       if (/^https?:/.test(src))
         src = src.replace(/^https?:/, '');