editing static content for 114 #848

Open: wants to merge 2 commits into main
11 changes: 4 additions & 7 deletions ensembl/htdocs/Mus_musculus_strains.inc
Original file line number Diff line number Diff line change
@@ -3,23 +3,20 @@
<div class="column-left">
<h2>Assembly and annotation</h2>

<p>The Mouse Genomes Project is an ongoing effort to sequence the genomes of the common laboratory mouse strains, cataloguing all forms of molecular variation. They have produced <i>de novo</i> assemblies and strain-specific gene annotation for 16 laboratory and wild-derived strains (129S1/SvImJ, A/J, ARK/J, BALB/cJ, C3H/HeJ, C57BL/6NJ, CAST/EiJ, CBA/J, DBA/2J, FVB/NJ, LP/J, NOD/ShiLtJ, NZO/HlLtJ, PWK/PhJ, SPRET/EiJ, and WSB/EiJ) from a mixture of short- and long-range illumina libraries, optical maps, and third generation sequencing.</p>
<p><a href="https://www.mousegenomes.org/">The Mouse Genomes Project</a> is an ongoing initiative to sequence and catalogue molecular variation across common laboratory mouse strains. Currently, high-quality <a href="https://www.mousegenomes.org/reference-genomes/">reference genomes</a> are available for 16 inbred strains (129S1/SvImJ, A/J, AKR/J, BALB/cJ, C3H/HeJ, C57BL/6NJ, CAST/EiJ, CBA/J, DBA/2J, FVB/NJ, JF1/MsJ, LP/J, NOD/ShiLtJ, NZO/HlLtJ, PWK/PhJ, and WSB/EiJ), created using a combination of short- and long-range Illumina libraries, optical maps, and third-generation sequencing data.</p>

<p>The strain-specific genome annotation was created by a combination of mapping over Gencode M8 mouse transcripts, strain-specific RNA-seq to refine mapped transcripts with Augustus, and Augustus CGP (comparative gene prediction) with strain-specific RNA-seq to annotate novel or private transcripts.</p>

<p>The assemblies and annotation were loaded into the Ensembl framework and additional analyses were run. These included: repeatmasking using Repeatmasker; <i>ab initio</i> gene predictions from Genscan; CpG island identification; prediction of transcription start sites using Eponine; tRNA predictions from tRNAscan; alignments of sequences from UniProt, UniGene and the ENA vertebrate RNA collection. Protein domains were annotated using InterProScan.</p>
</div>
<p>The strain-specific genome annotations were generated by mapping <a href="https://www.gencodegenes.org/mouse/release_M30.html">GENCODE M30</a> genes and transcripts via the <a href="https://beta.ensembl.org/help/articles/human-genome-automated-annotation">Ensembl Human automated annotation system</a>, supplemented by methods from the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. Mapped GENCODE structures served as the primary evidence, with gaps in the annotations filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.</p>
</div>
</div>
<div class="column-two">
<div class="column-right">


<h2>Comparative analysis</h2>

<p>We generated a multiple genome alignment of all the genomes from the The Mouse Genomes Project with <i>Mus musculus</i>, <i>Rattus norvegicus</i> and additional four Mus species using our EPO pipeline. Additionally, we have computed a LastZ alignment of <i>Mus musculus</i> and <i>Mus spretus</i>.</p>

<p>We provide two sets of gene-trees and orthologues in Ensembl. The standard gene-trees and orthologues comprise genes from one representative for every Ensembl species, whilst the Murinae-specific gene-trees and orthologues comprise genes from all mouse strains and include genes from <i>Mus musculus</i>, <i>Mus spretus</i> and <i>Rattus norvegicus</i>. A stepwise approach via one these three species is required in order to compare genes from mouse strains to genes from species not in the Murinae set.</p>
Comment on lines 15 to 17
Contributor

@twalsh-ebi, Jan 6, 2025


A suggestion it is. Now it's my turn to ask @EreboPSilva and @ens-sb to check if the updated text tallies with your understanding.

Suggested change
<p>We generated a multiple genome alignment of all the genomes from the The Mouse Genomes Project with <i>Mus musculus</i>, <i>Rattus norvegicus</i> and additional four Mus species using our EPO pipeline. Additionally, we have computed a LastZ alignment of <i>Mus musculus</i> and <i>Mus spretus</i>.</p>
<p>We provide two sets of gene-trees and orthologues in Ensembl. The standard gene-trees and orthologues comprise genes from one representative for every Ensembl species, whilst the Murinae-specific gene-trees and orthologues comprise genes from all mouse strains and include genes from <i>Mus musculus</i>, <i>Mus spretus</i> and <i>Rattus norvegicus</i>. A stepwise approach via one these three species is required in order to compare genes from mouse strains to genes from species not in the Murinae set.</p>
<p>Using our EPO pipeline, we generated a multiple genome alignment of 16 of the
reference-quality genomes from The Mouse Genomes Project with <i>Mus
musculus</i>, <i>Rattus norvegicus</i> and an additional three <i>Mus</i>
species: <i>Mus caroli</i>, <i>Mus pahari</i> and <i>Mus
spicilegus</i>. Additionally, we have computed a LastZ alignment of <i>Mus
spretus</i> and the three additional <i>Mus</i> species against the <i>Mus
musculus</i> reference genome.</p>
<p>We provide multiple sets of gene-trees and orthologues in Ensembl, two of
which include genes from a mouse genome. The standard gene-trees and
orthologues comprise genes from representatives of selected Ensembl species,
whilst the Murinae-specific gene-trees and orthologues comprise genes from
all mouse strains and include genes from <i>Mus musculus</i>, <i>Rattus
norvegicus</i>, <i>Mus spretus</i> and the three aforementioned <i>Mus</i>
species. A stepwise approach via one of these six species is required in
order to compare genes from mouse strains to genes from species not in the
Murinae set.</p>

Contributor


Thanks @twalsh-ebi! Looks good to me!
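The stepwise comparison described in the comparative-analysis paragraph (mouse-strain gene → orthologue in one of the bridge species listed there → orthologue in a species outside the Murinae set) amounts to joining two orthologue tables on the bridge-species gene. A minimal sketch, assuming both tables have already been exported as (gene, orthologue) pairs; the gene identifiers are hypothetical placeholders, and this is not how Ensembl itself computes orthologues. Orthology is not guaranteed to be transitive, so composed pairs are best treated as candidates to check.

```python
from collections import defaultdict

# A (query gene, orthologous gene) pair; identifiers used below are hypothetical.
Pair = tuple[str, str]


def compose_orthologues(
    strain_to_bridge: list[Pair], bridge_to_target: list[Pair]
) -> set[Pair]:
    """Link strain genes to target-species genes through a shared bridge gene.

    strain_to_bridge: pairs taken from the Murinae-specific orthologues.
    bridge_to_target: pairs taken from the standard orthologues.
    """
    # Index the second hop by bridge-species gene.
    targets_by_bridge: dict[str, set[str]] = defaultdict(set)
    for bridge_gene, target_gene in bridge_to_target:
        targets_by_bridge[bridge_gene].add(target_gene)

    # Every strain -> bridge -> target path yields one candidate pair;
    # strain genes whose bridge gene has no second-hop match are dropped.
    return {
        (strain_gene, target_gene)
        for strain_gene, bridge_gene in strain_to_bridge
        for target_gene in targets_by_bridge.get(bridge_gene, ())
    }


if __name__ == "__main__":
    murinae = [("strainA_g1", "mmus_g1"), ("strainA_g2", "mmus_g2")]
    standard = [("mmus_g1", "hsap_gX"), ("mmus_g1", "hsap_gY")]
    print(sorted(compose_orthologues(murinae, standard)))
    # prints [('strainA_g1', 'hsap_gX'), ('strainA_g1', 'hsap_gY')]
```

A one-to-many link at either hop multiplies the number of candidate pairs, which is one more reason to review the composed result rather than take it at face value.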


<p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
</div>
</div>
</div>
6 changes: 1 addition & 5 deletions ensembl/htdocs/ssi/species/Bos_taurus_annotation.html
@@ -1,5 +1 @@
<p>The gene annotation process was carried out using a combination of protein-to-genome alignments, annotation mapping from a suitable reference species and RNA-seq alignments (where RNA-seq data with appropriate meta data were publicly available). For each candidate gene region, a selection process was applied to choose the most appropriate set of transcripts based on evolutionary distance, experimental evidence for the source data and quality of the alignments.</br>Small ncRNAs were obtained using a combination of BLAST and Infernal/RNAfold.</br> Pseudogenes were calculated by looking at genes with a large percentage of non-biological introns (introns of &lt;10bp), where the gene was covered in repeats, or where the gene was single exon and evidence of a functional multi-exon paralog was found elsewhere in the genome.</br> lincRNAs were generated via RNA-seq data where no evidence of protein homology or protein domains could be found in the transcript.</p>
<ul>
<li><a href="/info/genome/genebuild/2023_11_Bos_taurus_gene_annotation.pdf">Detailed information on the genebuild</a> (PDF)</li>
</ul>
<p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
<p>Genome annotation was generated using the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. </p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
2 changes: 1 addition & 1 deletion ensembl/htdocs/ssi/species/Bos_taurus_assembly.html
@@ -1 +1 @@
<p>The ARS-UCD1.3 assembly was submitted by Usda Ars on May 2022. The assembly is on chromosome level, consisting of 2,343 contigs assembled into 1,957 scaffolds. From these sequences, 30 chromosomes have been built. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 25,896,116 while the scaffold N50 is 103,308,737.</p>
<p>The ARS-UCD2.0 assembly was submitted by USDA ARS and last updated on 2024-08-16. The assembly is on the chromosome level, consisting of 2344 contigs assembled into 1958 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 26402946 while the scaffold N50 is 103308737.</p>
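The N50 definition used in these assembly descriptions can be made concrete with a short calculation. A minimal sketch, assuming each scaffold is available as a plain sequence string in which assembly gaps are encoded as runs of N; the gap convention and the toy sequences are illustrative assumptions, not taken from ARS-UCD2.0 or any other assembly in this PR. Contig lengths come from splitting scaffolds at those gaps, while scaffold lengths include the gap bases, so the two N50 values are computed over different sets of blocks.

```python
import re


def n50(lengths: list[int]) -> int:
    """N50: the length L such that blocks of length >= L hold at least half
    of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one block length")
    total = sum(lengths)
    running = 0
    # Accumulate from the longest block down; stop once half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: running total always reaches total")


def contigs_from_scaffold(scaffold: str) -> list[str]:
    """Split a scaffold into contigs, assuming gaps are encoded as runs of N."""
    return [piece for piece in re.split(r"[Nn]+", scaffold) if piece]


if __name__ == "__main__":
    # Toy scaffolds (not real data); gaps are shown as runs of N.
    scaffolds = ["ACGTACGTACNNNNNACGTA", "ACGTAC", "ACGNNNACGT"]
    contigs = [c for s in scaffolds for c in contigs_from_scaffold(s)]
    print("scaffold N50:", n50([len(s) for s in scaffolds]))  # prints scaffold N50: 20
    print("contig N50:", n50([len(c) for c in contigs]))  # prints contig N50: 6
```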
@@ -0,0 +1 @@
<p>Genome annotation was generated using the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. </p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
@@ -0,0 +1 @@
<p>The Felis_catus_9.0 assembly was submitted by Genome Sequencing Center (GSC) at Washington University (WashU) School of Medicine and last updated on 2019-11-21. The assembly is on the chromosome level, consisting of 4908 contigs assembled into 4524 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 41915695 while the scaffold N50 is 83967707.</p>
8 changes: 1 addition & 7 deletions ensembl/htdocs/ssi/species/Felis_catus_annotation.html
@@ -1,7 +1 @@
<p>The gene annotation process was carried out using a combination of protein-to-genome alignments, annotation mapping from a suitable reference species and RNA-seq alignments (where RNA-seq data with appropriate meta data were publicly available). For each candidate gene region, a selection process was applied to choose the most appropriate set of transcripts based on evolutionary distance, experimental evidence for the source data and quality of the alignments.</br>Small ncRNAs were obtained using a combination of BLAST and Infernal/RNAfold.</br> Pseudogenes were calculated by looking at genes with a large percentage of non-biological introns (introns of &lt;10bp), where the gene was covered in repeats, or where the gene was single exon and evidence of a functional multi-exon paralog was found elsewhere in the genome.</br> lincRNAs were generated via RNA-seq data where no evidence of protein homology or protein domains could be found in the transcript.</p>

<ul>
<li><a href="/info/genome/genebuild/2019_09_felis_catus_gene_annotation.pdf">Detailed information on cat genebuild</a> (PDF)</li>
</ul>

<p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
<p>Genome annotation was generated using the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. </p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
2 changes: 1 addition & 1 deletion ensembl/htdocs/ssi/species/Felis_catus_assembly.html
@@ -1 +1 @@
<p>The Felis_catus_9.0 assembly was submitted by Genome Sequencing Center (GSC) at Washington University (WashU) School of Medicine on November 2017. The assembly is on chromosome level, consisting of 4,909 contigs assembled into 4,525 scaffolds. From these sequences, 19 chromosomes have been built. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 41,915,695 while the scaffold N50 is 83,967,707.</p>
<p>The F.catus_Fca126_mat1.0 assembly was submitted by Texas A&M University and last updated on 2021-10-22. The assembly is on the chromosome level, consisting of 109 contigs assembled into 70 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 90731473 while the scaffold N50 is 148491486.</p>
@@ -0,0 +1 @@
<p>Genome annotation was generated by mapping <a href="https://www.gencodegenes.org/mouse/release_M30.html">GENCODE M30</a> genes and transcripts via the <a href="https://beta.ensembl.org/help/articles/human-genome-automated-annotation">Ensembl Human automated annotation system</a>, supplemented by methods from the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. Mapped GENCODE structures served as the primary evidence, with gaps in the annotation filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.</p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
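The "primary evidence first, then fill the gaps" strategy described in this annotation text can be illustrated with a deliberately simplified toy. This is not the Ensembl pipeline: it treats each transcript model as a single interval and ignores exon structure, filtering and scoring, and the order in which the two gap-filling sources are consulted is an assumption. The `Model` class and the `overlaps`/`fill_gaps` functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical, simplified transcript model: one interval on one strand."""
    seq: str
    start: int  # 1-based, inclusive
    end: int  # inclusive
    strand: str
    source: str


def overlaps(a: Model, b: Model) -> bool:
    """True if two models share at least one base on the same sequence and strand."""
    return (
        a.seq == b.seq
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def fill_gaps(primary: list[Model], *fallback_sets: list[Model]) -> list[Model]:
    """Keep every primary model, then add fallback models in the order given,
    but only where they overlap nothing that has already been kept."""
    kept = list(primary)
    for fallback in fallback_sets:
        for model in fallback:
            if not any(overlaps(model, k) for k in kept):
                kept.append(model)
    return kept


if __name__ == "__main__":
    mapped = [Model("chr1", 100, 500, "+", "mapped GENCODE")]
    short_read = [
        Model("chr1", 400, 900, "+", "short-read"),  # overlaps the mapped model: skipped
        Model("chr1", 2000, 2500, "+", "short-read"),  # fills a gap: kept
    ]
    long_read = [
        Model("chr1", 2100, 2600, "+", "IsoSeq"),  # overlaps the kept short-read model: skipped
        Model("chr1", 3000, 3400, "-", "IsoSeq"),  # fills a gap: kept
    ]
    for m in fill_gaps(mapped, short_read, long_read):
        print(m.source, m.start, m.end, m.strand)
```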
@@ -0,0 +1 @@
<p>The assembly for 129S1/SvImJ was generated as part of <a href="https://www.mousegenomes.org/">The Mouse Genomes Project</a>. Additional strains can be found in <a href="https://www.ensembl.org/Mus_musculus/Info/Strains">Ensembl</a>.</p><p>The assembly is on the chromosome level, consisting of 1110 contigs assembled into 180 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 5000807 while the scaffold N50 is 126792426.</p>
1 change: 1 addition & 0 deletions ensembl/htdocs/ssi/species/Mus_musculus_aj_annotation.html
@@ -0,0 +1 @@
<p>Genome annotation was generated by mapping <a href="https://www.gencodegenes.org/mouse/release_M30.html">GENCODE M30</a> genes and transcripts via the <a href="https://beta.ensembl.org/help/articles/human-genome-automated-annotation">Ensembl Human automated annotation system</a>, supplemented by methods from the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. Mapped GENCODE structures served as the primary evidence, with gaps in the annotation filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.</p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
1 change: 1 addition & 0 deletions ensembl/htdocs/ssi/species/Mus_musculus_aj_assembly.html
@@ -0,0 +1 @@
<p>The assembly for A/J was generated as part of <a href="https://www.mousegenomes.org/">The Mouse Genomes Project</a>. Additional strains can be found in <a href="https://www.ensembl.org/Mus_musculus/Info/Strains">Ensembl</a>.</p><p>The assembly is on the chromosome level, consisting of 488 contigs assembled into 158 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 16959821 while the scaffold N50 is 127192181.</p>
@@ -0,0 +1 @@
<p>Genome annotation was generated by mapping <a href="https://www.gencodegenes.org/mouse/release_M30.html">GENCODE M30</a> genes and transcripts via the <a href="https://beta.ensembl.org/help/articles/human-genome-automated-annotation">Ensembl Human automated annotation system</a>, supplemented by methods from the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. Mapped GENCODE structures served as the primary evidence, with gaps in the annotation filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.</p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
1 change: 1 addition & 0 deletions ensembl/htdocs/ssi/species/Mus_musculus_akrj_assembly.html
@@ -0,0 +1 @@
<p>The assembly for AKR/J was generated as part of <a href="https://www.mousegenomes.org/">The Mouse Genomes Project</a>. Additional strains can be found in <a href="https://www.ensembl.org/Mus_musculus/Info/Strains">Ensembl</a>.</p><p>The assembly is on the chromosome level, consisting of 1334 contigs assembled into 195 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 4609526 while the scaffold N50 is 126857991.</p>
@@ -0,0 +1 @@
<p>Genome annotation was generated by mapping <a href="https://www.gencodegenes.org/mouse/release_M30.html">GENCODE M30</a> genes and transcripts via the <a href="https://beta.ensembl.org/help/articles/human-genome-automated-annotation">Ensembl Human automated annotation system</a>, supplemented by methods from the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. Mapped GENCODE structures served as the primary evidence, with gaps in the annotation filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.</p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
@@ -0,0 +1 @@
<p>The assembly for BALB/cJ was generated as part of <a href="https://www.mousegenomes.org/">The Mouse Genomes Project</a>. Additional strains can be found in <a href="https://www.ensembl.org/Mus_musculus/Info/Strains">Ensembl</a>.</p><p>The assembly is on the chromosome level, consisting of 424 contigs assembled into 210 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 35826829 while the scaffold N50 is 127177172.</p>
@@ -0,0 +1 @@
<p>Genome annotation was generated by mapping <a href="https://www.gencodegenes.org/mouse/release_M30.html">GENCODE M30</a> genes and transcripts via the <a href="https://beta.ensembl.org/help/articles/human-genome-automated-annotation">Ensembl Human automated annotation system</a>, supplemented by methods from the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. Mapped GENCODE structures served as the primary evidence, with gaps in the annotation filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.</p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
@@ -0,0 +1 @@
<p>The assembly for C3H/HeJ was generated as part of <a href="https://www.mousegenomes.org/">The Mouse Genomes Project</a>. Additional strains can be found in <a href="https://www.ensembl.org/Mus_musculus/Info/Strains">Ensembl</a>.</p><p>The assembly is on the chromosome level, consisting of 752 contigs assembled into 171 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 8562414 while the scaffold N50 is 127227482.</p>
@@ -0,0 +1 @@
<p>Genome annotation was generated by mapping <a href="https://www.gencodegenes.org/mouse/release_M30.html">GENCODE M30</a> genes and transcripts via the <a href="https://beta.ensembl.org/help/articles/human-genome-automated-annotation">Ensembl Human automated annotation system</a>, supplemented by methods from the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. Mapped GENCODE structures served as the primary evidence, with gaps in the annotation filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.</p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
@@ -0,0 +1 @@
<p>The assembly for C57BL/6NJ was generated as part of <a href="https://www.mousegenomes.org/">The Mouse Genomes Project</a>. Additional strains can be found in <a href="https://www.ensembl.org/Mus_musculus/Info/Strains">Ensembl</a>.</p><p>The assembly is on the chromosome level, consisting of 798 contigs assembled into 208 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 8304515 while the scaffold N50 is 126851431.</p>
@@ -0,0 +1 @@
<p>Genome annotation was generated by mapping <a href="https://www.gencodegenes.org/mouse/release_M30.html">GENCODE M30</a> genes and transcripts via the <a href="https://beta.ensembl.org/help/articles/human-genome-automated-annotation">Ensembl Human automated annotation system</a>, supplemented by methods from the <a href="https://beta.ensembl.org/help/articles/vertebrate-genome-annotation">Ensembl vertebrate annotation pipeline</a>. Mapped GENCODE structures served as the primary evidence, with gaps in the annotation filled using aligned short-read transcriptomic data and full-length transcripts derived from PacBio IsoSeq long-read data.</p><p>In accordance with the <a href="https://en.wikipedia.org/wiki/Fort_Lauderdale_Agreement">Fort Lauderdale Agreement</a>, please check the publication status of the genome/assembly before publishing any genome-wide analyses using these data.</p>
@@ -0,0 +1 @@
<p>The assembly for CAST/EiJ was generated as part of <a href="https://www.mousegenomes.org/">The Mouse Genomes Project</a>. Additional strains can be found in <a href="https://www.ensembl.org/Mus_musculus/Info/Strains">Ensembl</a>.</p><p>The assembly is on the chromosome level, consisting of 369 contigs assembled into 147 scaffolds. The N50 size is the length such that 50% of the assembled genome lies in blocks of the N50 size or longer. The N50 length for the contigs is 32955239 while the scaffold N50 is 126140585.</p>