Release 3.1.0

@alevar alevar released this 20 Jun 06:11
· 11 commits to master since this release

This release addresses several major and minor inconsistencies in the formatting of the CHESS annotation.

Changelog

  1. Introducing gene features to the GFF3 files, complete with RefSeq descriptors
  2. All valid and complete ORFs now include the stop codon in the CDS coordinates. Some transcripts have been extended by up to 3 positions to include the previously missing stop codon
  3. Fixed duplicated gene IDs in the CHM13 version of the annotation. Gene copies identified by LiftOff are now assigned their own CHESS ID, and the LiftOff metadata is stored in the auxiliary tags
  4. Protein sequences based on the CHM13 genome sequence are now also included
  5. Minor improvements to the comment lines
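
Item 2 can be spot-checked on the released files. The following is a minimal sketch, assuming you have already extracted a transcript's spliced CDS nucleotide sequence in transcript orientation. The function name `cds_includes_stop` is hypothetical and not part of any CHESS tooling, and the check assumes the standard genetic code.

```python
# Stop codons of the standard genetic code (assumption: no alternative codon tables).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_includes_stop(cds_seq: str) -> bool:
    """Return True if a spliced CDS is a whole number of codons long
    and its final codon is a stop codon."""
    seq = cds_seq.upper()
    return len(seq) >= 3 and len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS
```

A CDS annotated without its stop codon would fail this check, which is the case this release is described as fixing.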

Statement

CHESS Release 3.1.0

Files

| Filenames | Genome | Content | Description |
| --- | --- | --- | --- |
| chess3.1.0.GRCh38.gff.gz, chess3.1.0.GRCh38.gtf.gz, chess3.1.0.GRCh38.bb.gz | GRCh38 | CHESS gene annotation | This file contains the primary gene set described in the CHESS paper. All genes and transcripts are mapped onto human genome release GRCh38.p12. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci. |
| chess3.1.0.CHM13.gff.gz, chess3.1.0.CHM13.gtf.gz, chess3.1.0.CHM13.bb.gz | CHM13 | CHESS gene annotation on CHM13 | This file contains the primary gene set described in the CHESS paper mapped over to the CHM13 human reference genome. |
| chess3.1.0.GRCh38.primary.gff.gz, chess3.1.0.GRCh38.primary.gtf.gz | GRCh38 | CHESS gene annotation excluding alternative scaffolds | This file contains the primary gene set described in the CHESS paper but excludes annotations of any alternative scaffolds. All genes and transcripts are mapped onto human genome release GRCh38.p12. |
| chess3.1.0.GRCh38.protein.fa.gz | GRCh38 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the GRCh38 human reference genome. |
| chess3.1.0.CHM13.protein.fa.gz | CHM13 | CHESS proteins | This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes based on the CHM13 human reference genome. |
| chess3.1.0.mapfile.tsv | - | Cross-Reference | This tab-separated file contains a list of transcript identifiers in CHESS 3.1.0 along with the corresponding identifiers in other popular databases (RefSeq, GENCODE, CHESS2). |
| assembled.gtf.gz | GRCh38 | Assembled Transcripts | Noise-filtered set of assembled GTEx transcripts used to generate the final CHESS dataset. |
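
The gzipped GTF files listed above can be read without decompressing them first. The following is a minimal sketch, assuming the files follow the usual nine-column, tab-separated GTF layout with `key "value";` pairs in the last column. The helper names `parse_gtf_attributes` and `count_features` are hypothetical, and the simple split on `;` would mis-parse an attribute value that itself contains a semicolon.

```python
import gzip
from collections import Counter


def parse_gtf_attributes(field: str) -> dict:
    """Parse the ninth GTF column (key "value"; pairs) into a dict.

    Assumption: attribute values do not contain semicolons.
    """
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def count_features(path: str) -> Counter:
    """Count feature types (third column) in a gzipped GTF or GFF file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9:
                counts[cols[2]] += 1
    return counts
```

For example, `count_features("chess3.1.0.GRCh38.gtf.gz")` would return a tally keyed by feature type, assuming the file has been downloaded to the working directory.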

Summary

| | genes | transcripts |
| --- | --- | --- |
| protein_coding | 19838 | 99201 |
| lncRNA | 17624 | 34709 |
| pseudogene | 16774 | 17263 |
| other | 4269 | 7190 |
| alt_scaffolds | 5250 | 10088 |
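
As a quick reading aid, the counts above can be turned into transcripts-per-gene ratios. This is only a sketch: the numbers are copied from the summary, the helper name `transcripts_per_gene` is hypothetical, and the ratio is a plain division of the two columns rather than a statistic reported by the CHESS authors.

```python
# (genes, transcripts) per category, copied from the Summary above.
SUMMARY = {
    "protein_coding": (19838, 99201),
    "lncRNA": (17624, 34709),
    "pseudogene": (16774, 17263),
    "other": (4269, 7190),
    "alt_scaffolds": (5250, 10088),
}


def transcripts_per_gene(summary: dict) -> dict:
    """Divide the transcript count by the gene count for each category."""
    return {
        name: transcripts / genes
        for name, (genes, transcripts) in summary.items()
    }


for name, ratio in transcripts_per_gene(SUMMARY).items():
    print(f"{name}\t{ratio:.2f}")
```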