CHESS_v2.0

alevar released this 20 Apr 17:22


3e143c9

Statement

CHESS Release 2.0

Files

Filename	Content	Description
chess2.0.gff.gz	CHESS gene annotation	This file contains the primary gene set described in the CHESS paper, in GFF format. All genes and transcripts are mapped onto human genome release GRCh38.p8. Included in this file are genes on the reference chromosomes, unmapped scaffolds, assembly patches, and alternate loci.
chess2.0.genes	CHESS gene list	This file is a table showing all 43,162 genes in CHESS release 2.0, in a tab-delimited text file with one gene per line. For each gene it provides features such as gene ID, type, gene name, source of the annotation, location(s), GFF ID(s), and a free text description of the gene.
chess2.0.protein.fa.gz	CHESS proteins	This FASTA file contains the sequences of all the proteins translated from the CHESS protein-coding genes. For each gene locus that has more than one protein (e.g., splice variants), the longest protein sequence is provided.
chess2.0_assembly.gff.gz	Gene annotation for transcriptome assembly	This is a subset of the gene annotation GFF file (chess2.0.gff), containing annotations only on the reference chromosomes and the mitochondrion. It also includes the tRNA and rRNA gene annotations from RefSeq. We recommend using this file with transcriptome assemblers such as StringTie or Cufflinks.
chess2.0_and_refseq.gff.gz	CHESS plus RefSeq gene annotations	This is a superset of chess2.0.gff. It adds multiple other gene types annotated in RefSeq that are not included in CHESS, such as pseudogenes, V_segments, C_segments, D_segments, J_segments, snoRNAs, snRNAs, telomerase RNAs, guide RNAs, etc. Note that many of these elements (e.g., pseudogenes) are not actually genes, but they are included here for users who want everything in RefSeq plus the additional genes in CHESS.
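
To get a quick feel for what one of the GFF files above contains, the feature types in it can be tallied. The following is a minimal sketch, not part of the release: it assumes only the standard nine-column, tab-delimited GFF layout, where the third column holds the feature type and lines starting with `#` are comments. The function names are hypothetical, and the file is assumed to have been downloaded locally.

```python
import gzip
from collections import Counter
from typing import Iterable


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count values of the GFF 'type' column (column 3), skipping comments."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line or GFF comment/directive
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete nine-column GFF record
        counts[fields[2]] += 1
    return counts


def count_feature_types_in_file(path: str) -> Counter:
    """Stream a gzip-compressed GFF file line by line (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_feature_types(handle)
```

For example, `count_feature_types_in_file("chess2.0_assembly.gff.gz")` would return a `Counter` keyed by feature type. Which type labels actually occur in the CHESS files is not stated in this release note, so inspect the result rather than assuming particular names.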

Summary

	genes	transcripts
protein_coding	21306	267478
lncRNA	18484	49314
other	3372	7035
total	43162	323827
novel_protein_coding	1178	1446
novel_lncRNAs	2268	2755
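
The totals in the table can be checked with a few lines of arithmetic. This sketch uses only the numbers shown above. It treats the `total` row as the sum of `protein_coding`, `lncRNA`, and `other`, and leaves the two `novel_` rows out of the sum; reading those rows as subsets of the categories above them is an assumption based on the fact that the totals match without them.

```python
# (genes, transcripts) per category, copied from the Summary table.
summary = {
    "protein_coding": (21306, 267478),
    "lncRNA": (18484, 49314),
    "other": (3372, 7035),
}
reported_total = (43162, 323827)

genes_sum = sum(genes for genes, _ in summary.values())
transcripts_sum = sum(transcripts for _, transcripts in summary.values())

# The three categories should account for the reported totals exactly.
assert (genes_sum, transcripts_sum) == reported_total

# Average number of transcripts per gene in each category.
for name, (genes, transcripts) in summary.items():
    print(f"{name}\t{transcripts / genes:.2f} transcripts per gene")
```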
