missing data in human EnsDb v112? #155
Comments
I found a terrible mistake... I understand that liftover can take care of it, but I would still prefer to use the GRCh38 version. Do you have any idea how to change the annotation database to the GRCh38 version? Update:
Currently, EnsemblDB v112 is not available for GRCh38. Maybe this will affect the output? |
As a result, using the latest EnsemblDB v112, I could map fewer genes.
But using EnsemblDB v111, I could get most of my genes.
Is there any way to fix it? |
This is a bit surprising. I will dig a bit into this. I do not specifically set any genome build version (AFAIK); I simply query data from Ensembl using the Perl API and put that into SQLite databases (that are shared through |
Uh, yes, Ensembl release 112 has two core databases, one for GRCh37 and one for GRCh38. My automatic script picked up the first, which happened to be the GRCh37 one :( . I'll create the correct version and fix/update the file in AnnotationHub. |
Thanks for having a look at this discrepancy, and finding and fixing the source of it so quickly. :) Since processing of files on the |
Sure - I'm at a conference at present and bandwidth is limited ... so everything takes a bit longer, but I'll post a link here once I'm done (it seems only the human annotations were affected, so I'll replace those). |
Thank you so much! |
Thanks for spotting this issue! I was not aware (and was not expecting) that Ensembl now creates two flavors of annotations, one for GRCh37 and one for GRCh38! |
so, the |
Thanks for the updated file. Updated database:
|
Also obtained expected results on the level of transcripts, so as far as I am concerned the updated database is correct. @zyh4482 : what about the outcomes of your analyses? |
For my dataset, I got:
At this stage, everything looks perfect. Previously, I used v111 for my downstream analysis. I found some discrepancies regarding unmapped genomic coordinates. I'll update my results later. |
I finished my analysis. This database is perfect to use. Thanks again! @jorainer |
Just to update: AH (both the 3.19 release and devel branches) should now also provide the database with the correct genome release. |
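For anyone who wants to confirm which genome build a given EnsDb reports, here is a minimal sketch in R. It assumes the metadata table has the name/value layout that ensembldb's metadata() returns for an EnsDb; the helper check_genome_build and the small stand-in table are hypothetical and only for illustration (the values in the table are the ones reported in this thread).

```r
# Hypothetical helper: look up the genome_build entry in an EnsDb metadata
# table (a data.frame with columns `name` and `value`) and compare it to
# the build we expect.
check_genome_build <- function(md, expected = "GRCh38") {
  build <- md$value[md$name == "genome_build"]
  if (length(build) != 1L) {
    stop("expected exactly one 'genome_build' entry in the metadata")
  }
  if (!identical(build, expected)) {
    warning("genome build is ", build, ", not ", expected)
    return(FALSE)
  }
  TRUE
}

# Stand-in for metadata(edb), mimicking the originally shipped v112 file.
md_v112_old <- data.frame(
  name  = c("ensembl_version", "genome_build"),
  value = c("112", "GRCh37"),
  stringsAsFactors = FALSE
)
ok <- suppressWarnings(check_genome_build(md_v112_old))
```

With a real database loaded as edb, the call would be check_genome_build(metadata(edb)).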
Hi Johannes,
When working with the human EnsDb version 112 I obtained an unexpected result. Specifically, it seems a substantial number of genes (and transcripts) are not included in the current EnsDb. This seems not to be the case with EnsDb version 111 (the previous one). Moreover, when manually checking some of the 'missing' genes I found that they are still listed/included on the v112 Ensembl website. See below for details.
Question: did I overlook something and is this expected behavior, or is something indeed wrong?
Also note: "No. of genes" differs a lot between EnsDb v111 and v112 (72035 and 64102). Idem for "No. of transcripts" (278721 and 215647).
Also note 2: EnsDb v112 has the tag "genome_build: GRCh37", but it should be GRCh38 (like v111), right?
Thanks for your feedback!
G
Background:
For a recent sequencing experiment (human samples) I mapped the reads to the latest GENCODE release (= release 46) using the tool salmon. Next I imported all counts in R/Bioconductor using tximeta. So far, so good!
To add annotations I use the EnsDb. GENCODE release 46 corresponds to Ensembl release 112. I thus downloaded human EnsDb v112 from the AnnotationHub. See: #154 (comment)
Yet, after annotating all genes I noticed I 'lost' quite a lot of genes. This is unexpected, because this did not happen before. Moreover, this is not the case when using EnsDb v111 for annotating.
Code to illustrate this:
Yet, these not-annotated genes are listed on the Ensembl website...!? (despite likely being pseudo-genes etc).
ENSG00000273497 link
ENSG00000273512 link
ENSG00000293594 link
ENSG00000293600 link
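The code from the original post did not survive on this page. As a rough sketch only (not the poster's code), a check for gene IDs that an annotation lacks could look like the following in R. The helper find_missing_ids is hypothetical, the version suffixes and the annotated set are made up for illustration, and stripping the suffix assumes the quantified IDs are GENCODE-style (with a ".N" version) while the annotation IDs are unversioned.

```r
# Hypothetical helper: report which gene IDs are absent from an annotation.
# Trailing version suffixes such as ".1" are stripped before matching
# (assumption: query IDs are versioned, annotation IDs are not).
find_missing_ids <- function(query_ids, annotated_ids) {
  stripped <- sub("\\.[0-9]+$", "", query_ids)
  query_ids[!stripped %in% annotated_ids]
}

# Toy input: the first two IDs are listed in this thread; the version
# suffixes and the annotated set are illustrative stand-ins.
query_ids     <- c("ENSG00000273497.1", "ENSG00000273512.1", "ENSG00000000003.15")
annotated_ids <- c("ENSG00000000003")
missing <- find_missing_ids(query_ids, annotated_ids)
```

With a real EnsDb loaded as edb, genes(edb)$gene_id could supply annotated_ids.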