gene2transcripts API: Genome Assembly Filter #515

Open
Sophiaj93 opened this issue Jul 13, 2023 · 11 comments

Comments

@Sophiaj93

Is your feature request related to a problem? Please describe.
I am trying to use the API to retrieve exon genomic start/end coordinates (for a specific genome assembly) for MANE transcripts.

Describe the solution you'd like
It would be useful to be able to filter on the required genome assembly as part of the API request (and/or to have the API response JSON state which genome assembly the coordinates correspond to).

Describe alternatives you've considered
Manually searching the transcript IDs in RefSeq to find the associated genome assembly.

Additional context
With this API call I am hoping to generate a list of exon genomic start/end coordinates that I could then write to a BED file.
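For illustration, a minimal sketch of the BED-writing step. It assumes the exon coordinates have already been collected as 1-based, inclusive positions; the `Exon` tuple layout and both function names are hypothetical and not part of the API. BED uses 0-based, half-open intervals, so the start is shifted down by one and the end is left unchanged.

```python
from typing import Iterable, List, Tuple

# Hypothetical record: (chrom, start, end, name), with start/end assumed
# to be 1-based inclusive genomic coordinates.
Exon = Tuple[str, int, int, str]


def exons_to_bed_lines(exons: Iterable[Exon]) -> List[str]:
    """Convert 1-based inclusive exon coordinates to BED4 lines."""
    lines = []
    for chrom, start, end, name in exons:
        # Minus-strand exons may be reported with start > end, so sort first.
        low, high = sorted((start, end))
        # BED is 0-based half-open: subtract 1 from the start only.
        lines.append(f"{chrom}\t{low - 1}\t{high}\t{name}")
    return lines


def write_bed(path: str, exons: Iterable[Exon]) -> None:
    """Write the converted exons to a BED file, one interval per line."""
    with open(path, "w") as handle:
        for line in exons_to_bed_lines(exons):
            handle.write(line + "\n")
```

Whether the first column should hold a `chr`-style name or the NC accession depends on the downstream tool, so that choice is left to the caller.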

@ifokkema
Collaborator

We (LOVD) solve this by looking up the given NC refseq in a small dictionary that contains refseqs and genome builds. That is good enough for us, but since VV has the information on NC-to-genome-build, I imagine you'd want it included in the output. Until then, it's easy to work around it using a small dictionary.
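A minimal sketch of such a dictionary, assuming Python. The table is deliberately incomplete (three chromosomes only), and the names `NC_TO_BUILD` and `genome_build_for` are hypothetical; a real lookup would need one entry per chromosome and build.

```python
from typing import Optional

# Partial NC-accession-to-build table; extend with the remaining
# chromosomes before relying on it.
NC_TO_BUILD = {
    "NC_000001.10": "GRCh37",
    "NC_000001.11": "GRCh38",
    "NC_000013.10": "GRCh37",
    "NC_000013.11": "GRCh38",
    "NC_000017.10": "GRCh37",
    "NC_000017.11": "GRCh38",
}


def genome_build_for(accession: str) -> Optional[str]:
    """Return the genome build for an NC accession, or None if unknown."""
    return NC_TO_BUILD.get(accession)
```

Returning `None` for an unknown accession keeps the caller in control of how to treat contigs that are missing from the table.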

@leicray
Contributor

leicray commented Jul 16, 2023

Your requirement is to use "...the API to retrieve exon genomic start/end coordinates (for a specific genome assembly) for MANE transcripts." As far as I am aware, mappings for MANE transcripts are only comprehensively maintained for GRCh38. Some limited mapping data for old versions of MANE can be found for GRCh37: http://tark.ensembl.org/web/mane_GRCh37_list/.

If we were to implement retrieval of exon start/stop coordinates via the API, it would probably have to be only for GRCh38. Support for GRCh37 might prove to be problematic.

UPDATE: I have looked again at your original request and it looks like you would like our API to output exon genomic start/end coordinates to allow you to use the data for some other purpose. Unless outputting these data provided enhanced functionality for normal validation of sequence variants, it is unlikely that we would prioritise such a request.

@ifokkema
Collaborator

> UPDATE: I have looked again at your original request and it looks like you would like our API to output exon genomic start/end coordinates to allow you to use the data for some other purpose. Unless outputting these data provided enhanced functionality for normal validation of sequence variants, it is unlikely that we would prioritise such a request.

Maybe I misunderstand the request, but the gene2transcripts API endpoint already provides genomic start/stop locations of exons for input genes and transcripts. So, it's a feature that already exists? The only issue that I see compared to the request is that the output contains NC IDs instead of genome-build identifiers. That's fine by us, but I assume @Sophiaj93 meant she'd like to see those in the output, too.
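To make the point concrete, here is a rough sketch of pulling exon coordinates out of one transcript record. The field names (`genomic_spans`, `exon_structure`, `exon_number`, `genomic_start`, `genomic_end`) are assumptions about the response layout and should be checked against a real response; the accession `NM_EXAMPLE.1` and all coordinates are made up for illustration.

```python
from typing import Iterator, Tuple


def iter_exons(transcript: dict) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (nc_accession, exon_number, genomic_start, genomic_end).

    Assumes the transcript record keys its genomic spans by NC accession,
    which is why the genome build is not directly visible in the output.
    """
    for accession, span in transcript.get("genomic_spans", {}).items():
        for exon in span.get("exon_structure", []):
            yield (
                accession,
                exon["exon_number"],
                exon["genomic_start"],
                exon["genomic_end"],
            )


# Made-up record in the assumed layout; not real API output.
example_transcript = {
    "reference": "NM_EXAMPLE.1",
    "genomic_spans": {
        "NC_000017.11": {
            "exon_structure": [
                {"exon_number": 1, "genomic_start": 1000, "genomic_end": 1200},
                {"exon_number": 2, "genomic_start": 1500, "genomic_end": 1650},
            ]
        },
        "NC_000017.10": {
            "exon_structure": [
                {"exon_number": 1, "genomic_start": 900, "genomic_end": 1100},
            ]
        },
    },
}

for row in iter_exons(example_transcript):
    print(row)
```

Each row carries only the NC accession, so a consumer still has to translate it to a genome build if it needs one.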

@Sophiaj93
Author

Sophiaj93 commented Jul 25, 2023

Thanks both. Yes, being able to see the genome-build identifiers in the output as well as the NC IDs would be useful and would solve my problem. This is something I've discussed briefly with @Peter-J-Freeman as part of an MSc project at Manchester Uni.

@Peter-J-Freeman
Collaborator

Sorry for the slow responses, @Sophiaj93. As you know, I have been slammed with teaching material development.

I have developed an update to the API v2 version of genes to transcripts. The input can now be a "|"-delimited list of genes. You can now also filter by transcript ID, or by the key filters described in the Swagger docs. You can also now filter by genome build :)

Data are returned in list format.

Will be live and ready for testing by the end of the day

@Peter-J-Freeman
Collaborator

@Sophiaj93 Code is now live, ready for testing. https://rest.variantvalidator.org/
gene2transcripts_v2
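For anyone testing, a hedged sketch of building a request URL with a "|"-delimited gene list and a genome-build filter. Only the host, the endpoint name, the "|" delimiter and the existence of a genome-build filter come from this thread. The `/VariantValidator/tools/` prefix, the order of the path segments, and the filter values `mane_select` and `refseq` are assumptions to be checked against the Swagger docs, and the function name is hypothetical.

```python
from typing import Sequence
from urllib.parse import quote

BASE_URL = "https://rest.variantvalidator.org"


def build_gene2transcripts_v2_url(
    genes: Sequence[str],
    limit_transcripts: str = "mane_select",  # assumed filter value
    transcript_set: str = "refseq",          # assumed filter value
    genome_build: str = "GRCh38",
) -> str:
    """Build a gene2transcripts_v2 URL (path layout is an assumption)."""
    # Several genes are joined with "|" as described in the thread.
    gene_query = "|".join(genes)
    segments = [gene_query, limit_transcripts, transcript_set, genome_build]
    # Percent-encode each segment so "|" is safe inside a URL path.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{BASE_URL}/VariantValidator/tools/gene2transcripts_v2/{path}"


print(build_gene2transcripts_v2_url(["BRCA1", "BRCA2"]))
```

The sketch only builds the URL and makes no network call, so it can be inspected or pasted into a browser before being wired into any code.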

@ifokkema
Collaborator

I believe this adds required fields to the API endpoint, right? If so, this breaks existing implementations. Luckily, I'm not yet using v2 of this function, but updates like this require an API with versioning. Have you checked the server logs for calls to gene2transcripts_v2?
#128

@Peter-J-Freeman
Collaborator

This endpoint is still in dev, so not yet fixed. I just haven't had time to maintain a dev server recently. But I will check. I very much doubt it's being used though. Good point, thanks.

Still need to implement the API versioning. On the to do list.

@ifokkema
Collaborator

Ah, I see. Is there any documentation or annotation on the Swagger UI on what endpoints are in dev and, therefore, can change at any given moment?

@Peter-J-Freeman
Collaborator

That's a good idea. No need in this case, because I will fix it by the end of the month (and I think it's already fixed now), but good plan!

@Sophiaj93
Author

Hi Pete,

Sorry for the huge delay in looking at this. Thanks again for adding these features!

I've just updated my code based on the new version of the gene2transcripts_v2 endpoint and all is working well. The genome build filter in particular is really helpful.

I haven't properly implemented the "|" delimited genes list in my code yet but have tried it out via the URL. Can't see any issues.

Thanks
