Will the API offer an alias to digest conversion endpoint? #4
Comments
The big issue for me is what we mean by
In all likelihood all of the above is the right answer: since GRC patches are additive, referring to p13 means referring to all prior patches, but since patches are released frequently you want to use the "tag". It also means GRCh38 applies equally to all of these. So there is an imprecise query coming in, "I want the assembly that refers to hg38", which we cannot answer exactly because seqcol is going to be very precise about what you are going to work with.
This is the reverse lookup use case and similar to the discussion with refget reverse lookup workstream so I guess I can add my current thinking here:
We could specify in the
I think this is the right way to think about the issue, so we can combine our thinking for sequence reverse lookup and this. Making this an implementation-specific issue is a good way around the problem, but I do think any service worth its salt will register all known aliases. The bigger problem now is how to handle the ambiguity and pass back the "correct" and precise collection or sequence from an imprecise query. I don't think that is this API's business; it will have to be an out-of-scope manual curation process. Though I can see someone from a genome provider like UCSC, Ensembl or INSDC making those calls.
Well, if it doesn't want to make an authoritative claim on what a human-readable alias means, it would pass back all the possible matches. Or, if it does want to make an authoritative claim, it would pass back just the one it claims is the match.
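The two behaviours described above (return every candidate, or return only the one the service vouches for) can be sketched as follows. This is a minimal illustration, not anything defined by the seqcol spec: the registry layout, the `resolve_alias` function, and the digest strings are all hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical alias registry: alias -> every seqcol digest it could refer to.
# The digest strings are placeholders, not real seqcol digests.
ALIAS_REGISTRY: Dict[str, List[str]] = {
    "hg38": ["digest-GRCh38.p12", "digest-GRCh38.p13", "digest-GRCh38.p14"],
    "GRCh38.p13": ["digest-GRCh38.p13"],
}

# Hypothetical curated table: aliases for which this service makes an
# authoritative claim about the single intended collection.
AUTHORITATIVE: Dict[str, str] = {"hg38": "digest-GRCh38.p14"}


def resolve_alias(alias: str, authoritative: bool = False) -> List[str]:
    """Return candidate digests for a human-readable alias.

    With authoritative=False, all known matches are returned.
    With authoritative=True, only the curated match is returned when one
    exists; otherwise the function falls back to all known matches.
    """
    if authoritative and alias in AUTHORITATIVE:
        return [AUTHORITATIVE[alias]]
    return list(ALIAS_REGISTRY.get(alias, []))
```

Whether a given service fills in the curated table at all is exactly the implementation-specific choice discussed in this thread.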
I think this issue can now be addressed with bespoke attributes and the
For example, one could have an attribute that records the assembly accession, with the following schema properties:
assembly_accession:
  type: string
  collated: false
  description: "CURIE of the accession given to this assembly by a specified naming authority using the format <naming_authority>:<accession>"
ga4gh:
  passthru:
    - assembly_accession
Because the passthru attribute CAN (as long as the implementation supports it) be used in the
But would this only work if there was a single alias? Or would this work if a given seqcol had a list of aliases? I was imagining you'd want a list of aliases, which I think would mean the
I guess it will be a simple extension to just add that implementations MAY support
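The single-value versus list-of-aliases question raised above can be sketched as a filter that accepts both shapes. This is only an illustration under assumptions: the in-memory `COLLECTIONS` store, the `aliases` attribute name, the helper functions, and every digest and accession value are hypothetical, and nothing here is mandated by the spec.

```python
from typing import Any, Dict, List

# Hypothetical in-memory store: seqcol digest -> passthru attributes.
# Digests and accession values are placeholders, not real identifiers.
COLLECTIONS: Dict[str, Dict[str, Any]] = {
    "digest-A": {
        "assembly_accession": "example_authority:ACC_0001",
        "aliases": ["hg38", "GRCh38"],
    },
    "digest-B": {
        "assembly_accession": "example_authority:ACC_0002",
        "aliases": ["hg38", "GRCh38", "GRCh38.p13"],
    },
}


def attribute_matches(collection: Dict[str, Any], attribute: str, value: str) -> bool:
    """True if the attribute equals the value or, for a list, contains it."""
    stored = collection.get(attribute)
    if stored is None:
        return False
    if isinstance(stored, list):
        return value in stored
    return stored == value


def filter_collections(attribute: str, value: str) -> List[str]:
    """Return the digests of all collections whose attribute matches."""
    return [
        digest
        for digest, collection in COLLECTIONS.items()
        if attribute_matches(collection, attribute, value)
    ]
```

With this shape, a scalar attribute such as `assembly_accession` and a list-valued one such as `aliases` go through the same filter, and an imprecise alias like "hg38" simply returns more than one digest.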
One of the use cases brought up was this: what if a user wants to get the sequence collection checksum(s) from the name of the collection (e.g. grch38)?
We determined that sequence collections should be congruent with the approach taken by refget in terms of allowing human-readable alias-based queries.
From the issue samtools/hts-specs/issues/329, it seems clear that refget was not intended to do this.
@andrewyatz says:
In light of this, I'd propose the seqcol spec specifically not provide endpoints that operate on human-readable aliases.
On the other hand, 'chr1' is a much more universal identifier than something like 'hg38', so perhaps there is some value in returning a list of identifiers that include "hg38" under "aliases".