Best practice for using DRS and Data Connect together #394
Comments
Three suggestions:
To respond to Ian's comment 1: we now have DRS bulk, so hopefully item 1 is solved.
Agree with 2.
Agree with DRS URIs.
Do we have bi-directional links between a DRS object and a Data Connect query?
Do we have info in the service-info about the Data Connect server linked to this DRS server?
I (still) strongly agree with the premise above that "No new APIs (or API changes for DRS) are needed. Instead, we should add an appendix to the DRS spec documenting best practices for building systems that use DRS and care about metadata." And now that compound objects are well-documented in the spec, I don't think we need to say more about how to handle them with Data Connect. Therefore, the simplest thing that could possibly work is to add a few sentences to the DRS doc saying roughly:
And maybe add:
Or:
Link to the issue where we have discussed this: #336 (comment)
@dglazer re: "There may be use cases where users find a DRS id without any context. If so, they could look up the object in the catalog to understand what that's an ID for."
@bheavner the main idea is that you do not search The discoverability track is huge, but it's better done via FHIR, Data Connect, or Cohort Portals that provide a list of DRS URIs at the end. Usually the search is like "give me all the files for patients that have this disease with these conditions". Discovering DRS URIs per se does not make much sense.
@mattions - oh, I certainly agree that DRS shouldn't do the work of FHIR! I don't mean to solve the problem of discoverability. Instead, I was hoping there might be a way to include a breadcrumb for a receiving system to know where to go for more context about the bytes the DRS URI is pointing to. That could be something like a FHIR endpoint, or a landing page, or a homebrew API from some external system that can resolve DRS URIs and provide authorization information as required by the spec.

Informally, a conversation like:

System executing a workflow: "please resolve DRS://FOO"

Data hosting system: "DRS://FOO points to file BAR in cloud location GS://BAT. It requires this kind of authorization token/passport/credentials: MAGIC_KEY. You can learn more about it at: ENDPOINT_URI, which is a FHIR_API."

System executing a workflow: "Thanks. Here's my MAGIC_KEY. Please give me file BAR from location GS://BAT. (p.s. I'll be sure to pass that ENDPOINT_URI along to my user interface and record it in my log for provenance tracking purposes.)"

That sentence "You can learn more about it at: ENDPOINT_URI, which is a FHIR_API" is the one that I mean to propose, and wonder if it might benefit the spec by giving some flexibility and increase opportunity to link data with the context of that data.
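To make the informal exchange above concrete, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: the `Resolution` class, the `context_uri` and `context_type` fields, and the `resolve` and `fetch` helpers are hypothetical and are not part of the DRS spec. The values are the placeholders used in the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resolution:
    """What the data hosting system answers when asked to resolve a DRS URI."""
    drs_uri: str
    file_name: str
    location: str        # cloud location holding the file
    required_auth: str   # kind of credential the host expects
    context_uri: str     # hypothetical breadcrumb: where to learn more
    context_type: str    # hypothetical: what kind of API context_uri is


# Toy stand-in for the data hosting system, using the thread's placeholders.
HOSTED = {
    "DRS://FOO": Resolution(
        drs_uri="DRS://FOO",
        file_name="BAR",
        location="GS://BAT",
        required_auth="MAGIC_KEY",
        context_uri="ENDPOINT_URI",
        context_type="FHIR_API",
    ),
}


def resolve(drs_uri: str) -> Resolution:
    """Step 1: the workflow system asks the host to resolve a DRS URI."""
    return HOSTED[drs_uri]


def fetch(resolution: Resolution, credential: str, provenance_log: list) -> str:
    """Step 2: present the credential, record the breadcrumb, return the file path."""
    if credential != resolution.required_auth:
        raise PermissionError(f"wrong credential for {resolution.drs_uri}")
    # Pass the breadcrumb along for provenance tracking.
    provenance_log.append(
        {
            "drs_uri": resolution.drs_uri,
            "context_uri": resolution.context_uri,
            "context_type": resolution.context_type,
        }
    )
    return f"{resolution.location}/{resolution.file_name}"


log = []
res = resolve("DRS://FOO")
path = fetch(res, "MAGIC_KEY", log)
print(path)
print(log)
```

The point of the sketch is only that the breadcrumb travels with the resolution, so the receiving system can log it or show it to a user without the DRS server doing any discovery work itself.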
p.s. Perhaps a key:value approach to the "you can learn more about it at" bit could also enable something like "This DRS URI is being included in a list of DRS URIs that are gathered in response to SEARCHQUERY_REFERENCE (of some sort)" |
@bheavner yes, I'm 100% aligned with that. The comment on #336 (comment) was exactly hinting at that. The idea was to include something like:
where, on the systems I work with, these tend to be FHIR systems. However, we can have a list with an enumeration, so people could implement how to navigate the ones that are more widely adopted (I have the example of dataconnect, but I do not have anything running with it myself). We could call it
@mattions yes - that word "metadata" is always causing problems! Let's avoid it. Sounds like we're on the same page then, thanks for following up. Given that this is fairly early stage and in a brainstorming/conversation phase, do you think it's a better approach to have a general "additional_info" field and see what people do with it or how it grows (potentially splitting into more specific kinds of additional information at some point in the future), or is it better to limit it a bit? If someone wants to use "additional_info" to link to some sort of search provenance information, would that be good? Personally, I like the approach currently proposed - enumerated values for additional_info, but we can always expand the allowable values, and perhaps it could be refined at some point in the future if people are actually using it for real functionality. |
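As a rough illustration of the "enumerated values" option discussed above, the following Python sketch validates `additional_info` entries against a small enumeration. The entry shape (`kind` and `url` keys), the enum members, and the `parse_additional_info` helper are all hypothetical; the thread does not define them, and the two kinds shown are simply the systems mentioned in this conversation.

```python
from enum import Enum


class AdditionalInfoKind(Enum):
    """Hypothetical enumeration of context sources; the list could grow over time."""
    FHIR = "fhir"
    DATA_CONNECT = "dataconnect"


def parse_additional_info(entries):
    """Return (kind, url) pairs, rejecting any kind outside the enumeration."""
    parsed = []
    for entry in entries:
        try:
            kind = AdditionalInfoKind(entry["kind"])
        except ValueError:
            raise ValueError(f"unsupported additional_info kind: {entry['kind']!r}")
        parsed.append((kind, entry["url"]))
    return parsed


example = parse_additional_info([{"kind": "fhir", "url": "ENDPOINT_URI"}])
print(example)
```

With an enumeration, a client knows in advance how to navigate each kind of link. A free-form field would accept anything but leave clients guessing, which is the tradeoff raised in the comment above.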
In the Cloud WS meeting on Aug 12th, 2024, we decided to just add text to the spec saying that you should have a catalog, such as Data Connect, and to have that be sufficient for DRS 1.5. As a result, I'll merge #406.
Background
Feature Branch: https://github.com/ga4gh/data-repository-service-schemas/tree/feature/issue-394-drs-plus-connect-docs-v1
I'm opening this issue as a follow-up to the April 20th, 2023 GA4GH Connect meeting "DRS and Data Connect" session. This session explored how standards from the Cloud and Discovery work streams can be used together to address the two needs identified in the aims listed below:
Some resources of interest:
Key Takeaways from GA4GH Connect
Metadata + DRS
We agreed that best practices for working with metadata were important, and largely agreed on two guiding principles:
Compound Objects
We agreed with the way the DRS 1.3.0 develop branch frames the need for compound object support:
We discussed two possible ways to represent and retrieve compound object contents, but didn’t have time to discuss their tradeoffs:
Goal for this Issue
This issue gives us a place to discuss the use of Data Connect and DRS together (and to link PRs to). The immediate goal of this Issue is to get a corresponding PR that addresses the best practice of using Data Connect together with DRS to provide 1) more metadata about DRS objects and 2) a scalable alternative to bundles. The intention is a documentation-only change with a best practice appendix to the DRS spec.
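The two goals above can be sketched with a small catalog table. In this Python example an in-memory SQLite database stands in for a catalog such as Data Connect; it does not call any Data Connect API. The table name, column names, DRS URIs, and helper functions are all made up for illustration.

```python
import sqlite3

# In-memory SQLite table as a stand-in for a metadata catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catalog ("
    "drs_uri TEXT PRIMARY KEY, patient_id TEXT, disease TEXT, file_type TEXT)"
)
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?)",
    [
        ("drs://example.org/obj-1", "P1", "disease-a", "bam"),
        ("drs://example.org/obj-2", "P1", "disease-a", "vcf"),
        ("drs://example.org/obj-3", "P2", "disease-b", "bam"),
    ],
)


def find_drs_uris(conn, disease):
    """Goal 2: a query returns a set of DRS URIs, instead of a bundle object."""
    rows = conn.execute(
        "SELECT drs_uri FROM catalog WHERE disease = ? ORDER BY drs_uri",
        (disease,),
    )
    return [row[0] for row in rows]


def describe(conn, drs_uri):
    """Goal 1: look up metadata for a DRS URI found without any context."""
    row = conn.execute(
        "SELECT patient_id, disease, file_type FROM catalog WHERE drs_uri = ?",
        (drs_uri,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("patient_id", "disease", "file_type"), row))


print(find_drs_uris(conn, "disease-a"))
print(describe(conn, "drs://example.org/obj-3"))
```

The DRS server is untouched in this arrangement: the catalog holds the metadata and the groupings, and DRS is only asked to resolve the URIs that a query returns.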