New Guidelines for Defining Data Package Replication #88

Open
clnsmth opened this issue Oct 9, 2024 · 2 comments

@clnsmth
Contributor

clnsmth commented Oct 9, 2024

Hi everyone,

We're excited to announce the release of new guidelines for defining data package replication between EDI and other repositories. These guidelines offer solutions for describing replication at both the data package and data entity levels.

To learn more about the release and access the guidelines, please check out the following resources:

These guidelines may be good additions to the data packaging best practices.

Thanks!

@twhiteaker
Collaborator

What if we're just replicating metadata? Previously we've tried this when attempting to replicate metadata from EDI to Arctic Data Center.

Option 1: Add a snippet of XML into additionalMetadata:

<additionalMetadata>
  <metadata>
    <d1v1:replicationPolicy xmlns:d1v1="http://ns.dataone.org/service/types/v1"
        numberReplicas="1" replicationAllowed="true">
      <preferredMemberNode>urn:node:ARCTIC</preferredMemberNode>
    </d1v1:replicationPolicy>
  </metadata>
</additionalMetadata>
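
To confirm that a metadata record actually carries this policy before relying on it, the snippet can be read back programmatically. The following is a minimal sketch using only the Python standard library; the function name `read_replication_policy` and the returned dictionary keys are illustrative assumptions, not part of any EDI or DataONE tooling.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the snippet above.
D1_NS = "http://ns.dataone.org/service/types/v1"

SNIPPET = """
<additionalMetadata>
  <metadata>
    <d1v1:replicationPolicy xmlns:d1v1="http://ns.dataone.org/service/types/v1"
        numberReplicas="1" replicationAllowed="true">
      <preferredMemberNode>urn:node:ARCTIC</preferredMemberNode>
    </d1v1:replicationPolicy>
  </metadata>
</additionalMetadata>
"""


def read_replication_policy(xml_text):
    """Return the replication policy found in xml_text, or None if absent.

    Hypothetical helper: searches the whole tree, so it works on the
    additionalMetadata fragment as well as on a full EML document.
    """
    root = ET.fromstring(xml_text.strip())
    policy = root.find(f".//{{{D1_NS}}}replicationPolicy")
    if policy is None:
        return None
    return {
        "replication_allowed": policy.get("replicationAllowed") == "true",
        "number_replicas": int(policy.get("numberReplicas", "0")),
        # preferredMemberNode is unqualified (no namespace) in the snippet.
        "preferred_nodes": [
            (node.text or "").strip()
            for node in policy.findall("preferredMemberNode")
        ],
    }


policy = read_replication_policy(SNIPPET)
print(policy)
```

This only verifies that the policy is present and well-formed in the metadata; it says nothing about whether DataONE honors it.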

Option 2: manual process.

  1. The dataset must be synced and indexed at search.dataone.org. If you search
    for "knb-lter-ble" and find the dataset, then it is indexed. Syncing is something
    EDI manages, but sometimes the process lags, so if you notice something isn't
    synced after a couple of weeks, contact EDI to see what's going on.
  2. Once the dataset is synced to DataONE, the BLE information manager must
    provide the DOI of the dataset to ADC so they can harvest the metadata.
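
The index check in step 1 could be scripted rather than done through the search page. The sketch below assumes the DataONE Coordinating Node Solr query endpoint at `https://cn.dataone.org/cn/v2/query/solr/` and the standard Solr JSON response layout; the function names are hypothetical, and the network call itself is left out so the example stays self-contained.

```python
from urllib.parse import urlencode

# Assumption: DataONE Coordinating Node Solr query service.
DATAONE_SOLR = "https://cn.dataone.org/cn/v2/query/solr/"


def build_index_check_url(search_term, rows=10):
    """Build a query URL that searches the DataONE index for search_term."""
    params = {
        "q": f'"{search_term}"',  # quoted so the term is matched as a phrase
        "fl": "identifier,title,dateUploaded",
        "rows": rows,
        "wt": "json",
    }
    return DATAONE_SOLR + "?" + urlencode(params)


def is_indexed(solr_response):
    """Return True if a parsed Solr JSON response reports at least one hit."""
    return solr_response.get("response", {}).get("numFound", 0) > 0


url = build_index_check_url("knb-lter-ble")
print(url)
```

Fetching `url` and passing the parsed JSON to `is_indexed` would tell you whether anything matching the search term has been indexed yet.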

Option 1 hasn't worked for a few years, and a manual process like Option 2 isn't ideal. Should we just replicate the whole dataset? Or is there a way to replicate just the metadata with semantic annotation?

@clnsmth
Contributor Author

clnsmth commented Oct 22, 2024

Thanks for your questions, @twhiteaker .

Regarding replicating metadata, I don’t believe this is currently possible because there’s no "subject" element in the EML record that a semantic annotation could use to reference the record itself. However, it’s possible that I may have missed something.

The second challenge is identifying a suitable "object" to reference. One option is using the URL of the metadata record, but this is less than ideal since URLs can change. Ideally, there would be a DOI for the metadata record that could be referenced, which would provide a more stable identifier. This issue is similar to describing entity-level replication within the EDI repository. Since we don’t assign DOIs to individual data entities, the best we can do is reference the data entity’s URL (as seen here).

Even if we overcame the above issues, we’d still face a "chicken and egg" problem: the user would need to know the DOI of the data package before it’s published in order to assign it in the metadata using the schema:sameAs annotation to reference itself. Since this isn’t possible from EDI’s side, the destination repository could add a sameAs reference to the replicated content it hosts. That said, perhaps ADC handles this differently? For example, the “Data Set Publishers” field on their data package landing page lists EDI as the publisher of the content.
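
To make the timing problem concrete, here is a minimal sketch of what such a self-referencing annotation might look like if the DOI were known in advance. It assumes the EML 2.2 `annotation` structure (`propertyURI` plus `valueURI`) placed inside an element that serves as the subject; the helper name, the labels, and the DOI are placeholders, not real identifiers.

```python
import xml.etree.ElementTree as ET

# Placeholder only: the real DOI does not exist until after publication,
# which is exactly the "chicken and egg" problem described above.
PLACEHOLDER_DOI = "https://doi.org/10.xxxx/placeholder"


def same_as_annotation(target_uri, label):
    """Build an EML-style annotation asserting schema:sameAs target_uri.

    Hypothetical helper: the annotation's subject is implied by whichever
    element (for example, the dataset) it is nested in.
    """
    annotation = ET.Element("annotation")
    prop = ET.SubElement(annotation, "propertyURI", {"label": "same as"})
    prop.text = "https://schema.org/sameAs"
    value = ET.SubElement(annotation, "valueURI", {"label": label})
    value.text = target_uri
    return annotation


element = same_as_annotation(PLACEHOLDER_DOI, "Replicated data package")
print(ET.tostring(element, encoding="unicode"))
```

The `valueURI` has to be filled in when the metadata is written, so the repository that receives the replica is better placed to add it than the one that publishes the original.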

As for whether there’s a semantic annotation mechanism to facilitate data replication—no, not at this time. The methods you mentioned are the only ones we’re aware of.
