
Change dependency retrievals #75

Draft: wants to merge 4 commits into master

ajnelson-nist (Member)

No description provided.

@ajnelson-nist (Member, Author)

Well, there's an unfortunate discovery in this PR. rdf-toolkit 1.10.0 (the prior version) does not reach a fixed point when asked to re-normalize an already-normalized file; instead, it settles into a loop of states. We have not seen this to date because the Make-based workflows would generate an un-normalized file, which rdf-toolkit would then normalize only once. Normalizing twice is not guaranteed to produce the same results... but the 4th re-run will.
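One way to catch this class of bug early is to check whether normalization is idempotent, i.e. whether normalizing an already-normalized file leaves it unchanged. A minimal Python sketch, assuming rdf-toolkit has been wrapped in a hypothetical `normalize` callable that maps file text to file text (the wrapper itself is not shown); the two toy normalizers are illustrations only:

```python
from typing import Callable


def is_idempotent(normalize: Callable[[str], str], text: str) -> bool:
    """Return True if a second normalization pass leaves the output unchanged."""
    once = normalize(text)
    return normalize(once) == once


def sort_lines(text: str) -> str:
    # Toy normalizer that is idempotent: sorting already-sorted lines is a no-op.
    return "\n".join(sorted(text.splitlines()))


def rotate_lines(text: str) -> str:
    # Toy normalizer that is NOT idempotent: each run moves the first line to
    # the end, so repeated runs cycle through states, much like the looping
    # rdf-toolkit runs described here.
    lines = text.splitlines()
    return "\n".join(lines[1:] + lines[:1])


if __name__ == "__main__":
    print(is_idempotent(sort_lines, "b\na"))       # True
    print(is_idempotent(rotate_lines, "a\nb\nc"))  # False
```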

By some strange luck, one of the files in this repository (examples/illustrations/exif_data/exif_data_validation-unstable.ttl) loops through three states when repeatedly run through rdf-toolkit:

| SHA2-256 | State description |
|---|---|
| 07b8cc3600bf6ad5be4db87c87bc119... | Checked out state |
| 39afb357b6a8cccc0fd691bd7e7588a... | First run of pre-commit |
| 85a9c0203a846a8e67ce3a99b451a48... | Second run of pre-commit |
| 07b8cc3600bf6ad5be4db87c87bc119... | ...and back. |
| 39afb357b6a8cccc0fd691bd7e7588a... | ...looping. |

(I didn't calculate the hashes or loop-lengths for the three other files where this occurs.)
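Loop lengths like the one in the table can be measured mechanically by hashing each output and stopping at the first repeated digest. A minimal Python sketch, again assuming a hypothetical `normalize` callable wrapping rdf-toolkit; the 3-line rotating normalizer is a toy stand-in chosen because it loops through three states:

```python
import hashlib
from typing import Callable, Optional, Tuple


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_cycle(
    normalize: Callable[[str], str], text: str, max_runs: int = 20
) -> Optional[Tuple[int, int]]:
    """Re-run `normalize` until an output repeats.

    Returns (tail_length, cycle_length): the run index at which the repeated
    state first appeared, and how many runs it takes to come back to it.
    A cycle_length of 1 means a fixed point. Returns None if nothing repeats
    within max_runs runs.
    """
    seen = {sha256_hex(text): 0}  # digest -> run index (0 = checked-out state)
    state = text
    for run in range(1, max_runs + 1):
        state = normalize(state)
        digest = sha256_hex(state)
        if digest in seen:
            return seen[digest], run - seen[digest]
        seen[digest] = run
    return None


if __name__ == "__main__":
    def rotate_lines(t: str) -> str:
        # Toy normalizer: for 3-line input it loops through three states.
        lines = t.splitlines()
        return "\n".join(lines[1:] + lines[:1])

    print(find_cycle(rotate_lines, "a\nb\nc"))  # (0, 3)
```

A result of (0, 3) has the same shape as the table above: the checked-out state recurs after three runs. A normalizer that reaches a fixed point would report a cycle length of 1.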

Unfortunately, the behavior also occurs with rdf-toolkit 1.11.0 (the current version as of this writing).

We'll need to think of how to deal with this.

Cc: @kchason

@ajnelson-nist (Member, Author)

I've tried a potential resolution to this issue: guaranteeing that normalization only "runs once". First, de-normalize the input file by round-tripping it through something other than the Turtle-to-Turtle rdf-toolkit normalization. Then, normalize the de-normalized output with rdf-toolkit.
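The two-step idea can be sketched as a simple composition. The reasoning it encodes: if the de-normalized text depends only on the graph's triples (not on how the input Turtle happened to be laid out), and the normalizer preserves the triples, then the pipeline's output also depends only on the triples, so a second run cannot change it. A minimal Python sketch under those assumptions; `denormalize` and `normalize` are hypothetical stand-ins for the round-trip tool and rdf-toolkit, and the toy de-normalizer sorts N-Triples-style lines to play the "depends only on the triples" role. An order-sensitive serializer, such as the rdflib behavior tracked in Issue 1890, is one way to lose that property.

```python
from typing import Callable


def normalize_once(
    text: str,
    denormalize: Callable[[str], str],
    normalize: Callable[[str], str],
) -> str:
    """De-normalize first, so the normalizer never sees its own prior output."""
    return normalize(denormalize(text))


def sorted_lines(text: str) -> str:
    # Toy de-normalizer: output depends only on the set of non-blank lines,
    # standing in for a serializer whose output depends only on the triples.
    return "\n".join(sorted({line for line in text.splitlines() if line.strip()}))


def rotate(text: str) -> str:
    # Toy normalizer with a looping defect: moves the first line to the end.
    lines = text.splitlines()
    return "\n".join(lines[1:] + lines[:1])


if __name__ == "__main__":
    doc = "<b> <p> <o> .\n<a> <p> <o> ."
    once = normalize_once(doc, sorted_lines, rotate)
    twice = normalize_once(once, sorted_lines, rotate)
    # Stable, even though rotate() alone would loop between two states.
    print(once == twice)  # True
```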

Unfortunately, across several de-normalizing strategies, this consistently didn't work as well as the Make process had:

  1. Round-trip with rdflib to Turtle - same problem (albeit with shorter state cycles): rdf-toolkit is sensitive to sort order, and so is rdflib until Issue 1890 is resolved.
  2. Round-trip with rdflib to RDF/XML - same problem, with the added defect of the xsd: prefix getting lost.
  3. Round-trip with rdflib to JSON-LD - same problem, plus namespace-management issues that will be resolved with the next rdflib release (Issue 1679).
  4. Round-trip with rdf-toolkit to RDF/XML - won't work due to an issue with XML entities.
  5. Round-trip with rdf-toolkit to JSON-LD - namespace prefixes are lost and namespace-prefix associations are broken, at least with rdf-toolkit 1.10.0.
  6. (No other output options are available for rdf-toolkit, and Turtle-to-Turtle was the initial case.)

I'm currently unsure if there is a resolution available aside from working with the rdf-toolkit developers to fix the sorting bug.

ajnelson-nist added a commit to casework/CASE-Implementation-PROV-O that referenced this pull request Jun 23, 2022
An issue was discovered with rdf-toolkit being used multiple times in
succession on a file, via testing with the CASE-Examples repository.
To avoid this issue causing an impact in this repository, the
rdf-toolkit action is being reverted.

References:
* casework/CASE-Examples#75

Signed-off-by: Alex Nelson <[email protected]>
@ajnelson-nist (Member, Author)

This PR should be revisited once RDF Toolkit Issue 49 is resolved.

ajnelson-nist marked this pull request as draft on March 14, 2024 13:33