Df dev with cfgURL and datacatalog in release graph #57
base: dev
Added support for loading configuration files from a URL and improved data processing by implementing comprehensive Skolemization and graph association. Enhanced bulkLoader with a new flag for archiving, updated documentation, and incremented the version.
Replace standard log package with logrus for enhanced logging capabilities across the project. Update RDF metadata generation to include dynamic timestamps and bucket-derived names, improving the accuracy and relevance of generated data descriptions.
Let's work to avoid blank nodes, since we control the datacatalog generation, and use our own URNs.
internal/objects/pipecopy.go
Outdated
// Once we are done with the loop, put in the triples to associate all the graphURIs with the org.
if lastProcessed {

data := `_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
Why not make this a urn?
urn:gleaner.io:{SOURCE}:datacatalog
Avoid dangling triples.
internal/objects/pipecopy.go
Outdated
data := `_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
_:b0 <https://schema.org/dateCreated> "` + time.Now().Format("2006-01-02 15:04:05") + `" .
_:b0 <https://schema.org/description> "GleanerIO Nabu generated catalog" .
_:b0 <https://schema.org/provider> _:b1 .
same here...
urn:gleaner.io:{IMPLNET}:provider
internal/objects/pipecopy.go
Outdated
_:b0 <https://schema.org/dateCreated> "` + time.Now().Format("2006-01-02 15:04:05") + `" .
_:b0 <https://schema.org/description> "GleanerIO Nabu generated catalog" .
_:b0 <https://schema.org/provider> _:b1 .
_:b0 <https://schema.org/publisher> _:b2 .
urn:gleaner.io:{source}:organization
Nabu can also read the configuration file from over the network

```
go run ../../cmd/nabu/main.go release --cfgURL https://provisium.io/data/nabuconfig.yaml --prefix summoned/dataverse --endpoint localoxi
```
nabu release --cfgURL https://provisium.io/data/nabuconfig.yaml --prefix summoned/dataverse --endpoint localoxi
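For readers wondering what reading the configuration "from over the network" might involve, here is a minimal sketch in Go. It is not taken from the Nabu source: `fetchConfig` is a hypothetical name, and the 30-second timeout and the strict status check are assumptions. The example serves a stand-in config from a local test server so it runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchConfig is a hypothetical helper (not from the Nabu source).
// It downloads the raw bytes of a configuration file from cfgURL.
func fetchConfig(cfgURL string) ([]byte, error) {
	client := &http.Client{Timeout: 30 * time.Second} // assumed timeout
	resp, err := client.Get(cfgURL)
	if err != nil {
		return nil, fmt.Errorf("fetching config: %w", err)
	}
	defer resp.Body.Close()

	// Treat anything other than 200 OK as a failure (an assumption).
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching config: unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Local stand-in for a remote config file; the content is a placeholder.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "key: value\n")
	}))
	defer srv.Close()

	body, err := fetchConfig(srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(body))
}
```

The returned bytes could then be handed to whatever parser already reads the local config file, so the rest of the command would not need to know where the config came from.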
I can make those changes, but the step at line 145 (nabu/internal/objects/pipecopy.go, lines 144 to 146 in c897b6f) does Skolemize this and removes all blank nodes. It doesn't make a formal URN like your suggestion does, though. In my approach the SPARQL would need to look for type DataCatalog and match on the text values of the schema:provider and schema:publisher properties. In yours it would be the same, but matching by a URN/IRI. Happy to make the change though. I need to pull the IMPLNET... which is nicer than the bucket name.
Yes, just looking for consistency, rather than randomness. Also, makes it easier to find the exact catalog for a source if it has a consistent ID, which is what we want to do.
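To illustrate why a consistent ID helps, here is a rough sketch of a lookup by catalog IRI. `catalogQuery` is a hypothetical helper, the query shape is an assumption about how one might ask for the catalog, and `example-source` is a placeholder rather than a real source name.

```go
package main

import "fmt"

// catalogQuery is a hypothetical helper: it builds a SPARQL query that
// fetches the provider and publisher of one catalog, addressed directly
// by its IRI instead of by matching text values.
func catalogQuery(catalogURN string) string {
	return fmt.Sprintf(`PREFIX schema: <https://schema.org/>
SELECT ?provider ?publisher WHERE {
  <%s> a schema:DataCatalog ;
       schema:provider ?provider ;
       schema:publisher ?publisher .
}`, catalogURN)
}

func main() {
	// Placeholder URN following the pattern suggested in review.
	fmt.Println(catalogQuery("urn:gleaner.io:example-source:datacatalog"))
}
```

With Skolemized blank nodes the subject would differ per run, so the query would have to filter on name literals instead.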
Enhanced the PipeCopy function to include logging of organization name and generation of named graphs with unique URIs for RDF datasets based on organization names. Included a helper function to generate date-based SHA256 hashes to ensure unique graph URIs.
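The commit message mentions a helper that generates date-based SHA256 hashes. The PR text does not show it, so the following is only a sketch of what such a helper might look like: `dateHash` and `graphURI` are hypothetical names, and both the timestamp format and the URN layout are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// dateHash is a hypothetical stand-in for the helper named in the commit.
// It returns the hex-encoded SHA-256 digest of a timestamp string.
func dateHash(stamp string) string {
	sum := sha256.Sum256([]byte(stamp))
	return hex.EncodeToString(sum[:])
}

// graphURI builds a named-graph URN from an organization name and a time.
// The "urn:gleaner.io:<org>:datacatalog:<hash>" layout is an assumption.
func graphURI(orgname string, t time.Time) string {
	return fmt.Sprintf("urn:gleaner.io:%s:datacatalog:%s",
		orgname, dateHash(t.Format(time.RFC3339Nano)))
}

func main() {
	// "example-org" is a placeholder organization name.
	fmt.Println(graphURI("example-org", time.Now()))
}
```

Hashing a nanosecond-resolution timestamp makes collisions between runs unlikely, at the cost of the URI no longer being predictable from the source name alone.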
here is what I have now. Noticed these were triples, not quads, so fixed that.
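For context on the triples versus quads fix: an N-Quads statement is an N-Triples statement with a graph IRI added before the final period. The sketch below shows one way to do that conversion. `toQuads` is a hypothetical helper, not the code in this PR, and it assumes one statement per line.

```go
package main

import (
	"fmt"
	"strings"
)

// toQuads is a hypothetical helper: it turns N-Triples statements into
// N-Quads by placing a graph IRI before each terminating period.
// It assumes one statement per line and does no RDF validation.
func toQuads(ntriples, graphIRI string) string {
	var out []string
	for _, line := range strings.Split(ntriples, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		// Drop the trailing "." and re-add it after the graph IRI.
		stmt := strings.TrimSpace(strings.TrimSuffix(line, "."))
		out = append(out, fmt.Sprintf("%s <%s> .", stmt, graphIRI))
	}
	return strings.Join(out, "\n")
}

func main() {
	// Placeholder subject and graph IRIs for illustration only.
	triples := `<urn:example:catalog> <https://schema.org/description> "GleanerIO Nabu generated catalog" .`
	fmt.Println(toQuads(triples, "urn:example:graph"))
}
```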
Think pattern needs to be
urn:gleaner.io:ORG:SOURCE:object
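A small helper could keep every generated identifier on this pattern instead of concatenating strings inline. This is a sketch only: `gleanerURN` is a hypothetical name, and the sample values (`eco`, `dataverse`) are taken from elsewhere in this thread purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// gleanerURN is a hypothetical helper that follows the
// urn:gleaner.io:ORG:SOURCE:object pattern proposed in review.
// It does not escape or validate its inputs.
func gleanerURN(org, source, object string) string {
	return strings.Join([]string{"urn", "gleaner.io", org, source, object}, ":")
}

func main() {
	catalog := gleanerURN("eco", "dataverse", "datacatalog")
	publisher := gleanerURN("eco", "dataverse", "publisher")

	// One statement linking the catalog to its publisher.
	fmt.Printf("<%s> <https://schema.org/publisher> <%s> .\n", catalog, publisher)
}
```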
if lastProcessed {

data := `<urn:gleaner.io:` + orgname + `:datacatalog> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
<urn:gleaner.io:` + orgname + `:datacatalog> <https://schema.org/description> "GleanerIO Nabu generated catalog" .
urn:gleaner.io:ORG:SOURCE:datacatalog
<urn:gleaner.io:` + orgname + `:datacatalog> <https://schema.org/publisher> <urn:gleaner.io:` + getLastElement(prefix) + `:publisher> .
<urn:gleaner.io:` + orgname + `:provider> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
<urn:gleaner.io:` + orgname + `:provider> <https://schema.org/name> "` + orgname + `" .
<urn:gleaner.io:` + getLastElement(prefix) + `:publisher> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
urn:gleaner.io:ORG:SOURCE:publisher
<urn:gleaner.io:` + orgname + `:datacatalog> <https://schema.org/provider> <urn:gleaner.io:` + orgname + `:provider> .
<urn:gleaner.io:` + orgname + `:datacatalog> <https://schema.org/publisher> <urn:gleaner.io:` + getLastElement(prefix) + `:publisher> .
<urn:gleaner.io:` + orgname + `:provider> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
<urn:gleaner.io:` + orgname + `:provider> <https://schema.org/name> "` + orgname + `" .
Since there is one provider, this is good
urn:gleaner.io:eco:provider
Dave, Here is the nabu PR with the addition of a cfgURL option.
so like
Also in the PR (sorry for overloading, I did it all in the same branch) is the "datacatalog" bit where I put in a datacatalog in the release graphs with all the named names in it.
Feel free to push back with changes or improvements...
example