Df dev with cfgURL and datacatalog in release graph #57

Open: wants to merge 5 commits into base: dev

Member

@fils fils commented Nov 15, 2024

Dave, here is the nabu PR that adds a cfgURL option.

For example:

nabu release --cfgURL https://example.org/data/nabuconfig.yaml --prefix summoned/dataverse --endpoint localoxi
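
A minimal sketch of how a flag like this might be handled, assuming the value is either an http(s) URL or a local path. The names `isRemote` and `readConfig` are hypothetical and are not taken from the nabu code base; parsing the YAML bytes is left out.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// isRemote reports whether src looks like an http(s) URL rather than a local path.
func isRemote(src string) bool {
	u, err := url.Parse(src)
	return err == nil && (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
}

// readConfig returns the raw configuration bytes from a URL or a local file.
// Hypothetical helper: nabu's real loader may differ.
func readConfig(src string) ([]byte, error) {
	if !isRemote(src) {
		return os.ReadFile(src)
	}
	// A timeout keeps a stalled server from hanging the release run.
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Get(src)
	if err != nil {
		return nil, fmt.Errorf("fetching config: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetching config: unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	fmt.Println(isRemote("https://example.org/data/nabuconfig.yaml")) // true
	fmt.Println(isRemote("./nabuconfig.yaml"))                        // false
}
```

Checking the HTTP status before reading the body avoids treating an error page as configuration.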

Also in the PR (sorry for overloading, I did it all in the same branch) is the "datacatalog" bit, where I add a DataCatalog to the release graphs that lists all the named graphs in it.

Feel free to push back with changes or improvements.

Example:

<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pbg> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pbg> <https://schema.org/dateCreated> "2024-11-15 10:16:20" . 
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pbg> <https://schema.org/description> "GleanerIO Nabu generated catalog" .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pbg> <https://schema.org/provider> <https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pc0> .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pbg> <https://schema.org/publisher> <https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pcg> .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pc0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pc0> <https://schema.org/name> "africaioc" .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pcg> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pcg> <https://schema.org/name> "gleaner.oih" .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pbg> <https://schema.org/dataset> <urn:gleaner.io:doos:africaioc:data:000db5913f586b6d06bc0e3f59f33e299878e9f1> .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pbg> <https://schema.org/dataset> <urn:gleaner.io:doos:africaioc:data:00f8c810e243e27e3ee5f5f06bc792e1b4b4aec5> .
<https://gleaner.io/xid/genid/csrn7l7g8s2n3pqu8pbg> <https://schema.org/dataset> <urn:gleaner.io:doos:africaioc:data:01db27cf260cd84f0d5074748070bde443c43985> .
...

fils added 4 commits December 4, 2023 10:29
Added support for loading configuration files from a URL and improved data processing by implementing comprehensive Skolemization and graph association. Enhanced bulkLoader with a new flag for archiving, updated documentation, and incremented the version.
Replace standard log package with logrus for enhanced logging capabilities across the project. Update RDF metadata generation to include dynamic timestamps and bucket-derived names, improving the accuracy and relevance of generated data descriptions.
@fils fils requested a review from valentinedwv November 15, 2024 16:33
Member

@valentinedwv valentinedwv left a comment

Let's work to avoid blank nodes, since we control the datacatalog generation, and use our own URNs.

// Once we are done with the loop, put in the triples to associate all the graphURIs with the org.
if lastProcessed {

data := `_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
Member

Why not make this a urn?
urn:gleaner.io:{SOURCE}:datacatalog

Avoid dangling triples.

data := `_:b0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
_:b0 <https://schema.org/dateCreated> "` + time.Now().Format("2006-01-02 15:04:05") + `" .
_:b0 <https://schema.org/description> "GleanerIO Nabu generated catalog" .
_:b0 <https://schema.org/provider> _:b1 .
Member

same here...
urn:gleaner.io:{IMPLNET}:provider

_:b0 <https://schema.org/dateCreated> "` + time.Now().Format("2006-01-02 15:04:05") + `" .
_:b0 <https://schema.org/description> "GleanerIO Nabu generated catalog" .
_:b0 <https://schema.org/provider> _:b1 .
_:b0 <https://schema.org/publisher> _:b2 .
Member

urn:gleaner.io:{source}:organization

Nabu can also read the configuration file from over the network

```
go run ../../cmd/nabu/main.go release --cfgURL https://provisium.io/data/nabuconfig.yaml --prefix summoned/dataverse --endpoint localoxi
```
Member

nabu release --cfgURL https://provisium.io/data/nabuconfig.yaml --prefix summoned/dataverse --endpoint localoxi

@fils
Member Author

fils commented Nov 15, 2024

I can make those changes, but the step at line 145

// TODO: Skolemize with sdataWithContext
sdata, err := graph.Skolemization(data, "release graph prov for ORG")
if err != nil {

does Skolemize this and remove all blank nodes. It doesn't make a formal URN like your suggestion does though.
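
To illustrate what Skolemization does to the `_:b0` style nodes discussed here, the following is a rough sketch that is not the `graph.Skolemization` implementation. It assumes simple alphanumeric blank node labels and that no literal contains `_:`; the function name `skolemize`, the `runID` argument, and the label-based IRI layout are hypothetical. The example output earlier in the thread suggests the real code mints opaque IDs under `https://gleaner.io/xid/genid/` instead.

```go
package main

import (
	"fmt"
	"regexp"
)

// blankNode matches N-Triples blank node labels such as _:b0.
// Assumption: labels are plain alphanumerics and never appear inside literals.
var blankNode = regexp.MustCompile(`_:[A-Za-z0-9]+`)

// skolemize replaces every blank node label with an IRI under base.
// runID is a hypothetical per-run identifier; the same label always maps
// to the same IRI within one call, so links between triples are preserved.
func skolemize(nt, base, runID string) string {
	return blankNode.ReplaceAllStringFunc(nt, func(m string) string {
		// m[2:] drops the leading "_:" from the label.
		return "<" + base + runID + "-" + m[2:] + ">"
	})
}

func main() {
	in := "_:b0 <https://schema.org/provider> _:b1 ."
	fmt.Println(skolemize(in, "https://gleaner.io/xid/genid/", "run1"))
	// <https://gleaner.io/xid/genid/run1-b0> <https://schema.org/provider> <https://gleaner.io/xid/genid/run1-b1> .
}
```

The result has no blank nodes, but the IRIs change whenever the run identifier changes, which is the contrast with a fixed URN drawn above.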

In my approach the SPARQL would need to find the DataCatalog by type and then match on the text values of its schema:provider and schema:publisher properties.

In yours it would be the same, but matching by a URN/IRI.

Happy to make the change though.

I need to pull the IMPLNET... which is nicer than the bucket name.

@valentinedwv
Member

Yes, just looking for consistency, rather than randomness. Also, makes it easier to find the exact catalog for a source if it has a consistent ID, which is what we want to do.
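
The difference between the two lookups can be sketched as a pair of query builders. This is only an illustration: the function names are hypothetical, the queries assume the store exposes the catalog triples in the graph being queried (a `GRAPH` clause may be needed otherwise), and the name is assumed to be plain ASCII so that Go's `%q` quoting matches SPARQL string syntax.

```go
package main

import "fmt"

// datasetsByPublisherName finds datasets by walking from any DataCatalog to
// its publisher and matching on the publisher's text name. This is what a
// generated, run-specific catalog ID requires.
func datasetsByPublisherName(name string) string {
	return fmt.Sprintf(`PREFIX schema: <https://schema.org/>
SELECT ?dataset WHERE {
  ?catalog a schema:DataCatalog ;
           schema:publisher ?org ;
           schema:dataset ?dataset .
  ?org schema:name %q .
}`, name)
}

// datasetsByCatalogIRI addresses the catalog directly, which is possible
// once the catalog has a consistent, predictable IRI.
func datasetsByCatalogIRI(catalogIRI string) string {
	return fmt.Sprintf(`PREFIX schema: <https://schema.org/>
SELECT ?dataset WHERE {
  <%s> schema:dataset ?dataset .
}`, catalogIRI)
}

func main() {
	fmt.Println(datasetsByPublisherName("africaioc"))
	fmt.Println()
	fmt.Println(datasetsByCatalogIRI("urn:gleaner.io:doos:datacatalog"))
}
```

With a stable IRI the query needs one triple pattern instead of four, and it cannot accidentally match a second catalog whose publisher has the same name.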

Enhanced the PipeCopy function to include logging of organization name and generation of named graphs with unique URIs for RDF datasets based on organization names. Included a helper function to generate date-based SHA256 hashes to ensure unique graph URIs.
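
A date-based SHA-256 helper of the kind this commit message describes might look like the sketch below. The names `hashStamp` and `catalogGraphURI`, the RFC 3339 timestamp format, and the segment order of the URN are assumptions made for illustration; the actual helper in the commit may hash a different string.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// hashStamp returns the hex-encoded SHA-256 digest of a timestamp string.
func hashStamp(stamp string) string {
	sum := sha256.Sum256([]byte(stamp))
	return hex.EncodeToString(sum[:])
}

// catalogGraphURI builds a named-graph URN that is unique per timestamp.
// Assumed layout: urn:gleaner.io:<provider>:<source>:datacatalog:<sha256>.
func catalogGraphURI(provider, source string, now time.Time) string {
	stamp := now.UTC().Format(time.RFC3339Nano)
	return "urn:gleaner.io:" + provider + ":" + source + ":datacatalog:" + hashStamp(stamp)
}

func main() {
	fmt.Println(catalogGraphURI("doos", "africaioc", time.Now()))
}
```

Because the hash input is the current time, each run yields a new graph URI; hashing stable inputs instead would make the graph URI repeatable across runs.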
@fils
Member Author

fils commented Nov 15, 2024

@valentinedwv

Here is what I have now. I noticed these were triples, not quads, so I fixed that.

IRI for catalog: <urn:gleaner.io:doos:datacatalog>
IRI for publisher: <urn:gleaner.io:africaioc:publisher>
IRI for provider: <urn:gleaner.io:doos:provider>
IRI for named graph for these triples: <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0>
<urn:gleaner.io:doos:datacatalog> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:datacatalog> <https://schema.org/dateCreated> "2024-11-15 14:03:37" <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:datacatalog> <https://schema.org/description> "GleanerIO Nabu generated catalog" <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:datacatalog> <https://schema.org/provider> <urn:gleaner.io:doos:provider> <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:datacatalog> <https://schema.org/publisher> <urn:gleaner.io:africaioc:publisher> <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:provider> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:provider> <https://schema.org/name> "doos" <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:africaioc:publisher> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:africaioc:publisher> <https://schema.org/name> "africaioc" <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:datacatalog> <https://schema.org/dataset> <urn:gleaner.io:doos:africaioc:data:000db5913f586b6d06bc0e3f59f33e299878e9f1> <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:datacatalog> <https://schema.org/dataset> <urn:gleaner.io:doos:africaioc:data:00f8c810e243e27e3ee5f5f06bc792e1b4b4aec5> <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
<urn:gleaner.io:doos:datacatalog> <https://schema.org/dataset> <urn:gleaner.io:doos:africaioc:data:01db27cf260cd84f0d5074748070bde443c43985> <urn:gleaner.io:doos:africaioc:datacatalog:85856a8f3f88628e4e70450c623edfaf7a071eacf721d78115a742071a8d01d0> .
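
The triples-to-quads change amounts to appending the graph IRI as a fourth term before the closing dot of each statement. A minimal sketch follows, assuming one N-Triples statement per line with no comments; `toQuads` is a hypothetical name, and a real implementation would more likely use an RDF parser than string handling.

```go
package main

import (
	"fmt"
	"strings"
)

// toQuads appends graphIRI as the graph term of every statement in nt.
// Assumption: one statement per line, each ending in "." and no comments.
func toQuads(nt, graphIRI string) string {
	var b strings.Builder
	for _, line := range strings.Split(nt, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue // skip blank lines
		}
		// Drop only the final statement terminator, then re-add it
		// after the graph term.
		stmt := strings.TrimSpace(strings.TrimSuffix(line, "."))
		b.WriteString(stmt + " <" + graphIRI + "> .\n")
	}
	return b.String()
}

func main() {
	nt := `<urn:gleaner.io:doos:provider> <https://schema.org/name> "doos" .`
	fmt.Print(toQuads(nt, "urn:example:graph"))
	// <urn:gleaner.io:doos:provider> <https://schema.org/name> "doos" <urn:example:graph> .
}
```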

@fils fils requested a review from valentinedwv November 15, 2024 20:05
Member

@valentinedwv valentinedwv left a comment

I think the pattern needs to be
urn:gleaner.io:ORG:SOURCE:object
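
One way to keep every identifier on this pattern is to build them all through a single helper rather than by string concatenation at each call site. The sketch below is an illustration only: `buildURN` is a hypothetical name, and the set of rejected characters is an assumption about what should not appear in a segment.

```go
package main

import (
	"fmt"
	"strings"
)

// buildURN joins the given segments under the urn:gleaner.io prefix,
// e.g. buildURN(org, source, "datacatalog").
// It rejects empty segments and characters that would break the colon
// layout or the angle-bracket IRI syntax in N-Triples and N-Quads.
func buildURN(parts ...string) (string, error) {
	if len(parts) == 0 {
		return "", fmt.Errorf("buildURN: no segments given")
	}
	for _, p := range parts {
		if p == "" || strings.ContainsAny(p, " \t\n:<>\"") {
			return "", fmt.Errorf("buildURN: invalid segment %q", p)
		}
	}
	return "urn:gleaner.io:" + strings.Join(parts, ":"), nil
}

func main() {
	u, err := buildURN("doos", "africaioc", "datacatalog")
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // urn:gleaner.io:doos:africaioc:datacatalog
}
```

Validating segments in one place also guards against a bucket name or prefix introducing a stray colon that would shift the meaning of the later segments.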

if lastProcessed {

data := `<urn:gleaner.io:` + orgname + `:datacatalog> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/DataCatalog> .
<urn:gleaner.io:` + orgname + `:datacatalog> <https://schema.org/description> "GleanerIO Nabu generated catalog" .
Member

urn:gleaner.io:ORG:SOURCE:datacatalog

<urn:gleaner.io:` + orgname + `:datacatalog> <https://schema.org/publisher> <urn:gleaner.io:` + getLastElement(prefix) + `:publisher> .
<urn:gleaner.io:` + orgname + `:provider> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
<urn:gleaner.io:` + orgname + `:provider> <https://schema.org/name> "` + orgname + `" .
<urn:gleaner.io:` + getLastElement(prefix) + `:publisher> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
Member

urn:gleaner.io:ORG:SOURCE:publisher

<urn:gleaner.io:` + orgname + `:datacatalog> <https://schema.org/provider> <urn:gleaner.io:` + orgname + `:provider> .
<urn:gleaner.io:` + orgname + `:datacatalog> <https://schema.org/publisher> <urn:gleaner.io:` + getLastElement(prefix) + `:publisher> .
<urn:gleaner.io:` + orgname + `:provider> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://schema.org/Organization> .
<urn:gleaner.io:` + orgname + `:provider> <https://schema.org/name> "` + orgname + `" .
Member

Since there is one provider, this is good
urn:gleaner.io:eco:provider

2 participants