LOV data dump hosted as a git repo #80

Open
hoijui opened this issue Jun 15, 2024 · 4 comments

Comments

@hoijui

hoijui commented Jun 15, 2024

Available here:
https://codeberg.org/elevont/lov-dump

I did this because, for software relying on this data, it comes in handy to be able to include it as a git submodule instead of having to download it before or during the build process. It prevents needless re-downloads, security policy bells ringing, and many similar issues.

I did it on codeberg.org because both GitHub and GitLab.com have 100MB blob size limits, and this is 208MB.

@VladimirAlexiev

VladimirAlexiev commented Sep 7, 2024

@hoijui thanks! I listed your dump on Wikidata: https://www.wikidata.org/wiki/Q39392701#P4945 .

Please state your update policy. Your dump is 3 months old, but this query at https://lov.linkeddata.es/dataset/lov/sparql

prefix dct:  <http://purl.org/dc/terms/>
select * { # (max(?upd) as ?updated) {
  ?x dct:modified ?upd
} order by desc(?upd) limit 20

shows newer stuff.

Is that because the LOV dump is 3 months old, or because you don't track it regularly?
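
For anyone wanting to automate this freshness check, here is a minimal Python sketch. It assumes the endpoint above accepts standard SPARQL Protocol GET requests and returns SPARQL JSON results, and that the `dct:modified` values are comparable date literals; none of that is confirmed in this thread. It uses the `max(...)` variant hinted at in the commented-out part of the query, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the comment above.
ENDPOINT = "https://lov.linkeddata.es/dataset/lov/sparql"

# Aggregate variant of the query above: only the newest dct:modified value.
QUERY = """
prefix dct: <http://purl.org/dc/terms/>
select (max(?upd) as ?updated) { ?x dct:modified ?upd }
"""


def build_url(endpoint, query):
    """Build a SPARQL Protocol GET URL carrying the query as a parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def latest_modified(results):
    """Extract the newest ?updated value from a SPARQL JSON results document."""
    bindings = results["results"]["bindings"]
    values = [b["updated"]["value"] for b in bindings if "updated" in b]
    # String comparison is only meaningful if all values share one ISO 8601 form.
    return max(values) if values else None


def fetch_latest(endpoint=ENDPOINT):
    """Ask the endpoint for the newest modification date (needs network access)."""
    request = urllib.request.Request(
        build_url(endpoint, QUERY),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return latest_modified(json.load(response))
```

Comparing the result of `fetch_latest()` with the date of the last commit in the dump repo would show whether the dump lags behind the endpoint.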

@hoijui
Author

hoijui commented Sep 7, 2024

Thank you for that. Indeed, I completely neglected that!
I think that happened because I initially planned to do this with GitHub Actions, but then moving to codeberg made this more cumbersome, and it got lost. Of course, the dump is of little use without updates, so.. thank you!
How would you do it?
As codeberg has limited resources, I think it would be good to use a scheduled (e.g. once a day) GitHub action and push any changes over to codeberg. It should be relatively straightforward, as long as I don't run into any size or access limitations...

@hoijui
Author

hoijui commented Sep 7, 2024

It should now be updated daily (if there are changes) from this repo's CI:
https://github.com/elevont/lov-dump-updater

... but ... it looks like there is an issue with the blank nodes. :/
On each data dump, they get assigned different (random) IDs, and this shows up in the diff, of course. So about 1/3 of all lines show up as changed. That is of course neither meaningful nor maintainable over time.
Any idea how to solve this?
The best way would be to have fixed IDs for blank nodes (as in, they don't change between data dumps). Are you from the LOV team, by any chance?
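
One possible workaround on the consumer side, sketched here in Python: relabel each blank node with a hash of the statements it appears in, then sort the lines, so that two dumps with the same content give the same text. This is only a rough sketch under several assumptions: the dump is line-based N-Triples or N-Quads (not confirmed in this thread), the label regex is a simplification of the real grammar, and only the direct neighbourhood of each node is hashed. It is not a full RDF canonicalization algorithm, and blank nodes with identical neighbourhoods may still swap labels between runs. The function name `stabilize_blank_nodes` is made up for illustration.

```python
import hashlib
import re
from collections import defaultdict

# IRIs and literals are matched only so they can be skipped; group 1 is set
# solely for blank-node labels (simplified form of the N-Triples grammar).
TOKEN = re.compile(
    r'<[^>]*>'
    r'|"(?:[^"\\]|\\.)*"'
    r'|(_:[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?)'
)


def _rewrite(line, rename):
    """Replace every blank-node label in a line with rename(label)."""
    def repl(match):
        label = match.group(1)
        return rename(label) if label else match.group(0)
    return TOKEN.sub(repl, line)


def stabilize_blank_nodes(text):
    """Return the dump with content-derived blank-node labels and sorted lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Collect, per blank node, the statements that mention it.
    mentions = defaultdict(list)
    for ln in lines:
        for label in {m.group(1) for m in TOKEN.finditer(ln) if m.group(1)}:
            mentions[label].append(ln)

    # Hash each node's statements with the original labels masked out.
    by_digest = defaultdict(list)
    for label, stmts in mentions.items():
        masked = sorted(
            _rewrite(s, lambda other, me=label: "_:self" if other == me else "_:other")
            for s in stmts
        )
        digest = hashlib.sha256("\n".join(masked).encode("utf-8")).hexdigest()[:16]
        by_digest[digest].append(label)

    # Nodes that hash alike get an index; that tie-break depends on the old
    # labels and is therefore NOT guaranteed to be stable across dumps.
    mapping = {}
    for digest, labels in by_digest.items():
        for i, label in enumerate(sorted(labels)):
            suffix = "" if len(labels) == 1 else f"x{i}"
            mapping[label] = f"_:c{digest}{suffix}"

    return "\n".join(sorted(_rewrite(ln, mapping.__getitem__) for ln in lines)) + "\n"
```

Running this over each dump before committing should shrink the diff to the lines that actually changed, at the cost of losing the original statement order.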

@VladimirAlexiev

VladimirAlexiev commented Sep 10, 2024

atextor/turtle-formatter#8 : there is active development on this tool, and stability of blank nodes is one of the issues being addressed.

You use it as described at https://atextor.de/owl-cli/main/snapshot/usage.html#write-command

I'm not from the LOV team, if indeed there is one.

  • @pyvandenbussche hasn't worked on it for a few years AFAIK (apologies if that is not so!)
  • @gatemezing has often posted and updated ontologies

