RDF Exports #4

Open
Livvi opened this issue Nov 14, 2018 · 4 comments

Comments

Livvi commented Nov 14, 2018

Hi,

I am trying to create an RDF Export from a current Wikidata dump (20181105).

First I tried to use the toolkit client (v0.8.0) and I always got 31 triples, no matter what parameters I tried to use.

Now I am using version 0.9.0 of the toolkit in Eclipse, but I am getting some warnings and errors.

One Warning I am encountering for several language codes is:
Unknown Wikimedia language code "inh". Using this code in RDF now, but this might be wrong.

And for various properties I get the errors:
Count not export SomeValueSnak for property P1971: OWL range not known.
or
Could not fetch datatype of http://www.wikidata.org/entity/P883. Assuming type http://wikiba.se/ontology#String

Furthermore, I am trying to filter the data to English and German using setLanguageFilter, but it has no effect. I added the following to the RdfSerializationExample, but I get the same number of triples with or without it:

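// Requires: import java.util.HashSet; import java.util.Set;
// Restrict the export to English and German.
Set<String> languageSet = new HashSet<>();
languageSet.add("en");
languageSet.add("de");
dumpProcessingController.setLanguageFilter(languageSet);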
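
To make the expectation concrete, here is a minimal, toolkit-independent sketch of the filtering I expected the call above to apply to labels. The class and method names (LanguageFilterSketch, filterByLanguage) and the sample labels are hypothetical and not part of Wikidata Toolkit; what the toolkit really does internally may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not a Wikidata Toolkit class. */
final class LanguageFilterSketch {

    /** Returns a copy of labels that keeps only entries whose language code is in languages. */
    static Map<String, String> filterByLanguage(Map<String, String> labels, Set<String> languages) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : labels.entrySet()) {
            if (languages.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data: language code -> label.
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("en", "Germany");
        labels.put("de", "Deutschland");
        labels.put("fr", "Allemagne");

        Set<String> languageSet = new HashSet<>(Arrays.asList("en", "de"));

        // Only the "en" and "de" entries should remain.
        System.out.println(filterByLanguage(labels, languageSet));
    }
}
```

With a filter like this in place, I would expect the number of exported label triples to drop, which is not what I see.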
Collaborator

Tpt commented Nov 15, 2018

Hi! Thank you for the report!

Wikidata Toolkit v0.8 indeed does not work with recent Wikidata dumps.

> One Warning I am encountering for several language codes is:
> Unknown Wikimedia language code "inh". Using this code in RDF now, but this might be wrong.

This just means that "inh" is not in the dictionary that converts Wikimedia language codes to proper BCP 47 language codes, because Wikidata did not use it at the time of the v0.9 release. You can safely ignore this warning. The code will be added in the next release.
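
Roughly, the lookup behaves like the sketch below. This is only an illustration, assuming a plain map with a pass-through fallback, as the warning text suggests. The class name, the method name, and the handful of table entries are hypothetical and are not the toolkit's actual code or dictionary.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the toolkit's real conversion class. */
final class LanguageCodeSketch {

    // A few illustrative entries: Wikimedia language code -> BCP 47 code.
    private static final Map<String, String> BCP47_CODES = new HashMap<>();
    static {
        BCP47_CODES.put("en", "en");
        BCP47_CODES.put("de", "de");
        BCP47_CODES.put("als", "gsw");
        BCP47_CODES.put("zh-yue", "yue");
        BCP47_CODES.put("zh-min-nan", "nan");
    }

    /** Returns the mapped code, or the input unchanged (with a warning) if it is unknown. */
    static String toBcp47(String wikimediaCode) {
        String mapped = BCP47_CODES.get(wikimediaCode);
        if (mapped == null) {
            System.err.println("Unknown Wikimedia language code \"" + wikimediaCode
                    + "\". Using this code in RDF now, but this might be wrong.");
            return wikimediaCode;
        }
        return mapped;
    }

    public static void main(String[] args) {
        System.out.println(toBcp47("als")); // known code, mapped through the table
        System.out.println(toBcp47("inh")); // unknown code, passed through with a warning
    }
}
```

For a code like "inh" that is already a valid language code, passing it through unchanged gives the right result, which is why the warning is harmless here.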

> Could not fetch datatype of http://www.wikidata.org/entity/P883. Assuming type http://wikiba.se/ontology#String

That's because this property has been deleted, so the library is not able to fetch its datatype from the API: https://www.wikidata.org/wiki/Property:P883

> And for various properties I get the errors: Count not export SomeValueSnak for property P1971: OWL range not known.

This looks like a bug. I have reported it in a separate issue: Wikidata/Wikidata-Toolkit#405

> Furthermore I am trying to filter the data by english and german using setLanguageFilter, but it has no effect.

It's strange. I'm going to investigate it.

Author

Livvi commented Nov 15, 2018

Thank you very much for the fast response!

When using Maven, how do I install the newest version of the toolkit with the bugfix?

Collaborator

Tpt commented Dec 4, 2018

Just change the version number to the latest one (0.9) in the pom.xml file.

Author

Livvi commented Dec 4, 2018

Thanks!

In the meantime I ran into another problem: it seems that the toolkit does not support the type 'form' yet. See Wikidata/Wikidata-Toolkit#407
