Bug on upload (Virtuoso only) - pending fix from virtuoso team #24

Open
manonthegithub opened this issue Mar 3, 2022 · 8 comments
Labels
bug (Something isn't working), enhancement (New feature or request), question (Further information is requested)

Comments

@manonthegithub
Collaborator

dbpedia/databus-transfer#1

@manonthegithub manonthegithub added the pick up first high priority label Mar 3, 2022
@manonthegithub
Collaborator Author

Likely to be linked to:
openlink/virtuoso-opensource#571

@holycrab13
Contributor

holycrab13 commented Mar 3, 2022

Very likely not a gstore issue

Could you make a gstore version that logs all the drop graph and insert graph statements? I will then try to reproduce it with standalone Virtuoso.

Probably linked to the issue you posted, but it has been unsolved since 2016 - let's hope for the best :D
Branch of databus-transfer repo to reproduce the issue:
https://github.com/dbpedia/databus-transfer/tree/insert-stopped-debug

@manonthegithub
Collaborator Author

@holycrab13
added the GSTORE_LOG_LEVEL env var,
you can now set GSTORE_LOG_LEVEL=DEBUG in docker-compose to enable logging of queries

@manonthegithub
Collaborator Author

ok, very strange issue... so the query size doesn't matter. It happens on different queries, randomly.
I mean:
If I split the file into inserts of 100 triples, it fails randomly on different parts, sometimes within the first 200 triples, sometimes at around 900 triples. No idea what the problem is; it looks like a bug in Virtuoso, so we can open a ticket there.
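For anyone who wants to try the chunked upload outside gstore, here is a minimal sketch of the splitting step. It assumes the data is available as N-Triples (one triple per line) and reuses the `SPARQL INSERT IN GRAPH <...> { ... };` statement form that appears further down in this thread. The function name, file names, and graph IRI are placeholders, not anything taken from the gstore code.

```shell
#!/usr/bin/env bash
# Sketch: split an N-Triples file into SPARQL INSERT files of at most
# CHUNK triples each. All names here are hypothetical placeholders.
make_insert_chunks() {
  local input=$1 graph=$2 outdir=$3 chunk=${4:-100}
  mkdir -p "$outdir"
  # split writes part-aa, part-ab, ... with at most $chunk lines each
  split -l "$chunk" "$input" "$outdir/part-"
  local part
  for part in "$outdir"/part-*; do
    # skip files already wrapped by an earlier run
    [[ $part == *.sparql ]] && continue
    {
      printf 'SPARQL INSERT IN GRAPH <%s> {\n' "$graph"
      cat "$part"
      printf '};\n'
    } > "$part.sparql"
    rm "$part"
  done
}
```

Each resulting `part-*.sparql` file could then be fed to isql-vt one at a time, in the same way as the `i1.sparql.txt` invocation shown later in the thread, to see at which chunk the failure appears.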

Restarting virtuoso and gstore and saving some other files first helps.

@manonthegithub manonthegithub added the bug, enhancement, and question labels and removed the pick up first and high priority labels Mar 7, 2022
@manonthegithub
Collaborator Author

Posted repro to openlink/virtuoso-opensource#571
hope they will be able to fix this soon

@manonthegithub manonthegithub changed the title Bug on upload Bug on upload (Virtuoso only) Apr 20, 2022
@manonthegithub manonthegithub changed the title Bug on upload (Virtuoso only) Bug on upload (Virtuoso only) - pending fix from virtuoso team Apr 20, 2022
@kurzum
Member

kurzum commented May 4, 2022

ok, I also reproduced the bug now. I made a test set for bash:

isql-vt 1111 dba password VERBOSE=ON i1.sparql.txt > 1.1.txt 2>1.2.txt
isql-vt 1111 dba password VERBOSE=ON i2.sparql.txt > 2.1.txt 2>2.2.txt

i1.sparql.txt
i2.sparql.txt

@kurzum
Member

kurzum commented May 4, 2022

I split the triples and ran them individually:

# IFS= and -r keep leading whitespace and backslashes in each triple intact
while IFS= read -r p; do
  echo "----------------"
  echo "$p"
  isql-vt 1111 dba password VERBOSE=ON exec="sparql INSERT IN GRAPH <http://localhost:3002/g/test/mappings-geo-coordinates-mappingbased-2018.09.12-dataid.jsonld> { $p } ;"
  echo "----------------"
done <triples.txt

It seems to be the preview triples. When run individually, they throw syntax errors:
ri2.txt

Then I split the triples of i2 into no preview (i3) and only preview (i4):
i3.sparql.txt
i4.sparql.txt
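A split like this could be reproduced with a sketch along the following lines. It assumes one triple per line (N-Triples style) and a known IRI for the preview property; the IRI used in the usage example is a hypothetical placeholder, since the actual property IRI is not given in this thread. Note that the match is on the whole line, so a triple that merely mentions the IRI as subject or object would also be selected.

```shell
#!/usr/bin/env bash
# Sketch: separate triples that use a given predicate IRI from the rest.
# Function and file names are hypothetical placeholders.
split_by_predicate() {
  local input=$1 predicate=$2 with_out=$3 without_out=$4
  # -F treats the IRI as a fixed string, not a regular expression;
  # "|| true" keeps going when grep finds no matching lines
  grep -F -- "<$predicate>" "$input" > "$with_out" || true
  grep -vF -- "<$predicate>" "$input" > "$without_out" || true
}

# Usage (placeholder IRI and file names):
# split_by_predicate triples.txt "http://example.org/preview" only-preview.txt no-preview.txt
```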

Then I tested it again:

# The non-preview triples are loaded first. This seems to initialize the DB properly and set up the graph.
# Loading i1 and i2 afterwards still throws the error
# "-- More than 0 parameters, ignoring all the rest of the statement #line 1 "i2.sparql.txt""
# but no longer corrupts the store.
isql-vt 1111 dba password VERBOSE=ON i3.sparql.txt > 3.1.txt 2>3.2.txt
isql-vt 1111 dba password VERBOSE=ON i1.sparql.txt > 1.1.txt 2>1.2.txt
isql-vt 1111 dba password VERBOSE=ON i2.sparql.txt > 2.1.txt 2>2.2.txt
# Running i1 or i2 first, both of which contain the preview property, messes up the store:
isql-vt 1111 dba password VERBOSE=ON i1.sparql.txt > 1.1.txt 2>1.2.txt
isql-vt 1111 dba password VERBOSE=ON i2.sparql.txt > 2.1.txt 2>2.2.txt
isql-vt 1111 dba password VERBOSE=ON i3.sparql.txt > 3.1.txt 2>3.2.txt

Conclusion:
Overall this seems to be an encoding issue. ODBC/JDBC have certain control and macro characters like $. I originally created the preview triple in the old Maven upload client. Back then I already had trouble creating these, as it was (and still is) unclear to me what exactly I needed to escape/encode when putting RDF into RDF as a literal. This is compounded by the different available syntaxes (N-Triples, Turtle, RDF/XML), plus they also have to go into SPARQL, which is yet another syntax, and I am not sure whether SPARQL INSERT is exactly like Turtle or differs in the details.
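On the escaping question, a minimal sketch: in the N-Triples, Turtle, and SPARQL grammars, a double-quoted (short-form) string literal must not contain a raw double quote, backslash, line feed, or carriage return, so those four characters need backslash escapes. The helper below (a hypothetical name, not part of any existing client) applies exactly those escapes in bash. It only covers the standard grammars; it does not address how isql-vt itself treats characters such as $, which is the suspected cause here and would need to be checked separately.

```shell
#!/usr/bin/env bash
# Sketch: escape a string for use inside a double-quoted RDF/SPARQL literal.
# escape_literal is a hypothetical helper name.
escape_literal() {
  local s=$1
  s=${s//\\/\\\\}       # backslash      -> \\  (must run first)
  s=${s//\"/\\\"}       # double quote   -> \"
  s=${s//$'\n'/\\n}     # line feed      -> \n
  s=${s//$'\r'/\\r}     # carriage return -> \r
  printf '%s' "$s"
}

# Usage: embed an RDF snippet as a literal value
# printf '"%s"\n' "$(escape_literal "$snippet")"
```

The backslash replacement has to come first; otherwise the backslashes introduced by the later escapes would themselves be doubled.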

Solution suggestions:

  • remove preview from the dataid
  • recode preview either by:
    • adding a core service to the databus similar to SHACL or SHA sum generation
    • out-sourcing it to mods

Still uncertain:

  • I can totally see that the "preview" prop can cause all kinds of havoc. The question is: when we remove it, is everything ok then, or do the issues persist with other triples as well?

@manonthegithub
Collaborator Author

manonthegithub commented May 4, 2022

@kurzum you'd better post it there: openlink/virtuoso-opensource#571
It really does happen at different moments and even at different places in the same data.
