Graph_diff fails on presence of unrelated blank nodes #1610
Replies: 13 comments 1 reply
-
How should rdflib know that the blank node in g1 is the same as the one in g2? I don't think anything in rdf semantics make them the same, and not even in rdfs or owl semantics AFAIK. So I don't think this behaviour is incorrect, but I may be wrong, if you can provide some reference to the standards that dictate they should be the same I will have a look. Regardless of standards/specs however, if there is some way in which you think equivalence can be established consistently I think it would be good to know, it would be useful to have. I think it may actually be a bug that it works if there is only one blank node however, I don't think blank nodes in independent graphs should ever be equivalent with pure rdf semantics. I can see cases for OWL where that would be different though. |
Beta Was this translation helpful? Give feedback.
-
I guess this could be done: Blank Node Matching and RDF/S Comparison Functions - but it is not standard RDF semantics, so it should likely not go inside the normal graph diff function. |
Beta Was this translation helpful? Give feedback.
-
I see you import |
Beta Was this translation helpful? Give feedback.
-
Comparing skolemized canonical graphs seems to yield the result you expect: from pprint import pprint
from rdflib import Graph
from rdflib.compare import graph_diff, to_canonical_graph
g1t = """@prefix : <http://test.org#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
:C1 a owl:Class ;
rdfs:subClassOf [ a owl:Restriction ;
owl:minCardinality "1"^^xsd:int ;
owl:onProperty rdfs:label
] ;
rdfs:label "C1"@en ."""
g2t = (
g1t
+ """ :C2 a owl:Class ;
rdfs:subClassOf [ a owl:Restriction ;
owl:minCardinality "1"^^xsd:int ;
owl:onProperty rdfs:comment
] ;
rdfs:label "C2"@en ."""
)
g1 = Graph()
g2 = Graph()
g1.parse(data=g1t, format="turtle")
# g1.parse(source="profile.ttl", format='turtle')
# g2.parse(source="source.ttl", format='turtle')
g2.parse(data=g2t, format="turtle")
g1c = to_canonical_graph(g1).skolemize()
g2c = to_canonical_graph(g2).skolemize()
g1cs = g1c.skolemize()
g2cs = g2c.skolemize()
in_both, in_first, in_second = graph_diff(g1cs, g1cs)
print("should be empty as all in g1 is in g2")
print(in_first.serialize(format="turtle").decode())
in_both, in_first, in_second = graph_diff(g1cs, g1cs)
print("should be empty as all in g1 is in g1")
print(in_first.serialize(format="turtle").decode()) output:
|
Beta Was this translation helpful? Give feedback.
-
Thanks for the feedback.. to_canonical_graph() is used within graph_diff the output of to_canonical graph illustrates the problem: (rdflib.term.URIRef('http://test.org#C1'), rdflib.term.URIRef('http://www.w3.org/2000/01/rdf-schema#subClassOf'), rdflib.term.BNode('cb0'))
I tested you code and it doesnt help - you have an error in you are comparing the first graph to itself.. graph_diff(g1cs, g1cs)
|
Beta Was this translation helpful? Give feedback.
-
PS - thanks for helping - if you are the original author could you please create a PR to drop in a comment saying where the algorithm came from and add some tests ? It looks cool - and I'm guessing there is some very interesting information about its performance relative to other options it would also be good to have explained in the docs :-) |
Beta Was this translation helpful? Give feedback.
-
Apologies. It seems like under some circumstances the hashing fails and results in a 0 value (i.e. If I add some dummy blank nodes from pprint import pprint
from rdflib import Graph
from rdflib.compare import graph_diff, to_canonical_graph
g1t = """@prefix : <http://test.org#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
[] rdfs:seeAlso [].
:C1 a owl:Class ;
rdfs:subClassOf [ a owl:Restriction ;
owl:minCardinality "1"^^xsd:int ;
owl:onProperty rdfs:label
] ;
rdfs:label "C1"@en ."""
g2t = (
g1t
+ """ :C2 a owl:Class ;
rdfs:subClassOf [ a owl:Restriction ;
owl:minCardinality "1"^^xsd:int ;
owl:onProperty rdfs:comment
] ;
rdfs:label "C2"@en ."""
)
g1 = Graph()
g2 = Graph()
g1.parse(data=g1t, format="turtle")
# g1.parse(source="profile.ttl", format='turtle')
# g2.parse(source="source.ttl", format='turtle')
g2.parse(data=g2t, format="turtle")
#g1c = to_canonical_graph(g1).skolemize()
#g2c = to_canonical_graph(g2).skolemize()
#g1cs = g1c.skolemize()
#g2cs = g2c.skolemize()
#print(g1cs.serialize(format="turtle").decode())
#print(g2cs.serialize(format="turtle").decode())
in_both, in_first, in_second = graph_diff(g1, g2)
print("should be empty as all in g1 is in g2")
print(in_first.serialize(format="turtle").decode())
in_both, in_first, in_second = graph_diff(g1, g2)
print("should be empty as all in g1 is in g1")
print(in_first.serialize(format="turtle").decode()) (hopefully this time it is not just a bug again 😄) output:
So yes there is clearly a bug here.
Not the original author. I am not sure when I will get time to look at this, or if I will at all, but at least the problem is clearer now. |
Beta Was this translation helpful? Give feedback.
-
So this is a workaround until a fix is made - this could be used to replace the current graph_diff() code... def graph_diff_fix( g1: Graph, g2:Graph ) :
|
Beta Was this translation helpful? Give feedback.
-
@jimmccusker as far as I can tell you wrote the diff algorithm here, is it documented somewhere? I'm trying to make sense of it to see why its going wrong. |
Beta Was this translation helpful? Give feedback.
-
Okay I did some digging, the algorithm is explained in chapter 06 of this document: WEBSIG: A DIGITAL SIGNATURE FRAMEWORK FOR THE WEB
This references: Still not sure what is actually wrong with it. Once #1330 is merged I will merge some type annotations that made it slightly easier for me to understand some of what is happening and I will also merge some tests marked with I think it may be worth considering other graph canonicalization algorithms, for example: Canonical Forms for Isomorphic and Equivalent RDF Graphs: Algorithms for Leaning and Labelling Blank Nodes which was implemented by RDF4J |
Beta Was this translation helpful? Give feedback.
-
Yeah. I only wrote the test to see if the graphs are different, not the one
that checks to see what is different between the graphs. However, if two
bnodes are connected via another bnode, that will change their color. Do
you have an example graph pair that causes the problem?
Thanks,
Jamie
On Sun, May 30, 2021 at 6:33 PM Iwan Aucamp ***@***.***> wrote:
Okay I did some digging, the algorithm is explained in ch06 of this
document: http://tw.rpi.edu/media/2015/09/21/8275/websig_jpm_thesis.pdf
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1294 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAETCEOHMM6BGLB3WMAVQHDTQK4JXANCNFSM43N7VMEQ>
.
--
Jamie McCusker (they<she)
Director, Data Operations
Tetherless World Constellation
Rensselaer Polytechnic Institute
***@***.*** ***@***.***>
http://tw.rpi.edu
|
Beta Was this translation helpful? Give feedback.
-
see the in-line examples earlier in this issue... |
Beta Was this translation helpful? Give feedback.
-
@gjhiggins this is an actual bug, it should not be a discussion. |
Beta Was this translation helpful? Give feedback.
-
If both graph has multiple blank nodes graph diff fails to detect matches. It succeeds for a single blank node case.
code:
Beta Was this translation helpful? Give feedback.
All reactions