Verifying linked data graphs #10

Open
onthebreeze opened this issue Jan 12, 2024 · 20 comments
Labels: Transparency Graphs (Issues related to Trust Graphs)

Comments

@onthebreeze
Contributor

We have an important UNTP section (https://uncefact.github.io/spec-untp/docs/specification/TrustGraphs) that needs quite a bit of discussion about how we should even approach this.

Basically, the problem is that links between related things can be technically valid but invalid in a business sense. Some examples:

  • A credential issuer claims to be ABN 3413288567 but the linked government identity credential subject is ABN 3455667788. So the issuer is lying about their business identity.
  • A conformity credential is issued by a certifier and the scope is about motorcycle helmet safety. The conformity credential has a linked accreditation credential from NATA which is technically valid but says the scope of accreditation is animal health. So the certificate issuer is certifying something they are not authorised to do.
  • You scan a barcode with GTIN 12345678910 and it takes you to a DPP credential about GTIN 10987654321, so the passport is about the wrong product.
  • and many more like this

All are about verifying a collection of two or more credentials and whether the links between them are valid in a business sense.
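All three examples share one shape: a value in one credential must equal a value in the credential it links to. A minimal Python sketch of that pattern, assuming the credentials have already been retrieved and parsed into dicts; the field paths (`issuer.abn`, `credentialSubject.abn`, `credentialSubject.gtin`) are hypothetical and not taken from the UNTP schemas:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class LinkRule:
    """A business rule on a link: the value at `source_path` in one credential
    must equal the value at `target_path` in the credential it points to."""
    name: str
    source_path: str
    target_path: str


def get_path(doc: dict, path: str) -> Optional[Any]:
    """Follow a dotted path such as 'credentialSubject.abn' through nested dicts."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def check_link(rule: LinkRule, source: dict, target: dict) -> list:
    """Return human-readable errors; an empty list means the link is valid."""
    a = get_path(source, rule.source_path)
    b = get_path(target, rule.target_path)
    if a is None or b is None:
        return [f"{rule.name}: missing field"]
    if a != b:
        return [f"{rule.name}: {a!r} != {b!r}"]
    return []


# Example 1 from the list: issuer ABN vs. the linked government identity credential.
identity_rule = LinkRule("issuer identity", "issuer.abn", "credentialSubject.abn")
claim = {"issuer": {"abn": "3413288567"}}
gov_identity = {"credentialSubject": {"abn": "3455667788"}}
print(check_link(identity_rule, claim, gov_identity))
# ["issuer identity: '3413288567' != '3455667788'"]

# Example 3: GTIN on the scanned barcode vs. the GTIN the passport describes.
gtin_rule = LinkRule("passport product", "gtin", "credentialSubject.gtin")
scan = {"gtin": "12345678910"}
passport = {"credentialSubject": {"gtin": "10987654321"}}
print(check_link(gtin_rule, scan, passport))
```

The accreditation-scope example fits the same mould if the rule compares the conformity credential's scope with the accreditation's scope; retrieving the linked credentials is a separate step.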

Is it possible to define the validation rules using something like SHACL, or some other way of specifying rules? And if so, who defines the rules? Maybe the creator of a UNTP extension for a specific industry or geography, like Australian agriculture?

@onthebreeze added the Transparency Graphs (Issues related to Trust Graphs) label Jan 12, 2024
@Fak3
Contributor

Fak3 commented Jan 12, 2024

A credential issuer claims to be ABN 3413288567 but the linked government identity credential subject is ABN 3455667788. So the issuer is lying about their business identity.

If I understand correctly, the validation here must include retrieval of that linked government identity credential. That is already a step that cannot be formalized with SHACL or JSON Schema rules alone.

Some validation steps can be formalized in shacl or json-schema, but full validation process must include other steps for implementers to follow.

@onthebreeze
Contributor Author

@Fak3: yes, the mechanism to retrieve a bundle of credentials is separate from the analysis of the graph of linked data that is created from the credentials. Credentials will be discoverable from product or entity identifiers via a link resolution protocol, and credentials may contain links to other credentials. But this issue can assume that a verifier has followed links, discovered a number of related credentials, and is holding the data in some kind of graph store, and now wants to do some verification of the graph.
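One way to picture "holding the data in some kind of graph store" is a minimal in-memory triple store. The sketch below is a stand-in, not a real RDF or JSON-LD processor: it flattens each credential (already parsed into a dict) into (subject, predicate, object) triples, using an `id` field as the node identifier where present. All identifiers and field names are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyGraph:
    """A minimal in-memory triple store standing in for a real graph database."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for t in self.triples:
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t


def load_credential(graph: TinyGraph, node_id: str, doc: dict) -> None:
    """Flatten a nested dict into triples. A nested dict becomes its own node,
    named by its `id` if it has one, else by a path under the parent node.
    Lists are assumed to hold scalar values only."""
    for key, value in doc.items():
        if key == "id":
            continue
        if isinstance(value, dict):
            child = value.get("id", f"{node_id}/{key}")
            graph.add(node_id, key, child)
            load_credential(graph, child, value)
        elif isinstance(value, list):
            for item in value:
                graph.add(node_id, key, str(item))
        else:
            graph.add(node_id, key, str(value))


g = TinyGraph()
load_credential(g, "urn:dpp:1", {
    "credentialSubject": {"id": "urn:product:A", "gtin": "12345678910"},
    "conformityCredential": "urn:dcc:7",
})
load_credential(g, "urn:dcc:7", {"credentialSubject": {"id": "urn:product:A"}})

# Both credentials now point at the same product node.
print(sorted(s for s, _, _ in g.match(p="credentialSubject", o="urn:product:A")))
# ['urn:dcc:7', 'urn:dpp:1']
```

Once merged like this, the store no longer records which credential asserted which triple; that loss of provenance is the concern raised in the following comments.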

@Fak3
Contributor

Fak3 commented Jan 13, 2024

SHACL playground example validates that issuer of MotoGearSafetyCredential has capability "caps/MotoGearSafety": https://s.zazuko.com/3AJQNCR
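The rule described here can be approximated in plain Python over (subject, predicate, object) triples. This is not SHACL and does not reproduce the linked shape; the predicate names (`type`, `issuer`, `capability`) and DIDs are hypothetical. In SHACL terms it corresponds roughly to a shape targeting the credential class with a sequence path issuer/capability and `sh:hasValue`.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: Set[Triple] = {
    ("urn:vc:1", "type", "MotoGearSafetyCredential"),
    ("urn:vc:1", "issuer", "did:example:certifier-a"),
    ("did:example:certifier-a", "capability", "caps/MotoGearSafety"),
    ("urn:vc:2", "type", "MotoGearSafetyCredential"),
    ("urn:vc:2", "issuer", "did:example:certifier-b"),
    ("did:example:certifier-b", "capability", "caps/AnimalHealth"),
}


def objects(triples: Set[Triple], s: str, p: str) -> Set[str]:
    """All objects o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}


def missing_capability(triples: Set[Triple], credential_type: str, capability: str) -> List[str]:
    """Ids of credentials of `credential_type` where no issuer has `capability`."""
    violations = []
    for (s, p, o) in sorted(triples):
        if p == "type" and o == credential_type:
            issuers = objects(triples, s, "issuer")
            if not any(capability in objects(triples, i, "capability") for i in issuers):
                violations.append(s)
    return violations


print(missing_capability(TRIPLES, "MotoGearSafetyCredential", "caps/MotoGearSafety"))
# ['urn:vc:2']
```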

@Fak3
Contributor

Fak3 commented Jan 13, 2024

One potential issue that comes to mind is that individual subgraphs may contradict each other, and naively validating the result after merging them together conceals where the problem came from.

@Fak3
Contributor

Fak3 commented Jan 13, 2024

For example, the fraudulent VC can say that its issuer has the needed capability, and this VC is signed by the issuer itself.

If we blindly merge everything that forged VC says with the data (another VC) from the national authority and then validate, the verifier won't recognize the fraud.

@Fak3
Contributor

Fak3 commented Jan 13, 2024

What I'm trying to say is that due to information loss, the majority of checks we have to perform on separate VC subgraphs, and only a few can be done on the merged graph afterwards.
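One way to keep that information is to record, for every statement, the credential it came from (in RDF terms, a quad or a named graph per credential) and to look for contradictions before provenance is discarded. A minimal sketch, assuming a caller-supplied set of hypothetical "functional" predicates that should have a single value per subject (here a registered business number):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, source credential)


def find_conflicts(quads: Iterable[Quad],
                   functional: Set[str]) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """For predicates expected to have one value per subject, report every
    (subject, predicate) with several values, and which credentials asserted each."""
    values: Dict[Tuple[str, str], Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for s, p, o, source in quads:
        if p in functional:
            values[(s, p)][o].add(source)
    return {
        key: {o: sorted(sources) for o, sources in by_value.items()}
        for key, by_value in values.items()
        if len(by_value) > 1
    }


QUADS = [
    ("did:example:acme", "abn", "3413288567", "urn:vc:self-issued"),
    ("did:example:acme", "abn", "3455667788", "urn:vc:gov-identity"),
    ("did:example:acme", "name", "Acme Pty Ltd", "urn:vc:gov-identity"),
]
print(find_conflicts(QUADS, {"abn"}))
```

Merging without the fourth column would leave two ABNs on one node with no way to tell which credential made which claim.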

@onthebreeze
Contributor Author

The fake accreditation VC is why we have the trust anchors section of UNTP. Any VC of type accreditation must be issued by one of a very short list of known and trusted authorities, e.g. did:web:NATA.com.Au
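A sketch of that trust-anchor rule, assuming signatures have already been verified and credentials are parsed into dicts. The type name `AccreditationCredential` and the allowlist contents are placeholders; in practice the list would presumably come from the trust anchors section or a scheme-specific register.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical trust-anchor list: the only issuers accepted for accreditation credentials.
TRUSTED_ACCREDITORS: FrozenSet[str] = frozenset({"did:web:NATA.com.Au"})


def untrusted_accreditations(credentials: Iterable[dict],
                             trusted: FrozenSet[str] = TRUSTED_ACCREDITORS) -> List[str]:
    """Ids of accreditation credentials whose issuer is not a known trust anchor.
    Assumes each credential's signature has already been checked."""
    return [
        vc["id"]
        for vc in credentials
        if "AccreditationCredential" in vc.get("type", []) and vc.get("issuer") not in trusted
    ]


bundle = [
    {"id": "urn:vc:genuine", "type": ["VerifiableCredential", "AccreditationCredential"],
     "issuer": "did:web:NATA.com.Au"},
    # Self-issued "accreditation": validly signed, but by the certifier itself.
    {"id": "urn:vc:forged", "type": ["VerifiableCredential", "AccreditationCredential"],
     "issuer": "did:example:certifier"},
    {"id": "urn:vc:dpp", "type": ["VerifiableCredential"], "issuer": "did:example:certifier"},
]
print(untrusted_accreditations(bundle))  # ['urn:vc:forged']
```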

@ashleythedeveloper
Collaborator

Is it possible to define the validation rules using something like shacl? Or some other way of specifying rules?

Based on @Fak3's example and my research, yes, it seems possible to use SHACL to perform the type of validation in the examples you provided, with the addition of SPARQL in some cases. But it's by no means the only way.

For example, another option could be the query language of a graph database, such as Cypher for Neo4j:

// Find issuers whose claimed ABN differs from the ABN in the linked government identity
MATCH (issuer:Issuer)-[:CLAIMS_IDENTITY]->(identity:GovernmentIdentity)
WHERE issuer.abn <> identity.abn
RETURN issuer.abn AS issuerABN, identity.abn AS identityABN

@nissimsan
Contributor

Related: https://medium.com/transmute-techtalk/the-united-nations-trust-graph-d65af7b0b678

@JohnOnGH
Contributor

JohnOnGH commented Feb 4, 2024

I've just re-read Nis' paper, and agree with the concluding comments. Basically I think we're going to run into a problem that we all know exists: not only should we expect and guard against bad actors (to the extent possible), but also we cannot expect all items and all nodes in any supply chain to adopt the same standard at the same time (and perhaps, ever).

This rather clumsy phrase is my way of saying that the existing systems are complex, adaptive, and real-world messy. We cannot expect a single approach to be adopted by all participants, no matter how attractive. Even if we were remarkably optimistic, such adoption would be a gradual rollout: at the beginning no participants use the approach, then some do (scattered and in pockets), and then more, until (wildly optimistic) they all do.

That is not to say that I am against this idea or approach. I am a (deep) fan of the graph concept, last year I proposed an approach to consider "Governance" as a "governance graph" concept within Trust over IP's "Governance Architecture Task Force" (https://docs.google.com/presentation/d/1vYUJW76BEK_CQotAZ5maXYwe3H3K3dKCPAcTqEgfGJQ/edit).

My observation is that we have to accept/expect that we will have imperfect information, and we have to decide what to do about that. My expectation is that the relying party / verifier will explore the graph until they have satisfied their need for proof, or until they have exhausted the ability to explore the graph. If exhaustion occurs before acceptable proof, then they need to seek alternative/additional proof and/or accept that the claims are not fully verified and make a decision based on that.

We can consider this in terms of hard rules and soft rules. The hard rules (regulations, law etc.) will demand that we must have proof of claims that are acceptable within the jurisdiction in which they (and we) are being tested. The soft rules might be best efforts, nice to haves, preferences etc. and may allow some "wiggle room".

We need wiggle room.
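A minimal sketch of the hard/soft split, assuming each rule is a predicate over facts already extracted from the graph. The rule names and the three outcomes are illustrative, not defined by UNTP.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]  # True when the rule is satisfied
    hard: bool                     # hard: required by regulation/law; soft: preference


def evaluate(rules: List[Rule], facts: dict) -> Dict[str, object]:
    """A failed hard rule rejects the claim; failed soft rules only add warnings."""
    hard_failures = [r.name for r in rules if r.hard and not r.check(facts)]
    soft_failures = [r.name for r in rules if not r.hard and not r.check(facts)]
    if hard_failures:
        decision = "reject"
    elif soft_failures:
        decision = "accept_with_warnings"
    else:
        decision = "accept"
    return {"decision": decision, "hard_failures": hard_failures, "warnings": soft_failures}


RULES = [
    Rule("conformity credential verified", lambda f: f.get("conformity_verified", False), hard=True),
    Rule("origin traced to farm", lambda f: f.get("farm_traced", False), hard=False),
]
print(evaluate(RULES, {"conformity_verified": True, "farm_traced": False}))
```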

Basically my heuristic is to explore the graph until your needs are satisfied, or the graph is exhausted, then make a decision and/or ask for more information.
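That heuristic maps naturally onto a bounded breadth-first walk that stops as soon as a satisfaction test passes. A sketch, assuming a hypothetical `neighbours` function that resolves a node's links (returning nothing for dead ends such as a PDF) and a caller-supplied `is_satisfied` test; `max_nodes` is an arbitrary budget:

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def explore(start: str,
            neighbours: Callable[[str], Iterable[str]],
            is_satisfied: Callable[[List[str]], bool],
            max_nodes: int = 100) -> Tuple[str, List[str]]:
    """Breadth-first walk of a credential graph. Returns ('satisfied', nodes) as soon
    as the evidence collected so far is enough, or ('exhausted', nodes) when no more
    links can be followed within the budget."""
    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        if is_satisfied(visited):
            return "satisfied", visited
        node = queue.popleft()
        for nxt in neighbours(node):
            if nxt not in seen and len(seen) < max_nodes:
                seen.add(nxt)
                visited.append(nxt)
                queue.append(nxt)
    return ("satisfied" if is_satisfied(visited) else "exhausted"), visited


# Hypothetical link structure; "report.pdf" is a dead end with no machine-readable links.
LINKS = {"dpp": ["dcc"], "dcc": ["accreditation", "report.pdf"]}


def follow(node: str) -> List[str]:
    return LINKS.get(node, [])


print(explore("dpp", follow, lambda nodes: "accreditation" in nodes))
# ('satisfied', ['dpp', 'dcc', 'accreditation', 'report.pdf'])
print(explore("dpp", follow, lambda nodes: "farm-certificate" in nodes)[0])
# exhausted
```

An "exhausted" result is the point at which the verifier would ask for more information or decide on partial evidence.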

@onthebreeze
Contributor Author

For sure, we cannot assume that an entire T-shirt-to-cotton-farm credential graph exists on day 1, if ever. Our architecture must assume that there are only little snippets of graphs, and that quite often a link to a conformity credential (for example) will take you to a PDF, not a VC.

I think it's enough to start with just a few minimal use cases where there are only 2 or 3 nodes in a graph. For example:

A product passport links to a conformity credential, which links to an accreditation credential. The graph verification should confirm that:

  • the product SKU or other identifier in the passport is the same as the one in the conformity credential (i.e. the certificate is about the right product)
  • the attestation scope of the conformity credential is the same as the scope of the linked accreditation (i.e. the certifier is authorised to issue the attestation)
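The two checks for this three-node chain can be sketched as follows, assuming the three credentials have already been retrieved and verified. Field names (`productId`, `assessedProductId`, `scope`, `accreditedScopes`, `subject`, `issuer`) are hypothetical. The sketch treats an accreditation as possibly covering several scopes, a slight generalisation of "the same as", and the third check (the accreditation is about the certifier who issued the conformity credential) is an extra assumption rather than one of the two listed.

```python
from typing import List


def verify_chain(passport: dict, conformity: dict, accreditation: dict) -> List[str]:
    """Business-rule failures for passport -> conformity credential -> accreditation."""
    errors = []
    # 1. The certificate must be about the product the passport describes.
    if passport["productId"] != conformity["assessedProductId"]:
        errors.append("conformity credential is about a different product")
    # 2. The attested scope must be within the certifier's accredited scopes.
    if conformity["scope"] not in accreditation["accreditedScopes"]:
        errors.append("certifier is not accredited for this scope")
    # 3. (Assumed extra check) the accreditation must name the conformity issuer.
    if conformity["issuer"] != accreditation["subject"]:
        errors.append("accreditation belongs to a different certifier")
    return errors


passport = {"productId": "gtin:12345678910"}
conformity = {"assessedProductId": "gtin:12345678910",
              "scope": "motorcycle helmet safety", "issuer": "did:example:certifier"}
accreditation = {"subject": "did:example:certifier", "accreditedScopes": ["animal health"]}
print(verify_chain(passport, conformity, accreditation))
# ['certifier is not accredited for this scope']
```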

I'd suggest that the way forward is to identify several more use cases, create several realistic sample graphs, write a validator (SHACL?) for each - and see if any useful patterns emerge that we can document as best practice / protocols in UNTP

@JohnOnGH
Contributor

JohnOnGH commented Feb 5, 2024

Agreed, I was hoping/expecting that we were being pragmatic! The aim is that the linked graph will provide benefit, even if the graph is incomplete.

@nissimsan
Contributor

Related, here's a link to the demo I presented a couple of meetings back:
https://trace.dpp.ni5.io/

@nissimsan
Contributor

nissimsan commented Apr 18, 2024

To close this, we need a PR to https://uncefact.github.io/spec-untp/docs/specification/TrustGraphs with a trust graph. Must include:

  • Identity linked to a claim
  • Conformity linked to a claim
  • Accreditation Authority

@nissimsan
Contributor

https://medium.com/transmute-techtalk/the-united-nations-trust-graph-d65af7b0b678

@nissimsan
Contributor

@zachzeus :

  • Same identity across credentials
  • Conformity credential linked to claim. Conformity credential is talking about the same thing you are linking
  • Certifier is accredited by a trusted 3rd party

@onthebreeze
Contributor Author

We can probably use TSM (Towards Sustainable Mining) for this, because Nancy in BC is already working with MAC (Mining Association of Canada) to do exactly that. So we could make our three example patterns concrete using TSM as a realistic example.

@zachzeus
Contributor

This will really be explored in our test architecture, and will be a pull request added once we add the refactored UNTP site. We are also working on this kind of testing in a reference implementation. The key outcome for UNTP is that there are some simple, well-described tests for UNTP, and implementers will do a lot more.

My next step is to describe the tests.
Done looks like:

  • List of test cases that define actor(s), inputs and outputs and test data (positive and negative).
  • Using the UNCRM scenario as an example
  • Scenario map including prerequisites.
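A sketch of what a table-driven list of test cases could look like, with each case naming the actor, the inputs and the expected outcome, and with one positive and one negative case. The validator and data are hypothetical, reusing the wrong-product barcode example from the start of this issue:

```python
from typing import Callable, List, NamedTuple


class Case(NamedTuple):
    name: str
    actor: str          # who performs the verification
    inputs: dict        # data handed to the validator
    expect_valid: bool  # True for positive test data, False for negative


def same_product(inputs: dict) -> bool:
    """Hypothetical validator: the passport must describe the scanned GTIN."""
    return inputs["dpp"]["gtin"] == inputs["scanned_gtin"]


CASES = [
    Case("passport matches scanned barcode", "consumer app",
         {"scanned_gtin": "12345678910", "dpp": {"gtin": "12345678910"}}, True),
    Case("passport is about the wrong product", "consumer app",
         {"scanned_gtin": "12345678910", "dpp": {"gtin": "10987654321"}}, False),
]


def run(cases: List[Case], validator: Callable[[dict], bool]) -> List[str]:
    """Names of cases where the validator's verdict differs from the expectation."""
    return [c.name for c in cases if validator(c.inputs) != c.expect_valid]


print(run(CASES, same_product))  # []
```

Keeping negative cases alongside positive ones means a validator that accepts everything fails the suite instead of passing silently.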

@onthebreeze self-assigned this May 3, 2024
@philarcher
Contributor

I think SHACL might be a useful component but, for the reasons others have said, it's not a full validation tool. What we can say is that if the graph matches a SHACL pattern then it might be OK. It's an early-stage test before you get into what might be computationally-heavy inspecting of individual claims and the human assessment thereof.
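That ordering (cheap structural match first, expensive claim checks only if it passes, then human review of failures) can be sketched as a small pipeline. The required-field set is a hypothetical stand-in for a SHACL shape, and `expensive_check` stands for whatever per-claim verification an implementer runs:

```python
from typing import Callable, Dict, List

REQUIRED_FIELDS = {"id", "issuer", "credentialSubject"}  # stand-in for a SHACL shape


def shape_ok(vc: dict) -> bool:
    """Cheap structural pre-filter: passing means 'might be OK', nothing more."""
    return REQUIRED_FIELDS.issubset(vc)


def verify(vcs: List[dict], expensive_check: Callable[[dict], bool]) -> Dict[str, object]:
    """Run the shape check on every credential first; only if all pass do the
    expensive per-claim checks run, and any failures are routed to human review."""
    malformed = [vc.get("id", "<no id>") for vc in vcs if not shape_ok(vc)]
    if malformed:
        return {"status": "rejected", "malformed": malformed, "needs_review": []}
    failures = [vc["id"] for vc in vcs if not expensive_check(vc)]
    return {"status": "needs_review" if failures else "passed",
            "malformed": [], "needs_review": failures}


calls = []


def slow_check(vc: dict) -> bool:
    calls.append(vc["id"])  # record how often the expensive step actually runs
    return True


print(verify([{"id": "urn:vc:1", "issuer": "did:example:a"}], slow_check), calls)
# {'status': 'rejected', 'malformed': ['urn:vc:1'], 'needs_review': []} []
```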

Should it be helpful in this regard, the RDF Canonicalization spec is about to become a W3C Rec (WG co-chair's insider knowledge ;-) )

@zachzeus
Contributor

We've been working on what trust graph testing for UNTP looks like, and this becomes a pattern that implementers can extend. The UNTP validation will be based on how the links between the core UNTP schemas are validated. The components that we will demonstrate links for are:

  1. DPP
  2. DCC
  3. DTE
  4. DIA


7 participants