Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new query language (Approach 1) #220

Closed
wants to merge 15 commits into from

Conversation

danielfett
Copy link
Contributor

@danielfett danielfett commented Jul 25, 2024

Fixes #178

Todo:

  • Determine which "extensions" we want to specify ("intent_to_retain" etc.)
  • Determine which format-specific parameters we need
  • Define interaction with existing query lang (see co-existence of multiple query languages #211)
  • Define new format for the VP Token

Copy link

@decentralgabe decentralgabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not clear to me that this is an improvement over presentation exchange.

are presentation submissions from PE still being used? just a different presentation definition format defined here?

I would recommend not merging this as is. My recommendation is to create a new dependent spec that fully defines a query language.

]
},
"credentials": {
"pid": {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would someone specify claims from different formats?
if that is allowed, we should have an example.
if not allowed, then it seems the format property can be extracted to a higher level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, an example should be added. The other formats would just be separate credential request objects.

examples/query_lang/multi_credentials.json Show resolved Hide resolved
@@ -542,6 +545,266 @@ The Wallet then validates the request as specified in OAuth 2.0 [@RFC6749].

If the Verifier responds with any HTTP error response, the Wallet MUST terminate the process.

# Verifiable Presentation Query Language {#vp_query}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see this as a separate specification. It can be fully tested outside of OID4VP flows.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think the intention of this new query language has been to be protocol independent, so not sure what would be the benefit of "fully testing outside of OID4VP flows" unless I am misinterpreting it..

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is that a software implementation of a query language is distinct from an implementation for OID4VP. It could have its own conformance suite and open source tooling, and potentially, utility outside of this specification.

If those broader ambitions are shared amongst the group then breaking it out could be the right path.

optionality rule. The entries are listed in order of preference, with the
first entry being the most preferred.

More than two nested levels of optionality rules MUST NOT be used by a Verifier

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if credentials have >2 layers of nesting what's the solution?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for any depth of nesting; the optionality rules are just capped to enable easier processing, but this doesn't reduce expressiveness of the language.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "two nested levels of optionality rules" mean that there is a total of 2 or 3 instances of "required" in a single "claim" element?

openid-4-verifiable-presentations-1_0.md Outdated Show resolved Hide resolved

### JSON-based Claims Structures {#json_based_claims}

For credentials containing JSON-based claims structures, the following
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should clearly define what credential formats use this section rather than saying credentials using JSON-based claims structures.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer using path for mdocs too to solve this weird deviation between the formats for this top-level parameter layer

Comment on lines 760 to 772
A purpose description array is an object that describes the purpose for which
the Verifier is requesting the Verifiable Credential. It can contain multiple
objects, each describing the purpose in a specific language. Each object
contains the following properties:

`lang:`
: REQUIRED. A string that specifies the language of the purpose
description. The value of this property MUST be a valid language tag as defined
in [@!BCP47].

`text`:
: REQUIRED. A string that contains the purpose description in the
language specified by the `lang` property.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, free text is not a good idea for a lot of reasons, e.g., disambiguation between different translations of the the same reason and actually providing different reasons -> prone to errors; privacy; i18n/l18n which is hard to scale with this approach. A more appropriate approach is to define reason codes. There was a lot of work in other standards orgs on that, e.g., Kantara.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with the comment on not having free text because of the UI/UX and privacy concerns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please propose a change introducing reason codes so that we can discuss on the calls?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to provide reason codes that can help users guide in how to respond in a meaningful way, we need to find discuss the proper places where to define such reason codes. These reason codes are likely to be use case and/or document type specific. Therefore one approach could be to define reason codes in such a way that they can be scoped/namespaced according to the entity defining them.
Separate to that we should also decide whether we can define or reference a set of such codes that could apply more generically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit that I lack the imagination how a set of codes could ever be complete. "Complete" here meaning in particular that I can use them to refer to all possible reasons for obtaining data through a wallet. Just as a few examples:

  • "Required for account opening according to [law XYZ]" — we cannot enumerate all possible laws; also many laws and their applicability will be dependent on the region.
  • "Age verification according to [regulatory body XYZ]"
  • "Required only for obtaining [product XYZ]"
  • "Alternative when not in possession of [credential XYZ]"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that it's difficult / unlikely to have reasons codes that are both very specific and can address every possible situation. However, allowing free text is not a solution to achieve that as it introduces privacy and security problems that now need to be addressed. it's a lot more difficult for the wallet to meaningfully use a free form text field to help guide the user compared to pre-defined reason codes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experience, providing a meaningful purpose to a user requires customization of the respective text to the particular RP and process, even the specific transaction in some cases. Yes, free text introduces security risks, but those are not unsolvable, developers cope with them every day when building applications with dynamic input. The multi language aspect is a true challenge, it would most likely require the RP sending the purpose string in multiple languages.
Pre-defined reasons code on the other hand posse their own challenges:

  1. There needs to be registries for those codes. Which parties operate these registries?
  2. What does a RP need to do in order to be able to send the first presentation request if this RP wants to implement a use case no one has implemented/standardized before? I think requiring pre-defined reason codes would create a huge administrative burden to RPs and delay deployment of new integrations in an inacceptable manner.
  3. How are those code distributed along with the respective text? What is the update policy?
  4. In the simplest case, there is no link possible to an actual transaction performed by a RP.

: OPTIONAL. A purpose description array as defined in
(#purpose_description).

`claims`:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether it is a good idea to have claims being a polymorphic object with a format-dependent object type, or if it would be better to have everything nested inside a format identifier object, e.g., sd-jwt-vc: { claims ... }?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not put it into the format-specific part, as claims applies to either all JSON-based structures (which can be multiple formats) or all ISO-based structures (mdoc). We could rename to claims and claims_iso maybe.

openid-4-verifiable-presentations-1_0.md Show resolved Hide resolved
openid-4-verifiable-presentations-1_0.md Show resolved Hide resolved
openid-4-verifiable-presentations-1_0.md Outdated Show resolved Hide resolved
openid-4-verifiable-presentations-1_0.md Outdated Show resolved Hide resolved
openid-4-verifiable-presentations-1_0.md Show resolved Hide resolved
@@ -0,0 +1,43 @@
{
"rules": {
"required": 2,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed on editors' call: add mapping to "and" and "or"


## Credential Rules {#credential_rules}

Just as for claims, the Verifier can define rules that specify which
Copy link

@sloops77 sloops77 Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using the PE submission_requirements feature. I see the intent here is to replace submission_requirements with rules. The features that we in HR use cases that are missing from rules are

  1. ability to require all credentials of a particular credential type (eg. all transcripts) vs one of a particular type (eg mDL). From my understanding both would expressed as { "required": 1, "from": [{"ref": "transcript"}]} and { "required": 1, "from": [{"ref": "mDL"}]}
  2. in HR use cases the user has more flexibility in what to share. therefore the ability to also require at least one is useful {"min": 1, "from": [{"ref": "email"}]}.
  3. explaining the purpose of a rule (e.g. the RP requires a mDL or national id card to get the applicants name)
  4. should there be a maximum depth to rules?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ability to require all credentials of a particular credential type (eg. all transcripts) vs one of a particular type (eg mDL). From my understanding both would expressed as { "required": 1, "from": [{"ref": "transcript"}]} and { "required": 1, "from": [{"ref": "mDL"}]}
in HR use cases the user has more flexibility in what to share. therefore the ability to also require at least one is useful {"min": 1, "from": [{"ref": "email"}]}.

As per a simiilar suggestion above required should be all, optional, one, at_least_one with the default either being all or optional. An alternative if a number is required for certain use cases would be to replace one with count which would then have an additional property with the expected count

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The at_least_one case is not currently captured, but we should consider introducing it and therefore moving back to keywords instead of numbers. Do you have a concrete use case in mind for at_least_one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a purpose for the rule be exposed to the user?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielfett the current use case we have is in the application for a job. In the recruitment use case the holder can share whatever information they feel is relevant. For example, a person may have had studied and achieved a number of diplomas, but one or more maybe irrelevant and/or perhaps detrimental to them being successful in the application. The recruiter therefore will request at least one degree but leave it up to the holder to select the relevant one or two.

Again the recruitment use case means that certain types of credentials are needed for a purpose. We have a UI in the reference wallet in the HR use case that takes the user through a wizard-like process to meet each rule requirement. A purpose for that rule is required for this UX to make sense

A valid VP Query is defined as a JSON-encoded object with the following
properties:

`credentials`:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mentioned that the key of the credential object is a string. Since these strings are not meant for human consumption, it's cleaner and easier to use integers for these since no unintended meaning can de derived from these strings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it is pretty common to use strings even when they are not meant for human consumption. it has also been a convention throughout openid4vc specs: credential_configuration_id and credential_identifier in VCI just being some of the examples. I do not see why deviate from that here and would prefer to keep it a string.

Appendix A of [@!OpenID.VCI]. The value of this property MUST be one of the
supported Credential formats as defined in the Wallet's metadata.

Format-specific properties (using the format value as a key):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This states that the format-specific properties use the format value as key, but in the #format_specific_properties section, keys are defined that don't have the format value as a key.

- Claim query: An object describing a single claim that is to be returned, as
specified in (#claim_query).
- An optionality rule: An object containing the following properties:
- `required` (REQUIRED): defines a number of claims that must be returned from
Copy link

@martijnharing martijnharing Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there are any use cases where required is something else than either "1" or "all". Allowing other options adds complexity because it means more testing and more edge cases.
Propose to change the structure to only allow 1 or all.

- An optionality rule: An object containing the following properties:
- `required` (REQUIRED): defines a number of claims that must be returned from
the `from` set of claims, as specified next.
- `from` (REQUIRED): An array; each entry is either a claim query or another

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content of the 'from' element is polymorphic, because it can be either a claim query or another optionality rule. That in itself can make implementations a bit more error-prone, since they need to check the contents of this field and check it against the possible fields to determine how it needs to process it. But in this case the claim query itself is also format specific, leading to relatively complex code.
One potential solution is to make the content of the structure explicit instead of polymorphic e.g. by having separate elements for claim query or optionality rules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I fully understand the proposal - could you please give an example?

Copy link

@martijnharing martijnharing Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hereby some examples on what it could look like. Option 1 is the current proposal, option 2 and 3 are alternatives.

; option (1) What we have in the current proposal

claims = [+claim]   ; you can ask for multiple "claim", each "claim" MUST be satisfied

claim = claim_query / optionality_rule  ; claim_query OR optionality_rule
optionality_rule = {
    "required": uint,   ; number of claims to be returned from the 'from' set
    "from": [+claim]       ; recursively contains the claim structure
}

claim_query = claim_query_json / claim_query_mdoc   ; claim_query_json OR claim_query_mdoc

claim_query_json = {
    "path": tstr
}

claim_query_mdoc = {
    "namespace": tstr,
    "claim_name": tstr

}


; option (2) For each claim, define the main element, then define alternates for that claim

claims = [+claim]   ; you can ask for multiple "claim", each "claim" MUST be satisfied

claim = {
    "claim_query" = claim_query,            ; claim_query as defined in (1)
    "alternate_claims" = [+ claim_query]    ; Alternate claims, in order of preference
}


; option (3) For each claim, define an array 

claims = [+claim]       ; you can ask for multiple "claim", each "claim" MUST be satisfied
claim = [+claim_query]  ; to satisfy the "claim" array, one element from the array (e.g. one claim_query) must be returned

Note, what is the best solution also depends on the outcome of the other discussions. Specifically the ones regarding intent_to_retain, if_present and whether to have the claim_query be polymorphic or to have different versions of the "claims" element, e.g. claims_json and claims_mdoc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed yesterday, hereby some examples for these structures. Note that other solutions besides option 2 or option 3 may exist that are more appropriate, especially when we consider the other comments on this PR.

Example 3: given_name, age_over_18 OR birth_date OR age_birth_year

option(1)

claims = [
    { "path": ["given_name"] },
    {
        "required": 1,
        "from": [
            { "path": ["age_over_18"] },
            { "path": ["birth_date"] },
            { "path": ["age_birth_year"] }
        ] 
    }
]


option(2)

claims = [ 
    {
        "claim_query": { "path": ["given_name"] }   
    },
    {
        "claim_query": { "path": ["age_over_18"] },
        "alternate_claims": [
            { "path": ["birth_date"] },
            { "path": ["age_birth_year"] }
        ]  
    },  
]

option(3)

claims = [ 
    [ { "path": ["given_name"] } ]
    [             
        { "path": ["age_over_18"] },
        { "path": ["birth_date"] },
        { "path": ["age_birth_year"] }
    ]
]



Example 4: given_name, zip_code OR (city AND state)

option(1)

claims = [
    { "path": ["given_name"] },
    {
        "required": 2,
        "from": [
            { "path": ["zip_code"] },
            {
                "required": 1,
                "from": [
                    { "path": ["city"] },
                    { "path": ["state"] },
                ]
            }
        ] 
    }
]

option(2)

claims = [ 
    {
        "claim_query": { "path": ["given_name"] }   
    },
    {
        "claim_query": { "path": ["zip_code"] },
        "alternate_claims": [
            { "path": ["city"] },
        ]  
    },  
    {
        "claim_query": { "path": ["zip_code"] },
        "alternate_claims": [
            { "path": ["state"] },
        ]  
    }
]

option(3)

claims = [ 
    [ { "path": ["given_name"] } ]
    [             
        { "path": ["zip_code"] },
        { "path": ["city"] },
    ],
    [             
        { "path": ["zip_code"] },
        { "path": ["state"] },
    ]
]

Copy link
Contributor

@deshmukhrajvardhan deshmukhrajvardhan Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does , in the above mean AND/required? I think the description text should be more specific and informative.
Also, does the Example 4 mean this given_name AND (zip_code OR (city AND state))
seems like in option 3 zip_code OR city and zip_code OR state could result in zip_code and zip_code. Resulting in not returning city or state.

Can we redefine option 3 by swapping the meaning of things in the 1st and 2nd level [ ]?
E.g. given_name AND (zip_code OR (city AND state))

redefining option 3
claims = [ 
    [ 
        { "path": ["given_name"] },
       { "path": ["zip_code"] },
    ]
    [             
       { "path": ["given_name"] },
       { "path": ["state"] },
       { "path": ["city"] },
    ],
]

This would make it simpler as entries in the root level [ ] would mean OR and the 2nd level will mean MUST.
In the above it translates to (given_name and zipcode) OR (given_name and zip_code and city)

Similarly for Example 3: given_name and (age_over_18 OR birth_date OR age_birth_year)

redefining option 3
claims = [
  [ 
        { "path": ["given_name"] },
       { "path": ["age_over_18"] },
  ],
 [ 
        { "path": ["given_name"] },
       { "path": ["birth_date"] },
  ],
  [ 
        { "path": ["given_name"] },
       { "path": ["age_birth_year"] },
  ],

This also corresponds to all the potential credentials User can consent to send. So it won't result in cases where zip_code and zip_code are sent.
I recognize that this could cause the query to be longer than other methods in cases where there are a large number of OR's.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more clear to use a notation (req) instead of and
as in boolean "zip_code" and "zip_code" translates to just 1 "zip_code"

https://grace.bluegrass.kctcs.edu/~kdunn0001/files/Simplification/4_Simplification_print.html

Where as in (req)zip_code (req)zip_code it is sent twice?

optionality rule. The entries are listed in order of preference, with the
first entry being the most preferred.

More than two nested levels of optionality rules MUST NOT be used by a Verifier

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are limiting these two two levels, it can be easier to not have the structure reference itself, but define the different levels explicitly. While this makes the definition slightly bigger, it results in a lot less edge cases that we need to consider.


### Format `mso_mdoc` {#format_mso_mdoc}

TBD
Copy link

@martijnharing martijnharing Jul 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things that need to be added.

  • IssuerIdentifiers (using the AKI of one of the certificates in the MSO header)
  • Conditional elements
  • Intent to retain

Which of these elements should go in the format specific section and which of these in the general section? Once agreed on that, I can provide exact definitions of these elements.

Comment on lines 760 to 772
A purpose description array is an object that describes the purpose for which
the Verifier is requesting the Verifiable Credential. It can contain multiple
objects, each describing the purpose in a specific language. Each object
contains the following properties:

`lang:`
: REQUIRED. A string that specifies the language of the purpose
description. The value of this property MUST be a valid language tag as defined
in [@!BCP47].

`text`:
: REQUIRED. A string that contains the purpose description in the
language specified by the `lang` property.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with the comment on not having free text because of the UI/UX and privacy concerns.

in the top-level property `rules`. The values is a credential rule object,
defined as having the following properties:

- `required`: An integer specifying the number of credentials that MUST be

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments made on the rules for data elements also apply to the rules for documents. Likely the same solution can apply.

- a credential query reference, i.e., an object with a single key `ref` that
references a credential query defined in the `credentials` object.

Within `from`, the credentials are listed in order of preference, with the first

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While for data elements having the elements listed in order of preference is a the most common scenario, for documents this is not the case in most scenarios. While there are situations where a document will be requested in order of preference, there are more situations where there is no such order of preference. Forcing this to be the case is not ideal.
I would propose to remove the statement about this implying an order of preference. While we could add structures that allow to indicate whether there is an order or not, the extra complexity that comes from that is likely not worth the benefits in use cases.

credentials MAY be returned by the Wallet, as defined in (#credential_rules). If
omitted, all credentials listed in `credentials` are requested for presentation.

## Credential Query {#credential_query}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The credential query structure is defined as a polymorphic structure that contains both elements that always have the same content (purpose), elements that are partly format specific (claims) and elements that are fully format specific (everything else).
Since there is only a single element that is not currently format specific (purpose), we should consider making the request fully profile specific to simplify implementations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming by profile specific, you meant format specific? I do not understand how making everything format specific would simplify things... I think the feedback we have been getting is actually in the opposite direction of removing "partly format specific" part and use path for mdocs too, instead of namespaces and data_elements claims. the rest of the structure of defining the framework in a format agnostic way, while allowing format specific content for certain claims (like for the algorithms supported) makes sense to me. path vs namespace/data_elements feels pretty unnatural.


`intend_to_retain`: TBD

`if_present`: TBD
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the claim that is intended to solve the following requirement?

  1. When Verifier wants to express that it wants to receive a credential only when a specific requirement is met (e.g. read_id claim is present)

@bc-pi bc-pi mentioned this pull request Aug 7, 2024
5 tasks

TBD

### Format `jwt_vp*`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
### Format `jwt_vp*`
### Format `jwt_vp`

Comment on lines 762 to 765
`lang:`
: REQUIRED. A string that specifies the language of the purpose
description. The value of this property MUST be a valid language tag as defined
in [@!BCP47].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using the identifier locale as is done in OpenID Connect. The description there is

End-User's locale, represented as a BCP47 [RFC5646] language tag. This is typically an ISO 639 Alpha-2 [ISO639] language code in lowercase and an ISO 3166-1 Alpha-2 [ISO3166‑1] country code in uppercase, separated by a dash. For example, en-US or fr-CA. As a compatibility note, some implementations have used an underscore as the separator rather than a dash, for example, en_US

@danielfett
Copy link
Contributor Author

Please review this new approach: #260

@danielfett danielfett changed the title Introduce new query language Introduce new query language (Approach 1) Sep 18, 2024
@danielfett
Copy link
Contributor Author

I'll close this PR as it seems that there is consensus that #266 is the better approach.

@danielfett danielfett closed this Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal for new query language