
enabling format-assertion via adding vocabulary to a new meta-schema (dialect?) #1355


Open
bernhardreiter opened this issue May 2, 2025 · 7 comments


@bernhardreiter

What is the supported way to check a JSON file against a (local) schema that wants to enforce format assertions?
Is it already supported?

My example is what oasis-tcs/csaf#962 is trying to do.

https://github.com/oasis-tcs/csaf/blob/78f4b61ba2ba05d375a452ef7938abde16949457/csaf_2.1/json_schema/csaf_json_schema.json#L2

sets a custom dialect:

  "$schema": "https://docs.oasis-open.org/csaf/csaf/v2.1/meta_json_schema.json",

which locally is
https://github.com/oasis-tcs/csaf/blob/78f4b61ba2ba05d375a452ef7938abde16949457/csaf_2.1/json_schema/meta_json_schema.json
and just aims to add the vocabulary needed to treat formats as assertions.

However, as can be seen in oasis-tcs/csaf#962 (comment) and above, the code needs to explicitly set jsonschema.DRAFT202012 just to add the resource:

resource = Resource(contents=schema, specification=jsonschema.DRAFT202012)

I haven't found a public API for registering the new dialect in referencing.jsonschema._SPECIFICATIONS. (As we do not change the referencing behaviour, it may or may not be okay to use specification=jsonschema.DRAFT202012; it just doesn't feel right, as the meta-schema is available.)

The other concern is that adding the vocabularies does not seem to change assertion checking in the implementation; only passing format_checker=Draft202012Validator.FORMAT_CHECKER decides this. According to my (limited) understanding, adding the format-assertion vocabulary should enable the checks by default. Okay, Draft202012Validator is for Draft 2020-12, where format assertion is off by default but can be enabled.

If it already works, then an example of how to create a new meta-schema (dialect) that has format-assertion enabled, as it should be done in accordance with Draft 2020-12, would be very helpful.

@tschmidtb51

@Julian I prepared a repo as small example for this issue.

@tschmidtb51

I'm happy to provide the code to fix the issue. I just want to make sure we are on the same page and share the same vision of what it should look like.

@Julian
Member

Julian commented May 19, 2025

Hi there, there are two potentially unrelated issues here, I think, though maybe both are happening together in your case:

> The other concern is that adding the vocabularies does not seem to change assertion checking in the implementation; only passing format_checker=Draft202012Validator.FORMAT_CHECKER decides this. According to my (limited) understanding, adding the format-assertion vocabulary should enable the checks by default. Okay, Draft202012Validator is for Draft 2020-12, where format assertion is off by default but can be enabled.

We (the library) fail exactly one test out of the ~1300 in the test suite, and it's this one (enabling vocabularies in a metaschema). So it's a known issue, and one that's essentially waiting on a whole rewrite of the Validator object (a rewrite needed to support more recent dialects' statefulness anyhow). For the current Validator protocol I'm slightly reluctant even to look at a PR: besides the trickiness of actually implementing this, the vocabulary system has been declared "unstable", which means it may change yet again in the next version of the specification, so it's not clear we'd get the API right anyhow (the spec has historically made extremely poor backwards-compatibility guarantees). If you take a shot at it, I'll at least look at it if it's small and self-contained, but there's honestly a chance I look at it and say it's not worth adding until the rewrite (which is itself not necessarily imminent, given I'm no longer really sponsored to work on the library full-time).

Pragmatically, I'm not sure even as a schema author that I'd really bother rather than just providing the format checker? But feel free to show what you have in mind anyhow!

> I haven't found a public API for registering the new dialect in referencing.jsonschema._SPECIFICATIONS. (As we do not change the referencing behaviour, it may or may not be okay to use specification=jsonschema.DRAFT202012; it just doesn't feel right, as the meta-schema is available.)

I agree with this, there's an open issue on referencing for part of this which is relevant, and then even once that's done it'll need a piece here in this library I believe as well.

@tschmidtb51

@Julian Thank you for your fast response!

> Pragmatically, I'm not sure even as a schema author that I'd really bother rather than just providing the format checker? But feel free to show what you have in mind anyhow!

When writing an ISO standard (or really any standard), we can provide the reference implementation. We might even provide a validation tool / service. But many people out there will implement their own tools. As format validation is optional, it is easy to miss that it is required, which leads to all sorts of interoperability issues. Therefore, we want to enforce the validation. Does that clarify our special interest?

@tschmidtb51

> I agree with this, there's an open issue on referencing for part of this which is relevant, and then even once that's done it'll need a piece here in this library I believe as well.

So, if we fix that issue (and find a way to include it in this library), that would solve the error regarding the unknown specification (but not do the format validation), correct?

@tschmidtb51

> If you take a shot at it I'll at least look at it if it's small and self contained, but there's honestly a chance I look at it and say it's not worth adding yet until the rewrite

What about this idea: we check whether the metaschema uses the vocabulary to enforce the validation and set the validator... (Not sure if that's possible, just an idea.)

@Julian
Member

Julian commented May 20, 2025

> When writing an ISO standard (or really any standard), we can provide the reference implementation. We might even provide a validation tool / service. But many people out there will implement their own tools. As format validation is optional, it is easy to miss that it is required, which leads to all sorts of interoperability issues. Therefore, we want to enforce the validation. Does that clarify our special interest?

Are you saying you're authoring a new dialect of JSON Schema and one where format validation is required essentially? Or are you writing some other standard and defining its behavior using JSON Schema?

Note that the format keyword itself inherently comes with interoperability issues -- the most notable one is that, even with the format-assertion vocabulary, the dialect of regular expressions (accepted by the "regex" format) is only RECOMMENDED to be ECMA 262; other implementations of JSON Schema are free to use other dialects, and most do -- they use the regex dialect available in the host language.

The email format would also be required to be implemented, but I don't really have any intention of ever doing so in this library, as I don't think validation of emails is a sensical concept in a validation library in any context other than showing a warning in a frontend which a user can ignore. That would essentially mean we could never validate instances against the format-assertion vocabulary "out of the box" -- an end user, or a library on top of this one, could of course register an email validator library with the format keyword, and then it could succeed.

The last point is of course again independent -- meaning first we need to respect vocabularies declared in the metaschema, but then once we do so it would still need to fail by default unless someone registers a compliant email validator.

> would that solve the error regarding the unknown specification (but not do the format validation), correct?

That would let you define the referencing behavior for a new (or extending) specification at least yep.

Just trying to make sure you understand all the pieces here.

As I say, I'd love to make progress on some of these areas, but not much is happening without funding right now; I'm mostly trying to keep the lights on, responding to issues and doing small things.


3 participants