Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support JSON Schema as input for a Type #1159

Draft
wants to merge 54 commits into
base: main
Choose a base branch
from

Conversation

TizzySaurus
Copy link
Contributor

@TizzySaurus TizzySaurus commented Oct 2, 2024

Closes #729.

What

This PR adds @ark/jsonschema, which is a library enabling users to convert a valid JSON Schema schema into an ArkType type.

API

Below is a basic sample-use of @ark/jsonschema:

import { parseJsonSchema } from "@ark/jsonschema"

const type = parseJsonSchema({type: "string"})
// ^? 
// NB: `type` is correctly inferred as `Type<string>`, and can now be used as-per any other ArkType type

Remaining Work Before Merging

  • For some reason the innerParseJsonSchema const is giving a type error Type "string" is not assignable to type "never". It seems my explicit type definition is somehow wrong 🤷

  • For some reason parseJsonSchema({type: "array", items: {type: "string"}}) raises TypeError: this.Schema1Apply is not a function despite working against an earlier ArkType version (probably a bug in ArkType that I've not yet delved into to properly give details).

  • Similar to above, const t = parseJsonSchema({oneOf: [{type: "string"}, {type: "number"}]}) raises TypeError: this.CompositionKeywords1Apply is not a function:
    image

  • Tests are only partially implemented.

    • I wasn't entirely confident on what should/shouldn't be tested, and where things should/shouldn't be tested, so am waiting on feedback for this before finishing the tests.
      • For example, I have tests for {type: "string", "maxLength": 5} and {type: "string", "pattern": "es"}, but not {type: "string", "maxLength": 5, "pattern": "es"}. Is it worth having tests for the various "permutations"? I think this is perhaps overly excessive?
    • All test files also give this linting errors stating it can't find the @ark/jsonschema package:
      image
  • Haven't yet properly filled out ark/jsonschema/CHANGELOG.md.

  • Constraints are correctly inferred, but should ideally be displayed more "prettily" (see below image):

image
  • I'd like to get custom type-errors working again, however the current problem is that calling parseJsonSchema with an invalid JSON Schema always results in a excessively deep and possibly infinite error, making this [seemingly] impossible.
    image

Known Limitations (things I didn't implement from JSON Schema spec)

  • No support for JSON Schema dependencies keyword on objects.
  • No support for JSON Schema if/else/then.
    • Sort of not required, since you can write the same (admittedly more verbosely) by combining any/oneOf/allOf/not.
  • multipleOf on numbers only supports integer divisors, due to ArkType only supporting integer divisors.

ssalbdivad and others added 30 commits June 5, 2024 01:23
@TizzySaurus TizzySaurus changed the title Add JSON Schema to ArkType conversion feat: Add JSON Schema to ArkType conversion Oct 2, 2024
ark/jsonschema/json.ts Outdated Show resolved Hide resolved
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this change got here... I guess it can probably be safely deleted?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you can remove this for sure, I'm not even using changesets anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in f7bd853 👍

Copy link
Member

@ssalbdivad ssalbdivad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really thorough and this will definitely provide a ton of value!

I didn't review the types in a ton of detail because I'm still worried it's a lot of added complexity for very marginal benefit, although I know it's fun to implement 😄

I think the best path forward regardless of whether we eventually decide to add inference is to get a version of this merged without the inference since there would likely need to be a lot of changes to get the same validation/perf/accuracy as the core parser.

Maybe keep the logic around on your branch so we can use it as a reference if we do eventually decide to include JSON schema inference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you can remove this for sure, I'm not even using changesets anymore

)
}
})
return errors.length === 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In each of these cases, instead of keeping track of the errors with a false[], you can just check ctx.hasError() when needed.

Also, by default it's important to try and use expected + actual for error messages so that they can compose in unions. message is only for the full error with contextualized path which generally should just be handled by the default.

It can be a bit tricky to word certain errors so they fit in Must be {expected} (was {actual}), but there's usually a reasonable way. Try to restructure as many of these errors as possible to use those.

if ("properties" in jsonSchema) {
if ("required" in jsonSchema) {
if (jsonSchema.required.length !== new Set(jsonSchema.required).size) {
ctx.reject({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd write this as a standalone function called duplicateElementsOf that iterates over an array and each time it encounters an element it's seen, it adds a DuplicateData like { value: unknown, indices: number[] }.

This could be added to @ark/util in arrays.ts since it would be general, but it would also help to write a much more descriptive error message here

return {
optionalKeys: optionalKeys.map(key => ({
key,
value: innerParseJsonSchema.assert(jsonSchema.properties![key]).internal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, instead of needing to reassert this later, once you validate a property is present, I'd assign it to its own variable like properties right away.

Although in this case it's not clear from the logic in the function why properties would always be defined. If that's something external where we know either required or properties will be defined when we call this function or something, it's best to refactor it in a way where those external assumptions don't exist.

const parseAdditionalProperties = (jsonSchema: JsonSchema.ObjectSchema) => {
if (!("additionalProperties" in jsonSchema)) return

const properties = Object.keys(jsonSchema.properties ?? {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would probably be clearer (and very slightly more performant) as something like:

const properties = jsonSchema.properties ? Object.keys(jsonSchema.properties) : []

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return matchedValidator !== undefined
}) as Type<unknown>
)
// TODO: Theoretically this shouldn't be necessary due to above `ctx.mustBe` in narrow???
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description can be useful for introspectability, but yeah generally you'd either want to use that or a custom error in narrow if you need additional context.

Since it looks like oneOfValidators is already known before the type is created, it's important that stuff like mustBe that would run on every failed validation is just evaluated when the type is created, and shouldn't duplicate the logic of the description.

}

if ("enum" in jsonSchema) {
return rootSchema(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best just use type.enumerated here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this was done before that existed but migrated to it in 03ba515

"Provided JSON Schema cannot have both 'const' and 'enum' keywords."
)
}
return rootSchema({ unit: jsonSchema.const }) as unknown as Type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use type.unit (no reason to drop down into @ark/schema in a case like this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 03ba515


arktypeArraySchema.predicate = predicates

return rootSchema(arktypeArraySchema) as unknown as Type<JsonSchema.Json[]>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you ever are casting a return multiple times like this, best to just cast it to never and annotate the return type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this and other instances of the same thing in 01e296c

...inferJsonSchema<arraySchema["additionalItems"]>[]
]
>
: // JSON Schema spec explicitly says that additionalItems MUST be ignored if items is not an array, and it's NOT an error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have an obligation to support everything that's technically valid JSON schema if it seems like a clear mistake.

I'd rather throw here

Copy link
Contributor Author

@TizzySaurus TizzySaurus Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was a "should" then I'd agree, but because it's "must" I'd prefer to keep it as-is.

If "items" is an array of schemas, validation succeeds if every instance element at a position greater than the size of "items" validates against "additionalItems".

Otherwise, "additionalItems" MUST be ignored, as the "items" schema (possibly the default value of an empty schema) is applied to all elements.

https://json-schema.org/draft-07/draft-handrews-json-schema-validation-01#rfc.section.6.4.2

@ssalbdivad ssalbdivad changed the title feat: Add JSON Schema to ArkType conversion feat: JSON Schema to ArkType conversion Oct 7, 2024
@ssalbdivad ssalbdivad changed the title feat: JSON Schema to ArkType conversion feat: support JSON Schema as input for a Type Oct 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Support JSON-schema as an input format
2 participants