-
Notifications
You must be signed in to change notification settings - Fork 9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(json-schema): support propertyNames for sample generation #9739
base: master
Are you sure you want to change the base?
Conversation
Hi, Thanks for the PR! I have number of questions and suggestions around the PR. 1. FormatRegistryWhy creating FormatRegistry at all? 2. Modifying all generators to support arbitrary indexAre these changes necessary at all? Solving the case for idx inside generators is breaking separation of concerns.
When we have ability to turn on randomnes on on demand, it will help up adding What do you think? |
Thanks for the review and feedback! Let me first respond to the items and wait for your confirmation which direction you prefer with information of my reasoning in mind. Then I will change accordingly.
I wanted to add an options argument to the generator and I noticed I had to repeat it multiple times in this large case statement: https://github.com/swagger-api/swagger-ui/blob/master/src/core/plugins/json-schema-2020-12-samples/fn/types/string.js#L40 . Then I saw that format is the only place where such case statement is used, whereas for media type and encoder it's reusing the API with default values to make it look a lot cleaner: https://github.com/swagger-api/swagger-ui/blob/master/src/core/plugins/json-schema-2020-12-samples/fn/types/string.js#L145 . So I thought by doing this, we make it consistent between format/mediatype/encoder while removing a lot of case statements and imports in string.js. For the scenario of adding new formats, I didn't really see a difference in effort between having to add it to FormatRegistry or having to add a case-statement? So it looked to me more like an accidental inconsistency, but if you say it was designed in this specific way, then it may not make sense to change. Let me know your final verdict if it's better to keep the change or add back the case statement? You are certainly right: this change doesn't strictly relate to the feature at hand.
Thanks for pointing to json-schema-faker. I looked into it, and if I understand correctly, your suggestion would entail 3 parts. 1. Supply options via a separate OptionsAPI instead of via a function argumentI like the idea of an OptionsAPI in general, but the reason I thought it would be more logical to supply options via an argument in this specific case, is that for this feature it's only needed when generating propertyNames, so only for specific function calls. Configuring options through an OptionsAPI sounds like a good idea in general, but felt more appropriate for longer living options. It's still possible to use an OptionsAPI, but then I would store the current option setting as a backup, enforce it to be what I need, and put it back when done. Hope to hear your preference. (As for the separation of concerns, I guess that either way how the options are passed, the generator will need to be aware about them?) 2. Use randomness instead of indexingI did explicitly choose for indexing over full randomness, for these reasons:
This wouldn't rule out that in the future we can also support an option for full randomness, so options could be static, indexed, or random, but that could be a separate PR. Let me know your thoughts. 3. Make the option a function instead of a parameterIn json-schema-faker they don't enable/disable randomness through e.g. a boolean parameter, but rather let you supply a random function. It gives a bit more flexibility. Supposedly if you want no randomness you can supply a function returning a fixed value. I guess that makes sense so I'll look into it. Note that these 3 are independent, so we can mix and match. I could create an option to supply an indexing function and pass it through either an argument or OptionsAPI. Unrelated side questionWhile looking at this, I did notice the way string constraints are apply is not really great. For example if format is e-mail and the generator gives [email protected], and the schema also has maxLength: 10 which is applied after the generator, you would get john.doe@e which is not a valid e-mail. If instead the generator would be aware about the maxLength then it could generate [email protected] which is at least valid. I saw that the schema was supplied when using the API https://github.com/swagger-api/swagger-ui/blob/master/src/core/plugins/json-schema-2020-12-samples/fn/types/string.js#L37 but not in the case statement, and it is never used. Do you happen to know if this choice was made intentionally? I don't think it's a big deal, so not something I want to improve right now. But was wondering about the direction here, which maybe could help put the choices above in perspective. |
OK let's introduce
Yes, it's build for long living options. I've introduced
Yes, but json-schema-faker doesn't assume that the generators could generate constant values. This is how Draft 5 sample generator was used and people expect Draft 2020-12 to work in similar way. That's why I mentioned that when
It's not as simple as that. For example look at https://github.com/json-schema-faker/json-schema-faker/blob/master/src/lib/core/random.mjs#L31 If we supply fixed/constant value we're always get the
I've implemented the current code with that bug in mind. As the sample generator now generates constant/fixed values and there is now way for us to randomly reiterate and generate next random value. In order to fix this we should do the same thing that json-schema-faker does and that is not constraining the generated values of type string that came from I've addressed that in separate PR: #9796 Overall IMHO the idea of solving property names with 1.introduce proper randomness and turn it on when generating property names 2.create specialized propertyNames generator that will use underlying generators producing constant values and somehow modifying those generated constant values with I think option 2. is unnecessarily complex to implement. We would have to know where to add |
Ok will have a look at it again with the above in mind. @char0n One clarification on:
So you say it's build for long living options, but you also say you want to use it exclusively, so I assume that includes for short living options like during propertyName generation? In other words, you agree I would before starting to generate property names, store the current option setting as a backup, enforce it to be what I need for property name generation, and put it back when done. Or are you rather saying that for propertyName generation to work, this option should be enabled for the full schema, and if not enabled then propertyName generation should keep working in the broken way? And if so would you want it default enabled, or is there a UI to enable it? |
Yes. Sorry for contradiction. It's designed to handle long living options, but can also be used to solve ad-hoc problems like these. We want to introduce all of this in small PRs (layers). I can image following steps:
|
Ok started with adding FormatRegistry in #9799 |
Now added support for randomness in #9810 |
Use propertyNames schema to generate property names for samples with additionalProperties
Refs #8913
Resolves #5713
Description
Json Schema 2020-12 has support for supplying a propertyNames schema, which the names of the (additional) properties in the object have to adhere to. However when generating samples this was currently ignored and the the properties would always be named additionalProp1, additionalProp2 etc. With this PR, the property names in the samples will follow the propertyNames schema. Property names are always strings, so this is mostly reusing the existing code of sample generation for string values. Except objects cannot have multiple properties with the same name, so the string generation now supports adding in an index in the places where it was previously producing fixed values.
Motivation and Context
Many schemas describe dictionaries that map variable property names to property values, which is modeled with additionalProperties. As described in #5713 , the generated samples with additionalPropx have issues: 1) they are not good for readability as they don't provide any context; 2) they may actually be violating the propertyNames schema; and 3) the number of additional property samples cannot be controlled. Writing custom examples is not a good solution, because typically these additional properties have full-fledged object schemas as values, so then all the power of the very useful auto-generated samples is lost. This feature generates additional property names according to the propertyNames schema (partial implementation of #8913), and the propertyNames examples can optionally be used to control the names and number of sample properties (resolution for #5713).
How Has This Been Tested?
Added unit tests
Performed manual test
Screenshots (if appropriate):
Checklist
My PR contains...
src/
is unmodified: changes to documentation, CI, metadata, etc.)package.json
)My changes...
Documentation
Automated tests