Definition of non-strict mode #154
Comments
OK, at this point I believe it would be not only beneficial but actually necessary for fixing strict mode properly. Because the code for both modes is shared, it would be easy to break one mode while trying to improve the other. In fact, I now think it wouldn't be such a bad idea to split jsmn into strict and non-strict versions, still selectable with a define. This should make fixing bugs in each mode a lot easier.
I would hesitate to split the code for strict and non-strict mode. At that point, you start to violate the DRY (i.e. don't repeat yourself) principle. After the split, they would certainly have a certain amount of duplicated code, and then, that means making changes in two places whenever updating the common code. Suppose you accidentally make a mistake in one and not the other--now you have an undocumented difference between them that might go unnoticed. Instead, I would lean toward making non-strict into a few simple relaxations of requirements that might be useful, such as allowing flexible primitives and allowing primitives to be keys. (Similar to what @zserge suggested in another comment). After all, this is primarily a JSON parsing library. Parsing things that are not JSON is icing on the cake, but it isn't the main goal here. Non-strict mode should not make maintaining the code too complicated. In fact, those two relaxations are all I would provide in non-strict mode. To be more specific:
Allowing objects and arrays as keys starts adding extra complication (see issue #193, which I created a little while ago), so I lean toward not allowing that at all. It's useful in some cases, sure, but it is not valid JSON, so there is no reason to go out of the way to support it. Non-strict mode also currently allows strings and primitives as the root token, but since that is allowed under RFC 8259, I would say that should simply become part of strict mode.
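To make the two relaxations concrete, here is a minimal sketch of how they could be written as small checks on top of shared code. This is not jsmn's actual implementation: every `demo_` name is hypothetical, the delimiter set is an assumption, and a runtime `strict` flag stands in for the compile-time define so that both modes can be exercised side by side.

```c
#include <stdbool.h>

/* Hypothetical token kinds, for illustration only (not jsmn's jsmntype_t). */
typedef enum { DEMO_STRING, DEMO_PRIMITIVE, DEMO_OBJECT, DEMO_ARRAY } demo_type;

/* Assumed set of "special" characters that can never be part of a primitive. */
static bool demo_is_special(char c) {
  return c == '\0' || c == ' ' || c == '\t' || c == '\r' || c == '\n' ||
         c == ',' || c == ':' || c == '{' || c == '}' ||
         c == '[' || c == ']' || c == '"';
}

/* Relaxation 1: flexible primitives.
 * Strict: a primitive must start like a number, true, false, or null.
 * Non-strict: any non-special character may start a primitive. */
static bool demo_primitive_start_ok(char c, bool strict) {
  if (demo_is_special(c)) {
    return false;
  }
  if (!strict) {
    return true;
  }
  return c == '-' || (c >= '0' && c <= '9') || c == 't' || c == 'f' || c == 'n';
}

/* Relaxation 2: primitives as keys.
 * Strings are keys in both modes; primitives only in non-strict mode;
 * objects and arrays are never keys. */
static bool demo_key_type_ok(demo_type type, bool strict) {
  if (type == DEMO_STRING) {
    return true;
  }
  if (type == DEMO_PRIMITIVE) {
    return !strict;
  }
  return false;
}
```

With this shape, the difference between the modes is confined to two small predicates, and the rest of the parser stays common to both.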
I certainly agree that objects and arrays should not be allowed as keys, and your issue #193 states perfectly why: it leads to too much ambiguity with the tokens we have. Beyond these two relaxations, we also already have the ability to parse multiple JSON objects in a single string. The last rule that I can think of now is being able to have key/value pairs and arrays at the root level. If you have a colon after a token, then the previous token becomes a key and the next token becomes a value. You can make sure you don't have two keys following each other. Commas would become ambiguous at the root level, as I think the target I am shooting for is root-level lists like the following, with whatever white space the user desires.
With that, and the ability to have primitives as keys and made of any non-special character, I think we would match, and document, what we currently have. Do we want to allow key/value pairs in arrays? This would make objects and arrays the same thing.
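The colon rule above (the token before a colon becomes a key, the token after it becomes a value, and two keys may not follow each other) can be sketched as a single pass over the root-level items. This is only an illustration under assumptions: the input is simplified to a string where `t` stands for any token and `:` for a colon, and the `demo_` names are hypothetical rather than anything in jsmn.

```c
#include <stddef.h>

typedef enum { DEMO_PLAIN, DEMO_KEY, DEMO_VALUE } demo_role;

/* Assigns a role to each 't' in `items`.
 * Returns the number of tokens, or -1 if a colon has no token before it,
 * no token after it, or follows a token that is already a key or a value. */
static int demo_assign_roles(const char *items, demo_role *roles, size_t cap) {
  size_t count = 0;
  int expect_value = 0; /* set right after a colon */

  for (const char *p = items; *p != '\0'; p++) {
    if (*p == 't') {
      if (count == cap) {
        return -1; /* not enough room in roles[] */
      }
      roles[count++] = expect_value ? DEMO_VALUE : DEMO_PLAIN;
      expect_value = 0;
    } else if (*p == ':') {
      if (count == 0 || expect_value || roles[count - 1] != DEMO_PLAIN) {
        return -1;
      }
      roles[count - 1] = DEMO_KEY; /* the previous token becomes a key */
      expect_value = 1;            /* the next token becomes a value */
    } else {
      return -1; /* unknown item */
    }
  }
  return expect_value ? -1 : (int)count;
}
```

For the input `t:tt` this marks the first token as a key, the second as its value, and the third as a plain list element, while `t:t:t` and `t::t` are rejected.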
I had this typed already, but we had a power outage last night before I could submit it. Since my previous comment here predates my pull request, allow me to revise my list of deviations. These are the ones I implemented in PR #194:
(3) is the new one. This actually wasn't a feature at all before, but it seemed reasonable since non-strict allows something similar for primitives. This way, applications can implement their own escape sequences or allow tabs, newlines, or other control characters in strings.
I discussed this a lot in #159, but I don't see this as a deviation from RFC 8259. Thus, even if we make single-parsing the default, I see multiple-parsing as totally separate from strict/non-strict mode.
The current version of jsmn allows these sorts of things at the root level, but personally, I think it shouldn't. I don't think there are many use cases for this. If you want key/value pairs, just use an object, and if you want a list of things, just use an array. (Note that I see multiple objects in sequence at the root level differently from arrays: they are separate, disconnected objects. E.g. you have a microcontroller that receives commands formatted as JSON objects. Multiple objects back to back aren't a list of related things, they are simply separate commands.)
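As a rough illustration of the command-stream case, the sketch below splits a buffer into back-to-back root-level values by tracking nesting depth and string state. It is not how jsmn does it and it does no JSON validation (it does not even check that bracket types match). The `demo_` names are hypothetical.

```c
#include <stddef.h>

/* Returns the index one past the end of the first complete {...} or [...]
 * value found at or after `start`, or 0 if none is complete. */
static size_t demo_next_value_end(const char *buf, size_t len, size_t start) {
  int depth = 0;
  int in_string = 0;

  for (size_t i = start; i < len; i++) {
    char c = buf[i];
    if (in_string) {
      if (c == '\\') {
        i++; /* skip the escaped character */
      } else if (c == '"') {
        in_string = 0;
      }
    } else if (c == '"') {
      in_string = 1;
    } else if (c == '{' || c == '[') {
      depth++;
    } else if (c == '}' || c == ']') {
      if (depth == 0) {
        return 0; /* closing bracket without an opener */
      }
      depth--;
      if (depth == 0) {
        return i + 1;
      }
    }
  }
  return 0;
}

/* Counts the separate, disconnected root-level values in the buffer. */
static int demo_count_commands(const char *buf, size_t len) {
  int count = 0;
  size_t pos = 0;
  size_t end;

  while ((end = demo_next_value_end(buf, len, pos)) != 0) {
    count++;
    pos = end;
  }
  return count;
}
```

Each span found this way could then be handed to the parser on its own, which matches the view that consecutive objects are independent commands rather than elements of one list.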
Similar would be allowing unpaired elements in objects. (They wouldn't quite be the same thing unless we modified both.) I think doing that makes a little more sense. I can't quite put my finger on why, but unpaired elements in objects just feel less wrong than key/value pairs in arrays. I have something like this in the
Arguably, keyless values could be more useful than valueless keys, but valueless keys are much simpler to implement (e.g. if the parser sees an array opening after an object opening, with valueless keys, it can reject right away. With keyless values, it can't know whether an array is allowed until it sees whether a colon or comma/object closing follows). So, in the interest of minimizing maintenance headache, I stuck with valueless keys. Since each non-strict feature adds to the maintenance burden, we should also consider whether this is really a useful feature. A similar result could be accomplished with
Is this useful enough to be worth adding as a non-strict feature?
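The implementation argument about valueless keys versus keyless values can be sketched as the decision a parser faces when it sees the first character of an element in key position inside an object. This is a simplified illustration with hypothetical `demo_` names, not code from the pull request.

```c
typedef enum { DEMO_REJECT, DEMO_ACCEPT, DEMO_NEED_LOOKAHEAD } demo_verdict;

/* Valueless keys: every unpaired element is still a key, so a container
 * opening in key position can be rejected immediately. */
static demo_verdict demo_valueless_keys(char first) {
  if (first == '[' || first == '{') {
    return DEMO_REJECT;
  }
  return DEMO_ACCEPT;
}

/* Keyless values: a container in key position is fine only as a value,
 * so the verdict depends on whether a colon or a comma/closing brace
 * follows it, which is not known until the container has been read. */
static demo_verdict demo_keyless_values(char first) {
  if (first == '[' || first == '{') {
    return DEMO_NEED_LOOKAHEAD;
  }
  return DEMO_ACCEPT;
}
```

The deferred verdict in the second function is the extra state the parser would have to carry, and that is the maintenance cost being weighed here.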
I believe it would be really beneficial to create a proper definition of how non-strict mode works. I have looked at the explanation at https://zserge.com/jsmn.html, but even the example given there doesn't really fit the rules stated above it. The idea is in there somewhere, but it's not as obvious as it should be, and that can and does cause problems when looking for bugs in non-strict mode.