Retaining byte string serialization variants #132
Comments
Looking at the implementation a bit more closely, I noticed that when serializing into diagnostic notation, the tags that indicate special handling on the JSON side are conveniently also used to guide display in diagnostics. While it's perfectly practical to keep the handling of those tags in there, a full solution to retaining serialization details would allow moving that code out into a more heuristic annotation step. It could look like this: Binary CBOR doesn't get any diagnostic-format hints at parsing time, and all unannotated byte strings are expressed using the diagnostic encoder's default. But if the tree is passed through a mutating walker in between that fills in hints (some of them by interpreting the tags), then that step would fill the gaps. As a benefit, the user would have the option to either preserve the serialization types for data ingested from diagnostic notation, or to clear them all out to purely apply the encoder's preferences, or to replace the original versions with what the (or, moreover, some) annotator sets.
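To make the walker idea a bit more concrete, here is a rough sketch of what such an annotation pass could look like. Everything in it is made up for illustration: `Item`, `Encoding`, `Policy`, `annotate` and `utf8_or_hex` are hypothetical names, not the crate's actual AST or API, and tag interpretation is left out (tagged items are only traversed).

```rust
/// Hypothetical display hint for a byte string (not a cbor-diag-rs type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Hex,
    Text,
}

/// Hypothetical, heavily reduced data item tree, just enough to walk.
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    /// `hint` is `None` when the item was parsed from binary CBOR.
    Bytes { data: Vec<u8>, hint: Option<Encoding> },
    Array(Vec<Item>),
    Tagged(u64, Box<Item>),
}

/// The three user options mentioned in the comment above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    /// Keep hints that came from diagnostic notation, fill only the gaps.
    FillGaps,
    /// Overwrite every hint with what the annotator chooses.
    Replace,
    /// Drop all hints so the encoder's default applies everywhere.
    Clear,
}

/// Mutating walker: visits every byte string in the tree and updates its hint.
pub fn annotate(item: &mut Item, policy: Policy, choose: &dyn Fn(&[u8]) -> Encoding) {
    match item {
        Item::Bytes { data, hint } => match policy {
            Policy::FillGaps => {
                if hint.is_none() {
                    *hint = Some(choose(data.as_slice()));
                }
            }
            Policy::Replace => *hint = Some(choose(data.as_slice())),
            Policy::Clear => *hint = None,
        },
        Item::Array(items) => {
            for child in items.iter_mut() {
                annotate(child, policy, choose);
            }
        }
        Item::Tagged(_tag, inner) => annotate(inner, policy, choose),
    }
}

/// Example annotator (an assumption, not a recommendation):
/// valid UTF-8 is shown as text, everything else as hex.
pub fn utf8_or_hex(data: &[u8]) -> Encoding {
    if std::str::from_utf8(data).is_ok() {
        Encoding::Text
    } else {
        Encoding::Hex
    }
}
```

A caller would run something like `annotate(&mut tree, Policy::FillGaps, &utf8_or_hex)` between parsing and encoding. Passing the annotator as a plain function reference keeps the application-specific part swappable.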
One aspect of this is that not only do the strings have serialization variants, their diagnostic notation may also be a concatenation of differently encoded chunks. I'm not sure what would be a good level of modelling here. Full round-tripping of arbitrary diagnostic notation strings may or may not be desirable; if it is not (and I wouldn't need it), preservation of diagnostic notation would be best-effort. (So for example, If we went for full roundtripping, options would be to have ByteString contain a single Vec and a parallel
Byte strings have two widespread serialization variants in diagnostic notation: 'text' and h'74657874' (plus the rarer b32, h32 and b64 prefixes, which I personally don't care about, but hey, they're there). It would be nice if this could be preserved, maybe as an extra Option property of ByteString.
RFC 8610 Appendix G (Extended Diagnostic Notation) provides even more options (including internal whitespace and embedded CBOR); they are more complex and not really on my wish list, but it might be good to be aware of them when implementing, so as not to duplicate work if that becomes relevant later.
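As a minimal sketch of the "extra Option property" idea (the type and field names below are hypothetical and do not reflect the current `ByteString` in the crate), the hint could be carried next to the data and consulted by the encoder, falling back to a default when it is absent:

```rust
/// Hypothetical hint describing how a byte string is written in
/// diagnostic notation. Only the two common variants are modelled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ByteStringEncoding {
    /// h'74657874'
    Hex,
    /// 'text'
    Text,
}

/// Hypothetical shape of a byte string node carrying an optional hint.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ByteString {
    pub data: Vec<u8>,
    /// `None` when the value came from binary CBOR, which carries no hint.
    pub encoding: Option<ByteStringEncoding>,
}

impl ByteString {
    /// Render to diagnostic notation, using `default` when no hint is set.
    /// The text form is used only if the bytes can be printed verbatim;
    /// otherwise this sketch falls back to hex rather than escaping.
    pub fn to_diag(&self, default: ByteStringEncoding) -> String {
        match self.encoding.unwrap_or(default) {
            ByteStringEncoding::Text => match std::str::from_utf8(&self.data) {
                Ok(s) if !s.contains(|c: char| c == '\'' || c == '\\' || c.is_control()) => {
                    format!("'{}'", s)
                }
                _ => to_hex(&self.data),
            },
            ByteStringEncoding::Hex => to_hex(&self.data),
        }
    }
}

/// Format bytes as h'..' with lowercase hex digits.
fn to_hex(data: &[u8]) -> String {
    let mut out = String::from("h'");
    for b in data {
        out.push_str(&format!("{:02x}", b));
    }
    out.push('\'');
    out
}
```

With this shape, a value parsed from `'text'` would keep `Some(ByteStringEncoding::Text)` and print the same way again, while the same bytes coming from binary CBOR would have `None` and follow whatever default the encoder is given.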
This would be especially convenient when building a diagnostic notation programmatically.
This would probably share patterns with #117, in that it is a property that is set when coming from DN, but unavailable when coming from CBOR. Filling in those gaps when going from arbitrary CBOR to DN could be done by the user at the AST stage by applying arbitrary heuristics, some of which may be provided by cbor-diag-rs, but that's ultimately application-specific. (For example, a simple universal heuristic would be taking the ratio of printable ASCII characters; a more application-specific choice might be guided by CDDL.)
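For illustration, the printable-ASCII ratio heuristic could be as small as the following sketch. The function names and the idea of passing a threshold are my assumptions, not anything the crate provides; treating an empty string as "not text" is an arbitrary choice.

```rust
/// Fraction of bytes that are printable ASCII (space through '~').
/// Returns 0.0 for empty input.
pub fn printable_ascii_ratio(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    let printable = data
        .iter()
        .filter(|&&b| b == b' ' || b.is_ascii_graphic())
        .count();
    printable as f64 / data.len() as f64
}

/// Hypothetical heuristic: prefer the 'text' form when at least
/// `threshold` (between 0.0 and 1.0) of the bytes are printable ASCII.
pub fn prefer_text(data: &[u8], threshold: f64) -> bool {
    !data.is_empty() && printable_ascii_ratio(data) >= threshold
}
```

A threshold below 1.0 means some bytes would still need escaping in the text form, so an application may well want 1.0 here, or a CDDL-guided decision instead.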