diff --git a/covered-appendix/draft-ietf-cbor-update-8610-grammar.html b/covered-appendix/draft-ietf-cbor-update-8610-grammar.html new file mode 100644 index 0000000..7e898e7 --- /dev/null +++ b/covered-appendix/draft-ietf-cbor-update-8610-grammar.html @@ -0,0 +1,1978 @@ + + +
Internet-Draft | CDDL grammar updates | June 2024
Bormann | Expires 21 December 2024
The Concise Data Definition Language (CDDL), as defined in +RFC 8610 and RFC 9165, +provides an easy and unambiguous way to express structures for +protocol messages and data formats that are represented in CBOR or +JSON.¶
+The present document updates RFC 8610 by addressing errata and making +other small fixes for the ABNF grammar defined for CDDL there.¶
+This note is to be removed before publishing as an RFC.¶
++ The latest revision of this draft can be found at https://cbor-wg.github.io/update-8610-grammar/. + Status information for this document may be found at https://datatracker.ietf.org/doc/draft-ietf-cbor-update-8610-grammar/.¶
++ Discussion of this document takes place on the + CBOR Working Group mailing list (mailto:cbor@ietf.org), + which is archived at https://mailarchive.ietf.org/arch/browse/cbor/. + Subscribe at https://www.ietf.org/mailman/listinfo/cbor/.¶
+Source for this draft and an issue tracker can be found at + https://github.com/cbor-wg/update-8610-grammar.¶
++ This Internet-Draft is submitted in full conformance with the + provisions of BCP 78 and BCP 79.¶
++ Internet-Drafts are working documents of the Internet Engineering Task + Force (IETF). Note that other groups may also distribute working + documents as Internet-Drafts. The list of current Internet-Drafts is + at https://datatracker.ietf.org/drafts/current/.¶
++ Internet-Drafts are draft documents valid for a maximum of six months + and may be updated, replaced, or obsoleted by other documents at any + time. It is inappropriate to use Internet-Drafts as reference + material or to cite them other than as "work in progress."¶
++ This Internet-Draft will expire on 21 December 2024.¶
++ Copyright (c) 2024 IETF Trust and the persons identified as the + document authors. All rights reserved.¶
++ This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents + (https://trustee.ietf.org/license-info) in effect on the date of + publication of this document. Please review these documents + carefully, as they describe your rights and restrictions with + respect to this document. Code Components extracted from this + document must include Revised BSD License text as described in + Section 4.e of the Trust Legal Provisions and are provided without + warranty as described in the Revised BSD License.¶
+The Concise Data Definition Language (CDDL), as defined in +[RFC8610] and [RFC9165], +provides an easy and unambiguous way to express structures for +protocol messages and data formats that are represented in CBOR or +JSON.¶
+The present document updates [RFC8610] by addressing errata and making +other small fixes for the ABNF grammar defined for CDDL there.¶
+ +A number of errata reports have been made around some details of text +string and byte string literal syntax: [Err6527] and [Err6543]. +These are being addressed in this section, updating details of the +ABNF for these literal syntaxes. +Also, [Err6526] needs to be applied (backslashes have been lost during +RFC processing in some text explaining backslash escaping).¶
+These changes are intended to mirror the way existing implementations +have dealt with the errata. They also use the opportunity presented +by the necessary cleanup of the grammar of string literals for a +backward compatible addition to the syntax for hexadecimal escapes. +The latter change is not automatically forward compatible (i.e., CDDL +specifications that make use of this syntax do not necessarily work +with existing implementations until these are updated, which this +specification recommends).¶
+The ABNF used in [RFC8610] for the content of text string literals +is rather permissive:¶
This allows almost any non-C0 character to be escaped by a backslash, but critically misses out on the \uXXXX and \uHHHH\uLLLL forms that JSON allows for specifying characters in hex (which should apply here according to Bullet 6 of Section 3.1 of [RFC8610]). (Note that we import from JSON the unwieldy \uHHHH\uLLLL syntax, which represents Unicode code points beyond U+FFFF by making them look like UTF-16 surrogate pairs; CDDL text strings do not use UTF-16 or surrogates.)¶
Both can be solved by updating the SESC production. We use the opportunity to add a popular form of directly specifying characters in strings using hexadecimal escape sequences of the form \u{hex}, where hex is the hexadecimal representation of the Unicode scalar value. The result is the new set of rules defining SESC in Figure 2:¶
(Notes: In ABNF, strings such as "A", "B" etc. are case-insensitive, as is intended here. We could have written %x62 as %s"b", but didn't, in order to maximize ABNF tool compatibility.)¶
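The three hexadecimal escape forms described above can be illustrated with a minimal Python sketch. It is not part of the grammar update: the names `unescape`, `ESCAPE`, and `SIMPLE` are hypothetical, and the set of single-character escapes is assumed to follow JSON. The sketch decodes \u{hex}, \uXXXX, and the surrogate-pair form \uHHHH\uLLLL into Unicode scalar values and rejects unpaired surrogates.

```python
import re

# Single-character escapes; assumed here to follow JSON (an assumption of this sketch).
SIMPLE = {'"': '"', '/': '/', '\\': '\\',
          'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}

# Alternatives are tried left to right: \u{hex}, surrogate pair, \uXXXX, simple escape.
ESCAPE = re.compile(
    r'\\u\{(?P<scalar>[0-9A-Fa-f]+)\}'
    r'|\\u(?P<hi>[dD][89abAB][0-9A-Fa-f]{2})\\u(?P<lo>[dD][c-fC-F][0-9A-Fa-f]{2})'
    r'|\\u(?P<bmp>[0-9A-Fa-f]{4})'
    r'|\\(?P<simple>.)',
    re.DOTALL)


def _replace(match):
    if match.group('scalar') is not None:
        cp = int(match.group('scalar'), 16)
    elif match.group('hi') is not None:
        hi = int(match.group('hi'), 16)
        lo = int(match.group('lo'), 16)
        # Combine the UTF-16-style pair into one code point beyond U+FFFF.
        cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    elif match.group('bmp') is not None:
        cp = int(match.group('bmp'), 16)
    else:
        ch = match.group('simple')
        if ch not in SIMPLE:
            raise ValueError(f'unsupported escape: \\{ch}')
        return SIMPLE[ch]
    # Only Unicode scalar values are acceptable results.
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f'not a Unicode scalar value: {cp:#x}')
    return chr(cp)


def unescape(content: str) -> str:
    """Decode the escapes in the content of a text string literal."""
    return ESCAPE.sub(_replace, content)
```

For instance, `unescape(r'D\u{6f}mino')` yields `Domino`, and both `\u{1F073}` and `\ud83c\udc73` decode to the same single character U+1F073.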
Now that SESC is more restrictively formulated, this also requires an +update to the BCHAR production used in the ABNF syntax for byte string +literals:¶
With the SESC updated as above, \' is no longer allowed in BCHAR; this now needs to be explicitly included.¶
Updating BCHAR also provides an opportunity to address [Err6278], +which points to an inconsistency in treating U+007F (DEL) between SCHAR and +BCHAR. +As U+007F is not printable, including it in a byte string literal is +as confusing as for a text string literal, and it should therefore be +excluded from BCHAR as it is from SCHAR. +The same reasoning also applies to the C1 control characters, +so we actually exclude the entire range from U+007F to U+009F. +The same reasoning then also applies to text in comments (PCHAR). +For completeness, all these should also explicitly exclude the code +points that have been set aside for UTF-16's surrogates.¶
+(Note that, apart from addressing the inconsistencies, there is no +attempt to further exclude non-printable characters from the ABNF; +doing this properly would draw in complexity from the ongoing +evolution of the Unicode standard that is not needed here.)¶
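The exclusions discussed above can be summarized as a small predicate. This is a hedged sketch only: the function name is hypothetical, the upper bound 0x10FFFD is an assumption carried over from the ranges used in the [RFC8610] ABNF, and the per-production exclusions (delimiters, backslash, and the line breaks that BCHAR handles separately) are not modeled.

```python
def is_unescaped_literal_char(cp: int) -> bool:
    """Return True if code point cp survives the common exclusions
    described for SCHAR, BCHAR, and PCHAR (sketch; see assumptions above)."""
    if cp < 0x20:
        return False          # C0 control characters
    if 0x7F <= cp <= 0x9F:
        return False          # DEL and the C1 control characters
    if 0xD800 <= cp <= 0xDFFF:
        return False          # code points set aside for UTF-16 surrogates
    return cp <= 0x10FFFD     # assumed upper bound, as in the RFC 8610 ranges
```

A production-specific check would additionally remove its own delimiter and the backslash from the accepted set.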
+The above changes also cover [Err6543] and [Err6526]; see +Appendix B for details. +Please also consult Figures 8 and +9 for examples that use the updated string +syntax.¶
+Now move the rest of this section 2.2 to Appendix B.¶
+The ABNF used in [RFC8610] for the content of byte string literals +lumps together byte strings notated as text with byte strings notated +in base16 (hex) or base64 (but see also updated BCHAR production above):¶
+Errata report 6543 proposes to handle the two cases in separate +productions (where, with an updated SESC, BCHAR obviously needs to be +updated as above):¶
+This potentially causes a subtle change, which is hidden in the WS production:¶
+This allows any non-C0 character in a comment, so this fragment +becomes possible:¶
    foo = h'
       43424F52 ; 'CBOR'
       0A ; LF, but don't use CR!
    '
The current text does not say unambiguously whether the three apostrophes need to be escaped with a \ or not, as in:¶
    foo = h'
       43424F52 ; \'CBOR\'
       0A ; LF, but don\'t use CR!
    '
... which would be supported by the existing ABNF in [RFC8610].¶
This document takes the simpler approach of leaving the processing of the content of the byte string literal to a semantic step after processing the syntax of the bytes/BCHAR rules as updated by Figure 2 and Figure 4.¶
The rules in Figure 7 are therefore applied to the result of this processing where bsqual is given as h or b64.¶
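The two-step processing described above can be sketched in Python for the h case. This is an illustration under stated assumptions, not the normative procedure: the function name is hypothetical, only the \' and \\ escapes are undone in the first step, and comments are assumed to start with ";" and run to the end of the line.

```python
import re


def decode_hex_literal_content(content: str) -> bytes:
    """Decode the raw content found between h' and ' (sketch only)."""
    # Step 1 (syntax level): undo the backslash escaping of ' and \.
    unescaped = re.sub(r"\\(['\\])", r"\1", content)
    # Step 2 (semantic level): drop ";" comments up to the end of each line.
    without_comments = re.sub(r";[^\n]*", "", unescaped)
    # Step 3: drop all remaining whitespace and decode the hex digits.
    return bytes.fromhex("".join(without_comments.split()))


# The escaped example from the text, as it would appear between the delimiters.
example = (
    "\n"
    "   43424F52 ; \\'CBOR\\'\n"
    "   0A ; LF, but don\\'t use CR!\n"
)
decoded = decode_hex_literal_content(example)
```

Here `decoded` holds the five bytes of "CBOR" followed by a line feed; the apostrophes in the comments never reach the hex decoder because the comments are removed after unescaping.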
Note that this approach also works well with the use of byte strings in Section 3 of [RFC9165]. It does require some care when copy-pasting into CDDL models from ABNF that contains single quotes (which may also hide as apostrophes in comments); these need to be escaped or possibly replaced by %x27.¶
Finally, our approach lends support to extending bsqual in CDDL similar to the way this is done for CBOR diagnostic notation in [I-D.ietf-cbor-edn-literals]. (Note that the processing of string literals now is quite similar between CDDL and EDN, except that CDDL has ";"-based end-of-line comments, while EDN has two comment syntaxes, in-line "/"-based and end-of-line "#"-based.)¶
The CDDL example in Figure 8 demonstrates various escaping techniques. Obviously in the literals for a and x, there is no need to escape the second character, an o, as \u{6f}; this is just for demonstration. Similarly, as shown in c and z there also is no need to escape the 🁳 or ⌘, but escaping them may be convenient in order to limit the character repertoire of a CDDL file itself to ASCII [STD80].¶
In this example, the rules a to c and x to z all produce strings with byte-wise identical content, where a to c are text strings, and x to z are byte strings. Figure 9 illustrates this by showing the output generated from the start rule in Figure 8, using pretty-printed hexadecimal.¶
The two subsections in this section specify two small changes to the grammar that are intended to enable certain kinds of specifications. These changes are backward compatible, i.e., CDDL files that comply with [RFC8610] continue to match the updated grammar, but not necessarily forward compatible, i.e., CDDL specifications that make use of these changes cannot necessarily be processed by existing [RFC8610] implementations.¶
+[RFC8610] requires a CDDL file to have at least one rule.¶
+This makes sense when the file has to stand alone, as a CDDL data +model needs to have at least one rule to provide an entry point (start +rule).¶
+With CDDL modules [I-D.ietf-cbor-cddl-modules], CDDL files can also include directives, +and these might be the source of all the rules that +ultimately make up the module created by the file. +Any other rule content in the file has to be available for directive +processing, making the requirement for at least one rule cumbersome.¶
+Therefore, we extend the grammar as in Figure 11 +and make the existence of at least one rule a semantic constraint, to +be fulfilled after processing of all directives.¶
+The existing ABNF syntax for expressing tags in CDDL is:¶
This means tag numbers can only be given as literal numbers (uints). Some specifications operate on ranges of tag numbers, e.g., [RFC9277] has a range of tag numbers 1668546817 (0x63740101) to 1668612095 (0x6374FFFF) to tag specific content formats. This cannot currently be expressed in CDDL. Similar considerations apply to simple values (#7.xx).¶
This update extends the syntax to:¶
For #6, the head-number stands for the tag number. For #7, the head-number stands for the simple value if it is in the ranges 0..23 or 32..255 (as per Section 3.3 of RFC 8949 [STD94] the simple values 24..31 are not used). For 24..31, the head-number stands for the "additional information", e.g., #7.25 or #7.<25> is a float16, etc. (All ranges mentioned here are inclusive.)¶
So the above range can be expressed in a CDDL fragment such as:¶
    ct-tag<content> = #6.<ct-tag-number>(content)
    ct-tag-number = 1668546817..1668612095
    ; or use 0x63740101..0x6374FFFF
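As an informal illustration of how a tool might interpret the head-number, the following Python sketch checks membership in the tag number range given above and classifies a head-number used with #7. The constant and function names are hypothetical and not taken from any CDDL implementation; all ranges are inclusive, as in the text.

```python
# Bounds of the content-format tag number range quoted in the text.
CT_TAG_FIRST = 0x63740101
CT_TAG_LAST = 0x6374FFFF


def is_ct_tag_number(tag_number: int) -> bool:
    """True if tag_number matches ct-tag-number (inclusive range)."""
    return CT_TAG_FIRST <= tag_number <= CT_TAG_LAST


def classify_major7_head(head_number: int) -> str:
    """Interpret a head-number given after '#7.' as described in the text."""
    if 0 <= head_number <= 23 or 32 <= head_number <= 255:
        return "simple value"
    if 24 <= head_number <= 31:
        return "additional information"
    raise ValueError(f"head-number out of range for #7: {head_number}")
```

With these definitions, `classify_major7_head(25)` reports "additional information" (the float16 case mentioned above), while `classify_major7_head(20)` reports "simple value".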
Notes:¶
This syntax reuses the angle bracket syntax for generics; this reuse is innocuous as a generic parameter/argument only ever occurs after a rule name (id), while it occurs after . here. (Whether there is potential for human confusion can be debated; the above example deliberately uses generics as well.)¶
The updated ABNF grammar makes it a bit more explicit that the number given after the optional dot is special, not giving the CBOR "additional information" for tags and simple values as it is with other uses of # in CDDL. (Adding this observation to Section 2.2.3 of [RFC8610] is the subject of [Err6575]; it is correctly noted in Section 3.6 of [RFC8610].) In hindsight, maybe a different character than the dot should have been chosen for this special case; however, changing the grammar now would have been too disruptive.¶
The grammar fixes and updates in this document are not believed to +create additional security considerations. +The security considerations in Section 5 of [RFC8610] do apply, and +specifically the potential for confusion is increased in an +environment that uses a combination of CDDL tools some of which have +been updated and some of which have not been, in particular based on +Section 2.¶
+This document has no IANA actions.¶
+This appendix is normative.¶
+It provides the full ABNF from [RFC8610] with the updates +applied in the present document.¶
+Move most of the content of 2.2 here.¶
+Many thanks go to the submitters of the errata reports addressed in +this document. +In one of the ensuing discussions, Doug Ewell proposed to define an +ABNF rule NONASCII, of which we have included the essence. +Special thanks to the reviewers Marco Tiloca, Christian Amsüss (shepherd review), Orie Steele (AD review), and Éric Vyncke +(detailed IESG review).¶