Permissive multipart parsing for compatibility with dendrite #1909

Merged (1 commit) on Sep 17, 2024

@olivia-fl olivia-fl commented Sep 15, 2024

RFC 2046 is somewhat ambiguous on whether or not it's valid to omit the preceding CRLF for the first boundary. The prose on page 19 suggests that it is not:

The boundary delimiter MUST occur at the beginning of a line, i.e.,
following a CRLF, and the initial CRLF is considered to be attached
to the boundary delimiter line rather than part of the preceding
part. The boundary may be followed by zero or more characters of
linear whitespace. It is then terminated by either another CRLF and
the header fields for the next part, or by two CRLFs, in which case
there are no header fields for the next part. If no Content-Type
field is present it is assumed to be "message/rfc822" in a
"multipart/digest" and "text/plain" otherwise.

NOTE: The CRLF preceding the boundary delimiter line is conceptually
attached to the boundary so that it is possible to have a part that
does not end with a CRLF (line break). Body parts that must be
considered to end with line breaks, therefore, must have two CRLFs
preceding the boundary delimiter line, the first of which is part of
the preceding body part, and the second of which is part of the
encapsulation boundary.

But the BNF on page 22 suggests that it is, as long as there is no preamble:

dash-boundary := "--" boundary
                 ; boundary taken from the value of
                 ; boundary parameter of the
                 ; Content-Type field.

multipart-body := [preamble CRLF]
                  dash-boundary transport-padding CRLF
                  body-part *encapsulation
                  close-delimiter transport-padding
                  [CRLF epilogue]

Dendrite currently generates multipart responses without a preceding CRLF for the first boundary (matrix-org/dendrite#3414 [2]). The previous ruma parsing logic rejected these responses.
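
To illustrate the permissive reading, here is a minimal sketch of locating the first boundary delimiter while accepting both forms. It is not the code from this PR: `skip_first_boundary` is a hypothetical helper, and it assumes there is no preamble and no transport padding after the boundary.

```rust
/// Hypothetical helper (not ruma's actual API): returns the offset just past
/// the first boundary delimiter line, or `None` if the body does not start
/// with one.
///
/// Assumptions in this sketch: no preamble, and no transport padding between
/// the boundary and the CRLF that ends the delimiter line.
fn skip_first_boundary(body: &[u8], boundary: &str) -> Option<usize> {
    // dash-boundary := "--" boundary
    let dash_boundary = format!("--{}", boundary).into_bytes();

    let mut pos = 0;

    // Strict reading (RFC 2046 prose): the delimiter follows a CRLF.
    // Permissive reading (BNF without preamble): the CRLF may be absent.
    if body.starts_with(b"\r\n") {
        pos += 2;
    }

    if !body[pos..].starts_with(&dash_boundary) {
        return None;
    }
    pos += dash_boundary.len();

    // The delimiter line itself must still be terminated by CRLF.
    if !body[pos..].starts_with(b"\r\n") {
        return None;
    }

    Some(pos + 2)
}
```

With a boundary of `abc`, both `\r\n--abc\r\n...` and `--abc\r\n...` are accepted, and the returned offset points at the first header line (or the blank line) of the first body part.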

[1]: https://datatracker.ietf.org/doc/html/rfc2046
[2]: matrix-org/dendrite#3414
@olivia-fl

@Xiretza those were both good suggestions, applied!

zecakeh commented Sep 17, 2024

Thanks, this looks good!

@zecakeh zecakeh merged commit 61f5150 into ruma:main Sep 17, 2024
21 checks passed