Skip to content
This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

Federation authenticated media endpoints have invalid multipart response #3414

Closed
olivia-fl opened this issue Sep 8, 2024 · 6 comments
Closed

Comments

@olivia-fl
Copy link

Background information

  • Dendrite version or git SHA: 0.13.7+7a4ef24
  • SQLite3 or Postgres?: unknown (not my server)
  • Running in Docker?: unknown
  • go version: unknown
  • Client used (if applicable): Grapevine b4fecbc51719a33d09be1e76d55ae0eec11fb71a (over federation API)

Description

The multipart spec states that every encapsulation boundary in the body should be preceded by a CRLF. Instead, dendrite's response to the federation auth media download endpoint has the -- of the first boundary starting directly at the beginning of the body, with no CRLF.

Ruma expects a preceding CRLF here when parsing the response. As a result, homeserver implementations using ruma are unable to fetch media from dendrite servers over the authed media endpoints. This affects grapevine, and likely also affects conduwuit although I have not tested it.

relevant text from RFC 1341

Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that that initial CRLF is considered to be part of the encapsulation boundary rather than part of the preceding part. The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).

NOTE: The CRLF preceding the encapsulation line is considered part of the boundary so that it is possible to have a part that does not end with a CRLF (line break). Body parts that must be considered to end with line breaks, therefore, should have two CRLFs preceding the encapsulation line, the first of which is part of the preceding body part, and the second of which is part of the encapsulation boundary.

The requirement that the encapsulation boundary begins with a CRLF implies that the body of a multipart entity must itself begin with a CRLF before the first encapsulation line -- that is, if the "preamble" area is not used, the entity headers must be followed by TWO CRLFs. This is indeed how such entities should be composed. A tolerant mail reading program, however, may interpret a body of type multipart that begins with an encapsulation line NOT initiated by a CRLF as also being an encapsulation boundary, but a compliant mail sending program must not generate such entities.

Steps to reproduce

  • issue a /_matrix/federation/v1/media/download/:mediaid request to a dendrite server
  • inspect the response body. Every multipart boundary should start with a CRLF. Instead, the first boundary begins immediately.
@CobaltCause
Copy link

Seems like this is actually a bug in Go's standard library. It actually used to include a leading CRLF but was changed ~13 years ago as a workaround for a bug in a different program: https://codereview.appspot.com/4635063#msg4.

@S7evinK
Copy link
Contributor

S7evinK commented Sep 9, 2024

This might be a problem in Ruma, the Spec wants RFC 2046 for the boundary. (Yea, the MSC linked to RFC 1341, which may had different "rules" for boundaries)

The multipart package, which Dendrite uses, implements RFC 2046.

@olivia-fl
Copy link
Author

@S7evinK Good catch! RFC 2046 seems to have similar text stating that boundaries must be preceded by a CRLF though. On page 19:

The boundary delimiter MUST occur at the beginning of a line, i.e.,
following a CRLF, and the initial CRLF is considered to be attached
to the boundary delimiter line rather than part of the preceding
part. The boundary may be followed by zero or more characters of
linear whitespace. It is then terminated by either another CRLF and
the header fields for the next part, or by two CRLFs, in which case
there are no header fields for the next part. If no Content-Type
field is present it is assumed to be "message/rfc822" in a
"multipart/digest" and "text/plain" otherwise.

NOTE: The CRLF preceding the boundary delimiter line is conceptually
attached to the boundary so that it is possible to have a part that
does not end with a CRLF (line break). Body parts that must be
considered to end with line breaks, therefore, must have two CRLFs
preceding the boundary delimiter line, the first of which is part of
the preceding body part, and the second of which is part of the
encapsulation boundary.

I haven't read the whole thing in detail, it's definitely possible that I'm missing something. It would be at least somewhat surprising this was a bug sitting in the go standard library for 13 years without anybody else noticing it.

@f0x52
Copy link

f0x52 commented Sep 12, 2024

Coincidentally I read Go's mime/multipart source recently, and noticed that their parser specifically allows the CRLF to be missing at the very start of the response source. It's unclear to me though if this is leniency similar to allowing LF without CR as a terminator, or something that implementations MUST accept

@f0x52
Copy link

f0x52 commented Sep 12, 2024

on page 22 of RFC2046 the multipart body is specified as:

 dash-boundary := "--" boundary
                  ; boundary taken from the value of
                  ; boundary parameter of the
                  ; Content-Type field.

 multipart-body := [preamble CRLF]
                   dash-boundary transport-padding CRLF
                   body-part *encapsulation
                   close-delimiter transport-padding
                   [CRLF epilogue]

Which seems to me like the leading CRLF is only required when a preamble is present, which is optional

@olivia-fl
Copy link
Author

Oh fun. I agree with your reading that the BNF clearly states the first CRLF is not required unless there is a preamble. That seems to conflict with the earlier text in the RFC to me, but either way the next step is to change ruma to be more permissive.

olivia-fl added a commit to olivia-fl/ruma that referenced this issue Sep 15, 2024
RFC 2046 is somewhat ambiguous on whether or not it's valid to omit the
preceding CRLF for the first boundary. The prose on page 19 suggests
that it is not:

> The boundary delimiter MUST occur at the beginning of a line, i.e.,
> following a CRLF, and the initial CRLF is considered to be attached
> to the boundary delimiter line rather than part of the preceding
> part. The boundary may be followed by zero or more characters of
> linear whitespace. It is then terminated by either another CRLF and
> the header fields for the next part, or by two CRLFs, in which case
> there are no header fields for the next part. If no Content-Type
> field is present it is assumed to be "message/rfc822" in a
> "multipart/digest" and "text/plain" otherwise.
>
> NOTE: The CRLF preceding the boundary delimiter line is conceptually
> attached to the boundary so that it is possible to have a part that
> does not end with a CRLF (line break). Body parts that must be
> considered to end with line breaks, therefore, must have two CRLFs
> preceding the boundary delimiter line, the first of which is part of
> the preceding body part, and the second of which is part of the
> encapsulation boundary.

But the BNF on page 22 suggests that it is, as long as there is no
preamble:

> dash-boundary := "--" boundary
>                  ; boundary taken from the value of
>                  ; boundary parameter of the
>                  ; Content-Type field.
>
> multipart-body := [preamble CRLF]
>                   dash-boundary transport-padding CRLF
>                   body-part *encapsulation
>                   close-delimiter transport-padding
>                   [CRLF epilogue]

Dendrite currently generates multipart responses without a preceding CRLF
for the first boundary[2], which were rejected by the previous ruma
parsing logic.

[1]: https://datatracker.ietf.org/doc/html/rfc2046
[2]: matrix-org/dendrite#3414
olivia-fl added a commit to olivia-fl/ruma that referenced this issue Sep 15, 2024
RFC 2046[1] is somewhat ambiguous on whether or not it's valid to omit the
preceding CRLF for the first boundary. The prose on page 19 suggests
that it is not:

> The boundary delimiter MUST occur at the beginning of a line, i.e.,
> following a CRLF, and the initial CRLF is considered to be attached
> to the boundary delimiter line rather than part of the preceding
> part. The boundary may be followed by zero or more characters of
> linear whitespace. It is then terminated by either another CRLF and
> the header fields for the next part, or by two CRLFs, in which case
> there are no header fields for the next part. If no Content-Type
> field is present it is assumed to be "message/rfc822" in a
> "multipart/digest" and "text/plain" otherwise.
>
> NOTE: The CRLF preceding the boundary delimiter line is conceptually
> attached to the boundary so that it is possible to have a part that
> does not end with a CRLF (line break). Body parts that must be
> considered to end with line breaks, therefore, must have two CRLFs
> preceding the boundary delimiter line, the first of which is part of
> the preceding body part, and the second of which is part of the
> encapsulation boundary.

But the BNF on page 22 suggests that it is, as long as there is no
preamble:

> dash-boundary := "--" boundary
>                  ; boundary taken from the value of
>                  ; boundary parameter of the
>                  ; Content-Type field.
>
> multipart-body := [preamble CRLF]
>                   dash-boundary transport-padding CRLF
>                   body-part *encapsulation
>                   close-delimiter transport-padding
>                   [CRLF epilogue]

Dendrite currently generates multipart responses without a preceding CRLF
for the first boundary[2], which were rejected by the previous ruma
parsing logic.

[1]: https://datatracker.ietf.org/doc/html/rfc2046
[2]: matrix-org/dendrite#3414
olivia-fl added a commit to olivia-fl/ruma that referenced this issue Sep 17, 2024
RFC 2046[1] is somewhat ambiguous on whether or not it's valid to omit the
preceding CRLF for the first boundary. The prose on page 19 suggests
that it is not:

> The boundary delimiter MUST occur at the beginning of a line, i.e.,
> following a CRLF, and the initial CRLF is considered to be attached
> to the boundary delimiter line rather than part of the preceding
> part. The boundary may be followed by zero or more characters of
> linear whitespace. It is then terminated by either another CRLF and
> the header fields for the next part, or by two CRLFs, in which case
> there are no header fields for the next part. If no Content-Type
> field is present it is assumed to be "message/rfc822" in a
> "multipart/digest" and "text/plain" otherwise.
>
> NOTE: The CRLF preceding the boundary delimiter line is conceptually
> attached to the boundary so that it is possible to have a part that
> does not end with a CRLF (line break). Body parts that must be
> considered to end with line breaks, therefore, must have two CRLFs
> preceding the boundary delimiter line, the first of which is part of
> the preceding body part, and the second of which is part of the
> encapsulation boundary.

But the BNF on page 22 suggests that it is, as long as there is no
preamble:

> dash-boundary := "--" boundary
>                  ; boundary taken from the value of
>                  ; boundary parameter of the
>                  ; Content-Type field.
>
> multipart-body := [preamble CRLF]
>                   dash-boundary transport-padding CRLF
>                   body-part *encapsulation
>                   close-delimiter transport-padding
>                   [CRLF epilogue]

Dendrite currently generates multipart responses without a preceding CRLF
for the first boundary[2], which were rejected by the previous ruma
parsing logic.

[1]: https://datatracker.ietf.org/doc/html/rfc2046
[2]: matrix-org/dendrite#3414
zecakeh pushed a commit to ruma/ruma that referenced this issue Sep 17, 2024
…boundary

RFC 2046[1] is somewhat ambiguous on whether or not it's valid to omit the
preceding CRLF for the first boundary. The prose on page 19 suggests
that it is not:

> The boundary delimiter MUST occur at the beginning of a line, i.e.,
> following a CRLF, and the initial CRLF is considered to be attached
> to the boundary delimiter line rather than part of the preceding
> part. The boundary may be followed by zero or more characters of
> linear whitespace. It is then terminated by either another CRLF and
> the header fields for the next part, or by two CRLFs, in which case
> there are no header fields for the next part. If no Content-Type
> field is present it is assumed to be "message/rfc822" in a
> "multipart/digest" and "text/plain" otherwise.
>
> NOTE: The CRLF preceding the boundary delimiter line is conceptually
> attached to the boundary so that it is possible to have a part that
> does not end with a CRLF (line break). Body parts that must be
> considered to end with line breaks, therefore, must have two CRLFs
> preceding the boundary delimiter line, the first of which is part of
> the preceding body part, and the second of which is part of the
> encapsulation boundary.

But the BNF on page 22 suggests that it is, as long as there is no
preamble:

> dash-boundary := "--" boundary
>                  ; boundary taken from the value of
>                  ; boundary parameter of the
>                  ; Content-Type field.
>
> multipart-body := [preamble CRLF]
>                   dash-boundary transport-padding CRLF
>                   body-part *encapsulation
>                   close-delimiter transport-padding
>                   [CRLF epilogue]

Dendrite currently generates multipart responses without a preceding CRLF
for the first boundary[2], which were rejected by the previous ruma
parsing logic.

[1]: https://datatracker.ietf.org/doc/html/rfc2046
[2]: matrix-org/dendrite#3414
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants