Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Federation authenticated media endpoints have invalid multipart response #3414

Closed
matrixbot opened this issue Nov 2, 2024 · 6 comments
Closed

Comments

@matrixbot
Copy link
Collaborator

This issue was originally created by @Benjamin-L at matrix-org/dendrite#3414.

Background information

  • Dendrite version or git SHA: 0.13.7+7a4ef24
  • SQLite3 or Postgres?: unknown (not my server)
  • Running in Docker?: unknown
  • go version: unknown
  • Client used (if applicable): Grapevine b4fecbc51719a33d09be1e76d55ae0eec11fb71a (over federation API)

Description

The multipart spec states that every encapsulation boundary in the body should be preceded by a CRLF. Instead, dendrite's response to the federation auth media download endpoint has the -- of the first boundary starting directly at the beginning of the body, with no CRLF.

Ruma expects a preceding CRLF here when parsing the response. As a result, homeserver implementations using ruma are unable to fetch media from dendrite servers over the authed media endpoints. This affects grapevine, and likely also affects conduwuit although I have not tested it.

relevant text from RFC 1341

Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that that initial CRLF is considered to be part of the encapsulation boundary rather than part of the preceding part. The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).

NOTE: The CRLF preceding the encapsulation line is considered part of the boundary so that it is possible to have a part that does not end with a CRLF (line break). Body parts that must be considered to end with line breaks, therefore, should have two CRLFs preceding the encapsulation line, the first of which is part of the preceding body part, and the second of which is part of the encapsulation boundary.

The requirement that the encapsulation boundary begins with a CRLF implies that the body of a multipart entity must itself begin with a CRLF before the first encapsulation line -- that is, if the "preamble" area is not used, the entity headers must be followed by TWO CRLFs. This is indeed how such entities should be composed. A tolerant mail reading program, however, may interpret a body of type multipart that begins with an encapsulation line NOT initiated by a CRLF as also being an encapsulation boundary, but a compliant mail sending program must not generate such entities.

Steps to reproduce

  • issue a /_matrix/federation/v1/media/download/:mediaid request to a dendrite server
  • inspect the response body. Every multipart boundary should start with a CRLF. Instead, the first boundary begins immediately.
@matrixbot
Copy link
Collaborator Author

This comment was originally posted by @CobaltCause at matrix-org/dendrite#3414 (comment).

Seems like this is actually a bug in Go's standard library. It actually used to include a leading CRLF but was changed ~13 years ago as a workaround for a bug in a different program: https://codereview.appspot.com/4635063#msg4.

@matrixbot
Copy link
Collaborator Author

This comment was originally posted by @S7evinK at matrix-org/dendrite#3414 (comment).

This might be a problem in Ruma, the Spec wants RFC 2046 for the boundary. (Yea, the MSC linked to RFC 1341, which may had different "rules" for boundaries)

The multipart package, which Dendrite uses, implements RFC 2046.

@matrixbot
Copy link
Collaborator Author

This comment was originally posted by @Benjamin-L at matrix-org/dendrite#3414 (comment).

@S7evinK Good catch! RFC 2046 seems to have similar text stating that boundaries must be preceded by a CRLF though. On page 19:

The boundary delimiter MUST occur at the beginning of a line, i.e.,
following a CRLF, and the initial CRLF is considered to be attached
to the boundary delimiter line rather than part of the preceding
part. The boundary may be followed by zero or more characters of
linear whitespace. It is then terminated by either another CRLF and
the header fields for the next part, or by two CRLFs, in which case
there are no header fields for the next part. If no Content-Type
field is present it is assumed to be "message/rfc822" in a
"multipart/digest" and "text/plain" otherwise.

NOTE: The CRLF preceding the boundary delimiter line is conceptually
attached to the boundary so that it is possible to have a part that
does not end with a CRLF (line break). Body parts that must be
considered to end with line breaks, therefore, must have two CRLFs
preceding the boundary delimiter line, the first of which is part of
the preceding body part, and the second of which is part of the
encapsulation boundary.

I haven't read the whole thing in detail, it's definitely possible that I'm missing something. It would be at least somewhat surprising this was a bug sitting in the go standard library for 13 years without anybody else noticing it.

@matrixbot
Copy link
Collaborator Author

This comment was originally posted by @f0x52 at matrix-org/dendrite#3414 (comment).

Coincidentally I read Go's mime/multipart source recently, and noticed that their parser specifically allows the CRLF to be missing at the very start of the response source. It's unclear to me though if this is leniency similar to allowing LF without CR as a terminator, or something that implementations MUST accept

@matrixbot
Copy link
Collaborator Author

This comment was originally posted by @f0x52 at matrix-org/dendrite#3414 (comment).

on page 22 of RFC2046 the multipart body is specified as:

 dash-boundary := "--" boundary
                  ; boundary taken from the value of
                  ; boundary parameter of the
                  ; Content-Type field.

 multipart-body := [preamble CRLF]
                   dash-boundary transport-padding CRLF
                   body-part *encapsulation
                   close-delimiter transport-padding
                   [CRLF epilogue]

Which seems to me like the leading CRLF is only required when a preamble is present, which is optional

@matrixbot
Copy link
Collaborator Author

This comment was originally posted by @Benjamin-L at matrix-org/dendrite#3414 (comment).

Oh fun. I agree with your reading that the BNF clearly states the first CRLF is not required unless there is a preamble. That seems to conflict with the earlier text in the RFC to me, but either way the next step is to change ruma to be more permissive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant