requests_toolbelt.multipart.decoder.ImproperBodyPartContentException: content does not contain CR-LF-CR-LF #352

Open
enigmathix opened this issue Apr 4, 2023 · 0 comments · May be fixed by #380


enigmathix commented Apr 4, 2023

This error is raised even though RFC 2046 does not require a body part to contain two consecutive CR-LF sequences (CR-LF-CR-LF). From https://www.rfc-editor.org/rfc/rfc2046.html#section-5.1.1:

Overall, the body of a "multipart" entity may be specified as
   follows:

     dash-boundary := "--" boundary
                      ; boundary taken from the value of
                      ; boundary parameter of the
                      ; Content-Type field.

     multipart-body := [preamble CRLF]
                       dash-boundary transport-padding CRLF
                       body-part *encapsulation
                       close-delimiter transport-padding
                       [CRLF epilogue]

     transport-padding := *LWSP-char
                          ; Composers MUST NOT generate
                          ; non-zero length transport
                          ; padding, but receivers MUST
                          ; be able to handle padding
                          ; added by message transports.

     encapsulation := delimiter transport-padding
                      CRLF body-part

     delimiter := CRLF dash-boundary

     close-delimiter := delimiter "--"

     preamble := discard-text

     epilogue := discard-text

     discard-text := *(*text CRLF) *text
                     ; May be ignored or discarded.

     body-part := MIME-part-headers [CRLF *OCTET]
                  ; Lines in a body-part must not start
                  ; with the specified dash-boundary and
                  ; the delimiter must not appear anywhere
                  ; in the body part.  Note that the
                  ; semantics of a body-part differ from
                  ; the semantics of a message, as
                  ; described in the text.

     OCTET := <any 0-255 octet value>

Per this grammar, the simplest multipart body looks like this:

--boundary CRLF
MIME-part-headers
[CRLF *OCTET]
CRLF --boundary--

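Once the delimiter (CRLF dash-boundary) is accounted for, only one CRLF is required after the part headers, not two, because the [CRLF *OCTET] portion of a body-part is optional. In practice, Google App Engine internally posts data that contains only one CRLF there when a form field is left empty (the example below uses data it generated).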
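To make the grammar argument concrete, here is a minimal sketch of a lenient splitter that follows the RFC 2046 productions quoted above. `split_body_parts` and `split_headers` are hypothetical helpers written for this issue, not part of requests_toolbelt, and the sketch ignores edge cases such as one boundary being a prefix of another.

```python
# Sketch only: hypothetical helpers, not requests_toolbelt APIs.

def split_body_parts(content: bytes, boundary: bytes) -> list[bytes]:
    """Return the raw body-parts of a multipart body (RFC 2046, section 5.1.1)."""
    dash_boundary = b"--" + boundary
    delimiter = b"\r\n" + dash_boundary
    # Prepend CRLF so the opening dash-boundary splits like any other delimiter.
    chunks = (b"\r\n" + content).split(delimiter)
    parts = []
    for chunk in chunks[1:]:  # chunks[0] is the (possibly empty) preamble
        if chunk.startswith(b"--"):  # close-delimiter: the rest is epilogue
            break
        # Discard transport padding and the CRLF that ends the delimiter line.
        _, _, part = chunk.partition(b"\r\n")
        parts.append(part)
    return parts


def split_headers(part: bytes) -> tuple[bytes, bytes]:
    """Split a body-part into (headers, body); the body may be absent."""
    headers, sep, body = part.partition(b"\r\n\r\n")
    if not sep:
        # No blank line: the optional [CRLF *OCTET] is missing, headers only.
        return part.rstrip(b"\r\n"), b""
    return headers, body


data = (
    b'--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=empty\r\n'
    b'\r\n--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=text\r\n'
    b'\r\n'
    b'Some Text'
    b'\r\n--foo--'
)

for part in split_body_parts(data, b"foo"):
    headers, body = split_headers(part)
    print(headers.splitlines()[-1], body)
```

The first part never contains CR-LF-CR-LF, yet it is a valid header-only body-part under the grammar, so a missing blank line is treated as an empty body rather than an error.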

Steps to reproduce:

from requests_toolbelt.multipart import decoder
data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=empty\r\n\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'

decoder.MultipartDecoder(data, 'multipart/form-data; boundary="foo"')

Output:

Traceback (most recent call last):
  File "/Users/christophe/toolbelt.py", line 4, in <module>
    decoder.MultipartDecoder(data, 'multipart/form-data; boundary="foo"')
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 111, in __init__
    self._parse_body(content)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 150, in _parse_body
    self.parts = tuple(body_part(x) for x in parts if test_part(x))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 150, in <genexpr>
    self.parts = tuple(body_part(x) for x in parts if test_part(x))
                       ^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 141, in body_part
    return BodyPart(fixed, self.encoding)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/requests_toolbelt/multipart/decoder.py", line 63, in __init__
    raise ImproperBodyPartContentException(
requests_toolbelt.multipart.decoder.ImproperBodyPartContentException: content does not contain CR-LF-CR-LF
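
A possible stopgap is to normalise the payload before handing it to the decoder. The sketch below assumes, going only by the exception message, that the decoder wants CR-LF-CR-LF inside every part; `pad_header_only_parts` is a hypothetical helper and has not been checked against every requests_toolbelt release.

```python
# Sketch only: a hypothetical pre-processing step, not a requests_toolbelt API.

def pad_header_only_parts(content: bytes, boundary: bytes) -> bytes:
    """Give header-only parts the blank line the decoder appears to require."""
    delimiter = b"\r\n--" + boundary
    chunks = content.split(delimiter)
    fixed = []
    # Every chunk except the last one is followed by a delimiter.
    for chunk in chunks[:-1]:
        if b"\r\n\r\n" not in chunk:
            # Headers with no body: end them with an explicit empty line.
            chunk = chunk.rstrip(b"\r\n") + b"\r\n\r\n"
        fixed.append(chunk)
    fixed.append(chunks[-1])  # close-delimiter tail ("--" plus any epilogue)
    return delimiter.join(fixed)


data = (
    b'--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=empty\r\n'
    b'\r\n--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=text\r\n'
    b'\r\n'
    b'Some Text'
    b'\r\n--foo--'
)

padded = pad_header_only_parts(data, b"foo")
```

The padded bytes can then be passed to `decoder.MultipartDecoder` with the same content type as in the reproduction. This changes the payload by one CRLF per header-only part, so it is a workaround rather than a fix for the underlying check.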

For comparison, here is the same data processed with cgi:

from io import BytesIO
import cgi

data = b'--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=empty\r\n\r\n--foo\r\nContent-Type: text/plain; charset="UTF-8"\r\nContent-Disposition: form-data; name=text\r\n\r\nSome Text\r\n--foo--'
environ = {'CONTENT_LENGTH': str(len(data)),
        'CONTENT_TYPE': 'multipart/form-data; boundary="foo"',
        'REQUEST_METHOD': 'POST',
        'boundary': b'foo'}

stream = BytesIO(data)
print(cgi.parse_multipart(stream, environ))

Output:

{'empty': [''], 'text': ['Some Text']}
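
Since the `cgi` module was deprecated in Python 3.11 and removed in 3.13, the same comparison can also be sketched with the standard library `email` package. The only assumption here is that prepending a Content-Type header turns the body into something the email parser will treat as a multipart message; the `fields` dict is just for illustration.

```python
from email import message_from_bytes

data = (
    b'--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=empty\r\n'
    b'\r\n--foo\r\n'
    b'Content-Type: text/plain; charset="UTF-8"\r\n'
    b'Content-Disposition: form-data; name=text\r\n'
    b'\r\n'
    b'Some Text'
    b'\r\n--foo--'
)
content_type = 'multipart/form-data; boundary="foo"'

# The email parser expects headers first, so prepend the Content-Type header.
raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + data
message = message_from_bytes(raw)

fields = {}
for part in message.get_payload():  # one Message object per body-part
    name = part.get_param("name", header="content-disposition")
    fields[name] = part.get_payload(decode=True)  # raw bytes of the part body

print(fields)
```

Like `cgi`, this parser accepts the header-only part and reports an empty value for it instead of raising.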