Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update BINARY_SUPPORT to use Content-Encoding to identify if data is binary #971

Closed
wants to merge 2 commits into from

Conversation

Quidge
Copy link

@Quidge Quidge commented Apr 28, 2021

Description

This PR re-implements the same functional change that was shown in a PR from the pre-fork Miserlou/zappa repository.

The change in that PR didn't make it into the Zappa 52 release from what I can tell. My company has been using that PR's code (in combination with the flask_compress library) for a couple weeks without issue so I'm opening this PR that copies that code.

Additional tests are added for coverage. I'm not confident that the test organization that I used is correct or idiomatic for this codebase -- please critique and I'll change if wanted!

GitHub Issues

#908

Checklist I found in the contributors documentation:

Before you submit this PR, please make sure that you meet these criteria:

  • Did you read the contributing guide?

    • Yes.
  • If this is a non-trivial commit, did you open a ticket for discussion?

  • Did you put the URL for that ticket in a comment in the code?

    • Yes.
  • If you made a new function, did you write a good docstring for it?

    • NA.
  • Did you avoid putting "_" in front of your new function for no reason?

    • NA.
  • Did you write a test for your new code?

    • Yes.
  • Did the Travis build pass?

    • Yes.
  • Did you improve (or at least not significantly reduce) the amount of code test coverage?

    • Yes.
  • Did you make sure this code actually works on Lambda, as well as locally?

    • Yes. My company has been using these functional changes in a fairly standard lambda environment for a couple weeks.
  • Did you test this code with all of Python 3.6, Python 3.7 and Python 3.8 ?

  • Does this commit ONLY relate to the issue at hand and have your linter shit all over the code?

    • The changes introduced in the commits are not outside the scope of the issue.

@Quidge Quidge changed the title Write tests confirming Binary Support response behavior Update BINARY_SUPPORT to use Content-Encoding to identify if data is binary May 5, 2021
@paulnicolet
Copy link

Awesome, thanks!

Is there anything preventing to merge this ?

@Quidge
Copy link
Author

Quidge commented May 11, 2021

@paulnicolet There's nothing blocking that I'm aware of. @jneves could this be merged?

@colinhoernig
Copy link

Hi @jneves -- we're using this code from @Quidge in a forked version of Zappa because of a need to enable compression, and it's working very well for us. If this PR could be merged in, that would allow us to move off of our fork and back onto the official Zappa project which we're really looking forward to so that we don't have to maintain our own fork. 🙂

@Quidge
Copy link
Author

Quidge commented Jun 3, 2021

@jneves rebased this PR atop the (great!) newer black changes to the codebase.

@travnels
Copy link

travnels commented Jun 9, 2021

@jneves is this getting any movement? Anything I can do to help get this merged?

@travnels
Copy link

travnels commented Jul 9, 2021

@jneves we've been using this code on a fork (https://github.com/tackle-io/Zappa) for over 3 months and are eager to get it merged. This enhancement has helped with large payload that exceed the lambda max response size (6 MB), since it allows for us to zip the response in the lambda. Let me know if there's anything I can do to help getting the approved and merge. Thanks!

@colinhoernig
Copy link

@jneves It's been several months -- we'd really love to get this merged in so we can cut off of our own Zappa fork.

tests/utils.py Outdated Show resolved Hide resolved
zappa/handler.py Outdated Show resolved Hide resolved
zappa/handler.py Show resolved Hide resolved
tests/test_handler.py Outdated Show resolved Hide resolved
@Quidge Quidge force-pushed the quidge/908_2 branch 3 times, most recently from d073683 to 3501ea2 Compare November 29, 2021 00:25
Quidge and others added 2 commits November 28, 2021 19:31
No functional change to the Zappa codebase is introduced by this commit.

This area of the application was untested. Tests introduced to ensure new
behavior discussed in zappa#908 does not cause
a regression.
When using _whitenoise_ for caching, which provides compression, binary types
may include mimetypes, "text/", "application/json":

- response.mimetype.startswith("text/")
- response.mimetype == "application/json"

Assuming that Content-Encoding will be set (as whitenoise apparently does)
this allows compression to be applied by the application for "text/" and
"application/json".

About Content-Encoding:
developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

---

The above commit message and functional code change to zappa/handler.py was
written by GH user @monkut and a PR was provided with
Miserlou/Zappa#2170.

That PR was not merged in before the fork from Miserlou/zappa to Zappa/zappa.

This commit copies the code from that PR, adds a comment line referencing the
new migrated issue, and additionally adds tests to confirm behavior.

Co-authored-by: monkut <[email protected]>
@Quidge
Copy link
Author

Quidge commented Nov 29, 2021

@javulticat would you please re-review when you get a chance?

Behavior I started thinking about when fiddling with this: what do you think should happen when a csv file is sent back to the client, say via flask.send_file()?

A vanilla Flask application (no additional after_request stuff tacked on)'s use of flask.send_file() will not add a Content-Encoding header to the response, but the mimetype would start with text/ if the mimetype is passed manually to the function or if filename ends in .csv (flask.send_file() tries to infer the mimetype from the filename).

The end result is that a file uploaded through flask.send_file('some/file/path.csv') will not be base64 encoded due to the conditional logic that's being added in this PR.

I haven't seen any issues raised around this in the last year or so, but it seems wrong. The csv file would be binary so I think we'd want to encode as base64 for that too. Thoughts?

@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 73.606% when pulling eb2b50f on tackle-io:quidge/908_2 into b5b80cf on zappa:master.

@Quidge
Copy link
Author

Quidge commented Dec 7, 2021

:| I'm trying to 'resolve' a pending unresolved conversation because it appears to be a merge blocker:

Screen Shot 2021-12-07 at 12 09 14 PM

I addressed this with an update to the test files then rebased my changes to keep the PR to two commits (that may have been a mistake -- I was trying to keep the commits atomic).

Now I can no longer reference the conversation to mark it as resolved.
Screen Shot 2021-12-07 at 12 09 09 PM

Not sure what to do here?

@javulticat
Copy link
Member

@Quidge conversation resolved - thanks for improving the tests!

I'm not sure I fully understand the question being asked around CSVs. Could you try restating?

One point that may or may not help answer the question is that a CSV is a plain text file. So I'd imagine the handling of CSVs (text/csv) should probably be the same as the handling of any other plain text file (text/plain).

@zeevt
Copy link

zeevt commented Dec 8, 2021

CSV (and plain text) can be in any character encoding, it is not guaranteed that it's ASCII or UTF-8.
If just .decode() works, great.
If there is a header like Content-Type: text/csv; charset=windows-1252 (there usually isn't) then it can be decoded to str without relying on default charset.
But if neither of those is true, then you have bytes that should be base64 encoded.

@javulticat
Copy link
Member

@zeevt that's my understanding as well, so I believe it would be fair to say that this is a potential concern for any text file, correct? I was confused because @Quidge's message implied this was a concern specific to CSVs (which I definitely may have just misinterpreted - sorry!)

If so, would a solution be to sniff the encoding of plain text and use that to decode it using the proper encoding (and, if that fails, fall back on using base64-encoded bytes)? Something like:

import chardet

# If plain text
try:
    encoding = chardet.detect(response.data)["encoding"]
    decoded = response.data.decode(encoding)
except Exception:
    # base64 encode the bytes

?

@zeevt
Copy link

zeevt commented Dec 8, 2021

@javulticat You understood me correctly, yes. I have two concerns with this, now that I thought about it a tiny bit more:

  1. Maybe someone is trying to return an http response with "content-type: text/foo" and with a body encoded with whatever encoding on purpose, and the proposed code would decode it and re-encode it in UTF-8 if chardet guesses correctly. This would be a bug. They should be able to ensure the exact bytes they intended are received by the HTTP client, without forcing application/octet-stream content type.

  2. How fast is chardet? I would hesitate running it on every response without measuring first.

I now think it's only safe to decode bytes to str if it actually decodes as UTF-8.

@monkut
Copy link
Collaborator

monkut commented Aug 12, 2022

Migrated to:
#1155

closing

@monkut monkut closed this Aug 12, 2022
monkut added a commit that referenced this pull request Nov 11, 2022
…ata is binary (#1155)

* 🔧 migrate #971 to lastest master

* 🎨 run black/isort

* ♻️ refactor to allow for other binary ignore types based on mimetype. (currently openapi schema can't be passed as text.

* 🎨 run black/fix flake8

* 🔧 add EXCEPTION_HANDLER setting

* 🐛 fix zappa_returndict["body"] assignment

* 📝 add temp debug info

* 🔥 delete unnecessary print statements

* ♻️ Update comments and minor refactor for clarity

* ♻️ refactor for ease of testing and clarity

* 🎨 fix flake8

* ✨ add `additional_text_mimetypes` setting
✅ add testcases for additional_text_mimetypes handling

* 🔧 Expand default text mimetypes mentioned in #1023
♻️ define "DEFAULT_TEXT_MIMETYPES" and move to utilities.py

* 🎨 run black/isort

* 🎨 run black/isort

* 🎨 remove unnecesasry comment (black now reformats code)
🎨 change commented lines to docstring for test app
Ian288 pushed a commit to tackle-io/Zappa that referenced this pull request Jul 11, 2023
… if data is binary (zappa#1155)

* 🔧 migrate zappa#971 to lastest master

* 🎨 run black/isort

* ♻️ refactor to allow for other binary ignore types based on mimetype. (currently openapi schema can't be passed as text.

* 🎨 run black/fix flake8

* 🔧 add EXCEPTION_HANDLER setting

* 🐛 fix zappa_returndict["body"] assignment

* 📝 add temp debug info

* 🔥 delete unnecessary print statements

* ♻️ Update comments and minor refactor for clarity

* ♻️ refactor for ease of testing and clarity

* 🎨 fix flake8

* ✨ add `additional_text_mimetypes` setting
✅ add testcases for additional_text_mimetypes handling

* 🔧 Expand default text mimetypes mentioned in zappa#1023
♻️ define "DEFAULT_TEXT_MIMETYPES" and move to utilities.py

* 🎨 run black/isort

* 🎨 run black/isort

* 🎨 remove unnecesasry comment (black now reformats code)
🎨 change commented lines to docstring for test app
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants