response.content.text in HAR is undecodable base64 #80

Open
n-kb opened this issue Jan 28, 2018 · 2 comments

Comments

@n-kb

n-kb commented Jan 28, 2018

When I do a plain-vanilla test, using this code:

from browsermobproxy import Server
server = Server("venv/bin/browsermob-proxy-2.1.4/bin/browsermob-proxy", options={'port':8008})
server.start()
proxy = server.create_proxy()

from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)

proxy.new_har("google", options={"captureContent": True, "captureBinaryContent": True})
driver.get("http://www.google.co.uk")
har = proxy.har  # the captured HAR as a Python dict

# Close the browser before shutting down the proxy it is routed through.
driver.quit()
server.stop()

Some responses have their response.content.text base64-encoded, but it doesn't decode to what it should. In this example, what should be an HTML page decodes to gibberish: https://gist.github.com/n-kb/8b8818230c54be998007ee855e037404#file-google-har-L191
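To see which entries are affected, it can help to list every entry whose body was stored base64-encoded together with the Content-Encoding the server declared for it. The sketch below assumes a HAR 1.2-style dict (for example the value of `proxy.har` in the browsermob-proxy Python client); the function name `find_base64_entries` is hypothetical.

```python
from typing import Any, Dict, List


def find_base64_entries(har: Dict[str, Any]) -> List[Dict[str, str]]:
    """List entries whose body was stored base64-encoded, together with
    the Content-Encoding header the server sent for that response."""
    found = []
    for entry in har["log"]["entries"]:
        response = entry["response"]
        content = response.get("content", {})
        # HAR marks base64 bodies with content.encoding == "base64".
        if content.get("encoding") != "base64":
            continue
        # Header names are case-insensitive, so compare in lower case.
        content_encoding = next(
            (h["value"] for h in response.get("headers", [])
             if h["name"].lower() == "content-encoding"),
            "",
        )
        found.append({
            "url": entry["request"]["url"],
            "mimeType": content.get("mimeType", ""),
            "contentEncoding": content_encoding,
        })
    return found
```

A `text/html` entry that shows up here with a non-empty `contentEncoding` (such as `br`) is a hint that the stored bytes are still compressed rather than mis-encoded.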

Because the problem only arises with text and not with images, I'm hypothesizing that the issue comes from this (from RFC1341): "A CRLF sequence in base64 data should be converted to a quoted-printable line break, but ONLY when converting text data" but I'm not familiar at all with these things.
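One quick way to test the CRLF hypothesis is to round-trip some text containing CRLF line endings through base64 directly. This is a sketch with made-up HTML, not data from the HAR above; it shows that base64 by itself restores every byte, CR and LF included, which would point at the bytes that were encoded rather than at the encoding step.

```python
import base64

# A small HTML body with CRLF line endings, as a server might send it.
original = b"<html>\r\n<body>Hello</body>\r\n</html>\r\n"

encoded = base64.b64encode(original)
decoded = base64.b64decode(encoded)

# base64 is a byte-exact transform: CR and LF bytes survive unchanged.
print(decoded == original)  # True
```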

Any idea how this could be solved?

@Fireclunge

Fireclunge commented Aug 14, 2018

I hope this helps anyone with the same issue (I spent way too much time looking for a solution).

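It appears to be caused by Brotli compression: the server sends the body Brotli-compressed, and that compressed output is what ends up base64-encoded in the HAR. Reversing the process (base64-decode, then Brotli-decompress) worked for me :)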

import brotli  # third-party package: pip install brotli
import base64

# entry is one element of har['log']['entries']
decoded_text = brotli.decompress(
    base64.b64decode(entry['response']['content']['text'])
).decode()
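The same idea can be wrapped into a helper that only decompresses when the entry is base64-encoded and declares a Content-Encoding. This is a sketch under stated assumptions: the name `decode_har_content` is hypothetical, it assumes a base64 body with a declared `br` or `gzip` encoding is still compressed (as in the Brotli case above), and the gzip branch is included defensively since the proxy may already decompress gzip itself.

```python
import base64
import gzip
from typing import Any, Dict


def decode_har_content(entry: Dict[str, Any]) -> bytes:
    """Return one HAR entry's response body as raw bytes, undoing base64
    and, when declared, Brotli or gzip compression."""
    response = entry["response"]
    content = response["content"]
    text = content.get("text", "")

    if content.get("encoding") != "base64":
        # Plain-text bodies were already decoded by the proxy.
        return text.encode("utf-8")

    body = base64.b64decode(text)

    # Header names are case-insensitive.
    content_encoding = next(
        (h["value"].strip().lower() for h in response.get("headers", [])
         if h["name"].lower() == "content-encoding"),
        "",
    )
    # Assumption: a base64 body that declares a Content-Encoding is still
    # compressed.
    if content_encoding == "br":
        # Third-party package (pip install brotli), imported only when needed.
        import brotli
        body = brotli.decompress(body)
    elif content_encoding == "gzip":
        body = gzip.decompress(body)
    return body
```

For an HTML page, `decode_har_content(entry).decode("utf-8")` then gives the text, assuming the page is UTF-8; other pages may need the charset from their Content-Type header.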

@ericbeland

If anyone would like to try, we have a fork of BrowserMob Proxy, renamed BrowserUp Proxy, with Brotli support merged. It should be a drop-in replacement for the binary and should be compatible when used via REST; the only exception is that we dropped the deprecated legacy routes. We are actively maintaining and extending it, whereas development on BrowserMob Proxy itself has been dead for a few years.
