
Getting Caused by: UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 403 while reading with Proxy Configuration #320

Open
pavan-kumar-chalamcharla opened this issue Jun 5, 2023 · 5 comments

@pavan-kumar-chalamcharla
Contributor

pavan-kumar-chalamcharla commented Jun 5, 2023

When proxy settings such as the `http_proxy` and `https_proxy` environment variables are configured, open Delta Sharing does not pick them up when reading data. With a bucket policy that only allows the proxy IPs, this causes the error below:

`FileReadException: Error while reading file delta-sharing:/dbfs%253A%252FFileStore%252F059501f2aeb8fad0607470b70008727a/62477. Caused by: UnexpectedHttpStatus: HTTP request failed with status: HTTP/1.1 403 Forbidden <?xml version="1.0" encoding="UTF-8"?>`
`DEBUG:fsspec.http:Cannot connect to host <bucket name>.s3.us-east-1.amazonaws.com:443 ssl:default [Connect call failed ('', 443)]`

Debugging further, we see that `fsspec.http`/`aiohttp` is used to read the pre-signed URLs, and these libraries do not use the `HTTP_PROXY` environment variables that are set, which causes the read to fail.
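
For context, the standard library can already resolve these variables; an HTTP client only uses them if it opts in (aiohttp does a similar lookup only for a `ClientSession` created with `trust_env=True`). The stdlib-only sketch below shows which proxy the environment would select for a pre-signed URL. The helper name `env_proxy_for` is hypothetical, and it sends no traffic.

```python
from typing import Optional
from urllib.parse import urlsplit
from urllib.request import getproxies_environment, proxy_bypass_environment


def env_proxy_for(url: str) -> Optional[str]:
    """Return the proxy that *_proxy environment variables select for `url`.

    Hypothetical helper: it mirrors the lookup a client that honors the
    environment would perform, without opening any connection.
    """
    parts = urlsplit(url)
    # e.g. {"http": "...", "https": "...", "no": "..."} built from *_proxy vars
    proxies = getproxies_environment()
    host = parts.hostname or ""
    # NO_PROXY / no_proxy exclusions win over any configured proxy.
    if host and proxy_bypass_environment(host, proxies):
        return None
    # A pre-signed S3 URL is plain https://, so https_proxy is the one consulted.
    return proxies.get(parts.scheme)
```

Assuming the reader goes through fsspec's HTTP filesystem, one way to opt in would be `fsspec.filesystem("https", client_kwargs={"trust_env": True})`, since fsspec forwards `client_kwargs` to `aiohttp.ClientSession`; this is a sketch, not necessarily what the eventual change to `reader.py` does.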

We are looking for proxy support in open Delta Sharing when reading data via the delta-sharing Python library.

@linzhou-db
Collaborator

Do they already have an idea of how to fix the issue?
If so, feel free to send out a PR, since this is OSS code.

@linzhou-db linzhou-db self-assigned this Jun 7, 2023
linzhou-db pushed a commit that referenced this issue Jun 14, 2023
* Update reader.py
@quertenmont

Facing the same issue.

@quertenmont

After your fix in #326, I can get one step further with my proxy configuration, but I am still having trouble:

  1. If I set `https_proxy` to an HTTPS proxy URL, aiohttp emits the following message:
    `HTTPS proxies https://mitmproxy:8080/ are not supported, ignoring`

  2. If I change my proxy configuration to use http instead of https,
    then my proxy server complains because the TLS handshake fails:
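
The first message reflects how aiohttp 3.8 treats proxy environment variables: the scheme of the proxy URL itself has to be `http://`, because the client talks plain HTTP to the proxy and tunnels TLS through it; an `https://` proxy URL is skipped with that warning. The stdlib-only sketch below is a simplified approximation of that rule, not aiohttp's actual code, and `usable_proxies` is a hypothetical name.

```python
import logging
from typing import Dict, Mapping
from urllib.parse import urlsplit

log = logging.getLogger("proxy-sketch")


def usable_proxies(proxies: Mapping[str, str]) -> Dict[str, str]:
    """Approximate aiohttp 3.8's filtering of environment proxies.

    Only http/https/ws/wss entries are considered. An entry whose proxy URL
    itself uses https:// or wss:// is dropped with a warning, mirroring the
    message quoted in item 1 above.
    """
    kept: Dict[str, str] = {}
    for proto, proxy_url in proxies.items():
        if proto not in ("http", "https", "ws", "wss"):
            continue  # e.g. the "no" entry produced by no_proxy
        scheme = urlsplit(proxy_url).scheme
        if scheme in ("https", "wss"):
            log.warning("%s proxies %s are not supported, ignoring", scheme.upper(), proxy_url)
            continue
        kept[proto] = proxy_url
    return kept
```

In other words, `https_proxy=http://mitmproxy:8080` is the form this rule accepts: the "https" in the variable name refers to the traffic being proxied, not to the connection to the proxy.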

[09:08:48.665][10.244.13.19:51180] client connect
[09:08:48.860][10.244.13.19:51180] server connect open-delta-sharing.s3.us-west-2.amazonaws.com:443 (52.218.196.225:443)
[09:08:49.207][10.244.13.19:51180] Client TLS handshake failed. The client does not trust the proxy's certificate for open-delta-sharing.s3.us-west-2.amazonaws.com (tlsv1 alert unknown ca)
[09:08:49.208][10.244.13.19:51180] client disconnect
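
The `tlsv1 alert unknown ca` line shows what happens once the plain-HTTP proxy URL is accepted: mitmproxy intercepts the tunneled TLS connection and presents a certificate signed by its own CA, which the client does not trust. A common remedy for intercepting proxies is to add the proxy's CA to the client's trust store. The sketch below assumes mitmproxy's default CA file `~/.mitmproxy/mitmproxy-ca-cert.pem` and uses a hypothetical helper name; it builds a verifying `ssl.SSLContext` that trusts the system CAs plus that extra CA. aiohttp accepts such a context through its `ssl` argument, but how to pass it into the delta-sharing reader is an open assumption.

```python
import ssl
from pathlib import Path
from typing import Optional

# Assumed default location of mitmproxy's CA certificate; adjust if the
# proxy runs with a custom configuration directory.
MITM_CA = Path.home() / ".mitmproxy" / "mitmproxy-ca-cert.pem"


def context_trusting_proxy_ca(extra_ca: Optional[Path] = None) -> ssl.SSLContext:
    """Return a TLS context that trusts the system CAs plus `extra_ca`.

    Certificate and hostname verification stay enabled; the context only
    widens the set of trusted issuers. A missing `extra_ca` file raises
    an OSError (FileNotFoundError) instead of being silently ignored.
    """
    ctx = ssl.create_default_context()  # loads the system trust store
    if extra_ca is not None:
        ctx.load_verify_locations(cafile=str(extra_ca))
    return ctx


# Usage (hypothetical): ctx = context_trusting_proxy_ca(MITM_CA)
```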

Any idea how I can sort this out?
Thanks in advance
Loic

@pavan-kumar-chalamcharla
Contributor Author

pavan-kumar-chalamcharla commented Jun 30, 2023

This looks like a limitation of aiohttp. The aiohttp documentation linked below says it supports "HTTP proxies and HTTP proxies that can be upgraded to HTTPS via the HTTP CONNECT method".
https://github.com/aio-libs/aiohttp/blob/master/docs/client_advanced.rst#proxy-support
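
"Upgraded to HTTPS via the HTTP CONNECT method" describes tunneling: the client opens a plain TCP connection to the proxy, sends `CONNECT host:443`, and after a `200` reply performs the TLS handshake with the target over that same socket. The sketch below illustrates the mechanism with hypothetical helper names; it builds the CONNECT request and shows the stdlib equivalent, `http.client.HTTPSConnection.set_tunnel`. It is not aiohttp's internal code.

```python
import http.client


def connect_request(host: str, port: int = 443) -> bytes:
    """Return the HTTP/1.1 CONNECT request a client sends to an HTTP proxy.

    After the proxy answers 200, TLS runs end to end with `host` through
    the tunnel, unless the proxy deliberately intercepts TLS (as mitmproxy
    does by default), in which case the client sees the proxy's certificate.
    """
    authority = f"{host}:{port}"
    return (
        f"CONNECT {authority} HTTP/1.1\r\n"
        f"Host: {authority}\r\n"
        "\r\n"
    ).encode("ascii")


def tunneled_connection(proxy_host: str, proxy_port: int,
                        target_host: str, target_port: int = 443) -> http.client.HTTPSConnection:
    """Stdlib version: connect to the proxy, tunnel to the target with CONNECT."""
    conn = http.client.HTTPSConnection(proxy_host, proxy_port)
    conn.set_tunnel(target_host, target_port)
    return conn  # no network traffic happens until conn.request(...)
```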

Check whether the workaround mentioned in the comment below works for you, and make sure you are using aiohttp v3.8:
aio-libs/aiohttp#6044 (comment)

@quertenmont

`setattr(asyncio.sslproto._SSLProtocolTransport, "_start_tls_compatible", True)`

does not make any difference for me.

However, I was able to make a connection via
`ALGO-->http-->MITM-PROXY-->HTTPS-->DELTASHARE-DATA`
if the Delta Sharing data host is included in the `--ignore-hosts` argument of mitmproxy.
See the docs here: https://docs.mitmproxy.org/dev/howto-ignoredomains/
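
`--ignore-hosts` takes regular expressions; matching connections are passed through without TLS interception, so the client completes the handshake with S3 itself, which is why the certificate error above disappears. The sketch below assumes that patterns are searched, case-insensitively, against a `host:port` string (as in the examples of the linked docs) and lets a candidate pattern be checked before it goes on the command line. The pattern and helper names are illustrative.

```python
import re
from typing import Iterable

# Illustrative pattern for the bucket seen in the proxy log above; the
# corresponding (assumed) invocation would be:
#   mitmproxy --ignore-hosts '^open-delta-sharing\.s3\.us-west-2\.amazonaws\.com:443$'
BUCKET_PATTERN = r"^open-delta-sharing\.s3\.us-west-2\.amazonaws\.com:443$"


def is_ignored(host: str, port: int, patterns: Iterable[str]) -> bool:
    """Return True if any pattern matches the "host:port" string."""
    target = f"{host}:{port}"
    return any(re.search(p, target, re.IGNORECASE) for p in patterns)
```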

Not ideal, but better than nothing.
It would be nice to have HTTPS end to end.
