[import] CDN protected sites cannot be proxied #2050

Open
kptdobe opened this issue Aug 11, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@kptdobe
Contributor

kptdobe commented Aug 11, 2022

Run hlx import and try to import the page https://www.globe.com.ph/business/enterprise.html. Nothing happens: the proxy returns a 403.

Open the proxy page in another browser tab, http://localhost:3001/business/enterprise.html?host=https%3A%2F%2Fwww.globe.com.ph, and you get this:

[screenshot of the proxy response]

The proxy request has exactly the same headers as the browser request. Cloudflare puts a lot of effort into keeping proxies, scripts, and bots (well, anything that is not a human...) from accessing the site. I do not think we can work around this.
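To reproduce the failure outside hlx import, the proxy URL can be built from the page URL. This is a minimal sketch with the pattern inferred only from the single example above (path kept, origin percent-encoded into a host query parameter, port 3001); the proxy_url function is hypothetical, and URLs that already carry a query string are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the local proxy URL for a page URL, following the
# pattern of the example in this issue (port and "host" parameter assumed from it).
proxy_url() {
  local target="$1" port="${2:-3001}"
  local scheme="${target%%://*}"
  local rest="${target#*://}"
  local host="${rest%%/*}"
  local path="/"
  # Keep the path if there is one (query strings are not handled).
  if [ "$rest" != "$host" ]; then
    path="/${rest#*/}"
  fi
  # Percent-encode ":" and "/" in the origin, e.g. https://x -> https%3A%2F%2Fx.
  local encoded="${scheme}://${host}"
  encoded="${encoded//:/%3A}"
  encoded="${encoded//\//%2F}"
  printf 'http://localhost:%s%s?host=%s' "$port" "$path" "$encoded"
}

# Usage:
#   proxy_url 'https://www.globe.com.ph/business/enterprise.html'
```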

kptdobe added the bug (Something isn't working) label on Aug 11, 2022
@tripodsan
Contributor

Maybe it's possible to detect this, then open the captcha page, somehow steal the token and send it along with the requests...
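As a starting point for the detection half of this idea, here is a minimal heuristic sketch, assuming the response headers were saved with curl -D. It reports a block when a cf-mitigated: challenge header (which Cloudflare documents for challenge pages) is present, or when a 403/503 is served by server: cloudflare. The is_cf_challenge name is hypothetical, and whether the hlx proxy passes these upstream headers through is not known from this thread, so the usage line queries the origin directly. Opening the captcha page and replaying its token are not covered.

```shell
#!/usr/bin/env bash
# Heuristic (hypothetical helper): does a saved header dump look like a Cloudflare block?
# Input: a file written by `curl -D <file>` (status line(s) plus "Name: value" lines).
is_cf_challenge() {
  local headers_file="$1" status
  # Strip CRs, then take the code from the last status line (redirects add earlier ones).
  status=$(tr -d '\r' < "$headers_file" | grep -E '^HTTP/' | tail -n 1 | awk '{print $2}')
  # Cloudflare marks challenge pages with a "cf-mitigated: challenge" header.
  if grep -qi '^cf-mitigated:[[:space:]]*challenge' "$headers_file"; then
    return 0
  fi
  # Fallback: a 403 or 503 served by Cloudflare.
  if { [ "$status" = "403" ] || [ "$status" = "503" ]; } &&
     grep -qi '^server:[[:space:]]*cloudflare' "$headers_file"; then
    return 0
  fi
  return 1
}

# Usage (needs network access):
#   curl -s -o /dev/null -D headers.txt 'https://www.globe.com.ph/business/enterprise.html'
#   is_cf_challenge headers.txt && echo "blocked by Cloudflare"
```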

@trieloff
Contributor

trieloff commented Aug 15, 2022

curl 'https://www.globe.com.ph/business/enterprise.html' \
-X 'GET' \
-H 'Cookie: __cf_bm=52rI9_zlhPg5F8ggmY2k74e2WngRqBcnkZxFJK9Szfs-1660555243-0-AVRprjhOVAUorYu8H231NlLOP0DQ37QUC7ttgc8ELRf2bM6KG57rR8FXwUYlwAC36PQUhfII0OGD6o5b+4+1Xo1NJ4TBfg64v8CNV/HPspLbIoFuFHmL4bTan8uzwdLODevMYw6NZc0AkASSqujJQLM8NaA2EHBZA1AA0VAFoVBr; cas_globe_previous_url=https://www.globe.com.ph/business/enterprise.html; policy=true; AWSELB=A1B125F1125C8DEEC3E5547E6F45EDCD90C6005B09A7E4ECA99D4520B2712C3EE6A9F70C5DB9AD2BC4E481D67EA0B261FCB3F41AC317CF068D3D6D7964D471101F690D5CA5; AWSELBCORS=A1B125F1125C8DEEC3E5547E6F45EDCD90C6005B09A7E4ECA99D4520B2712C3EE6A9F70C5DB9AD2BC4E481D67EA0B261FCB3F41AC317CF068D3D6D7964D471101F690D5CA5' \
-H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
-H 'Host: www.globe.com.ph' \
-H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.6 Safari/605.1.15' \
-H 'Accept-Language: en-GB,en;q=0.9' \
-H 'Accept-Encoding: gzip, deflate, br' \
-H 'Connection: keep-alive' --output - | gunzip

The __cf_bm cookie is essential. Adding it as a CLI option might work (you'd still need to go into dev tools to steal the cookie).
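A rough sketch of what passing the cookie through could look like, using plain curl rather than hlx: the function names and the three-argument interface are hypothetical, not an existing hlx option. It sends the copied __cf_bm value together with the User-Agent of the browser it came from (as in the request above), and uses curl's --compressed instead of piping to gunzip, so curl only advertises encodings it can decode and decodes the body itself. Cloudflare describes __cf_bm as a short-lived cookie, so a copied value would need refreshing; whether it is accepted from a different client is not something this sketch establishes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of hlx: forward a __cf_bm value copied from dev tools.

# Print the Cookie header for a __cf_bm value; fail with a message if it is empty.
cf_cookie_header() {
  local cf_bm="$1"
  if [ -z "$cf_bm" ]; then
    echo "missing __cf_bm value (copy it from the browser's dev tools)" >&2
    return 1
  fi
  printf 'Cookie: __cf_bm=%s' "$cf_bm"
}

# Fetch a page with the cookie and the User-Agent of the browser it was copied from.
# --compressed lets curl negotiate and decode the response encoding itself.
fetch_with_cf_bm() {
  local url="$1" cf_bm="$2" user_agent="$3" header
  header=$(cf_cookie_header "$cf_bm") || return 1
  curl -sS --compressed "$url" -H "$header" -H "User-Agent: $user_agent"
}

# Usage (needs network access; values are placeholders):
#   fetch_with_cf_bm 'https://www.globe.com.ph/business/enterprise.html' \
#     '<__cf_bm value>' '<browser User-Agent>'
```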

@ghost

ghost commented Dec 22, 2023

Please, can someone help me understand what this is all about?
