executeClientScripts semantics #1128

Open
Phyks opened this issue Jan 8, 2025 · 2 comments

Phyks commented Jan 8, 2025

Hi,

executeClientScripts is currently used not simply to "execute client scripts", but to upgrade from a simple node-fetch to a full browser fetcher. This has the side effect of allowing client scripts to run, but it does much more than that, and it seems to be used about as much for bot detection evasion (a Puppeteer browser has a stealthier signature than node-fetch) as for actually running client scripts.

This has multiple implications:

For these reasons, I believe that executeClientScripts should not be a single binary parameter but a pair of binary parameters: useFullBrowser and executeClientScripts. These would have the following semantics:

| executeClientScripts \ useFullBrowser | false (default) | true |
| --- | --- | --- |
| false | curl-backed fetcher / no script execution | Puppeteer without JS |
| true (default) | curl-backed fetcher / script execution with JSDom | Puppeteer |
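
To make the four combinations concrete, here is a minimal TypeScript sketch of how the two flags could be resolved into a fetching strategy. The names `FetchOptions`, `FetcherKind` and `selectFetcher` are hypothetical and not part of the current codebase; the defaults follow the table above (useFullBrowser false, executeClientScripts true).

```typescript
// Hypothetical names for illustration only; none of these exist in the engine today.
type FetcherKind =
  | "curl-no-scripts" // curl-backed fetcher, no script execution
  | "curl-jsdom" // curl-backed fetcher, scripts executed with JSDom
  | "puppeteer-no-js" // Puppeteer with JavaScript disabled
  | "puppeteer"; // full Puppeteer

interface FetchOptions {
  useFullBrowser?: boolean;
  executeClientScripts?: boolean;
}

function selectFetcher(options: FetchOptions = {}): FetcherKind {
  // Defaults taken from the proposed table.
  const useFullBrowser = options.useFullBrowser ?? false;
  const executeClientScripts = options.executeClientScripts ?? true;

  if (useFullBrowser) {
    return executeClientScripts ? "puppeteer" : "puppeteer-no-js";
  }
  return executeClientScripts ? "curl-jsdom" : "curl-no-scripts";
}
```

With this sketch, `selectFetcher({ useFullBrowser: true, executeClientScripts: false })` resolves to the new intermediary state, Puppeteer without JS.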

This would likely be a backward-incompatible change to the service declarations, so I am not sure how to properly handle such an evolution.
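
One possible way to soften the transition, sketched here purely as an assumption, would be to treat a declaration without useFullBrowser as using the legacy semantics and translate it on load. `LegacyOptions`, `NewOptions` and `migrateOptions` are hypothetical names; the mapping assumes the legacy `executeClientScripts: true` meant "full browser with scripts" and anything else meant "simple fetch without scripts".

```typescript
// Hypothetical migration helper; not an existing API of the engine.
interface LegacyOptions {
  executeClientScripts?: boolean;
}

interface NewOptions {
  useFullBrowser: boolean;
  executeClientScripts: boolean;
}

function migrateOptions(options: LegacyOptions & Partial<NewOptions>): NewOptions {
  // A declaration that already sets useFullBrowser is assumed to use the new semantics,
  // with executeClientScripts defaulting to true as in the proposed table.
  if (options.useFullBrowser !== undefined) {
    return {
      useFullBrowser: options.useFullBrowser,
      executeClientScripts: options.executeClientScripts ?? true,
    };
  }

  // Legacy semantics (assumption): the single flag toggled the full browser,
  // and scripts only ran when the full browser was used.
  const legacyFlag = options.executeClientScripts === true;
  return { useFullBrowser: legacyFlag, executeClientScripts: legacyFlag };
}
```

The drawback of this approach is that a new-style declaration relying on both defaults is indistinguishable from a legacy one, so an explicit version marker in declarations might be needed instead.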

Best,

Member

Ndpnt commented Jan 22, 2025

Hi @Phyks,

Thank you so much for your valuable feedback on this option.
You’ve raised some very relevant points about the current implementation and how it could be improved.

At the moment, the team is focused on other priorities, so I can’t commit to a timeline or guarantee when we will investigate this question further, but I will keep your suggestions in mind.

Thanks again for taking the time to share this feedback, I really appreciate your input!

Member

MattiSG commented Jan 24, 2025

Interesting idea, thanks!

If I remember correctly, we had sampling data that proved that executeClientScripts was not so prevalent and that the basic fetcher worked in most cases. Adding an intermediary state could be interesting if it both significantly:

  1. Increased bot blocker evasion.
  2. Decreased resource consumption compared to starting a full Puppeteer.

Before adding complexity to the codebase and config, I believe it would be critical to gather the following data:

  1. Prevalence of executeClientScripts: true in a large sample (ideally whole federation, otherwise I'd suggest Contrib).
  2. Proportion of failures that are corrected by both full Puppeteer and non-JS Puppeteer in a large sample.
  3. Resource consumption difference between full Puppeteer and non-JS Puppeteer (CPU cycles, RAM usage, time to start).
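
As a rough illustration of the first data point, the prevalence could be computed along these lines once declarations are loaded in memory. The `ServiceDeclaration` shape below is a simplified assumption (one `terms` map per service, with an optional `executeClientScripts` flag per terms type), and `measurePrevalence` is a hypothetical helper; loading the declaration files is left out.

```typescript
// Simplified, assumed shape of a service declaration; real declarations carry more fields.
interface TermsDeclaration {
  executeClientScripts?: boolean;
}

interface ServiceDeclaration {
  name: string;
  terms: Record<string, TermsDeclaration>;
}

interface Prevalence {
  total: number; // number of terms declarations inspected
  withScripts: number; // how many set executeClientScripts: true
  ratio: number; // withScripts / total, or 0 when there is nothing to count
}

function measurePrevalence(services: ServiceDeclaration[]): Prevalence {
  let total = 0;
  let withScripts = 0;

  for (const service of services) {
    for (const termsType of Object.keys(service.terms)) {
      total += 1;
      if (service.terms[termsType].executeClientScripts === true) {
        withScripts += 1;
      }
    }
  }

  return { total, withScripts, ratio: total === 0 ? 0 : withScripts / total };
}
```

The other two data points (failures corrected, resource consumption) would need actual fetch runs and cannot be derived from declarations alone.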
