executeClientScripts is currently used not simply to "execute client scripts", but to upgrade from a simple node-fetch to a full browser fetcher. Running client scripts is one effect of that upgrade, but it does much more, and the option seems to be used about as often for bot detection evasion (a Puppeteer browser has a stealthier signature than node-fetch) as for actually running client scripts.
This has multiple implications:
- Spawning a full browser is resource-intensive, so it should be avoided wherever possible.
- Client scripts can also be run with a plain DOM fetcher, since JSDom is already embedded in the codebase. This requires configuring JSDom to enable in-HTML script execution as well as external resource loading (https://github.com/jsdom/jsdom?tab=readme-ov-file#executing-scripts).
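As a rough illustration of the second point, the JSDom settings in question could be derived from a flag as sketched below. The function name `jsdomOptionsFor` is hypothetical and not part of the codebase; `runScripts` and `resources` are the options described in the linked README section.

```javascript
// Sketch: derive JSDom constructor options from a hypothetical
// `executeClientScripts` flag. Nothing here spawns a browser.
function jsdomOptionsFor(executeClientScripts) {
  if (!executeClientScripts) {
    // JSDom's default behaviour: <script> elements are parsed but not executed.
    return {};
  }
  return {
    runScripts: "dangerously", // execute scripts found in the page
    resources: "usable",       // load external resources such as <script src="...">
  };
}

// Intended use (requires the `jsdom` package, which is not imported here):
//   const { JSDOM } = require("jsdom");
//   const dom = new JSDOM(html, { url: pageUrl, ...jsdomOptionsFor(true) });
//   const text = dom.window.document.body.textContent;
```

Note that the JSDom README warns that `runScripts: "dangerously"` is not a secure sandbox, so this mode would run untrusted page scripts inside the Node.js process.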
For these reasons, I believe that executeClientScripts should not be a single binary parameter but a pair of binary parameters: useFullBrowser and executeClientScripts. These would have the following semantics:
| executeClientScripts / useFullBrowser | false (default) | true |
| --- | --- | --- |
| false | curl-backed fetcher / no script execution | Puppeteer without JS |
| true (default) | curl-backed fetcher / script execution with JSDom | Puppeteer |
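The four combinations above can be sketched as a small selection function. This is only an illustration of the proposed semantics: the function name `selectFetcher` and the returned mode labels are hypothetical, and the defaults follow the table (executeClientScripts defaults to true, useFullBrowser to false).

```javascript
// Sketch: map the two proposed flags to one of four fetcher modes.
// The mode labels are placeholders, not identifiers from the codebase.
function selectFetcher({ executeClientScripts = true, useFullBrowser = false } = {}) {
  if (useFullBrowser) {
    // Full browser, with or without JavaScript enabled in the page.
    return executeClientScripts ? "puppeteer" : "puppeteer-without-js";
  }
  // Lightweight fetch, optionally followed by script execution in JSDom.
  return executeClientScripts ? "plain-fetch-with-jsdom-scripts" : "plain-fetch";
}
```

With no flags set, this sketch picks the lightweight fetcher with JSDom script execution, so a full browser is only started when a declaration asks for it explicitly.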
This would likely be a backward-incompatible change to the service declarations, so I am not sure how best to handle such an evolution.
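One possible way to soften the transition, offered only as a sketch, would be a one-off migration that rewrites existing declarations so that they keep their current behaviour. The function `migrateLegacyDeclaration` is hypothetical, and it assumes a declaration entry is a plain object carrying an optional executeClientScripts flag.

```javascript
// Sketch: rewrite a legacy entry so that its behaviour is preserved
// under the proposed two-flag scheme.
function migrateLegacyDeclaration(legacy) {
  const { executeClientScripts: legacyFlag, ...rest } = legacy;
  if (legacyFlag === true) {
    // Today, `true` means "use the full browser", so keep doing that.
    return { ...rest, useFullBrowser: true, executeClientScripts: true };
  }
  // Today, absent or `false` means "plain fetch, no scripts". Both flags are
  // written explicitly because the proposed default for
  // executeClientScripts is true.
  return { ...rest, useFullBrowser: false, executeClientScripts: false };
}
```

Declarations that only needed script execution, and not a stealthier browser signature, could then be relaxed to the JSDom mode case by case.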
Best,
Thank you so much for your valuable feedback on this option.
You’ve raised some very relevant points about the current implementation and how it could be improved.
At the moment, the team is focused on other priorities, so I can’t commit to a timeline or guarantee when we will investigate this question further, but I will keep your suggestions in mind.
Thanks again for taking the time to share this feedback, I really appreciate your input!
If I remember correctly, we had sampling data showing that executeClientScripts was not very prevalent and that the basic fetcher worked in most cases. Adding an intermediate state could be interesting if it both significantly:
- Increased evasion of bot blockers.
- Decreased resource consumption compared to starting a full Puppeteer.
Before adding complexity to the codebase and config, I believe it would be critical to gather the following data:
- Prevalence of executeClientScripts: true in a large sample (ideally whole federation, otherwise I'd suggest Contrib).
- Proportion of failures that are corrected by both full Puppeteer and non-JS Puppeteer in a large sample.
- Resource consumption difference between full Puppeteer and non-JS Puppeteer (CPU cycles, RAM usage, time to start).
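The first measurement could start from something as simple as the sketch below. It assumes, purely for illustration, that each declaration has already been loaded as an object with a `terms` map whose entries may carry an executeClientScripts flag; the actual declaration shape and loading step may differ, and `executeClientScriptsPrevalence` is a hypothetical name.

```javascript
// Sketch: count how many terms entries set executeClientScripts to true.
function executeClientScriptsPrevalence(declarations) {
  let total = 0;
  let enabled = 0;
  for (const declaration of declarations) {
    // Declarations without a `terms` map contribute nothing to the count.
    for (const terms of Object.values(declaration.terms ?? {})) {
      total += 1;
      if (terms.executeClientScripts === true) enabled += 1;
    }
  }
  // Avoid dividing by zero on an empty sample.
  return { total, enabled, ratio: total === 0 ? 0 : enabled / total };
}
```

Run over a whole collection, the resulting ratio would indicate how many declarations an intermediate mode could affect at most.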