Description
When working with large proxy pools (e.g., Apify RESIDENTIAL), I observe significant RAM growth: memory consumption increases by more than a gigabyte within a few minutes of active scraping.
My version:
The issue appears to be caused by creating a separate HTTP session for each proxy (https://github.com/apify/crawlee-python/blob/master/src/crawlee/http_clients/_httpx.py#L132), combined with the high default maximum `SessionPool` size of 1000 sessions. Because each session is typically tied to its own proxy URL, the crawler ends up creating a new HTTP session for almost every new request during its initial run.
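A simplified model of the pattern described above, assuming one client is cached per proxy URL and never evicted. This is not crawlee's actual code: `DummyClient`, `PerProxyClientCache`, and the `proxy.example` URLs are hypothetical stand-ins used only to show how the cache grows with the number of sessions.

```python
from dataclasses import dataclass


@dataclass
class DummyClient:
    """Hypothetical stand-in for an HTTP client bound to a single proxy."""

    proxy_url: str
    closed: bool = False

    def close(self) -> None:
        self.closed = True


class PerProxyClientCache:
    """Hypothetical one-client-per-proxy cache with no eviction or cleanup."""

    def __init__(self) -> None:
        self._clients: dict[str, DummyClient] = {}

    def get_client(self, proxy_url: str) -> DummyClient:
        # Reuse the client for a known proxy; otherwise create one and keep it forever.
        if proxy_url not in self._clients:
            self._clients[proxy_url] = DummyClient(proxy_url)
        return self._clients[proxy_url]

    def __len__(self) -> int:
        return len(self._clients)


# Simulate 1000 sessions, each pinned to a different (hypothetical) proxy URL.
cache = PerProxyClientCache()
for session_id in range(1000):
    cache.get_client(f"http://session-{session_id}@proxy.example:8000")

print(len(cache))  # 1000 clients kept alive, one per session
```

With real HTTP clients, each cached entry holds its own connection pool and buffers, which is why memory grows roughly in step with the number of distinct proxy URLs.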
Because created HTTP sessions are never cleaned up, the problem will likely get worse when the proxy pool contains a large number of "bad" proxies.
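One possible shape for the missing cleanup logic, sketched under assumptions: a least-recently-used cache with a fixed cap that closes evicted clients, plus an explicit `discard()` for proxies that turned out to be bad. All names here (`BoundedClientCache`, `max_clients`, `discard`, `DummyClient`) are hypothetical and not part of crawlee's API; a real client would likely need an async close instead of the synchronous `close()` used here for brevity.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class DummyClient:
    """Hypothetical stand-in for an HTTP client bound to a single proxy."""

    proxy_url: str
    closed: bool = False

    def close(self) -> None:
        self.closed = True


class BoundedClientCache:
    """Hypothetical per-proxy client cache with LRU eviction and explicit cleanup."""

    def __init__(self, max_clients: int) -> None:
        self._max_clients = max_clients
        self._clients: OrderedDict[str, DummyClient] = OrderedDict()

    def get_client(self, proxy_url: str) -> DummyClient:
        client = self._clients.get(proxy_url)
        if client is not None:
            # Mark as most recently used.
            self._clients.move_to_end(proxy_url)
            return client
        client = DummyClient(proxy_url)
        self._clients[proxy_url] = client
        if len(self._clients) > self._max_clients:
            # Evict and close the least recently used client.
            _, evicted = self._clients.popitem(last=False)
            evicted.close()
        return client

    def discard(self, proxy_url: str) -> None:
        # Close the client right away when its proxy/session is retired as "bad".
        client = self._clients.pop(proxy_url, None)
        if client is not None:
            client.close()

    def __len__(self) -> int:
        return len(self._clients)


cache = BoundedClientCache(max_clients=100)
first = cache.get_client("http://session-0@proxy.example:8000")
for session_id in range(1, 1000):
    cache.get_client(f"http://session-{session_id}@proxy.example:8000")

print(len(cache))    # 100: the cache never exceeds its cap
print(first.closed)  # True: the oldest client was evicted and closed
```

The cap bounds memory regardless of pool size, at the cost of recreating clients for proxies that come back after eviction.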