High RAM consumption by any HttpCrawlers when working with large proxy pools

When working with large proxy pools (e.g., Apify RESIDENTIAL), I observe significant RAM usage growth. Memory consumption increases by more than a gigabyte within a few minutes during active scraping.

My version:
The issue appears to be related to creating an HTTP session for each proxy https://github.com/apify/crawlee-python/blob/master/src/crawlee/http_clients/_httpx.py#L132 combined with the high default maximum SessionPool size of 1000 sessions. This results in the crawler creating a new HTTP session for almost every new request during its initial run.

The absence of cleanup logic for created HTTP sessions will likely worsen the situation when the proxy pool contains a large number of "bad" proxies.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

High RAM consumption by any HttpCrawlers when working with large proxy pools #895

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

High RAM consumption by any HttpCrawlers when working with large proxy pools #895

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions