
High RAM consumption by any HttpCrawlers when working with large proxy pools #895

Closed
@Mantisus

Description


When working with large proxy pools (e.g., Apify RESIDENTIAL), I observe significant RAM usage growth. Memory consumption increases by more than a gigabyte within a few minutes during active scraping.

My hypothesis:
The issue appears to be related to the creation of an HTTP session for each proxy (https://github.com/apify/crawlee-python/blob/master/src/crawlee/http_clients/_httpx.py#L132), combined with the high default maximum SessionPool size of 1000 sessions. As a result, the crawler creates a new HTTP session for almost every new request during its initial run.
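To make the mechanics concrete, here is a minimal sketch of the per-proxy client caching pattern described above, using plain httpx (the cache and helper names are illustrative, not the actual crawlee internals):

```python
import httpx

# Illustrative cache of one AsyncClient per proxy URL (assumed pattern,
# not the real crawlee code).
_clients_by_proxy: dict[str, httpx.AsyncClient] = {}


def get_client(proxy_url: str | None) -> httpx.AsyncClient:
    """Return a cached client for the given proxy, creating one if needed."""
    key = proxy_url or '<no proxy>'
    if key not in _clients_by_proxy:
        # Every distinct proxy URL gets its own client with its own
        # connection pool. With a rotating residential pool, nearly every
        # request can carry a new proxy URL, so the cache only grows.
        _clients_by_proxy[key] = httpx.AsyncClient(proxy=proxy_url)
    return _clients_by_proxy[key]
```

With thousands of distinct proxy URLs, such a cache ends up holding thousands of live clients and their connection pools, which matches the memory growth I observe.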

The absence of cleanup logic for created HTTP sessions will likely worsen the situation when the proxy pool contains a large number of "bad" proxies.
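As a temporary workaround on my side, capping the session pool size seems to limit how many distinct sessions (and therefore per-proxy HTTP clients) get created. A minimal sketch, assuming SessionPool's max_pool_size argument and the crawler's session_pool parameter behave the way I expect:

```python
from crawlee.http_crawler import HttpCrawler
from crawlee.sessions import SessionPool

# Assumption: a smaller max_pool_size bounds the number of sessions the
# crawler creates, and with it the number of per-proxy HTTP clients.
crawler = HttpCrawler(
    session_pool=SessionPool(max_pool_size=50),
)
```

This only caps the growth; it does not address the missing cleanup of clients created for proxies that turn out to be bad.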

Labels

bug: Something isn't working.
t-tooling: Issues with this label are in the ownership of the tooling team.
