Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swarm.ResourceMgr.MaxMemory vs GOMEMLIMIT #10621

Open
3 tasks done
lidel opened this issue Dec 10, 2024 · 1 comment
Open
3 tasks done

Swarm.ResourceMgr.MaxMemory vs GOMEMLIMIT #10621

lidel opened this issue Dec 10, 2024 · 1 comment
Labels
kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization

Comments

@lidel
Copy link
Member

lidel commented Dec 10, 2024

Checklist

Installation method

dist.ipfs.tech or ipfs-update

Version

v0.32.1

Config

https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgrmaxmemory

Description

Problem

Swarm.ResourceMgr.MaxMemory is often interpreted and used as "memory limit for Kubo", while it is only passed to go-libp2p and used for limited set of things (transports, but not bitswap).

Ref. https://github.com/libp2p/go-libp2p/tree/master/p2p/host/resource-manager#readme

The default Swarm.ResourceMgr.MaxMemory is dynamic and conservative:

Default: [TOTAL_SYSTEM_MEMORY]/2 Type: optionalBytes

The actual memory limit people want to set is for entire Kubo, and that is likely GOMEMLIMIT but we do not set it by default.

This means:

  • some OOM could be avoided, but are not
  • some leaks could be mitigated, but are not
  • users who set Swarm.ResourceMgr.MaxMemory are still getting OOM, because it does not cover all the bases
  • bad ux, and perception of "Kubo being memory hog"

Solution brainstorm

  • Have a sensible (dynamic) limit by default
  • Allow user to override default
    • Respect GOMEMLIMIT if is present in env, it should always take precedence, and implicit default for Swarm.ResourceMgr.MaxMemory should be calculated from that ceiling.
    • Refuse to start if user manually set Swarm.ResourceMgr.MaxMemory higher than implicit (or explicit) GOMEMLIMIT
    • TBD: to solve UX problem of user searching for memory in config and only finding
    • Detect when users are running with less memory, or limits than https://github.com/ipfs/kubo?tab=readme-ov-file#minimal-system-requirements
      ResourceMgr.MaxMemory
      • Perhaps have clear Limits.MaxMemory ? Or generic Env.* and support setting Env.GOMEMLIMIT via config?

cc @sukunrt @ipfs/kubo-maintainers for feedback / ideas.

@lidel lidel added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Dec 10, 2024
@hsanjuan
Copy link
Contributor

hsanjuan commented Dec 11, 2024

I cannot think of a program in my computer that has a "MaxMemory" setting and starts torpedoing itself when crossed, so I think this type of setting is not such a good idea, even though it looked pretty good when introduced as a way of making automated decision on the actual limits.

MaxMemory is a bad proxy for something that is mostly a "MaxConnections" thing. I think traditionally p2p programs offered settings like:

  • Max connections - offered by proxy by MaxMemory
  • Max up/down bandwidth - inexistent
  • Max download workers (i.e. parallel downloads) - 10 different knobs related to bitswap, reprovider etc, some exposed and some not

GOMEMLIMIT on the other side forces go to GC more often and it is not a real limit, possibly just makes performance worse and it's going to confuse users in the same way. Users can adjust it though if memory usage pattern needs it.

Also, it might be that the problem is that Kubo IS a memory hog due to constant bugs in QUIC. I am not sure if there was ever a Kubo release that did not suffer from QUIC leaking memory in one way or other. I might be wrong but my impression is that QUIC is a permanent suspect in all discussions, release after release. As for ideas:

  • Fix QUIC
  • Remove MaxMemory and expose instead MaxConnections, MaxStreamsPerConnection (inbound/outbound limits can probably be derived, or exposed directly). This would at least lower expectations and be more explicit on what gets limited.

lidel added a commit that referenced this issue Dec 11, 2024
related to ux problem described in #10621
lidel added a commit that referenced this issue Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

2 participants