Move node-exporter to its own container #2290

Open
Tracked by #1943
zimnx opened this issue Dec 20, 2024 · 8 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@zimnx
Collaborator

zimnx commented Dec 20, 2024

Today, it runs as part of the Scylla container. This is not good for multiple reasons:

  • It can interfere with Scylla's work (resulting in higher latency), it consumes part of the memory allocated to the Scylla container, etc.
  • We should be able to update them independently, but today we can't.

Getting this done will take some work, but let's first design HOW we'd do it, then see what needs to change in the Scylla container.

@zimnx zimnx added kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Dec 20, 2024
@mykaul
Contributor

mykaul commented Dec 23, 2024

I don't know how to mark it as NOT priority/important-soon; it's not that important to complete soon, unless we can provide evidence that it has a noticeable impact on latency.

@tnozicka
Member

I don't know how to mark it as NOT priority/important-soon

/priority important-longterm

unless we can provide evidence it has a noticeable impact on latency.

All the processes running in the same container impact ScyllaDB latency by stealing CPU cycles from it. We have tracked down one instance where the culprit was the operator sidecar updating its cache. I'd expect similar ones to occur as well, but they are quite hard to catch at the system level.

@scylla-operator-bot scylla-operator-bot bot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Dec 23, 2024
@tnozicka
Member

/remove-priority important-soon

@scylla-operator-bot scylla-operator-bot bot removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Dec 23, 2024
@gecube

gecube commented Dec 23, 2024

@tnozicka Hi!

All the processes running in the same container impact ScyllaDB latency by stealing CPU cycles from it. We have tracked down one instance where the culprit was the operator sidecar updating its cache. I'd expect similar ones to occur as well, but they are quite hard to catch at the system level.

I think that even if they ran in the same pod (but in different containers) on the same host, you would still observe latency issues, no?

@tnozicka
Member

I think that even if they ran in the same pod (but in different containers) on the same host, you would still observe latency issues, no?

The cgroup limits are applied per container.

@gecube

gecube commented Dec 23, 2024

From this perspective I agree, but we still have shared CPU, memory, and disk on the same node.

@tnozicka
Member

With the Guaranteed QoS class, cgroup limits, and pinned cores, it's pretty much isolated.
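A simplified sketch of the Kubernetes rules behind this, applied to a pod where node-exporter runs in its own container. A pod gets the Guaranteed QoS class only when every container sets CPU and memory limits and its requests equal those limits (an unset request defaults to the limit). With the kubelet's static CPU manager policy, only containers in such a pod that have an integer CPU request get exclusive cores. The `Resources` type, the helper functions, and the resource values are hypothetical, and the model ignores init containers and other details.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Resources:
    """Hypothetical per-container spec: CPU in millicores, memory in bytes."""
    cpu_request_m: Optional[int] = None
    cpu_limit_m: Optional[int] = None
    mem_request: Optional[int] = None
    mem_limit: Optional[int] = None


def _effective(request: Optional[int], limit: Optional[int]) -> Optional[int]:
    # An unset request defaults to the limit when a limit is set.
    return request if request is not None else limit


def pod_is_guaranteed(containers: Dict[str, Resources]) -> bool:
    # Guaranteed QoS (simplified): every container sets CPU and memory
    # limits, and its effective requests equal those limits.
    for r in containers.values():
        if r.cpu_limit_m is None or r.mem_limit is None:
            return False
        if _effective(r.cpu_request_m, r.cpu_limit_m) != r.cpu_limit_m:
            return False
        if _effective(r.mem_request, r.mem_limit) != r.mem_limit:
            return False
    return True


def gets_exclusive_cores(pod_guaranteed: bool, r: Resources) -> bool:
    # Static CPU manager policy (simplified): only containers with an integer
    # CPU request in a Guaranteed pod get exclusive cores; the others run on
    # the shared pool, which excludes the exclusively assigned cores.
    cpu = _effective(r.cpu_request_m, r.cpu_limit_m)
    return pod_guaranteed and cpu is not None and cpu > 0 and cpu % 1000 == 0


GiB, MiB = 2**30, 2**20
# Hypothetical sizing for illustration only.
pod = {
    "scylla": Resources(4000, 4000, 16 * GiB, 16 * GiB),
    "node-exporter": Resources(100, 100, 64 * MiB, 64 * MiB),
}
guaranteed = pod_is_guaranteed(pod)
for name, res in pod.items():
    print(name, "exclusive cores:", gets_exclusive_cores(guaranteed, res))
```

Here the pod stays Guaranteed, Scylla gets its 4 pinned cores, and node-exporter's fractional `100m` request keeps it on the shared pool, away from Scylla's cores, under its own per-container cgroup limits.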

@mykaul
Contributor

mykaul commented Dec 24, 2024

BTW, we recently removed some (many) of the default collectors from node-exporter, so there should be somewhat less noise. Perhaps when running on K8s we can use even fewer collectors? (But I think the path forward should focus on splitting it into its own container.)

Projects
None yet
Development

No branches or pull requests

4 participants