Replies: 11 comments 12 replies
-
If I'm going to hazard a guess, it's the metrics collection for Kubernetes logs that's doing all the allocating. If that can't be optimized we would happily give up per-pod metrics; we already remove that cardinality because our monitoring system can't handle the sheer amount of data for the cluster.
-
Generally speaking, we're aware that Vector allocates more than it would need to given an optimal design of each subsystem. As the authors, we don't love this fact. Vector could definitely perform better and use resources more efficiently. That said, many customers/users often report increased performance and better resource efficiency after switching to Vector. We try to focus our time where there seems to be the most demand/desire, which has been firmly in the "features" camp for a while now.

Internal telemetry has always been an annoying thorn in our side due to the allocations required to get all of the labels/tags into the metrics we need to exist. We have one long-term initiative to port as much of our internal telemetry as possible over to a new design that eschews most, if not all, of the runtime allocations you might currently see happening for internal telemetry. This work has to happen on a component-by-component basis, though.

We're always open to PRs, or detailed issue reports, for things that could clearly be improved. Please feel free to open an issue for this specifically.
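For illustration, here is a minimal sketch of the "register once, increment cheaply" idea described above. This is not Vector's actual telemetry API; the registry, handle, and metric/label names are hypothetical stand-ins for the real design:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Handle to a counter whose name and labels were resolved once at
/// registration time, so the per-event hot path is just an atomic add.
#[derive(Clone)]
struct CounterHandle(Arc<AtomicU64>);

impl CounterHandle {
    fn increment(&self, n: u64) {
        self.0.fetch_add(n, Ordering::Relaxed);
    }
}

/// Hypothetical registry: allocates the label strings exactly once per
/// (metric, label set) instead of on every event.
#[derive(Default)]
struct Registry {
    counters: HashMap<(String, Vec<(String, String)>), CounterHandle>,
}

impl Registry {
    fn register_counter(&mut self, name: &str, labels: &[(&str, &str)]) -> CounterHandle {
        let key = (
            name.to_owned(),
            labels
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        );
        self.counters
            .entry(key)
            .or_insert_with(|| CounterHandle(Arc::new(AtomicU64::new(0))))
            .clone()
    }
}

fn main() {
    let mut registry = Registry::default();

    // Registration happens once, when the component is built.
    let events_in = registry
        .register_counter("component_received_events_total", &[("component", "kubernetes_logs")]);

    // Hot path: no string formatting or label cloning, just an atomic add.
    for _ in 0..1_000 {
        events_in.increment(1);
    }

    assert_eq!(events_in.0.load(Ordering::Relaxed), 1_000);
}
```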
-
Threw together a hackish build today and installed it on our development cluster. All this build does is remove the per-pod cardinality from within the Kubernetes metrics. I know, not an upstreamable solution. However I believe the experiment proves the resource-waste point well. master...X4BNet:vector:exp/remove-kuberentes-pod-metrics

The nodes in this cluster tend to send more log messages on average as they run development builds and can be a bit overloaded at times (especially with Vector typically eating a third of each node's resources).

Before (600 - 800m, saturated CPU):
After (19m - 60m over 20 minutes, 29m avg):
Same or higher log volume (I tested by stressing some loggy services). And with that, the development cluster is no longer saturated just handling Vector :)

@tobz you mentioned an alternative telemetry API. Are there details on this, or an example component for which it has been implemented?
-
To note - I achieved a 2.5x performance boost in the "File source -> Blackhole sink" scenario just by commenting out two metrics in the File source. For us the File source is the main use case, so we are very interested in improvements in this area as well.
-
For anyone curious what the next big consumers of resources are after metrics:
So something doing memory copies in place of memmoves / pointer logic? I think in order to get data on that, a Rust-level capture would need to be used.
-
Is it just me or does this profile result not look like the typical output of a release build?
It's compiled in release mode. Why does a release build spend 19% of its time (after the Kubernetes metrics waste is removed) capturing short stack traces? Did I screw up and somehow build for debug?
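One hedged explanation for stack-trace capture showing up even in an optimized build, independent of debug vs. release, is a value that calls std::backtrace::Backtrace::capture() every time it is constructed; whether that is what this particular profile shows would need the symbols from the capture. A self-contained sketch of the pattern, with a hypothetical error type:

```rust
use std::backtrace::Backtrace;

// `Backtrace::capture()` walks the stack even in an optimized build whenever
// RUST_BACKTRACE / RUST_LIB_BACKTRACE enable it, so constructing a value like
// this on a hot path can show up in release-mode profiles too.
#[derive(Debug)]
struct TracedError {
    message: String,
    backtrace: Backtrace,
}

impl TracedError {
    fn new(message: impl Into<String>) -> Self {
        Self {
            message: message.into(),
            backtrace: Backtrace::capture(),
        }
    }
}

fn main() {
    let err = TracedError::new("example failure");
    // `status()` reports whether a backtrace was actually captured,
    // which depends on the environment variables above.
    println!("{}: {:?}", err.message, err.backtrace.status());
}
```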
-
I found this thread while looking into Vector performance. If there are any tools or instructions for profiling Vector, I would be happy to contribute some reports as well.
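As a starting point, one way to get a Rust-level CPU profile is an in-process sampler such as the pprof crate; this is an assumption about useful tooling rather than an official Vector workflow, and it needs the crate's "flamegraph" feature for the SVG output:

```rust
// A minimal in-process CPU profiling sketch using the `pprof` crate with its
// "flamegraph" feature. Treat the exact setup as an assumption; a running
// Vector process is usually profiled externally, but this shows the idea.
use std::fs::File;

fn busy_work() -> u64 {
    // Stand-in workload so the profiler has something to sample.
    (0..5_000_000u64).map(|i| i.wrapping_mul(31).rotate_left(7)).sum()
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Sample the process at 100 Hz while the workload runs.
    let guard = pprof::ProfilerGuard::new(100)?;

    let result = busy_work();
    println!("workload result: {result}");

    // Write an SVG flamegraph of the collected samples.
    let report = guard.report().build()?;
    report.flamegraph(File::create("flamegraph.svg")?)?;
    Ok(())
}
```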
-
@splitice, Hi! I wanted to write some Rust code for this project over my last few weekends. Moreover, Vector is transferring logs in my production right now, and its memory consumption is quite big, much more than I would expect. So I found this issue and tried to implement a new internal metrics flow with registered handles: I created a HashMap of registered counters and started calling them directly, as @zamazan4ik did in his PR to the File source. This change, I thought, should improve memory consumption for the kubernetes_logs source. I finally started the patched code, and literally zero change showed up on my Grafana dashboard, which showed container_memory_working_set_bytes at the same level as the build without any changes.

But I continued to investigate the memory issue. After that, I added heaptrack to the Dockerfile to track heap memory. First thing I noticed: heaptrack showed that Vector uses only 100MB of heap, while Grafana showed me that Vector uses more than 900MB of memory :) Second: I found that the container_memory_working_set_bytes metric may include memory used for file caching in the kernel. Third: I was so disappointed and sad that I tried to just use your changes (removing any pod_info labels in metrics).

So, @splitice, could you please check your hacky build again on a fresh master version? I just want to know what I've done wrong, or whether some optimizations were added to the Vector code and this behaviour is expected.

I want to add some context: I was generating logs in my minikube cluster using a simple log generator written in Go, with the log rate set to 7k/sec. The Vector configuration was very simple: only one input with the kubernetes_logs type, and one output to blackhole. Looking at the heaptrack flamegraph, I haven't found any mention of metrics, and that was confusing. The most memory was consumed by cloning strings from the Kubernetes annotators, but that seems OK, because that is event content that should be cloned (maybe Rc or Arc would help here, but then we would need to use Rc in all usages of the event struct).

And I want to tackle this performance issue, so I'm ready to work on it on my weekends, but I need some help. I know about the Discord channel, but maybe the participants of this issue could give me some feedback / help about this performance issue? Maybe I'm missing something?
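On the Rc/Arc thought above, here is a minimal sketch (using hypothetical types, not Vector's actual event or annotator structs) of sharing per-pod metadata behind Arc<str>, so that annotating each event bumps refcounts instead of cloning strings:

```rust
use std::sync::Arc;

// Hypothetical per-pod metadata that an annotator attaches to every event.
// Cloning an `Arc<str>` bumps a refcount instead of copying the string, so
// the per-event cost stays constant regardless of metadata size.
#[derive(Clone)]
struct PodMetadata {
    namespace: Arc<str>,
    pod_name: Arc<str>,
    container_name: Arc<str>,
}

struct LogEvent {
    message: String,
    metadata: PodMetadata,
}

fn annotate(message: String, shared: &PodMetadata) -> LogEvent {
    LogEvent {
        message,
        // Cheap: three refcount increments instead of three heap copies.
        metadata: shared.clone(),
    }
}

fn main() {
    let shared = PodMetadata {
        namespace: Arc::from("default"),
        pod_name: Arc::from("my-app-7c9f"),
        container_name: Arc::from("app"),
    };

    let events: Vec<LogEvent> = (0..3)
        .map(|i| annotate(format!("log line {i}"), &shared))
        .collect();

    assert_eq!(events.len(), 3);
    assert_eq!(&*events[0].metadata.pod_name, "my-app-7c9f");
}
```

The trade-off mentioned above still applies: if downstream code needs owned Strings, the clone just moves later in the pipeline rather than disappearing.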
-
Unfortunately, actually fixing this requires someone with real Rust knowledge. I'm definitely not a Rust developer. There's certainly a need for better handling here than what this hack does. The way metrics are handled currently is absurdly bad for anyone running cronjobs or jobs on their cluster.

I'm still running my hacky patch for the cluster collector. Most of those instances run at around 100MB of RAM, but that's all those Vector instances do: sink to another Vector cluster for processing and storage, which is more up to date. I haven't tested my hack against latest master. Hopefully some development will eventually be done on Vector performance and I won't need to rebase it.
-
Just wanted to chime in on this thread as I'm currently doing a proof-of-concept to replace the fluent-bit agent + a custom Golang app as aggregator with Vector as both agent and aggregator in our Kubernetes clusters. People on the team were very excited because of the findings of this blog post (bare metal), but currently our real-world experience in Kubernetes is that while throughput does seem better than fluent-bit, it comes at a considerable resource cost.

I don't have much else to add at this point. Just wanted to chime in that we noticed this and would love it if something could be done to improve performance in Kubernetes.
-
I am still hoping at some point for a performance iteration; the performance of Vector can still be quite low at times. There is definitely a lot of low-hanging fruit in general.
-
How are people finding Vector's performance?
We have been running Vector for quite a while now and quite like it. However, we can't help but note it's not exactly performant.
I've created a few issues over the past year noting areas for improvement. Is the current focus strictly on new features, or is there room for a performance-oriented milestone release? It seems like one wouldn't go amiss.
Would it be helpful to receive some perf reports?
My first finding from that (if I'm reading it right): on a system where Vector is sitting at 41% CPU, memory allocation is costing 11.5%?
This is a simple kubernetes_logs -> vector configuration with no real processing, plus prometheus_metrics (cardinality limited, not that metric cardinality seems to be the issue here).
A configuration sample can be provided.