
Add ability to throttle streaming sinks #1051


Closed
s1msn opened this issue Oct 18, 2019 · 8 comments
Labels
domain: networking — Anything related to Vector's networking
needs: requirements — Needs a list of requirements before work can begin
sink: file — Anything `file` sink related
sink: socket — Anything `socket` sink related
sink: statsd — Anything `statsd` sink related
sink: vector — Anything `vector` sink related
source: vector — Anything `vector` source related
type: enhancement — A value-adding code change that enhances existing functionality.

Comments

@s1msn

s1msn commented Oct 18, 2019

Currently there seems to be no mechanism to limit the bandwidth used by vector when sending events.
In large environments, this could potentially lead to problems, especially when one or more vector instances start streaming buffered events or resume reading from a file source.

@binarylogic binarylogic added domain: networking Anything related to Vector's networking type: enhancement A value-adding code change that enhances its existing functionality. sink: vector Anything `vector` sink related source: vector Anything `vector` source related labels Oct 18, 2019
@LucioFranco
Contributor

@s1msn Hi! I believe a combination of the batch_size and rate_limit_* options should work to limit the amount of outgoing data so that you don't get a thundering-herd effect. This should cap the total amount of data able to leave that instance of Vector via the configured sink.
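The combination suggested above might look like the following for a batching sink such as `elasticsearch`. This is a sketch only; the option names follow Vector's documentation from around this era, so check the reference for your version:

```toml
[sinks.out]
  type = "elasticsearch"
  inputs = ["in"]
  batch_size = 1048576        # max bytes per batch
  rate_limit_duration = 1     # rate-limit window, in seconds
  rate_limit_num = 5          # max in-flight requests per window
```

Together these bound throughput at roughly `batch_size * rate_limit_num` bytes per `rate_limit_duration` seconds; as the issue notes, streaming sinks like `vector` did not expose these options at the time.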

@s1msn
Author

s1msn commented Oct 18, 2019

@LucioFranco Hi Lucio, thanks for the quick reply. I noticed the rate_limit options for the elasticsearch sink, but there does not seem to be one for the vector sink yet. As this is the recommended sink when deploying vector as an agent, it would be great to have this as an option in this sink too.

@LucioFranco
Contributor

@s1msn Yup, this is because sinks like the vector and tcp sinks are "streaming" based, so we would need to throttle the stream itself. This should be possible though. I will update the issue to reflect this.

@LucioFranco LucioFranco changed the title Bandwidth limiting for communication between vector instances Add ability to throttle streaming sinks Oct 18, 2019
@LucioFranco LucioFranco added sink: file Anything `file` sink related sink: statsd Anything `statsd` sink related sink: socket Anything `socket` sink related sink: udp labels Oct 18, 2019
@LucioFranco
Contributor

I've added a couple more sinks to the list here. Most likely we will want to use some form of throttle like tokio's StreamExt::throttle (https://docs.rs/tokio/0.2.0-alpha.6/tokio/stream/trait.StreamExt.html#method.throttle). That said, the tokio throttle seems a bit limited, and we might want to let users specify, similarly to how our rate limits are configured, an amount of data that may be sent over a window; anything beyond that should apply back pressure.
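The tokio throttle linked above limits items per time interval; the "amount of data over a window" idea can instead be sketched as a minimal byte-budget token bucket. This is illustrative only, not Vector's implementation, and all names here are invented for the sketch:

```rust
use std::time::Duration;

/// Minimal byte budget: up to `capacity` bytes may be sent per `window`.
/// A send that exceeds the remaining budget reports how long the caller
/// should back off (i.e. apply back pressure) before retrying.
struct ByteBudget {
    capacity: u64,      // bytes allowed per window
    window: Duration,   // refill interval
    remaining: u64,     // bytes left in the current window
}

impl ByteBudget {
    fn new(capacity: u64, window: Duration) -> Self {
        Self { capacity, window, remaining: capacity }
    }

    /// Try to reserve `bytes` from the current window's budget.
    /// Ok(()) means the send may proceed; Err(d) means wait `d` first.
    fn try_send(&mut self, bytes: u64) -> Result<(), Duration> {
        if bytes <= self.remaining {
            self.remaining -= bytes;
            Ok(())
        } else {
            Err(self.window) // back off until the window refills
        }
    }

    /// Called by a timer each time the window elapses.
    fn refill(&mut self) {
        self.remaining = self.capacity;
    }
}

fn main() {
    // 1 KiB per second.
    let mut budget = ByteBudget::new(1024, Duration::from_secs(1));
    assert!(budget.try_send(800).is_ok());  // fits in the window
    assert!(budget.try_send(800).is_err()); // would exceed it: back pressure
    budget.refill();
    assert!(budget.try_send(800).is_ok());  // fresh window, fresh budget
}
```

In an async sink this would sit in front of the write: on `Err`, the stream is polled again only after the returned delay, which is how back pressure propagates upstream.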

@binarylogic
Contributor

> by setting an amount of data that should be sent over a window

👍, although our rate_limit* options do not do this. They're quantity-based, not bandwidth-based. I think the latter would also be useful in batching sinks; I'm just unsure exactly how that would work. It would be worth agreeing on what the options look like before beginning work on this.

@binarylogic binarylogic added the needs: requirements Needs a list of requirements before work can begin label Oct 18, 2019
@ghost

ghost commented Oct 18, 2019

Is it possible to add a rate_limiter transform that would do the rate limiting before the events reach sinks? In that case it would be possible to keep sinks' configuration and code simple, yet do the rate limiting. It might also be necessary to do rate limiting before some transforms, especially if we implement transforms doing some external API requests (see #1041 for examples of such transforms).
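The transform proposed above might be configured along these lines. Note that `rate_limiter` and its options are hypothetical names invented for this sketch; no such Vector component existed at the time:

```toml
# Hypothetical rate_limiter transform sitting between sources and sinks.
[transforms.limiter]
  type = "rate_limiter"     # illustrative only, not a real transform
  inputs = ["in"]
  max_bytes = 1048576       # bytes allowed per window
  window_secs = 1           # window length, in seconds

[sinks.out]
  type = "vector"
  inputs = ["limiter"]
```

Placing the limit in a transform keeps sink configuration unchanged, at the cost of measuring pre-encoding event sizes rather than the bytes actually put on the wire.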

@lukesteensen
Member

@a-rodin Yes, and we even have #258 already to represent it 😄

I think I'd prefer the transform over adding options to every sink. The downside is that we couldn't really integrate it with encoders for precise byte-size-based limits. Alternatively, we could implement the transform first, and then integrate into sinks/encoders in a second pass.

@binarylogic
Contributor

I agree, closing this in favor of #258.
