User Request: allow targeting subset of destination service #843

Open
dasch opened this issue Mar 7, 2024 · 8 comments

@dasch

dasch commented Mar 7, 2024

Is your feature request related to a problem? Please describe.
A typical scenario I want to test is how service A responds to its direct dependency, service B, being partially unavailable; I basically want to verify that A has proper timeouts and retries in place to gracefully handle e.g. a single B pod being overloaded or in a bad state.

Describe the solution you'd like
I see that it's possible to scope a network disruption to just a list of specific IP addresses with the network.hosts field. However, I do not know the IP addresses of the B pods at the time of writing the Disruption. I would like to instead be able to provide a count of the destination service's pods that should be in scope for the disruption, with percentages also allowed. This would be dynamically translated to a list of IPs.

Describe alternatives you've considered
I can create a disruption on B instead of A, and set the count as I wish. However, that causes a disruption to all clients of B, whereas I want to limit the scope to A, which is the subject under test. We do not have dedicated environments for this, so limiting the impact of disruptions is key to staying popular with my colleagues :D

@Devatoria
Contributor

Hey, just to clarify: you would like to drop all packets, but only for a subset of the hosts behind the hostname you provide to the disruption, right?

In other words and with an example, your use case would be: I want to drop 100% of packets going to 50% of the hosts behind the provided hostname.

And it is a different use case than: I want to drop 50% of packets going to the provided hostname.

@dasch
Author

dasch commented Mar 7, 2024

Yup; in my concrete case I probably want to delay rather than drop, but it's only for a subset of hosts behind the hostname, yes.

@Devatoria
Contributor

Ok, I think there's a simplistic way to implement such a feature by resolving the given hostname and picking x% of returned IPs in the injector component.

@ptnapoleon wdyt?
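
To make the "resolve and pick x%" idea concrete, here is a minimal sketch in Python (the controller itself is written in Go, so this is illustration only, not the injector's code). The function names `resolve_ips` and `pick_random_subset` and the `percentage` parameter are hypothetical. It rounds up so that any non-zero percentage selects at least one host, which is an assumption about the desired behavior.

```python
import random
import socket


def resolve_ips(hostname):
    """Return the sorted, de-duplicated IP addresses behind a hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def pick_random_subset(ips, percentage, rng=None):
    """Pick `percentage` percent of `ips` at random, rounding up."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    unique_ips = sorted(set(ips))
    # Integer ceiling division avoids floating-point rounding surprises.
    count = (len(unique_ips) * percentage + 99) // 100
    rng = rng or random.Random()
    return sorted(rng.sample(unique_ips, count))
```

With this approach each injector would call something like `pick_random_subset(resolve_ips(hostname), 50)` on its own, so different injectors may end up targeting different hosts.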

@ptnapoleon
Contributor

Is it literally just x% of returned IPs, or is there any other filtering you want to do on those hosts? Do you need the same x% of IPs to be picked across all injectors, or is it fine if they're all just picking a random x%? Do you need this to work for the spec.network.hosts field or also spec.network.services?

@dasch
Author

dasch commented Mar 8, 2024

It would probably be better if it's the same IPs across all the selected pods, but that's not a hard requirement. But I'm thinking it would be relatively easy to do with a consistent hash? Doesn't have to be the same across runs, so maybe throw some Disruption specific value in there.

By the way, this would also work well for testing resilience to e.g. a single Aurora database reader being unavailable; there's a single hostname for the reader endpoint, with the configured number of reader instances behind it, so you can't target disruptions on just the hostname. It's another case where it's valuable to test that applications retry connections, for example.
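
One way to get the same IPs on every injector without coordination is to rank the resolved IPs by a hash of a Disruption-specific value plus the IP, then keep the top x%. This is a deterministic hash ranking (in the spirit of rendezvous hashing) rather than a hash ring, and it is only a Python sketch for illustration. The function name `pick_consistent_subset` and the `seed` parameter are hypothetical; the seed could be something like the Disruption's UID, which is an assumption.

```python
import hashlib


def pick_consistent_subset(ips, percentage, seed):
    """Pick the same `percentage` percent of `ips` for a given `seed`.

    The result does not depend on the order in which the IPs were resolved,
    so independent injectors sharing the seed agree on the selection.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    unique_ips = set(ips)
    # Round up so any non-zero percentage selects at least one host.
    count = (len(unique_ips) * percentage + 99) // 100

    def score(ip):
        # Hash of seed and IP; a different seed yields a different ranking.
        return hashlib.sha256(f"{seed}/{ip}".encode()).digest()

    return sorted(unique_ips, key=score)[:count]
```

Because the ranking is fixed for a given seed, raising the percentage only adds hosts to the selection and never swaps out the ones already picked. Injectors still need to see the same set of resolved IPs for their selections to match exactly.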

@Devatoria
Contributor

Sounds good, and it should be easy to integrate within the host filters that resolve hostnames: https://github.com/DataDog/chaos-controller/blob/main/injector/network_disruption.go#L1177

Passing a percentage of resolved IPs to keep would probably be enough, and it is a valuable feature.

@ptnapoleon
Contributor

I'll open a ticket for this internally for us to track

@dasch
Author

dasch commented Nov 18, 2024

Are there any updates on the status of this?
