✨ allow setting watchTimeoutPeriod when creating informers #2738
/cc @FillZpp Can you take a look? Thanks!
pkg/cache/cache.go
Outdated
// times out, the cache will close the watch and reconnect.
//
// Defaults to a random duration between 5 and 10 minutes if unset.
WatchTimeoutPeriod *time.Duration
LG. Would it be easier to understand if it were named something like RewatchPeriod? A WatchTimeoutPeriod field in cache.Options reads as if it were the timeout for the cache to start watching.
WDYT @xiang90 @sbueringer @vincepri
pkg/cache/internal/informers.go
Outdated
@@ -354,6 +360,9 @@ func (ip *Informers) addInformerToMap(gvk schema.GroupVersionKind, obj runtime.O
    WatchFunc: func(opts metav1.ListOptions) (watch.Interface, error) {
        ip.selector.ApplyToList(&opts)
        opts.Watch = true // Watch needs to be set to true separately
        if ip.watchTimeoutPeriod != nil {
            opts.TimeoutSeconds = ptr.To(int64(ip.watchTimeoutPeriod.Seconds()))
Since this is a fixed duration with no randomness, it could send a large burst of watch requests to the apiserver when all the old watches close at the same time and try to reconnect, especially if your operator watches a lot of resources as you described in this PR.
This is why the default timeout is a random duration between 5 and 10 minutes. So how about adding a little random jitter on top of the given period?
Thanks. That makes sense. Fixed it following the style of https://github.com/kubernetes/client-go/blob/8c4efe8d079e405329f314fb789a41ac6af101dc/tools/cache/reflector.go#L418
/lgtm /assign @alvaroaleman @vincepri @sbueringer
/ok-to-test
The majority of the server-side cost of a watch is in sending the data, not in a client establishing a connection. Clients resume watches from where they left off using the ResourceVersion, so the only thing this could possibly save is some TLS handshakes, which seems negligible to me. Have you done any benchmarking to see the impact of this? This is an extremely low-level setting and people might not understand the implications of changing it. If we are going to expose something like this, I want to see some data showing that it actually makes a meaningful difference.
/hold
/lifecycle stale
/lifecycle rotten
/close as there was no further feedback
@sbueringer: Closed this PR.
We use kubebuilder to create operators, and these operators watch a lot of resources. The default watch timeout is a random duration between 5 and 10 minutes, so each watch is re-established every 7.5 minutes on average. This generates a lot of QPS against our k8s API server. We want to increase the watch timeout to reduce the load on the API server, so this PR makes it configurable.