Watchers on Custom Resources throw RuntimeError("Session is closed") and permanently die #980
Comments
Noticing the same behaviour on EKS with Python 3.11.2, aiohttp 3.8.4, kopf 1.36.0. In my case I am monitoring delete events for Job resources. What seems to happen is: a request fails because of token expiry -> the watcher goes to sleep because of the backoff -> while it sleeps, the session gets closed by the vault when the previous credentials are invalidated -> upon waking up and retrying the request, it fails with RuntimeError("Session is closed").
We are also noticing the same issue, but in AKS. Has anybody found a workaround or a solution for this problem? Maybe a way to include this in the probes? Python version: 3.11.3.
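For what it's worth, a liveness probe along those lines could look roughly like this (a minimal sketch, not kopf's own recipe: the Job event handler, the `_last_event` marker, and the 15-minute staleness threshold are all made up for illustration):

```python
import datetime
import kopf

# Hypothetical freshness marker: bumped whenever any watched event arrives.
_last_event = datetime.datetime.now(datetime.timezone.utc)

@kopf.on.event('batch', 'v1', 'jobs')
def remember_event(**_):
    global _last_event
    _last_event = datetime.datetime.now(datetime.timezone.utc)

@kopf.on.probe(id='watch_freshness')
def watch_freshness(**_):
    # If no events have been seen for too long, assume the watch streams are
    # dead (e.g. stuck after "Session is closed") and fail the health endpoint.
    age = datetime.datetime.now(datetime.timezone.utc) - _last_event
    if age > datetime.timedelta(minutes=15):
        raise RuntimeError(f"No watch events for {age}; watchers presumed dead.")
    return age.total_seconds()
```

Run with `kopf run handlers.py --liveness=http://0.0.0.0:8080/healthz` and point the pod's livenessProbe at that URL; whether a raising probe handler reliably turns the endpoint unhealthy in your kopf version is worth verifying before relying on it.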
Thanks for this hint! Indeed, this is a possible scenario in some (rare?) cases of the API failing several concurrent requests. Can you please try the fix from #1031? Depending on your packaging system, it can be installed from that PR's branch like this (pip or requirements.txt; different for Poetry):
That would be Kopf 1.36.1 plus this fix only.
Hi @nolar, I think the patch doesn't work. I have introduced a couple of probes to check whether the watcher is alive, and it seems to always fail. This happens to us regularly, maybe depending on the cluster setup. In particular, I think it all started when we configured:
and started mounting the token using volume projection with a duration of 3600s. It seems like kopf doesn't handle the refresh well. Below are my logs:
These are my dependencies:
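On the volume-projected token mentioned a few comments up, one way to sketch a refresh-aware login is a custom handler that re-reads the projected token and tells kopf when it expires (assumptions: the in-cluster paths and environment variables below are the Kubernetes defaults, and the 3600s lifetime matches the projection duration):

```python
import datetime
import os
import kopf

# Standard in-cluster locations; adjust if the projection mounts elsewhere.
TOKEN_PATH = '/var/run/secrets/kubernetes.io/serviceaccount/token'
CA_PATH = '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'

@kopf.on.login()
def login(**_):
    # Re-read the projected token on every (re-)login, and declare an
    # expiration slightly shorter than the 3600s projection duration so that
    # kopf asks for fresh credentials before the old token goes stale.
    with open(TOKEN_PATH) as f:
        token = f.read().strip()
    server = 'https://{host}:{port}'.format(
        host=os.environ['KUBERNETES_SERVICE_HOST'],
        port=os.environ['KUBERNETES_SERVICE_PORT'],
    )
    return kopf.ConnectionInfo(
        server=server,
        ca_path=CA_PATH,
        token=token,
        expiration=datetime.datetime.utcnow() + datetime.timedelta(seconds=3300),
    )
```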
Faced it as well; it's a big problem for us. @Spacca
We ended up loading the re-authentication hook with a very rudimentary check to force the operator to restart. For us this was every 10 mins. Mileage may vary.
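For reference, that "rudimentary check in the re-authentication hook" could be sketched roughly as below (assumptions: kopf.login_via_client as the fallback and a hard os._exit for the restart are illustrative choices; the 10-minute limit comes from the comment above):

```python
import os
import time
import kopf

_started = time.monotonic()
MAX_LIFETIME_SECONDS = 600  # roughly every 10 minutes; tune to taste.

@kopf.on.login()
def login(**kwargs):
    # Each time kopf asks for (re-)authentication, check the process age.
    # Past the limit, exit hard so the orchestrator restarts the operator
    # with fresh credentials and fresh aiohttp sessions.
    if time.monotonic() - _started > MAX_LIFETIME_SECONDS:
        os._exit(1)
    # Otherwise fall back to the usual client-library-based login.
    return kopf.login_via_client(**kwargs)
```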
I started seeing this issue after upgrading to Azure Kubernetes Service (AKS) 1.30. The release notes for AKS 1.30 say:
Anyone else who tries to use kopf on AKS 1.30+ will also run into this issue. AKS 1.29 goes end-of-life in March 2025, so fixing this is becoming more urgent. Alternatively, is there a workaround that does not require modifying …?
Long story short
Operator starts up and runs normally as expected. After running for some time, some of the watch streams will throw a RuntimeError("Session is closed"). Once this happens, that watch stream will never restart until the operator is restarted. This only appears to happen with custom resources (ConfigMaps are fine).
Kopf version
1.35.6
Kubernetes version
v1.23.13 eks
Python version
3.9.13
Code
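A minimal illustration of the kind of setup in question (the CRD group/version/plural and the handlers are hypothetical placeholders, not the operator's actual code):

```python
import kopf

# Hypothetical custom resource; only the custom-resource watch streams die,
# while the ConfigMap stream keeps working.
@kopf.on.event('example.com', 'v1', 'widgets')
def on_widget_event(event, logger, **_):
    logger.info(f"Widget event: {event['type']}")

@kopf.on.event('', 'v1', 'configmaps')
def on_configmap_event(event, logger, **_):
    logger.info(f"ConfigMap event: {event['type']}")
```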
Logs
Additional information
This is an EKS cluster and is using aws eks get-token in the kubeconfig to authenticate.
Using aiohttp version 3.8.3
Using kubernetes client version 24.2.0
aws-cli/2.2.43 Python/3.8.8 (used by kubeconfig)
Not all of the operator's watch streams die at the same time. This is not running in a container on the cluster, but on a server outside of AWS.