-
-
Notifications
You must be signed in to change notification settings - Fork 589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature] Improve health check by checking redis connection #766
Comments
Hello @Aohzan What is your Centrifugo configuration for Redis Sentinel? Do you use |
centrifugo 4.1.2-0 redis configuration part in centrifugo "engine": "redis",
"redis_sentinel_address": "localhost:26379",
"redis_sentinel_master_name": "mymaster", there is:
so 7 sentinel, the failover of the master has been handled correctly on all sentinel, but just 1 of 4 centrifugo didn't change its redis server |
Thanks for the details. In your case, I'd concentrate on trying to fix the root cause - I suppose first step here is upgrading to the latest Centrifugo version and trying to reproduce with it – there were many improvements in the underlying Redis library since v4.1.2. I think adding Redis connection check in health endpoint may not fully solve the problem. It may hide the problem, until no Centrifugo nodes left. I suppose you have just a Haproxy without Kubernetes? In this case Haproxy will remove failed node from the balancing, but the issue on Node will persist until someone restarts Centrifugo node. Is it right? Or it works differently? Probably there is some automation to restart failing node? In Kubernetes liveness probe failure results into app restart – in that case Redis connection check could make more sense. |
Yes, we will see to update to the latest version. Yes, no k8s, what we want it's that the HAProxy will exclude the problematic Centrifugo from the pool as soon as it can't handle request properly due to Redis issue. So it let me the time to debug and restart Centrifugo. |
Hello |
Probably you can experiment with new version first regarding Sentinel failover to make sure it solves the problem. I mean without upgrading rest of the app. While upgrade is recommended anyway, this may help to iterate on issue faster.
I think optional Redis check could make sense in general, though prefer it being disabled by default – when dealing with many connections removing node from balancing may be not better that waiting for connection issues to go away. Looks like Centrifugo needs to issue some write request to Redis to make sure connection is working and writable. UPD. Though not sure what to do with Redis Cluster case, where many Redis shards must be checked. In this perspective I'd invest to proper failover as I said before |
Hello,
We had an issue on our platform, we perform a HAProxy check on /health on centrifugo servers, which was still ok while centrifugo raised errors because Redis connection issued
READONLY You can't write against a read only replica
errors.centrifugo didn't change its redis server when sentinel announced the new primary, we had the issue on 1 of 4 servers.
Describe the solution you'd like
It would be great if the health check will return an error if there is any redis connection error
Thank you
The text was updated successfully, but these errors were encountered: