Add built-in health check functionality for systems like Kubernetes #308

Closed

jbdietrich
Contributor

@jbdietrich jbdietrich commented Nov 21, 2022

Description

We want to integrate the racecar consumer health-check readiness probe into racecar itself, so that racecar users do not need to implement the logic and the bash scripts themselves. In other words, to check the health of racecar, users only need to include the built-in probe in their K8s manifest.

Racecar touches a file in every iteration of its main loop, which updates the file's modification timestamp. When racecar is down, the timestamp is no longer refreshed and goes stale, so the readiness probe, which checks the timestamp's age, fails.
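
As a rough illustration of this mechanism (a minimal sketch, not the code in this PR): the `Heartbeat` class, its method names, the file name, and the 30-second threshold are all hypothetical and chosen only for the example.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper; names and thresholds are illustrative, not racecar's API.
class Heartbeat
  def initialize(path)
    @path = path
  end

  # Call once per iteration of the consumer's main loop.
  # Creates the file if needed and refreshes its modification time.
  def beat
    FileUtils.touch(@path)
  end

  # True when the file is missing or was last touched more than
  # max_age seconds before `now`.
  def stale?(max_age:, now: Time.now)
    return true unless File.exist?(@path)

    now - File.mtime(@path) > max_age
  end
end

# Usage: the main loop calls `beat`; a probe calls `stale?`.
heartbeat = Heartbeat.new(File.join(Dir.mktmpdir, "racecar-heartbeat"))
heartbeat.beat
puts heartbeat.stale?(max_age: 30) ? "unhealthy" : "healthy"
```

A probe written this way fails both when the file was never created and when the loop has stopped refreshing it, which covers the "racecar is down" case described above.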

We also need to update the racecar handbook to explain how to use the integrated readiness probe. The probe is a bash script and requires two env vars.

[edit: description provided by @BingkunWu]

@meqif

meqif commented Dec 5, 2022

@jbdietrich This is a nice improvement, thank you!

However, I found that due to a bug in rdkafka, sometimes the consumer stops processing one or more partitions. That sort of issue is not detectable through a consumer-wide liveness check because the consumer keeps making progress in some but not all assigned partitions.

What worked for me was tracking the liveness of each assigned partition by 1. refreshing files that represent the liveness of each assigned partition and 2. registering a rebalance listener that manages (deletes and/or creates) those partition liveness files. This solution has already saved my bacon at least once. :)
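
For readers following along, the two steps might be sketched roughly like this. It is a plain-Ruby illustration under stated assumptions, not the code from my setup and not racecar's or rdkafka's API: the `PartitionLiveness` class, its hook names (`on_partitions_assigned`, `on_partitions_revoked`), and the `topic-partition` file naming are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical sketch: one liveness file per assigned partition.
class PartitionLiveness
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # Rebalance hook (assumed name): create a file for each newly assigned partition.
  def on_partitions_assigned(topic, partitions)
    partitions.each { |partition| FileUtils.touch(path_for(topic, partition)) }
  end

  # Rebalance hook (assumed name): delete files for partitions we no longer own.
  def on_partitions_revoked(topic, partitions)
    partitions.each { |partition| FileUtils.rm_f(path_for(topic, partition)) }
  end

  # Call whenever the consumer makes progress on a partition.
  def mark_progress(topic, partition)
    FileUtils.touch(path_for(topic, partition))
  end

  # Files not refreshed within max_age seconds; a probe would fail if any exist.
  def stale_files(max_age:, now: Time.now)
    Dir.glob(File.join(@dir, "*")).select { |file| now - File.mtime(file) > max_age }
  end

  private

  def path_for(topic, partition)
    File.join(@dir, "#{topic}-#{partition}")
  end
end

# Usage with made-up topic and partition numbers.
liveness = PartitionLiveness.new(Dir.mktmpdir)
liveness.on_partitions_assigned("events", [0, 1])
liveness.mark_progress("events", 0)
liveness.on_partitions_revoked("events", [1])
puts liveness.stale_files(max_age: 30).empty? ? "healthy" : "unhealthy"
```

Deleting files on revocation matters: otherwise a partition that moved to another consumer would look stalled here forever.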

I can open a draft pull request with that strategy, if you think that'd be valuable.

@jbdietrich
Contributor Author

Thanks for your response @meqif!

> I can open a draft pull request with that strategy, if you think that'd be valuable.

That sounds great, please do!

@meqif

meqif commented Dec 12, 2022

Hi @jbdietrich, sorry for the delay! I've opened a draft PR (#309) documenting how a liveness probe as I described might be implemented.

@jbdietrich
Contributor Author

@meqif no problem, thanks for putting up the PR. Unfortunately, I am not sure whether we will get to it before the holidays. If not, I'll have the team review it early in 2023.

@meqif

meqif commented Dec 15, 2022

@jbdietrich That's okay, there's no rush on my end. I'll also be leaving for the holidays soon. :)

After all, that PR is simply adding documentation for an approach we've been using (plus a small code change which would allow us to remove a monkey patch). We're not blocked in any way.

@jbdietrich
Contributor Author

jbdietrich commented Mar 16, 2023

Superseded by #319

@jbdietrich jbdietrich closed this Mar 16, 2023