Kubernetes Liveness and Readiness Probes
Applications can exhibit instability due to various factors, including temporary loss of connectivity, misconfigurations, or internal application faults. Kubernetes ensures application health through liveness and readiness probes, facilitating automatic container restarts when necessary. However, for detailed insights into application status, resource usage, and errors, developers should integrate monitoring and observability tools alongside Kubernetes.
A probe in Kubernetes is a mechanism that performs periodic health checks on containers to manage their lifecycle. These checks help determine when to restart a container (liveness probe) or when a container is ready to handle traffic (readiness probe). Developers can define probes in their service deployments using YAML configuration files or the `kubectl` command-line interface; the YAML approach is recommended for its clarity and version-control benefits.
- Liveness Probe
- This determines whether the application running in a container is in a healthy state. If the liveness probe detects an unhealthy state, Kubernetes kills and restarts the container.
- Readiness Probe
- This determines whether a container is ready to handle requests or receive traffic. A failure in this probe causes Kubernetes to stop sending traffic to that container by removing its IP from the service endpoints, without restarting the container. The application is expected to eventually pass the readiness probe through internal recovery, configuration changes, or by completing its initialization tasks (in practice, we have to troubleshoot to get the readiness probe to pass).
- This is useful when waiting for an application to perform time-consuming initial tasks, such as establishing network connections, loading files, and warming caches.
- Startup Probe
- This determines whether the application within a container has started. Startup probes are crucial for applications with lengthy startup times, ensuring that liveness and readiness probes do not interfere prematurely.
- Startup probes run before all other probes and disable them until the startup probe succeeds. If a container fails its startup probe, the container is killed and handled according to the pod's restartPolicy.
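As a sketch, a startup probe for a slow-starting container might look like this (the endpoint path, port, and numbers below are illustrative assumptions, not taken from the VRO charts):

```yaml
# Illustrative startupProbe for a slow-starting container (all values are assumptions).
startupProbe:
  httpGet:
    path: /actuator/health   # hypothetical health endpoint
    port: 10301              # hypothetical health-check port
  failureThreshold: 30       # allow up to 30 checks before giving up
  periodSeconds: 10          # i.e. up to 30 * 10 = 300 seconds to start
```

Once the startup probe succeeds, Kubernetes stops running it and the liveness and readiness probes take over.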
All VRO Spring Boot applications (BIE Kafka and BIP API), the API gateway, and the Lighthouse API are configured the same way. This section provides a comprehensive guide on configuring Kubernetes liveness and readiness probes for Spring Boot applications.
Before configuring liveness and readiness probes for our Spring Boot applications, we need the following information:
- Health Check Port: Identify the port used for health checks, found in either the application.yaml file or the gradle.properties file of the VRO microservice.
- Health Check URL Path: Determine the path to the health check URL. This is often specified in the Dockerfile using a HEALTHCHECK CMD directive like this: `HEALTHCHECK CMD curl --fail http://localhost:${HEALTHCHECK_PORT}/actuator/health || exit 1`
- Actuator Dependency: Verify that the VRO application includes the Spring Boot Actuator dependency by checking the build.gradle file for this line: `implementation 'org.springframework.boot:spring-boot-starter-actuator'`
Step 1: Configure Liveness and Readiness probe endpoints
In the application.yaml file (located in the resources directory of the Spring Boot VRO microservice), configure the existing Spring Boot Actuator's health endpoint to include liveness and readiness probes, which are then accessible via specific paths.
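For example, Spring Boot Actuator's liveness and readiness health groups can be enabled in application.yaml like this (a minimal sketch of the standard Actuator properties; the actual VRO files may differ):

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true    # exposes /actuator/health/liveness and /actuator/health/readiness
  health:
    livenessstate:
      enabled: true      # registers the liveness state health indicator
    readinessstate:
      enabled: true      # registers the readiness state health indicator
```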
Step 2: Helm Chart values.yaml Configuration
In the Helm chart for the application, modify the values.yaml file to configure the ports, livenessProbe, and readinessProbe. This is where we specify the paths for liveness checks (`/actuator/health/liveness`) and readiness checks (`/actuator/health/readiness`).
- initialDelaySeconds: Delays the start of liveness probe checks by 120 seconds after the container has started. This delay allows the application within the container enough time to initialize before Kubernetes begins checking its liveness.
- periodSeconds: Specifies the frequency of probe checks. With periodSeconds set to 10, Kubernetes checks the liveness of the container every 10 seconds.
- timeoutSeconds: Defines the time after which a check is considered failed if no response is received. Here, if the liveness probe does not receive a response from the `/actuator/health/liveness` endpoint within 10 seconds, the check fails. Setting an appropriate timeout prevents false positives when the application or system is temporarily slow.
- failureThreshold: Determines the number of consecutive failures required to consider the probe failed. With a failureThreshold of 3, Kubernetes marks the liveness probe as failed and restarts the container only after three consecutive failures. This threshold helps avoid unnecessary restarts for transient or short-lived issues.
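The settings described above can be sketched in values.yaml like this (the port number is a placeholder; use the service's actual HEALTHCHECK_PORT):

```yaml
# Illustrative values.yaml probe configuration (port is an assumption).
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 10301            # placeholder: the service's health-check port
  initialDelaySeconds: 120
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 10301
  initialDelaySeconds: 120
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 3
```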
Step 3: Helm Chart deployment.yaml Configuration
In the deployment.yaml file, add the configurations for ports, livenessProbe, and readinessProbe within `spec.containers`:
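A minimal sketch of how these values might be wired into the container spec via Helm templating (the value names and template structure are assumptions, not copied from the VRO charts):

```yaml
# Illustrative deployment.yaml fragment (value names are assumptions).
spec:
  template:
    spec:
      containers:
        - name: {{ .Chart.Name }}
          livenessProbe:
            {{- toYaml .Values.livenessProbe | nindent 12 }}
          readinessProbe:
            {{- toYaml .Values.readinessProbe | nindent 12 }}
```

Keeping the probe settings in values.yaml and rendering them with `toYaml` lets each environment override timings without editing the template.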
Step 4: Incrementing Chart.yaml and appVersion
To track changes and updates, increment the chart version in the Chart.yaml file with every change to the chart and Helm templates. VRO has yet to discuss how to automate this process (future work).
Step 5: Testing endpoints
Ensure the application's health endpoints (`/actuator/health/liveness` and `/actuator/health/readiness`) are correctly implemented and return the expected statuses. Locally, spin up the specific VRO microservice and confirm the service container is running in Docker Desktop. Then open a browser and use this format: `http://localhost:${HEALTHCHECK_PORT}/actuator/health` (for example, `http://localhost:10301/actuator/health`).
It is best practice to regularly review and test the liveness and readiness configurations to ensure they accurately reflect the application's health and readiness states.
BIS-api (formerly BGS-api) pending...