feat: Frank/build reliable java with aca (#56)
## Purpose
<!-- Describe the intention of the changes being proposed. What problem
does it solve or functionality does it add? -->
add guidance to build reliable Java, including
- Graceful shutdown
- Health probes 

## Does this introduce a breaking change?
<!-- Mark one with an "x". -->
```
[ ] Yes
[ ] No
```

## Pull Request Type
What kind of change does this Pull Request introduce?

<!-- Please check the one that applies to this PR using "x". -->
```
[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:
```

## How to Test
*  Get the code

```
git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install
```

* Test the code
<!-- Add steps to run the tests suite and/or manually test -->
```
```

## What to Check
Verify that the following are valid
* ...

## Other Information
<!-- Add any other helpful information that may be needed here. -->

---------

Co-authored-by: Frank Liu <[email protected]>
frankliu20 and FrankLiu4138 authored Sep 25, 2024
1 parent cff9669 commit 2633429
Showing 8 changed files with 279 additions and 0 deletions.
121 changes: 121 additions & 0 deletions docs/10_lab_reliable_application/1001.md
@@ -0,0 +1,121 @@
---
title: '1. Add shutdown hook to a Java application'
layout: default
nav_order: 1
parent: 'Lab 10: Build reliable Java application on ACA'
---

# Graceful shutdown on Java Container Apps
To achieve zero-downtime during a rolling update, gracefully shutting down a Java application is essential. Graceful shutdown refers to the "window of opportunity" an application has to programmatically clean up resources between the time a `SIGTERM` signal is sent to a container app and the time the app actually shuts down (receiving `SIGKILL`). See [Container App Lifecycle Shutdown](https://learn.microsoft.com/en-us/azure/container-apps/application-lifecycle-management#shutdown).

The cleanup behavior may include logic such as:
- Closing database connections
- Waiting for any long-running operations to finish
- Clearing out a message queue
- Etc.

`SIGTERM` can be sent to containers during various shutdown events, including management operations (such as scale in/down or any [revision-scope changes](https://learn.microsoft.com/en-us/azure/container-apps/revisions#revision-scope-changes)) and internal platform upgrades (e.g. a node upgrade). Therefore, Java applications must handle the `SIGTERM` signal properly to achieve zero downtime.


## Step by step guidance

### 1. Handle SIGTERM signal
In this lab, we will guide you on how to properly handle Eureka cache issues and long HTTP requests when receiving the `SIGTERM` signal.

#### 1.1 Add a Shutdown Hook for Spring Cloud Applications

Eureka is designed to be an eventually consistent system. When an upstream service app is shut down, the service-consuming app (i.e., client app) will not see the registry update immediately and will continue to send requests to the upstream app. If the upstream app shuts down immediately, the client app will receive `5xx` network/IO exceptions.

To gracefully shut down an upstream app that offers services via Eureka service discovery, you need to catch the `SIGTERM` signal and follow the deregister-then-wait pattern:
1) Deregister the instance from the Eureka server.
2) Wait until all client apps refresh their Eureka cache.
3) Shut down the application.

To implement this, refer to the sample code in [EurekaGracefulShutdown.java](https://github.com/Azure-Samples/java-microservices-aca-lab/blob/main/src/spring-petclinic-customers-service/src/main/java/org/springframework/samples/petclinic/customers/shutdown/EurekaGracefulShutdown.java):

```java
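import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.netflix.eureka.EurekaInstanceConfigBean;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class EurekaGracefulShutdown {

    @Autowired
    private EurekaInstanceConfigBean eurekaInstanceConfig;
    private static final String STATUS_DOWN = "DOWN";
    private static final int WAIT_SECONDS = 30;

    // Runs when the Spring context closes, which is what happens on SIGTERM
    @EventListener
    public void onShutdown(ContextClosedEvent event) {
        log.info("Caught shutdown event");
        log.info("De-register instance from eureka server");
        eurekaInstanceConfig.setStatusPageUrl(STATUS_DOWN);

        // Keep serving traffic until all Eureka clients have refreshed their cache
        try {
            log.info("Waiting {} seconds before shutting down the application", WAIT_SECONDS);
            Thread.sleep(1000L * WAIT_SECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so the rest of the shutdown sequence can see it
            Thread.currentThread().interrupt();
        }
        log.info("Shutting down the application now.");
    }
}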
```

When setting `WAIT_SECONDS`, consider the maximum possible Eureka cache intervals, including:
- Eureka server cache: `eureka.server.responseCacheUpdateIntervalMs`
- Eureka client cache: `eureka.client.registryFetchIntervalSeconds`
- Ribbon load balancer cache: `ribbon.ServerListRefreshInterval`, if Ribbon is used
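
The cache intervals listed above can be combined into a rough lower bound for `WAIT_SECONDS`. The sketch below assumes the worst case, in which each cache layer refreshes just before the change reaches it, so the delays add up. The class `EurekaWaitBudget` and the interval values are hypothetical and only for illustration; read the real values from your own configuration.

```java
import java.time.Duration;

/** Hypothetical helper, not part of the lab sample: adds up cache refresh intervals. */
final class EurekaWaitBudget {

    private EurekaWaitBudget() {
    }

    /**
     * Assumed worst case: every cache layer has just refreshed when the
     * de-registration happens, so each layer adds its full interval.
     */
    static Duration worstCaseWait(Duration serverResponseCache,
                                  Duration clientRegistryFetch,
                                  Duration ribbonServerListRefresh) {
        return serverResponseCache
                .plus(clientRegistryFetch)
                .plus(ribbonServerListRefresh);
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the lab configuration.
        Duration wait = worstCaseWait(
                Duration.ofMillis(30_000),   // eureka.server.responseCacheUpdateIntervalMs
                Duration.ofSeconds(30),      // eureka.client.registryFetchIntervalSeconds
                Duration.ofMillis(30_000));  // ribbon.ServerListRefreshInterval
        System.out.println("WAIT_SECONDS should be at least " + wait.getSeconds());
    }
}
```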

In the Pet Clinic sample, `EurekaGracefulShutdown` has already been added to all the microservices that use Eureka as the service discovery server.


{: .note }
> Azure Container Apps provides built-in service discovery for microservice applications within the same container environment. You can call a container app by sending a request to `http(s)://<CONTAINER_APP_NAME>` from another app in the environment. For more details, see [Call a container app by name](https://learn.microsoft.com/en-us/azure/container-apps/connect-apps?tabs=bash#call-a-container-app-by-name). If your microservice applications are in the same container environment, you can use this feature to avoid the Eureka cache issue.

#### 1.2 Configure graceful shutdown for Spring Boot applications
Another common scenario is handling long HTTP operations. Before shutting down the application, the web server needs to finish processing all received HTTP requests.

If the Java application is a Spring Boot application and does not offer services via Eureka, you can simply use Spring Boot’s built-in [graceful shutdown support](https://docs.spring.io/spring-boot/reference/web/graceful-shutdown.html). Otherwise, use the above approach.

```properties
server.shutdown=graceful
spring.lifecycle.timeout-per-shutdown-phase=30s
```
This configuration uses a timeout to provide a grace period during which existing requests are allowed to complete, but no new requests will be permitted.
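
The grace-period behavior described above (stop accepting new work, let in-flight work finish, give up after a timeout) can be illustrated with a plain JDK `ExecutorService`. This is only an analogy under stated assumptions, not how Spring Boot implements graceful shutdown; the class `GracePeriodSketch` and its method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a shutdown grace period using only the JDK. */
final class GracePeriodSketch {

    private GracePeriodSketch() {
    }

    /** Stops accepting new tasks, then waits up to graceSeconds for running ones. */
    static boolean drain(ExecutorService pool, long graceSeconds) {
        pool.shutdown(); // new submissions are rejected from here on
        try {
            if (pool.awaitTermination(graceSeconds, TimeUnit.SECONDS)) {
                return true; // all in-flight work finished within the grace period
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow(); // grace period expired or interrupted: cancel the rest
        return false;
    }

    /** Returns true if the pool refuses a new task. */
    static boolean rejectsNewWork(ExecutorService pool) {
        try {
            pool.submit(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> sleepQuietly(200)); // stands in for a long HTTP request
        System.out.println("drained=" + drain(pool, 30));
        System.out.println("rejectsNewWork=" + rejectsNewWork(pool));
    }
}
```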

In the Pet Clinic sample, we are using Spring Boot’s graceful shutdown support for the `gateway` application.


### 2. Configure terminationGracePeriodSeconds
After adding proper shutdown hooks, configure `terminationGracePeriodSeconds` in Container Apps so that it covers the cleanup wait time. `terminationGracePeriodSeconds` defaults to 30 seconds. Update it to 60 seconds for the `customers-service` app.

- Portal: Go to the `Revisions blade` -> Create a new revision -> Save this new revision.
![lab 10 grace periods](../../images/lab10-grace.png)

{: .note }
> You can set a maximum value of 600 seconds (10 minutes) for terminationGracePeriodSeconds. If an application needs upwards of 10 minutes for cleanup, it is highly recommended to revisit the application’s design to reduce this time.
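
As a quick sanity check, the relationship between the cleanup time and `terminationGracePeriodSeconds` can be written down as simple arithmetic. The sketch below is an assumption-based illustration: the class `TerminationGraceCheck` and the 10-second safety margin are hypothetical, while the 600-second maximum comes from the note above.

```java
/** Hypothetical helper: checks that the grace period covers the cleanup time. */
final class TerminationGraceCheck {

    /** Maximum terminationGracePeriodSeconds quoted in the note above. */
    static final int MAX_GRACE_SECONDS = 600;

    private TerminationGraceCheck() {
    }

    /** Smallest grace period that covers the cleanup time plus a safety margin. */
    static int requiredGraceSeconds(int cleanupSeconds, int marginSeconds) {
        int required = cleanupSeconds + marginSeconds;
        if (required > MAX_GRACE_SECONDS) {
            throw new IllegalArgumentException(
                    "Cleanup needs " + required + "s, above the " + MAX_GRACE_SECONDS + "s maximum");
        }
        return required;
    }

    /** True if the configured grace period is long enough. */
    static boolean isSufficient(int configuredGraceSeconds, int cleanupSeconds, int marginSeconds) {
        return configuredGraceSeconds >= requiredGraceSeconds(cleanupSeconds, marginSeconds);
    }

    public static void main(String[] args) {
        int cleanupSeconds = 30; // WAIT_SECONDS in EurekaGracefulShutdown
        int marginSeconds = 10;  // assumed margin for the remaining shutdown work
        System.out.println("default 30s sufficient: " + isSufficient(30, cleanupSeconds, marginSeconds));
        System.out.println("60s sufficient: " + isSufficient(60, cleanupSeconds, marginSeconds));
    }
}
```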

### 3. Test the application
Use [wrk](https://github.com/wg/wrk) or any other traffic-generating tool to verify that there is no downtime during application shutdown.

Open a terminal and generate traffic with wrk:
```bash
endpoint=$api_gateway_FQDN/api/customer/owners
wrk -t1 -c1 -d300s $endpoint
```

Open a new terminal, and restart the revision
```bash
revision=$(az containerapp revision list \
  --name customers-service \
  --resource-group "$RESOURCE_GROUP" | jq -r '.[].name')

az containerapp revision restart \
  --revision "$revision" \
  --resource-group "$RESOURCE_GROUP"
```

In the first terminal, you should see no `5xx` errors during the whole restart window.
![lab 10 no downtime](../../images/lab10-no-downtime.png)
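
If wrk is not available, a small Java client can serve the same purpose. The sketch below is a hypothetical stand-in, not part of the lab: the class `DowntimeProbe`, the 5-second request timeout, and the number of attempts are all assumptions. It sends sequential GET requests and counts `5xx` responses and connection failures.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical traffic generator that counts failed requests. */
final class DowntimeProbe {

    private DowntimeProbe() {
    }

    /** True for HTTP status codes in the 5xx range. */
    static boolean isServerError(int status) {
        return status >= 500 && status <= 599;
    }

    /** Sends sequential GET requests and returns how many failed or returned 5xx. */
    static int countFailures(HttpClient client, URI endpoint, int attempts)
            throws InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(5)) // assumed per-request timeout
                .GET()
                .build();
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (isServerError(response.statusCode())) {
                    failures++;
                }
            } catch (IOException e) {
                failures++; // connection refused, reset, or timed out
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // Pass the full URL, including the scheme, of the gateway's owners endpoint.
        URI endpoint = URI.create(args[0]);
        int failures = countFailures(HttpClient.newHttpClient(), endpoint, 1000);
        System.out.println("failed requests: " + failures);
    }
}
```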


98 changes: 98 additions & 0 deletions docs/10_lab_reliable_application/1002.md
@@ -0,0 +1,98 @@
---
title: '2. Config health probes for a Java Application'
layout: default
nav_order: 2
parent: 'Lab 10: Build reliable Java application on ACA'
---

# Configure health probes for a Java application
Azure Container Apps health probes allow the Container Apps runtime to regularly inspect the status of your container apps.

Container Apps supports the following probes:

- `Startup`. Checks if your application has successfully started. This check is separate from the liveness probe and executes during the initial startup phase of your application.
- `Liveness`. Checks if your application is still running and responsive.
- `Readiness`. Checks to see if a replica is ready to handle incoming requests.

For more information, see the Azure documentation: [Health probes in Azure Container Apps](https://learn.microsoft.com/en-us/azure/container-apps/health-probes?tabs=arm-template).

Health probes can help work around performance issues related to timeouts during container startup, deadlocks when running the container, and serving traffic when the container is not ready to accept traffic.

## Step by step guidance

### 1. Expose health probes in Spring Boot Application

Spring Boot has out-of-the-box support to [manage your application availability state](https://docs.spring.io/spring-boot/docs/2.3.0.RELEASE/reference/html/production-ready-features.html#production-ready-kubernetes-probes).

Add the following configuration to `customers-service.yml`:

```yml
management:
  health:
    livenessState:
      enabled: true
    readinessState:
      enabled: true
  endpoint:
    health:
      probes:
        enabled: true
```
With the above configuration, two health endpoints are exposed via Spring Boot Actuator:
- `/actuator/health/liveness` for application liveness
- `/actuator/health/readiness` for application readiness

### 2. Define a customized health indicator in Spring Boot Application
In a Spring Boot app, you can define a custom `HealthIndicator`. Here is a sample `HealthIndicator` from the `customers-service` project:
```java
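import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import lombok.extern.slf4j.Slf4j;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// OwnerRepository is the repository from the customers-service project.
// @Component and @Slf4j are assumed here so the snippet is picked up by Actuator and `log` resolves.
@Component
@Slf4j
public class ServiceHealthIndicator implements HealthIndicator {

    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    // Written by the scheduler thread and read by the thread serving the health endpoint
    private volatile boolean isHealthy = false;

    private final OwnerRepository ownerRepo;

    public ServiceHealthIndicator(OwnerRepository ownerRepo) {
        this.ownerRepo = ownerRepo;
        // First check after 10 seconds, then every 5 seconds until the database is ready
        scheduler.scheduleAtFixedRate(() -> {
            checkDatabaseStatus();
            if (isHealthy) {
                scheduler.shutdown();
            }
        }, 10, 5, TimeUnit.SECONDS);
    }

    private void checkDatabaseStatus() {
        boolean databaseReady = ownerRepo.findAll().size() > 0;
        if (databaseReady) {
            isHealthy = true;
            log.info("Database is healthy. Stopping checks.");
        } else {
            log.info("Database is not healthy. Checking again in 5 seconds.");
        }
    }

    @Override
    public Health health() {
        return isHealthy ? Health.up().build() : Health.down().build();
    }
}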
```
In this sample, `ServiceHealthIndicator` reports the health status `UP` only after a database query returns data. This is helpful when your application needs some warm-up time (e.g. cache or database preload) before receiving traffic.


### 3. Configure health probes in Azure Container Apps
Health probes can be configured via either the Portal or an [ARM template](https://learn.microsoft.com/en-us/azure/container-apps/health-probes?tabs=arm-template).


- Portal: Find the application `customers-service` -> Go to the `Revisions blade` -> Create a new revision -> Save this new revision.
![lab 10 health probes](../../images/lab10-liveness-probe.png)
![lab 10 readiness probes](../../images/lab10-readiness-probe.png)

Here, we set `initial delay seconds` in the readiness probe to 10 seconds, which aligns with the 10-second initial delay of the health check in `ServiceHealthIndicator`.

{: .note }
> Azure Container Apps is built on top of Kubernetes, and its health probes feature maps closely to Kubernetes probes. For a deeper understanding of probes, see [Kubernetes probes](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
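
To make the probe semantics more concrete, the sketch below models how a Kubernetes-style HTTP probe is commonly evaluated: a response with a status from 200 to 399 counts as a success, and the probe is considered failed once a configured number of consecutive checks fail. The class `ProbeEvaluator` is hypothetical, and the logic is a simplified model for illustration, not the platform's implementation.

```java
/** Hypothetical, simplified model of HTTP probe evaluation. */
final class ProbeEvaluator {

    private ProbeEvaluator() {
    }

    /** An HTTP probe succeeds for status codes from 200 to 399. */
    static boolean isSuccess(int httpStatus) {
        return httpStatus >= 200 && httpStatus < 400;
    }

    /** True once failureThreshold consecutive checks have failed. */
    static boolean isFailed(int[] observedStatuses, int failureThreshold) {
        int consecutiveFailures = 0;
        for (int status : observedStatuses) {
            // Any success resets the counter
            consecutiveFailures = isSuccess(status) ? 0 : consecutiveFailures + 1;
            if (consecutiveFailures >= failureThreshold) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // 503 is what the actuator health endpoint returns while the status is DOWN
        int[] warmingUp = {503, 503, 200, 200};
        int[] stuck = {200, 503, 503, 503};
        System.out.println("warming up failed: " + isFailed(warmingUp, 3));
        System.out.println("stuck failed: " + isFailed(stuck, 3));
    }
}
```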

18 changes: 18 additions & 0 deletions docs/10_lab_reliable_application/1003.md
@@ -0,0 +1,18 @@
---
title: '3. Review'
layout: default
nav_order: 3
parent: 'Lab 10: Build reliable Java application on ACA'
---

# Review

In this lab, you applied several design practices for building reliable Java applications with Azure Container Apps. You:

- Added an appropriate shutdown hook to a Java application
- Configured health probes for a Java application


The image below illustrates the end state you have built in this lab.

![lab 5 overview](../../images/lab5.png)
42 changes: 42 additions & 0 deletions docs/10_lab_reliable_application/10_reliable_java_aca.md
@@ -0,0 +1,42 @@
---
title: 'Lab 10: Build reliable Java application on ACA'
layout: default
nav_order: 12
has_children: true
---

# Lab 10: Build reliable Java application on ACA

# Student manual

## Lab scenario

Azure Container Apps has a rich set of features to help you build reliable Java applications. In this lab, you will learn how to design and maintain your app for long-term health and stability. You can find more information in:
- [Management and operations for the Azure Container Apps - Landing Zone Accelerator](https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/scenarios/app-platform/container-apps/management)


## Objectives

After you complete this lab, you will be able to:

- Gracefully shut down a Java application
- Configure health probes for a Java application

The below image illustrates the end state you will be building in this lab.

![lab 5 overview](../../images/lab5.png)

## Lab Duration

- **Estimated Time**: 60 minutes

## Instructions

During this lab, you will:

- Add an appropriate shutdown hook to a Java application
- Configure health probes for a Java application


{: .note }
> The instructions provided in this exercise assume that you successfully completed the previous exercise and are using the same lab environment, including your Git Bash session with the relevant environment variables already set.
Binary file added images/lab10-grace.png
Binary file added images/lab10-liveness-probe.png
Binary file added images/lab10-no-downtime.png
Binary file added images/lab10-readiness-probe.png
