|
| 1 | +--- |
| 2 | +title: '1. Add shutdown hook to a Java application' |
| 3 | +layout: default |
| 4 | +nav_order: 1 |
| 5 | +parent: 'Lab 10: Build reliable Java application on ACA' |
| 6 | +--- |
| 7 | + |
| 8 | +# Graceful shutdown on Java Container Apps |
| 9 | +To achieve zero downtime during a rolling update, a Java application must shut down gracefully. Graceful shutdown refers to the "window of opportunity" an application has to programmatically clean up resources between the time a `SIGTERM` signal is sent to a container app and the time the app is forcibly stopped with `SIGKILL`. See [Container App Lifecycle Shutdown](https://learn.microsoft.com/en-us/azure/container-apps/application-lifecycle-management#shutdown). |
| 10 | + |
| 11 | +The cleanup behavior may include logic such as: |
| 12 | +- Closing database connections |
| 13 | +- Waiting for any long-running operations to finish |
| 14 | +- Clearing out a message queue |
| 15 | +- Etc. |
| 16 | + |
| 17 | +`SIGTERM` can be sent to containers during various shutdown events, including management operations (such as scale in/down or any [revision-scope changes](https://learn.microsoft.com/en-us/azure/container-apps/revisions#revision-scope-changes)) and internal platform upgrades (e.g., node upgrades). It is therefore essential for Java applications to handle the `SIGTERM` signal properly in order to achieve zero downtime. |
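
On the JVM, `SIGTERM` starts an orderly shutdown that runs any registered shutdown hooks before the process exits, whereas `SIGKILL` ends the process without running them. The following is a minimal sketch of that mechanism in plain Java, assuming no framework is involved. The class name `ShutdownHookSketch` and the `releaseResources` helper are hypothetical and only stand in for real cleanup logic.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownHookSketch {

    // Hypothetical cleanup routine: a real application would close
    // connections, wait for in-flight work, and so on.
    static List<String> releaseResources() {
        List<String> steps = new ArrayList<>();
        steps.add("stop accepting new work");
        steps.add("finish in-flight work");
        steps.add("close connections");
        return steps;
    }

    // Builds the hook thread; it is not started or registered here.
    static Thread newShutdownHook() {
        return new Thread(
            () -> releaseResources().forEach(step -> System.out.println("cleanup: " + step)),
            "shutdown-hook");
    }

    public static void main(String[] args) throws InterruptedException {
        // The JVM runs registered hooks when it receives SIGTERM (or SIGINT).
        Runtime.getRuntime().addShutdownHook(newShutdownHook());
        System.out.println("Running; send SIGTERM or press Ctrl+C to trigger the hook");
        Thread.sleep(60_000);
    }
}
```

Spring applications usually do not need to register such a hook by hand, because Spring Boot registers its own hook that closes the application context, which is what publishes the `ContextClosedEvent` used later in this lab.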
| 18 | + |
| 19 | + |
| 20 | +## Step-by-step guidance |
| 21 | + |
| 22 | +### 1. Handle SIGTERM signal |
| 23 | +In this lab, we will guide you on how to properly handle Eureka cache issues and long-running HTTP requests when receiving the `SIGTERM` signal. |
| 24 | + |
| 25 | +#### 1.1 Add a Shutdown Hook for Spring Cloud Applications |
| 26 | + |
| 27 | +Eureka is designed to be an eventually consistent system. When an upstream service app is shut down, the service-consuming app (i.e., client app) will not see the registry update immediately and will continue to send requests to the upstream app. If the upstream app shuts down immediately, the client app will receive `5xx` network/IO exceptions. |
| 28 | + |
| 29 | +To gracefully shut down an upstream app that offers services via Eureka service discovery, you need to catch the `SIGTERM` signal and follow the deregister-then-wait pattern: |
| 30 | +1) Deregister the instance from the Eureka server. |
| 31 | +2) Wait until all client apps refresh their Eureka cache. |
| 32 | +3) Shut down the application. |
| 33 | + |
| 34 | +To implement this, you can refer to the sample code in [EurekaGracefulShutdown.java](https://github.com/Azure-Samples/java-microservices-aca-lab/blob/main/src/spring-petclinic-customers-service/src/main/java/org/springframework/samples/petclinic/customers/shutdown/EurekaGracefulShutdown.java): |
| 35 | + |
| 36 | +```java |
| 37 | +@Component |
| 38 | +@Slf4j |
| 39 | +public class EurekaGracefulShutdown { |
| 40 | + |
| 41 | + @Autowired |
| 42 | + private EurekaInstanceConfigBean eurekaInstanceConfig; |
| 43 | + private static final String STATUS_DOWN = "DOWN"; |
| 44 | + private static final int WAIT_SECONDS = 30; |
| 45 | + |
| 46 | + @EventListener |
| 47 | + public void onShutdown(ContextClosedEvent event) { |
| 48 | + log.info("Caught shutdown event"); |
| 49 | + log.info("De-register instance from eureka server"); |
| 50 | + eurekaInstanceConfig.setStatusPageUrl(STATUS_DOWN); |
| 51 | + |
| 52 | + // Keep serving traffic until all Eureka clients have refreshed their cache |
| 53 | + try { |
| 54 | + log.info("wait {} seconds before shutting down the application", WAIT_SECONDS); |
| 55 | + Thread.sleep(1000 * WAIT_SECONDS); |
| 56 | + } catch (InterruptedException e) { |
| 57 | + Thread.currentThread().interrupt(); |
| 58 | + } |
| 59 | + log.info("Shutdown the application now."); |
| 60 | + } |
| 61 | +} |
| 62 | + ``` |
| 63 | + |
| 64 | +Note: when choosing `WAIT_SECONDS`, consider the maximum possible Eureka cache intervals, including: |
| 65 | +- Eureka server cache: `eureka.server.responseCacheUpdateIntervalMs` |
| 66 | +- Eureka client cache: `eureka.client.registryFetchIntervalSeconds` |
| 67 | +- Ribbon load balancer cache: `ribbon.ServerListRefreshInterval`, if Ribbon is used |
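
One conservative way to size the wait is to add these intervals together, on the assumption that each cache layer may refresh just after the layer before it changed. The sketch below illustrates that arithmetic only. The class and method names are hypothetical, and the interval values and safety margin are illustrative placeholders, not values taken from the Pet Clinic configuration.

```java
import java.time.Duration;

public class EurekaWaitEstimate {

    // Adds the cache refresh intervals and a safety margin to get a
    // conservative upper bound on how long clients may see stale data.
    static Duration worstCaseWait(Duration serverResponseCache,
                                  Duration clientRegistryFetch,
                                  Duration ribbonServerListRefresh,
                                  Duration margin) {
        return serverResponseCache
            .plus(clientRegistryFetch)
            .plus(ribbonServerListRefresh)
            .plus(margin);
    }

    public static void main(String[] args) {
        // Illustrative values only; substitute the intervals your apps actually use.
        Duration wait = worstCaseWait(
            Duration.ofMillis(10_000),  // eureka.server.responseCacheUpdateIntervalMs
            Duration.ofSeconds(10),     // eureka.client.registryFetchIntervalSeconds
            Duration.ofMillis(5_000),   // ribbon.ServerListRefreshInterval
            Duration.ofSeconds(5));     // arbitrary margin
        System.out.println("Suggested WAIT_SECONDS: " + wait.getSeconds());
    }
}
```

If Ribbon is not used, pass `Duration.ZERO` for that argument.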
| 68 | + |
| 69 | +In the Pet Clinic sample, we have already added `EurekaGracefulShutdown` to all the microservices that use Eureka for service discovery. |
| 70 | + |
| 71 | + |
| 72 | +{: .note } |
| 73 | +> Azure Container Apps provides built-in service discovery for microservice applications within the same container environment. You can call a container app by sending a request to `http(s)://<CONTAINER_APP_NAME>` from another app in the environment. For more details, see [Call a container app by name](https://learn.microsoft.com/en-us/azure/container-apps/connect-apps?tabs=bash#call-a-container-app-by-name). If your microservice applications are in the same container environment, you can use this feature to avoid the Eureka cache issue. |
| 74 | +
|
| 75 | +#### 1.2 Configure graceful shutdown for Spring Boot applications |
| 76 | +Another common scenario is handling long-running HTTP requests. Before the application shuts down, the web server needs to finish processing all the HTTP requests it has already received. |
| 77 | + |
| 78 | +If the Java application is a Spring Boot application and does not offer services via Eureka, you can simply use Spring Boot’s built-in [graceful shutdown support](https://docs.spring.io/spring-boot/reference/web/graceful-shutdown.html). Otherwise, use the above approach. |
| 79 | + |
| 80 | +```properties |
| 81 | +server.shutdown=graceful |
| 82 | +spring.lifecycle.timeout-per-shutdown-phase=30s |
| 83 | +``` |
| 84 | +This configuration uses a timeout to provide a grace period during which existing requests are allowed to complete, but no new requests will be permitted. |
| 85 | + |
| 86 | +In the Pet Clinic sample, we are using Spring Boot’s graceful shutdown support for the `gateway` application. |
| 87 | + |
| 88 | + |
| 89 | +### 2. Configure terminationGracePeriodSeconds |
| 90 | +After adding proper shutdown hooks, configure `terminationGracePeriodSeconds` in Container Apps so that it covers the cleanup wait time. The value defaults to 30 seconds; update it to 60 seconds for the `customers-service` app. |
| 91 | + |
| 92 | +- Portal: Go to the `Revisions blade` -> Create a new revision -> Save this new revision. |
| 93 | + |
| 94 | + |
| 95 | +{: .note } |
| 96 | +> You can set a maximum value of 600 seconds (10 minutes) for terminationGracePeriodSeconds. If an application needs upwards of 10 minutes for cleanup, it is highly recommended to revisit the application’s design to reduce this time. |
| 97 | +
|
| 98 | +### 3. Test the application |
| 99 | +Use [wrk](https://github.com/wg/wrk) or any other load-generating tool to verify that there is no downtime during application shutdown. |
| 100 | + |
| 101 | +Open a terminal and generate traffic with wrk: |
| 102 | +```bash |
| 103 | +endpoint="https://${api_gateway_FQDN}/api/customer/owners" |
| 104 | +wrk -t1 -c1 -d300s "$endpoint" |
| 105 | +``` |
| 106 | + |
| 107 | +Open a new terminal and restart the revision: |
| 108 | +```bash |
| 109 | +revision=$(az containerapp revision list \ |
| 110 | + --name customers-service \ |
| 111 | + --resource-group $RESOURCE_GROUP | jq -r '.[].name') |
| 112 | + |
| 113 | +az containerapp revision restart \ |
| 114 | + --revision $revision \ |
| 115 | + --resource-group $RESOURCE_GROUP |
| 116 | +``` |
| 117 | + |
| 118 | +In the first terminal, you should see no `5xx` errors for the entire duration of the application restart. |
| 119 | + |
| 120 | + |
| 121 | + |