The CRT detects the beginning of JVM destroy but doesn’t manage graceful shutdown properly #861

Open
yk-littlepay opened this issue Jan 16, 2025 · 3 comments

yk-littlepay commented Jan 16, 2025

Describe the bug

If the JVM receives a shutdown signal, the client does not stop properly.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

After the shutdown signal, the client should cancel its current connections and inform the user about it.

Current Behavior

Nothing happens after the shutdown signal, and the application hangs forever.

Reproduction Steps

Here is an example of a Quarkus application that uses the SQS async client with aws-crt:

package com.example;

import io.quarkus.runtime.Quarkus;
import io.quarkus.runtime.Shutdown;
import io.quarkus.runtime.Startup;
import jakarta.inject.Singleton;
import lombok.RequiredArgsConstructor;
import lombok.extern.jbosslog.JBossLog;
import software.amazon.awssdk.services.sqs.SqsAsyncClient;
import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;
import software.amazon.awssdk.services.sqs.model.ReceiveMessageResponse;

import java.util.concurrent.Semaphore;

import static java.util.concurrent.CompletableFuture.runAsync;
import static software.amazon.awssdk.services.sqs.model.MessageSystemAttributeName.APPROXIMATE_RECEIVE_COUNT;

@JBossLog
@Singleton
@RequiredArgsConstructor
public final class Application {
    final SqsAsyncClient sqs;
    // It is necessary to wait until the message is received and processed correctly before shutting down the application.
    private final Semaphore round = new Semaphore(1);

    @Startup
    public void onStart() throws InterruptedException {
        log.info("Starting the application...");

        round.acquire(); // acquire a lock on a semaphore
        runAsync(() -> {
            try {
                var url = sqs.listQueues().join().queueUrls().stream().findAny().orElse(null);
                log.info("url = " + url);
                var messageResponse = sqs.receiveMessage(ReceiveMessageRequest.builder()
                                .queueUrl(url)
                                .waitTimeSeconds(20) // maximum time to wait for a message to arrive
                                .visibilityTimeout(40)
                                .maxNumberOfMessages(10)
                                .messageSystemAttributeNames(APPROXIMATE_RECEIVE_COUNT)
                                .build())
                        .join(); // aws-crt never releases if a shutdown signal is sent

                processMessage(url, messageResponse);

            } catch (Exception error) {
                log.errorv(error, "Error occurred: ");
            } finally {
                round.release();
                Quarkus.asyncExit();
            }
        });
    }

    @Shutdown
    public void onStop() throws InterruptedException {
        log.info("Stopping...");
        round.acquire();
        log.info("Stopped");
    }


    private void processMessage(String url, ReceiveMessageResponse response) {
        log.info("Received message: " + response.messages());
        response.messages().forEach(message -> {
            log.info("Processing message: " + message.body());
            sqs.deleteMessage(DeleteMessageRequest.builder()
                            .queueUrl(url)
                            .receiptHandle(message.receiptHandle())
                            .build())
                    .join();  // aws-crt never releases if a shutdown signal is sent
        });
    }
}

Possible Solution

No response

Additional Information/Context

No response

aws-crt-java version used

2.29.23

Java version used

Corretto 21.0.5-amzn

Operating System and version

MacOS 15.2

yk-littlepay added the bug and needs-triage labels Jan 16, 2025

yk-littlepay (Author) commented

PS. If I switch to software.amazon.awssdk:netty-nio-client, it works as expected.

jasonstack commented Jan 17, 2025

I might have encountered a similar issue.

My codebase has an operation that reads an S3 file using S3CrtAsyncClient. If this operation is called after the JVM receives a shutdown signal (e.g. when testcontainers closes), the call hangs until its timeout is reached. The same operation works fine with the non-CRT S3 client, or when it is executed before the JVM receives the shutdown signal.

Does the CRT event loop depend on the JVM shutdown signal? How can we debug it?

I am using software.amazon.awssdk.crt:aws-crt:0.33.7 with Java 11.0.19 (2023-04-18 LTS). The AwsEventLoop thread is RUNNABLE in the thread dump:

"AwsEventLoop 1" #148 [110] daemon prio=5 os_prio=0 cpu=0.53ms elapsed=11.73s tid=0x0000ffff0c00f660 nid=110 runnable  [0x0000000000000000]
 java.lang.Thread.State: RUNNABLE

 Locked ownable synchronizers:
- None

....

Thanks


Update: calling CRT.acquireShutdownRef() prevents the CRT from shutting down before the JVM shutdown hook.

bretambrose (Contributor) commented Jan 17, 2025

Functionality related to the CRT is ref-counted. By default, the ref count starts at one (https://github.com/awslabs/aws-crt-java/blob/main/src/main/java/software/amazon/awssdk/crt/CRT.java#L406) and is decremented by a shutdown hook (https://github.com/awslabs/aws-crt-java/blob/main/src/main/java/software/amazon/awssdk/crt/CRT.java#L64C1-L70C12). If you want CRT logic to run past the initial shutdown signal, you need to add your own reference (https://github.com/awslabs/aws-crt-java/blob/main/src/main/java/software/amazon/awssdk/crt/CRT.java#L419-L426) and release it when no more work remains.
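
To illustrate the ref-counting behavior described above, here is a minimal, self-contained sketch. It is a toy model written for this thread, not the aws-crt-java implementation: the class `ShutdownRefModel` and all of its methods are hypothetical names. In a real application, the extra reference would be taken with `CRT.acquireShutdownRef()` (mentioned earlier in the thread) and given back with the matching release call in the linked CRT.java once no more work remains.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a ref-counted shutdown. NOT the aws-crt-java implementation. */
public final class ShutdownRefModel {
    // The count starts at one: the implicit reference owned by the shutdown hook.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final AtomicBoolean shutDown = new AtomicBoolean(false);

    /** Application code takes an extra reference to keep the runtime alive. */
    public void acquire() {
        refCount.incrementAndGet();
    }

    /** Drops one reference; the release that reaches zero performs the teardown. */
    public void release() {
        if (refCount.decrementAndGet() == 0) {
            shutDown.set(true); // stand-in for stopping event loops, etc.
        }
    }

    /** What the shutdown hook does in this model: drop the initial reference. */
    public void onShutdownSignal() {
        release();
    }

    public boolean isShutDown() {
        return shutDown.get();
    }

    public static void main(String[] args) {
        // Without an extra reference, the shutdown signal tears everything down.
        ShutdownRefModel withoutRef = new ShutdownRefModel();
        withoutRef.onShutdownSignal();
        System.out.println("no extra ref, after signal: " + withoutRef.isShutDown());

        // With an extra reference, work can continue past the shutdown signal.
        ShutdownRefModel withRef = new ShutdownRefModel();
        withRef.acquire();
        withRef.onShutdownSignal();
        System.out.println("extra ref, after signal: " + withRef.isShutDown());

        // Releasing the extra reference completes the shutdown.
        withRef.release();
        System.out.println("extra ref, after release: " + withRef.isShutDown());
    }
}
```

Under this model, an in-flight request started before the shutdown signal can only complete if the application acquired its own reference beforehand, which would be consistent with the hanging `join()` calls reported above.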

bretambrose removed the bug and needs-triage labels Jan 17, 2025