SASL OAUTHBEARER mechanism handshake failed: Local: Broker transport failure: broker's supported mechanisms: (n/a) #2284

Open
5 of 8 tasks
ThiagoAndrad opened this issue Aug 14, 2024 · 2 comments


@ThiagoAndrad

Description

I have a producer using AWS MSK with the OAuthBearer authentication method. Errors related to SASL OAUTHBEARER intermittently appear in the log, and I have not been able to find the cause.
I didn't notice any errors when publishing messages.

.Net 8.0
Docker image: mcr.microsoft.com/dotnet/sdk:8.0
Confluent.Kafka: Version 2.5.2
AWS.MSK.Auth: Version 1.0.0

Error:

%3|1723592361.909|ERROR|rdkafka#producer-1| [thrd:app]: rdkafka#producer-1: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: SASL OAUTHBEARER mechanism handshake failed: Local: Broker transport failure: broker's supported mechanisms: (n/a) (after 0ms in state DOWN)
%3|1723592361.909|ERROR|rdkafka#producer-1| [thrd:app]: rdkafka#producer-1: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: Disconnected (after 1ms in state AUTH_HANDSHAKE)
%3|1723592361.909|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: SASL OAUTHBEARER mechanism handshake failed: Local: Broker transport failure: broker's supported mechanisms: (n/a) (after 0ms in state DOWN)

Debug log (as pasted, newest entries first):

%7|1723592362.046|CONNECT|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: Connected to ipv4#XX.X.X.XXX:9098
%7|1723592362.046|CONNECT|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: Connecting to ipv4#XX.X.X.XXX:9098 (sasl_ssl) with socket 335
%7|1723592362.043|STATE|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: Broker changed state TRY_CONNECT -> CONNECT
%7|1723592362.043|CONNECT|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: broker in state TRY_CONNECT connecting
%7|1723592361.912|RECV|rdkafka#producer-1| [thrd:sasl_ssl://b164-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://b164-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/164: Received MetadataResponse (v11, 3900 bytes, CorrId 25, rtt 2.63ms)
%7|1723592361.909|SEND|rdkafka#producer-1| [thrd:sasl_ssl://b164-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://b164-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/164: Sent MetadataRequest (v11, 65 bytes @ 0, CorrId 25)
%7|1723592361.909|RECONNECT|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: Delaying next reconnect by 134ms
%7|1723592361.909|STATE|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: Broker changed state INIT -> TRY_CONNECT
%7|1723592361.909|STATE|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: Broker changed state DOWN -> INIT
%3|1723592361.909|ERROR|rdkafka#producer-1| [thrd:app]: rdkafka#producer-1: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: SASL OAUTHBEARER mechanism handshake failed: Local: Broker transport failure: broker's supported mechanisms: (n/a) (after 0ms in state DOWN)
%3|1723592361.909|ERROR|rdkafka#producer-1| [thrd:app]: rdkafka#producer-1: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: Disconnected (after 1ms in state AUTH_HANDSHAKE)
%3|1723592361.909|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaw]: sasl_ssl://boot-aaaaaa.c3.kafka-serverless.us-east-1.amazonaws.com:9098/bootstrap: SASL OAUTHBEARER mechanism handshake failed: Local: Broker transport failure: broker's supported mechanisms: (n/a) (after 0ms in state DOWN)

My config:

    public KafkaClient(ILogger<KafkaClientHandle> logger)
    {
        _logger = logger;
        var conf = new ProducerConfig
        {
            BootstrapServers = "server address here"
        };


        conf.SecurityProtocol = SecurityProtocol.SaslSsl;
        conf.SaslMechanism = SaslMechanism.OAuthBearer;
        conf.Debug = "broker,security,protocol,feature";

        _kafkaProducer = new ProducerBuilder<byte[], byte[]>(conf)
            .SetOAuthBearerTokenRefreshHandler(OauthCallback)
            .Build();
    }

    private async void OauthCallback(IClient client, string cfg)
    {
        try
        {
            AWSMSKAuthTokenGenerator mskAuthTokenGenerator = new AWSMSKAuthTokenGenerator();
            var (token, expiryMs) = await mskAuthTokenGenerator.GenerateAuthTokenAsync(Amazon.RegionEndpoint.USEast1, awsDebugCreds: true);
            client.OAuthBearerSetToken(token, expiryMs, "DummyPrincipal");
        }
        catch (Exception e)
        {
            _logger.LogError(e, "[Kafka] [OauthCallback] Failed to set OAuth token");
            client.OAuthBearerSetTokenFailure(e.ToString());
        }
    }
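For context when reading these logs: "Local: Broker transport failure" is librdkafka's text for `ErrorCode.Local_Transport`, which the client treats as a non-fatal error and recovers from by reconnecting. That would be consistent with publishing continuing to work. Below is a minimal sketch, assuming the same `ProducerBuilder<byte[], byte[]>` as in the constructor above, of an error handler that separates fatal errors from transient ones, so the intermittent handshake failures can be told apart from errors that actually require recreating the producer. `ErrorClassification`, `Classify`, and `WithErrorClassification` are hypothetical names; `SetErrorHandler`, `Error.IsFatal`, `Error.Code`, and `Error.Reason` are Confluent.Kafka APIs.

```csharp
using System;
using Confluent.Kafka;

// Hypothetical helper (not part of Confluent.Kafka): separates fatal client errors
// from transient ones such as Local_Transport ("Local: Broker transport failure").
public static class ErrorClassification
{
    // A fatal error leaves the producer instance unusable; any other error is
    // reported while librdkafka keeps retrying (e.g. by reconnecting to the broker).
    public static string Classify(Error error) => error.IsFatal ? "fatal" : "transient";

    // Attaches an error handler to an existing builder, so it can be chained after
    // SetOAuthBearerTokenRefreshHandler(...) as in the constructor above.
    public static ProducerBuilder<byte[], byte[]> WithErrorClassification(
        this ProducerBuilder<byte[], byte[]> builder)
    {
        return builder.SetErrorHandler((producer, error) =>
            Console.Error.WriteLine(
                $"[Kafka] {Classify(error)} error {error.Code}: {error.Reason}"));
    }
}
```

Usage, chained onto the builder from the constructor: `.SetOAuthBearerTokenRefreshHandler(OauthCallback).WithErrorClassification().Build()`.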

How to reproduce

Checklist

Please provide the following information:

  • A complete (i.e. we can run it), minimal program demonstrating the problem. No need to supply a project file.
  • Confluent.Kafka nuget version.
  • Apache Kafka version.
  • Client configuration.
  • Operating system.
  • Provide logs (with "debug" : "..." as necessary in configuration).
  • Provide broker log excerpts.
  • Critical issue.
@buseynehannes

We are experiencing the same issue with a 3-broker setup using confluent-kafka-python.

Both libraries are built on librdkafka, and the librdkafka FAQ explains why consumers and producers get disconnected often.

For us, the problem appears to be idle connections. If we only talk to a single broker for longer than the broker's idle connection timeout, and a rebalance (or anything else) then requires connecting to another broker, that connection fails because the broker has closed it for being idle. Normally the client re-establishes the connection automatically, but because this failure happens during SASL authentication (rather than at some other stage), an error is raised. I think this might be a bug in librdkafka.

You can investigate this yourself by setting the config "debug": "protocol,security". If your debug logs show connection failures to a broker that has not been used for 10+ minutes, this is likely the cause.
Alternatively, you can increase connections.max.idle.ms on the broker and see whether the errors become less frequent, but note that this increases resource usage on the broker.
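A client-side experiment along the same lines, as an untested sketch: librdkafka also has a client property `connections.max.idle.ms` (exposed as `ConnectionsMaxIdleMs` on the Confluent.Kafka config classes), which makes the client close idle broker connections itself. Setting it a little below the broker's idle timeout, combined with the debug contexts above, may help confirm or rule out the idle-connection theory. The class name `IdleConnectionExperiment` is hypothetical, and the broker timeout is assumed to be the Apache Kafka default of 10 minutes; whether this reduces the handshake errors on MSK is not established.

```csharp
using Confluent.Kafka;

// Hypothetical helper: the SASL/OAUTHBEARER settings from the original report,
// plus client-side settings for testing the idle-connection theory.
public static class IdleConnectionExperiment
{
    public static ProducerConfig Build(string bootstrapServers) => new ProducerConfig
    {
        BootstrapServers = bootstrapServers,
        SecurityProtocol = SecurityProtocol.SaslSsl,
        SaslMechanism = SaslMechanism.OAuthBearer,
        // Log connection/protocol and SASL activity, as suggested above.
        Debug = "protocol,security",
        // Assumption: the broker reaps idle connections after 600000 ms (10 minutes).
        // Closing them on the client side slightly earlier means idle connections are
        // closed by the client rather than the broker. Untested as a fix.
        ConnectionsMaxIdleMs = 540000,
    };
}
```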

I haven't found a configuration fix myself yet.
One more question about your use case: are you using rack awareness in the client?

@codyspeck

I am experiencing the same issue with a .NET app deployed in AWS but connecting to Confluent Cloud. I have not been able to identify a cause. Once this failure occurs, the only recourse is to restart the entire application.
