netty: Per-rpc call option authority verification against peer cert subject names #11724

Open
wants to merge 77 commits into
base: master

kannanjgithub
Contributor

No description provided.

@kannanjgithub kannanjgithub requested a review from ejona86 December 4, 2024 13:23
@kannanjgithub
Contributor Author

I don't know what this error is about:

java/netty/src/main/java/io/grpc/netty/ProtocolNegotiators.java:621:18: 'public' modifier out of order with the JLS suggestions. [ModifierOrder]

@ejona86
Member

ejona86 commented Dec 9, 2024

synchronized public boolean mayBeVerifyAuthority should have "public" first: public synchronized boolean mayBeVerifyAuthority.

JLS == Java Language Specification

The relevant part of the style guide:
https://google.github.io/styleguide/javaguide.html#s4.8.7-modifiers
But checkstyle also links to some useful things:
https://checkstyle.sourceforge.io/checks/modifier/modifierorder.html
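As a minimal illustration of the ordering checkstyle asks for, the sketch below declares a method with `public` before `synchronized` and reads its modifiers back through reflection. The class name `ModifierOrderExample`, the field, and the parameterless signature are assumptions for illustration only; the real method in ProtocolNegotiators.java may look different.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class ModifierOrderExample {
  private boolean verified;

  // "public" comes before "synchronized", the order recommended by the JLS.
  // Writing "synchronized public" compiles too, but trips the ModifierOrder check.
  public synchronized boolean mayBeVerifyAuthority() {
    return verified;
  }

  // Returns the method's modifiers rendered in the canonical JLS order.
  static String canonicalModifiers() {
    try {
      Method m = ModifierOrderExample.class.getMethod("mayBeVerifyAuthority");
      return Modifier.toString(m.getModifiers());
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(canonicalModifiers()); // prints "public synchronized"
  }
}
```

`Modifier.toString` always renders modifiers in the canonical order, so it is a quick way to check what order a declaration should use.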

@kannanjgithub
Contributor Author

> Can we bypass the animal sniffer check for the tests

> Yeah, that's not a problem. We don't run the netty/okhttp unit tests on Android. Only in interop-testing/src/main (its tests don't run on Android either) do you need to be more careful, like with AbstractInteropTest.

How can animal sniffer be disabled for tests? I see there is an excludeDependencies option, but the supported artifact patterns don't include a groupId:artifactId::scope form that we could use to exclude the test scope.

Can you help with this?

@ejona86
Member

ejona86 commented Jan 16, 2025

Use @org.codehaus.mojo.animal_sniffer.IgnoreJRERequirement on a class or method.

@kannanjgithub
Contributor Author

> Use @org.codehaus.mojo.animal_sniffer.IgnoreJRERequirement on a class or method.

The usage of X509ExtendedTrustManager in the ProtocolNegotiatorsTest was for testing the cache but now after I moved the cache to NettyClientTransport, it is no longer required because the cache is not testable, since it doesn't use mocks for protocol negotiator.

FakeClientTransportListener fakeClientTransportListener = new FakeClientTransportListener();
callMeMaybe(transport.start(fakeClientTransportListener));
synchronized (fakeClientTransportListener) {
fakeClientTransportListener.wait(10000);
Member

Wait is allowed to spuriously return early, which means this could be flaky. It also looks like isConnected should be @GuardedBy("this") and all reads/writes to it happen within the lock, since you can't guarantee the wakeup was because of the notify().

Dealing with the spurious wakeup is slightly annoying when there's a timeout. I suggest using a Future<Void> connected = SettableFuture.create() and completing the future instead of setting isConnected to true. Or use a CountDownLatch.
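The CountDownLatch variant of this suggestion can be sketched as follows. `FakeListener` and its method names are hypothetical stand-ins for the test's FakeClientTransportListener, not the code in the PR. `CountDownLatch.await` with a timeout only returns true once the count reaches zero, so a spurious wakeup cannot be mistaken for a connection.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchWaitExample {
  // Hypothetical stand-in for the test's FakeClientTransportListener.
  static final class FakeListener {
    private final CountDownLatch connected = new CountDownLatch(1);

    // Would be called by the transport once the connection is ready.
    void transportReady() {
      connected.countDown();
    }

    // True only if transportReady() ran within the timeout; no wait()/notify()
    // loop or @GuardedBy flag is needed.
    boolean awaitConnected(long timeout, TimeUnit unit) throws InterruptedException {
      return connected.await(timeout, unit);
    }
  }

  // Signals readiness from another thread and waits for it with a timeout.
  static boolean demo() {
    FakeListener listener = new FakeListener();
    Thread transportThread = new Thread(listener::transportReady);
    transportThread.start();
    try {
      return listener.awaitConnected(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("connected: " + demo());
  }
}
```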

Contributor Author

Done.

new ClientStreamTracer[]{new ClientStreamTracer() {
}});

assertThat(stream).isInstanceOf(FailingClientStream.class);
Member

The instanceOf check is a bit of a code smell, though within acceptable limits; abusing the timeout insight, however, is not great. The nice opaque/black-box way to test this is to pass a Listener (a mock is fine) to stream.start() and check that the stream fails (closed() is called, and you can inspect the status passed to it).

Contributor Author

Done.

@ejona86
Member

ejona86 commented Jan 16, 2025

> The usage of X509ExtendedTrustManager in the ProtocolNegotiatorsTest was for testing the cache but now after I moved the cache to NettyClientTransport, it is no longer required because the cache is not testable, since it doesn't use mocks for protocol negotiator.

You can easily use a forwarding protocol negotiator (or mock+delegatesTo()) by injecting it on one of the newTransport(result.negotiator.newNegotiator()) lines. With the mock, you could then easily verify how many times verifyAuthority() was called.

(mock+delegatesTo() is actually really nice in general.)
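The forwarding idea can be sketched without Mockito as a hand-written wrapper that counts calls and delegates to the real object. The `Verifier` interface and `CountingVerifier` class here are hypothetical, reduced to a single method; with Mockito, mock+delegatesTo() plus verify() would play the same role with less code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ForwardingVerifierExample {
  // Hypothetical interface standing in for whatever exposes verifyAuthority().
  interface Verifier {
    boolean verifyAuthority(String authority);
  }

  // Forwards every call to the real verifier and records how often it was used.
  static final class CountingVerifier implements Verifier {
    private final Verifier delegate;
    private final AtomicInteger calls = new AtomicInteger();

    CountingVerifier(Verifier delegate) {
      this.delegate = delegate;
    }

    @Override
    public boolean verifyAuthority(String authority) {
      calls.incrementAndGet();
      return delegate.verifyAuthority(authority);
    }

    int callCount() {
      return calls.get();
    }
  }

  public static void main(String[] args) {
    CountingVerifier verifier = new CountingVerifier("foo.test"::equals);
    verifier.verifyAuthority("foo.test");
    verifier.verifyAuthority("bar.test");
    System.out.println("calls: " + verifier.callCount());
  }
}
```

A test for a cache would inject such a wrapper, issue the same authority twice, and assert that the call count stays at one.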

@kannanjgithub
Contributor Author

> The usage of X509ExtendedTrustManager in the ProtocolNegotiatorsTest was for testing the cache but now after I moved the cache to NettyClientTransport, it is no longer required because the cache is not testable, since it doesn't use mocks for protocol negotiator.

> You can easily use a forwarding protocol negotiator (or mock+delegatesTo()) by injecting it on one of the newTransport(result.negotiator.newNegotiator()) lines. With the mock, you could then easily verify how many times verifyAuthority() was called.

> (mock+delegatesTo() is actually really nice in general.)

Done.

}

public void setSslEngine(SSLEngine sslEngine) {
this.sslEngine = sslEngine;
Member

Bad news.

Looking at "unnecessary" changes to the tests made me realize what's happening here. We can't do this. The ProtocolNegotiator instance is shared across transports/connections. ChannelHandler returned by ProtocolNegotiator.newHandler() is per-connection, not the negotiator itself. The authority check has to be done for a specific connection; the server's certificate can be different on a different connection. That means we need to plumb the negotiator's handler (result of ProtocolNegotiator.newHandler()) or the negotiator handler results instead of the negotiator itself.

To plumb using the negotiator's handler, we could extend ChannelHandler with our own interface that has a verifyAuthority() method, return that from ProtocolNegotiator.newHandler(), save that in the NettyClientTransport, and call it per-RPC. But that will require changing more implementations (including Google-internal ones) and will get ugly as the ProtocolNegotiator API is shared between client-side and server-side.

Plumbing using the negotiator's handler results seems easier. The results are communicated from the handler with ProtocolNegotiationEvent. Those results then are plumbed into NettyClientHandler.handleProtocolNegotiationComplete() and saved. (FYI: The Attributes are the same ones that are exposed in ClientCall.getAttributes().) We can make an interface for Status verifyAuthority(String authority), create an internal Attribute.Key, and store an object able to do the verifyAuthority in those attributes from the tls handler. NettyClientTransport has access to those attributes via handler.getAttributes(). That should be pretty easy, but could be annoying in the new tests (no existing tests need updating).
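A rough sketch of the second option, using plain Java stand-ins rather than the real gRPC types: a `Map` plays the role of Attributes, a string constant plays the role of the internal Attributes.Key, and `AuthorityVerifier` returns a boolean instead of Status. All of these names are assumptions for illustration, and exact-name matching is a simplification of real hostname verification. The point is that each connection's negotiation result carries its own verifier, bound to that connection's peer certificate.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class PerConnectionVerifierExample {
  // Hypothetical stand-in for `Status verifyAuthority(String authority)`;
  // true plays the role of Status.OK.
  interface AuthorityVerifier {
    boolean verifyAuthority(String authority);
  }

  // Hypothetical stand-in for an internal attribute key.
  static final String VERIFIER_KEY = "authority-verifier";

  // Models what the TLS handler could publish when negotiation completes on ONE
  // connection: a verifier that captured that connection's certificate names.
  static Map<String, Object> negotiationResult(Set<String> peerCertNames) {
    Map<String, Object> attrs = new HashMap<>();
    AuthorityVerifier verifier = peerCertNames::contains;
    attrs.put(VERIFIER_KEY, verifier);
    return attrs;
  }

  // Models the per-RPC check done with the attributes saved for a connection.
  static boolean verify(Map<String, Object> attrs, String authority) {
    AuthorityVerifier verifier = (AuthorityVerifier) attrs.get(VERIFIER_KEY);
    return verifier != null && verifier.verifyAuthority(authority);
  }

  public static void main(String[] args) {
    Map<String, Object> conn1 = negotiationResult(Collections.singleton("a.example.com"));
    Map<String, Object> conn2 = negotiationResult(Collections.singleton("b.example.com"));
    // The same authority can pass on one connection and fail on another.
    System.out.println(verify(conn1, "a.example.com"));
    System.out.println(verify(conn2, "a.example.com"));
  }
}
```

Because the verifier lives in per-connection state, a negotiator instance shared across transports never needs to hold an SSLEngine of its own.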

Contributor Author

Done. Yet to do unit tests.

Contributor Author

Wrote tests.

@@ -194,6 +210,25 @@ public ClientStream newStream(
if (channel == null) {
return new FailingClientStream(statusExplainingWhyTheChannelIsNull, tracers);
}
if (callOptions.getAuthority() != null) {
Member

I see now that we're getting this per-RPC authority separately from the data flow that uses the authority. That makes me nervous, especially for security. As the code changes, it's likely this will break. In fact, it is already broken because the LB-based override only influences setAuthority().

For using the authority, the per-RPC authority is copied to the stream in ClientCallImpl. NettyClientStream then copies it into the Netty Http2Headers (while running an application's thread).

Having the logic in NettyClientTransport has been a bit odd, but fine, and I saw why it was done to avoid plumbing of the protocol negotiator to other places. Normally we'd add RPC logic in NettyClientHandler/NettyClientStream. NettyClientTransport doesn't do much other than create/manage the Netty channel (connection). I suggest we move this logic to NettyClientHandler and forgo any cache synchronization because it will always run on the transport thread.
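To illustrate why the cache needs no synchronization once it is only touched from a single thread, here is a minimal sketch of an unsynchronized LRU cache of verification results. The class name, the `Predicate` used as the verifier, and the bound of 100 entries are assumptions; the thread does not say how the real cache is sized or keyed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

class AuthorityCacheExample {
  // Assumed bound; the discussion does not state a cache size.
  private static final int MAX_ENTRIES = 100;

  private final Predicate<String> verifier;
  private int verifierCalls;

  // Plain access-ordered LinkedHashMap with LRU eviction. No locks: this sketch
  // assumes every call arrives on one thread (the transport thread).
  private final Map<String, Boolean> cache =
      new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  AuthorityCacheExample(Predicate<String> verifier) {
    this.verifier = verifier;
  }

  // Returns the cached result, running the (expensive) verifier only on a miss.
  boolean isAuthorityVerified(String authority) {
    Boolean cached = cache.get(authority);
    if (cached == null) {
      verifierCalls++;
      cached = verifier.test(authority);
      cache.put(authority, cached);
    }
    return cached;
  }

  int verifierCalls() {
    return verifierCalls;
  }

  public static void main(String[] args) {
    AuthorityCacheExample cache = new AuthorityCacheExample("good.example.com"::equals);
    cache.isAuthorityVerified("good.example.com");
    cache.isAuthorityVerified("good.example.com");
    System.out.println("verifier calls: " + cache.verifierCalls());
  }
}
```

If the same object were reachable from application threads as well, the map would need a lock or a concurrent replacement, which is the cost the move avoids.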

Contributor Author

@kannanjgithub kannanjgithub Jan 24, 2025

When you say "it is already broken because the LB-based override only influences setAuthority()", I took you to mean that, at the point where the authority is used, a different authority could still be set in CallOptions and ClientCallImpl.start would set it into the stream.
But in the normal sequence I observe that DelayedClientTransport::reprocess occurs later than ClientCallImpl.start, so it will overwrite any authority set into the stream earlier with the authority that is actually verified against the peer cert. So what is the problem?

Contributor Author

Also, when you say we should now do the authority verification in NettyClientHandler, where exactly? At first I thought we would have a NettyClientHandler::verifyAuthority(String authority) invoked from NettyClientStream before enqueuing the CreateStreamCommand. But you said it needs to be done on the transport thread, whereas the command is written to the queue on the application thread.

writeHeadersInternal:171, NettyClientStream$Sink (io.grpc.netty.shaded.io.grpc.netty)
writeHeaders:123, NettyClientStream$Sink (io.grpc.netty.shaded.io.grpc.netty)
start:159, AbstractClientStream (io.grpc.internal)
start:92, ForwardingClientStream (io.grpc.internal)
start:722, InternalSubchannel$CallTracingTransport$1 (io.grpc.internal)
internalStart:256, DelayedStream (io.grpc.internal)
setStream:144, DelayedStream (io.grpc.internal)
createRealStream:367, DelayedClientTransport$PendingStream (io.grpc.internal)
access$500:346, DelayedClientTransport$PendingStream (io.grpc.internal)
reprocess:304, DelayedClientTransport (io.grpc.internal)
updateSubchannelPicker:803, ManagedChannelImpl (io.grpc.internal)

Member

The problem is the authority returned by the LB policy would be used (stream.setAuthority() will put it in the headers) without being verified (because this code is looking at callOptions).

Where CreateStreamCommand is consumed:

private void createStream(CreateStreamCommand command, ChannelPromise promise)

Contributor Author

@kannanjgithub kannanjgithub Jan 27, 2025

> The problem is the authority returned by the LB policy would be used (stream.setAuthority() will put it in the headers) without being verified (because this code is looking at callOptions).

Even with the suggested change to move the authority verification code to NettyClientHandler, that code still can't influence the code that uses the authority in ClientCallImpl, right? And it will require plumbing some new code so that any authority from the picker result is taken into account in ClientCallImpl.setAuthority.

With both approaches we are failing real stream creation and the headers set into the pending stream such as from ClientCallImpl.setAuthority are only copied into the real stream after its creation, which would have failed if the authority didn't match.

Member

NettyClientHandler would see any changes to setAuthority(). Since it is sending the request on-the-wire, it will always have the "correct" authority to verify. The point is not to influence what authority is being used, but to verify that the chosen authority is appropriate. So, yes, the ClientCallImpl code will call setAuthority(), and the LB result might cause setAuthority(), but NettyClientHandler can fail the RPC before it is sent.

> With both approaches we are failing real stream creation and the headers set into the pending stream such as from ClientCallImpl.setAuthority are only copied into the real stream after its creation, which would have failed if the authority didn't match.

PendingStream shouldn't have much to do with it. setAuthority() is after stream creation but before stream start(). It is expected for setAuthority() to be called before start(). Stream start() is the point that CreateStreamCommand is sent to NettyClientHandler.
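The ordering described above can be modeled in a few lines. `FakeStream` and `FakeHandler` are hypothetical stand-ins, not the Netty transport classes, and the failure message is made up. The sketch only shows that a check made where the create-stream command is consumed sees whichever authority was set last before start(), including a late override.

```java
import java.util.function.Predicate;

class VerifyOnCreateStreamExample {
  // Hypothetical stand-in for a client stream; only authority handling is modeled.
  static final class FakeStream {
    private String authority;
    private String failure; // stays null if the stream was allowed through

    void setAuthority(String authority) {
      this.authority = authority;
    }

    // Models start(): the point where the create-stream command reaches the handler.
    void start(FakeHandler handler) {
      handler.createStream(this);
    }

    String failure() {
      return failure;
    }
  }

  // Hypothetical stand-in for the handler that puts the request on the wire.
  static final class FakeHandler {
    private final Predicate<String> matchesPeerCert;

    FakeHandler(Predicate<String> matchesPeerCert) {
      this.matchesPeerCert = matchesPeerCert;
    }

    // Verifies the authority that would actually be sent, then fails or proceeds.
    void createStream(FakeStream stream) {
      if (stream.authority != null && !matchesPeerCert.test(stream.authority)) {
        stream.failure = "authority not verified: " + stream.authority;
      }
    }
  }

  static String demo() {
    FakeHandler handler = new FakeHandler("backend.example.com"::equals);
    FakeStream stream = new FakeStream();
    stream.setAuthority("backend.example.com"); // e.g. taken from call options
    stream.setAuthority("other.example.com");   // e.g. a later override
    stream.start(handler);
    return stream.failure();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A check that looked only at the first value would have passed here, which is the gap the reviewer is pointing at.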

Contributor Author

Moved the logic to NettyClientHandler. Yet to write/fix unit tests (planning to do so via NettyClientTransportTest itself, like before).

Contributor Author

Fixed tests.
