-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xdsclient: update watcher API as per gRFC A88 #7977
base: master
Are you sure you want to change the base?
Conversation
8d198b7
to
a9b45c0
Compare
4009e3e
to
57dbf23
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7977 +/- ##
==========================================
- Coverage 82.02% 82.00% -0.02%
==========================================
Files 410 410
Lines 40233 40250 +17
==========================================
+ Hits 33000 33007 +7
- Misses 5865 5872 +7
- Partials 1368 1371 +3
🚀 New features to boost your workflow:
|
43c9adb
to
89f475a
Compare
89f475a
to
a48b5bb
Compare
…cks for ambient and resource error
1815062
to
b320a7d
Compare
b320a7d
to
33280a0
Compare
@easwars I have made the changes to match the final a88 watcher interface Everything else is working fine but looks like this |
Ah, look like it was failing because of calling ResourceError when channel not found in authority.go which was calling ListenerWrapper onLDSResourceError which was switching serving mode. Seems like that's unexpected behavior? Calling AmbientError fixed it. Is it correct to consider channel not found to be AmbientError while registering the watcher? |
Actually, I thought over it and right thing to do here is handle this case in listener wrapper. Like previously done, we handle lds error only in case of resource-not-found. Let me know if that's okay. |
If we start a watch for a resource and the authority of that resource is not found in the bootstrap config, then the right behavior is to report a non-ambient error to the watcher. In general, we should never call OnAmbientError() for a watcher before we call OnResourceChanged(). If we get a transient error before we've seen a valid resource, then the transient error should be reported to OnResourceChanged() instead of OnAmbientError(). |
yeah handled it in ListenerWrapper. Still calling ResourceError for channel not found |
I haven't looked at the code, but that seems wrong to me. There should not need to be any special handling for individual resource types -- the logic for this in XdsClient should be completely resource-type-agnostic. The only thing that should be handled in individual resource types are things that are different from other resource types, like determining what resource type to pass to the watcher. |
This error for channel not found while registering watch https://github.com/grpc/grpc-go/pull/7977/files#diff-b4a775482793516a8ed23bd6a58a805d316f5abb6468580691c2b0556e22606aR619 was recently added. Listener wrapper for grpc xds server had a logic to stop serving only if error is of type ResourceNotFound so i just kept that instead of stopping always https://github.com/grpc/grpc-go/pull/7977/files#diff-e4706c72ae912399b7f8ee6f04cec2374ef7a7679b12358f201ddb0b45e34146R453-R457. If we don't do that |
I don't think that check is correct -- it should be removed. If the XdsClient delivers an error, it should always be passed through to the watcher. That's an inherent part of the API change we're making in this PR. The errors that we were previously ignoring in |
Yes, I agree that we should not have checks for "resource-not-found" errors in the individual watchers. In fact, I'm thinking we might not even need those error types like |
@@ -813,7 +813,7 @@ func (s) TestResolverError(t *testing.T) { | |||
|
|||
// Grab the wrapped connection from the listener wrapper. This will be used | |||
// to verify the connection is closed. | |||
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout) | |||
ctx, cancel := context.WithTimeout(context.Background(), defaultTestTimeout*100000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please undo this change.
Just FYI: In a large PR, things like this can be easily missed during the review process. Please be a little more mindful of such changes going forward.
"server_uri": %q, | ||
"channel_creds": [{"type": "insecure"}] | ||
}]`, mgmtServer.Address)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it was indented properly earlier and this change is messing up the indenting.
// When used with a go-control-plane management server that continuously | ||
// resends resources which are NACKed by the xDS client, using a `Replace()` | ||
// here and in OnResourceDoesNotExist() simplifies tests which will have | ||
// here and in ResourceError() simplifies tests which will have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: here and in AmbientError
// When used with a go-control-plane management server that continuously | ||
// resends resources which are NACKed by the xDS client, using a `Replace()` | ||
// here and in OnResourceDoesNotExist() simplifies tests which will have | ||
// here and in ResourceError() simplifies tests which will have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
// When used with a go-control-plane management server that continuously | ||
// resends resources which are NACKed by the xDS client, using a `Replace()` | ||
// here and in OnResourceDoesNotExist() simplifies tests which will have | ||
// here and in ResourceError() simplifies tests which will have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
// When used with a go-control-plane management server that continuously | ||
// resends resources which are NACKed by the xDS client, using a `Replace()` | ||
// here and in OnResourceDoesNotExist() simplifies tests which will have | ||
// here and in ResourceError() simplifies tests which will have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
// When used with a go-control-plane management server that continuously | ||
// resends resources which are NACKed by the xDS client, using a `Replace()` | ||
// here and in OnResourceDoesNotExist() simplifies tests which will have | ||
// here and in ResourceError() simplifies tests which will have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
@@ -161,7 +161,7 @@ func (s) TestHandleListenerResponseFromManagementServer(t *testing.T) { | |||
Value: []byte{1, 2, 3, 4}, | |||
}}, | |||
}, | |||
wantErr: "Listener not found in received response", | |||
wantErr: fmt.Sprintf("xds: resource \"%v\" of type \"ListenerResource\" does not exist", resourceName1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using %q
instead of having to do \"%v\"
make this easier to read?
Same with \"ListenerResource\"
. You could do a %q
in the format string and include "ListenerResource" as one of the arguments to fmt.Sprintf
.
// of the update has completed. For example, if processing a resource requires | ||
// watching new resources, those watches should be completed before done is | ||
// called, which can happen after the ResourceWatcher method has returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please simplify/clarify/rephrase this sentence:
// For example, if processing a resource requires
// watching new resources, those watches should be completed before done is
// called, which can happen after the ResourceWatcher method has returned.
Specifically, what do you mean by "those watches should be completed before done is called"? I believe, just registration of the watch, right? So, that needs to be clarified.
And I don't understand what "which can happen after the ResourceWatcher method has returned" this means.
Sorry, I don't have the cycles to debug this failure. But if you can dig a little deeper and get more info on the sequence of events leading to the failure, and why exactly the flake is happening, I might be able to help better. Once we have that information, we will be able to decide if we need to rewrite the test or if there is something wrong we are doing in the xDS client. Thanks. |
This is the first part of implementing gRFC A88 (grpc/proposal#466).
This introduces the new watcher API but does not change any of the existing behavior. This table summarizes the API changes and behavior for each case:
OnResourceDoesNotExist()
OnResourceChanged(NOT_FOUND)
OnResourceDoesNotExist()
OnResourceChanged(NOT_FOUND)
OnError()
OnResourceChanged(status)
if resource NOT already cached;OnAmbientError(status)
otherwiseOnError()
OnResourceChanged(status)
if resource NOT already cached;OnAmbientError(status)
otherwiseOnError()
OnResourceChanged(status)
if resource NOT already cached;OnAmbientError(status)
otherwiseOnUpdate(resource)
OnResourceChanged(resource)
RELEASE NOTES: