Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add xDS Kubernetes service #980

Merged
merged 15 commits into from
Aug 16, 2024
Merged

Add xDS Kubernetes service #980

merged 15 commits into from
Aug 16, 2024

Conversation

minwoox
Copy link
Member

@minwoox minwoox commented Jul 15, 2024

Add xDS Kubernetes Service

Motivation:
To support integration with Kubernetes, we need APIs for creating, updating, and deleting Kubernetes watchers that retrieve pod endpoint information from the Kubernetes control plane.

Modifications:

  • Added xds_kubernetes.proto files that define gRPC and HTTP APIs for xDS resources.
  • Introduced XdsKubernetesService:
    • The createWatcherRequest stores watcher information at /k8s/watchers/{watcher_id}.json.
    • To validate the correctness of the watcher information, a KubernetesEndpointGroup is created and used.
      • The current implementation uses KubernetesEndpointGroup for simplicity, with plans to develop a custom implementation for improved error handling.
  • Added XdsKubernetesEndpointFetchingPlugin and XdsKubernetesEndpointFetchingService:
    • The XdsKubernetesEndpointFetchingService, run by the ZooKeeper leader, creates ClusterLoadAssignment based on watcher information.
    • The resulting ClusterLoadAssignment is stored at /k8s/endpoints/{watcher_id}.json and replicated via ZooKeeper log replay.
    • The cluster name of the ClusterLoadAssignment is groups/{group}/k8s/clusters/{watcher_id}.json, and it is served as an xDS endpoint resource by the ControlPlaneService.

Result:

  • You can now add a watcher information that retrieves the pod's endpoint information from the Kubernetes control plane.

@minwoox minwoox added this to the 0.68.0 milestone Jul 15, 2024
@minwoox minwoox modified the milestones: 0.68.0, 0.69.0 Jul 23, 2024
@minwoox minwoox changed the title [WIP] Add xDS Kubernetes service Add xDS Kubernetes service Aug 12, 2024
@minwoox
Copy link
Member Author

minwoox commented Aug 12, 2024

This is ready. PTAL. 🙇

Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some questions but looks good to me 👍 👍 👍

Comment on lines 102 to 106
kubernetesWatchers.values().forEach(map -> {
map.values().forEach(KubernetesEndpointGroup::closeAsync);
map.clear();
});
kubernetesWatchers.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) Would this be called from plugins-for-* threads here, whereas map updates will be called from k8s-plugin-executor? i.e. the map doesn't seem to be thread-safe

I believe the stop should also be done cleanly since it is not uncommon for a node to lose leadership

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, let me fix that. 😓

return;
}
executor().execute(
() -> pushK8sEndpoints(kubernetesEndpointGroup, groupName, endpoints, watcherName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
() -> pushK8sEndpoints(kubernetesEndpointGroup, groupName, endpoints, watcherName));
() -> pushK8sEndpoints(kubernetesEndpointGroup, groupName, endpoints, watcherName));

if (kubernetesEndpointGroup.isClosing()) {
return;
}
final LocalityLbEndpoints.Builder localityLbEndpointsBuilder = LocalityLbEndpoints.newBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note) I understood that Locality is not set for this iteration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. 👍

Comment on lines 156 to 157
executor().execute(
() -> pushK8sEndpoints(kubernetesEndpointGroup, groupName, endpoints, watcherName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; there is probably a low chance this happens, but I understood that if listener invocation and delete occurs in quick succession, then it is possible that the endpoints remain even though it should be deleted.

I think it's fine to leave this as a known issue, but just want to check if this scenario is possible.

e.g.

  1. handleXdsResource is called
  2. onFileRemoved is called immediately
  3. the pushK8sEndpoints invoked by step 1 is called

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so because they are all executed by the executor().
1 handleXdsResource is called by the executor
2 onFileRemoved is called immediately
3 Thus, the created KubernetesEndpointGroup is closing
4 the pushK8sEndpoints invoked by step 1 is called by the executor but it doesn't push a commit because this condition:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that line 😅 Thanks for the pointer 👍

.withTrustCerts(kubernetesConfig.getTrustCerts());

if (!isNullOrEmpty(kubernetesConfig.getOauthToken())) {
configBuilder.withOauthToken(kubernetesConfig.getOauthToken());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) Am I understanding correctly that as of now users can access this value via APIs? I'm wondering if we're going need to prioritize access control for xDS related APIs.

Copy link
Member Author

@minwoox minwoox Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users cannot see this value. Only creating, updating and deleting (without seeing the value) is possible at the moment.

@Override
public void init(PluginInitContext pluginInitContext) {
final InternalProjectInitializer projectInitializer = pluginInitContext.internalProjectInitializer();
pluginInitContext.commandExecutor();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could be removed?


public static final String DEFAULT_GROUP = "default_group";
@Nullable
private volatile ControlPlaneService controlPlaneService;

public static final long BACKOFF_SECONDS = 30;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems unsed.

@Override
public CompletionStage<Void> start(PluginContext context) {
return UnmodifiableFuture.completedFuture(null);
}

@Override
public CompletionStage<Void> stop(PluginContext context) {
stop = true;
final ControlPlaneService controlPlaneService = this.controlPlaneService;
assert controlPlaneService != null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess controlPlaneService could be null if projectInitializer.initialize(...) fails.

executorService = ExecutorServiceMetrics.monitor(
meterRegistry, Executors.newSingleThreadScheduledExecutor(
new DefaultThreadFactory("k8s-plugin-executor", true)), "k8sPluginExecutor");
executorService.execute(this::start);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we wait for the result of this::start? Otherwise, the exception raised in start won't be propagated to somewhere or logged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, yeah. We need to wait for that. 😉

private static final Pattern WATCHERS_PATTERN = Pattern.compile("(?<=/k8s)/watchers/");

private final CommandExecutor commandExecutor;
private final MeterRegistry meterRegistry;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: dead code?


service XdsKubernetesService {

rpc CreateWatcher(CreateWatcherRequest) returns (Watcher) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional) How about using service, which seems a simpler expression than watcher which is more abstract so we may need to explain what it means.

Meanwhile, we can simply say that service indicates Kubenetes Service, and this method can be used to create xDS resource for Kubenetes Service.
https://kubernetes.io/docs/concepts/services-networking/service/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a discussion and will use CreateServiceEndpointWatcher instead. 😉

Copy link
Contributor

@ikhoon ikhoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @minwoox! 🚀🚀

Copy link

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 77.87611% with 100 lines in your changes missing coverage. Please review.

Project coverage is 71.54%. Comparing base (c960313) to head (e9db3c6).
Report is 144 commits behind head on main.

Files Patch % Lines
...s/k8s/v1/XdsKubernetesEndpointFetchingService.java 68.86% 24 Missing and 9 partials ⚠️
...dogma/xds/internal/XdsResourceWatchingService.java 66.66% 25 Missing and 5 partials ⚠️
.../centraldogma/xds/k8s/v1/XdsKubernetesService.java 82.71% 8 Missing and 6 partials ⚠️
...ds/k8s/v1/XdsKubernetesEndpointFetchingPlugin.java 59.09% 8 Missing and 1 partial ⚠️
...traldogma/server/internal/ExecutorServiceUtil.java 40.00% 4 Missing and 2 partials ⚠️
...centraldogma/xds/internal/ControlPlaneService.java 94.04% 2 Missing and 3 partials ⚠️
.../centraldogma/xds/internal/ControlPlanePlugin.java 77.77% 1 Missing and 1 partial ⚠️
...orp/centraldogma/xds/group/v1/XdsGroupService.java 85.71% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #980      +/-   ##
============================================
+ Coverage     65.61%   71.54%   +5.92%     
- Complexity     3319     4113     +794     
============================================
  Files           355      402      +47     
  Lines         13865    16431    +2566     
  Branches       1498     1756     +258     
============================================
+ Hits           9098    11756    +2658     
+ Misses         3916     3642     -274     
- Partials        851     1033     +182     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ikhoon ikhoon merged commit 02701ae into line:main Aug 16, 2024
10 checks passed
@minwoox minwoox deleted the k8s_service branch August 16, 2024 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants