Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config Subscription gNMI Extension #169

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 32 additions & 4 deletions proto/gnmi_ext/gnmi_ext.proto
Original file line number Diff line number Diff line change
Expand Up @@ -28,11 +28,12 @@ option go_package = "github.com/openconfig/gnmi/proto/gnmi_ext";
// The Extension message contains a single gNMI extension.
message Extension {
oneof ext {
RegisteredExtension registered_ext = 1; // A registered extension.
RegisteredExtension registered_ext = 1; // A registered extension.
// Well known extensions.
MasterArbitration master_arbitration = 2; // Master arbitration extension.
History history = 3; // History extension.
Commit commit = 4; // Commit confirmed extension.
MasterArbitration master_arbitration = 2; // Master arbitration extension.
History history = 3; // History extension.
Commit commit = 4; // Commit confirmed extension.
ConfigSubscription config_subscription = 6; // Config Subscription extension.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @robshakir @dplore, all

We would like to resume the discussions around this extension proposal.

After extensive internal use of this extension we find it particularly fit for a common problem in the network management domain -- configuration deviation/drift detection and reconciliation. We believe it might be useful for operators and other gNMI users.

To summarize, ConfigSubscription extension provides the following capabilities that contribute to the config drift detection use case:

  1. An ability to subscribe to configuration only data
    Contrary to Get RPC, Subscribe RPC does not have a toggle to select which category of data nodes to subscribe to (state vs config vs both)
    This is important for gNMI applications that are based on yang models without strict state/config separation of data
  2. A way to identify commit boundaries when subscribed to configuration data.
    The sync_done message allows a client to identify when all changes for a particular commit have been fully sent by the gNMI server. This ensures it is safe to trigger deviation computation without needing to perform a full resync.

These two new capabilities added by this extension to gNMI could (among other use cases) enable a gNMI client to make use of a robust and reactive configuration drift detection/reconciliation mechanism.
When used together, the configuration mananagement system can make use of the streaming nature of gNMI Subscribe RPC to get a rapid deviation detection.

First, it benefits from a lightweight subscription mode when only changes to the configuration datastore elements are being sent as notifications.

And then, by making use of the sync_done message in this extension the system can quickly react to the changes made to the configuration data it is subscribed to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use case makes sense, we can probably stop selling it and work out the details :-)

The subscription to config only data is fine -- I would be OK with adding something here to SubscriptionList that says the the data type similarly to Get. We can just make the default state of this as BOTH, which would give a means to be able to be backwards compatible should this field not be specified.

}
}

Expand Down Expand Up @@ -131,3 +132,30 @@ message CommitConfirm {}
// CommitCancel is used to cancel an on-going commit. It hold additional
// parameter requried for cancel action.
message CommitCancel {}

// ConfigSubscription extension allows clients to subscribe to configuration
// schema nodes only.
message ConfigSubscription {
oneof action {
// ConfigSubscriptionStart is sent by the server in the SubscribeRequest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation in the use cases doesn't seem to correspond to this, and it seems like there's overloading of these fields.

a) the client sends something from the target with ConfigSubscriptionStart set. I suggest this is either moved to the SubscriptionList or we specifically call this something that is clearer that we are filtering the contents to config only nodes.

b) I'd suggest making this a separate extension -- or making this oneof separate client->target and target->client fields.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a typo in the comment, it should be ConfigSubscriptionStart is sent by the client in the SubscribeRequest

It's true that receiving ConfigSubscriptionStart really means two things for the target:

  1. Send the content of config only nodes.
  2. Send a ConfigSubscriptionSyncDone (for on-change subscriptions) after the notifications of each commit are streamed.

If we add a field (called type like GetRequest ?) under SubscriptionList that can be set to: BOTH, CONFIG or STATE, should the presence of this extension imply that type is set to CONFIG ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, if we add a field to the SubscriptionList then we should consider receiving this extension without the type being set to CONFIG is an error. Having APIs where something optional (an extension) implies the value of something required is error prone and creates additional testing overhead [0] -- whereas if we're explicit, it's clear what is meant.

[0]: for example, if this extension implies CONFIG then you will end up with the case whereby if I sent something with STATE to a target that doesn't support the extension (and therefore ignores it, validly) it will return a different set of paths to a target that does understand the extension and has implicitly meant that I assumed CONFIG.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karimra do you want to add a extension field to the SubscriptionList indicating config vs. state? Seems this will give the client an explicit indication that the target is or is not honoring the ConfigSubscriptionStart.

Another way to handle this might be to implement a way to get the supported extensions? Then a client can programmatically know if ConfigSubscription is supported.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was planning to do that in a separate PR if that's ok.
My understanding is that it won't be an extension but an additional field called type with 3 possible values BOTH, CONFIG and STATE.

ConfigSubscriptionStart start = 1;
// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
ConfigSubscriptionSyncDone sync_done = 2;
}
}
Copy link
Contributor Author

@hellt hellt Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 @robshakir
let me splice the two questions you had with a thread mode:

On the base "subscribe to config" idea:

  1. To be clear, when you refer to 'config' nodes here -- then they specifically map to the r/w nodes in a particular tree, I assume? i.e., in OpenConfig this is /x/y/config/....

Correct. We are referring to the config data as per RFC 7950 4.2.3. aka. the nodes with config=true yang statement.

  1. Is this the best way to signal that a client just wants config paths? We could add something to the base SubscriptionList here if this is a common requirement.

100% supportive of this proposal. We feel that this is a vital mgmt interface capability that is required for schemas that do not employ OpenConfig's config/state container composition.

If the openconfig/gnmi stakeholders share the same feeling then this is better to be handled in the standard, and not the extension, IMHO.

  1. Is getting "config" nodes something that can be achieved solely through path wildcarding :-) [for some schemas it is :-)]? In the case that it is not achievable by the schema, what's the target-side complexity to make this filtering happen?

I am not sure I see how path wildcards can serve as a config/state data elements selector. Don't remember seeing anything of that sorts in the path specification document.

I can imagine how custom wildcarding tokens might be used (/a/b/c/+) but this feels less transparent.
If there would be the proposed path wildcard to denote that the subtree targeted by the path must only return config values, we could work with that as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few points:

  • gNMI is agnostic to how data is modelled -- so I think a definition that is independent of a YANG-modelled schema would be good. For YANG, of course using config=true is reasonable, but let's strive to have a definition here that is agnostic to modelling language to ensure forwards compatibility, and compatibility with non-YANG-modelled origins.
  • The use of a wildcard depends on the schema supporting such a model -- e.g., /.../config in OpenConfig would achieve this, or a schema that has /config/... would also have such filtering.
  • I'm OK (but we need to discuss across different folks) in thinking about adding something top-level here -- but it would need to be the simplest implementation here, similar to the specification for GET IMHO. I also want to do a deep-dive into how implementable this is in schema-unaware collectors, which continue to be important in the ecosystem given that there are many augments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gNMI is agnostic to how data is modelled -- so I think a definition that is independent of a YANG-modelled schema would be good. For YANG, of course using config=true is reasonable, but let's strive to have a definition here that is agnostic to modelling language to ensure forwards compatibility, and compatibility with non-YANG-modelled origins.

I agree. Generally speaking, we were referring to the "configuration data nodes" in a schema-language agnostic definition. To project this on a de-facto standard data modeling language - yang -, I used it as an example of what this would mean for a target employing this schema language.

  • I'm OK (but we need to discuss across different folks) in thinking about adding something top-level here -- but it would need to be the simplest implementation here, similar to the specification for GET IMHO.

Having a method similar to the GET request would also work for us. We went the extension way to have the changes less intrusive to the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, while I agree we should avoid coupling yang directives into gnmi, it's valid that the concept of 'config' and 'state' are not necessarily yang specific. Also, it seems impractical to require (non-OC) schemas to be modified to include the word 'config' in them.

I am ok with adding this.


// ConfigSubscriptionStart is used to indicate to a target that for a given set
// of paths in the SubscribeRequest, the client wishes to receive updates
// for the configuration schema nodes only.
message ConfigSubscriptionStart {}

// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
// after all the updates for the configuration schema nodes have been sent.
message ConfigSubscriptionSyncDone {
// ID of a commit confirm operation as assigned by the client
// see Commit Confirm extension for more details.
string commit_confirm_id = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a oneof between the two values?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two values can both be present in the same message. The commit_confirm_id is present if the commit is a commit confirmed one.
The server_commit_id is an internal ID to the target. Implementations can leave it empty if no ID is available or simply set it to the commit timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, here is a demo that shows how both commit_confirm_id and server_commit_id are both present when commit with confirm is executed

https://youtu.be/MO7_ZYbRT3k?si=ukWgC4H73D8_Nqgy&t=723

// ID of a commit as might be assigned by the server
// when registering a commit operation.
string server_commit_id = 2;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robshakir here I would like to address the 2nd part of the original question set.

I'm not quite clear on the need for the "sync" signalling -- and wonder whether this is actually achievable across implementations such that a management system can rely on it. The semantics you're describing are essentially (AIUI) "all intended state that someone asked me to consume has been written to an intended data store, and I have now told you about all the updated paths". Is that correct? In the client implementation, what's the tradeoff between this and the client knowing "I just updated all paths X, and should wait until the values of the intended state converge to know that it has been consumed"?

The need for signaling stems from the fact that the Subscribe RPC lacks a mechanism to mark that a certain snapshot of the data tree has been sent.

Verbatim from the spec:

That is, it is not possible for a client requesting a subscription to assume that the set of update messages received represent a snapshot of the data tree at a particular point in time.

Because configuration management tasks essentially move the system from one state to another using a configuration commit operation, the management system (NMS) can benefit from a long-running subscription stream that would have such markers for a NMS to reliably tell "oh, ok, now all the configuration updates of the data tree I am subscribed to have been received, I see the sync marker".

An important path of the sync message is the embedded commit identifiers. By saying that we would just wait till we see the same paths streamed back without a sync message we would loose that correlation between the intended config changes sent and received and wouldn't be able to reconcile the intended change with the data streamed back if necessary in a reactive manner.

E.g. what if a third party actor changed the same data node in a different commit and the NMS sees that update and assumes it belongs to the change set it sent?

The large part of the sync message usefulness is rooted in the reported commit IDs.
They allow to reconcile configuration by comparing intended and actual config sets

How does this signalling mechanism fare in the case that we do not have the "happy path" -- e.g., there are coalesced updates -- such that config change A makes /x/y/config/a = true, and change B makes /x/y/config/a = false and the updates for /x/y/config/a are coalesced due to congestion/a slow client? Does the target send its synchronisation message for config A after the /x/y/config/a = false update is sent? (I think here the discussion is a little around whether there's actually any way to link to provenance of a particular update, and what coupling you assume between telemetry and config processes on the target.)

The target sends a sync message whenever the change set is written/committed. For systems that leverage commit operations this means whenever a change set is committed, the updates are first streamed to the NMS and after the target sent all the updates for the associated change set it send the sync message with commit identifiers.

Since commit IDs are present in the sync message, the NMS can identify changes it requested vs changes made by other clients (based on the provided commit identifiers).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. The delineation of a particular change seems a reasonable use case. However, let's think about a few cases we will need to deal with:

a) gNMI allows for coalescing of updates -- so if we have some change to /a/b/c from some commit with ID 1, and /a/b/c is also changed from some commit with ID = 2. It is possible that the two updates for /a/b/c can be coalesced. This is an important aspect of congestion control in gNMI. Given that this is the case, what should the behaviour be here?

b) I'm not sure that sync here sits cleanly with sync_response -- if we have a subscription for config nodes that is received by a target that does not understand this extension, it will stream all the nodes that correspond to the path, and then send sync_response when at least one update for each path has been transmitted. In this case, the target sets sync_response. I expect that this same logic is maintained even with this extension (hopefully we agree here).

If we then add this extension, I presume the initial Subscription with its SubscriptionList (which is now immutable) comes with this extension attached. AIUI, the subscription is not saying "I want updates for this particular commit" it's rather just saying "please just send me configurable paths that were updated". In this scenario -- I expect that the initial set of updates that are sent is just all the configurable paths with no commit_confirm_id value specified -- is that correct? [this ensures that the target does not need to have a blame style implementation where it stores the provenance of every leaf].

Following this initial set of updates, then a set of configurable values might change -- and the contract we are establishing here is that the target is able to say "I have finished applying intended state updates for a particular commit" by applying this extension for the last update. Is that correct?

If the answer to b) is correct, then the interaction with a) needs to be considered -- what happens the target has "finished" applying an update for commit ID = 1, but in the meantime also made a change for commit ID = 2. It's probably easy to say that in the case that commit ID = 2 is a subset of commit ID = 1, then receiving this extension means that commit ID = 1 has been applied and commit ID = 2 has been applied. The complex case is where they are for the same leaf, and we coalesce the two together. We should not prevent coalescing of these updates since it will mean that there are changes to all gNMI servers to treat this as a new "special" field.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) gNMI allows for coalescing of updates -- so if we have some change to /a/b/c from some commit with ID 1, and /a/b/c is also changed from some commit with ID = 2. It is possible that the two updates for /a/b/c can be coalesced. This is an important aspect of congestion control in gNMI. Given that this is the case, what should the behaviour be here?

Ideally these values shouldn't be coalesced, since a collector would need to know that a commit "happened" and where its boundary is. But if this cannot be guaranteed by some implementations, we can say that if at least one value from a commit has been coalesced then its corresponding ConfigSubscriptionSyncDone should either:

  • Not be sent at all.
  • Be sent after the notifications of the next commit, i.e back to back with the next commit SyncDone

b) I'm not sure that sync here sits cleanly with sync_response -- if we have a subscription for config nodes that is received by a target that does not understand this extension, it will stream all the nodes that correspond to the path, and then send sync_response when at least one update for each path has been transmitted. In this case, the target sets sync_response. I expect that this same logic is maintained even with this extension (hopefully we agree here).

Yes.

If we then add this extension, I presume the initial Subscription with its SubscriptionList (which is now immutable) comes with this extension attached. AIUI, the subscription is not saying "I want updates for this particular commit" it's rather just saying "please just send me configurable paths that were updated". In this scenario -- I expect that the initial set of updates that are sent is just all the configurable paths with no commit_confirm_id value specified -- is that correct? [this ensures that the target does not need to have a blame style implementation where it stores the provenance of every leaf].

Correct. The initial values are streamed to the client just like a regular subscription up until the sync_response: true message. After that, on-change notifications resulting from a specific commit are streamed followed by the extension carrying the optional commit_confirm_id and/or server_commit_id.

Following this initial set of updates, then a set of configurable values might change -- and the contract we are establishing here is that the target is able to say "I have finished applying intended state updates for a particular commit" by applying this extension for the last update. Is that correct?

The extension with a ConfigSubscriptionSyncDone action is not sent along with the last update; instead, it is sent as part of a separate SubscribeRequest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) gNMI allows for coalescing of updates -- so if we have some change to /a/b/c from some commit with ID 1, and /a/b/c is also changed from some commit with ID = 2. It is possible that the two updates for /a/b/c can be coalesced. This is an important aspect of congestion control in gNMI. Given that this is the case, what should the behaviour be here?

Ideally these values shouldn't be coalesced, since a collector would need to know that a commit "happened" and where its boundary is.

This would be a core gNMI specification update -- we specifically allow for this coaelscing as it means that flapping data values, and slow peers have a clean congestion management approach. If there is a case where you want a specific notification that has to be reliably delivered (i.e., that specific update) then it needs to have a unique path, this itself has some scaling considerations that need to be taken into account. I'm not supportive of changing this core element of the specification.

But if this cannot be guaranteed by some implementations, we can say that if at least one value from a commit has been coalesced then its corresponding ConfigSubscriptionSyncDone should either:

  • Not be sent at all.
  • Be sent after the notifications of the next commit, i.e back to back with the next commit SyncDone

It might be non-deterministic as to what happens -- it depends where we are specifying that this extension is sent. Is it attached to the last leaf to be applied, or is this envisaged to be a separate SubscribeResponse that is sent? If the latter, what is the payload? (nil contents could mean that this isn't sent or cached at a collector.)

The alternate design space here is to have a leaf that specifically means this something like /system/commits/commit[id=foo]/state/applied and use this as the marker. Did you consider such a design space?

Correct. The initial values are streamed to the client just like a regular subscription up until the sync_response: true message. After that, on-change notifications resulting from a specific commit are streamed followed by the extension carrying the optional commit_confirm_id and/or server_commit_id.

Can we specify this somewhere? It's not clear today that the expectation is that this is applied to every SubscribeResponse that corresponds to a specific commit. There's some book-keeping that is needed here internally, right -- since you're saying that the system needs to understand the provenance of a change to any config leaf.

Following this initial set of updates, then a set of configurable values might change -- and the contract we are establishing here is that the target is able to say "I have finished applying intended state updates for a particular commit" by applying this extension for the last update. Is that correct?

The extension with a ConfigSubscriptionSyncDone action is not sent along with the last update; instead, it is sent as part of a separate SubscribeRequest.

Again, please can we specify this if this is the preferred behaviour? If so, it seems like this would not be compatible with having a caching collector that works per-path -- since there's simply nothing to cache this SubscribeResponse against. Again, did you consider just having this part of the payload not part of the RPC?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be non-deterministic as to what happens -- it depends where we are specifying that this extension is sent. Is it attached to the last leaf to be applied, or is this envisaged to be a separate SubscribeResponse that is sent? If the latter, what is the payload? (nil contents could mean that this isn't sent or cached at a collector.)

We discussed including the extension with the last update, there is really nothing against it from our implementation point of view. It's just that the IDs are optional; not all commits are commitConfirmed and not all targets can send a server_commit_id.
We don't intend on storing the extension data on the controller side, otherwise we would include it as part of all the updates (and we would need a marker indicating the last update for a specific commit).
What we are interested on is knowing that the sync-ed notifications give a full view of a committed configuration at one point in time. For each received SyncDone the controller can (safely?) calculate deviations between its intents and the target's sync-ed config.

The alternate design space here is to have a leaf that specifically means this something like /system/commits/commit[id=foo]/state/applied and use this as the marker. Did you consider such a design space?

Such a leaf tells that the commit was applied but it doesn't tell the controller that it received all the notifications related to that commit. Using an on-change subscription with a message signaling that a commit's notifications were streamed is a fast and lightweight way for a client to remain in sync with target config.

Can we specify this somewhere? It's not clear today that the expectation is that this is applied to every SubscribeResponse that corresponds to a specific commit. There's some book-keeping that is needed here internally, right -- since you're saying that the system needs to understand the provenance of a change to any config leaf.

Again, please can we specify this if this is the preferred behaviour? If so, it seems like this would not be compatible with having a caching collector that works per-path -- since there's simply nothing to cache this SubscribeResponse against. Again, did you consider just having this part of the payload not part of the RPC?

The preferred behavior is sending the extension in its own SubscribeRequest.
There is a hint to that in the proto comments, but I agree that we should write a proper specification for this extension.

// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
// after all the updates for the configuration schema nodes have been sent.
message ConfigSubscriptionSyncDone {

Again, did you consider just having this part of the payload not part of the RPC?

Yes, as said above nothing against this. But if the goal is to store the commit IDs with the updates, we need to keep in mind that there might not be commit IDs sent in the extension and we would need to include the extension in all the updates with a marker for the last one.

Copy link
Contributor

@robshakir robshakir Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternate design space here is to have a leaf that specifically means this something like /system/commits/commit[id=foo]/state/applied and use this as the marker. Did you consider such a design space?

Such a leaf tells that the commit was applied but it doesn't tell the controller that it received all the notifications related to that commit. Using an on-change subscription with a message signaling that a commit's notifications were streamed is a fast and lightweight way for a client to remain in sync with target config.

From an implementation perspective -- either the gNMI SubscribeResponse you generate or this update needs to be sent in order, isn't it about the same? It seems like you need something where a producer can say "I'm done with creating these updates <eom>", so that the correct SubscribeResponse can be generated. I'm interested why this signal (which also must be in order) couldn't just be an update itself. (The latter being a cached path would mean that one would not necessarily need a subscription running simultaneously, and would work through caches.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly the above alternative implies that:

  1. There is a commit ID.
  2. The leaf /system/commits/commit[id=foo]/state/applied is set after all updates have been streamed (not just created).
  3. The commit ID is stored in the controller's cache with each leaf.

For 1) Not all commits are commitConfirmed and not all implementations provide a server commit ID.
For 2) Seems hard to achieve across all existing subscriptions. We also didn't want to assume all gNMI implementation have such a leaf (gNMI being schema agnostic)
For 3) We don't store the commit ID when we receive it, we create a candidate from the cache on the controller side when we receive the syncDone extension. This guarantees that the candidate includes all the data that a certain commit changed. The config deviations are then calculated against the candidate.

We didn't want to assume too much on how gNMI implementations are done out there to keep the extension as vendor agnostic as possible.
Sending a signal over a specific subscription after all notifications are sent (1) does not need a commit ID to exist, (2) does not assume that there is a leaf in the schema that tracks the state of applied commits and is only sent after related notifications are streamed and (3) does not require the controller to store the commit ID per leaf to be able to calculate deviations.