Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config Subscription gNMI Extension #169

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

hellt
Copy link
Contributor

@hellt hellt commented Feb 14, 2024

Problem Statement

Performing configuration management and handling configuration drift is one of the main features of a higher-level management system or orchestrator. The configuration management tasks are not concerned about the state data and focus on effective retrieval and push of the configuration values.

Thus, having a synchronized configuration view between the management system and the network elements is key to enabling robust and near real-time configuration management.
To enable this synchronization of configuration data, gNMI Subscribe RPC can be used. The bidirectional streaming nature of this RPC enables fast and reliable sync between the management system and the devices it manages.

Unfortunately, gNMI Subscribe RPC does not have an embedded mechanism to stream updates for the configuration values only as opposed to the Get RPC, which makes this RPC rather ineffective on YANG schemas that do not employ a separation of config and state elements by using distinct containers.

This proposal introduces the Config Subscription extension that allows clients to indicate to servers that they are interested in configuration values only.

Specification

A new ConfigSubscription extension is added to the extensions list and modeled as follows:

// ConfigSubscription extension allows clients to subscribe to configuration
// schema nodes only.
message ConfigSubscription {
  oneof action {
    // ConfigSubscriptionStart is sent by the server in the SubscribeRequest
    ConfigSubscriptionStart start = 1;
    // ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
    ConfigSubscriptionSyncDone sync_done = 2;
  }
}

// ConfigSubscriptionStart is used to indicate to a target that for a given set
// of paths in the SubscribeRequest, the client wishes to receive updates
// for the configuration schema nodes only.
message ConfigSubscriptionStart {}

// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
// after all the updates for the configuration schema nodes have been sent.
message ConfigSubscriptionSyncDone {
  // ID of a commit confirm operation as assigned by the client
  // see Commit Confirm extension for more details.
  string commit_confirm_id = 1;
  // ID of a commit as might be assigned by the server
  // when registering a commit operation.
  string server_commit_id = 2;
}

The ConfigSubscription extension message is meant to be sent in SubscribeRequest message with the action of ConfigSubscriptionStart and in SubscribeResponse message with the action of ConfigSubscriptionSyncDone.

ConfigSubscriptionStart

The ConfigSubscription message has a oneof action field that is used to decouple request and response messages. When a client wants to initiate a config subscription, it sends a SubscribeRequest message with the ConfigSubscriptionStart action.

ConfigSubscriptionSyncDone

The server sends the ConfigSubscriptionSyncDone message in the SubscribeResponse message after all the updates for the configuration schema nodes have been sent. This message indicates to the client that the server has sent all the updates for the configuration schema nodes and the client can now start processing the updates knowing that it received the full configuration set.

With the commit_confirm_id and/or server_commit_id fields, the ConfigSubscriptionSyncDone clearly sets the boundary of the configuration changes of a given commit operation. This allows a management system to

  • receive the full scope of the configuration changes
  • correlate these changes with the commit operation it performed

The ConfigSubscriptionSyncDone message has two fields:

  • commit_confirm_id - ID of a commit confirm operation as assigned by the client. This field is optional and correlates the ConfigSubscriptionSyncDone message with the CommitConfirm extension. When the client uses Commit Confirm extension and assigns an ID to the commit confirm operation, the server MUST return this ID in the ConfigSubscriptionSyncDone message to correlate the configuration stream with the commit ID.
  • server_commit_id - ID of a commit that might be assigned by the server when registering a commit operation. This field is optional and is used to correlate the ConfigSubscriptionSyncDone message with the internal commit ID that might be assigned by the Network OS when registering the commit.

Workflow

Scenario 1. Configuration changes without Commit Confirm

In this scenario, the following sequence of events happens:

  1. The client subscribes to path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and without the CommitConfirm extension.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.

Scenario 2. Configuration changes with Commit Confirm

  1. The client subscribes to the path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and with the CommitConfirm extension present.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
  6. When the client sends the commit confirm message and is processed by the server, the latter does not send any extra SubscribeResponse messages with the ConfigSubscriptionSyncDone message.

Scenario 3. Configuration changes with Commit Confirm and rollback/cancellation

  1. The client subscribes to path P1 with the ConfigSubscription extension present with the action ConfigSubscriptionStart.
  2. The server processes the subscription request as usual but will only send updates for the configuration schema nodes under the path P1.
  3. The client sends a Set RPC with the configuration changes to the path P1 and with the CommitConfirm extension present.
  4. The server processes the Set RPC as usual and sends the updates for the configuration schema nodes under the path P1.
  5. As all the configuration updates are sent, the server sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.
  6. When the commit confirmed rollback timer expires or a commit cancel message is sent, the server
    1. rolls back the configuration changes as per the Commit Confirm extension specification
    2. sends the new configuration updates for the path P1 as the configuration has changed/reverted
    3. sends the ConfigSubscriptionSyncDone message to the client in a SubscribeResponse message.

@robshakir
Copy link
Contributor

Thanks for the contribution.

It seems like there are two separate proposals here (as I see it, YMMV). The first is to allow there to be a way to filter to "config" nodes in Subscribe, the second is a way to show some "sync" state within a particular subscription. The latter is important because today we only have one idea of "sync" in Subscribe, which is about whether the target has actually updated all paths matching the subscription, and we are now in some steady state.

Based on these two, I have two sets of questions.

On the base "subscribe to config" idea:

  • To be clear, when you refer to 'config' nodes here -- then they specifically map to the r/w nodes in a particular tree, I assume? i.e., in OpenConfig this is /x/y/config/....
  • Is this the best way to signal that a client just wants config paths? We could add something to the base SubscriptionList here if this is a common requirement.
  • Is getting "config" nodes something that can be achieved solely through path wildcarding :-) [for some schemas it is :-)]? In the case that it is not achievable by the schema, what's the target-side complexity to make this filtering happen?

On the "synchronisation idea":

  • I'm not quite clear on the need for the "sync" signalling -- and wonder whether this is actually achievable across implementations such that a management system can rely on it. The semantics you're describing are essentially (AIUI) "all intended state that someone asked me to consume has been written to an intended data store, and I have now told you about all the updated paths". Is that correct? In the client implementation, what's the tradeoff between this and the client knowing "I just updated all paths X, and should wait until the values of the intended state converge to know that it has been consumed"?
  • How does this signalling mechanism fare in the case that we do not have the "happy path" -- e.g., there are coalesced updates -- such that config change A makes /x/y/config/a = true, and change B makes /x/y/config/a = false and the updates for /x/y/config/a are coalesced due to congestion/a slow client? Does the target send its synchronisation message for config A after the /x/y/config/a = false update is sent? (I think here the discussion is a little around whether there's actually any way to link to provenance of a particular update, and what coupling you assume between telemetry and config processes on the target.)

It feels useful to me to understand a bit about what use cases a theoretical system using this approach does w.r.t a network operation.

Comment on lines 136 to 145
// ConfigSubscription extension allows clients to subscribe to configuration
// schema nodes only.
message ConfigSubscription {
oneof action {
// ConfigSubscriptionStart is sent by the server in the SubscribeRequest
ConfigSubscriptionStart start = 1;
// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
ConfigSubscriptionSyncDone sync_done = 2;
}
}
Copy link
Contributor Author

@hellt hellt Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 @robshakir
let me splice the two questions you had with a thread mode:

On the base "subscribe to config" idea:

  1. To be clear, when you refer to 'config' nodes here -- then they specifically map to the r/w nodes in a particular tree, I assume? i.e., in OpenConfig this is /x/y/config/....

Correct. We are referring to the config data as per RFC 7950 4.2.3. aka. the nodes with config=true yang statement.

  1. Is this the best way to signal that a client just wants config paths? We could add something to the base SubscriptionList here if this is a common requirement.

100% supportive of this proposal. We feel that this is a vital mgmt interface capability that is required for schemas that do not employ OpenConfig's config/state container composition.

If the openconfig/gnmi stakeholders share the same feeling then this is better to be handled in the standard, and not the extension, IMHO.

  1. Is getting "config" nodes something that can be achieved solely through path wildcarding :-) [for some schemas it is :-)]? In the case that it is not achievable by the schema, what's the target-side complexity to make this filtering happen?

I am not sure I see how path wildcards can serve as a config/state data elements selector. Don't remember seeing anything of that sorts in the path specification document.

I can imagine how custom wildcarding tokens might be used (/a/b/c/+) but this feels less transparent.
If there would be the proposed path wildcard to denote that the subtree targeted by the path must only return config values, we could work with that as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few points:

  • gNMI is agnostic to how data is modelled -- so I think a definition that is independent of a YANG-modelled schema would be good. For YANG, of course using config=true is reasonable, but let's strive to have a definition here that is agnostic to modelling language to ensure forwards compatibility, and compatibility with non-YANG-modelled origins.
  • The use of a wildcard depends on the schema supporting such a model -- e.g., /.../config in OpenConfig would achieve this, or a schema that has /config/... would also have such filtering.
  • I'm OK (but we need to discuss across different folks) in thinking about adding something top-level here -- but it would need to be the simplest implementation here, similar to the specification for GET IMHO. I also want to do a deep-dive into how implementable this is in schema-unaware collectors, which continue to be important in the ecosystem given that there are many augments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gNMI is agnostic to how data is modelled -- so I think a definition that is independent of a YANG-modelled schema would be good. For YANG, of course using config=true is reasonable, but let's strive to have a definition here that is agnostic to modelling language to ensure forwards compatibility, and compatibility with non-YANG-modelled origins.

I agree. Generally speaking, we were referring to the "configuration data nodes" in a schema-language agnostic definition. To project this on a de-facto standard data modeling language - yang -, I used it as an example of what this would mean for a target employing this schema language.

  • I'm OK (but we need to discuss across different folks) in thinking about adding something top-level here -- but it would need to be the simplest implementation here, similar to the specification for GET IMHO.

Having a method similar to the GET request would also work for us. We went the extension way to have the changes less intrusive to the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, while I agree we should avoid coupling yang directives into gnmi, it's valid that the concept of 'config' and 'state' are not necessarily yang specific. Also, it seems impractical to require (non-OC) schemas to be modified to include the word 'config' in them.

I am ok with adding this.

Comment on lines 147 to 161
// ConfigSubscriptionStart is used to indicate to a target that for a given set
// of paths in the SubscribeRequest, the client wishes to receive updates
// for the configuration schema nodes only.
message ConfigSubscriptionStart {}

// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
// after all the updates for the configuration schema nodes have been sent.
message ConfigSubscriptionSyncDone {
// ID of a commit confirm operation as assigned by the client
// see Commit Confirm extension for more details.
string commit_confirm_id = 1;
// ID of a commit as might be assigned by the server
// when registering a commit operation.
string server_commit_id = 2;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robshakir here I would like to address the 2nd part of the original question set.

I'm not quite clear on the need for the "sync" signalling -- and wonder whether this is actually achievable across implementations such that a management system can rely on it. The semantics you're describing are essentially (AIUI) "all intended state that someone asked me to consume has been written to an intended data store, and I have now told you about all the updated paths". Is that correct? In the client implementation, what's the tradeoff between this and the client knowing "I just updated all paths X, and should wait until the values of the intended state converge to know that it has been consumed"?

The need for signaling stems from the fact that the Subscribe RPC lacks a mechanism to mark that a certain snapshot of the data tree has been sent.

Verbatim from the spec:

That is, it is not possible for a client requesting a subscription to assume that the set of update messages received represent a snapshot of the data tree at a particular point in time.

Because configuration management tasks essentially move the system from one state to another using a configuration commit operation, the management system (NMS) can benefit from a long-running subscription stream that would have such markers for a NMS to reliably tell "oh, ok, now all the configuration updates of the data tree I am subscribed to have been received, I see the sync marker".

An important path of the sync message is the embedded commit identifiers. By saying that we would just wait till we see the same paths streamed back without a sync message we would loose that correlation between the intended config changes sent and received and wouldn't be able to reconcile the intended change with the data streamed back if necessary in a reactive manner.

E.g. what if a third party actor changed the same data node in a different commit and the NMS sees that update and assumes it belongs to the change set it sent?

The large part of the sync message usefulness is rooted in the reported commit IDs.
They allow to reconcile configuration by comparing intended and actual config sets

How does this signalling mechanism fare in the case that we do not have the "happy path" -- e.g., there are coalesced updates -- such that config change A makes /x/y/config/a = true, and change B makes /x/y/config/a = false and the updates for /x/y/config/a are coalesced due to congestion/a slow client? Does the target send its synchronisation message for config A after the /x/y/config/a = false update is sent? (I think here the discussion is a little around whether there's actually any way to link to provenance of a particular update, and what coupling you assume between telemetry and config processes on the target.)

The target sends a sync message whenever the change set is written/committed. For systems that leverage commit operations this means whenever a change set is committed, the updates are first streamed to the NMS and after the target sent all the updates for the associated change set it send the sync message with commit identifiers.

Since commit IDs are present in the sync message, the NMS can identify changes it requested vs changes made by other clients (based on the provided commit identifiers).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. The delineation of a particular change seems a reasonable use case. However, let's think about a few cases we will need to deal with:

a) gNMI allows for coalescing of updates -- so if we have some change to /a/b/c from some commit with ID 1, and /a/b/c is also changed from some commit with ID = 2. It is possible that the two updates for /a/b/c can be coalesced. This is an important aspect of congestion control in gNMI. Given that this is the case, what should the behaviour be here?

b) I'm not sure that sync here sits cleanly with sync_response -- if we have a subscription for config nodes that is received by a target that does not understand this extension, it will stream all the nodes that correspond to the path, and then send sync_response when at least one update for each path has been transmitted. In this case, the target sets sync_response. I expect that this same logic is maintained even with this extension (hopefully we agree here).

If we then add this extension, I presume the initial Subscription with its SubscriptionList (which is now immutable) comes with this extension attached. AIUI, the subscription is not saying "I want updates for this particular commit" it's rather just saying "please just send me configurable paths that were updated". In this scenario -- I expect that the initial set of updates that are sent is just all the configurable paths with no commit_confirm_id value specified -- is that correct? [this ensures that the target does not need to have a blame style implementation where it stores the provenance of every leaf].

Following this initial set of updates, then a set of configurable values might change -- and the contract we are establishing here is that the target is able to say "I have finished applying intended state updates for a particular commit" by applying this extension for the last update. Is that correct?

If the answer to b) is correct, then the interaction with a) needs to be considered -- what happens the target has "finished" applying an update for commit ID = 1, but in the meantime also made a change for commit ID = 2. It's probably easy to say that in the case that commit ID = 2 is a subset of commit ID = 1, then receiving this extension means that commit ID = 1 has been applied and commit ID = 2 has been applied. The complex case is where they are for the same leaf, and we coalesce the two together. We should not prevent coalescing of these updates since it will mean that there are changes to all gNMI servers to treat this as a new "special" field.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) gNMI allows for coalescing of updates -- so if we have some change to /a/b/c from some commit with ID 1, and /a/b/c is also changed from some commit with ID = 2. It is possible that the two updates for /a/b/c can be coalesced. This is an important aspect of congestion control in gNMI. Given that this is the case, what should the behaviour be here?

Ideally these values shouldn't be coalesced, since a collector would need to know that a commit "happened" and where its boundary is. But if this cannot be guaranteed by some implementations, we can say that if at least one value from a commit has been coalesced then its corresponding ConfigSubscriptionSyncDone should either:

  • Not be sent at all.
  • Be sent after the notifications of the next commit, i.e back to back with the next commit SyncDone

b) I'm not sure that sync here sits cleanly with sync_response -- if we have a subscription for config nodes that is received by a target that does not understand this extension, it will stream all the nodes that correspond to the path, and then send sync_response when at least one update for each path has been transmitted. In this case, the target sets sync_response. I expect that this same logic is maintained even with this extension (hopefully we agree here).

Yes.

If we then add this extension, I presume the initial Subscription with its SubscriptionList (which is now immutable) comes with this extension attached. AIUI, the subscription is not saying "I want updates for this particular commit" it's rather just saying "please just send me configurable paths that were updated". In this scenario -- I expect that the initial set of updates that are sent is just all the configurable paths with no commit_confirm_id value specified -- is that correct? [this ensures that the target does not need to have a blame style implementation where it stores the provenance of every leaf].

Correct. The initial values are streamed to the client just like a regular subscription up until the sync_response: true message. After that, on-change notifications resulting from a specific commit are streamed followed by the extension carrying the optional commit_confirm_id and/or server_commit_id.

Following this initial set of updates, then a set of configurable values might change -- and the contract we are establishing here is that the target is able to say "I have finished applying intended state updates for a particular commit" by applying this extension for the last update. Is that correct?

The extension with a ConfigSubscriptionSyncDone action is not sent along with the last update; instead, it is sent as part of a separate SubscribeRequest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) gNMI allows for coalescing of updates -- so if we have some change to /a/b/c from some commit with ID 1, and /a/b/c is also changed from some commit with ID = 2. It is possible that the two updates for /a/b/c can be coalesced. This is an important aspect of congestion control in gNMI. Given that this is the case, what should the behaviour be here?

Ideally these values shouldn't be coalesced, since a collector would need to know that a commit "happened" and where its boundary is.

This would be a core gNMI specification update -- we specifically allow for this coaelscing as it means that flapping data values, and slow peers have a clean congestion management approach. If there is a case where you want a specific notification that has to be reliably delivered (i.e., that specific update) then it needs to have a unique path, this itself has some scaling considerations that need to be taken into account. I'm not supportive of changing this core element of the specification.

But if this cannot be guaranteed by some implementations, we can say that if at least one value from a commit has been coalesced then its corresponding ConfigSubscriptionSyncDone should either:

  • Not be sent at all.
  • Be sent after the notifications of the next commit, i.e back to back with the next commit SyncDone

It might be non-deterministic as to what happens -- it depends where we are specifying that this extension is sent. Is it attached to the last leaf to be applied, or is this envisaged to be a separate SubscribeResponse that is sent? If the latter, what is the payload? (nil contents could mean that this isn't sent or cached at a collector.)

The alternate design space here is to have a leaf that specifically means this something like /system/commits/commit[id=foo]/state/applied and use this as the marker. Did you consider such a design space?

Correct. The initial values are streamed to the client just like a regular subscription up until the sync_response: true message. After that, on-change notifications resulting from a specific commit are streamed followed by the extension carrying the optional commit_confirm_id and/or server_commit_id.

Can we specify this somewhere? It's not clear today that the expectation is that this is applied to every SubscribeResponse that corresponds to a specific commit. There's some book-keeping that is needed here internally, right -- since you're saying that the system needs to understand the provenance of a change to any config leaf.

Following this initial set of updates, then a set of configurable values might change -- and the contract we are establishing here is that the target is able to say "I have finished applying intended state updates for a particular commit" by applying this extension for the last update. Is that correct?

The extension with a ConfigSubscriptionSyncDone action is not sent along with the last update; instead, it is sent as part of a separate SubscribeRequest.

Again, please can we specify this if this is the preferred behaviour? If so, it seems like this would not be compatible with having a caching collector that works per-path -- since there's simply nothing to cache this SubscribeResponse against. Again, did you consider just having this part of the payload not part of the RPC?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be non-deterministic as to what happens -- it depends where we are specifying that this extension is sent. Is it attached to the last leaf to be applied, or is this envisaged to be a separate SubscribeResponse that is sent? If the latter, what is the payload? (nil contents could mean that this isn't sent or cached at a collector.)

We discussed including the extension with the last update, there is really nothing against it from our implementation point of view. It's just that the IDs are optional; not all commits are commitConfirmed and not all targets can send a server_commit_id.
We don't intend on storing the extension data on the controller side, otherwise we would include it as part of all the updates (and we would need a marker indicating the last update for a specific commit).
What we are interested on is knowing that the sync-ed notifications give a full view of a committed configuration at one point in time. For each received SyncDone the controller can (safely?) calculate deviations between its intents and the target's sync-ed config.

The alternate design space here is to have a leaf that specifically means this something like /system/commits/commit[id=foo]/state/applied and use this as the marker. Did you consider such a design space?

Such a leaf tells that the commit was applied but it doesn't tell the controller that it received all the notifications related to that commit. Using an on-change subscription with a message signaling that a commit's notifications were streamed is a fast and lightweight way for a client to remain in sync with target config.

Can we specify this somewhere? It's not clear today that the expectation is that this is applied to every SubscribeResponse that corresponds to a specific commit. There's some book-keeping that is needed here internally, right -- since you're saying that the system needs to understand the provenance of a change to any config leaf.

Again, please can we specify this if this is the preferred behaviour? If so, it seems like this would not be compatible with having a caching collector that works per-path -- since there's simply nothing to cache this SubscribeResponse against. Again, did you consider just having this part of the payload not part of the RPC?

The preferred behavior is sending the extension in its own SubscribeRequest.
There is a hint to that in the proto comments, but I agree that we should write a proper specification for this extension.

// ConfigSubscriptionSyncDone is sent by the server in the SubscribeResponse
// after all the updates for the configuration schema nodes have been sent.
message ConfigSubscriptionSyncDone {

Again, did you consider just having this part of the payload not part of the RPC?

Yes, as said above nothing against this. But if the goal is to store the commit IDs with the updates, we need to keep in mind that there might not be commit IDs sent in the extension and we would need to include the extension in all the updates with a marker for the last one.

Copy link
Contributor

@robshakir robshakir Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternate design space here is to have a leaf that specifically means this something like /system/commits/commit[id=foo]/state/applied and use this as the marker. Did you consider such a design space?

Such a leaf tells that the commit was applied but it doesn't tell the controller that it received all the notifications related to that commit. Using an on-change subscription with a message signaling that a commit's notifications were streamed is a fast and lightweight way for a client to remain in sync with target config.

From an implementation perspective -- either the gNMI SubscribeResponse you generate or this update needs to be sent in order, isn't it about the same? It seems like you need something where a producer can say "I'm done with creating these updates <eom>", so that the correct SubscribeResponse can be generated. I'm interested why this signal (which also must be in order) couldn't just be an update itself. (The latter being a cached path would mean that one would not necessarily need a subscription running simultaneously, and would work through caches.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly the above alternative implies that:

  1. There is a commit ID.
  2. The leaf /system/commits/commit[id=foo]/state/applied is set after all updates have been streamed (not just created).
  3. The commit ID is stored in the controller's cache with each leaf.

For 1) Not all commits are commitConfirmed and not all implementations provide a server commit ID.
For 2) Seems hard to achieve across all existing subscriptions. We also didn't want to assume all gNMI implementation have such a leaf (gNMI being schema agnostic)
For 3) We don't store the commit ID when we receive it, we create a candidate from the cache on the controller side when we receive the syncDone extension. This guarantees that the candidate includes all the data that a certain commit changed. The config deviations are then calculated against the candidate.

We didn't want to assume too much on how gNMI implementations are done out there to keep the extension as vendor agnostic as possible.
Sending a signal over a specific subscription after all notifications are sent (1) does not need a commit ID to exist, (2) does not assume that there is a leaf in the schema that tracks the state of applied commits and is only sent after related notifications are streamed and (3) does not require the controller to store the commit ID per leaf to be able to calculate deviations.

History history = 3; // History extension.
Commit commit = 4; // Commit confirmed extension.
Depth depth = 5; // Depth extension.
ConfigSubscription config_subscription = 6; // Config Subscription extension.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @robshakir @dplore, all

We would like to resume the discussions around this extension proposal.

After extensive internal use of this extension we find it particularly fit for a common problem in the network management domain -- configuration deviation/drift detection and reconciliation. We believe it might be useful for operators and other gNMI users.

To summarize, ConfigSubscription extension provides the following capabilities that contribute to the config drift detection use case:

  1. An ability to subscribe to configuration only data
    Contrary to Get RPC, Subscribe RPC does not have a toggle to select which category of data nodes to subscribe to (state vs config vs both)
    This is important for gNMI applications that are based on yang models without strict state/config separation of data
  2. A way to identify commit boundaries when subscribed to configuration data.
    The sync_done message allows a client to identify when all changes for a particular commit have been fully sent by the gNMI server. This ensures it is safe to trigger deviation computation without needing to perform a full resync.

These two new capabilities added by this extension to gNMI could (among other use cases) enable a gNMI client to make use of a robust and reactive configuration drift detection/reconciliation mechanism.
When used together, the configuration mananagement system can make use of the streaming nature of gNMI Subscribe RPC to get a rapid deviation detection.

First, it benefits from a lightweight subscription mode when only changes to the configuration datastore elements are being sent as notifications.

And then, by making use of the sync_done message in this extension the system can quickly react to the changes made to the configuration data it is subscribed to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use case makes sense, we can probably stop selling it and work out the details :-)

The subscription to config only data is fine -- I would be OK with adding something here to SubscriptionList that says the the data type similarly to Get. We can just make the default state of this as BOTH, which would give a means to be able to be backwards compatible should this field not be specified.

@dplore
Copy link
Member

dplore commented Jul 16, 2024

@hellt please also add a markdown doc for the complete documentation with use cases (as you have in the comments here). This should go into the gnmi refererence repo (you can link that PR to this PR to make it easier for us to track).

@dplore
Copy link
Member

dplore commented Jul 16, 2024

This was reviewed in July 16, 2024 OC operators meeting. No outstanding comments at the moment. Operators will review over the next week. Setting last call for July 30, 2024.

@ubaumann
Copy link

+1 for this extension

Configuration drift and reconciliation is a critical topic if we want to bring automated network operations to the next level. Nice work.

@MrHamel
Copy link

MrHamel commented Jul 16, 2024

Why not have the ConfigSubscription report when a "commit confirm(ed)" is happening? An NMS would be completely blind to that specific kind of action taking place unless they specify that "persist ID", and it wouldn't know if a change is rolled back other than compare the delta changes with its own database over time.

@jarrodb
Copy link

jarrodb commented Jul 17, 2024

+1 for this! Thanks for submitting!

@mcanatella
Copy link

+1 I definitely need this

@yunheL
Copy link

yunheL commented Jul 17, 2024

+1 thanks for the detailed problem statement and use case, I think this is going to be useful

@wendall-robinson
Copy link

+1 I like having this kind of granularity

// schema nodes only.
message ConfigSubscription {
oneof action {
// ConfigSubscriptionStart is sent by the server in the SubscribeRequest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation in the use cases doesn't seem to correspond to this, and it seems like there's overloading of these fields.

a) the client sends something from the target with ConfigSubscriptionStart set. I suggest this is either moved to the SubscriptionList or we specifically call this something that is clearer that we are filtering the contents to config only nodes.

b) I'd suggest making this a separate extension -- or making this oneof separate client->target and target->client fields.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a typo in the comment, it should be ConfigSubscriptionStart is sent by the client in the SubscribeRequest

It's true that receiving ConfigSubscriptionStart really means two things for the target:

  1. Send the content of config only nodes.
  2. Send a ConfigSubscriptionSyncDone (for on-change subscriptions) after the notifications of each commit are streamed.

If we add a field (called type like GetRequest ?) under SubscriptionList that can be set to: BOTH, CONFIG or STATE, should the presence of this extension imply that type is set to CONFIG ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, if we add a field to the SubscriptionList then we should consider receiving this extension without the type being set to CONFIG is an error. Having APIs where something optional (an extension) implies the value of something required is error prone and creates additional testing overhead [0] -- whereas if we're explicit, it's clear what is meant.

[0]: for example, if this extension implies CONFIG then you will end up with the case whereby if I sent something with STATE to a target that doesn't support the extension (and therefore ignores it, validly) it will return a different set of paths to a target that does understand the extension and has implicitly meant that I assumed CONFIG.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karimra do you want to add a extension field to the SubscriptionList indicating config vs. state? Seems this will give the client an explicit indication that the target is or is not honoring the ConfigSubscriptionStart.

Another way to handle this might be to implement a way to get the supported extensions? Then a client can programmatically know if ConfigSubscription is supported.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was planning to do that in a separate PR if that's ok.
My understanding is that it won't be an extension but an additional field called type with 3 possible values BOTH, CONFIG and STATE.

message ConfigSubscriptionSyncDone {
// ID of a commit confirm operation as assigned by the client
// see Commit Confirm extension for more details.
string commit_confirm_id = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a oneof between the two values?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two values can both be present in the same message. The commit_confirm_id is present if the commit is a commit confirmed one.
The server_commit_id is an internal ID to the target. Implementations can leave it empty if no ID is available or simply set it to the commit timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, here is a demo that shows how both commit_confirm_id and server_commit_id are both present when commit with confirm is executed

https://youtu.be/MO7_ZYbRT3k?si=ukWgC4H73D8_Nqgy&t=723

@ezobn
Copy link

ezobn commented Jul 17, 2024

+1. F.E., Cisco NSO currently is using sync-from requests to initialize sync from devices to internal XML DB (CDB). And this feature is very useful, because you can subscribe to the needed xpath for the changes. It by itself create sync-event based fidelity of living XML DB for configs of the whole network. It gives you a nice 100% integrity of the Network, as the source of truth. You can subscribe of the netconf-config-change events indeed, but this requires to handle a lot of tcp sessions, because of nature of netconf. But if, devices, can push this information, by them-selfs, because you can subscribe to the whole config change, and even give you indicators that you get everything! It sounds very good. if, everybody starts to support this, we can have 100% integrity of the network configuration DB. That itself, one of the building block of all automation journey...

@ryanmerolle
Copy link

This backup and config diffing vs intended is a popular workflow in the network operator community. It would make so much sense to implement like @hellt proposed.

@dplore
Copy link
Member

dplore commented Jul 30, 2024

@robshakir all your comments have responses. Can you take another look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.