Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarifying gNMI server behavior for GET and default values #155

Closed
dplore opened this issue Feb 17, 2022 · 23 comments · Fixed by #169
Closed

Clarifying gNMI server behavior for GET and default values #155

dplore opened this issue Feb 17, 2022 · 23 comments · Fixed by #169

Comments

@dplore
Copy link
Member

dplore commented Feb 17, 2022

This is a fork of issues from #142

  • GET for a state path leaf MUST return one of:

    • a value (the default value or a set value)
    • path not found if the path is syntactically valid but is not currently instantiated on the target
    • not supported if the path is syntactically valid but is not implemented
    • an error if the path is syntactically incorrect
  • GET for a config path leaf MUST return one of:

    • the set value OR the default value if defined OR nil if the value is not set and there is no default
    • path not found if the path is syntactically valid but is not currently instantiated on the target
    • not supported if the path is syntactically valid but is not implemented
    • an error if the path is syntactically incorrect
  • GET a subtree

    • TBD

This may need to be clarified as there is an implementation which is returning 'nil' when a leaf that does exist and contains a default value.

@wenovus
Copy link
Contributor

wenovus commented Feb 18, 2022

For state paths, could it also return nil (i.e. no updates) if the path is not populated? In other words, should there be a difference between config and state paths as far as telemetry is concerned?

@dan-lepage
Copy link

The state paths represent the operational state of the device, and the YANG spec requires:

When the default value is in use, the server MUST operationally
behave as if the leaf was present in the data tree with the default
value as its value.
(from rfc7950: Section 7.6.1)

So if a path has a default value and has not been explicitly set, the server MUST return the default value when you query the state.

There are two ways to unset a path - you can REPLACE an ancestor without providing a value for this path, or you can DELETE the path directly. In the first case, the gNMI spec says that the value shouldn't become unset if it has a default, but should instead be set to its default value Section 3.4.6 says the opposite for deletion - if a path is deleted, it should be removed from the data tree entirely. I think this is an error, and section 3.4.6 should instead say that DELETE behaves just like an empty REPLACE, and sets a path to its default value if there is one.

This means that we won't need to handle config and state paths differently, and it means that the implementation of GET doesn't even need to know about defaults - it's impossible for a node that has a default value to ever be unset.

This also makes the GET logic simple: GET for any path should return the value at that path if one is set, null if the path is not set, and an error if the path is not instantiated, not supported, or not recognized at all.

To support this, we should also clarify that paths with defaults MUST have those default values when the data tree is first created - a newly-started device should have default values throughout the tree, not an empty tree.

Note: Of course, this is only defining how the API behaves - under the hood, an implementation doesn't have to store default values, if it's trying e.g. to minimize memory, but it MUST respond to API calls as though all unset paths with defaults were set to their default values.

@wenovus
Copy link
Contributor

wenovus commented Feb 26, 2022

I mentioned here #142 (comment) that

Since both GET and SUBSCRIBE responses use gnmipb.Notification as the response type, it seems to me that their behaviours should be the same.

However, I also see in https://github.com/openconfig/reference/blob/master/rpc/gnmi/gnmi-specification.md#332-the-getresponse-message

The target MUST generate a Notification message for each path specified in the client's GetRequest

Therefore, there indeed could be differences between Subscribe and Get. We should clarify what a "nil" return value means in the OP.

@wenovus
Copy link
Contributor

wenovus commented Feb 26, 2022

This comment may be relevant: #144 (comment) -- it appears that their behaviours could be different, that for GetResponse, the Notification will be there for each path but with empty Update and Delete fields -- unless we decide to change the spec to say that for a non-existent path, a Notification is not sent.

@robshakir We need some clarification on whether for an unpopulated path, that there is a difference between a GetResponse and SubscribeResponse in how Notifications are populated.

@wenovus
Copy link
Contributor

wenovus commented Apr 18, 2022

Had a quick discussion with @robshakir on both this issue and #142, could you confirm and if necessary expand on the following points? Thanks.

  • OK with always returning the default value according to whether it exists per RFC7951 rules. The .PopulateDefaults() method currently does this.
  • (please confirm this) A GetResponse for a non-existent, but schema-valid path returns NotFound. If any path in the GetRequest is not found, then the whole request can return NotFound since we assume that the user wanted all-or-nothing semantics for both simplicity and since the user is able to send separate requests. Alternatively is there a way to return an error that selectively specifies NotFound per path?
  • A Replace(nil) SetRequest should be rejected: it is equivalent to Delete but less clear.

@dan-lepage
Copy link

OK with always returning the default value according to whether it exists per RFC7951 rules. The .PopulateDefaults() method currently does this.

  1. Does a path where no default or value is specified still exist? E.g. it have an implicit default of some zero value, or will the server not return a value at all if a path has no default and no set value? And more specifically, if you Get or Subscribe to a container, will there be a way to distinguish between "a child of this container has no set value" and "the device doesn't support or even know about that child"?
  2. There are nodes in the OC tree that don't appear to have default values in the YANG sense, but still require defaults according to the spec - e.g. the documentation for interfaces/interface[name]/config/mtu states "If this is not set, the mtu is set to the operational default -- e.g., 1514 bytes on an Ethernet interface.", but it's not clear to me if that means "querying for this path on an Ethernet interface should return 1514 bytes if no value has been set" or just "the device should give this interface a 1514-byte MTU if no other value is specified, but the config and state values of this field will be empty".

(please confirm this) A GetResponse for a non-existent, but schema-valid path returns NotFound. If any path in the GetRequest is not found, then the whole request can return NotFound since we assume that the user wanted all-or-nothing semantics for both simplicity and since the user is able to send separate requests. Alternatively is there a way to return an error that selectively specifies NotFound per path?

This seems fine to me, but I don't have a strong opinion on it either way.

A Replace(nil) SetRequest should be rejected: it is equivalent to Delete but less clear.

This seems like it could make tooling a little more awkward to build - e.g. if you have a function that takes some data and calls several replaces at different points in the tree, and under certain circumstances it might delete the node it's replacing, you can't just do Replace(whatever), you'd have to explicitly if whatever != nil { node.Replace(whatever) } else { node.Delete() }

@wenovus
Copy link
Contributor

wenovus commented Apr 18, 2022

Correction

Sorry I wrote RFC7951 but meant RFC7950. Specifically, if you're interested, here mentions when a leaf has a default value in use.

Aside

.PopulateDefaults() tries to instantiate these values according to these rules for OpenConfig YANG, it's not perfect because it doesn't handle "when" statements. It also does something pretty simple because OpenConfig YANG doesn't have choice/case, presence containers, and if-feature statements.

Reply to 1

A leaf node exists in zero or one instance in the data tree., so I interpret that to mean the server should return nothing at all. In the case of GET it should return NotFound.

I read distinguishing between "a child of this container has no set value" and "the device doesn't support or even know about that child" as the same thing as distinguishing "a path that is in the schema" and "a path not in the schema". In both cases according to Darren's two posts in this and the other issue NotSupported would be returned in the latter case.

Reply to 2

Interesting, I'm thinking what makes the most sense in this case is to have config/mtu not exist, but state/mtu to return 1514 since it is the operational state.

Reply to the last reply

I'm open to keeping this as-is for now since this is not currently an important thing to fix in the tooling, but my current thinking is that the clarity of the API is more important than this programming convenience.

@robshakir
Copy link
Contributor

robshakir commented Jun 13, 2022

Apologies for the delay here. After a number of offline discussions, I suggest these conclusions:

  • If a default value is specified in the schema, it sounds reasonable to always return this value as long as it has a valid parent. The logic here (per standard RFC7950 rules) is that there must not be an ancestor node that is not a "non-presence" container.
    • This means, a node like /system/config/leaf-default would be returned - since OpenConfig does not use presence containers.
    • However, /interfaces/interface[name=eth0]/config/leaf-default would not be returned, since interface is a list (i.e., not a "non-presence" container).
  • If we do a Get on a node that does not exist in the tree -- then we should expect that a NotFound is returned. i.e., if we ask for /system/config/foo and no value was set for this (AND it does not have a default) then NotFound would be returned.
  • Since GetResponse has a single status, if there is more than one path in the GetRequest and any is not found, the whole request can be returned with a NotFound error. The client can choose how to group paths in the case that it does not see the paths as a group that succeed or fail together.
  • Replace(nil) doesn't seem like something we should accept, this is handled by delete in the SetRequest, and cannot be distinguished easily from a buggy behaviour where the value was just not specified.

@wenovus
Copy link
Contributor

wenovus commented Jun 14, 2022

@robshakir

  • However, /interfaces/interface[name=eth0]/config/leaf-default would not be returned, since interface is a list (i.e., not a "non-presence" container).
    For this point this is assuming the list element interface[name=eth0] does not exist right? Since if it exists, then it has a valid parent and thus the default value exists.

Do you think it's worth getting this into the spec while memory is fresh so that this doesn't end up as another FAQ item? If you want I could try adding a subsection on default values for Subscribe and Get and adding a couple sentences for Replace(nil) based on your comment here. I'm also happy to review your PR.

@robshakir
Copy link
Contributor

Yes, I think it's worth making a change in the spec here.

@hellt
Copy link
Contributor

hellt commented Jun 28, 2022

What do we return in cases where

  1. default value is defined and there is a non-presence container and valid tree path is used?
  2. default value is not defined for a leaf & valid tree path is used

no update value?

@robshakir
Copy link
Contributor

there's significant overlap here with #142, the current conclusions there are:

  1. this returns the default value,
  2. this returns NotFound

please see the discussion over in the other issue :-)

@dan-lepage
Copy link

Since GetResponse has a single status, if there is more than one path in the GetRequest and any is not found, the whole request can be returned with a NotFound error. The client can choose how to group paths in the case that it does not see the paths as a group that succeed or fail together.

You can have a NotFound status with a payload - do we require that a GetResponse have a Notification for every requested path that DOES exist on a NotFound, or do we allow the device to just return a NotFound with no data?

@dplore
Copy link
Member Author

dplore commented Jul 2, 2022

Since GetResponse has a single status, if there is more than one path in the GetRequest and any is not found, the whole request can be returned with a NotFound error. The client can choose how to group paths in the case that it does not see the paths as a group that succeed or fail together.

You can have a NotFound status with a payload - do we require that a GetResponse have a Notification for every requested path that DOES exist on a NotFound, or do we allow the device to just return a NotFound with no data?

The gNMI spec doesn't elaborate on how GetRequest errors are handled. It only says that the GetResponse must contain a set of notifications for all the paths. I interpret this as the server must return all the paths in the response with a single error. So all paths are returned; those that are found and those which are not found. Those which are not found would have a 'nil' value.

Ref: GetResponse

@wenovus
Copy link
Contributor

wenovus commented Jul 6, 2022

Under "GetResponse Behavior Table" in #169 you can see that we could have NOT_FOUND, UNIMPLEMENTED, or INVALID_ARGUMENT for any of the paths. If multiple of these errors are triggered, what should the behaviour be?

Since GetResponse has a single status, if there is more than one path in the GetRequest and any is not found, the whole request can be returned with a NotFound error. The client can choose how to group paths in the case that it does not see the paths as a group that succeed or fail together.

You can have a NotFound status with a payload - do we require that a GetResponse have a Notification for every requested path that DOES exist on a NotFound, or do we allow the device to just return a NotFound with no data?

The gNMI spec doesn't elaborate on how GetRequest errors are handled. It only says that the GetResponse must contain a set of notifications for all the paths. I interpret this as the server must return all the paths in the response with a single error. So all paths are returned; those that are found and those which are not found. Those which are not found would have a 'nil' value.

Ref: GetResponse

How do we represent a NotFound path, or Unimplemented or Invalid path in the payload? By simply leaving the update and delete fields empty?

@dplore
Copy link
Member Author

dplore commented Jul 7, 2022

Under "GetResponse Behavior Table" in #169 you can see that we could have NOT_FOUND, UNIMPLEMENTED, or INVALID_ARGUMENT for any of the paths. If multiple of these errors are triggered, what should the behaviour be?

That is a good question. Here is an answer that I think follows specification 2.5, but with a clarification below for your consideration

The response status MUST set status to Aborted (10) and SHOULD set the Message to the first error detected and MAY supply additional structured data encoded as protobuf.Any which contains a of list of errors for each path.

@wenovus
Copy link
Contributor

wenovus commented Jul 7, 2022

Under "GetResponse Behavior Table" in #169 you can see that we could have NOT_FOUND, UNIMPLEMENTED, or INVALID_ARGUMENT for any of the paths. If multiple of these errors are triggered, what should the behaviour be?

That is a good question. Here is an answer that I think follows specification 2.5, but with a clarification below for your consideration

The response status MUST set status to Aborted (10) and SHOULD set the Message to the first error detected and MAY supply additional structured data encoded as protobuf.Any which contains a of list of errors for each path.

Does your suggestion mean that whenever there is a single failure or more of one of these three types that the error code is Aborted? Or only when there are multiple error types that Aborted is used? The former is simpler but the latter more precise. I'm leaning towards the simpler since the handling would always be the same. If we do this, then we would never return NotFound, Unimplemented, or InvalidArgument for Get or Subscribe.

Another alternative is just to return the first error code type encountered. This is also simple but less precise.

@robshakir
Copy link
Contributor

The error code that we specify should just be included in the status.proto that is returned (see the Go package for an example) - so if we're returning NotFound this should be the error code specified there (I don't think generically using Aborted makes sense, because this isn't really that descriptive at that point -- saying that /something/ was NotFound makes more sense to me).

The string message in the error SHOULD be used to indicate what it was that was the problem - and we could maybe have an ErrorDetails if we wanted to have a structured way to specify path requested -> error experienced mappings. In practice of the latter, I'm not entirely sure it's worth it, but we could discuss :-)

@robshakir
Copy link
Contributor

If multiple of the errors are triggered, I think this is really up to the implementation - I expect in practice, it'll be that the first error encountered is used, and we could encode that. This avoids the issue of having to walk all paths even if the response is going to be discarded.

@wenovus
Copy link
Contributor

wenovus commented Jul 7, 2022

Sounds good to me, thanks

@wenovus wenovus linked a pull request Jul 7, 2022 that will close this issue
@wenovus wenovus closed this as completed Jul 7, 2022
@dplore
Copy link
Member Author

dplore commented Jul 8, 2022

gNMI error handling requires both an Aborted status code AND an UpdateResult containing an error if I understand correctly. The gNMI spec is a little ambiguous to me (probably because I am not familiar enough with gRPC) on exactly where Aborted is supposed to be used and how that is different from the gNMI setResponse and UpdateResult?

Can you clarify for me, is Aborted (10) expected to be used for the gRPC status code and the gNMI UpdateResult is expected to contain an error set to NotFound, NotFound , Unimplemented, or InvalidArgument. And on topic to this issue, in the case of multiple gNMI errors, the first gNMI error encountered SHOULD be used in the UpdateResult?

@wenovus
Copy link
Contributor

wenovus commented Jul 8, 2022

gNMI error handling requires both an Aborted status code AND an UpdateResult containing an error if I understand correctly. The gNMI spec is a little ambiguous to me (probably because I am not familiar enough with gRPC) on exactly where Aborted is supposed to be used and how that is different from the gNMI setResponse and UpdateResult?

Can you clarify for me, is Aborted (10) expected to be used for the gRPC status code and the gNMI UpdateResult is expected to contain an error set to NotFound, NotFound , Unimplemented, or InvalidArgument. And on topic to this issue, in the case of multiple gNMI errors, the first gNMI error encountered SHOULD be used in the UpdateResult?

I noticed that this part of the spec is outdated and is pending to be removed by #157, it looks like UpdateResult will no longer contain any errors. Now it has the same issue as Get/Subscribe where if there are multiple error types, the implementation has to decide what to return for the error code. @robshakir do you think the same handling should be used where we leave to the implementation which error to return?

@dplore
Copy link
Member Author

dplore commented Jul 8, 2022

I think I may have resolved my own question after some investigation and gRPC learning. So I think the only status to be used for a SetResponse and UpdateResponse is the standard gRPC return codes and there is no other or additional code?

I agree that #157 clarifies this. I will add further comments there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants