
perf(node): caching implementation for node operations #830

Open · jayzhudev wants to merge 14 commits into openconfig:master from jayzhudev:perf-caching
Conversation

@jayzhudev (Contributor) commented May 1, 2023

What

First of all, thank you for maintaining and sharing such an amazing and useful library; I hope we can give back to the community while making use of this project. This squashed commit was ported from our local fork after we made performance improvements for our use case. My apologies if the combined change is large to review; please suggest any changes that could improve this patch, and ask for clarification wherever needed.

Why

In one of our use cases of the ygot library, we found significant performance benefits from adding a caching layer to the node calls. When the config tree is very large, most of the time spent in these methods goes to traversing the tree to find the nodes to operate on. That traversal involves relatively expensive reflect calls, string comparisons, and string manipulation, and it is mostly repetitive computation when the configuration tree's structure and paths are relatively static even though the values change.

How (high-level)

The idea behind the caching layer is to use a node's unique path to locate it directly once its address has been cached. This shortcuts the tree-traversal method and brings performance gains. In our use case, this feature, together with the changes in our other perf PRs, brought the computation time down from ~2000 ms to <= 10 ms under our benchmark tests with large config trees and heavy node-modification load.
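As a rough sketch of this idea (illustrative names only, not the actual types in this PR's `ytypes/node_cache.go`), the cache maps a unique string form of the gNMI path to the previously resolved node, so a hit skips traversal entirely:

```go
package nodecache // illustrative sketch only; not the PR's actual node cache

import (
	"reflect"
	"sync"

	"github.com/openconfig/goyang/pkg/yang"
)

// cachedNode holds what a traversal would otherwise have to recompute.
type cachedNode struct {
	schema *yang.Entry   // schema entry of the resolved node
	node   reflect.Value // addressable value of the resolved node in the GoStruct tree
}

// nodeCache maps a unique string representation of a gNMI path to its node.
type nodeCache struct {
	mu    sync.RWMutex
	nodes map[string]*cachedNode
}

// lookup returns the cached node for a path key; a hit avoids tree traversal.
func (c *nodeCache) lookup(pathKey string) (*cachedNode, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	n, ok := c.nodes[pathKey]
	return n, ok
}
```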

There are many alternative ways to achieve this performance gain; this is the approach we chose, and it worked well for our use case.

This optimization does come at the cost of complexity: the cache operations have to be managed properly. It also required changes in reflect.go to make sure that a node's address doesn't change when its value is modified.

In addition, we want the ygot library to remain stateless and the exported APIs to have no breaking changes after adding the caching mechanism. As a result, applications that use the ytypes library maintain the cache(s) on their own and can optionally pass a pointer to the node cache through the call-option struct to get the performance boost. Passing the pointer is cheap and has a negligible performance impact.
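A usage sketch of what this looks like from the application side (the `NodeCache`/`NodeCacheOpt` names follow this PR's review comments; the exact field layout is an assumption, not a settled API):

```go
// Hypothetical usage sketch; NodeCacheOpt's field layout is assumed here.
package app

import (
	gpb "github.com/openconfig/gnmi/proto/gnmi"
	"github.com/openconfig/goyang/pkg/yang"
	"github.com/openconfig/ygot/ygot"
	"github.com/openconfig/ygot/ytypes"
)

// setWithCache shows the application owning the cache and passing a pointer to
// each SetNode call; the pointer hand-off is cheap, and cache hits skip traversal.
func setWithCache(rootSchema *yang.Entry, root ygot.GoStruct, path *gpb.Path,
	val *gpb.TypedValue, cache *ytypes.NodeCache) error {
	return ytypes.SetNode(rootSchema, root, path, val,
		&ytypes.NodeCacheOpt{NodeCache: cache})
}
```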

Other Changes

JSON Marshaling Performance

While squeezing out the last bit of performance, we decided to use go-json in a few places instead of encoding/json. We hesitated to add a third-party dependency, but the performance boost was significant, especially on the hot paths.
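For context, the swap is typically a drop-in import change (assuming the go-json referenced here is github.com/goccy/go-json, which mirrors the encoding/json API):

```go
package main

import (
	"fmt"

	json "github.com/goccy/go-json" // drop-in replacement for "encoding/json"
)

func main() {
	// Call sites stay unchanged because the package exposes the same Marshal/Unmarshal API.
	b, err := json.Marshal(map[string]int{"leaf": 1})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"leaf":1}
}
```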

Procedure Shortcuts and reflect Calls

Last but not least, we try to optimize the use of reflect as much as possible and return early to avoid unnecessary heap allocations (which create GC pressure at scale) and computations. Both can be considered refactoring with performance gains, so no breaking change was introduced. Such changes can be found, for example, in prependmodsJSON in ygot/render.go.
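As a generic illustration of that pattern (not the actual prependmodsJSON code), returning before any allocation on the common path avoids both the allocation and the follow-on GC work:

```go
// Illustrative only; shows the early-return shape, not ygot's actual code.
package render

func prependPrefix(prefix string, names []string) []string {
	// Early return: nothing to prepend, so skip allocating a new slice at all.
	if prefix == "" || len(names) == 0 {
		return names
	}
	out := make([]string, 0, len(names)) // one allocation, sized up front
	for _, n := range names {
		out = append(out, prefix+":"+n)
	}
	return out
}
```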

Summary of Change

  • Add caching for high-performance Sets/Gets. CPU usage and latency of GetNode/SetNode calls are significantly reduced after this change.

  • Improve JSON marshaling performance using go-json. This helps squeeze out the last bit of performance, particularly on embedded devices.

  • Shortcut expensive heap allocations. Repetitive heap allocations increase GC pressure and CPU usage when the code paths are called frequently.

Note: tests need to be added and completed for caching-enabled method calls.

Performance

Here are some rough reference numbers from e2e testing, sending 2000 leaf-list updates in a SetRequest:

| Version | Time |
| --- | --- |
| Before #830 | ~2,300 ms |
| After #830 | ~9 ms (warm cache) |

Limitations

Currently, inserting a new element into a list uses reflect.Append in util/reflect.go. This changes the memory address of the slice head, which causes node-cache synchronization issues. Because of this, list and leaf-list types are not handled by the node cache for now.
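The root cause is that reflect.Append allocates a new backing array once capacity is exceeded, so the addresses of the slice elements (and anything cached that points at them) become stale. A minimal demonstration of the behaviour:

```go
package main

import (
	"fmt"
	"reflect"
)

func main() {
	s := []int{1, 2}
	v := reflect.ValueOf(&s).Elem()

	before := v.Index(0).Addr().Pointer() // address of the first element
	v.Set(reflect.Append(v, reflect.ValueOf(3)))
	after := v.Index(0).Addr().Pointer()

	// When the append grows the backing array, the element addresses change,
	// which is what invalidates cached node addresses for (leaf-)lists.
	fmt.Println(before == after) // false once capacity is exceeded
}
```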

google-cla bot commented May 1, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).


coveralls commented May 2, 2023

Coverage Status

Coverage: 89.905% (-0.2%) from 90.122% when pulling 212fcc8 on jayzhudev:perf-caching into 97fb230 on openconfig:master.

// - encoding.TextMarshalers are marshaled
//
// - integer keys are converted to strings
func uniquePathRepresentation(path *gpb.Path) (string, error) {
Collaborator commented:
Suggest using util.PathToString

@jayzhudev (Contributor, Author) replied May 6, 2023:
I guess it's ygot.PathToString? We might need to compare the performance of these two if that matters. Previously path.String (encoding/prototext/encode.go) was used; its performance more than doubled after switching to json.Marshal (json.Marshal is deterministic, so it's fine to use), and improved again substantially after switching to go-json.

Because computing the unique path representation is one of the most expensive parts of the node cache, if not the most expensive, and it is used by all cache operations, the performance of this function affects the performance of the whole node cache.
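For reference, a sketch of the approach described here, shown with encoding/json (the PR swaps in go-json as a drop-in); the helper name is illustrative and not necessarily the PR's uniquePathRepresentation:

```go
package nodecache

import (
	"encoding/json"

	gpb "github.com/openconfig/gnmi/proto/gnmi"
)

// pathKey builds a deterministic string key for a gNMI path. json.Marshal
// sorts map keys (e.g. the list-key map in PathElem.Key), so equal paths
// always produce the same string.
func pathKey(path *gpb.Path) (string, error) {
	b, err := json.Marshal(path)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```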

Comment on lines +75 to +77
default:
// Only TypedValue is supported by the node cache for now.
return
Collaborator commented:
should this return an error instead?

@jayzhudev (Contributor, Author) replied May 6, 2023:

Thanks for the comment. I think this type switch is used to check the type of val and see if it's supported by the node cache. Maybe we don't want to return an error since that will turn into a SetNode error and block the Set.

The node cache doesn't support all types that the ordinary SetNode supports so the intention is to let the SetNode continue (which will still modify the node, without the performance boost) if the val is not a supported type. Hope that makes sense.
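A sketch of that control flow (illustrative names, not the PR's exact code): the cache path reports setComplete == false for unsupported value types instead of an error, so SetNode falls back to the normal path:

```go
// Illustrative only; mirrors the fall-through behaviour described above.
package nodecache

import gpb "github.com/openconfig/gnmi/proto/gnmi"

func trySetFromCache(val interface{}) (setComplete bool, err error) {
	switch val.(type) {
	case *gpb.TypedValue:
		// Supported type: the cached fast path would run here.
		return true, nil
	default:
		// Unsupported by the node cache: no error, just setComplete == false so
		// SetNode continues through the ordinary (uncached) code path.
		return false, nil
	}
}
```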

Comment on lines 121 to 123
// When the node doesn't exist the unmarshal call will fail. Within the setNodeCache function this
// should not return an error. Instead, this should return setComplete == false to let the ygot library
// continue to go through the node creation process.
Collaborator commented:
perhaps check that parent is nil and have an explicit return for this condition, so that the not found error is not conflated into one here?

If the cached node is valid, there is no reason the unmarshal should fail, so I think we can return an error here so that we catch any bugs from unforeseen behaviour.

Comment on lines 50 to 51
// mapPathsPool helps reduce heap allocations.
var mapPathsPool = sync.Pool{
Collaborator commented:
Can you add more details on why this helps avoid heap allocations?

@jayzhudev (Contributor, Author) replied:

This was an experiment in this frequently called function to see how much GC load sync.Pool can reduce, mostly following the documentation in https://go.dev/src/sync/pool.go. In this use case I don't think it reduced GC pressure significantly (much less than the switch to go-json). If this is overly complex we can revert the use of sync.Pool for now. Thanks!
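For reference, the general shape of the pattern (a simplified sketch, not the PR's exact mapPathsPool code):

```go
package util

import (
	"sync"

	gpb "github.com/openconfig/gnmi/proto/gnmi"
)

// Pooling a *[]*gpb.Path lets frequent callers reuse a previously allocated
// backing array instead of allocating a fresh slice on every call.
var mapPathsPool = sync.Pool{
	New: func() interface{} {
		s := make([]*gpb.Path, 0, 8)
		return &s
	},
}

func withPooledPaths(fn func(paths *[]*gpb.Path)) {
	p := mapPathsPool.Get().(*[]*gpb.Path)
	*p = (*p)[:0] // reset length but keep capacity from the previous use
	fn(p)
	mapPathsPool.Put(p) // return the slice so later calls can reuse it
}
```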

Collaborator commented:

Can this file and render.go be split into a separate PR?

@jayzhudev (Contributor, Author) replied:

Yes, will do, that's a good idea.

@jayzhudev (Contributor, Author) replied:

The changes you mentioned were split into #831.

Comment on lines +37 to +38
// IsGetOrCreateNodeOpt implements the GetOrCreateNodeOpt interface.
func (*NodeCacheOpt) IsGetOrCreateNodeOpt() {}
Collaborator commented:

Is this missing IsDelNodeOpt?

@jayzhudev (Contributor, Author) replied:

I see, thanks for catching this, I think the node.DeleteNode should be fixed too.
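For clarity, the fix being discussed would mirror the marker method shown above (a sketch; the interface name follows ytypes' existing DelNodeOpt option interface):

```go
// IsDelNodeOpt implements the DelNodeOpt interface so that NodeCacheOpt can
// also be passed to DeleteNode.
func (*NodeCacheOpt) IsDelNodeOpt() {}
```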

@jayzhudev (Contributor, Author) replied:

The initial cache implementation was a singleton in the ygot library. When it was refactored, I think some bugs were introduced too. It was used by an earlier PoC, but I plan to do some major work soon to improve the tests for the node cache so we can catch such issues more easily.

@@ -18,6 +18,7 @@ func TestUnmarshalSetRequest(t *testing.T) {
inUnmarshalOpts []UnmarshalOpt
want ygot.GoStruct
wantErr bool
resetNodeCache bool
Collaborator commented:

Instead of reusing existing tests that weren't designed for the cached-node option, I think testing would be more effective with a dedicated nodeCache test that mixes Get/GetOrCreate/Set/Delete calls and verifies they work together.

@jayzhudev (Contributor, Author) replied May 6, 2023:

Agreed, will be working on that over the weekend.

@wenovus (Collaborator) commented May 3, 2023

By the way thanks for posting this contribution! This looks like something that can benefit others and overall this implementation looks good.

@jayzhudev (Contributor, Author) replied:
> By the way thanks for posting this contribution! This looks like something that can benefit others and overall this implementation looks good.

Thanks for supporting this change! Let's get the other 3 PRs completed and I'll use some after work time to clean up this PR and add better tests 😄

jayzhudev added a commit to jayzhudev/ygot that referenced this pull request May 6, 2023
This is also a split change from the combined PR of node cache
implementation (openconfig#830).
... and add initial `node_cache_test`.
…sues

- Fixed the node cache not handling the container type in `unmarshalGeneric` correctly
- Added a temporary node-cache workaround for memory-address changes of slice heads when
  appending to a slice
Check nil for node cache stored `schema` and `parent`. An `Internal`
error code is returned if either `schema` or `parent` is nil.