Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new rax function raxAllocSize to return rax tree allocation size in constant time #677

Closed
kyle-yh-kim opened this issue Jun 20, 2024 · 12 comments · Fixed by #688
Closed

Comments

@kyle-yh-kim
Copy link
Contributor

kyle-yh-kim commented Jun 20, 2024

The problem/use-case that the feature addresses

Originally, the issue was opened in the Redis project to provide more visibility in slotToKey overhead [issue link].

A customer reached out to ask why their memory usage is very high. The customer had 40 bytes values and roughly 900 million keys. They compute the memory usage by doing key size * keyspace size. They observed a near 200% memory usage overhead on their production Redis server. A triage done by our team showed that over half of the overhead was from the slotToKey rax tree.

As a small experiment. I took the rax.c and rax.h files and compiled them into a small program. Inserting 484 Million keys into the rax tree yielded 48GB of memory usage. This is about 99 bytes per key.

The proposed solution would display the slotToKey rax tree as an overhead in the MEMORY STATS command output. This would allow Redis users to inspect and understand their memory usage in more detail.

However, since version 7.4, slotToKey rax tree was replaced with kvstore->dicts[slot]. While the original concern is no longer relevant, there still exist multiple motivations for introducing raxAllocSize, namely;

  1. Limiting stream growth

    • This was based on @madolson's comment. There has been multiple asks to support limiting a stream based of its size, for which raxAllocSize will enable.
  2. Per-slot memory metric

    • Building on top of the accurate stream size calculation, this will enable Valkey cluster to accurately track its per-slot memory usage.
  3. Detailed MEMORY STATS

    • Introduce a new field called overhead.rax, for tracking internal rax tree used to manage Valkey server. Examples include; acl, aof, active defrag, config and networking.

Description of the feature

  • Introduce a size_t field into the rax struct to track allocation size.
  • Update the allocation size on rax insert and deletes.
  • Return the allocation size when raxAllocSize is called.
  • Retrieve the allocation size when MEMORY STATS is called.

Alternatives you've considered

N/A.

Additional information

This is a revision of an already opened issue that never closed from the Redis project.

@zuiderkwast
Copy link
Contributor

Sure, go for it!

Just a thought: Is rax good for streams or should we consider replacing it with something else?

We've replaced rax in various places and now it's almost only used in streams. But maybe it's actually really good for timestamps? (Common prefixes)

@knggk
Copy link
Contributor

knggk commented Jun 21, 2024

Hi Viktor, can you give examples of where rax was replaced? Being sorted and providing range operations, I am guessing it's been replaced by a structure with those abilities as well (skiplist?).

What would be a good test to consider replacing rax on timestamps for streams? Or asked differently, what motivated the switch away from rax in the other places?

@zuiderkwast
Copy link
Contributor

@knggk It was used for slot-to-key mapping. That was replaced by storing the keys in one dict per slot. Same for shard pubsub channels.

Some other internal usages where replaced by dict i think, but i don't remember exactly. A sorted structure was not needed in those cases.

@knggk
Copy link
Contributor

knggk commented Jun 24, 2024

@zuiderkwast Understood. I am not deeply familiar with streams, but it does require sorted elements to handle range operations (for example on timestamps in xrange, iterated on here

valkey/src/t_stream.c

Lines 1802 to 1804 in 32ca6e5

raxSeek(&ri, ">=", startkey, sizeof(startkey));
while (raxNext(&ri) && (!count || arraylen < count)) {
if (end && memcmp(ri.key, end, ri.key_len) > 0) break;
). So dict would not be an option. Another option though could be skiplist. (Not sure how Valkey's skiplist implementation is usable outside of sorted sets.)

@zuiderkwast
Copy link
Contributor

@knggk Sure, dict would not be an option for streams. Purely hypothetically, imagine btree or some other exotic structure. Radix trees are good for storing keys with common prefixes. Timestamps do have a long common prefix (when they're stored in big endian, which we do) so I believe it's a pretty good choice for streams. But after that, Salvatore started using the radix trees for various other things, where they may not have been the best data structure to use.

@knggk
Copy link
Contributor

knggk commented Jun 24, 2024

@zuiderkwast Ok that makes sense. Thank you for providing that context and asking whether rax still makes sense.

I've cut an initial attempt at adding the functionality (salvaged a previous attempt). Happy to take any feedback before adding tests.

@madolson
Copy link
Member

madolson commented Jun 25, 2024

@knggk As part of your PR, or many someone else, can you consider porting https://github.com/antirez/rax/blob/master/rax-test.c over into our test base so we can write proper unit tests around the allocation tracking?

@knggk
Copy link
Contributor

knggk commented Jun 25, 2024

@madolson Done updating the PR with rax-test.c. Doing make SERVER_TEST=yes and then src/valkey-server test rax ends with OK! \o/.

Sounds like we're missing:

  1. Integrating rax-test into the build system/github workflow
  2. Plugging raxAllocSize into mem_usage so it shows for streams under show memory
  3. Tests that demonstrate improved memory tracking for streams?

We can split the work with @kyle-yh-kim when we figure out how two people can work on a PR.

On 1) and 3) I don't have precise ideas so far, input very welcome.

@zuiderkwast
Copy link
Contributor

@knggk Where's the PR?

Did you use the new unit test framework under src/unit/? If yes, then it's already included in the CI.

@knggk
Copy link
Contributor

knggk commented Jun 26, 2024

@zuiderkwast It's here #688. I tried to mention this issue (677) in that pull request thinking that's how you link an issue and a PR. Maybe I am confused.

Re src/unit, thanks! that's the piece of insight I was looking for. I will move rax-test.c under src/unit/test_rax.c and do the required adaptations.

Another issue I've faced: On a Linux dev machine, make valkey-unit-tests throws errors at link time:

    ARCHIVE libvalkey.a                                               
ar: threads_mngr.o: plugin needed to handle lto object
ar: adlist.o: plugin needed to handle lto object 
ar: quicklist.o: plugin needed to handle lto object
ar: ae.o: plugin needed to handle lto object 
...
    LINK valkey-unit-tests                                                                                                                                                                                                                                                               
/tmp/ccv5bGmV.ltrans0.ltrans.o: In function `freeTestCallback':                                                                                                                                                                                                                          
/workplace/valkey/src/unit/test_kvstore.c:10: undefined reference to `zfree'                                                                 
/tmp/ccv5bGmV.ltrans0.ltrans.o: In function `test_reclaimFilePageCache':
...

Note these failures are only on building the unit tests, not on building eg valkey-server.

I don't have issues linking unit tests on local Mac Sonoma (I am continuing there for now).

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Jun 26, 2024

@zuiderkwast It's here #688. I tried to mention this issue (677) in that pull request thinking that's how you link an issue and a PR. Maybe I am confused.

Ah, I see now "knggk mentioned this issue 2 days ago". Great. Even better: If you write "Fixes #677" or "Closes #677" in the PR description, the PR with be linked to the issue and appear on the sidebar as a PR that will automatically close the issue when the PR is merged.

I'll look at the PR when I have some time. Thanks!

@knggk
Copy link
Contributor

knggk commented Jun 27, 2024

Didn't know about Fixes or Closes. Updated the text there. I can see it appearing in the side panel now as you were saying. We can continue the discussion in the PR. Thanks Viktor!

naglera pushed a commit to naglera/placeholderkv that referenced this issue Oct 10, 2024
Introduce a `size_t` field into the rax struct to track allocation size.
Update the allocation size on rax insert and deletes.
Return the allocation size when `raxAllocSize` is called.

This size tracking is now used in MEMORY USAGE and MEMORY STATS in place
of the previous method based on sampling.

The module API allows to create sorted dictionaries, which are backed by
rax. Users now also get precise memory allocation for them (through
`ValkeyModule_MallocSizeDict`).

Fixes valkey-io#677.

For the release notes:

* MEMORY USAGE and MEMORY STATS are now exact for streams, rather than
based on sampling.

---------

Signed-off-by: Guillaume Koenig <[email protected]>
Signed-off-by: Guillaume Koenig <[email protected]>
Co-authored-by: Joey <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: naglera <[email protected]>
SoftlyRaining pushed a commit to SoftlyRaining/valkey that referenced this issue Oct 11, 2024
Introduce a `size_t` field into the rax struct to track allocation size.
Update the allocation size on rax insert and deletes.
Return the allocation size when `raxAllocSize` is called.

This size tracking is now used in MEMORY USAGE and MEMORY STATS in place
of the previous method based on sampling.

The module API allows to create sorted dictionaries, which are backed by
rax. Users now also get precise memory allocation for them (through
`ValkeyModule_MallocSizeDict`).

Fixes valkey-io#677.

For the release notes:

* MEMORY USAGE and MEMORY STATS are now exact for streams, rather than
based on sampling.

---------

Signed-off-by: Guillaume Koenig <[email protected]>
Signed-off-by: Guillaume Koenig <[email protected]>
Co-authored-by: Joey <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
eifrah-aws pushed a commit to eifrah-aws/valkey that referenced this issue Oct 20, 2024
Introduce a `size_t` field into the rax struct to track allocation size.
Update the allocation size on rax insert and deletes.
Return the allocation size when `raxAllocSize` is called.

This size tracking is now used in MEMORY USAGE and MEMORY STATS in place
of the previous method based on sampling.

The module API allows to create sorted dictionaries, which are backed by
rax. Users now also get precise memory allocation for them (through
`ValkeyModule_MallocSizeDict`).

Fixes valkey-io#677.

For the release notes:

* MEMORY USAGE and MEMORY STATS are now exact for streams, rather than
based on sampling.

---------

Signed-off-by: Guillaume Koenig <[email protected]>
Signed-off-by: Guillaume Koenig <[email protected]>
Co-authored-by: Joey <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants