Introduce HashCompressor for name compression #396

Open: bal-e wants to merge 2 commits into main from hash-name-compressor

@bal-e bal-e commented Sep 23, 2024

I noticed that 'TreeCompressor' likely has a lot of overhead: it has to copy each label into its internal tree, and it maintains a separate hashmap for every known name (each name likely has one or two parents at most, even though hashmaps have a large capacity).

'HashCompressor' is an alternative name compressor built on top of the 'hashbrown' crate, which offers lower-level hash table access that lets entries indirectly reference the built message for hashing and equality checks. Because it maintains a single hashmap, it should be faster and require less memory.
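
To illustrate the idea of entries that reference the built message instead of owning label bytes, here is a minimal std-only sketch. It is not the code in this PR: 'SketchCompressor', 'hash_labels', 'labels_at', 'find', and 'append_name' are hypothetical names, and since 'std' lacks the low-level table access, the sketch emulates it by keying a regular hashmap on a precomputed hash and keeping a small list of candidate offsets per hash. It is also not optimized (suffix hashes are recomputed for every label).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical sketch of a compressor whose entries only store offsets.
struct SketchCompressor {
    /// The message being built, in DNS wire format.
    msg: Vec<u8>,
    /// Hash of a name suffix -> offsets in `msg` where such a suffix starts.
    table: HashMap<u64, Vec<u16>>,
}

/// Hash a sequence of labels, ignoring ASCII case.
fn hash_labels(labels: &[&[u8]]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for label in labels {
        label.to_ascii_lowercase().hash(&mut hasher);
    }
    hasher.finish()
}

impl SketchCompressor {
    fn new() -> Self {
        Self { msg: Vec::new(), table: HashMap::new() }
    }

    /// Read the labels of the name stored at `pos`, following pointers.
    fn labels_at(&self, mut pos: usize) -> Vec<&[u8]> {
        let mut out = Vec::new();
        loop {
            let len = self.msg[pos] as usize;
            if len == 0 {
                return out;
            }
            if len & 0xC0 == 0xC0 {
                // Compression pointer: jump to the 14-bit target offset.
                pos = ((len & 0x3F) << 8) | self.msg[pos + 1] as usize;
            } else {
                out.push(&self.msg[pos + 1..pos + 1 + len]);
                pos += 1 + len;
            }
        }
    }

    /// Look for an earlier occurrence of `labels`; equality is checked
    /// against the bytes already in `msg`, not against a stored copy.
    fn find(&self, labels: &[&[u8]]) -> Option<u16> {
        let candidates = self.table.get(&hash_labels(labels))?;
        candidates.iter().copied().find(|&off| {
            let stored = self.labels_at(off as usize);
            stored.len() == labels.len()
                && stored.iter().zip(labels).all(|(a, b)| a.eq_ignore_ascii_case(b))
        })
    }

    /// Append a name given as its labels (without the root label).
    fn append_name(&mut self, labels: &[&[u8]]) {
        let mut pending = Vec::new();
        let mut compressed = false;
        for i in 0..labels.len() {
            let suffix = &labels[i..];
            if let Some(off) = self.find(suffix) {
                // Known suffix: emit a pointer and stop.
                self.msg.extend_from_slice(&(0xC000u16 | off).to_be_bytes());
                compressed = true;
                break;
            }
            // Pointers can only address 14-bit offsets.
            if self.msg.len() < 0x4000 {
                pending.push((hash_labels(suffix), self.msg.len() as u16));
            }
            self.msg.push(labels[i].len() as u8);
            self.msg.extend_from_slice(labels[i]);
        }
        if !compressed {
            self.msg.push(0); // root label
        }
        // Register new suffixes only once the name is fully written.
        for (hash, off) in pending {
            self.table.entry(hash).or_default().push(off);
        }
    }
}
```

In this sketch, appending the labels of www.example.com to an empty message writes 17 bytes, and then appending mail.EXAMPLE.com writes the 'mail' label followed by a two-byte pointer to offset 4, for a total of 24 bytes. A table with lower-level access could drop the per-hash candidate list and store the offsets directly.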

The implementation here makes it clear that domain names should have been stored in reverse order, at least in the wire format. A decent chunk of the logic goes into correctly reversing the domain name. In fact, the implementation likely has quadratic runtime because of this (at least when 'Name' or 'RelativeName' are used, as backward iteration on them has quadratic runtime).
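
The cost of reversing can be sketched as follows. In the wire format each label is preceded by its length, so a name can only be walked forward; a backward step has to rescan from the front. The functions below are hypothetical illustrations operating on an uncompressed, well-formed wire-format name, not the iterators of 'Name' or 'RelativeName': 'labels_reversed_naive' rescans for every label (quadratic in the number of labels), while 'labels_reversed' records label starts in one forward pass and then reverses them (linear, at the cost of a temporary buffer).

```rust
/// Collect the start offset of every non-root label in one forward pass.
fn label_starts(name: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut pos = 0;
    while pos < name.len() && name[pos] != 0 {
        starts.push(pos);
        pos += 1 + name[pos] as usize;
    }
    starts
}

/// Labels from right to left using a single forward scan.
fn labels_reversed(name: &[u8]) -> Vec<&[u8]> {
    label_starts(name)
        .into_iter()
        .rev()
        .map(|s| &name[s + 1..s + 1 + name[s] as usize])
        .collect()
}

/// Labels from right to left, rescanning from the front for each label.
fn labels_reversed_naive(name: &[u8]) -> Vec<&[u8]> {
    // Locate the terminating root label first.
    let mut end = 0;
    while name[end] != 0 {
        end += 1 + name[end] as usize;
    }
    let mut out = Vec::new();
    while end > 0 {
        // Walk forward until we reach the label that ends at `end`.
        let mut pos = 0;
        while pos + 1 + name[pos] as usize != end {
            pos += 1 + name[pos] as usize;
        }
        out.push(&name[pos + 1..end]);
        end = pos;
    }
    out
}
```

Both functions return the same labels; only the amount of rescanning differs.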

While 'std' essentially exposes the same data structures as 'hashbrown', it does not currently offer a way to perform low-level hash table access (although there is a nightly "raw entry" feature). This may change in the future.

If not for the added 'hashbrown' dependency, I would suggest deprecating 'TreeCompressor' entirely.

@bal-e bal-e added the enhancement New feature or request label Sep 23, 2024
@bal-e bal-e self-assigned this Sep 23, 2024
@bal-e bal-e force-pushed the hash-name-compressor branch 2 times, most recently from d790e84 to b09d0e8 Compare September 23, 2024 12:49
arya dradjica added 2 commits September 23, 2024 15:07