-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chaining hash table implementation #37
base: master
Are you sure you want to change the base?
Conversation
Nice work! I'm quite busy this week and won't have time to perform a full review, however I will try to find some time next for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's an initial review with some comments. I will do a further review later.
src/hash_table/chaining.rs
Outdated
pub use crate::hash_table::ChainedLoadFactor; | ||
use crate::key_adapter::KeyAdapter; | ||
use crate::linked_list::Link as LinkedListLink; | ||
use crate::linked_list::LinkedList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really should be using singly-linked lists here. The memory savings are significant and nobody really cares about iterating through hash tables backwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had originally planned for O(1)
inserts and removals. If I use singly-linked lists, the constant time remove operation is lost.
I had also wanted a general purpose implementation. Currently the max load factor is not constrained; the user is free to create hash tables where the number of buckets is small relative to the number of items. Linear time removals could affect this kind of use case.
I have a use case where link: LinkedListLink
is reused between a double-ended-queue and a hash table to save memory. I'll need an extra pointer if I use singly-linked lists, unless a LinkedListLink
can be used as a SinglyLinkedListLink
.
I think the solution is two chaining hash table implementations:
Which names do you like?
- singly-linked list
SinglyLinkedChainedHashTable
SinglyChainedHashTable
SinglyLinkedHashTable
- linked list
LinkedChainedHashTable
ChainedHashTable
LinkedHashTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O(1)
removal can be obtained by having cursors point to the previous element in the linked list.
Sharing a link between a linked list and a hash table can be solved by #27 which would allow you to implement your own Link
types and make them work for different collections.
src/hash_table/chaining.rs
Outdated
buckets: B, | ||
buckets_bits: C, | ||
buckets_len: usize, | ||
hash_builder: S, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned above, it is probably better to leave the choice of hasher to the adapter. As a bonus this eliminates our dependency on std
.
I would like to make one major change regarding the trait Array<T>: DerefMut<[T]> {
/// Hint to the array that the hash table is about to insert additional elements.
/// This function can return a new array to tell the hash table to rehash into the new array.
fn reserve(&mut self, len: usize) -> Option<Self>;
/// Hint to shrink the capacity of the array if possible.
/// This function can return a new array to tell the hash table to rehash into the new array.
fn shrink_to_fit(&mut self, len: usize) -> Option<Self>;
} The key change here is that the The implementations of For dynamic arrays which track the load factor we will provide a |
Yes, I plan on finishing this later this week. |
f5d36dc
to
ba60b09
Compare
I've been ruminating about the design and came to the conclusion that it is much cleaner to treat This means that I wanted to get the current design reviewed. Also, test coverage is 93.2%, so check the tests for example usage. Outstanding issues
Some notesThere are now two kinds of cursors:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initial review, I haven't finished reading through all the code yet.
/// This function can return a new array to tell the hash table to rehash into the new array. | ||
fn shrink_to(&mut self, min_capacity: usize) -> Option<Self>; | ||
|
||
/// Returns the capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Returns the capacity. | |
/// Returns the number of elements the map can hold without reallocating. | |
/// | |
/// This number is a lower bound; the hash table might be able to hold | |
/// more, but is guaranteed to be able to hold at least this many. |
} | ||
|
||
impl<T> Array<T> for &'_ mut [T] { | ||
#[inline(never)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this #[inline(never)]
?
inner: Vec<T>, | ||
capacity: usize, | ||
load_factor: LoadFactor, | ||
len: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is never used?
impl<T: Default> Array<T> for DynamicArray<T> { | ||
#[inline(never)] | ||
fn reserve(&mut self, new_capacity: usize) -> Option<Self> { | ||
let new_len = new_capacity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just rename the parameter to new_len
?
10 11 12 13 14 15 16 17 18 19 | ||
20 21 22 23 24 25 26 27 28 29 | ||
30 31 32 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer restricting array lengths to only powers of two. This has several advantages:
- Bit masking is much faster than modulo.
- We only need to provide array impls for powers of two.
- Growing is easy: just double the size.
You can use the capacity_to_buckets
and bucket_mask_to_capacity
methods from hashbrown as a reference.
#![allow(clippy::declare_interior_mutable_const, clippy::collapsible_if)] | ||
|
||
#[cfg(feature = "alloc")] | ||
extern crate alloc; | ||
|
||
#[cfg(test)] | ||
#[macro_use] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should import macros using use std::<macro name>;
rather than using #[macro_use]
.
@@ -267,13 +267,14 @@ | |||
#![warn(missing_docs)] | |||
#![warn(rust_2018_idioms)] | |||
#![no_std] | |||
#![cfg_attr(feature = "nightly", feature(const_fn))] | |||
#![cfg_attr(feature = "nightly", feature(const_fn, const_generics))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Const generics aren't needed if we are only providing array impls for power-of-two sizes.
/// |`remove`|`O(n)`|`O(1)`| | ||
/// |`replace_next_with`|`O(1)`|`O(1)`| | ||
/// |`replace_with`|`O(n)`|`O(1)`| | ||
pub unsafe trait BucketOps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can simplify this: just implement BucketOps
directly on LinkedList
/SinglyLinkedList
instead of using DefaultBucketOps
. This functionality isn't really something we need to let the user customize.
Remove type Bucket
and type PointerOps
. Add type Pointer
and type Value
.
/// It is also possible to create stateful adapters. | ||
/// This allows links and containers to be separated and avoids the need for objects to be modified to | ||
/// contain a link. | ||
pub trait HashTableAdapter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashTableAdapter
can be simplified. Remove PointerOps
and BucketOps
, the adapter only needs to handle key equality and key hashing. Everything else is handled by the BucketOps
from the B
array parameter to HashTable
.
Intrusive hash table implementation that supports:
Adapter<Link = LinkedListLink>
)bucket usage bitmapEntry
APIThere are a few utility traits:
Array
for implementing the bucket index and the metadata.DynamicArray: Array
for implementing dynamic allocation / resizing.Bits
BitsArray
Dynamic allocation is entirely optional and can be implemented via the
DynamicArray
trait.