Skip to content

Hashing Framework

Felix Gündling edited this page Dec 27, 2022 · 12 revisions

Introduction

Writing hashing functions for keys in hash-based containers is a tedious task. However, often hashing a struct involves hashing all its members and combining those hashes. Hashing iterable containers like std::vector<T> or std::set<T> involves hashing all entries and hash combining the computed hashes.

Cista contains a framework that automatically hashes any composition of scalars, iterable structs (like vector, set, etc.), and standard layout, non-polymorphic aggregate types (Code is here). Char arrays are hashed as string. Additionally, it allows to implement a hash() const -> size_t member function or an overloaded std::hash<T>. Hashing is done recursively. If your type does not fulfil those requirements, but you implemented the cista_members function (as described in the chapter about serialization), cista's default hashing function will do the job for you. Custom implementations will take precedence over cista's default hash function.

This hashing function is the default hash of cista::hash_map and cista::hash_set.

Example

Consider the following struct:

struct message {
  int code_{};
  std::string text_;
  std::string type_;
};

Depending on the application, a hashing function would probably often just hash all members. This is what cista::hashing<T> does. And it does it recursively - e.g the std::string members in the message struct are hashed as iterable of char entries.

message k{3, std::string{"4321"}, std::string{"1234"}};
CHECK(cista::hashing<message>{}(k) ==
      // hash message::type_ third
      cista::hash(std::string{"1234"},  // std::string is hashed as iterable!
                  // hash message::text_ second
                  cista::hash(std::string{"4321"},
                              // hash message::code_ first
                              cista::hash_combine(cista::BASE_HASH, 3))));

Strings

It is important to notice, that all string types yield the same hash if they contain the same sequence of characters. This includes:

  • Cista string cista::raw::string and cista::offset::string
  • Cista string view cista::raw::string_view and cista::offset::string_view
  • standard string std::string
  • standard string view std::string_view
  • char arrays: char[N]

Hash Equivalent Types

As mentioned in the point above (Strings), all string types yield the same hash if the contents match. This is integrated into cista::hashing<T>: the create<T1>() function allows to create a hashing<T1> struct iff cista::is_hash_equivalent<T, T1> evaluates to true. This attribute is true for all string types mentioned above. To enable the same behaviour for custom types, the following "marker" can be used:

template <typename A, typename B>
struct is_hash_equivalent_helper : std::true_type {};

Hash equivalence is very handy when it comes to lookups in cista::hash_map and cista::hash_set because it prevents creation of intermediate values.