Bit pattern issue #102

polyminer1 · 2023-09-07T20:12:49Z

Line 255 in 61a0530

void MurmurHash3_x64_128 ( const void * key, const int len,

At this line you initialize a 64 bit hash from a 32 bit seed. Effectively forcing the high 32bit to 0 for every call. The mixing will be more patterned down the line. The input seed should be 64bit to ensure a good mix overall.

sebres · 2023-09-07T23:29:30Z

Effectively forcing the high 32bit to 0 for every call

Nope. Firstly it has not really an "initialization" (if one carefully follows the mixing algorithm inside the blocks). Secondly it is not the case here, since 2 x 64 bits buffers get the seed in the low 32bits in:

smhasher/src/MurmurHash3.cpp

Lines 261 to 262 in 61a0530

    
           uint64_t h1 = seed; 
        
           uint64_t h2 = seed;

But they get complexly mixed hereafter during hashing and finalization phase, so almost all 32bits of seed significantly distributed over the whole hash.

The input seed should be 64bit to ensure a good mix overall.

Then it will be not MM3 anymore.

Additionally note:

MM3 is a non-cryptographic hash function, so its main goals are totally different and role of seed is secondary (see below for one of the purposes).
Also don't confuse the seed with salt, which should be also mixed with the input using some cryptographically strong methods like HMAC and co.
Therefore the seed is rather to select the specific computed value of hash function, for instance to distribute different types of objects with theoretically equal block representation across some hash table.
(if two hashes use different seeds, they are very likely to compute distinct hash values for any also the same given input).
Unsigned 32-bit seed provides 4 billions "unique" values (if hash function shows good distribution and few collisions), what is fully enough for purposes like described above typization (since 4 billions types are fully enough) etc.

And last but not least, you can have the "seed" as long as you need it - just prefix or suffix the input with some buffer (64 bits or larger) and calculate the hash for whole buffer (with increased length), or calculate it using chain mode (repeated calculation for data in parts), starting or ending with a buffer containing the seed of your arbitrary length.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bit pattern issue #102

Bit pattern issue #102

polyminer1 commented Sep 7, 2023

sebres commented Sep 7, 2023 •

edited

Loading

Bit pattern issue #102

Bit pattern issue #102

Comments

polyminer1 commented Sep 7, 2023

sebres commented Sep 7, 2023 • edited Loading

sebres commented Sep 7, 2023 •

edited

Loading