Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trying to test some hashes but need a bit of information on the "state" #17

Open
MatthewCushing opened this issue Jan 21, 2019 · 1 comment
Assignees

Comments

@MatthewCushing
Copy link

MatthewCushing commented Jan 21, 2019

So I was able to get a variation of the FNV1a tests working by adding it in which was great. But honestly, I'm not 100% sure I've put it in correctly since I'm not sure what is getting passed in. If we look at one of your implementations:

FNV32a_with_state_test(const void *key, int len, const void *state, void *out)
{
    uint32_t h = *((uint32_t *)state);
    const uint8_t *data = (const uint8_t *)key;

    h ^= BIG_CONSTANT(2166136261);

    for(int I = 0; I < len; I++) {
        h ^= data[I];
        h *= 16777619;
    }

    *(uint32_t *) out = h;
}

The big thing that I'm not understanding is what is going on here:

h ^= BIG_CONSTANT(2166136261);

Am I correct that this is initially a value of 0? Thus XOR'ing it is doing nothing but keeping its original state that we have casted and giving it the constant value here?

This is a version of it we have implemented:

static std::uint32_t GenerateFnv1aHash(const std::string inputString)
		{
			std::uint32_t hash = 2166136261;

			for (char i : inputString)
			{
				hash = 16777619 * (hash ^ i);
			}

			return hash ^ (hash >> 16);
                }

The way I put it into SMHasher is:

FNV32a_My_Implementation_Test(const void *key, int len, const void *state, void *out)
{
    uint32_t h = *((uint32_t *)state);
    const uint8_t *data = (const uint8_t *)key;

    h ^= BIG_CONSTANT(2166136261);

    for(int I = 0; I < len; I++) {
        h ^= data[I];
        h *= 16777619;
    }

    *(uint32_t *) out = h ^ (h >> 16);
}

Or another has we've implemented is Djb2, here's our original implementation:

		static std::uint32_t GenerateDjb2Hash(const std::string inputString)
		{
			std::uint32_t hash = 5381;

			for (char i : inputString)
			{
				hash = 33 * hash ^ static_cast<unsigned char>(i);
			}

			return hash;
		}

And here's how I was implementing it into SMHasher:

Djb2_My_Implementation_Test(const void *key, int len, const void *state, void *out)
{
    uint32_t hash = *((uintew_t *)state);
    const uint8_t *data = (const uint8_t *)key;

    hash ^= 5381;

    for(int i = 0; i < len; ++i)
    {
        hash *= 33;
        hash ^= static_cast<unsigned char>(data[i]);
    }
    *(uint32_t *) out = hash;
}

Am I getting this right or am I getting this completely wrong? I'd apprecitate any help, I'm still in the process of getting familiar with hash functions, the terminology, and how it all comes together.

Thanks!

EDIT: I have a feeling I'm not putting these in right as I am getting some pretty big failures via the SMHasher tests.

For the version of the FNV1a that we used I got - Failed 107 of 195 tests
For the version of Djb2 that we used I got - Failed 160 of 195 tests

I'm running your version of FNV1a to see if it's similar but I have a feeling it won't be.

EDIT2: Welp, I accidentally exited out of test running for your version of FNV1a but the amount of time it was taking to implement it vs our version was vastly different. I will run it again tomorrow when I can but I don't have time before I get off work. In the end, I feel as if I am not understanding the arguments correctly that I need to implement into the hash function. It would be great if you could explain what each argument is doing.

From what I understand:
key = is the string that is going to be hashed
len = the length of the key
state = (not sure exactly)
out = essentially the return value?

EDIT3: Well it seems the tests are similar but I'd love some confirmation that I'm understanding this correctly.

@demerphq demerphq self-assigned this Apr 8, 2019
@demerphq
Copy link
Owner

demerphq commented Apr 8, 2019

Most hashes are seeded, and that seed is transformed into an internal state before hashing. So for instance a hash might take a 128 bit seed, and then transform it into a 256 bit state, and hashing proceeds against the state. Sometimes this operation is expensive. My version of smhasher separates this into two steps. The first takes a seed and sets up the state, and the second uses that state to hash. Consider djb2 hash, it does not take a seed, and initializes its 32-bit state to 5381. If it were seeded then it would allow you to set the state explicitly via a seed. So for instance it might do:

state = seed ^ 5381;

or something similar. Maybe it might take a 16 bit seed and then do something like:

state = ((seed ^ 123) << 16 | (seed ^ 5381));

so that it ensures the state does not start as zero.

out is the return value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants