Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad Hash for storm::RationalNumber in sylvan #618

Open
dbasgoeze opened this issue Sep 5, 2024 · 1 comment
Open

Bad Hash for storm::RationalNumber in sylvan #618

dbasgoeze opened this issue Sep 5, 2024 · 1 comment

Comments

@dbasgoeze
Copy link
Contributor

dbasgoeze commented Sep 5, 2024

I've included grid.pm.txt [as .txt to make github happy].

While running storm --prism grid.pm --exact rationals --sound --prop "R{\"x\"}=? [F \"goal\"]" --engine dd I get the error message: BDD Unique table full, 897954 of 67108864 buckets filled!

Taking a closer look 897954 / 67108864 is only about 1.34%! With my sylvan knowledge I immediately suspected that the hash used is not good enough and leads to many collisions. Investigation soon pointed to the hash of the RationalNumbers used in the ADD, calculated in this line:

return seed ^ (std::hash<storm::RationalNumber>()(srn_a) + 0x9e3779b9 + (seed<<6) + (seed>>2));

Changing the line the following fixes the issue:

return seed + std::hash<std::string>{}(srn_a.get_str());

Now storm actually runs OOM like I would've expected (therefore clearly using more of the unique table than 2%). Of course this is not a great fix as first std::hash still does not guarantee to be a good hash, second creating the string is costly and third + while better than ^ imho is still not a good hash combine function.

The following assumes that storm::RationalNumber is always a GMP type.

One could hash the numerator and denumerator arrays in the underlying mpq_t but GMP uses some small fraction optimizations (something like storing the values in the space for the pointers to the arrays I suppose), so this is rather tricky to do. Another approach is to convert the number to double and hash it (but don't use std::hash as I think it more often than not it just uses the identity hash which has bad properties for the hash map used in sylvan) but that approach has the risk of creating collisions if the model uses a lot numbers closer to each other than the resolution of double (which I suppose is not that uncommon). In another project I use a combination of all three approaches.

Another important thing is to not hash the value of the pointers in the mpq_t as then equal but different numbers have different hashes. I speak from experience...

Edit: I was wrong with the assumption that GMP uses small object optimizations. It's just that mpz_t._mp_size is negative when the number is negative.

@sjunges
Copy link
Contributor

sjunges commented Sep 7, 2024

Our hash function looks an awful loot like the scramble here: https://daniel-strecker.com/blog/2020-07-06_c++_hash_for_gmp_mpz_t_mpz_class/. We could experiment with that hash function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants