Investigate non-throwing overflow resolution #21
If this is not possible, I could imagine a solution that lets the user choose either pure performance (throw on overflow, as it is now) or slightly less performance with graceful error handling. Of course, if it is possible to have both (graceful overflow handling and performance), this option is obsolete. |
I think we can achieve this by sacrificing time or space efficiency only when the above extreme situations occur. This will ensure that we maintain optimal performance in most cases, but will continue to work in extreme conditions. |
I use a marker that counts the distance to the original bucket. I currently throw an exception when that marker would overflow. Instead, I could give the maximum value of the marker a special meaning where I switch to a slow linear search instead. That way there should be almost no performance penalty in the normal case, and I could still handle the extreme case. |
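A minimal sketch of what such a lookup fallback could look like (purely illustrative; the slot layout, the names, and the sentinel value are assumptions, not the library's actual internals):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: each slot carries a small "info" value that encodes the
// distance to the slot's home bucket; the maximum value is reserved as an
// overflow sentinel.
constexpr std::uint8_t kOverflowMarker = 0xFF; // hypothetical sentinel value

struct Slot {
    std::uint8_t info = 0;   // 0 = empty, otherwise 1 + probe distance
    std::uint64_t key = 0;
};

// Lookup: fast bounded probing in the normal case, full linear scan once the
// probe distance would reach the sentinel.
inline const Slot* findSlot(const std::vector<Slot>& slots, std::uint64_t key, std::size_t home) {
    std::size_t idx = home;
    for (std::uint8_t dist = 1; dist != kOverflowMarker; ++dist) {
        const Slot& s = slots[idx];
        if (s.info == dist && s.key == key) {
            return &s;
        }
        if (s.info < dist) {
            return nullptr;  // robin hood invariant: the key cannot be further away
        }
        idx = (idx + 1) % slots.size();
    }
    // Extreme case: degrade to a slow linear search over the whole table.
    for (const Slot& s : slots) {
        if (s.info != 0 && s.key == key) {
            return &s;
        }
    }
    return nullptr;
}
```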
Does this mean that in that case, all items in the container will degrade to linear probing? If so, is the container likely to automatically promote itself back to robin hood hashing in the future? |
well, it's always robin hood hashing, even when there are lots of collisions. Collision handling is always a linear search. When you remove the colliding elements there will be less linear search. |
If I understand correctly, you mean there will be two linear search algorithms: the fast one we currently use, and a slower one used to handle the overflow case (when the marker reaches its maximum), right? So in the new implementation the faster linear search may be replaced by the slower one. Then:
|
Is there any update on this? I'm currently using your excellent implementation for sparse grids. The problem is that sometimes I get an overflow exception, which unfortunately forces me to use the much slower and memory-hungry, but perfectly stable, std::unordered_map again ... |
What hash are you using? With a reasonably good hash this should never happen. Can you post just the hash code? |
I'm using your default hash-function with values computed as follows:
Edit: I just remembered that I "fixed" the exceptions by changing my key to the code above. I used this before and it would cause the exception more often: Also pretty interesting is that the overflow occurs when I simply copy all the elements from one map into another:
|
Can you share a reproducer? What compiler are you using, and do you compile for 64 or 32 bit? |
I wrote you an email with the link to the minimal reproducer. I'm using Visual Studio 16.4.2 and only compile for 64 bit. Thanks! |
Thanks for the reproducer! I could reproduce it. Here's what the issue is:

```cpp
for (const auto& entry : sourceNode.grid) {
    targetNode.grid[entry.first] = entry.second;
}
```

You iterate a large hashmap and insert all the entries into targetNode.grid. When inserting, since the lower bits of the hash are weak, many keys probe the same region of the growing target map, and the probe-distance marker can overflow. There is no ideal way to fix this problem; in your case you can do this in PointCloudConverter.cpp:

```cpp
void PointCloudConverter::MergeOctreesInternal(Node& targetNode, Node& sourceNode,
                                               uint32_t depth, uint32_t partitionDepth, uint32_t partitionIndex) {
    targetNode.grid.reserve(targetNode.grid.size() + sourceNode.grid.size());
    for (const auto& entry : sourceNode.grid) {
        targetNode.grid[entry.first] = entry.second;
    }
}
```

With that reserve() the overflow does not occur. |
On second thought, this is really an issue of bad quality of the hash. If you replace robin_hood's hash_int with this code, this issue also doesn't happen (then there is also no need for reserve()):

```cpp
inline size_t hash_int(uint64_t obj) noexcept {
#if ROBIN_HOOD(HAS_UMUL128)
    uint64_t h;
    uint64_t l = detail::umul128(obj ^ UINT64_C(0xa0761d6478bd642f), UINT64_C(0xe7037ed1a0b428db), &h);
    return h ^ l;
#elif ROBIN_HOOD(BITNESS) == 32
    uint64_t const r = obj * UINT64_C(0xca4bcaa75ec3f625);
    auto h = static_cast<uint32_t>(r >> 32U);
    auto l = static_cast<uint32_t>(r);
    return h + l;
#else
    // murmurhash 3 finalizer
    uint64_t h = obj;
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccd;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53;
    h ^= h >> 33;
    return static_cast<size_t>(h);
#endif
}
```
|
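As a side note: if you would rather not patch robin_hood.h itself, a similar effect can presumably be had by supplying the hash from the outside through the map's Hash template parameter, which works like std::unordered_map's. A sketch, with StrongIntHash being a made-up name and the murmur3 finalizer from above used as the hash body:

```cpp
#include <cstddef>
#include <cstdint>

#include "robin_hood.h"

// Hypothetical standalone functor reusing the murmur3 finalizer shown above.
struct StrongIntHash {
    std::size_t operator()(std::uint64_t obj) const noexcept {
        std::uint64_t h = obj;
        h ^= h >> 33;
        h *= UINT64_C(0xff51afd7ed558ccd);
        h ^= h >> 33;
        h *= UINT64_C(0xc4ceb9fe1a85ec53);
        h ^= h >> 33;
        return static_cast<std::size_t>(h);
    }
};

int main() {
    // The third template parameter selects the hash, as with std::unordered_map.
    robin_hood::unordered_map<std::uint64_t, int, StrongIntHash> map;
    map[UINT64_C(12345)] = 1;
}
```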
Finally came around to trying your other hash function, and it seems to work for all the datasets that I have. In my opinion, a fallback to a slower linear search should still be added for absolute stability. The issue is that if my program needs to run for multiple hours, I need to be sure that it won't crash due to a very rare sequence of hashes. |
You are of course right, a linear search fallback would be important to have. I don't have the time currently for this though. But I'm at least working on an improved hash. |
I believe I've (finally) got a simple & fast reproducer:

```cpp
TEST_CASE("overflow_finder_simple") {
    robin_hood::unordered_set<uint64_t> set;
    for (uint64_t i = 0; i < UINT64_C(1000000); ++i) {
        auto h = robin_hood::hash<uint64_t>{}(i);
        if (UINT64_C(0) == (h & UINT64_C(0x1ff))) {
            set.insert(i);
        }
    }
}
```

This shows that the lower bits of the hash are very bad, which is what can lead to the overflow.

Mitigation strategy: create an initial, very simple mixing hash that is part of the robin_hood table and basically multiplies the value with a random odd number; the result is then passed on to the robin_hood hash. When overflow happens, add an odd constant to the mix multiplier and rehash the map (as sketched below). Advantages/disadvantages of this approach:
|
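A minimal sketch of that mixing idea (illustrative only; MixingHash and reseed() are made-up names, not part of robin_hood, and a murmur3-style finalizer stands in for the library's hash):

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the proposed pre-mixing layer: the table would apply this before
// its normal hash and, on overflow, bump the multiplier and rehash everything.
struct MixingHash {
    std::uint64_t mix = UINT64_C(0x9E3779B97F4A7C15); // assumed initial odd multiplier

    std::size_t operator()(std::uint64_t key) const noexcept {
        // Multiplying by an odd number scrambles the value reversibly; the
        // result is then fed to a murmur3-style finalizer as a stand-in for
        // the robin_hood hash.
        std::uint64_t h = key * mix;
        h ^= h >> 33;
        h *= UINT64_C(0xff51afd7ed558ccd);
        h ^= h >> 33;
        return static_cast<std::size_t>(h);
    }

    // Called when the probe-distance marker would overflow: keep the multiplier
    // odd but change it, so the subsequent rehash distributes the keys differently.
    void reseed() noexcept { mix += 2; }
};
```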
Sorry to step in, but I have trouble with hashing of grid indices (so, two ints; I'm working on a discrete spatial grid). Should I be able to fix the error by swapping in the hash_int implementation above? I get map overflow exceptions. I tried the above solution, but it seems to slow down my algorithm. Do you know of more specific and efficient hash functions for a continuous two-dimensional range of integers (AKA grid coordinates)? |
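One possible approach (a sketch only, assuming both grid indices fit in 32 bits; packCoords and GridCoordHash are hypothetical names, not part of the library) is to pack the two coordinates into a single 64-bit key and run that through a strong integer mixer:

```cpp
#include <cstddef>
#include <cstdint>

// Sketch: pack two signed 32-bit grid coordinates into one 64-bit key.
inline std::uint64_t packCoords(std::int32_t x, std::int32_t y) noexcept {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(x)) << 32) |
           static_cast<std::uint64_t>(static_cast<std::uint32_t>(y));
}

// Hash the packed key with a murmur3-style finalizer so that neighbouring
// cells do not end up sharing the same low bits.
struct GridCoordHash {
    std::size_t operator()(std::uint64_t packed) const noexcept {
        std::uint64_t h = packed;
        h ^= h >> 33;
        h *= UINT64_C(0xff51afd7ed558ccd);
        h ^= h >> 33;
        h *= UINT64_C(0xc4ceb9fe1a85ec53);
        h ^= h >> 33;
        return static_cast<std::size_t>(h);
    }
};
```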
I have been getting this exception extremely rarely when inserting SHA-256 hashed data -- truly random data, really -- into the hash table (I just take the first 64 bits of the 256-bit number). Is this still some sort of known issue? I have seen this with so-called "good" hash functions, but extremely rarely. |
For an extremely bad hash, overflow can happen. Investigate how to resolve this gracefully (and without losing performance!).