Investigate non-throwing overflow resolution #21

martinus · 2019-04-23T08:47:51Z

For extremely bad hash, overflow can happen. Investigate how to resolve this gracefully (and without losing performance!)

felixguendling · 2019-05-03T10:47:44Z

and without losing performance

If this is not possible I could imagine a solution that lets the user choose either pure performance (throw on overflow, as it is now) or a bit less performance but with graceful error handling. Of course, if it is possible to have both (graceful overflow handling and performance) this option is obsolete.

asbai · 2019-08-04T17:01:45Z

I think we can achieve this by sacrificing time or space efficiency only when the above extreme situations occur. This will ensure that we maintain optimal performance in most cases, but will continue to work in extreme conditions.

martinus · 2019-08-04T17:14:11Z

I use a marker that counts the distance to the original bucket. I currently throw an exception when that marker would overflow. Instead, I could give the maximum value of the marker a special meaning where I switch to a slow linear search instead. That way there should be almost no performance penalty in the normal case, and I could still handle the extreme case.

asbai · 2019-08-05T02:20:05Z

I could give the maximum value of the marker a special meaning where I switch to a slow linear search instead

Does this mean that in that case, all items in the container will degrade into linear probing? If so, is this container likely to automatically promote it self back to robin hood hashing in the future?

martinus · 2019-08-08T18:06:55Z

well, it's always robin hood hashing, even when there are lots of collisions. Collision handling is always a linear search. When you remove the colliding elements there will be less linear search.

asbai · 2019-08-08T18:32:16Z

If I understand correctly, you means there will be two linear search algorithms, one is we currently used (the fast version), and the other slower one is used to solve the problem of overflow (marker reaches the maximum), right?

So we can now determine that in the new implementation, the faster linear search algorithm may be replaced by the slower one. Then:

Does this replacement occur globally in the specified container or only in some parts of that container where the conflict is more intense?
Is there a reverse process for this replacement? i.e.: If a container conflict is no longer so intense, is it possible for the linear search algorithm to be replaced back to the fast one automatically?

tblut · 2019-12-19T14:24:19Z

Is there any update on this? I'm currently using your excellent implementation for sparse grids. The problem is that sometimes I get an overflow exception, which unfortunately forces me to use the much slower and memory hungry, but perfectly stable, std::unordered_map again ...
I would be very happy if you made some optional setting that makes the map 100% stable. Even if it makes it a little bit slower, it would still be much faster than the STL version!

martinus · 2019-12-19T14:35:06Z

What hash are you using? With a reasonably good hash this should never happen. Can you post just the hash code?

tblut · 2019-12-19T14:40:31Z

I'm using your default hash-function with values computed as follows:

uint32_t cellX = static_cast<uint32_t>((point.x - bounds.min.x) / bounds.size.x * gridResolution);
uint32_t cellY = static_cast<uint32_t>((point.y - bounds.min.y) / bounds.size.y * gridResolution);
uint32_t cellZ = static_cast<uint32_t>((point.z - bounds.min.z) / bounds.size.z * gridResolution);
uint32_t cellIndex = cellX | (cellY << 10) | (cellZ << 20);

auto& entry = grid[cellIndex]; // grid is a robin_hood::unordered_map<uint32_t, ...>

Edit: I just remembered that I "fixed" the exceptions by changing my key to the code above. I used this before and it would cause the exception more often:
cellIndex = cellX + cellY * gridResolution + cellZ * gridResolution * gridResolution

Also pretty interesting is that the overflow occurs when I simply copy all the elements from one map into another:

for (const auto& entry : source.grid) {
    target.grid[entry.first] = entry.second;
}

martinus · 2019-12-19T17:12:21Z

Can you share a reproducer? What compiler are you using, and do you compiler for 64 or 32 bit?

tblut · 2019-12-20T10:19:19Z

I wrote you an email with the link to the minimal reproducer. I'm using Visual Studio 16.4.2 and only compile for 64 bit. Thanks!

martinus · 2019-12-20T19:08:23Z

Thanks for the reproducer! I could reproduce it, here's what's the issue:

for (const auto& entry : sourceNode.grid) {
	targetNode.grid[entry.first] = entry.second;
}

You iterate a large hashmap, and insert all the entries into targetNode.grid map. All hashmaps use the same hash function, so it's basically iterating from lowest to highest hash value.

When inserting, since the targetNode.grid is small, all the elements that are inserted will first cause a collision (because their hashes are so small), so eventually robin_hood throws in the towel and causes an exception. std::unordered_map doesn't throw but insertion will be slow because it's boiling down to a linear search.

There is no ideal way to fix this problem, in your case you can do this in PointCloudConverter.cpp:

void PointCloudConverter::MergeOctreesInternal(Node& targetNode, Node& sourceNode,
		uint32_t depth, uint32_t partitionDepth, uint32_t partitionIndex) {
	targetNode.grid.reserve(targetNode.grid.size() + sourceNode.grid.size());
	for (const auto& entry : sourceNode.grid) {
		targetNode.grid[entry.first] = entry.second;
	}

With that reserve() call the map has already plenty of size for all the data to insert, so it doesn't feel the effect of hash collision due to linear insertion. With that change it works for me.

martinus · 2019-12-20T19:58:17Z

On second thought, this is really an issue of bad quality of the hash. If you replace robin_hood's hash_int with this code, this issue also doesn't happen (then there is also no need for reserve()):

inline size_t hash_int(uint64_t obj) noexcept {
#if ROBIN_HOOD(HAS_UMUL128)
    uint64_t h;
    uint64_t l = detail::umul128(obj ^ UINT64_C(0xa0761d6478bd642f), UINT64_C(0xe7037ed1a0b428db), &h);
    return h ^ l;
#elif ROBIN_HOOD(BITNESS) == 32
    uint64_t const r = obj * UINT64_C(0xca4bcaa75ec3f625);
    auto h = static_cast<uint32_t>(r >> 32U);
    auto l = static_cast<uint32_t>(r);
    return h + l;
#else
    // murmurhash 3 finalizer
    uint64_t h = obj;
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccd;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53;
    h ^= h >> 33;
    return static_cast<size_t>(h);
#endif
}

tblut · 2019-12-30T10:14:00Z

Finally came around to trying your other hash function and it seems to work for all the datasets that I have. In my opinion, a fallback to a slower linear search should still be added for absolute stability. The issue is that if my program needs to run for multiple hours, I need to be sure that it won't crash due to a very rare sequence of hashs.

martinus · 2019-12-30T18:20:30Z

You are of course right, a linear search fallback would be important to have. I don't have the time currently for this though. But I'm at least working on an improved hash.

martinus · 2020-03-20T05:59:27Z

I believe I've (finally) a simple & fast reproducer:

TEST_CASE("overflow_finder_simple") {
    robin_hood::unordered_set<uint64_t> set;

    for (uint64_t i = 0; i < UINT64_C(1000000); ++i) {
        auto h = robin_hood::hash<uint64_t>{}(i);
        if (UINT64_C(0) == (h & UINT64_C(0x1ff))) {
            set.insert(i);
        }
    }
}

This shows that the lower bits of the hash are very bad, which is what can lead to the overflow.

mitigation strategy: create an initial very simple mixing hash that's part of the robin_hood table, that basically multiplies the value with a random odd number, and this is then passed on to the robin_hood hash. When overflow happens, add an odd constant to the mix multiplier, and rehash the map.

Advantages/disadvantages of this approach:

no need to refactor the infobyte array, which would produce some slowdowns
Only a single multiplication is needed at hashing => not much performance slowdown.
allows to use the hash's address as initial multipler, to be a bit more resilient against overflow attacks
implementation of rehashing is necessary - but can probably be implemented as a copy of the map.

roncapat · 2020-06-04T14:21:21Z

Sorry to step in, but I have trouble with hashing of grid indexes (so, two ints, I'm working on a discrete spatial grid). Should I be able to fix the error by swapping the hash_int as proposed above? @martinus

I have map overflows exceptions. I tried the above solution but seems to slow down my algorithm.

Do you know more specific and efficient hashing functions for a continuous bidimensional range of integers (AKA grid coordinates)?

cculianu · 2020-12-09T13:43:23Z

I have been getting this exception extremely rarely when inserting sha256 hashed data -- truly random data really, into the hash table (I just take the first 64 bits of the 256 bit number)!!

Is this still some sort of known issue? I have seen this on so-called "good" hash functions, but extremely rarely.

martinus self-assigned this Apr 23, 2019

martinus added the quality quality improvement label Apr 23, 2019

guevara mentioned this issue Oct 18, 2023

Robin Hood Hashing 源码分析 guevara/read-it-later#10115

Open

guevara mentioned this issue Apr 8, 2024

Robin Hood Hashing 源码分析 | SF-Zhou's Blog guevara/read-it-later#11065

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate non-throwing overflow resolution #21

Investigate non-throwing overflow resolution #21

martinus commented Apr 23, 2019

felixguendling commented May 3, 2019

asbai commented Aug 4, 2019

martinus commented Aug 4, 2019

asbai commented Aug 5, 2019 •

edited

Loading

martinus commented Aug 8, 2019

asbai commented Aug 8, 2019 •

edited

Loading

tblut commented Dec 19, 2019

martinus commented Dec 19, 2019

tblut commented Dec 19, 2019 •

edited

Loading

martinus commented Dec 19, 2019

tblut commented Dec 20, 2019

martinus commented Dec 20, 2019

martinus commented Dec 20, 2019

tblut commented Dec 30, 2019

martinus commented Dec 30, 2019

martinus commented Mar 20, 2020

roncapat commented Jun 4, 2020 •

edited

Loading

cculianu commented Dec 9, 2020

Investigate non-throwing overflow resolution #21

Investigate non-throwing overflow resolution #21

Comments

martinus commented Apr 23, 2019

felixguendling commented May 3, 2019

asbai commented Aug 4, 2019

martinus commented Aug 4, 2019

asbai commented Aug 5, 2019 • edited Loading

martinus commented Aug 8, 2019

asbai commented Aug 8, 2019 • edited Loading

tblut commented Dec 19, 2019

martinus commented Dec 19, 2019

tblut commented Dec 19, 2019 • edited Loading

martinus commented Dec 19, 2019

tblut commented Dec 20, 2019

martinus commented Dec 20, 2019

martinus commented Dec 20, 2019

tblut commented Dec 30, 2019

martinus commented Dec 30, 2019

martinus commented Mar 20, 2020

roncapat commented Jun 4, 2020 • edited Loading

cculianu commented Dec 9, 2020

asbai commented Aug 5, 2019 •

edited

Loading

asbai commented Aug 8, 2019 •

edited

Loading

tblut commented Dec 19, 2019 •

edited

Loading

roncapat commented Jun 4, 2020 •

edited

Loading