Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS #166

diabonas · 2018-12-14T13:50:23Z

Consider this minimal example:

#include "uthash.h"

struct my_struct {
    int id;
    char name[10];
    UT_hash_handle hh;
};

struct my_struct *users = NULL; 

int main(int argc, char *argv[]) {
	struct my_struct *s;
	s = (struct my_struct *)malloc(sizeof *s);
	s->id = 1;
	HASH_ADD_INT( users, id, s );
}

When running with the Clang 7.0.0 static analyser scan-build, the following warning is generated:

$ scan-build clang example.c 
scan-build: Using '/usr/bin/clang-7' for static analysis
example.c:15:2: warning: Division by zero
        HASH_ADD_INT( users, id, s );
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:383:5: note: expanded from macro 'HASH_ADD_INT'
    HASH_ADD(hh,head,intfield,sizeof(int),add)
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:318:3: note: expanded from macro 'HASH_ADD'
  HASH_ADD_KEYPTR(hh, head, &((add)->fieldname), keylen_in, add)
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:311:3: note: expanded from macro 'HASH_ADD_KEYPTR'
  HASH_ADD_KEYPTR_BYHASHVALUE(hh, head, keyptr, keylen_in, _ha_hashv, add);      \
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:301:3: note: expanded from macro 'HASH_ADD_KEYPTR_BYHASHVALUE'
  HASH_ADD_TO_BKT((head)->hh.tbl->buckets[_ha_bkt], &(add)->hh);                 \
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:745:8: note: expanded from macro 'HASH_ADD_TO_BKT'
       HASH_EXPAND_BUCKETS((addhh)->tbl);                                        \
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:815:58: note: expanded from macro 'HASH_EXPAND_BUCKETS'
             _he_newbkt->expand_mult = _he_newbkt->count /                       \
                                       ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bug found.

Apparently (tbl)->ideal_chain_maxlen can be zero, leading to undefined behaviour when dividing through it. This seems undesirable, however I do not understand the code well enough to propose a fix. #129 seems to be related and might be an indicator that additional error handling for this case could be required.

As this happens in a macro in the header file uthash.h, downstream projects might be affected by this issue when they use scan-build on a project that includes uthash, see e.g. tpm2-software/tpm2-tss#1229 (comment).

The text was updated successfully, but these errors were encountered:

- Fix "uninitialized value" warning for sessionAttributes in esys-create-session-auth.int.c. - Supress "division by zero" warnings for HASH_ADD_INT in uthash.h, reported upstream as troydhanson/uthash#166. Signed-off-by: Jonas Witschel <[email protected]>

Quuxplusone · 2018-12-14T14:53:43Z

I admit that the existence of #129 does scare me a little.
On the other hand, all previous Clang static-analysis issues have turned out to be false positives: #111, #128.
So I'm not a priori inclined to spend a lot of time digging into this one. If you could provide a compilable test case that actually encounters a division by zero, that would be a very different matter, and I'd consider it a high priority to fix!

The other interesting thing about division (besides that it can have UB) is that it's very slow. I would gladly review a patch that replaced this division with an appropriate right-shift or something like that, as long as it preserved the current high-level asymptotic behaviors.

diabonas · 2018-12-14T15:31:16Z

I agree that static analysis is not always reliable, and I don't immediately see a way to exploit the undefined behaviour. Would using an assertion be an option? Adding something like

assert((tbl)->ideal_chain_maxlen > 0);

after the calculation of ideal_chain_maxlen (and using #include <assert.h>, of course) signals to the analyser that this code path does not need to be tested, thus silencing the warning. As a bonus, should this case actually occur, the error is a bit clearer. Or is this piece of code so time-critical that no additional error-handling should be done there, unless absolutely necessary?

Quuxplusone · 2018-12-14T15:54:57Z

The code is definitely not time-critical (it happens only during reallocation). However, uthash currently doesn't use any asserts, so adding one would mean adding #include <assert.h> and conceivably breaking some codebase where not-using-<assert.h> was considered a feature. (OTOH, <utlist.h> does use assert already. But the userbases of uthash and utlist don't seem to overlap much, from what I've seen.)

Reading the code comment on ideal_chain_maxlen reassures me that the calculation is (or ought to be) correct:

 * The calculation of tbl->ideal_chain_maxlen below deserves some
 * explanation. First, keep in mind that we're calculating the ideal
 * maximum chain length based on the *new* (doubled) bucket count.
 * In fractions this is just n/b (n=number of items,b=new num buckets).
 * Since the ideal chain length is an integer, we want to calculate
 * ceil(n/b). We don't depend on floating point arithmetic in this
 * hash, so to calculate ceil(n/b) with integers we could write
 *
 *      ceil(n/b) = (n/b) + ((n%b)?1:0)
 *
 * and in fact a previous version of this hash did just that.
 * But now we have improved things a bit by recognizing that b is
 * always a power of two. We keep its base 2 log handy (call it lb),
 * so now we can write this with a bit shift and logical AND:
 *
 *      ceil(n/b) = (n>>lb) + ( (n & (b-1)) ? 1:0)

I wonder, what if we rewrote this as

     (tbl)->ideal_chain_maxlen =                                                  \
-       ((tbl)->num_items >> ((tbl)->log2_num_buckets+1U)) +                      \
-       ((((tbl)->num_items & (((tbl)->num_buckets*2U)-1U)) != 0U) ? 1U : 0U);    \
+        ((tbl)->num_items + 2*(tbl)->num_buckets - 1U)                           \
+            >> ((tbl)->log2_num_buckets + 1U)                                    \
     (tbl)->nonideal_items = 0;                                                   \

Would that help the static analyzer at all?

Also, I'm pretty sure that when num_buckets > (1u << 31) (i.e. "way more buckets than we intend to encounter in practice"), we do have ideal_chain_maxlen == 0 (and perhaps undefined behavior on the shift). That's just a fact of math. Does the static analyzer have some way to avoid reporting on that kind of thing? or how do people generally work around that?

tstruk · 2018-12-14T18:06:51Z

@Quuxplusone WRT how do people generally work around that? - simply by adding an explicit check.
Something like the below would do the job:

diff --git a/src/uthash.h b/src/uthash.h
index f34c1f9..27c4ae0 100644
--- a/src/uthash.h
+++ b/src/uthash.h
@@ -937,7 +937,8 @@ do {
         _he_newbkt = &(_he_new_buckets[_he_bkt]);                                \
         if (++(_he_newbkt->count) > (tbl)->ideal_chain_maxlen) {                 \
           (tbl)->nonideal_items++;                                               \
-          _he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
+          if ((tbl)->ideal_chain_maxlen)                                         \
+               _he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
         }                                                                        \
         _he_thh->hh_prev = NULL;                                                 \
         _he_thh->hh_next = _he_newbkt->hh_head;                                  \

- Fix "uninitialized value" warning for sessionAttributes in esys-create-session-auth.int.c. - Supress "division by zero" warnings for HASH_ADD_INT in uthash.h, reported upstream as troydhanson/uthash#166. Signed-off-by: Jonas Witschel <[email protected]>

diabonas · 2018-12-14T18:56:40Z

The code is definitely not time-critical (it happens only during reallocation). However, uthash currently doesn't use any asserts, so adding one would mean adding #include <assert.h> and conceivably breaking some codebase where not-using-<assert.h> was considered a feature. (OTOH, <utlist.h> does use assert already. But the userbases of uthash and utlist don't seem to overlap much, from what I've seen.)

That makes sense. Without using assertions, I think a simple if like @tstruk suggested seems to be the best solution.

     (tbl)->ideal_chain_maxlen =                                                  \
-       ((tbl)->num_items >> ((tbl)->log2_num_buckets+1U)) +                      \
-       ((((tbl)->num_items & (((tbl)->num_buckets*2U)-1U)) != 0U) ? 1U : 0U);    \
+        ((tbl)->num_items + 2*(tbl)->num_buckets - 1U)                           \
+            >> ((tbl)->log2_num_buckets + 1U)                                    \
     (tbl)->nonideal_items = 0;                                                   \

Would that help the static analyzer at all?

Unfortunately the analyser seems to be pretty dumb when trying to deduce whether ideal_chain_maxlen can be zero: even something as easy as

tbl->ideal_chain_maxlen = (tbl->num_items >> 1U) + (((tbl->num_items & 1U) != 0U) ? 1U : 0U);

is not recognised to be nonzero for every tbl->num_items>0. So I think it's going to be difficult to find an expression that Clang is going to be content with.

Quuxplusone · 2018-12-14T19:35:13Z

Actually, now that I look at this code...

        if (++(_he_newbkt->count) > (tbl)->ideal_chain_maxlen) {                 \
          (tbl)->nonideal_items++;                                               \
          _he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
        }

...surely we could just eliminate the division completely by doing something like this: master...Quuxplusone:assert
Would that satisfy Clang static analysis?

diabonas · 2018-12-14T20:02:26Z

@Quuxplusone Great, that does not produce any warnings with scan-build anymore!

Fixes troydhanson#166.

The 2.1.0 version of uthash contains a fix for HASH_ADD_INT macro issue, which trigged a clang static analysis warning. troydhanson/uthash#166 Signed-off-by: Tadeusz Struk <[email protected]>

diabonas changed the title ~~Clang static analyser: "division by zero" zero warning in HASH_EXPAND_BUCKETS~~ Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS Dec 14, 2018

Quuxplusone added a commit to Quuxplusone/uthash that referenced this issue Dec 14, 2018

Remove a division that was troubling Clang static analysis.

df7c81b

Fixes troydhanson#166.

Quuxplusone closed this as completed in 02b0706 Dec 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS #166

Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS #166

diabonas commented Dec 14, 2018

Quuxplusone commented Dec 14, 2018

diabonas commented Dec 14, 2018

Quuxplusone commented Dec 14, 2018

tstruk commented Dec 14, 2018

diabonas commented Dec 14, 2018

Quuxplusone commented Dec 14, 2018

diabonas commented Dec 14, 2018

Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS #166

Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS #166

Comments

diabonas commented Dec 14, 2018

Quuxplusone commented Dec 14, 2018

diabonas commented Dec 14, 2018

Quuxplusone commented Dec 14, 2018

tstruk commented Dec 14, 2018

diabonas commented Dec 14, 2018

Quuxplusone commented Dec 14, 2018

diabonas commented Dec 14, 2018