Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS #166

Closed
diabonas opened this issue Dec 14, 2018 · 7 comments
Closed

Comments

@diabonas
Copy link

Consider this minimal example:

#include "uthash.h"

struct my_struct {
    int id;
    char name[10];
    UT_hash_handle hh;
};

struct my_struct *users = NULL; 

int main(int argc, char *argv[]) {
	struct my_struct *s;
	s = (struct my_struct *)malloc(sizeof *s);
	s->id = 1;
	HASH_ADD_INT( users, id, s );
}

When running with the Clang 7.0.0 static analyser scan-build, the following warning is generated:

$ scan-build clang example.c 
scan-build: Using '/usr/bin/clang-7' for static analysis
example.c:15:2: warning: Division by zero
        HASH_ADD_INT( users, id, s );
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:383:5: note: expanded from macro 'HASH_ADD_INT'
    HASH_ADD(hh,head,intfield,sizeof(int),add)
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:318:3: note: expanded from macro 'HASH_ADD'
  HASH_ADD_KEYPTR(hh, head, &((add)->fieldname), keylen_in, add)
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:311:3: note: expanded from macro 'HASH_ADD_KEYPTR'
  HASH_ADD_KEYPTR_BYHASHVALUE(hh, head, keyptr, keylen_in, _ha_hashv, add);      \
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:301:3: note: expanded from macro 'HASH_ADD_KEYPTR_BYHASHVALUE'
  HASH_ADD_TO_BKT((head)->hh.tbl->buckets[_ha_bkt], &(add)->hh);                 \
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:745:8: note: expanded from macro 'HASH_ADD_TO_BKT'
       HASH_EXPAND_BUCKETS((addhh)->tbl);                                        \
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/uthash.h:815:58: note: expanded from macro 'HASH_EXPAND_BUCKETS'
             _he_newbkt->expand_mult = _he_newbkt->count /                       \
                                       ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bug found.

Apparently (tbl)->ideal_chain_maxlen can be zero, leading to undefined behaviour when dividing through it. This seems undesirable, however I do not understand the code well enough to propose a fix. #129 seems to be related and might be an indicator that additional error handling for this case could be required.

As this happens in a macro in the header file uthash.h, downstream projects might be affected by this issue when they use scan-build on a project that includes uthash, see e.g. tpm2-software/tpm2-tss#1229 (comment).

@diabonas diabonas changed the title Clang static analyser: "division by zero" zero warning in HASH_EXPAND_BUCKETS Clang static analyser: "division by zero" warning in HASH_EXPAND_BUCKETS Dec 14, 2018
diabonas added a commit to diabonas/tpm2-tss that referenced this issue Dec 14, 2018
- Fix "uninitialized value" warning for sessionAttributes in
  esys-create-session-auth.int.c.
- Supress "division by zero" warnings for HASH_ADD_INT in uthash.h,
  reported upstream as troydhanson/uthash#166.

Signed-off-by: Jonas Witschel <[email protected]>
@Quuxplusone
Copy link
Collaborator

I admit that the existence of #129 does scare me a little.
On the other hand, all previous Clang static-analysis issues have turned out to be false positives: #111, #128.
So I'm not a priori inclined to spend a lot of time digging into this one. If you could provide a compilable test case that actually encounters a division by zero, that would be a very different matter, and I'd consider it a high priority to fix!

The other interesting thing about division (besides that it can have UB) is that it's very slow. I would gladly review a patch that replaced this division with an appropriate right-shift or something like that, as long as it preserved the current high-level asymptotic behaviors.

@diabonas
Copy link
Author

I agree that static analysis is not always reliable, and I don't immediately see a way to exploit the undefined behaviour. Would using an assertion be an option? Adding something like

assert((tbl)->ideal_chain_maxlen > 0);

after the calculation of ideal_chain_maxlen (and using #include <assert.h>, of course) signals to the analyser that this code path does not need to be tested, thus silencing the warning. As a bonus, should this case actually occur, the error is a bit clearer. Or is this piece of code so time-critical that no additional error-handling should be done there, unless absolutely necessary?

@Quuxplusone
Copy link
Collaborator

The code is definitely not time-critical (it happens only during reallocation). However, uthash currently doesn't use any asserts, so adding one would mean adding #include <assert.h> and conceivably breaking some codebase where not-using-<assert.h> was considered a feature. (OTOH, <utlist.h> does use assert already. But the userbases of uthash and utlist don't seem to overlap much, from what I've seen.)

Reading the code comment on ideal_chain_maxlen reassures me that the calculation is (or ought to be) correct:

 * The calculation of tbl->ideal_chain_maxlen below deserves some
 * explanation. First, keep in mind that we're calculating the ideal
 * maximum chain length based on the *new* (doubled) bucket count.
 * In fractions this is just n/b (n=number of items,b=new num buckets).
 * Since the ideal chain length is an integer, we want to calculate
 * ceil(n/b). We don't depend on floating point arithmetic in this
 * hash, so to calculate ceil(n/b) with integers we could write
 *
 *      ceil(n/b) = (n/b) + ((n%b)?1:0)
 *
 * and in fact a previous version of this hash did just that.
 * But now we have improved things a bit by recognizing that b is
 * always a power of two. We keep its base 2 log handy (call it lb),
 * so now we can write this with a bit shift and logical AND:
 *
 *      ceil(n/b) = (n>>lb) + ( (n & (b-1)) ? 1:0)

I wonder, what if we rewrote this as

     (tbl)->ideal_chain_maxlen =                                                  \
-       ((tbl)->num_items >> ((tbl)->log2_num_buckets+1U)) +                      \
-       ((((tbl)->num_items & (((tbl)->num_buckets*2U)-1U)) != 0U) ? 1U : 0U);    \
+        ((tbl)->num_items + 2*(tbl)->num_buckets - 1U)                           \
+            >> ((tbl)->log2_num_buckets + 1U)                                    \
     (tbl)->nonideal_items = 0;                                                   \

Would that help the static analyzer at all?

Also, I'm pretty sure that when num_buckets > (1u << 31) (i.e. "way more buckets than we intend to encounter in practice"), we do have ideal_chain_maxlen == 0 (and perhaps undefined behavior on the shift). That's just a fact of math. Does the static analyzer have some way to avoid reporting on that kind of thing? or how do people generally work around that?

@tstruk
Copy link

tstruk commented Dec 14, 2018

@Quuxplusone WRT how do people generally work around that? - simply by adding an explicit check.
Something like the below would do the job:

diff --git a/src/uthash.h b/src/uthash.h
index f34c1f9..27c4ae0 100644
--- a/src/uthash.h
+++ b/src/uthash.h
@@ -937,7 +937,8 @@ do {
         _he_newbkt = &(_he_new_buckets[_he_bkt]);                                \
         if (++(_he_newbkt->count) > (tbl)->ideal_chain_maxlen) {                 \
           (tbl)->nonideal_items++;                                               \
-          _he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
+          if ((tbl)->ideal_chain_maxlen)                                         \
+               _he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
         }                                                                        \
         _he_thh->hh_prev = NULL;                                                 \
         _he_thh->hh_next = _he_newbkt->hh_head;                                  \

tstruk pushed a commit to tpm2-software/tpm2-tss that referenced this issue Dec 14, 2018
- Fix "uninitialized value" warning for sessionAttributes in
  esys-create-session-auth.int.c.
- Supress "division by zero" warnings for HASH_ADD_INT in uthash.h,
  reported upstream as troydhanson/uthash#166.

Signed-off-by: Jonas Witschel <[email protected]>
@diabonas
Copy link
Author

The code is definitely not time-critical (it happens only during reallocation). However, uthash currently doesn't use any asserts, so adding one would mean adding #include <assert.h> and conceivably breaking some codebase where not-using-<assert.h> was considered a feature. (OTOH, <utlist.h> does use assert already. But the userbases of uthash and utlist don't seem to overlap much, from what I've seen.)

That makes sense. Without using assertions, I think a simple if like @tstruk suggested seems to be the best solution.

     (tbl)->ideal_chain_maxlen =                                                  \
-       ((tbl)->num_items >> ((tbl)->log2_num_buckets+1U)) +                      \
-       ((((tbl)->num_items & (((tbl)->num_buckets*2U)-1U)) != 0U) ? 1U : 0U);    \
+        ((tbl)->num_items + 2*(tbl)->num_buckets - 1U)                           \
+            >> ((tbl)->log2_num_buckets + 1U)                                    \
     (tbl)->nonideal_items = 0;                                                   \

Would that help the static analyzer at all?

Unfortunately the analyser seems to be pretty dumb when trying to deduce whether ideal_chain_maxlen can be zero: even something as easy as

tbl->ideal_chain_maxlen = (tbl->num_items >> 1U) + (((tbl->num_items & 1U) != 0U) ? 1U : 0U);

is not recognised to be nonzero for every tbl->num_items>0. So I think it's going to be difficult to find an expression that Clang is going to be content with.

@Quuxplusone
Copy link
Collaborator

Actually, now that I look at this code...

        if (++(_he_newbkt->count) > (tbl)->ideal_chain_maxlen) {                 \
          (tbl)->nonideal_items++;                                               \
          _he_newbkt->expand_mult = _he_newbkt->count / (tbl)->ideal_chain_maxlen; \
        }

...surely we could just eliminate the division completely by doing something like this: master...Quuxplusone:assert
Would that satisfy Clang static analysis?

@diabonas
Copy link
Author

@Quuxplusone Great, that does not produce any warnings with scan-build anymore!

Quuxplusone added a commit to Quuxplusone/uthash that referenced this issue Dec 14, 2018
tstruk added a commit to tpm2-software/tpm2-tss that referenced this issue Jan 7, 2019
The 2.1.0 version of uthash contains a fix for HASH_ADD_INT
macro issue, which trigged a clang static analysis warning.
troydhanson/uthash#166

Signed-off-by: Tadeusz Struk <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants