Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix up from a bunch of ubsan issues found. #16074

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

donaldsharp
Copy link
Member

@donaldsharp donaldsharp commented May 23, 2024

See individual commits

Fixes: #16755

return ret;
if (module_iter->compiled->rpcs) {
LY_LIST_FOR (&module_iter->compiled->rpcs->node, snode) {
ret = yang_snodes_iterate_subtree(snode, module,
Copy link
Contributor

@choppsv1 choppsv1 May 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which error is returned for this? Basically this idiom is the C version of inherited types. That is, &rpcs->node where node is an embedded structure (struct lysc_node) at the start of struct lysc_node action. This allows the elimination of unsafe casting (struct lysc_node *)rpc and instead using type-safe checking that an struct lysc_action_node * is also a struct lysc_node * so typeof(&rpc->node) == struct lysc_node *) so &rpc->node requires no cast when trying to use as an struct lysc_node *.

In any case what &foo->bar translates to is the following:

struct foo *foo = 0;
char *result;

result = (char *)foo; /* so result is 0 */
result += offsetof(struct foo, bar);

So there is no NULL ptr dereference.

In the concrete example above when rpcs is NULL, &rpcs->node is also NULL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the commit message clearly spelled out the message. The sanitizer is claiming that module_iter->compiled->rpcs is NULL thus derefencing ->node is going to cause fun.

bgpd/bgp_attr.c Outdated
unsigned char ndata[peer->max_packet_size];
uint32_t max_packet_size =
atomic_load_explicit(&peer->max_packet_size,
memory_order_relaxed);
Copy link
Contributor

@choppsv1 choppsv1 May 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this may mess with the data cache and so could be rather expensive as a result. Can we instead simply guarantee that peer's are allocated with the required alignment to guarantee ((uint32_t)&peer->max_packet_size) % 4) == 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm max_packet_size is actually a uin16_t. This is probably a threading error I guess, and not an alignment one. Can you try changing max_packet_size to int rather than using the atomic function and see if that fixes the error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fail to see how not using a atomic value is going to prevent two different threads from reading/writing to the same data at the same time. In any event I'm just removing this one

Copy link
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good ... waiting on comments from @choppsv1

Copy link
Member

@ton31337 ton31337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to fix styling issues?

@MPFuzz
Copy link

MPFuzz commented Sep 7, 2024

Hoping this pr is being merged soon.

@MPFuzz
Copy link

MPFuzz commented Sep 13, 2024

Hi, when will this pr be merged? Thanks

@riw777
Copy link
Member

riw777 commented Sep 24, 2024

looks like there are lint errors and some comments to address yet?

@riw777
Copy link
Member

riw777 commented Oct 22, 2024

anyone still working on this?

Sanitizers are finding:

isisd/isis_spf.c:2122:22: runtime error: index 2 out of bounds for type '_uint64_t [2]'

Comparing the pattern against the rest of the code, 1 should be
subtracted.

Signed-off-by: Donald Sharp <[email protected]>
In fact there are more than several places
that we do in fact pass in NULL to get a NULL
returned.

Signed-off-by: Donald Sharp <[email protected]>
The call into zlog_target_clone was passing
&zcf->active->zt.  From running memory sanitizer
we are seeing that it is complaining about
member access within null pointer of type...
for this value.  Since the call into zlog_target_clone
checks for NULL for this value, let's just make sure
we don't do anything stupid here.

Signed-off-by: Donald Sharp <[email protected]>
lib/yang.c:248:3: runtime error: member access within null pointer of type 'struct lysc_node_action'
lib/yang.c:254:3: runtime error: member access within null pointer of type 'struct lysc_node_notif'

If this data structure happens to ever be moved around we'll start
crashing.  Let's just fix it.

Signed-off-by: Donald Sharp <[email protected]>
Running the mgmt_oper tests under heavy load
would occassionally cause the test to fail.
The retry mechanism would kick in and the
test would have succeeded.  Let's extend
the timer.

Signed-off-by: Donald Sharp <[email protected]>
Getting these messages:
./bgp_lu_topo2.test_bgp_lu2/R1/bgpd.err:bgpd/bgp_labelpool.c:310:3: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
./bgp_lu_topo2.test_bgp_lu2/R4/bgpd.err:bgpd/bgp_labelpool.c:310:3: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
./bgp_lu_topo2.test_bgp_lu2/R4/bgpd.err:bgpd/bgp_labelpool.c:497:5: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
./bgp_lu_topo2.test_bgp_lu2/R4/bgpd.err:bgpd/bgp_labelpool.c:499:5: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

Signed-off-by: Donald Sharp <[email protected]>
Found this problem:
./isis_te_topo1.test_isis_te_topo1/r1/isisd.err:lib/link_state.c:330:8: runtime error: left shift of 1073741824 by 1 places cannot be represented in type 'int'

Clean it up.

Signed-off-by: Donald Sharp <[email protected]>
Found this problem:
./eigrp_topo1.test_eigrp_topo1/r3/eigrpd.err:eigrpd/eigrp_packet.c:1113:35: runtime error: left shift of 192 by 24 places cannot be represented in type 'int'

Fix it.

Signed-off-by: Donald Sharp <[email protected]>
Error message:
./multicast_pim_dr_nondr_test.test_pim_dr_nondr_with_ospf_topo2/r5/zebra.err:zebra/rt_netlink.c:1142:15: runtime error: load of misaligned address 0x7bb40000a064 for type 'long long unsigned int', which requires 8 byte alignment

Fix it.

Signed-off-by: Donald Sharp <[email protected]>
The call to set thread is being used and set at the
same time in various pthreads in the code.  This
should not be happening.  Let's fix it.

Signed-off-by: Donald Sharp <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BGPd: member access within null pointer of type 'struct lysc_node_action'
5 participants