
Debug assert "filter->addr != 0" trips in trunk_flush_into_bundle() -> trunk_inc_filter(): test_issue_458_mini_destroy_unused_debug_assert test case #570

Open
gapisback opened this issue Apr 17, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@gapisback
Collaborator

gapisback commented Apr 17, 2023

The test case splinterdb_stress_test.c:test_issue_458_mini_destroy_unused_debug_assert is currently commented out.

It was added as part of commit SHA f3c92ef to fix issue #545 (under PR #561). The test case is a simple workload of a single client loading 100M short k/v pairs.
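
For reference, the workload shape is roughly the following -- a minimal sketch against splinterdb's public API (config values, key/value sizes, and field names here are illustrative assumptions, not the actual test code):

// Minimal sketch of the stress-test workload: a single client loading many
// short k/v pairs. NOT the actual test code; sizes and names are assumptions.
#include <stdint.h>
#include <stdio.h>

#include "splinterdb/default_data_config.h"
#include "splinterdb/splinterdb.h"

#define DB_FILE "/tmp/issue-570-repro.db"
#define NUM_KVS (100UL * 1000 * 1000) // ~100M short k/v pairs, as in the test

int
main(void)
{
   data_config data_cfg;
   default_data_config_init(24, &data_cfg); // small max-key-size (assumed)

   splinterdb_config cfg = {
      .filename   = DB_FILE,
      .cache_size = 1024UL * 1024 * 1024,      // 1 GiB cache (illustrative)
      .disk_size  = 30UL * 1024 * 1024 * 1024, // 30 GiB device (illustrative)
      .data_cfg   = &data_cfg,
   };

   splinterdb *kvsb = NULL;
   if (splinterdb_create(&cfg, &kvsb) != 0) {
      return 1;
   }

   char keybuf[24];
   char valbuf[24];
   for (uint64_t i = 0; i < NUM_KVS; i++) {
      int klen = snprintf(keybuf, sizeof(keybuf), "key-%lu", (unsigned long)i);
      int vlen = snprintf(valbuf, sizeof(valbuf), "val-%lu", (unsigned long)i);
      splinterdb_insert(kvsb,
                        slice_create(klen, keybuf),
                        slice_create(vlen, valbuf));
   }

   splinterdb_close(&kvsb);
   return 0;
}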

A repro has been provided on this branch: agurajada/570-filter-addr-ne-0-assert

When enabled, that test runs into the following assertion. Repro'ed on main @ SHA b2245ac:

#2  __GI___pthread_kill (threadid=140737320392256, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7cfb476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7ce17f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7f665ff in platform_assert_false (filename=0x7ffff7fa477b "src/trunk.c", linenumber=3543,
    functionname=0x7ffff7fa8ef0 <__FUNCTION__.66> "trunk_inc_filter", expr=0x7ffff7fa4d95 "filter->addr != 0",
    message=0x7ffff7fa3fca "") at src/platform_linux/platform.c:377
#6  0x00007ffff7f7f6f7 in trunk_inc_filter (spl=0x7fffb5f8c040, filter=0x7fffcf5902d2) at src/trunk.c:3543
#7  0x00007ffff7f821a5 in trunk_flush_into_bundle (spl=0x7fffb5f8c040, parent=0x7ffff5fce7c0,
    child=0x7ffff5fce620, pdata=0x7fffd36fa58c, req=0x7fffa434f3c0) at src/trunk.c:4102
#8  0x00007ffff7f828e9 in trunk_flush (spl=0x7fffb5f8c040, parent=0x7ffff5fce7c0, pdata=0x7fffd36fa58c,
    is_space_rec=0) at src/trunk.c:4214
#9  0x00007ffff7f82e90 in trunk_flush_fullest (spl=0x7fffb5f8c040, node=0x7ffff5fce7c0) at src/trunk.c:4295
#10 0x00007ffff7f83e5b in trunk_compact_bundle (arg=0x5555558f0c80, scratch_buf=0x7ffff5fd2040)
    at src/trunk.c:4642
#11 0x00007ffff7f72950 in task_group_run_task (group=0x555555575980, assigned_task=0x5555558eec40)
    at src/task.c:475
#12 0x00007ffff7f72ac9 in task_worker_thread (arg=0x555555575980) at src/task.c:514
#13 0x00007ffff7f7209d in task_invoke_with_hooks (func_and_args=0x5555555772c0) at src/task.c:221
#14 0x00007ffff7d4db43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007ffff7ddfa00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
#6  0x00007ffff7f7f6f7 in trunk_inc_filter (spl=0x7fffb5f8c040, filter=0x7fffcf5902d2) at src/trunk.c:3543
3543	   debug_assert(filter->addr != 0);

A possibly related issue, which should be investigated as part of this item, is that while the workload is running we see these messages:

Inserted 52 million KV-pairs, this batch: 5 s, 200000 rows/s, cumulative: 311 s, 167202 rows/s ...
Inserted 53 million KV-pairs, this batch: 6 s, 166666 rows/s, cumulative: 318 s, 166666 rows/s ...btree_pack(): req->num_tuples=6291456 exceeded output size limit, req->max_tuples=6291456
btree_pack failed: No space left on device
btree_pack(): req->num_tuples=6291456 exceeded output size limit, req->max_tuples=6291456
btree_pack failed: No space left on device

Inserted 54 million KV-pairs, this batch: 5 s, 200000 rows/s, cumulative: 323 s, 167182 rows/s ...

These messages also appear with the release binary, but there the test case seems to succeed. (Of course, it is a debug assert that is tripping, so it only fires in debug builds.)
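
For what it's worth, the message text suggests that btree_pack() bails out once the number of packed tuples reaches the caller-supplied req->max_tuples ceiling and surfaces that as an out-of-space condition. A self-contained sketch of that kind of guard follows; the field names are taken from the message above, but the struct and function are stand-ins, not splinterdb's actual implementation:

/* Sketch only: the limit check implied by the btree_pack() message above.
 * req->num_tuples and req->max_tuples are the fields named in the message;
 * everything else is a stand-in, not splinterdb source. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct pack_req_sketch {
   uint64_t num_tuples; // tuples packed so far
   uint64_t max_tuples; // caller-supplied ceiling on output tuples
} pack_req_sketch;

// Returns false once the tuple ceiling is hit, which the caller would then
// report as "btree_pack failed: No space left on device".
static bool
pack_add_tuple_sketch(pack_req_sketch *req)
{
   if (req->num_tuples >= req->max_tuples) {
      fprintf(stderr,
              "btree_pack(): req->num_tuples=%lu exceeded output size limit, "
              "req->max_tuples=%lu\n",
              (unsigned long)req->num_tuples,
              (unsigned long)req->max_tuples);
      return false;
   }
   req->num_tuples++;
   return true;
}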


Historical note: This specific assertion was previously reported and got mixed up in the annals of issue #545 (a bug in routing_filter_prefetch()). That bug has been fixed separately, so I'm peeling this distinct failure off into its own issue.

@chrisxu333

chrisxu333 commented Apr 5, 2024

Hi @gapisback, I also ran into this message while inserting 20M kv-pairs (each 16 bytes large) in a single thread:
btree_pack(): req->num_tuples=6291456 exceeded output size limit, req->max_tuples=6291456 btree_pack failed: No space left on device

Any idea how to address this? Thanks :)

P.S. If I increase the kv-pair size to 128 bytes (8-byte key and 120-byte value), the issue could not be reproduced.

@gapisback
Collaborator Author

Thanks for reporting this @chrisxu333 -- I'm afraid that I don't have much more to add.

In your failing repro situation ("while inserting 20M kv pairs (each 16 bytes large) in a single thread"), can you clarify the sizes of the key and the value?

There was a set of known instabilities around trunk bundle management, and at some point (~12 months ago) these were discussed internally with the Splinter dev engineers. I have since moved on from that project and this repo, so I am not able to provide any meaningful suggestions.

Cc:'ing @rtjohnso, who is the gatekeeper for this repo now and may have been doing some work to stabilize some of these areas.

@chrisxu333

@gapisback Thanks for your kind reply and explanation. Regarding the key and value sizes for the failing scenario, I used an 8-byte key and an 8-byte value.

Moreover, I also opened a new issue about another likely deadlock bug involving O_DIRECT that I encountered (#620). It would be really helpful if you, or whoever is working on this, could take a look at your convenience :) Thank you!

@rtjohnso
Contributor

rtjohnso commented Apr 7, 2024

I believe I've seen the issue with small kv-pairs before. It is due to an estimate of the maximum number of items that might be in a trunk node:

trunk_cfg->max_tuples_per_node = trunk_cfg->max_kv_bytes_per_node / 32;

It assumes kv-pairs are at least 32 bytes.

You could try changing the divisor from 32 to 16, or you could just pad out your kv-pairs to 32 bytes.
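
For illustration, a 16-byte pair (8-byte key + 8-byte value) lets up to max_kv_bytes_per_node / 16 tuples accumulate in a node, twice the max_kv_bytes_per_node / 32 estimate, which lines up with the btree_pack() num_tuples/max_tuples failure above. A minimal sketch of the padding option (the 32-byte threshold comes from the divisor above; the struct layout and helper name are hypothetical, not splinterdb API):

/* Sketch of the padding workaround: make every kv-pair occupy at least
 * 32 bytes so the actual tuple count stays within the
 * max_kv_bytes_per_node / 32 estimate. Sizes and the helper name are
 * illustrative, not part of splinterdb's API. */
#include <stdint.h>
#include <string.h>

#define KEY_SIZE     8
#define MIN_KV_BYTES 32
#define VALUE_SIZE   (MIN_KV_BYTES - KEY_SIZE) // pad 8-byte values to 24 bytes

typedef struct padded_value {
   uint64_t payload;                            // the real 8-byte value
   uint8_t  pad[VALUE_SIZE - sizeof(uint64_t)]; // zero padding
} padded_value;

// Builds a padded value so that key (8B) + value (24B) totals 32 bytes.
static void
make_padded_value(uint64_t payload, padded_value *out)
{
   memset(out, 0, sizeof(*out));
   out->payload = payload;
}

Either approach has the same goal: keep the number of tuples that can land in a trunk node at or below max_tuples_per_node.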

This is a long-term item to fix due to limitations in other parts of the code.
