segfault running notcurses-info on alpine drone builder #2828
we also get a warning regarding ffmpeg on alpine: [ 26%] Building C object CMakeFiles/notcurses-static.dir/src/media/ffmpeg.c.o
/home/dank/git/src/media/ffmpeg.c: In function 'ffmpeg_pkt_duration':
/home/dank/git/src/media/ffmpeg.c:45:7: warning: 'pkt_duration' is deprecated [-Wdeprecated-declarations]
45 | return frame->pkt_duration;
| ^~~~~~
In file included from /home/dank/git/src/media/ffmpeg.c:4:
/usr/include/libavutil/frame.h:700:13: note: declared here
700 | int64_t pkt_duration;
| ^~~~~~~~~~~~ |
that warning suggests a pretty old version of libav (<59). is it possible that our runner is simply out of date?
uint64_t ffmpeg_pkt_duration(const AVFrame* frame){
#if LIBAVUTIL_VERSION_MAJOR < 59
  return frame->pkt_duration;
#else
  return frame->duration;
#endif
}
btw apparently we ought be using |
the versions installed look reasonable...
ffmpeg-dev-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed]
ffmpeg-libavcodec-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed]
ffmpeg-libavdevice-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed]
ffmpeg-libavfilter-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed]
ffmpeg-libavformat-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed]
ffmpeg-libavutil-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed]
ffmpeg-libpostproc-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed]
ffmpeg-libswresample-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed]
ffmpeg-libswscale-6.1.2-r1 x86_64 {ffmpeg} (GPL-2.0-or-later AND LGPL-2.1-or-later) [installed] |
doh i was looking at the 2.7 ffmpeg docs, idiot |
/usr/include/libavutil/version.h:#define LIBAVUTIL_VERSION_MAJOR 58
/usr/include/libavutil/version.h:#define LIBAVUTIL_VERSION_MINOR 29
/usr/include/libavutil/version.h:#define LIBAVUTIL_VERSION_MICRO 100
a surprisingly weird libavutil version in the Alpine headers! |
meanwhile, on my /usr/local/include/libavutil/version.h:#define LIBAVUTIL_VERSION_MAJOR 59
/usr/local/include/libavutil/version.h:#define LIBAVUTIL_VERSION_MINOR 53
/usr/local/include/libavutil/version.h:#define LIBAVUTIL_VERSION_MICRO 100 |
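The two header dumps above differ in major version (58.29.100 on Alpine, 59.53.100 locally). A minimal sketch of comparing such versions as one packed integer, rather than by major number alone, follows. `VER_INT` is a local stand-in that packs fields the way libavutil's `AV_VERSION_INT` does, and `version_at_least` is a hypothetical helper; the threshold a real gate should use is not stated in this thread and is left as a parameter.

```c
#include <assert.h>

// Local stand-in: packs major/minor/micro into one comparable integer,
// mirroring the layout of libavutil's AV_VERSION_INT (assumption: 8 bits
// each for minor and micro is enough for the values being compared).
#define VER_INT(a, b, c) (((a) << 16) | ((b) << 8) | (c))

// Hypothetical predicate: is version maj.min.mic >= want_maj.want_min.want_mic?
int version_at_least(int maj, int min, int mic,
                     int want_maj, int want_min, int want_mic){
  return VER_INT(maj, min, mic) >= VER_INT(want_maj, want_min, want_mic);
}
```

With the values quoted above, `version_at_least(58, 29, 100, 59, 0, 0)` is false and `version_at_least(59, 53, 100, 59, 0, 0)` is true, which matches the `#if LIBAVUTIL_VERSION_MAJOR < 59` branch being taken on Alpine.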
looks like we can just relax the version selector;
/home/dank/git/build # ./notcurses-info
notcurses 3.0.11 on Kitty 0.38.1 (Linux 6.12.1nlb2)
74 rows (19px) 132 cols (10px) 1406x1320 rgb+8 colors
gcc-14.2.0 (LE)
terminfo 6.5.20241006 libdeflate 1.23 GPM n/a
af+ ab+ sum- vpa+ hpa+ sgr0+ op+ fgop+ bgop+ bce+ rect- 0.3.100
af+ ab+ sum- vpa+ hpa+ sgr0+ op+ fgop+ bgop+ bce+ rect-
bold+ ital+ struck+ ucurl+ uline+ u7+ ccc- rgb+ el+
utf8+ 2x1+ 2x2+ 3x2+ 4x2- 4x2+ img+ vid+ indn+ gpm- kbd+
default fg 0xffffff default bg 0x080808 pmouse+
1st gen rgba pixel animation support 🯁🯂🯃https://notcurses.com
▘▝▀▖▌▞▛▗▚▐▜▄▙▟█⎧ 🬀🬁🬂🬃🬄🬅🬆🬇🬈🬊🬋🬌🬍🬎🬏🬐🬑🬒🬓▌🬔🬕🬖🬗🬘🬙🬚🬛🬜🬝⎫♠♥🯰🯱🯲🯳🯴🯵🯶🯷🯸🯹⅗⅘⅙⅚⅛⎧▕▏⎫┌╥─╥─╥┐🭩⎛⎞
╲╿╱ ◨◧ ◪◩ ◖◗ ⫷⫸ ⎩🬟🬠🬡🬢🬣🬤🬥🬦🬧▐🬨🬩🬪🬫🬬🬭🬮🬯🬰🬱🬲🬳🬴🬵🬶🬷🬸🬹🬺🬻█⎭♦♣¼½¾⅐⅑⅒⅓⅔⅕⅖⅜⅝⅞⅟↉⎪🮇▎⎪├╜╓╫╖╙┤🭫⎜⎟
╾╳╼ ◲◱ ◶◵ 🮣🮠 🮤🮥◜◝ ◿◺ 🮞🮟 ◢◣ ┌┐─ ┏┓━ ╭╮─ ╔╗═ 🭽🭾▁♟♜♞⩘▵△▹▷▿▽◃◁⭡⭣⭠⭢⭧⭩⭦⭨⎪🮈▍⎪├─╨╫╨─┤┇⎜⎟
╱╽╲ ◳◰ ◷◴ 🮡🮢 🮦🮧◟◞ ◹◸ 🮝🮜 ◥◤ └┘│ ┗┛┃ ╰╯│ ╚╝║ 🭼🭿🭵♝♛♚⩗▴⏶⯅▲▸⏵⯈▶▾⏷⯆▼◂⏴⯇◀⎪▐▌⎪╞═╤╬╤═╡┋⎜⎟
⎡⠀⠁⠈⠉⠂⠃⠊⠋⠐⠑⠘⠙⠒⠓⠚⠛⠄⠅⠌⠍⠆⠇⠎⠏⠔⠕⠜⠝⠖⠗⠞⠟⠠⠡⠨⠩⠢⠣⠪⠫⠰⠱⠸⠹⠲⠳⠺⠻⠤⠥⠬⠭⠦⠧⠮⠯⠴⠵⠼⠽⠶⠷⠾⠿⎤⎨🮉▋⎬╞╕╘╬╛╒╡┊⎜⎟
⎢⡀⡁⡈⡉⡂⡃⡊⡋⡐⡑⡘⡙⡒⡓⡚⡛⡄⡅⡌⡍⡆⡇⡎⡏⡔⡕⡜⡝⡖⡗⡞⡟⡠⡡⡨⡩⡢⡣⡪⡫⡰⡱⡸⡹⡲⡳⡺⡻⡤⡥⡬⡭⡦⡧⡮⡯⡴⡵⡼⡽⡶⡷⡾⡿⎥⎪🮊▊⎪└┴─╨─┴┘╏⎝⎠
⎢⢀⢁⢈⢉⢂⢃⢊⢋⢐⢑⢘⢙⢒⢓⢚⢛⢄⢅⢌⢍⢆⢇⢎⢏⢔⢕⢜⢝⢖⢗⢞⢟⢠⢡⢨⢩⢢⢣⢪⢫⢰⢱⢸⢹⢲⢳⢺⢻⢤⢥⢬⢭⢦⢧⢮⢯⢴⢵⢼⢽⢶⢷⢾⢿⎥⎪🮋▉⎪╭──╮⟬⟭╔╗≶≷
⎣⣀⣁⣈⣉⣂⣃⣊⣋⣐⣑⣘⣙⣒⣓⣚⣛⣄⣅⣌⣍⣆⣇⣎⣏⣔⣕⣜⣝⣖⣗⣞⣟⣠⣡⣨⣩⣢⣣⣪⣫⣰⣱⣸⣹⣲⣳⣺⣻⣤⣥⣬⣭⣦⣧⣮⣯⣴⣵⣼⣽⣶⣷⣾⣿⎦⎪██⎪│╭╮│╔═╝║⊆⊇
▔🭶🭷🭸🭹🭺🭻▁ 🭁🭌 🭂🭍 🭃🭎 🭄🭏 🭅🭐 🭆🭑 🭇🬼 🭈🬽 🭉🬾 🭊🬿 🭋🭀 ₀₁₂₃₄₅₆₇₈₉ ⎛ ▁▂▃▄▅▆▇█🭫⎞⎪🭨🭪⎪╰╯││║╔═╝⊴⊵
▏🭰🭱🭲🭳🭴🭵▕ 🭒🭝 🭓🭞 🭔🭟 🭕🭠 🭖🭡 🭧🭜 🭢🭗 🭣🭘 🭤🭙 🭥🭚 🭦🭛 ⁰¹²³⁴⁵⁶⁷⁸⁹ ⎝ ▔🮂🮃▀🮄🮅🮆█🭩⎠⎩🭪🭨⎭⧒⧑╰╯╚╝❨❩⟃⟄
Segmentation fault (core dumped)
/home/dank/git/build # |
note we're reproducing here with Kitty, suggesting this is not term-dependent |
looks like all binaries are segfaulting on alpine, ugh |
Thread 2 "notcurses-info" received signal SIG33, Real-time event 33.
[Switching to LWP 2421]
__cp_end () at src/thread/x86_64/syscall_cp.s:29
warning: 29 src/thread/x86_64/syscall_cp.s: No such file or directory
(gdb) bt
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00007efff68e3cb5 in __syscall_cp_c (nr=271, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00007efff68dbaf7 in ppoll (fds=fds@entry=0x7effed5ff520, n=n@entry=1, to=to@entry=0x0, mask=mask@entry=0x7effed5ff530)
at src/select/ppoll.c:24
#3 0x00007efff68170b1 in block_on_input (ictx=<optimized out>, rtfd=<synthetic pointer>, rifd=<synthetic pointer>)
at /home/dank/git/src/lib/in.c:2551
#4 read_inputs_nblock (ictx=<optimized out>) at /home/dank/git/src/lib/in.c:2589
#5 input_thread (vmarshall=0x7effed871ab0) at /home/dank/git/src/lib/in.c:2620
#6 0x00007efff68e49d2 in start (p=0x7effed5ff620) at src/thread/pthread_create.c:207
#7 0x00007efff68e6314 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
(gdb) |
same in Thread 2 "zalgo" received signal SIG33, Real-time event 33.
[Switching to LWP 2435]
__cp_end () at src/thread/x86_64/syscall_cp.s:29
warning: 29 src/thread/x86_64/syscall_cp.s: No such file or directory
(gdb) bt
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00007f935e52ecb5 in __syscall_cp_c (nr=271, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00007f935e526af7 in ppoll (fds=fds@entry=0x7f935e1bda00, n=n@entry=1, to=to@entry=0x0, mask=mask@entry=0x7f935e1bda10)
at src/select/ppoll.c:24
#3 0x00007f935e4690b1 in block_on_input (ictx=<optimized out>, rtfd=<synthetic pointer>, rifd=<synthetic pointer>)
at /home/dank/git/src/lib/in.c:2551
#4 read_inputs_nblock (ictx=<optimized out>) at /home/dank/git/src/lib/in.c:2589
#5 input_thread (vmarshall=0x7f935e1fab30) at /home/dank/git/src/lib/in.c:2620
#6 0x00007f935e52f9d2 in start (p=0x7f935e1bdb00) at src/thread/pthread_create.c:207
#7 0x00007f935e531314 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
(gdb) |
i wouldn't be shocked if this is due to Musl's implementation of so we're getting hit with Thread 2 "zalgo" received signal SIG33, Real-time event 33.
[Switching to LWP 2455]
__cp_end () at src/thread/x86_64/syscall_cp.s:29
warning: 29 src/thread/x86_64/syscall_cp.s: No such file or directory
(gdb) thread apply all bt
Thread 2 (LWP 2455 "zalgo"):
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00007efd465c8cb5 in __syscall_cp_c (nr=271, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00007efd465c0af7 in ppoll (fds=fds@entry=0x7efd46257a00, n=n@entry=1, to=to@entry=0x0, mask=mask@entry=0x7efd46257a10) at src/select/ppoll.c:24
#3 0x00007efd465030b1 in block_on_input (ictx=<optimized out>, rtfd=<synthetic pointer>, rifd=<synthetic pointer>) at /home/dank/git/src/lib/in.c:2551
#4 read_inputs_nblock (ictx=<optimized out>) at /home/dank/git/src/lib/in.c:2589
#5 input_thread (vmarshall=0x7efd46294b30) at /home/dank/git/src/lib/in.c:2620
#6 0x00007efd465c99d2 in start (p=0x7efd46257b00) at src/thread/pthread_create.c:207
#7 0x00007efd465cb314 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
Thread 1 (LWP 2454 "zalgo"):
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00007efd465c8cb5 in __syscall_cp_c (nr=202, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00007efd465c8283 in __futex4_cp (addr=0x7efd46257b70, op=128, val=2, to=<optimized out>) at src/thread/__timedwait.c:24
#3 __timedwait_cp (addr=addr@entry=0x7efd46257b70, val=2, clk=clk@entry=0, at=at@entry=0x0, priv=128, priv@entry=1) at src/thread/__timedwait.c:52
#4 0x00007efd465c9cfd in __pthread_timedjoin_np (t=t@entry=0x7efd46257b38, res=res@entry=0x0, at=at@entry=0x0) at src/thread/pthread_join.c:18
#5 0x00007efd465c9d6b in __pthread_join (t=t@entry=0x7efd46257b38, res=res@entry=0x0) at src/thread/pthread_join.c:30
#6 0x00007efd4650527d in cancel_and_join (name=0x7efd46547473 "input", res=0x0, tid=0x7efd46257b38) at /home/dank/git/src/lib/internal.h:1840
#7 stop_inputlayer (ti=ti@entry=0x7efd46299320) at /home/dank/git/src/lib/in.c:2651
#8 0x00007efd46536a7e in free_terminfo_cache (ti=ti@entry=0x7efd46299320) at /home/dank/git/src/lib/termdesc.c:198
#9 0x00007efd46514f75 in notcurses_stop (nc=nc@entry=0x7efd46299020) at /home/dank/git/src/lib/notcurses.c:1466
#10 0x00005652382cf245 in main () at /home/dank/git/src/poc/zalgo.c:29
(gdb) cont
Continuing.
Thread 2 "zalgo" received signal SIGSEGV, Segmentation fault.
__cp_end () at src/thread/x86_64/syscall_cp.s:29
29 in src/thread/x86_64/syscall_cp.s
(gdb) thread apply all bt
Thread 2 (LWP 2455 "zalgo"):
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00007efd465c8cb5 in __syscall_cp_c (nr=271, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00007efd465c0af7 in ppoll (fds=fds@entry=0x7efd46257a00, n=n@entry=1, to=to@entry=0x0, mask=mask@entry=0x7efd46257a10) at src/select/ppoll.c:24
#3 0x00007efd465030b1 in block_on_input (ictx=<optimized out>, rtfd=<synthetic pointer>, rifd=<synthetic pointer>) at /home/dank/git/src/lib/in.c:2551
#4 read_inputs_nblock (ictx=<optimized out>) at /home/dank/git/src/lib/in.c:2589
#5 input_thread (vmarshall=0x7efd46294b30) at /home/dank/git/src/lib/in.c:2620
#6 0x00007efd465c99d2 in start (p=0x7efd46257b00) at src/thread/pthread_create.c:207
#7 0x00007efd465cb314 in __clone () at src/thread/x86_64/clone.s:22
Backtrace stopped: frame did not save the PC
Thread 1 (LWP 2454 "zalgo"):
#0 __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1 0x00007efd465c8cb5 in __syscall_cp_c (nr=202, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2 0x00007efd465c8283 in __futex4_cp (addr=0x7efd46257b70, op=128, val=2, to=<optimized out>) at src/thread/__timedwait.c:24
#3 __timedwait_cp (addr=addr@entry=0x7efd46257b70, val=2, clk=clk@entry=0, at=at@entry=0x0, priv=128, priv@entry=1) at src/thread/__timedwait.c:52
#4 0x00007efd465c9cfd in __pthread_timedjoin_np (t=t@entry=0x7efd46257b38, res=res@entry=0x0, at=at@entry=0x0) at src/thread/pthread_join.c:18
#5 0x00007efd465c9d6b in __pthread_join (t=t@entry=0x7efd46257b38, res=res@entry=0x0) at src/thread/pthread_join.c:30
#6 0x00007efd4650527d in cancel_and_join (name=0x7efd46547473 "input", res=0x0, tid=0x7efd46257b38) at /home/dank/git/src/lib/internal.h:1840
#7 stop_inputlayer (ti=ti@entry=0x7efd46299320) at /home/dank/git/src/lib/in.c:2651
#8 0x00007efd46536a7e in free_terminfo_cache (ti=ti@entry=0x7efd46299320) at /home/dank/git/src/lib/termdesc.c:198
#9 0x00007efd46514f75 in notcurses_stop (nc=nc@entry=0x7efd46299020) at /home/dank/git/src/lib/notcurses.c:1466
#10 0x00005652382cf245 in main () at /home/dank/git/src/poc/zalgo.c:29
(gdb) |
like close, pthread_join is a resource-deallocation function which is also a cancellation point. the intent of masked cancellation mode is to exempt such functions from failure with ECANCELED.
long __cancel()
{
    pthread_t self = __pthread_self();
    if (self->canceldisable == PTHREAD_CANCEL_ENABLE || self->cancelasync)
        pthread_exit(PTHREAD_CANCELED);
    self->canceldisable = PTHREAD_CANCEL_DISABLE;
    return -ECANCELED;
}
long __syscall_cp_asm(volatile void *, syscall_arg_t,
                      syscall_arg_t, syscall_arg_t, syscall_arg_t,
                      syscall_arg_t, syscall_arg_t, syscall_arg_t);
long __syscall_cp_c(syscall_arg_t nr,
                    syscall_arg_t u, syscall_arg_t v, syscall_arg_t w,
                    syscall_arg_t x, syscall_arg_t y, syscall_arg_t z)
{
    pthread_t self;
    long r;
    int st;
    if ((st=(self=__pthread_self())->canceldisable)
        && (st==PTHREAD_CANCEL_DISABLE || nr==SYS_close))
        return __syscall(nr, u, v, w, x, y, z);
    r = __syscall_cp_asm(&self->cancel, nr, u, v, w, x, y, z);
    if (r==-EINTR && nr!=SYS_close && self->cancel &&
        self->canceldisable != PTHREAD_CANCEL_DISABLE)
        r = __cancel();
    return r;
}
__syscall_cp_asm:
__cp_begin:
    mov (%rdi),%eax
    test %eax,%eax
    jnz __cp_cancel
    mov %rdi,%r11
    mov %rsi,%rax
    mov %rdx,%rdi
    mov %rcx,%rsi
    mov %r8,%rdx
    mov %r9,%r10
    mov 8(%rsp),%r8
    mov 16(%rsp),%r9
    mov %r11,8(%rsp)
    syscall
__cp_end:
    ret
__cp_cancel:
    jmp __cancel
~ |
hrmmm, how are we getting from |
Azure/azure-sphere-gallery#131 hrrrrrmrmm |
I'm guessing you just have a stack overflow. The segfault at the syscall exit is what I'd expect if the signal delivery event (SIGCANCEL) didn't have space to allocate a stack frame. |
verified that |
hey, thanks for taking a look @richfelker ! btw i figured out the |
and stack overflow sounds very much like a musl vs glibc difference, which is what we're seeing here -- let's proceed under the assumption @richfelker is correct (generally a safe assumption). what big allocations are we making on the input thread? |
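If thread stack size were the culprit, one portable way to take the libc default out of the picture would be to request the stack size explicitly when creating the thread. The sketch below is only an illustration of that idea, not what notcurses does; `spawn_with_stack` and `noop` are hypothetical names, and the 1 MiB figure in the usage note is arbitrary.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Trivial thread body; a real input thread would do its work here.
static void* noop(void* arg){
  return arg;
}

// Hypothetical helper: create and join a thread whose stack size is
// requested explicitly instead of inherited from the libc default.
// Returns 0 on success, -1 on any failure.
int spawn_with_stack(size_t bytes){
  pthread_attr_t attr;
  if(pthread_attr_init(&attr)){
    return -1;
  }
  if(pthread_attr_setstacksize(&attr, bytes)){ // EINVAL if below PTHREAD_STACK_MIN
    pthread_attr_destroy(&attr);
    return -1;
  }
  size_t got = 0;
  pthread_attr_getstacksize(&attr, &got);
  assert(got == bytes); // the attribute object records what we asked for
  pthread_t tid;
  int r = pthread_create(&tid, &attr, noop, NULL);
  pthread_attr_destroy(&attr);
  if(r){
    return -1;
  }
  return pthread_join(tid, NULL) ? -1 : 0;
}
```

A call such as `spawn_with_stack(1024 * 1024)` gives the thread the same stack budget under musl and glibc, which makes a "works on one libc only" overflow easier to rule in or out.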
oooh i notice we're calling |
commenting out |
// the alternate signal stack is a thread property; any other threads we
// create ought go ahead and install the same alternate signal stack.
void setup_alt_sig_stack(void){
  pthread_mutex_lock(&lock);
  if(alt_signal_stack.ss_sp){
    sigaltstack(&alt_signal_stack, NULL);
  }
  pthread_mutex_unlock(&lock);
}
this smells like a questionable assumption... |
possibly relevant: https://www.openwall.com/lists/musl/2019/03/05/2 |
fwiw i do not see any big allocations on this stack. so that pretty much cements this as altstack-related bad magics imho. proceeding under that assumption. |
@richfelker you might be interested in this alternative (pun unintended) path to the error we're hitting. investigating... |
note that this is happening after
reset_term_palette:82:restoring palette via xtpopcolors
notcurses_stop_minimal:174:restored terminal, returning 0
notcurses_drop_planes:1432:we have some planes
ncpile_drop:1413:killing plane 0x7f6c85007e40, next is 0
notcurses_drop_planes:1440:all planes dropped
stop_inputlayer:2650:tearing down input thread
Segmentation fault (core dumped) |
hrmm this is happening after we've called |
if i comment out the |
i can leave the |
so we hit this immediately after seeing the SIG33 used by musl (similarly to glibc) for thread stuff. and we die coming out of the the but everything turns on that musl definitely doesn't seem to be stashing let me do an instruction-level step following receipt of signal 33... |
hrmm oh shit! alternate stack is a per-thread deal, but we're only disabling it in the thread that calls definitely our bug. best way to handle this....? obviously a distinct alternate signal stack per thread would resolve it, but then we'd want each thread to clean up, which would be a pain. we don't want the fatal signal handlers running once session destruction has begun, though. and there's still the question of why thread 2 is handling signal 33 on its alternate signal stack at all, when |
note that we kill the so you could make the argument that we oughtn't be i feel |
The effect of sigaltstack is thread local. If you don't want to have an alt stack for a particular thread, don't call it there. If you do register it there, you can't free the memory until you unregister it. I suspect you have memory corruption, possibly exploitable, on glibc due to UAF. On musl mallocng the memory was unmapped causing the fault. As for why cancellation happens on the alt stack if installed, https://git.musl-libc.org/cgit/musl/commit?id=2e5fff43dd7fc808197744c67cca7908ac19bb4f |
no, did you read the last comment? |
#2828 (comment) <--- i'm mailing the list now, it's definitely figured out. you might not want to apply my fix, though, which is not unreasonable. |
See the linked commit message. POSIX specifically says "available to the implementation". |
fair enough. does it cost musl anything to drop the |
either way, i'll work around it here. thanks for your helpful comments through the course of this debugging session! |
hrmm, can't tell whether https://www.openwall.com/lists/ stopped listing mail after 2024-12-31 or no one has sent mail since then or my mail was silently rejected; waiting on subscription confirmation mail for ten minutes now...ugh =] |
Again, see the linked commit. This was a change specifically requested for a long time by Go and possibly other language projects, where the main stack may be very small because of coroutine-type stuff, and where they rightly feel entitled to use small stacks because they don't intend for any signals to be handled. Having the implementation-internal signals run on the alt stack makes it so they don't crash with stack overflow from what's an implementation-detail of how cancellation and MT setuid etc have to be implemented on Linux. |
so i'm thinking the easiest way to fix this is to disable the fatal signal handlers in |
got it, thanks for the explanation! i'd missed the commit link; i thought you were linking to POSIX for some reason. well, i can work around this easily enough, but IMHO it warrants a mention on https://wiki.musl-libc.org/functional-differences-from-glibc.html. |
...and the alpine runner is now green, excellent |
Looks good. FWIW I think glibc also runs the cancellation signal handler on the alt stack, and only didn't crash because the memory was still mapped. |
i actually don't see a signal when running in gdb on glibc, and was wondering if they had changed the implementation since the early days of nptl. cancellation being deferred to explicit points seems to imply a signal-based implementation isn't strictly necessary. |
|
Signal is needed to handle the case where the cancellation request arrives after the flag is checked but before the cancellable syscall happens or while the cancellable syscall is blocked. A clever implementation can probably avoid generating signals except when the target thread is in this state or racing to possibly be in this state; maybe glibc is doing that. |
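The "blocked in a cancellable syscall" case can be reproduced in a few lines. This is a sketch under assumptions: it uses `poll` where the backtraces show `ppoll`, and `wait_for_input` and `cancel_blocked_reader` are hypothetical names loosely modeled on the cancel-then-join sequence visible in the backtraces.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

// Blocks forever in poll(), which is a cancellation point, on a pipe
// that never becomes readable.
static void* wait_for_input(void* arg){
  struct pollfd pfd = { .fd = *(int*)arg, .events = POLLIN };
  for(;;){
    poll(&pfd, 1, -1);
  }
  return NULL; // not reached; the thread only ends via cancellation
}

// Hypothetical driver: cancel the blocked thread and join it.
// Returns 0 if the thread terminated via cancellation, -1 otherwise.
int cancel_blocked_reader(void){
  int fds[2];
  if(pipe(fds)){
    return -1;
  }
  pthread_t tid;
  if(pthread_create(&tid, NULL, wait_for_input, &fds[0])){
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  void* res = NULL;
  int ret = -1;
  // The request may arrive before, during, or after entry into poll();
  // the libc has to get the thread out of the syscall in every case.
  if(pthread_cancel(tid) == 0 && pthread_join(tid, &res) == 0){
    ret = (res == PTHREAD_CANCELED) ? 0 : -1;
  }
  close(fds[0]);
  close(fds[1]);
  return ret;
}
```

Nothing ever writes to the pipe, so the only way `pthread_join` can return is for the cancellation request to interrupt or preempt the blocking call, which is the situation the comment above describes.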
exactly my thoughts |
i will look into it in thirty minutes or so for completion; in the middle of a command performance for @jart at the moment |
@richfelker I think Go needs this for We haven't seen this issue with glibc because it still does not use |
Ah, that makes sense. |
@fweimer-rh thanks for the explanation, always a pleasure to see you!
ought this be read to imply that glibc intends to use |
See e.g. https://drone.dsscaw.com:4443/dankamongmen/notcurses/10721/4/2. I've brought all our autobuilders back up to speed, but we're not successfully running tests on alpine edge. We're doing fine on the other runners.