Assertion fail in macOS (#1609 regression) #1611

Closed
Explorer09 opened this issue Feb 20, 2025 · 18 comments

Comments

@Explorer09
Contributor

Explorer09 commented Feb 20, 2025

Just built the release candidate of htop-3.4.0 (b1474aa) on macOS Sequoia, and this is what I got:

>>>>>>>>>> stderr output >>>>>>>>>>
Assertion failed: (0), function processStateChar, file Process.c, line 533.

<<<<<<<<<< stderr output <<<<<<<<<<
Error information:
------------------
A signal 6 (Abort trap: 6) was received.

Backtrace information:
----------------------
0   htop                                0x0000000102ba7d18 CRT_handleSIGSEGV + 288
1   libsystem_platform.dylib            0x000000018d686de4 _sigtramp + 56
2   libsystem_pthread.dylib             0x000000018d64ff70 pthread_kill + 288
3   libsystem_c.dylib                   0x000000018d55c908 abort + 128
4   libsystem_c.dylib                   0x000000018d55bc1c err + 0
5   htop                                0x0000000102bc3520 Process_done.cold.1 + 0
6   htop                                0x0000000102bb35a4 Process_done + 0
7   htop                                0x0000000102bb46ac Row_display + 72
8   htop                                0x0000000102bb119c Panel_draw + 1148
9   htop                                0x0000000102bb6d10 ScreenManager_run + 824
10  htop                                0x0000000102ba574c CommandLine_run + 1804
11  dyld                                0x000000018d2d0274 start + 2840

It's a regression of #1609.

@Explorer09
Contributor Author

Explorer09 commented Feb 20, 2025

Trying to diagnose this one by myself. The assertion error happens only when the "Hide userland process threads" option is off (i.e. show the threads), and the error happens on htop's own thread. See the screenshot.

There is also something strange with #1609: many processes now show the status "R" when they should show "S" (sleeping). This is visible in the same screenshot.

Image

Cc: @aestriplex

@BenBE BenBE added bug 🐛 Something isn't working MacOS 🍏 MacOS / Darwin related issues labels Feb 20, 2025
@BenBE BenBE added this to the 3.4.0 milestone Feb 20, 2025
@BenBE
Member

BenBE commented Feb 20, 2025

Have a look at the value passed to processStateChar in state.

Also, make sure you do a clean rebuild, so that stale object files don't interfere.

@Explorer09
Copy link
Contributor Author

@BenBE I did rebuild everything. And by reverting the changes of that PR, htop works again (no assertion error), even though many processes' statuses show "?" instead.

@BenBE
Member

BenBE commented Feb 20, 2025

What values do you see for the arguments in ProcessTable_mapDarwinProcessState? Otherwise diagnosing is hard to do …

@Explorer09
Contributor Author

What values do you see for the arguments in ProcessTable_mapDarwinProcessState? Otherwise diagnosing is hard to do …

That thread didn't get through ProcessTable_mapDarwinProcessState at all. The function was never called for that thread.

@aestriplex
Contributor

There's something I am missing. The function ProcessTable_mapDarwinProcessState is called directly in ProcessTable_goThroughEntries, which iterates over every process. Is the htop process itself excluded from that loop?

@Explorer09
Contributor Author

There's something I am missing. The function ProcessTable_mapDarwinProcessState is called directly in ProcessTable_goThroughEntries, which iterates over every process. Is the htop process itself excluded from that loop?

@aestriplex It's hard to answer this one. What I see is two entries for htop process itself, one being the process and one being the thread.

Trying to explain this with a tree view (this screenshot shows the process states without the #1609 patch):

Image

aestriplex added a commit to aestriplex/htop that referenced this issue Feb 21, 2025
@fasterit
Member

@aestriplex The noise on stderr in c808fd8 is a debugging aid for you, right?

@aestriplex
Contributor

@Explorer09 cc @BenBE
I think I have found something. I committed both the fix and the debug lines.

The main problem is that when you let htop show the threads, it shows a thread with a strange PID that does not exist. In the screenshot attached here, for instance, it has PID 111919, which clearly violates the limit returned by Platform_getMaxPid on Darwin. This process does not appear in the list of processes retrieved by the syscall, so you won't see anything on stderr from

if (pid > Platform_getMaxPid()) {
    fprintf(stderr, "PID %d found\n", pid);
}

I don't know whether this process is artificially attached for a reason, but the problem is that it has status code 0. A possible workaround (the one implemented here) is simply to set this->super.state = UNKNOWN; in the process init in DarwinProcess.c.
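To illustrate why a status code of 0 trips the assertion and why initializing the state avoids it, here is a minimal sketch. It is not htop's code: SketchState, SketchProcess, sketch_state_char and sketch_process_init are hypothetical stand-ins, and the sketch assumes that the state enum has no member with value 0 and that a new process struct starts out zero-filled. Where htop asserts, the sketch returns '!' so the behaviour can be checked without aborting.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the process state enum. Assumption: no member
 * has the value 0, so a zero-filled struct holds an unmapped state. */
typedef enum {
   SK_UNKNOWN = 1,
   SK_RUNNING,
   SK_SLEEPING,
   SK_STOPPED,
   SK_ZOMBIE
} SketchState;

/* Hypothetical stand-in for a process table entry. */
typedef struct {
   int pid;
   SketchState state;
} SketchProcess;

/* Maps a state to its display character. Returns '!' for a value that
 * matches no enum member; this is the spot where an assert(0) would fire. */
char sketch_state_char(SketchState state) {
   switch (state) {
      case SK_UNKNOWN:  return '?';
      case SK_RUNNING:  return 'R';
      case SK_SLEEPING: return 'S';
      case SK_STOPPED:  return 'T';
      case SK_ZOMBIE:   return 'Z';
   }
   return '!';
}

/* The workaround: give every new entry a valid state up front, so an entry
 * that the platform scan never updates still displays as '?'. */
void sketch_process_init(SketchProcess* p, int pid) {
   memset(p, 0, sizeof(*p));
   p->pid = pid;
   p->state = SK_UNKNOWN;
}
```

With this, an entry that is only zero-filled maps to the unmapped case, while an entry that went through sketch_process_init maps to '?'.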

Image

@aestriplex
Contributor

@fasterit Yes, I left these lines in as a debugging aid.

@Explorer09
Contributor Author

@fasterit @BenBE @aestriplex Even though the assertion error is gone now, I still see two issues with the process table:

  1. It still shows that thread with the strange ID that is greater than Platform_getMaxPid. Although it's no longer a big issue now, the thread's status now shows "?".
  2. The statuses of many processes show as "R", while they should be "S" as they are not busy running. Compare the screen with macOS's top output to get what I mean.

For the second point, should I file another issue for this, or are you already investigating what happened?

Image

@aestriplex
Contributor

@Explorer09 The thread with the strange ID shows status '?' because of this fix, which sets the state of new processes to UNKNOWN in the init function.

Yes, it shows the running state even for processes that, strictly speaking, should be in state S. Those are the states reported by the macOS APIs, though. I got them from the kinfo structure. I tried getting them from the proc_bsdinfo structure as well, and I got the same results:

struct proc_bsdinfo pbsd;
/* proc_pidinfo returns the number of bytes written; a full struct means success */
if (sizeof(pbsd) == proc_pidinfo(Process_getPid(&proc->super), PROC_PIDTBSDINFO, 0, &pbsd, sizeof(pbsd))) {
    proc->super.state = DarwinProcess_mapDarwinProcessState(pbsd.pbi_status);
}

@BenBE
Member

BenBE commented Feb 21, 2025

To investigate why the states all show R we'd need to get the raw state values read from the internal structures. It's likely there is a confusion of which states map to which constants.

Another option is that OSX interprets "running" differently and means "is able to run" while an active "is executing" might yield a different value we currently don't look for.

@aestriplex
Contributor

@BenBE As for kinfo_proc, I do not think it's a matter of mapping the states, since they all come out as running. So even if state 2 (the one every kinfo_proc contains) were mapped to SLEEPING, all processes would simply show up as sleeping.

I am taking a closer look at proc_bsdinfo, which I only tried superficially this morning for the fix.

@aestriplex
Contributor

I think I came up with something.

  1. kinfo_proc is actually unreliable as far as process statuses are concerned. The only status it can identify is SZOMB (i.e. zombie processes). You can find here the list of the available statuses, which were correctly mapped in the switch statement in ProcessTable_mapDarwinProcessState. The problem is that kinfo_proc reports any non-zombie process that is not stopped as running.
  2. proc_bsdinfo apparently has the same "problem" as kinfo_proc. All the processes are in the running state.
  3. I was looking at the PROC_PIDTASKALLINFO option of proc_pidinfo here, and I noticed that it's already referenced and used by the Darwin platform-dependent code. With this structure we can distinguish between processes that have at least one running thread (status RUNNING) and processes that have no running threads (status SLEEP), simply by checking the pti_numrunning property. The results are very close to those of the top command.
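The approach in point 3 could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the eventual patch: TaskState, task_derive_state and task_state_for_pid are hypothetical names, and the numeric status values are assumed to match SSTOP and SZOMB from macOS's sys/proc.h. The decision logic is kept separate from the macOS call so it can be checked on any platform; the libproc part is only compiled on Apple systems.

```c
#include <assert.h>

/* Assumed copies of the BSD status codes SSTOP and SZOMB. */
enum { TASK_SSTOP = 4, TASK_SZOMB = 5 };

/* Hypothetical display states for this sketch. */
typedef enum {
   TS_UNKNOWN = 1,
   TS_RUNNING,
   TS_SLEEPING,
   TS_STOPPED,
   TS_ZOMBIE
} TaskState;

/* Zombie and stopped come from the BSD status; everything else is decided
 * by whether the task has at least one running thread. */
TaskState task_derive_state(int bsd_status, int num_running) {
   if (bsd_status == TASK_SZOMB)
      return TS_ZOMBIE;
   if (bsd_status == TASK_SSTOP)
      return TS_STOPPED;
   return num_running > 0 ? TS_RUNNING : TS_SLEEPING;
}

#ifdef __APPLE__
#include <libproc.h>

/* Queries both the BSD info and the task info in one call. Returns
 * TS_UNKNOWN when the kernel does not hand back a full structure. */
TaskState task_state_for_pid(int pid) {
   struct proc_taskallinfo info;
   int n = proc_pidinfo(pid, PROC_PIDTASKALLINFO, 0, &info, sizeof(info));
   if (n != (int)sizeof(info))
      return TS_UNKNOWN;
   return task_derive_state((int)info.pbsd.pbi_status,
                            (int)info.ptinfo.pti_numrunning);
}
#endif
```

Keeping the mapping in a pure function makes it easy to compare against top's output for a handful of known processes before wiring it into the process table scan.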

I also attach here a small program to create zombie processes for debug purposes (zombie.c.zip).
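For readers without the attachment, a zombie can be produced with a few lines of POSIX C. The following is a generic sketch, not the contents of zombie.c.zip; zombie_spawn and zombie_reap are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits immediately. Until the parent waits for it,
 * the child stays in the process table as a zombie.
 * Returns the child's PID, or -1 if fork failed. */
pid_t zombie_spawn(void) {
   pid_t pid = fork();
   if (pid == 0) {
      _exit(0); /* child: exit right away */
   }
   return pid;
}

/* Reaps the zombie. Returns the child's exit status, or -1 on error. */
int zombie_reap(pid_t pid) {
   int status = 0;
   if (waitpid(pid, &status, 0) != pid)
      return -1;
   return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

To inspect the zombie in htop, call zombie_spawn(), keep the parent alive (for example with sleep(60)), and call zombie_reap() afterwards.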

@Explorer09
Contributor Author

@aestriplex

  1. kinfo_proc is actually unreliable as far as process statuses are concerned. The only status it can identify is SZOMB (i.e. zombie processes). You can find here the list of the available statuses, which were correctly mapped in the switch statement in ProcessTable_mapDarwinProcessState. The problem is that kinfo_proc reports any non-zombie process that is not stopped as running.

I noticed that kinfo_proc can detect stopped processes as well. The SSTOP value works. The Stopped state is not even shown in macOS's top (it shows as Sleeping instead).

It's easy to test the Stopped state. Start a long-running command in the terminal, press ^Z, and see it stopped.
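The same state can also be produced programmatically. The sketch below is a generic POSIX example with a hypothetical function name, stopped_child_observed. It uses SIGSTOP, whereas ^Z sends SIGTSTP; both leave the process in the stopped state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child, stops it with SIGSTOP and confirms the stop via waitpid.
 * Returns 1 if the child was seen stopped, 0 if not, -1 if fork failed.
 * The child is killed and reaped before returning. */
int stopped_child_observed(void) {
   pid_t pid = fork();
   if (pid < 0)
      return -1;
   if (pid == 0) {
      for (;;)
         pause(); /* child: idle until a signal arrives */
   }

   int status = 0;
   int stopped = 0;
   /* WUNTRACED makes waitpid report stopped children, not only exited ones. */
   if (kill(pid, SIGSTOP) == 0 && waitpid(pid, &status, WUNTRACED) == pid)
      stopped = WIFSTOPPED(status) ? 1 : 0;

   kill(pid, SIGKILL);       /* SIGKILL also terminates a stopped process */
   waitpid(pid, &status, 0); /* reap it so no zombie is left behind */
   return stopped;
}
```

To look at the stopped process in htop, insert a delay between the WUNTRACED wait and the SIGKILL.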

aestriplex added a commit to aestriplex/htop that referenced this issue Feb 22, 2025
@aestriplex
Contributor

@Explorer09 Thanks, I added the SSTOP case in ProcessTable_goThroughEntries. Should I open a new pull request?

@BenBE
Member

BenBE commented Feb 22, 2025

Yes. A new PR for this would be fine. The current workaround mostly re-establishes a non-crashing state and, as I understand it, does not change much compared to before. With the planned changes we'd actually get an improvement.

BenBE pushed a commit to aestriplex/htop that referenced this issue Feb 25, 2025
This deduces the RUNNING state from the proc_taskinfo structure
by using the number of running threads.
BenBE pushed a commit to aestriplex/htop that referenced this issue Feb 25, 2025
This deduces the RUNNING state from the proc_taskinfo structure
by using the number of running threads.

Fixes: htop-dev#1611