t440p: Stability issues: intermittent OS Kernel panics / segfaults on boot, shutdown #1413
Comments
Yes, a picture of the segfault should be provided at a minimum. It will point to what the kernel was doing at the moment of the fault (driver involved, memory management, etc.).
My only intuition would be about the RAM init blob (MRC.bin, borrowed from a Haswell Chromebook) and some weird corruption happening in the same regions from coreboot (CBFS_SIZE has been lowered to under 8 MB until native RAM init is on par with blob-initialized RAM upstream). The differences from libreboot here are: the size of CBFS_SIZE (ROM size, kernel) and the payload being Linux kexec'ing into another kernel (where in your case here, you are booting into Gentoo, and where the board config (boards/t440p-maximized.config) is not stating From #692 t400p board owners/testers
Thank you for your reply and your advice! However I did manage to get a bit of data that is hopefully somewhat helpful - sadly the logs and kernel dumps are completely beyond my understanding.
I hope this information is of some help. I will try to get more kernel dumps and corresponding coreboot logs in the next few days, in the questionable hope of triggering the behavior more often again …
I only use my t440p for testing, not as a daily driver. I don't even have any networking, sound, etc. plugged in, to keep it as easy as possible to externally reflash if needed. Hence I have not particularly stress tested my machine. From my own limited testing, though, I have not experienced any random kernel panics. I appreciate that @akunterkontrolle doesn't feel this is likely a hardware issue. With difficult-to-pin-down bugs like this, though, I would suggest, if possible, swapping out the RAM and seeing if you get the same problem with different sticks. Over the years (not Heads specific), most of my kernel panics (apart from when using very bleeding-edge software) have been either RAM or thermal issues. I have sometimes had certain RAM sticks only work with certain coreboot versions, hence my suggestion. I presume you have repasted your CPU in the last couple of years and the thermal metrics are within reasonable limits? If your problem persists despite this, I'll try to stress test my T440p and check what happens. (I have used various builds of heads on my T440p. Currently have the version built by CircleCI for the testing branch in #1398
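For the thermal check mentioned above, a minimal sketch follows. It assumes the booted OS exposes the standard Linux sysfs thermal zones under /sys/class/thermal. The function name `read_zone_temps` is hypothetical and not part of Heads or coreboot, and which zones exist on a T440p was not verified here.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            # The sysfs thermal interface reports millidegrees Celsius.
            temps[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Zone without a readable temperature value; skip it.
            continue
    return temps


if __name__ == "__main__":
    for name, celsius in read_zone_temps().items():
        print(f"{name}: {celsius:.1f} °C")
```

Running this a few times under load and comparing the values against the CPU's rated limits gives a rough idea of whether thermals are a plausible cause. It does not replace a proper memory test for the RAM side of the suggestion.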
@akunterkontrolle have you tried the last comment in #535? Please reopen this issue when done.
Please identify some basic details to help process the report
A. Provide Hardware Details
1. What board are you using (see list of boards here)?
2. Does your computer have a dGPU or is it iGPU-only?
3. Who installed Heads on this computer?
4. What PGP key is being used?
5. Are you using the PGP key to provide HOTP verification?
B. Identify how the board was flashed
1. Is this problem related to updating heads or flashing it for the first time?
2. If the problem is related to an update, how did you attempt to apply the update?
3. How was Heads initially flashed?
4. Was the board flashed with a maximized or non-maximized/legacy rom?
5. If Heads was externally flashed, was IFD unlocked?
C. Identify the rom related to this bug report
1. Did you download or build the rom at issue in this bug report?
2. If you downloaded your rom, where did you get it from?
Please provide the release number or otherwise identify the rom downloaded
3. If you built your rom, which repository:branch did you use?
4. What version of coreboot did you use in building?
5. In building the rom where did you get the blobs?
Please describe the problem
Describe the bug
On booting, after kexec-ing into the OS kernel, there is a ca. 50% chance of the kernel panicking, either directly after the kexec before producing any output, or shortly afterwards, e.g. directly after entering the disk unlock passphrase. If the init system starts, there is a good chance of the system booting up normally and then working without any problems. On very few occasions, even after init has started successfully, there is still a kernel panic or some programs randomly segfault.
On some occasions this also happens on shutdown: directly before the system should power off, it instead hangs with a kernel panic.
Hardware: Thinkpad t440p without d-gpu. Memory: 16GB, CPU: i7-4810MQ, upgraded Touchpad from t450, upgraded to SATA-SSD.
Current OS: Gentoo with Kernel 6.1.28
I am rather confident that the problem is not faulty hardware and does not depend on the OS (kernel version). I did not encounter any kernel panics or other weird behavior when running libreboot or skulls. However, when using heads, all GNU/Linux distros with various kernel versions produced the same behavior. I tried Rocky Linux, Debian 11.6, Devuan Chimaera and Gentoo.
Sadly I can't really provide much more information than that: especially on startup there is often an OS kernel panic when, well, there shouldn't be … The heads kernel doesn't panic at all.
Has any other owner of a t440p with heads experienced this?
To Reproduce
Expected behavior
No kernel panics, normal running OS.
Screenshots
I could try to take a picture with my phone of the parts of the panic message that fit on the screen.
Additional context
Is there any method to get logs after the crash? Obviously the kernel logs are gone since I need to hard power-off the laptop after a kernel panic. On the very few occasions where it proceeded to boot and "only" a few programs segfaulted, I sadly forgot to save any logs.
I think I read somewhere that (coreboot-)logs from previous boots could be extracted from heads, but I couldn't find that information anymore. Without any kind of logs I have a feeling it is impossible to determine what is going wrong.
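One possible way to keep panic output across a reboot is the Linux pstore filesystem. The sketch below assumes the OS kernel has pstore enabled with a working backend such as ramoops, and that pstore is mounted at /sys/fs/pstore. Neither is confirmed for this setup, and whether records survive a hard power-off on this board is untested. The function name `collect_pstore` and the archive directory are hypothetical.

```python
import shutil
import time
from pathlib import Path


def collect_pstore(src="/sys/fs/pstore", dst_root="/var/log/pstore-archive"):
    """Copy every regular file in src into a new timestamped directory.

    Returns the list of copied paths (empty if nothing was found).
    """
    src_dir = Path(src)
    if not src_dir.is_dir():
        return []  # pstore not mounted or not available
    records = sorted(p for p in src_dir.iterdir() if p.is_file())
    if not records:
        return []  # no crash records from a previous boot
    # Assumption: dst_root is a writable location chosen for illustration.
    dst = Path(dst_root) / time.strftime("%Y%m%d-%H%M%S")
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for record in records:
        target = dst / record.name
        shutil.copyfile(record, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    for path in collect_pstore():
        print(f"saved {path}")
```

Run as root early in the next boot, this would save any records left by the previous crash. For the coreboot side, the coreboot `cbmem` utility prints the console log with `cbmem -c`; whether that log still covers a previous boot after a hard power-off is not something this sketch can confirm.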