
librem_14 ROM misfunction, no display or freeze then CPU hard LOCKUP #1712

Closed · 13 tasks done
aluciani opened this issue Jul 5, 2024 · 34 comments

aluciani commented Jul 5, 2024

Please identify some basic details to help process the report

A. Provide Hardware Details

  1. What board are you using? (Choose from the list of boards here)
    librem_14

  2. Does your computer have a dGPU or is it iGPU-only?

    • iGPU-only (Internal GPU, normally Intel GPU)
  3. Who installed Heads on this computer?

    • Self-installed
  4. What PGP key is being used?

    • Nitrokey 3 NFC
  5. Are you using the PGP key to provide HOTP verification?

    • Yes

B. Identify how the board was flashed

  1. Is this problem related to updating heads or flashing it for the first time?

    • Updating heads
  2. If the problem is related to an update, how did you attempt to apply the update?

    • Using the Heads menus
    • External flashing (ch341a_spi)
  3. How was Heads initially flashed?

    • Don't know (purism)
  4. Was the board flashed with a maximized or non-maximized/legacy rom?

    • I don't know (purism)
  5. If Heads was externally flashed, was IFD unlocked?

    • Don't know (purism the first time, make BOARD=librem_14 for the update)

C. Identify the rom related to this bug report

  1. Did you download or build the rom at issue in this bug report?

    • I built it
  2. If you built your rom, which repository:branch did you use?

    • Heads:Master
      heads-librem_14-v0.2.0-2206-gfb9c558.rom
  3. What version of coreboot did you use in building?
{You can find this information from the github commit ID or, once flashed, from the complete version shown in System Information under the Options menu}
    coreboot-purism

  4. In building the rom, where did you get the blobs?

    • Extracted from the online bios using the automated tools provided in Heads

Please describe the problem

Describe the bug
I wanted to update Heads on my Purism librem_14.
So I cloned the git repo, built the ROM, and put it on a USB key.
I updated the ROM via the GUI (keeping settings), rebooted, and then got a black screen.
I told myself that I'd removed the USB key too quickly, or that the zip file was corrupted.
So I externally flashed the rom with a ch341a_spi programmer.
The result was the same: a black screen. Then the PC rebooted and showed a screen that was backlit but displayed nothing. After that nothing moves; the PC freezes or displays nothing.

To Reproduce
Steps to reproduce the behavior:

  1. get a Debian 12 system
  2. update the system
    sudo apt update && sudo apt upgrade
  3. clone the heads repo
    git clone https://github.com/linuxboot/heads
  4. install docker (https://docs.docker.com/engine/install/debian/)
  5. install nix
    [ -d /nix ] || sh <(curl -L https://nixos.org/nix/install) --no-daemon
    . /home/user/.nix-profile/etc/profile.d/nix.sh
  6. build the docker image
    nix build .#dockerImage && docker load < result
  7. jump into the image
    docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env
  8. produce the ROM
    make BOARD=librem_14
  9. copy the rom onto a USB key
  10. update the librem_14 from the GUI
    see error
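For reference, assuming the heads checkout is the current working directory, the interactive-shell and make steps above can also be combined into the upstream README's one-shot form (same docker flags, with the build command appended after `--`):

```shell
# One-shot reproducible build (sketch): run make inside the nix-built
# docker image directly instead of opening an interactive shell first.
docker run -e DISPLAY=$DISPLAY --network host --rm -ti \
  -v $(pwd):$(pwd) -w $(pwd) \
  linuxboot/heads:dev-env -- make BOARD=librem_14
```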

THEN

  1. flash with the ch341a_spi
    sudo flashrom -p ch341a_spi -w /home/user/heads/build/x86/librem_14/heads-librem_14-v0.2.0-2206-gfb9c558.rom -c <the name of the chip, which I don't remember right now>

see error
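A hedged sketch of this external-flash step: flashrom writes a raw image, so the .rom extracted from the build zip is what gets flashed, and probing without `-c` makes flashrom print the detected chip name. The flashrom lines need the ch341a hardware, so they are commented out; the read-back comparison at the end is shown on stand-in files and is runnable anywhere:

```shell
# External flashing sketch (hardware lines commented out; needs a ch341a):
#   unzip heads-librem_14-v0.2.0-2206-gfb9c558.zip            # extract the .rom
#   sudo flashrom -p ch341a_spi                               # probe: prints the chip name
#   sudo flashrom -p ch341a_spi -c "<chip>" -w heads-librem_14-v0.2.0-2206-gfb9c558.rom
#   sudo flashrom -p ch341a_spi -c "<chip>" -r readback.rom   # read back to verify
# Then compare the read-back against what was written (stand-in files here):
printf 'rom image' > /tmp/written.rom
printf 'rom image' > /tmp/readback.rom
cmp -s /tmp/written.rom /tmp/readback.rom && echo "flash verified"
```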

//
I also tried the old build method on a debian 11 system

  1. boot debian 11 system
  2. clone the repo
  3. make BOARD=librem_14
  4. flash with flashrom and ch341a_spi

Expected behavior
the laptop should display the bootsplash and then the Heads menu

Screenshots
I can provide pictures of the black screen if needed.

Here is the zip file produced by the build:
heads-librem_14-v0.2.0-2206-gfb9c558.zip

tlaurion (Collaborator) commented Jul 6, 2024

@123ahaha https://github.com/linuxboot/heads/blob/master/README.md#building-heads was followed?

aluciani (Author) commented Jul 6, 2024

@123ahaha https://github.com/linuxboot/heads/blob/master/README.md#building-heads was followed?

Yes indeed; the first attempts (GUI update and the first ch341a_spi flash) were with nix. Then I tried booting an old Debian 11 and just running make BOARD=librem_14. That didn't change the result: still the black screen, a reboot, then a black screen with some backlight.
I also noticed that when I turn off the librem_14, there is a flash on the screen, like a bright white screen; I don't know if that can help.

tlaurion (Collaborator) commented Jul 6, 2024

@JonathonHall-Purism diffoscope fails on romstage and most of the rom is different?

tlaurion (Collaborator) commented Jul 6, 2024

produce the ROM
make BOARD=librem_14

This is not upstream instructions.
By doing this, you are using your host build system, not the isolated build system from the nix-built docker image.

https://github.com/linuxboot/heads/blob/fb9c558ba4ed4d6a581b05d7e47b883e0f79c04a/README.md

Eg: docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env -- make BOARD=nitropad-nv41

Or again:
https://github.com/linuxboot/heads/blob/fb9c558ba4ed4d6a581b05d7e47b883e0f79c04a/README.md#pull-docker-hub-image-to-prepare-reproducible-roms-as-circleci-in-one-call

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) tlaurion/heads-dev-env:latest -- make BOARD=x230-hotp-maximized
docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) tlaurion/heads-dev-env:latest -- make BOARD=nitropad-nv41

Which would translate to this for your librem_14 build case:
docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) tlaurion/heads-dev-env:latest -- make BOARD=librem_14

Since you are building the latest commit (which incidentally corresponds to the "latest" docker image; otherwise the doc specifies checking the CircleCI config to get the docker image version Heads used, to match the reproducible build output).


To complete your build with the self-built, nix-created docker image, as you intended to do for that specific Heads commit (observed at the end of your rom name, -gXXXXXXXX.zip):

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env -- make BOARD=librem_14

@123ahaha : makes sense?


Please suggest changes you would like to see in README.md that would clarify what was missing so no others come to the same problems. Thanks!

aluciani (Author) commented Jul 6, 2024

produce the ROM
make BOARD=librem_14

This is not upstream instructions.

I know; I first did the build via nix+docker.
The only difference I have is:

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env

I thought this put me inside the docker image with the correct build system.
Then I could do

make BOARD=librem_14

from inside the image, am I wrong ?

user@debian$ docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env
bash-5.2# make BOARD=librem_14
----------------------------------------------------------------------
!!!!!! BUILD SYSTEM INFO !!!!!!
System CPUS: 8
System Available Memory: 13909 GB
System Load Average: 0.34
----------------------------------------------------------------------
Used **CPUS**: 8
Used **LOADAVG**: 12
Used **AVAILABLE_MEM_GB**: 13909 GB
----------------------------------------------------------------------
**MAKE_JOBS**: -j8 --load-average=12 

Variables available for override (use 'make VAR_NAME=value'):
**CPUS** (default: number of processors, e.g., 'make CPUS=4')
**LOADAVG** (default: 1.5 times CPUS, e.g., 'make LOADAVG=54')
**AVAILABLE_MEM_GB** (default: memory available on the system in GB, e.g., 'make AVAILABLE_MEM_GB=4')
**MEM_PER_JOB_GB** (default: 1GB per job, e.g., 'make MEM_PER_JOB_GB=2')
----------------------------------------------------------------------
!!!!!! Build starts !!!!!!

... removed the build log

16777216:/home/user/heads/build/x86/librem_14/heads-librem_14-v0.2.0-2206-gfb9c558.rom
bash-5.2# 

Are you sure that what I did is not the same as

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env -- make BOARD=librem_14

?

docker run -e DISPLAY=$DISPLAY --network host --rm -ti -v $(pwd):$(pwd) -w $(pwd) linuxboot/heads:dev-env -- make BOARD=librem_14

@123ahaha : makes sense?

I just tried it this way and flashed via ch341a_spi; still got a black screen with backlight.

Here is the rom
heads-librem_14-v0.2.0-2206-gfb9c558.zip

aluciani (Author) commented Jul 6, 2024

Update:
When I leave the librem_14 on long enough, I actually get a message:

[249.0789298] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

The hard LOCKUP also occurs on CPUs 2, 3, 4, 5, 6, 7, 9, 10, 11

@tlaurion tlaurion added the bug label Jul 6, 2024
@aluciani aluciani changed the title librem_14 ROM misfunction, no display or freeze librem_14 ROM misfunction, no display or freeze then CPU HARDLOCK Jul 6, 2024
@aluciani aluciani changed the title librem_14 ROM misfunction, no display or freeze then CPU HARDLOCK librem_14 ROM misfunction, no display or freeze then CPU hard LOCKUP Jul 6, 2024
tlaurion (Collaborator) commented Jul 6, 2024

from inside the image, am I wrong ?

Correct. If the previous commands ran to generate output, and that output was used to construct the docker image you then ran interactively, your docker image should be reproducible and produce a reproducible rom. (I didn't run diffoscope on your latest one; I'll leave that for after you confirm whether the expected rom works or not, before investing more unpaid time into troubleshooting this further and going down a more straightforward troubleshooting path. Hope you don't take it wrong.)

@123ahaha
Out of curiosity, if you flash this rom externally (the rom inside the zip), do you get rid of your issue?

https://output.circle-artifacts.com/output/job/b70503c8-c52f-4ff7-b77d-e166f624bd0d/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2206-gfb9c558.zip

tlaurion (Collaborator) commented Jul 6, 2024

Update:
When I leave the librem_14 on long enough, I actually get a message:

[249.0789298] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1

The hard LOCKUP also occurs on CPUs 2, 3, 4, 5, 6, 7, 9, 10, 11

Oh, so that might be a kernel<->coreboot issue then. Considering coreboot was recently updated, if the latest CircleCI rom produces the same behavior, I would recommend flashing a rom from before the last coreboot version bump.

Do you have the version info of the Heads build you were using before the internal upgrade?

aluciani (Author) commented Jul 6, 2024

I tried the https://output.circle-artifacts.com/output/job/b70503c8-c52f-4ff7-b77d-e166f624bd0d/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2206-gfb9c558.zip rom; still the same issue.

I'll try to get an earlier commit.
I don't really remember which commit it was, but I'm sure it was before commit 80284ff.

tlaurion (Collaborator) commented Jul 6, 2024

My hypothesis then is that you're suffering from last coreboot bump #1703

So if my hypothesis is right, this should boot
https://output.circle-artifacts.com/output/job/14741774-df0a-4b31-a384-512abcef62a8/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2008-gfd98c8d.zip

You can of course use PureBoot releases as well, which is the path supported by Purism support. Heads is a rolling release; an OEM decides which Heads upstream commit to support and rebrand (with Purism maintaining their own coreboot branch).

I doubt the librem_14 wouldn't boot from their releases. I'm not sure which Heads master commit they use, but it should be written in the Bill of Materials (BOM) on their release page.

aluciani (Author) commented Jul 6, 2024

My hypothesis then is that you're suffering from last coreboot bump #1703

So if my hypothesis is right, this should boot https://output.circle-artifacts.com/output/job/14741774-df0a-4b31-a384-512abcef62a8/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2008-gfd98c8d.zip

Booting back to normal with this commit. Thanks!

Should I let you close the issue?

tlaurion (Collaborator) commented Jul 6, 2024

My hypothesis then is that you're suffering from last coreboot bump #1703

So if my hypothesis is right, this should boot https://output.circle-artifacts.com/output/job/14741774-df0a-4b31-a384-512abcef62a8/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2008-gfd98c8d.zip

booting back to normal with this commit
Thanks

@JonathonHall-Purism Something is wrong after commit fd98c8d for the Librem 14, most probably coreboot/config stuff, where the soft lockup watchdog would probably kick in later after enough waiting, but it is definitely a regression.

Please pin the issue; AFK.

JonathonHall-Purism (Collaborator) commented

Quick update, I'm able to reproduce this and checking it out, thanks for reporting.

tlaurion (Collaborator) commented Jul 9, 2024

@JonathonHall-Purism depending on timeline for fix, I propose we revert #1703 as per #1713 PR.

@tlaurion tlaurion pinned this issue Jul 9, 2024
tlaurion (Collaborator) commented Jul 9, 2024

Please pin issue, Afk.

Done, to ensure visibility.

JonathonHall-Purism (Collaborator) commented

Agree; commented over there too - I'll test that ROM as soon as it's available from CI. If it boots and I haven't found the actual fix yet, we'll merge it.

aluciani (Author) commented Jul 9, 2024

Even if you know more or less where the problem comes from, I'd like to add that the ROM produced for a nitropad-nv41 is working (heads-nitropad-nv41-v0.2.0-2206-gfb9c558.zip)

tlaurion (Collaborator) commented Jul 9, 2024

Even if you know more or less where the problem comes from, I'd like to add that the ROM produced for a nitropad-nv41 is working (heads-nitropad-nv41-v0.2.0-2206-gfb9c558.zip)

@123ahaha: not related but thanks for the report (I tested nv41 myself, I cannot test librems).
Changeset (reverted changes) can be seen under #1713 which is building under CircleCI at https://app.circleci.com/pipelines/github/tlaurion/heads/2636/workflows/f0bfe047-0fa5-40ab-b636-25b63472794d

tlaurion (Collaborator) commented Jul 9, 2024

Internal flashing zip downloadable from https://output.circle-artifacts.com/output/job/bd451c3d-9c9e-4f8f-a2c1-ddc402dacca6/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2207-gb20cde8.zip for 30 days starting now.


aluciani (Author) commented Jul 9, 2024

Internal flashing zip downloadable from https://output.circle-artifacts.com/output/job/bd451c3d-9c9e-4f8f-a2c1-ddc402dacca6/artifacts/0/build/x86/librem_14/heads-librem_14-v0.2.0-2207-gb20cde8.zip for 30 days starting now.

Working on my Librem 14.

JonathonHall-Purism (Collaborator) commented

Bisecting the few commits we had downstream on Release 30 has led me here, to the commit switching to Purism bootsplashes:

https://source.puri.sm/firmware/pureboot/-/commit/7f912babf2aca7af73473b1cd41ca586ebdcc3df

The 24.02.01-Purism-1 change works with this commit, but not on any prior commit. Not sure why though, working on it. I would not have expected a bootsplash to cause CPU lockups, I would have thought it'd either show the bootsplash or fail and go on without it.

I'm curious to know if any boards from other vendors would work with coreboot 24.02.01 and the Heads default bootsplash if anybody would like to try it! It doesn't look like any other boards use 24.02.01 yet.

aluciani (Author) commented Jul 9, 2024

I'm curious to know if any boards from other vendors would work with coreboot 24.02.01 and the Heads default bootsplash if anybody would like to try it!

I'm willing to test it on t430, even on nitropad-nv41...

It doesn't look like any other boards use 24.02.01 yet.

...but I cannot make the rom for a t430 with coreboot v24.02.01; I don't really have time to dig deep and make the hack work.

tlaurion (Collaborator) commented Jul 10, 2024

I'm curious to know if any boards from other vendors would work with coreboot 24.02.01 and the Heads default bootsplash if anybody would like to try it!

I'm willing to test it on t430, even on nitropad-nv41...

It doesn't look like any other boards use 24.02.01 yet.

...but I cannot make the rom for a t430 with coreboot v24.02.01; I don't really have time to dig deep and make the hack work.

I could start version-bumping to coreboot 24.03.01 for the xx30's t430+x230 on a PR, to see if it bricks my x230, which I can test myself as a start, and then extend per family this time (ivy, then sandybridge, then haswell). This will target testing the coreboot version bump, which in the past has taken a lot of time for board owners to test, so I will make smaller changes this time, I guess.

But the nv41 depends on Dasharo's fork, which is not upstream in coreboot, so that will depend on the base version of the next Dasharo release.

@aluciani:

The "hack work" is under #1715, which is basically changing strings and hashes of downloaded github artifacts, and regenerating oldconfigs as per the 3a93e44 comment. If the rom artifacts don't boot, there is a regression on the coreboot side between 4.20.01 and 24.02.01.

AFAIK, the patch that was under patches/coreboot-4.22.01/0001-x230-fhd-variant.patch (the eDP patch for the eDP/FHD x230 board variant) was merged upstream and is unneeded now. So no more coreboot patches should need to be maintained downstream under Heads, which is where the actual "hack work" was needed before. Let's see.

aluciani (Author) commented

The "hack work" is under #1715, which is basically changing strings and hashes of downloaded github artifacts, and regenerating oldconfigs as per the 3a93e44 comment. If the rom artifacts don't boot, there is a regression on the coreboot side between 4.20.01 and 24.02.01.

I'll just wait for the CI to build the rom and try it on the nitropad-nv41.

tlaurion (Collaborator) commented

The "hack work" is under #1715, which is basically changing strings and hashes of downloaded github artifacts, and regenerating oldconfigs as per the 3a93e44 comment. If the rom artifacts don't boot, there is a regression on the coreboot side between 4.20.01 and 24.02.01.

I'll just wait for the CI to build the rom and try it on the nitropad-nv41.

Won't change anything for nv41.

This affects the platforms in the title of #1715: xx30, xx20, xx40, xx41, and the qemu q35 coreboot test platforms.

JonathonHall-Purism (Collaborator) commented

Here's what happened. After the switch to the Wuffs JPEG decoder in 24.02.01, the JPEG decoder now needs a "work area" allocated from the heap roughly proportional to the image size.

The Heads bootsplash is much larger than the PureBoot bootsplashes (1024x768 vs. ~672x112). So the PureBoot bootsplashes were fine, but the Heads bootsplashes exceeded the available heap space.

Then, the coreboot allocator left the heap "full" after failing to fulfill a request due to exceeding the heap size. This caused boot to fail entirely after failing to load the bootsplash.

Upstream fixes:

I'm preparing a branch to bump Librems again with the heap size fix.

(We don't need to cherry-pick the malloc fix, as long as the heap size is increased it won't apply.)
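A rough back-of-the-envelope check of the size difference (pixel counts taken from the comment above; treating the work area as simply proportional to pixel count is an assumption for illustration only):

```shell
# Compare bootsplash pixel counts: the decoder work area grows roughly
# with image size, so the Heads splash needs ~10x the space of PureBoot's.
heads=$((1024 * 768))      # Heads default bootsplash
pureboot=$((672 * 112))    # approximate PureBoot bootsplash
echo "Heads: ${heads} px, PureBoot: ${pureboot} px"
echo "ratio: $((heads / pureboot))x"
```

This matches the explanation above: the much smaller PureBoot splashes fit in the existing heap, while the roughly 10x larger Heads default did not.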

tlaurion (Collaborator) commented Jul 19, 2024

@JonathonHall-Purism fixed in master, should close?

Was ebd9fba

JonathonHall-Purism (Collaborator) commented

Yes thank you, closing

@tlaurion tlaurion unpinned this issue Jul 22, 2024