overheating likely due to nvme #1551

Closed
9 of 49 tasks
commandline-be opened this issue Dec 12, 2023 · 7 comments

Comments

@commandline-be

Please identify some basic details to help process the report

A. Provide Hardware Details

1. What board are you using (see list of boards here)?

2. Does your computer have a dGPU or is it iGPU-only?

  • dGPU
  • iGPU-only

3. Who installed Heads on this computer?

  • Insurgo
  • Nitrokey
  • Purism
  • Other provider
  • Self-installed

4. What PGP key is being used?

  • Librem Key
  • Nitrokey Pro 2
  • Nitrokey Storage
  • Yubikey
  • Other

5. Are you using the PGP key to provide HOTP verification?

  • Yes
  • No
  • I don't know

B. Identify how the board was flashed

1. Is this problem related to updating heads or flashing it for the first time?

  • First-time flash
  • Updating heads

2. If the problem is related to an update, how did you attempt to apply the update?

  • Using the Heads GUI
  • Flashrom via the Recovery Shell
  • External flashing

3. How was Heads initially flashed

  • External flashing
  • Internal-only / 1vyrain
  • Don't know

4. Was the board flashed with a maximized or non-maximized/legacy rom?

  • Maximized
  • Non-maximized / legacy
  • I don't know

5. If Heads was externally flashed, was IFD unlocked?

  • Yes
  • No
  • Don't know

C. Identify the rom related to this bug report

1. Did you download or build the rom at issue in this bug report?

  • I downloaded it
  • I built it

2. If you downloaded your rom, where did you get it from?

  • Heads CircleCi
  • Purism
  • Nitrokey
  • Somewhere else (please identify)

Please provide the release number or otherwise identify the rom downloaded

3. If you built your rom, which repository:branch did you use?

  • Heads:Master
  • Other (please identify)

4. What version of coreboot did you use in building?

  • 4.8.1 (current default in heads:master)
  • 4.13
  • 4.14
  • 4.15
  • Other (please specify)
  • I don't know

5. In building the rom where did you get the blobs?

  • No blobs required
  • Provided by the company that installed Heads on the device
  • Extracted from a backup rom taken from this device
  • Extracted from another backup rom taken from another device (please identify the board model)
  • Extracted from the online bios using the automated tools provided in Heads
  • I don't know

Please describe the problem

Describe the bug

This occurs with any version of the Nitrokey NV50 firmware (the machine came with v2.2 installed).

On multiple occasions the CPU temperature soars significantly (+30 °C). From observation, this seems highly likely to be related to the NVMe storage, which appears to struggle with repeated, infrequent disk writes(?) or other specific access patterns.

Several programs were found that reproduce this reliably, including but not limited to suricata and mailspring (daemon).

To Reproduce
Steps to reproduce the behavior: (suricata)

  1. start the suricata service
  2. observe the temperature soar quickly, even with low-volume disk access (the fan kicks in very quickly)
  3. stop the suricata service
  4. observe the temperature and CPU load drop notably

Expected behavior
much less impact on the CPU when using the nvme storage

Screenshots
possible but not provided at this time

Additional context
this particular NVMe drive shows impaired performance under specific workloads
it is likely the firmware is not optimally configured to work with this NVMe drive
the recommendation is to switch to an NVMe drive that does not depend on the CPU, or to correct the configuration issue(s)

@tlaurion
Collaborator

tlaurion commented Dec 12, 2023

@commandline-be cc:@daringer

I'm not sure the symptom you are seeing is related to the NVMe drive at all, even if you were to measure the heat generated by the drive directly. My hypothesis is that the heat is mostly caused by the CPU having to drive graphics in the absence of proper graphics handling. That is addressed by #1522, which will most probably be merged upstream today.

Other issues caused by those changes not being applied (including the NK 2.3 release having been retracted) were related to CPU overhead from graphics not being driven by the GPU; see Nitrokey#25.


I worked with snort/suricata in a past life.
Keep in mind that no copy operations are required: the network card's I/O is bound to a pinned CPU, and the more traffic there is on that interface, the more I/O is captured in and out. This is an interesting stress test of the whole network I/O <-> memory path, even more so if your setup is inline and applies patterns to block packets in streams before letting them in or out. On a workstation or laptop where other things are happening, as opposed to an inline platform that does only that job (normally headless and controlled remotely), anything else misconfigured on the machine will trigger I/O bottlenecks, CPU overhead, heat, and other problems.

I'm not trying to discourage you from doing this. I'm simply saying that unless the GPU is doing the GPU's job, jumping to the conclusion that the NVMe drive is the cause seems mostly unfounded, given that NK 2.2 has massive issues rendering graphics, which is most probably the real cause of your issues here.

@tlaurion
Collaborator

tlaurion commented Dec 12, 2023

Cross-linking duplicate Nitrokey#31

@commandline-be
Author

@tlaurion in my experience the GPU does work; the CPU load, and thus also the heating, is significantly lower with v2.3 (Nitrokey/heads).

Suricata is not using GPU acceleration on that host. ANYTHING with frequent disk I/O causes high CPU load and soaring temperatures. This applies to suricata, collectd, pcscd and mailspring whenever disk I/O increases.

Replacing the NVMe drive is a no-go, so I'd like to explore how to diagnose this better.
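
One way to start that diagnosis is to read the CPU and NVMe temperatures side by side instead of inferring one from the other. The following is a minimal sketch, assuming the running kernel exposes both sensors through hwmon (`/sys/class/hwmon/hwmon*/name` and `temp1_input`, reported in millidegrees Celsius). Driver names such as `coretemp` and `nvme` are typical but not confirmed for this board, and `list_hwmon_temps` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: list every hwmon sensor as "<driver name> <degrees C>".
# Assumption: sensors appear under /sys/class/hwmon/hwmon*/ with a "name"
# file and a "temp1_input" file holding millidegrees Celsius.
list_hwmon_temps() {
    local base="${1:-/sys/class/hwmon}" d name t
    for d in "$base"/hwmon*; do
        # Skip entries that lack a readable name or first temperature input.
        [ -r "$d/name" ] && [ -r "$d/temp1_input" ] || continue
        name=$(cat "$d/name")
        t=$(cat "$d/temp1_input")
        echo "$name $(( t / 1000 ))"
    done
}
```

Running `watch -n 1 'bash -c "source ./hwmon.sh; list_hwmon_temps"'` (with the function saved in a hypothetical `hwmon.sh`) while the workload is active would show whether the drive itself heats up or only the CPU package does.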

@tlaurion
Collaborator

tlaurion commented Dec 12, 2023

I cannot advise testing a Heads-built ROM if you do not have a way to unbrick the machine. But as I commented under the earlier NK 2.3 issue, neither coreboot nor the Linux config under Heads was doing the right thing. This is corrected in the PR referenced above, which Nitrokey tested and confirmed working, and it should be tested first before digging into a probable NVMe issue.

2.3 was not ready and was released too fast.

@commandline-be
Author

Okay, I'm aware. My tests, among others, led Nitrokey to put v2.3 into pre-release. Looking forward to the next release. I'll wait for others to test, as I don't want to brick this machine until I have a backup and a restore procedure at hand.

I think I've just found a way to reproduce the heat soaring.

It seems to be related to threading and the i5-1240P (12 cores, 16 threads).

With similar run-time durations:

quickly reaching +90° Celsius with either of

stress -d 1
stress -d 2
stress -d 2 --hdd-bytes 4096

while reaching almost 80° Celsius with

stress -d 16

while reaching almost 90° Celsius with

stress -d 12 --hdd-bytes 4096
stress -c 8 -d 4 --hdd-bytes 4096

The values are not typos; they simply run counter to linear assumptions.

To me this indicates the challenge may lie in which software is used, how it is compiled, and how it is configured.
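
To make these comparisons repeatable, each `stress` invocation could be run for a fixed time while the peak temperature is recorded. This is a rough sketch, not the method used for the numbers above: it assumes a sysfs file that reports the CPU temperature in millidegrees Celsius (the path is passed in by the caller), and `max_of` and `peak_temp_during` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the largest integer read from standard input (one value per line).
max_of() {
    local max="" v
    while read -r v; do
        [ -n "$v" ] || continue
        if [ -z "$max" ] || [ "$v" -gt "$max" ]; then max=$v; fi
    done
    [ -n "$max" ] && echo "$max"
}

# Run a command in the background, sample the given temperature file once
# per second until the command exits, then print the peak in degrees C.
# Usage: peak_temp_during <temp_file> <command> [args...]
peak_temp_during() {
    local temp_file="$1"; shift
    "$@" >/dev/null 2>&1 &
    local pid=$!
    {
        while kill -0 "$pid" 2>/dev/null; do
            cat "$temp_file"
            sleep 1
        done
        cat "$temp_file"   # one final sample after the command has finished
    } | max_of | awk '{ print int($1 / 1000) }'
    wait "$pid" 2>/dev/null
}
```

For example, `peak_temp_during /sys/class/thermal/thermal_zone0/temp stress -d 1 --timeout 60` would report the peak over a 60-second run. Which thermal zone corresponds to the CPU package varies between machines, so `thermal_zone0` is only a placeholder, and the machine should be left to cool to a similar idle temperature between runs.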

@tlaurion
Collaborator

tlaurion commented Dec 12, 2023

This is interesting and might need some Heads board configuration tuning under KERNEL_ADD statements to correct the behavior, either with proper fan patterns or with some other tweaks.

Just want you to be aware of #1522 (comment)

@tlaurion
Collaborator

tlaurion commented Jan 2, 2024

Duplicate of Nitrokey#31.

Reopen here if the next release (2.4) doesn't fix the problem, and/or if #1561 doesn't fix this issue.

@daringer note that Nitrokey#31 (comment) is last comment on the matter.

@tlaurion tlaurion closed this as completed Jan 2, 2024