-
-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Temperature reporting fix for AMD Zen CPUs #1176
base: main
Are you sure you want to change the base?
Conversation
Reconfigured temperature reporting for AMD Zen Architecture to reflect how libsensors reports temperature Need to test with additional AMD zen architectures namely ones with multiple packages Tested with: Ryzen 3600 on Artix Linux
Use list of k10temp sensor temperatures to populate temperatures for each die package. Assuming that AMD Zen CPU cores are listed by die.
Reports Package temperature across CPU dies for AMD Zen CPU's. Temperature is now selected by package die (Tccd[X]) for temperature reporting and ignores the Tctl reporting. Tested and Validated with: Ryzen 3600 Ryzen Threadripper 3970x
data[i] = data[1]; | ||
/* Check AMD Zen CPUs packages, and mirror temps across packages */ | ||
if (coreTempCount != existingCPUs) { | ||
double temp[coreTempCount]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is left partially uninitialized if data
contains NAN
values.
Moved conditional check for AMD Zen CPUs after HT/SMT check Added additional conditional check for AMD Zen CPUs to ensure it wont be triggered when no temps are reported Fixed possible garbage value reporting if triggered by another process
|
||
/* No further adjustments */ | ||
goto out; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be one level up, because the special-casing should end here regardless of a global temperature found or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the goto statement should stay here and a fallback conditional should be created for the case where coreTempCount is 1 and data[0] has the only value in it.
This check can be added into the current conditional statement, however, from looking at all the other conditional statements, we should have that moved out into its own check so that it remains a fallback and is not coupled with the section specified for AMD Zen CPU temperature reporting. (change idea added below)
I may be needlessly worried about a impossible or niche hardware/platform config.
/* No further adjustments */ | |
goto out; | |
/* No further adjustments */ | |
goto out; | |
} | |
/* AMD Zen CPU can report coreTempCount as 1 - use fallback conditional */ | |
} | |
/* Fallback for single temperature count reporting */ | |
if (coreTempCount == 1 && !isnan(data[0])) { | |
for (unsigned int i = 1; i <= existingCPUs; i++) { | |
data[i] = data[0]; | |
/* No further adjustments */ | |
goto out; | |
} | |
Created fallback for when only one temperature is actually reported. Fixed static analyzers by setting initial value of temp[0]
Oh sorry about that. I was thinking of including that initially.
|
@CyberX5 |
Sure 🙂 ryzen.log |
@CyberX5 What OS/kernel/Distro are you on? From the log, your system seems to be reporting coreTempCount as 3. I am assuming it is counting the Tctl, which so far seems to be an issue not with your CPU, but it may be how htop is dealing with polling the k10temp sensors output on older kernels. If this issue will exist on major versions of the kernel, namely LTS, additional changes may need to be made. |
@JohnsonGlenT Well, I do want to apologize it does seem to be an issue on my end, I'm on Arch Linux, 6.1 LTS Kernel, but I run a custom kernel config and I'm guessing that's the problem, now I do keep that in mind so I did test this on the stock Arch kernel before reporting, but must have messed something up... ryzen-6.1.19-1-lts.log Here's the log from the stock Arch LTS kernel. CoreTempCount is 2 now. Also, I have no idea what config option could be causing this... Haven't seen anything like this in any other program 🤷 Now, I have encountered a different issue. To my understanding the CCD layout of this CPU is 0-5,12-17 = CCD1 and 6-11,18-23 = CCD2. I actually have the columns set to 2 for this reason, so visually 1 row would contain both threads of a core. For example thread 0 and 12 would be core 0. Here's a command I found that lists sibling threads. Found here.
And again sorry about that ❤️ |
My test log above with stable htop that is mostly broken is from I believe Arch Linux with stock 6.2.2 kernel. |
@CyberX5 This is likely a problem. However, if htop reports CPU based on logical numbering, not physical, then the current temperature readout is correct. I will take a look into this, and will update on my findings. |
Ah I see, interesting, didn't think of that. 🙂 |
I have confirmed that htop reads CPU number by physical id. I will update when I have a fix that is ready for testing. |
So, as of doing looking with other zen architectures (threadripper 3970X, file attached) there seems to be no uniform way to ensure mapping of the CPU thread siblings will receive the same temperature reading, except for polling the thread siblings on every update tick. In this case I think mapping the temp by logical process number is a good enough solution without possibly affecting other systems, or needing to overhaul CPU startup/update loop. |
Closes htop-dev#806. Closes htop-dev#1048. Closes htop-dev#1176. Closes htop-dev#1335. Addresses htop-dev#879 (on Linux).
Closes htop-dev#806. Closes htop-dev#1048. Closes htop-dev#1176. Closes htop-dev#1335. Addresses htop-dev#879 (on Linux).
Closes htop-dev#806. Closes htop-dev#1048. Closes htop-dev#1176. Closes htop-dev#1335. Addresses htop-dev#879 (on Linux).
Reports Package temperature across CPU dies for
AMD Zen CPU's. Temperature is now selected by
package die Tccd[X] for for temperature reporting
and ignores the Tctl reporting.
Tested and Validated with:
Ryzen 3600 on Artix Linux
Ryzen Threadripper 3970x on Arch Linux