ipmctl create -goal creates regions on wrong numa_nodes #156
Comments
Thanks for the report. We will look into it. |
Hello, Thank you! |
Hi Dimitris, sorry for the delay. It sounds like a BIOS issue; I'm going to look into it internally first. Steven is getting back from sabbatical next week, so I'll see what he was trying to do before as well. You should get an update by next Wednesday. |
I tried reproducing this on my 2-socket reference platform with a BIOS from mid-2020 with Fedora 27 and the region creation process is working fine for me. I assume namespaces and formatting would not modify the below:
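(For context, the create/verify flow I'm describing is roughly the following sketch; exact flags and output vary with ipmctl/ndctl version:)

```sh
# Provision all PMem capacity to App Direct (the default goal), then reboot to apply it
sudo ipmctl create -goal PersistentMemoryType=AppDirect
sudo reboot

# After the reboot, confirm the regions and their NUMA placement
sudo ipmctl show -region
ndctl list -Rvu        # each region should report the expected numa_node
```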
It is sounding like the SRAT ACPI table specifies these proximity domains, and it's possible your BIOS vendor didn't implement this properly. Can you double check this? |
Hi Nolan, thanks for the quick response.
The memory topology of the system:
This is the
I am not really familiar with the SRAT ACPI table but I found out here that it might be disabled if Node Memory Interleaving is enabled (which I suppose is the case for me). |
I am not sure whether it makes sense or not but in my case the SRAT ACPI table does not mark any region as Non-Volatile. I extracted the SRAT ACPI table information by using:
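(Not necessarily the exact command used above, but a common way to dump and decode the SRAT with the acpica tools is:)

```sh
# Dump the raw SRAT and decompile it into readable form
sudo acpidump -n SRAT -b    # writes srat.dat into the current directory
iasl -d srat.dat            # produces srat.dsl

# Each Memory Affinity structure lists a Proximity Domain and flags,
# including the Non-Volatile bit for PMem ranges
grep -i -A 3 "Non-Volatile" srat.dsl
```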
If you have any further hints or things that I could try out, please let me know. In the meantime, I performed some straightforward benchmark measurements, and when I pin a process to the first socket's CPUs and access pmem0, the latency is lower than when accessing pmem1. So I can assume that |
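(The socket pinning mentioned above can be done with numactl; a quick sketch, where the benchmark binary name is just a placeholder:)

```sh
# Pin to socket 0's CPUs and local DRAM, then hit each pmem device in turn
numactl --cpunodebind=0 --membind=0 ./pmem_latency_bench /mnt/pmem0   # expected local -> lower latency
numactl --cpunodebind=0 --membind=0 ./pmem_latency_bench /mnt/pmem1   # expected remote -> higher latency
```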
Interesting. I did the same steps and found two entries with Non-Volatile set to 1 (true). I assume those correspond to my two regions. The regions are being set up on your end, I think, or else ndctl wouldn't list them. Perhaps the BIOS vendor is just neglecting to set the proximity domain field? Mine looks to be set correctly: it changes from region to region and, more importantly, is listed correctly by ndctl in the end.
|
I will check the PM mappings and their respective entries in the ACPI table and report what I get. |
After looking into that further, I identified these regions in my ACPI table. The
So, this field is not the one that causes this issue in the end. Any further hints would be welcome to solve this weird mystery. |
Ah, interesting. I emailed the developers of ndctl and they'll hopefully chime in here tomorrow. |
The numa_node ndctl is reporting ultimately comes from the ACPI NFIT table's SPA Range structure. |
@dimstav23 - The NFIT table has pointers to other tables, described in Section 5.2.25, NVDIMM Firmware Interface Table (NFIT), of the ACPI specification.
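For anyone who wants to check this on their own box, a rough way to decode the NFIT and inspect the SPA Range structures (assuming the acpica tools are installed):

```sh
# Dump and decompile the NFIT
sudo acpidump -n NFIT -b    # writes nfit.dat
iasl -d nfit.dat            # produces nfit.dsl

# The "System Physical Address Range" structures carry the Proximity Domain
# that ndctl ultimately reports as numa_node
grep -i -B 2 -A 10 "Proximity Domain" nfit.dsl
```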
|
Hi @stellarhopper and @sscargal, from a quick look, I could see that
Looks like proximity domain is set to 0 - I think this would be a question for the BIOS vendor. |
I think |
Good point. I will get back to them to ask about it. |
The data provided so far is from the 2-Socket Supermicro system. Can you provide the same data from your 8-Socket system too, please? I noted that your 8-Socket system is Cascade Lake and your 2-Socket looks to be Ice Lake (a best guess based on 8 DIMM slots). These have different BIOS code bases, so it would be very odd for both platforms to have the same BIOS issue. I have several Ice Lake Supermicro systems (X12DPU-6) that work correctly. Q) What's the manufacturer & model of your 2- and 8-Socket systems? ( |
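(The platform and BIOS details being asked for can usually be pulled with dmidecode, e.g.:)

```sh
sudo dmidecode -t system | grep -E "Manufacturer|Product Name"
sudo dmidecode -t bios   | grep -E "Vendor|Version|Release Date"
```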
Hi @sscargal,
Q2)
Q3) Currently it's not that simple to try a different distro on this machine, as it is managed centrally and people are running some workloads. If I find a time window, I'll give it a try to see whether this causes any issue. In the meantime, maybe @altexa could provide his |
@dimstav23 - Sorry for the confusion on the 8-Socket. I have the same Supermicro X12DPU-6 server running Ubuntu 22.04 with 5.15.0-30-generic. My BIOS is older 'Release Date: 04/21/2021', so I'll update it to see if this changes anything. If not, I'll try to install NixOS (minimum) to see if that replicates your issue. |
@dimstav23 There seems to be a BIOS regression problem that needs to be filed with Supermicro. The only thing I changed was updating the BIOS from v1.1 (Release Date: 04/21/2021) to v1.2 (Release Date: 04/21/2021) and I now see the same issue as you. No other changes were made to the OS or PMem config. Working (BIOS v1.1):
Not Working (BIOS v1.2):
Looking at the NFIT, I see PMem modules on both sockets point to the correct SpaRange Table, eg:
/proc/iomem only has one entry rather than the multiple entries I would expect. From the Supermicro system:
From a working Cascade Lake Fedora host (5.18.0 Kernel):
|
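For reference, the /proc/iomem comparison above boils down to something like this (root is needed to see the actual addresses; App Direct ranges show up as "Persistent Memory" entries):

```sh
# On a healthy multi-socket App Direct config there should be one
# "Persistent Memory" range per interleave set / region
sudo grep -i "persistent memory" /proc/iomem
```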
@sscargal Thank you very much for your time and effort. |
Thank you all for your effort. |
@dimstav23 Were you able to get a support ticket created with Supermicro? If needed, Supermicro can engage the Intel BIOS team through an Intel Premier Support (IPS) ticket. |
@sscargal I haven't gotten in touch with them yet because I was quite busy the past week. I will write to them tomorrow, or Monday at the latest, and keep you updated. I'll also explicitly mention the IPS possibility (I am pretty sure they are already aware of it).
No problem. Thanks for the update. |
I have an 8-socket machine which is half-populated with DRAM and half with DCPMM.
When I use ipmctl create -goal to create my regions (and after a reboot!), the regions have all been created, but they are all 'attached' to numa_node 0. This is incorrect, as they should of course be created on numa_nodes 0 to 7. Note that no options were passed to create -goal, as I want the DIMMs to be in AppDirect mode, which is the default.
When I then attempt to use these DCPMM (in appdirect mode, after creating namespaces, and correctly formatting and configuring) with SAP HANA, the use of the namespaces is refused by the database because of the wrong numa_node setting on the regions.
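(By "creating namespaces, and correctly formatting and configuring" I mean roughly the standard fsdax flow; the region names and mount points below are just examples, not my exact commands:)

```sh
# One fsdax namespace per region, then an XFS filesystem mounted with DAX
sudo ndctl create-namespace --mode=fsdax --region=region0
sudo mkfs.xfs -m reflink=0 /dev/pmem0     # reflink must be off for DAX on XFS
sudo mount -o dax /dev/pmem0 /hana/pmem0  # example mount point
```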
Here is the DIMM layout:
Here are the regions defined:
The regions as seen by ndctl:
Software versions:
And the DCPMM firmware version (I confirm all DIMMS are on the same firmware level):
The numa setup of the machine:
and the CPU assignments:
And finally the error messages I get from HANA (note that the namespace which is on region 0, on numa_node 0 (which is correct) is accepted for use by the DB):
My question: how can I correct the numa_node assignment for the regions? I have not found any way to specify which numa_node I want, and I thought the assignment would be automatic according to the DIMM placements.
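(For reference, the numa_node value in question can be read either from ndctl or directly from sysfs:)

```sh
ndctl list -Rv | grep -E '"dev"|"numa_node"'
cat /sys/bus/nd/devices/region*/numa_node
```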
Thanks!