Ipmctl-03.00.00.0468 show AEP memory Non-functional and fail to upgrade AEP Firmware. #206
Comments
Q) Where did the Optane modules come from? The first task is to resolve the "Non-functional" status. This means:
I'll break down the investigation into sub-sections.
Optane Modules
Please ensure the Optane modules were sourced from a known supplier or originate from a good source. If not, it could be challenging to restore their functionality. If the modules were originally installed in another host, or hosts, a factory reset may be required.
CPU Support
The Intel Optane modules are Apache Pass (Optane 100). These modules are supported only by the Intel Xeon Cascade Lake (aka 2nd Generation Intel® Xeon® Scalable Processors). I can't determine what CPUs you have in this system as the BIOS is returning an unknown ID: From
The Socket is "LGA3647" which supports Skylake, Cascade Lake, and Xeon Phi. If you have a Skylake CPU, it will not support Optane, which would cause the issue.
ipmctl
The Optane 100 modules were intended to be used with ipmctl version 1.x.x, Optane 200 with ipmctl version 2.x.x, and Optane 300 with ipmctl version 3.x.x. You are using 3.x.x, which may not fully support the 1st generation modules. Try ipmctl version 1.x.x to see if this improves your manageability of the modules.
BIOS Support
Look in the BIOS and you should find one or more sub-menus under 'Advanced -> Memory' that allow you to correctly configure the platform in either Memory Mode or App Direct. Your OEM/BIOS vendor determines the exact location, so please take a look at their BIOS manual. This is orthogonal to Some OEMs distributed ipmctl in the UEFI also.
Suggested Next Actions
HTH
@sscargal Thanks. Q) Where did the Optane modules come from?
The CPU is Cascade Lake; CPU-Z reports the CPUID as 050655h. Only one CPU is installed in the dual-socket motherboard. Does that matter?
dmidecode Processor Information
cat /proc/cpuinfo .....
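As a cross-check of the CPUID quoted above, here is a minimal Python sketch that splits a CPUID leaf-1 signature into family, model, and stepping using Intel's documented bit layout. The function name `decode_cpuid` is hypothetical. Family 6, model 85 (0x55) is shared by Skylake-SP and Cascade Lake, so the stepping is the field that separates them. The commonly cited mapping (stepping 4 for Skylake-SP, steppings 5 to 7 for Cascade Lake) is an assumption here and should be verified against Intel's documentation.

```python
def decode_cpuid(signature: int) -> dict:
    """Split a CPUID leaf-1 EAX signature into family, model, and stepping."""
    stepping = signature & 0xF
    base_model = (signature >> 4) & 0xF
    base_family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF;
    # the extended model is used only for base family 0x6 or 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


# 050655h is the value CPU-Z reported in this thread.
info = decode_cpuid(0x050655)
print(info)  # {'family': 6, 'model': 85, 'stepping': 5}
```

The same three fields appear as `cpu family`, `model`, and `stepping` in `/proc/cpuinfo`, so the two sources can be compared directly.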
I have tried ipmctl versions 1.x.x and 2.x.x before; the modules still could not be used and stayed in the Non-functional state.
The BIOS manual shows that the Optane modules are supported.
Thanks for the background info, it helps. Given the PMem modules were previously installed in another host, this could be one reason for the current problems. The configuration of the Regions and interleaving is stored on the PMem in the Platform Config Data (PCD), and the BIOS tries to reconstruct this during POST. In the pmemchk output, the ipmctl tool is core dumping when trying to collect this information, so I can't see how many modules should be part of the interleave set. If you don't put all the PMem modules back in the same slots with their interleave friends, the memory training will fail and result in 'Non-Functional' state. Given you don't know how many PMem modules were installed per socket, I recommend trying to Factory Reset them.
I found 3 BIOS manuals under the "Config and Deploy" for your server documentation. Specifically, the H3C Servers Purley Platform Text-Mode BIOS User Guide has what you need in the "Intel(R) Optane(TM) DC Persistent Memory Configuration submenu" on Page 86. I see a 'Secure Erase' option in the Security menu (Page 101). You can try that to see if the DIMMs will Factory Reset. Page 103 shows the 'Regions submenu screen' where you can 'Delete Goal' and 'Create Goal'. You want to create a new goal for 'Memory Mode (2LM)' to switch from the current AppDirect mode. I don't see any staged goals, so the 'Delete Goal' option may not be visible or may not do anything.
Thanks for the follow-up, Steve. I would also think a secure erase/factory reset would be the thing to try.
OK, I will reboot into the BIOS to check whether Secure Erase/Factory Reset is possible.
@jlin127 - Have you been able to get your modules working? If so, please leave a comment describing what fixed your situation and close this thread.
@StevenPontsler Sorry, there are some tasks running on the server now and they will take more than three more days. I will report back as soon as possible after trying the Factory Reset.
I am trying to use two AEP memory modules on an H3C R4700 G3 platform, but they can't be used. All pmemchk reports are attached.
ndctl version: 71.1
ipmctl version: 03.00.00.0468
ipmctl show -dimm
DimmID | Capacity | LockState | HealthState | FWVersion
0x0001 | 126.742 GiB | Disabled | Non-functional | 01.00.00.4178
0x0101 | 126.742 GiB | Disabled | Non-functional | 01.00.00.4178
ipmctl show -topology
DimmID | MemoryType | Capacity | PhysicalID| DeviceLocator
0x0001 | Logical Non-Volatile Device | 0.000 GiB | 0x003c | CPU0_A1
0x0101 | Logical Non-Volatile Device | 0.000 GiB | 0x0045 | CPU0_D1
N/A | DDR4 | 0.000 GiB | 0x003b | CPU0_A0
N/A | DDR4 | 16.000 GiB | 0x003e | CPU0_B0
N/A | DDR4 | 16.000 GiB | 0x0041 | CPU0_C0
N/A | DDR4 | 0.000 GiB | 0x0044 | CPU0_D0
N/A | DDR4 | 16.000 GiB | 0x0047 | CPU0_E0
N/A | DDR4 | 16.000 GiB | 0x004a | CPU0_F0
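To make the topology table easier to scan, here is a minimal Python sketch that parses the pipe-delimited output above and lists every slot reporting zero capacity. The helper name `parse_pipe_table` is hypothetical, and the table text is pasted in as a string rather than read from a live `ipmctl` call.

```python
TOPOLOGY = """\
DimmID | MemoryType | Capacity | PhysicalID| DeviceLocator
0x0001 | Logical Non-Volatile Device | 0.000 GiB | 0x003c | CPU0_A1
0x0101 | Logical Non-Volatile Device | 0.000 GiB | 0x0045 | CPU0_D1
N/A | DDR4 | 0.000 GiB | 0x003b | CPU0_A0
N/A | DDR4 | 16.000 GiB | 0x003e | CPU0_B0
N/A | DDR4 | 16.000 GiB | 0x0041 | CPU0_C0
N/A | DDR4 | 0.000 GiB | 0x0044 | CPU0_D0
N/A | DDR4 | 16.000 GiB | 0x0047 | CPU0_E0
N/A | DDR4 | 16.000 GiB | 0x004a | CPU0_F0
"""


def parse_pipe_table(text):
    """Turn a pipe-delimited table with a header row into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    return [
        dict(zip(header, (cell.strip() for cell in ln.split("|"))))
        for ln in lines[1:]
    ]


def capacity_gib(row):
    """Extract the numeric part of a value such as '16.000 GiB'."""
    return float(row["Capacity"].split()[0])


rows = parse_pipe_table(TOPOLOGY)
zero_capacity = [r["DeviceLocator"] for r in rows if capacity_gib(r) == 0.0]
ddr4_total = sum(capacity_gib(r) for r in rows if r["MemoryType"] == "DDR4")

print(zero_capacity)  # ['CPU0_A1', 'CPU0_D1', 'CPU0_A0', 'CPU0_D0']
print(ddr4_total)     # 64.0
```

Assuming the locator letter names the memory channel, the two DDR4 DIMMs that report 0 GiB (CPU0_A0 and CPU0_D0) sit in the same channels as the two PMem modules (CPU0_A1 and CPU0_D1).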
ipmctl show -dimm -sensor
DimmID | Type | CurrentValue
0x0001 | Health | Fatal failure
0x0001 | MediaTemperature | 43C
0x0001 | ControllerTemperature | 46C
0x0001 | PercentageRemaining | 100%
0x0001 | LatchedDirtyShutdownCount | 1
0x0001 | PowerOnTime | 844073s
0x0001 | UpTime | 422022s
0x0001 | PowerCycles | 113
0x0001 | FwErrorCount | 0
0x0001 | UnlatchedDirtyShutdownCount | 42
0x0101 | Health | Fatal failure
0x0101 | MediaTemperature | 47C
0x0101 | ControllerTemperature | 48C
0x0101 | PercentageRemaining | 100%
0x0101 | LatchedDirtyShutdownCount | 0
0x0101 | PowerOnTime | 439924s
0x0101 | UpTime | 422025s
0x0101 | PowerCycles | 32
0x0101 | FwErrorCount | 0
0x0101 | UnlatchedDirtyShutdownCount | 7
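The sensor values are easier to compare per module once the durations are converted from seconds to days. Below is a minimal Python sketch that does this for a subset of the rows above; the helper names `group_sensors` and `seconds_to_days` are hypothetical.

```python
from collections import defaultdict

# A subset of the sensor rows shown above, pasted in as text.
SENSOR_ROWS = """\
0x0001 | PowerOnTime | 844073s
0x0001 | UpTime | 422022s
0x0001 | UnlatchedDirtyShutdownCount | 42
0x0101 | PowerOnTime | 439924s
0x0101 | UpTime | 422025s
0x0101 | UnlatchedDirtyShutdownCount | 7
"""


def group_sensors(text):
    """Group 'DimmID | Type | CurrentValue' rows into {dimm: {type: value}}."""
    sensors = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        dimm, name, value = (cell.strip() for cell in line.split("|"))
        sensors[dimm][name] = value
    return dict(sensors)


def seconds_to_days(value):
    """Convert a value such as '844073s' to days."""
    return int(value.rstrip("s")) / 86400


sensors = group_sensors(SENSOR_ROWS)
for dimm, values in sensors.items():
    power_on = seconds_to_days(values["PowerOnTime"])
    uptime = seconds_to_days(values["UpTime"])
    print(f"{dimm}: power-on {power_on:.2f} d, uptime {uptime:.2f} d, "
          f"unlatched dirty shutdowns {values['UnlatchedDirtyShutdownCount']}")
```

Both modules show nearly identical uptime, while module 0x0001 has roughly twice the power-on time and many more unlatched dirty shutdowns than 0x0101. That may indicate the two modules have different histories.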
Then I tried to upgrade the AEP firmware. It also failed.
ipmctl load -source ./fw_ekvb0_1.2.0.5446_rel.bin -dimm
Starting update on 2 PMem module(s)...
Load FW failed: Error 2 - Command not run
pmemchk-log.zip