I can't create provisioning goal #193

Closed
zktsd813 opened this issue May 27, 2022 · 6 comments

Comments

@zktsd813

With the newest ipmctl version, I tried to provision our Optane modules in AppDirect mode, but it doesn't work.

# ipmctl show -memoryresources
 MemoryType   | DDR         | PMemModule  | Total
 Volatile     | 320.000 GiB | 0.000 GiB   | 320.000 GiB
 AppDirect    | -           | 0.000 GiB   | 0.000 GiB
 Cache        | 0.000 GiB   | -           | 0.000 GiB
 Inaccessible | 0.000 GiB   | 506.969 GiB | 506.969 GiB
 Physical     | 320.000 GiB | 506.969 GiB | 826.969 GiB

# ipmctl create -goal PersistentMemoryType=AppDirect
The following configuration will be applied:
 SocketID | DimmID | MemorySize | AppDirect1Size | AppDirect2Size
==================================================================
 0x0000   | 0x0010 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0110 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0210 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0310 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
y
Created following region configuration goal
 SocketID | DimmID | MemorySize | AppDirect1Size | AppDirect2Size
==================================================================
 0x0000   | 0x0010 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0110 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0210 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0310 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
A reboot is required to process new memory allocation goals.

After the reboot, there is no change in the memory resources.
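One way to confirm that nothing changed across the reboot is to parse the `ipmctl show -memoryresources` table and check whether the PMem capacity is still reported as Inaccessible. The following is a minimal sketch that works on the text pasted above; the function name `parse_memory_resources` is hypothetical, and the parser assumes the pipe-separated layout shown in this issue.

```python
def parse_memory_resources(text):
    """Map each MemoryType row to {column: GiB as float, or None for '-'}."""
    rows = [
        [cell.strip() for cell in line.split("|")]
        for line in text.strip().splitlines()
        if "|" in line
    ]
    header, body = rows[0], rows[1:]
    table = {}
    for row in body:
        values = {}
        for name, cell in zip(header[1:], row[1:]):
            # Cells look like "506.969 GiB"; "-" means not applicable.
            values[name] = None if cell == "-" else float(cell.split()[0])
        table[row[0]] = values
    return table


# Output copied from the report above.
sample = """\
 MemoryType   | DDR         | PMemModule  | Total
 Volatile     | 320.000 GiB | 0.000 GiB   | 320.000 GiB
 AppDirect    | -           | 0.000 GiB   | 0.000 GiB
 Cache        | 0.000 GiB   | -           | 0.000 GiB
 Inaccessible | 0.000 GiB   | 506.969 GiB | 506.969 GiB
 Physical     | 320.000 GiB | 506.969 GiB | 826.969 GiB
"""

resources = parse_memory_resources(sample)
inaccessible = resources["Inaccessible"]["PMemModule"]
app_direct = resources["AppDirect"]["PMemModule"]
if inaccessible > 0 and app_direct == 0:
    print(f"{inaccessible} GiB of PMem is inaccessible; no AppDirect capacity is mapped")
```

Running the same check on output captured before and after the reboot would show whether the goal was applied.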

# ipmctl show -system pcat
   CreatorRevision: 0x20091013
   ---TableType=0x0
      Length: 16 bytes
      TypeEquals: PlatformCapabilityInfoTable
      PMemModuleMgmtSWConfigInputSupport: 0x1 (Yes)
      MemoryModeCapabilities: 0x7 (1LM, 2LM, AppDirect)
      CurrentMemoryMode: 0x10
         -Current Volatile Memory Mode: 1LM
         -Allowed Persistent Memory Mode: None
         -Allowed Volatile Memory Mode: 1LM or 2LM
      MaxPMInterleaveSets: 0x28
         -Per CPU Die: 0x8
         -Per PMem module: 0x2

OS : Ubuntu 20.04.2 LTS
CPU : Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz
two socket system.

It seems that there is no allowed persistent memory mode.
How can I fix this?
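The PCAT observation can be checked mechanically. Below is a small sketch that extracts the "Allowed Persistent Memory Mode" field from `ipmctl show -system pcat` output; the function name `allowed_persistent_mode` is hypothetical, and the sample text is copied from the output above. It only reports what the table says and does not establish whether that value is the root cause.

```python
import re


def allowed_persistent_mode(pcat_text):
    """Return the 'Allowed Persistent Memory Mode' value, or None if the field is absent."""
    match = re.search(r"Allowed Persistent Memory Mode:\s*(.+)", pcat_text)
    return match.group(1).strip() if match else None


# Excerpt copied from the PCAT output above.
sample = """\
      CurrentMemoryMode: 0x10
         -Current Volatile Memory Mode: 1LM
         -Allowed Persistent Memory Mode: None
         -Allowed Volatile Memory Mode: 1LM or 2LM
"""

mode = allowed_persistent_mode(sample)
if mode is None:
    print("Field not found in PCAT output")
elif mode.lower() == "none":
    print("PCAT reports no allowed persistent memory mode")
else:
    print(f"Allowed persistent memory mode: {mode}")
```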

@nolanhergert
Contributor

nolanhergert commented May 27, 2022

Interesting, I've never seen that before! Since ipmctl is allowing you to create the goal, the PCAT table value of "None" looks a little funny but is not actually limiting you. It is "AppDirect" on my system.

I would run ipmctl start -diagnostic and potentially ipmctl show -pcd and see what they say about why BIOS is not provisioning those modules.

My guess is that there's a BIOS setting you need to change to allow 1LM provisioning or you don't have the modules in a POR configuration. There might be a knob for the latter located at "Socket Configuration → Memory Configuration → Enforce Population POR", but likely in both cases you'll need to ask your hardware vendor for assistance. Let me know what you find out!

@sscargal
Contributor

sscargal commented Jun 1, 2022

@zktsd813 What OEM/ODM server are you using?

For a two-socket system, I would expect the PMem modules to be physically installed on both sockets, two on each socket. The output from creating the goal lists all four PMem modules on Socket 0 only, so, as Nolan alluded to, this could be outside the validated configuration, and the BIOS may therefore refuse to train the memory correctly. If so, you should see an error or message early in POST and/or in the platform manager logs (BMC, iDRAC, iLO, etc.).
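The socket distribution described above can be read straight from the goal table. This is a rough sketch, assuming the pipe-separated table layout printed by `ipmctl create -goal` in this issue; the function name `modules_per_socket` and the `EXPECTED_SOCKETS` constant are hypothetical.

```python
from collections import Counter


def modules_per_socket(goal_text):
    """Count PMem modules per SocketID in an ipmctl goal table."""
    counts = Counter()
    for line in goal_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows start with a hex SocketID such as 0x0000;
        # the header and the "====" separator are skipped.
        if len(cells) >= 2 and cells[0].startswith("0x"):
            counts[int(cells[0], 16)] += 1
    return counts


# Goal table copied from the original report.
sample = """\
 SocketID | DimmID | MemorySize | AppDirect1Size | AppDirect2Size
==================================================================
 0x0000   | 0x0010 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0110 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0210 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
 0x0000   | 0x0310 | 0.000 GiB  | 126.000 GiB    | 0.000 GiB
"""

EXPECTED_SOCKETS = 2  # assumption: the reporter described a two-socket system

counts = modules_per_socket(sample)
if len(counts) < EXPECTED_SOCKETS:
    print(f"PMem modules reported on {len(counts)} of {EXPECTED_SOCKETS} sockets: {dict(counts)}")
```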

I also see the PMem is "Inaccessible" which is similar to the issue discussed in #153. There's a recommendation/suggested action in the last note of that issue from spawnflagger. See if that helps.

@zktsd813
Author

zktsd813 commented Jun 2, 2022

@nolanhergert Thanks. I have checked our PMem modules, and they pass all the tests:

--Test = Quick
   State = Ok
   Message = The quick health check succeeded.
   --SubTest = Manageability
      State = Ok
   --SubTest = Boot status
      State = Ok
   --SubTest = Health
      State = Ok
      Message.1 = The quick health check detected that the platform FW did not map a region to SPA on PMem module 0x0010. ACPI NFIT NVDIMM State Flags Error Bit 6 Set
      Message.2 = The quick health check detected that the platform FW did not map a region to SPA on PMem module 0x0110. ACPI NFIT NVDIMM State Flags Error Bit 6 Set
      Message.3 = The quick health check detected that the platform FW did not map a region to SPA on PMem module 0x0210. ACPI NFIT NVDIMM State Flags Error Bit 6 Set
      Message.4 = The quick health check detected that the platform FW did not map a region to SPA on PMem module 0x0310. ACPI NFIT NVDIMM State Flags Error Bit 6 Set

--Test = Config
   State = Ok
   Message = The platform configuration check succeeded.
   --SubTest = PMem module specs
      State = Ok
   --SubTest = Duplicate PMem module
      State = Ok
   --SubTest = System Capability
      State = Ok
   --SubTest = Namespace LSA
      State = Ok
   --SubTest = PCD
      State = Ok

--Test = Security
   State = Ok
   Message = The security check succeeded.
   --SubTest = Encryption status
      State = Ok
   --SubTest = Inconsistency
      State = Ok

--Test = FW
   State = Ok
   Message = The firmware consistency and settings check succeeded.
   --SubTest = FW Consistency
      State = Ok
   --SubTest = Viral Policy
      State = Ok
   --SubTest = Threshold check
      State = Ok
   --SubTest = System Time
      State = Ok

Also, I have checked the BIOS, and it shows that the Enforce POR option is enabled.
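Note that the Health subtest above reports State = Ok while still carrying messages about modules that the platform FW did not map to SPA. A short sketch for pulling those module IDs out of `ipmctl start -diagnostic` output follows; the function name `unmapped_modules` is hypothetical, and the pattern assumes the message wording shown in this issue.

```python
import re


def unmapped_modules(diagnostic_text):
    """Return IDs of PMem modules the platform FW did not map to SPA."""
    pattern = r"did not map a region to SPA on PMem module (0x[0-9A-Fa-f]+)"
    return re.findall(pattern, diagnostic_text)


# Two of the four messages from the diagnostic output above.
sample = """\
   --SubTest = Health
      State = Ok
      Message.1 = The quick health check detected that the platform FW did not map a region to SPA on PMem module 0x0010. ACPI NFIT NVDIMM State Flags Error Bit 6 Set
      Message.2 = The quick health check detected that the platform FW did not map a region to SPA on PMem module 0x0110. ACPI NFIT NVDIMM State Flags Error Bit 6 Set
"""

modules = unmapped_modules(sample)
if modules:
    print("Not mapped to SPA:", ", ".join(modules))
```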

@nolanhergert
Contributor

I would try disabling the Enforce POR knob if you haven't already and see if that fixes your issue. If not, then maybe you need to enable BIOS logging and see what shows up.

@sscargal
Contributor

I agree with Nolan that this is likely to be a BIOS setting. One such setting to check is Advanced -> Memory Configuration -> Volatile Memory Mode = 1LM/2LM/Auto. You want to set this to 'Auto'. If it's currently set to 2LM (Memory Mode), the BIOS will enforce this configuration regardless of what configuration is written to the PMem modules, i.e., what you requested with ipmctl create -goal ...

Volatile Memory Mode
Value: 1LM/2LM/Auto
Help Text: Selects whether 1LM or 2LM memory mode. If 2LM Volatile Memory Mode, BIOS will try to configure 2LM but if BIOS is unable to configure 2LM, volatile memory mode will fall back to 1LM. 1LM+2LM will enable the 'DDR Cache' option. When 1LM + 2LM option is selected, the UEFI FW will use the DDR Cache Size option to determine the DDR Cache Side for each populated channel. Any remaining DDR will be mapped as 1LM memory. 

You could reset the BIOS to factory defaults, which should allow the BIOS to read and implement the goal configuration written to the PMem modules.

Would you mind running my pmemchk tool to see if it detects anything? At a minimum, it'll collect some data we can use to help troubleshoot. Though it does not collect BIOS information.

$ git clone https://github.com/sscargal/pmemchk
$ cd pmemchk
$ sudo ./pmemchk

This will collect data and analyze it. The collected data will be written to a new directory, and the analyzer will print PASS | FAIL | INFO messages to STDOUT. An example is in the README. If you encounter issues or errors, please report them.

You'll need to tar.gz the output directory and attach it to this issue, please. Other than 'messages', there should be no user-identifiable data collected.
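If a scripted way to build the archive is preferred, the sketch below packs a directory into a .tar.gz using Python's standard `tarfile` module. The helper name `archive_directory` and the demo directory name are hypothetical; the real name of the pmemchk output directory is not stated in this thread, so the demo runs on a throwaway directory.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_directory(directory):
    """Create <directory>.tar.gz next to the directory and return its path."""
    directory = Path(directory)
    if not directory.is_dir():
        raise NotADirectoryError(directory)
    archive_path = directory.parent / (directory.name + ".tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(directory, arcname=directory.name)
    return archive_path


# Demo on a temporary stand-in directory (hypothetical name and contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "pmemchk_output_example"
    demo_dir.mkdir()
    (demo_dir / "placeholder.txt").write_text("placeholder\n")
    archive = archive_directory(demo_dir)
    with tarfile.open(archive) as tar:
        print(tar.getnames())
```

Review the contents, particularly 'messages', before attaching the archive.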

@zktsd813
Author

zktsd813 commented Jul 5, 2022

Thank you all. I fixed this issue by resetting the BIOS settings to factory defaults.

zktsd813 closed this as completed Jul 5, 2022