From a6c62330f8b08032434f7cf9587ac5bfb79ffe91 Mon Sep 17 00:00:00 2001 From: Stewart Smith Date: Wed, 28 Mar 2018 15:25:31 +1100 Subject: [PATCH] skiboot 5.10.3 release notes Signed-off-by: Stewart Smith --- doc/release-notes/skiboot-5.10.3.rst | 82 ++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) create mode 100644 doc/release-notes/skiboot-5.10.3.rst diff --git a/doc/release-notes/skiboot-5.10.3.rst b/doc/release-notes/skiboot-5.10.3.rst new file mode 100644 index 000000000000..0dc87ed31def --- /dev/null +++ b/doc/release-notes/skiboot-5.10.3.rst @@ -0,0 +1,82 @@ +.. _skiboot-5.10.3: + +============== +skiboot-5.10.3 +============== + +skiboot 5.10.3 was released on Thursday March 28th, 2018. It replaces +:ref:`skiboot-5.10.2` as the current stable release in the 5.10.x series. + +It is recommended that 5.10.3 be used instead of any previous 5.10.x version +due to the bug fixes and debugging enhancements in it. + +Over :ref:`skiboot-5.10.2`, we have a few improvements and bug fixes: + +- NPU2: dump NPU2 registers on npu2 HMI + + Due to the nature of debugging npu2 issues, folk are wanting the + full list of NPU2 registers dumped when there's a problem. + + This is different than the solution introduced in 5.10.1 + as there we would dump the registers in a way that would trigger a FIR + bit that would confuse PRD. +- npu2: Add performance tuning SCOM inits + + Peer-to-peer GPU bandwidth latency testing has produced some tunable + values that improve performance. Add them to our device initialization. + + File these under things that need to be cleaned up with nice #defines + for the register names and bitfields when we get time. + + A few of the settings are dependent on the system's particular NVLink + topology, so introduce a helper to determine how many links go to a + single GPU. +- hw/npu2: Assign a unique LPARSHORTID per GPU + + This gets used elsewhere to index items in the XTS tables. +- occ: Set up OCC messaging even if we fail to setup pstates + + This means that we no longer hit this bug if we fail to get valid pstates + from the OCC. :: + + [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear + echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear + [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 + [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 + [ 10.318805] Disabling lock debugging due to kernel taint + [ 10.318808] Severe Machine check interrupt [Not recovered] + [ 10.318812] NIP [000000003003e434]: 0x3003e434 + [ 10.318813] Initiator: CPU + [ 10.318815] Error type: Real address [Load/Store (foreign)] + [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception + [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3 + [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240 + [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1) + [ 10.318826] MSR: 9000000000201002 CR: 48002888 XER: 20040000 + [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1 +- core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors + + This disables fast reboot in several more cases where serious errors + like lock corruption or call re-entrancy are detected. +- core/opal: allow some re-entrant calls + + This allows a small number of OPAL calls to succeed despite re-entering + the firmware, and rejects others rather than aborting. + + This allows a system reset interrupt that interrupts OPAL to do something + useful. Sreset other CPUs, use the console, which allows xmon to work or + stack traces to be printed, reboot the system. + + Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is + used for many other things that does not mean a serious permanent error. +- core/opal: abort in case of re-entrant OPAL call + + The stack is already destroyed by the time we get here, so there + is not much point continuing. +- npu2: Disable fast reboot + + Fast reboot does not yet work right with the NPU. It's been disabled on + NVLink and OpenCAPI machines. Do the same for NVLink2. + + This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset") + from the npu code to npu2.