-
I'm still familiarizing myself with the JSON spec. Here's how I'm making sense of the proposal: I conceptualize the LMT hierarchy as System::Link::TorV::Port::Lane::Offset. The testStep corresponds to the Port scope, because a Port has one common test spec and Ports are tested sequentially. At the other end of the hierarchy, the lowest level of an LMR measurement has the following raw data: step (with +/- direction), status (response payload[7:6]), error count, and sample count. That doesn't align with the MeasurementSeriesElement vs. validator scheme. I see why we have to interpret it into a "bit errors" measurement. Do we plan to keep the raw data in this output? I feel it's more appropriate to assign the lane number, instead of the BDF, to the measurementSeriesId. Then each measurementSeriesElement can represent an offset. Overall, the LMT outputs many more details, which we need to tuck away in the extension fields. The LMR reported capability parameters may go into the MeasurementSeriesStart.Metadata. We should standardize the format for that.
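To make the lane-as-series idea concrete, here is a minimal sketch of what one lane's output could look like, with the lane number as the measurementSeriesId and one measurementSeriesElement per offset. The artifact field names follow my reading of the OCP output spec and should be checked against it; the series ID format, the "bit errors" name, and the `offset_step` metadata key are assumptions for illustration, and sequence numbers and timestamps are left out for brevity.

```python
import json


def lane_series_artifacts(test_step_id, lane, points):
    """Yield OCP-style artifacts for one lane; each margin offset is one element.

    `points` is a list of (offset_step, error_count) tuples.
    The series ID format and metadata key are hypothetical.
    """
    series_id = f"lane{lane}"

    def wrap(body):
        # Every artifact in this sketch belongs to the same test step (one Port).
        return {"testStepArtifact": {"testStepId": test_step_id, **body}}

    yield wrap({"measurementSeriesStart": {
        "measurementSeriesId": series_id,
        "name": "bit errors",
    }})
    for index, (offset, errors) in enumerate(points):
        yield wrap({"measurementSeriesElement": {
            "measurementSeriesId": series_id,
            "index": index,
            "value": errors,
            "metadata": {"offset_step": offset},  # signed step, +/- direction
        }})
    yield wrap({"measurementSeriesEnd": {
        "measurementSeriesId": series_id,
        "totalCount": len(points),
    }})


artifacts = list(
    lane_series_artifacts("0", lane=3, points=[(-2, 5), (-1, 0), (1, 0), (2, 7)])
)
for artifact in artifacts:
    print(json.dumps(artifact))
```

The raw status and sample count are not carried here; they would need either extra metadata keys or separate measurements, which is the open question above.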
-
Summarizing the proposal provided by Dan via email:
Summarizing the discussion from the 10/5 meeting with Dan (Google), Hua (Google), Francesco (Meta), Leland (Meta), Adrian (Meta), and Sathish (Meta):
Next steps:
-
@sksekar, Adrian, and I had a 2.5-hour meeting on 2024-02-23. Here are my takeaways from the meeting: The spec we are defining is meant to help the consumer of the diag output interpret, process, and present the test result. The diags themselves may also play a role in processing and presenting the readings from the hardware. The spec should support a variety of LMT diags and use cases, so it needs to be as extensible and unrestrictive as possible. Our approach is for each of us to come up with a spec proposal that suits our existing diags: https://github.com/opencomputeproject/ocp-diag-pci_lmt and https://github.com/google/pcie_lmt. We then resolve any conflicts between them and find common ground. We have agreed on a hierarchy of the LMT subjects (LMT is the diag. LMR is the PCI-SIG-specified PCIe feature. LMT conducts LMR.):
At the most basic level, an LMR step margin operation returns three readings from the hardware: status, error_count, and sample_count. We should start by mapping those to the output spec. This enables raw measurement collection. Here are a few rules of thumb I learned about mapping info to the output spec:
Based on the above takeaways from the meeting, here's my proposal. Let's first review the ideas; the details need to be refined. An LMT subject is mapped to a subcomponent. Here's an example:
The name The "location" parser pattern for "PCIELMT-MARGINPOINT-PCI" is
There are three measurements per a
The value is specified by the PCIe spec: 3: NAK; 2: Margining in progress; 1: Set up for margin in progress; 0: Too many errors.
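For reference, a minimal sketch of decoding the status from a step margin response payload byte. The status names and the payload[7:6] position come from this thread; placing the error count in payload[5:0] is my assumption from reading the PCIe spec and should be verified, and the function and class names are made up for illustration.

```python
from dataclasses import dataclass

# Status codes as listed above (response payload[7:6]).
STATUS_NAMES = {
    3: "NAK",
    2: "Margining in progress",
    1: "Set up for margin in progress",
    0: "Too many errors",
}


@dataclass(frozen=True)
class StepMarginReading:
    status: int
    status_name: str
    error_count: int


def decode_step_margin_payload(payload: int) -> StepMarginReading:
    """Split an 8-bit response payload into status and error count."""
    status = (payload >> 6) & 0x3
    # Assumption: the error count occupies payload[5:0].
    error_count = payload & 0x3F
    return StepMarginReading(status, STATUS_NAMES[status], error_count)


reading = decode_step_margin_payload(0b10_000011)
print(reading)
```

Emitting the numeric status as the measurement value and the name as metadata would keep the raw reading intact while staying readable in a viewer.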
There is another set of raw measurements, which are the RX LMR parameters read from the lane. These include Those measurements have the RXLANE subcomponent:
Also at this RXLANE level, the diag can output processed measurements, such as I can't think of anything that needs to be specified at the TestStep level. One consideration is that the volume of measurement output can be large if we require the raw measurements.
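As one example of a processed measurement at the lane level, a BER estimate can be derived from the raw error_count and sample_count. This is only a sketch: it assumes the hardware reports sample_count as 3*log2(number of bits margined), which is my reading of the PCIe spec and should be confirmed, and the function names are hypothetical.

```python
def bits_margined(sample_count: int) -> float:
    """Convert the reported sample count to a number of bits.

    Assumption: sample_count = 3 * log2(bits margined).
    """
    return 2.0 ** (sample_count / 3.0)


def bit_error_rate(error_count: int, sample_count: int) -> float:
    """Estimate BER at one margin point as errors divided by bits margined."""
    return error_count / bits_margined(sample_count)


# Example: 4 errors with a reported sample count of 30 (2**10 bits).
print(bit_error_rate(4, 30))
```

A point with zero errors yields 0 here, which really only means the BER is below 1 / bits_margined(sample_count), so the sample count is still worth keeping next to the result.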
-
Thanks @mimir-d. Some comments/follow-up
SGTM. My preference would be to use
Looking closer, I think we can surface this information as part of the Device/Margining Capabilities measurement, which is already surfaced per Device.
Yes, that's correct. In our case, the user is allowed to select the goal (eye-scan vs. spot-check) using the config file provided as input.
Yes, there is no restriction on doing both (eye-scan and spot-check) as different steps.
-
As discussed, attached is an output example for our discussion:
-
We saw the ocp-diag-core-viewer demo a few weeks ago. With that in mind, I tuned the pcie_lmt OCP output to what I'd like to see as a user. This pcie_lmt_ocp.json is a sample output. I can also demo it in the ocp-diag-core-viewer in the meeting.

The pcie_lmt can stream the OCP artifacts to a file or a named pipe. Our use case has a diag-runner which creates this OCP pipe and listens to it. It converts the PCIe-domain BDF info to the DUT-specific HW-Info. This way, the pcie_lmt can stay generic and only PCI-SIG-aware. The diag-runner is also generic in the sense that it can run various diags as a sub-process.

The pcie_lmt runs parallel TestSteps, each of which maps to an RX port. Within each TestStep, the lanes also run in parallel. I'm counting on the sorting and filtering features of the result viewers.

Instead of dumping all the raw measurements, I now only output what "matters". The raw measurements are still dumped in a log for reference. As a user, I'd like to see the interpreted results upfront, so the pcie_lmt outputs eye size, eye corner margin, BER, and/or status. Irrelevant and/or implied information, such as 0-error margin points in an eye-scan, is omitted. Still, there is more info to fit in a Measurement artifact. I'm overloading the
-
Objective
Add support for emitting test results compliant with the OCP Test and Validation Output Specification from the PCIe LMT diagnostic tool.
Background
A sample LMT test run performs:
Current Output Format
CSV format
JSON format
Proposal
The current proposal is to use:
Sample Execution
pci_lmt -o ocp config_file
To Be Discussed