oxidecomputer/management-gateway-service#370 adds code to the
`gateway-messages` and `gateway-sp-comms` crates to implement the MGS
side of the ereport ingestion protocol. For more information on the
protocol itself, refer to the following RFDs:
- [RFD 520 Control Plane Fault Ingestion and Data Model][RFD 520]
- [RFD 544 Embedded E-Report Formats][RFD 544]
- [RFD 545 Firmware E-Report Aggregation and Evacuation][RFD 545]
This branch integrates the changes from those crates into the actual
MGS application and adds simulated ereports to the SP simulator. I've
added some simple tests that exercise the simulated ereports.
In addition, this branch restructures the initial implementation of
the control plane ereport API I added in #7833. That branch proposed
a single dropshot API that would be implemented by both sled-agent and
MGS. This was possible because the initial design would have indexed all
ereport producers (reporters) by a UUID. However, per recent
conversations with @cbiffle and @jgallagher, we've determined that Nexus
will instead request ereports from service processors indexed by SP
physical topology (e.g. type and slot), like the rest of the MGS HTTP
API. Therefore, we can no longer have a single HTTP API for ereporters
that's implemented by both MGS and sled-agents, and instead, SP ereport
ingestion should be a new endpoint on the MGS API.
This branch does that, moving the ereport query params into
`ereport-types`, eliminating the separate `ereport-api` and
`ereport-client` crates, and adding an ereport-ingestion-by-SP-location
endpoint to the management gateway API.
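As a rough illustration of what "indexed by SP physical topology" means
in practice, here is a minimal sketch of addressing an SP by type and
slot rather than by UUID. The names `SpType`, `SpLocation`, and
`ereports_path`, as well as the URL shape, are assumptions made for this
example and are not taken from the actual MGS API definition.

```rust
use std::fmt;

/// Hypothetical stand-in for the kinds of SP that MGS can address.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SpType {
    Sled,
    Switch,
    Power,
}

impl fmt::Display for SpType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Lowercase names so the type can be embedded in a URL path.
        let s = match self {
            SpType::Sled => "sled",
            SpType::Switch => "switch",
            SpType::Power => "power",
        };
        f.write_str(s)
    }
}

/// Hypothetical physical location of an SP: its type plus its slot.
/// This replaces a per-reporter UUID as the lookup key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SpLocation {
    pub sp_type: SpType,
    pub slot: u16,
}

/// Builds an illustrative request path for an SP's ereports.
/// The path layout is assumed, not copied from the real API.
pub fn ereports_path(sp: SpLocation) -> String {
    format!("/sp/{}/{}/ereports", sp.sp_type, sp.slot)
}
```

With a key of this shape, a caller names the SP by where it sits in the
rack, which is how the rest of the MGS HTTP API already identifies SPs.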
Furthermore, there are some terminology changes. The ereport
protocol has a value which we've variously referred to as an "instance
ID", a "generation ID", and a "restart nonce", all of which have
unfortunate name collisions that are potentially confusing or just
unpleasant. We've agreed to refer to this value everywhere as a
"restart ID", so this commit also renames it accordingly.
[RFD 520]: https://rfd.shared.oxide.computer/rfd/0520
[RFD 544]: https://rfd.shared.oxide.computer/rfd/0544
[RFD 545]: https://rfd.shared.oxide.computer/rfd/0545