Commit 5982442

[mgs] API for ingesting ereports from SPs (#7903)
oxidecomputer/management-gateway-service#370 adds code to the `gateway-messages` and `gateway-sp-comms` crates to implement the MGS side of the ereport ingestion protocol. For more information on the protocol itself, refer to the following RFDs:

- [RFD 520 Control Plane Fault Ingestion and Data Model][RFD 520]
- [RFD 544 Embedded E-Report Formats][RFD 544]
- [RFD 545 Firmware E-Report Aggregation and Evacuation][RFD 545]

This branch integrates the changes from those crates into the actual MGS application, as well as adding simulated ereports to the SP simulator. I've added some simple tests based on this.

In addition, this branch restructures the initial implementation of the control plane ereport API I added in #7833. That branch proposed a single dropshot API that would be implemented by both sled-agent and MGS. This was possible because the initial design would have indexed all ereport producers (reporters) by a UUID. However, per recent conversations with @cbiffle and @jgallagher, we've determined that Nexus will instead request ereports from service processors indexed by SP physical topology (e.g. type and slot), like the rest of the MGS HTTP API. Therefore, we can no longer have a single HTTP API for ereporters that's implemented by both MGS and sled-agents, and instead, SP ereport ingestion should be a new endpoint on the MGS API. This branch does that, moving the ereport query params into `ereport-types`, eliminating the separate `ereport-api` and `ereport-client` crates, and adding an ereport-ingestion-by-SP-location endpoint to the management gateway API.

Furthermore, there are some terminology changes. The ereport protocol has a value which we've variously referred to as an "instance ID", a "generation ID", and a "restart nonce", all of which have unfortunate name collisions that are potentially confusing or just unpleasant. We've agreed to refer to this value everywhere as a "restart ID", so this commit also changes that.

[RFD 520]: https://rfd.shared.oxide.computer/rfd/0520
[RFD 544]: https://rfd.shared.oxide.computer/rfd/0544
[RFD 545]: https://rfd.shared.oxide.computer/rfd/0545
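
As a rough illustration of the addressing change described in the commit message, the sketch below keys an ereport request by SP type and slot rather than by a reporter UUID, and carries the value now called a "restart ID". Every name here (`SpType`, `SpLocation`, `RestartId`, `ereports_path`), the URL shape, and the `restart_id` query parameter are assumptions made for illustration; they are not taken from the `gateway-api` or `ereport-types` crates.

```rust
use std::fmt;

/// Hypothetical stand-in for the kind of service processor being addressed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpType {
    Sled,
    Switch,
    Power,
}

impl fmt::Display for SpType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            SpType::Sled => "sled",
            SpType::Switch => "switch",
            SpType::Power => "power",
        })
    }
}

/// Hypothetical physical-topology key: an SP is named by its type and slot,
/// not by a UUID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpLocation {
    pub sp_type: SpType,
    pub slot: u16,
}

/// Hypothetical newtype for the value the commit renames to "restart ID".
/// The 128-bit width is an assumption of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RestartId(pub u128);

/// Builds an illustrative request path for one SP's ereports.
/// The path layout and query parameter name are assumptions, not the real API.
pub fn ereports_path(loc: SpLocation, restart_id: RestartId) -> String {
    format!(
        "/sp/{}/{}/ereports?restart_id={:x}",
        loc.sp_type, loc.slot, restart_id.0
    )
}
```

Under these assumptions, a caller that knows only where an SP sits in the rack can build a request without first looking up a per-reporter UUID, which is the property the commit message gives as the reason for moving the endpoint into the MGS API.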
1 parent b3ff3f4 commit 5982442

42 files changed

+1770
-729
lines changed

Cargo.lock

+63-55

Cargo.toml

+6-8
@@ -13,7 +13,6 @@ members = [
 "clients/cockroach-admin-client",
 "clients/ddm-admin-client",
 "clients/dns-service-client",
-"clients/ereport-client",
 "clients/gateway-client",
 "clients/installinator-client",
 "clients/nexus-client",
@@ -53,7 +52,6 @@ members = [
 "dns-server",
 "dns-server-api",
 "end-to-end-tests",
-"ereport/api",
 "ereport/types",
 "gateway",
 "gateway-api",
@@ -164,7 +162,6 @@ default-members = [
 "clients/cockroach-admin-client",
 "clients/ddm-admin-client",
 "clients/dns-service-client",
-"clients/ereport-client",
 "clients/gateway-client",
 "clients/installinator-client",
 "clients/nexus-client",
@@ -206,7 +203,6 @@ default-members = [
 "dns-server",
 "dns-server-api",
 "end-to-end-tests",
-"ereport/api",
 "ereport/types",
 "gateway",
 "gateway-api",
@@ -419,8 +415,6 @@ dpd-client = { git = "https://github.com/oxidecomputer/dendrite" }
 dropshot = { version = "0.16.0", features = [ "usdt-probes" ] }
 dyn-clone = "1.0.19"
 either = "1.14.0"
-ereport-api = { path = "ereport/api" }
-ereport-client = { path = "clients/ereport-client" }
 ereport-types = { path = "ereport/types" }
 expectorate = "1.2.0"
 fatfs = "0.3.6"
@@ -441,8 +435,10 @@ gateway-client = { path = "clients/gateway-client" }
 # is "fine", because SP/MGS communication maintains forwards and backwards
 # compatibility, but will mean that faux-mgs might be missing new
 # functionality.)
-gateway-messages = { git = "https://github.com/oxidecomputer/management-gateway-service", rev = "f9566e68e0a0ccb7c3eeea081ae1cea279c11b2a", default-features = false, features = ["std"] }
-gateway-sp-comms = { git = "https://github.com/oxidecomputer/management-gateway-service", rev = "f9566e68e0a0ccb7c3eeea081ae1cea279c11b2a" }
+#
+gateway-ereport-messages = { git = "https://github.com/oxidecomputer/management-gateway-service", rev = "57536869418e08667824c9a1b2cf115ed91b713f", default-features = false, features = ["debug-impls"] }
+gateway-messages = { git = "https://github.com/oxidecomputer/management-gateway-service", rev = "57536869418e08667824c9a1b2cf115ed91b713f", default-features = false, features = ["std"] }
+gateway-sp-comms = { git = "https://github.com/oxidecomputer/management-gateway-service", rev = "57536869418e08667824c9a1b2cf115ed91b713f" }
 gateway-test-utils = { path = "gateway-test-utils" }
 gateway-types = { path = "gateway-types" }
 gethostname = "0.5.0"
@@ -647,6 +643,7 @@ scopeguard = "1.2.0"
 secrecy = "0.10.0"
 semver = { version = "1.0.25", features = ["std", "serde"] }
 serde = { version = "1.0", default-features = false, features = [ "derive", "rc" ] }
+serde_cbor = "0.11.2"
 serde_human_bytes = { git = "https://github.com/oxidecomputer/serde_human_bytes", branch = "main" }
 serde_json = "1.0.139"
 serde_tokenstream = "0.2"
@@ -743,6 +740,7 @@ wicket-common = { path = "wicket-common" }
 wicketd-api = { path = "wicketd-api" }
 wicketd-client = { path = "clients/wicketd-client" }
 xshell = "0.2.7"
+zerocopy = "0.8.25"
 zeroize = { version = "1.8.1", features = ["zeroize_derive", "std"] }
 zip = { version = "2.6.0", default-features = false, features = ["deflate","bzip2"] }
 zone = { version = "0.3.1", default-features = false, features = ["async"] }
