Status: Draft, WIP
We'd like the collector to show how memory resource contention influences container performance. To do that, we'd need to monitor:

1. Resource contention - we can do this with `resctrl`, or by monitoring LLC misses using perf counters.
2. Container performance - the current plan is to do this by monitoring CPI (cycles per instruction).

For CPI monitoring, we'd need an inventory of containers on the system, so we can correctly instrument them as they arrive and depart. In this issue, we add a component to monitor the arrival and departure of containers in the system.
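To make the CPI half concrete, here is a minimal sketch (not part of the current code) that reads the cycles and instructions counters for a single PID via `perf_event_open` using `golang.org/x/sys/unix` and prints the resulting CPI. Attaching to a whole container would instead pass a cgroup file descriptor with `PERF_FLAG_PID_CGROUP` and open one event per CPU; the PID argument and the one-second window below are illustrative only.

```go
// cpi_sketch.go - read cycles and instructions for one PID and report CPI.
package main

import (
	"encoding/binary"
	"fmt"
	"log"
	"os"
	"strconv"
	"time"
	"unsafe"

	"golang.org/x/sys/unix"
)

// openCounter opens a hardware counter (cycles or instructions) for pid on
// any CPU. Requires perf_event_paranoid to permit it, or CAP_PERFMON.
func openCounter(config uint64, pid int) int {
	attr := unix.PerfEventAttr{
		Type:   unix.PERF_TYPE_HARDWARE,
		Config: config,
		Size:   uint32(unsafe.Sizeof(unix.PerfEventAttr{})),
		Bits:   unix.PerfBitDisabled | unix.PerfBitExcludeKernel | unix.PerfBitExcludeHv,
	}
	fd, err := unix.PerfEventOpen(&attr, pid, -1, -1, unix.PERF_FLAG_FD_CLOEXEC)
	if err != nil {
		// Fails on instance types without PMU access - exactly the kind of
		// availability problem the checks discussed elsewhere need to detect.
		log.Fatalf("perf_event_open: %v", err)
	}
	return fd
}

func readCounter(fd int) uint64 {
	buf := make([]byte, 8)
	if _, err := unix.Read(fd, buf); err != nil {
		log.Fatalf("read counter: %v", err)
	}
	return binary.LittleEndian.Uint64(buf)
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: cpi_sketch <pid>")
	}
	pid, _ := strconv.Atoi(os.Args[1])

	cycles := openCounter(unix.PERF_COUNT_HW_CPU_CYCLES, pid)
	instrs := openCounter(unix.PERF_COUNT_HW_INSTRUCTIONS, pid)

	for _, fd := range []int{cycles, instrs} {
		unix.IoctlSetInt(fd, unix.PERF_EVENT_IOC_RESET, 0)
		unix.IoctlSetInt(fd, unix.PERF_EVENT_IOC_ENABLE, 0)
	}
	time.Sleep(time.Second) // measurement window
	for _, fd := range []int{cycles, instrs} {
		unix.IoctlSetInt(fd, unix.PERF_EVENT_IOC_DISABLE, 0)
	}

	c, i := readCounter(cycles), readCounter(instrs)
	fmt.Printf("cycles=%d instructions=%d CPI=%.2f\n", c, i, float64(c)/float64(i))
}
```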
If we're focusing on Kubernetes, the kubelet provides an HTTP API accessible locally. This appears to be an undocumented, unstable API that is nevertheless available in the kubelet.

A Stack Overflow discussion points to a project, kubeletctl. The referenced blog post shows several `curl` commands to interact with the API. According to the blog post, this works because the default kubelet configuration allows anonymous (unauthenticated) requests, so it relies on users not having hardened their systems against this vulnerability. The implementation in kubeletctl appears to be a thin wrapper around HTTP calls, so it might be best to reimplement this in our own library rather than take a dependency.
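As a rough sketch of such a call (assuming the default port 10250, the `/pods` endpoint, and either anonymous access or a bearer token passed in a hypothetical `KUBELET_TOKEN` environment variable; the struct below covers only a fragment of the real `v1.PodList` schema):

```go
// kubelet_pods_sketch.go - list pods and containers from the local kubelet.
package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

// podList declares only the fields we need from the kubelet's /pods response.
type podList struct {
	Items []struct {
		Metadata struct {
			Name      string `json:"name"`
			Namespace string `json:"namespace"`
			UID       string `json:"uid"`
		} `json:"metadata"`
		Status struct {
			ContainerStatuses []struct {
				Name        string `json:"name"`
				ContainerID string `json:"containerID"`
			} `json:"containerStatuses"`
		} `json:"status"`
	} `json:"items"`
}

func main() {
	client := &http.Client{Transport: &http.Transport{
		// The kubelet serves a self-signed certificate by default; a real
		// collector should verify against the cluster CA instead.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	req, err := http.NewRequest("GET", "https://localhost:10250/pods", nil)
	if err != nil {
		log.Fatal(err)
	}
	// If anonymous requests are disabled, authenticate with a bearer token.
	if token := os.Getenv("KUBELET_TOKEN"); token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}

	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var pods podList
	if err := json.NewDecoder(resp.Body).Decode(&pods); err != nil {
		log.Fatal(err)
	}
	for _, p := range pods.Items {
		for _, c := range p.Status.ContainerStatuses {
			fmt.Printf("%s/%s\t%s\t%s\n", p.Metadata.Namespace, p.Metadata.Name, c.Name, c.ContainerID)
		}
	}
}
```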
Pros:

- Should provide metadata on Pods, not only containers
- Does not rely on a specific container runtime (docker, containerd, etc.)
Cons:

- Undocumented, unstable API
- Requires access to kubelet, which may not be available in all environments
- Appears to require polling (no `watch`). If so, will react slowly and incur more overhead.
Another option is to watch the cgroup filesystem with `inotify`. This is the method used by Koordinator.sh in its PLEG component. It watches the cgroup root path of each Kubernetes QoS class for new pod directories. When a new pod directory appears, that pod subdirectory is added to a container watcher, which then issues container events.
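A minimal sketch of this approach using `fsnotify` (this is not Koordinator's code, and the cgroup paths are assumptions that hold for cgroup v2 with the systemd driver):

```go
// pleg_watch_sketch.go - watch per-QoS kubepods cgroup directories for pod
// directories appearing and disappearing.
package main

import (
	"log"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// Paths assume cgroup v2 with the systemd cgroup driver; cgroup v1 or the
// cgroupfs driver lay these out differently.
var qosRoots = []string{
	"/sys/fs/cgroup/kubepods.slice",                           // Guaranteed pods
	"/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice",  // Burstable pods
	"/sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice", // BestEffort pods
}

func main() {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	for _, root := range qosRoots {
		if err := w.Add(root); err != nil {
			log.Printf("skip %s: %v", root, err)
		}
	}

	for {
		select {
		case ev := <-w.Events:
			switch {
			case ev.Op&fsnotify.Create != 0:
				// A new pod cgroup appeared: this is where we would add it to
				// a per-pod container watcher that emits container events.
				log.Printf("pod added: %s", filepath.Base(ev.Name))
			case ev.Op&fsnotify.Remove != 0:
				log.Printf("pod removed: %s", filepath.Base(ev.Name))
			}
		case err := <-w.Errors:
			log.Printf("watch error: %v", err)
		}
	}
}
```

Note that only the pod identifier is recoverable from the directory name, which is where the metadata limitation listed below comes from.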
Pros:

- Does not require access to kubelet
- Does not depend on a container runtime
- ABI is stable and well-documented
- Supports inotify, which is efficient and low-overhead

Cons:

- Does not provide metadata beyond the pod and container IDs
Status: Accepted
We'd like to run some tests on AWS to check the availability of PMC (Performance Monitoring Counters) and Linux `resctrl` on different instance types. To do this, we'll want an automated way to run tests across instance types.
As of writing, the main check will be `cpi-count`, which checks the availability of the cycles and instructions counters and compares the output of `go-perf` against `perf` as a sanity check.
In the future, we'll want to add more tests and similarly run them on different instance types. For example:

- `resctrl` availability in Linux
- Whether `resctrl` is able to control memory bandwidth and cache allocation

This decision is about individual, relatively simple checks that run on a single instance. Tests that require complex workloads (e.g., DeathStarBench) are out of scope for this decision.
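As a rough illustration of what a `resctrl` availability check could look like (the CPU flag names and the `/sys/fs/resctrl/schemata` probe are assumptions about how we'd detect support, not a settled design):

```go
// resctrl_check_sketch.go - probe RDT CPU flags and whether resctrl is mounted.
package main

import (
	"fmt"
	"os"
	"strings"
)

// x86 CPU flags related to Intel RDT / AMD QoS; the exact set we care about
// is still to be decided.
var rdtFlags = []string{"rdt_a", "cat_l3", "cat_l2", "mba", "cqm_llc", "cqm_occup_llc"}

func main() {
	cpuinfo, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/cpuinfo:", err)
		os.Exit(1)
	}

	// Collect the flags of the first CPU entry.
	flags := map[string]bool{}
	for _, line := range strings.Split(string(cpuinfo), "\n") {
		if strings.HasPrefix(line, "flags") {
			for _, f := range strings.Fields(line) {
				flags[f] = true
			}
			break
		}
	}
	for _, f := range rdtFlags {
		fmt.Printf("cpu flag %-14s: %v\n", f, flags[f])
	}

	// resctrl is only usable once the filesystem is mounted, e.g.:
	//   mount -t resctrl resctrl /sys/fs/resctrl
	_, err = os.Stat("/sys/fs/resctrl/schemata")
	fmt.Printf("resctrl mounted      : %v\n", err == nil)
}
```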
Pros:

Cons:
This is the strawman: spin up an EC2 instance, install the necessary tools, run the tests, and then tear down the instance. User Data is a way to run commands when the instance is first launched.
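A sketch of what this could look like with the AWS SDK for Go v2 (the AMI ID, instance type, results bucket, and `cpi-count` path in the User Data script are all placeholders, not an agreed design):

```go
// userdata_sketch.go - launch one instance whose User Data runs a check and
// then shuts the machine down; "terminate on shutdown" makes that a teardown.
package main

import (
	"context"
	"encoding/base64"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	"github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// Placeholder User Data: install perf, run the check, upload the result
// (assumes an instance profile with S3 access), then power off.
const userData = `#!/bin/bash
set -ex
yum install -y perf
/opt/checks/cpi-count > /tmp/result
aws s3 cp /tmp/result s3://example-results-bucket/$(hostname)
shutdown -h now
`

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	out, err := ec2.NewFromConfig(cfg).RunInstances(ctx, &ec2.RunInstancesInput{
		ImageId:      aws.String("ami-0123456789abcdef0"), // placeholder AMI
		InstanceType: types.InstanceType("m5.metal"),      // the type under test
		MinCount:     aws.Int32(1),
		MaxCount:     aws.Int32(1),
		// User data must be base64-encoded when calling the API directly.
		UserData: aws.String(base64.StdEncoding.EncodeToString([]byte(userData))),
		// Make the in-instance shutdown terminate (tear down) the instance.
		InstanceInitiatedShutdownBehavior: types.ShutdownBehaviorTerminate,
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("launched", aws.ToString(out.Instances[0].InstanceId))
}
```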
Additional pros:

Additional cons:
This spins up an EC2 instance that runs a GitHub Actions runner. The runner is labeled specifically for the test that spins it up. The Action then runs the test workflow on the runner it just spun up. At the end of the test, the workflow tears down the runner.
Additional pros:

Additional cons:

Pros:

Cons:
This is the approach Cilium uses for its EKS conformance tests.
Additional pros:

- Tests can use standard Kubernetes tooling (`kubectl`).

Additional cons:
This deploys `actions-runner-controller`, following GitHub's "Autoscaling with self-hosted runners" guide. I believe we can add a nodeSelector to the AutoscalingRunnerSet via the values.yaml when deploying the controller (under template.spec), so this might require a controller deployment per node type.
Additional pros:

Additional cons:
This is a service that spins up full Kubernetes clusters for testing and bills by usage.
Additional pros:

Additional cons:
We'll use the EC2 + GitHub Actions Runner approach, because it is the simplest option that returns results and is easy to check for completion.