-
Notifications
You must be signed in to change notification settings - Fork 25
Replace container-stats with golang client #2188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## master #2188 +/- ##
=======================================
Coverage 28.83% 28.83%
=======================================
Files 96 96
Lines 5799 5799
Branches 2551 2551
=======================================
Hits 1672 1672
Misses 3408 3408
Partials 719 719
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Use CRI/Docker API clients to get container stats information, instead of spinnig up a new container (which can fail to mount docker socket and thus provide no information).
4dfd831
to
0518786
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @erthalion - I've reviewed your changes and they look great!
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments, I'm a bit worried about the synchronization aspect of the array of stats, though I might be worrying too much.
var stats *pb.ContainerStats | ||
ctx := context.Background() | ||
|
||
stats, err := c.runtimeService.ContainerStats(ctx, containerID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] This can be a little less verbose
var stats *pb.ContainerStats | |
ctx := context.Background() | |
stats, err := c.runtimeService.ContainerStats(ctx, containerID) | |
stats, err := c.runtimeService.ContainerStats(context.Background(), containerID) |
var stats container.StatsResponse | ||
ctx := context.Background() | ||
|
||
statsResp, err := d.client.ContainerStatsOneShot(ctx, containerID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Similar to the last comment
var stats container.StatsResponse | |
ctx := context.Background() | |
statsResp, err := d.client.ContainerStatsOneShot(ctx, containerID) | |
var stats container.StatsResponse | |
statsResp, err := d.client.ContainerStatsOneShot(context.Background(), containerID) |
func (d *dockerAPIExecutor) GetContainerStats(containerID string) ( | ||
*ContainerStat, error) { | ||
|
||
var stats container.StatsResponse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] We can probably move this line closer to the first use of stats
on line 107.
var stats container.StatsResponse | ||
ctx := context.Background() | ||
|
||
statsResp, err := d.client.ContainerStatsOneShot(ctx, containerID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this ContainerStatsOneShot
return a snapshot of the usage at the time of the call or does it do some sort of aggregation over time? Is the behavior consistent with the CRI counterpart?
I tried to figure it out from reading the docs but I couldn't find an answer, I can dig a little more later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation is sparse, but I guess it's the same one-shot as in the API, which just eliminates waiting and returns empty per_cpu:
https://docs.docker.com/reference/api/engine/version/v1.45/#tag/Container/operation/ContainerStats
No aggregation is happening on the client either:
https://github.com/moby/moby/blob/master/client/container_stats.go
if err := decoder.Decode(&stats); err == io.EOF { | ||
return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this situation that the response was empty when we queried for the stats? Do we know what could cause this? Should we return some sort of error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will bubble up an warning "no stats" later. A reason could be e.g. an empty response body, if the stats api was called before Collector container was started.
Mem string | ||
Cpu float64 | ||
stats []executor.ContainerStat | ||
statsCtx context.Context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use this same context for the queries to the CRI and docker APIs? That way we can ensure everything is cancelled together.
It would require us to pass it down to the executor for that to work though.
} | ||
|
||
// StopCollector will tear down the collector container and stop | ||
// the MockSensor if it was started. | ||
func (s *IntegrationTestSuiteBase) StopCollector() { | ||
// Stop stats collection first | ||
s.statsCancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling the cancellation function does not mean the goroutine has successfully stopped, we might need some additional synchronization mechanism if we want to ensure the stats goroutine stops before collector to prevent gathering bogus data on the last measurement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other thing that worries me is the concurrency between getting new stat entries appended to the array and actually reading it when the tests are done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling the cancellation function does not mean the goroutine has successfully stopped
Why not, the go routine checks the context channel:
case <-s.statsCtx.Done():
return
If the go routine for whatever reason still continues to run, it will get an empty response from the API, and will not add anything to the stats array (at least that's what happened when I caused this by mistake during development).
@@ -187,33 +187,23 @@ func (s *IntegrationTestSuiteBase) RegisterCleanup(containers ...string) { | |||
}) | |||
} | |||
|
|||
func (s *IntegrationTestSuiteBase) GetContainerStats() []ContainerStat { | |||
func (s *IntegrationTestSuiteBase) GetContainerStats() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to rename this method, looks like its behavior has changed significantly. For one, it's called Get...
but it doesn't return anything and it now seems to capture stats in an array, rather than doing a single read like it used to. Maybe something like SnapshotContainerStats
is more adequate?
Description
Currently we use a separate container to capture Collector container
performance stats. This container mounts docker/podman socket and uses
corresponding cli to get the data. But in some cases (e.g. SUSE) we can't mount
the socket, hence the approach doesn't work.
Use CRI/Docker API clients to get container stats information instead. This is
more direct approach, which allows to trim number of containers and overall
tests churn. The stats capturing is implemented as a cancelable background go
routine, which is started/stopped accordingly to start/stop of the Collector
container.
One side effect of this approach is different reporting units. Previously we
were reporting data in human readable form (megabytes or cpu utilization
percentage), now it's going to be more prometheus style -- bytes and millicores
used. Since the idea is to compare stats before different commits, I don't
think it's worth the effort to convert the numbers to keep the original format.
Note that docker go package bump is needed to use some functionality.
On top of that fix small anomaly -- ProcfsScraper test was using executor
object directly, instead of going through a getter.
Checklist
- [ ] Updated documentation accordinglyAutomated testing
- [ ] Added unit tests- [ ] Added integration tests- [ ] Added regression testsCI is sufficient.
Testing Performed
Running TestCollectorStartup and verifying stats.