Skip to content

Replace container-stats with golang client #2188

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

erthalion
Copy link
Contributor

@erthalion erthalion commented Jun 20, 2025

Description

Currently we use a separate container to capture Collector container
performance stats. This container mounts docker/podman socket and uses
corresponding cli to get the data. But in some cases (e.g. SUSE) we can't mount
the socket, hence the approach doesn't work.

Use CRI/Docker API clients to get container stats information instead. This is
more direct approach, which allows to trim number of containers and overall
tests churn. The stats capturing is implemented as a cancelable background go
routine, which is started/stopped accordingly to start/stop of the Collector
container.

One side effect of this approach is different reporting units. Previously we
were reporting data in human readable form (megabytes or cpu utilization
percentage), now it's going to be more prometheus style -- bytes and millicores
used. Since the idea is to compare stats before different commits, I don't
think it's worth the effort to convert the numbers to keep the original format.

Note that docker go package bump is needed to use some functionality.

On top of that fix small anomaly -- ProcfsScraper test was using executor
object directly, instead of going through a getter.

Checklist

  • Investigated and inspected CI test results
    - [ ] Updated documentation accordingly

Automated testing
- [ ] Added unit tests
- [ ] Added integration tests
- [ ] Added regression tests

CI is sufficient.

Testing Performed

Running TestCollectorStartup and verifying stats.

@codecov-commenter
Copy link

codecov-commenter commented Jun 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 28.83%. Comparing base (614e6c5) to head (6d7e570).
Report is 3 commits behind head on master.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2188   +/-   ##
=======================================
  Coverage   28.83%   28.83%           
=======================================
  Files          96       96           
  Lines        5799     5799           
  Branches     2551     2551           
=======================================
  Hits         1672     1672           
  Misses       3408     3408           
  Partials      719      719           
Flag Coverage Δ
collector-unit-tests 28.83% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Use CRI/Docker API clients to get container stats information, instead
of spinnig up a new container (which can fail to mount docker socket and
thus provide no information).
@erthalion erthalion force-pushed the feature/container-stats-via-api branch from 4dfd831 to 0518786 Compare June 23, 2025 12:23
@erthalion
Copy link
Contributor Author

/test

@erthalion erthalion marked this pull request as ready for review June 24, 2025 08:28
@erthalion erthalion requested a review from a team as a code owner June 24, 2025 08:28
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @erthalion - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Collaborator

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, I'm a bit worried about the synchronization aspect of the array of stats, though I might be worrying too much.

Comment on lines +295 to +298
var stats *pb.ContainerStats
ctx := context.Background()

stats, err := c.runtimeService.ContainerStats(ctx, containerID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] This can be a little less verbose

Suggested change
var stats *pb.ContainerStats
ctx := context.Background()
stats, err := c.runtimeService.ContainerStats(ctx, containerID)
stats, err := c.runtimeService.ContainerStats(context.Background(), containerID)

Comment on lines +98 to +101
var stats container.StatsResponse
ctx := context.Background()

statsResp, err := d.client.ContainerStatsOneShot(ctx, containerID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Similar to the last comment

Suggested change
var stats container.StatsResponse
ctx := context.Background()
statsResp, err := d.client.ContainerStatsOneShot(ctx, containerID)
var stats container.StatsResponse
statsResp, err := d.client.ContainerStatsOneShot(context.Background(), containerID)

func (d *dockerAPIExecutor) GetContainerStats(containerID string) (
*ContainerStat, error) {

var stats container.StatsResponse
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] We can probably move this line closer to the first use of stats on line 107.

var stats container.StatsResponse
ctx := context.Background()

statsResp, err := d.client.ContainerStatsOneShot(ctx, containerID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ContainerStatsOneShot return a snapshot of the usage at the time of the call or does it do some sort of aggregation over time? Is the behavior consistent with the CRI counterpart?

I tried to figure it out from reading the docs but I couldn't find an answer, I can dig a little more later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation is sparse, but I guess it's the same one-shot as in the API, which just eliminates waiting and returns empty per_cpu:
https://docs.docker.com/reference/api/engine/version/v1.45/#tag/Container/operation/ContainerStats

No aggregation is happening on the client either:
https://github.com/moby/moby/blob/master/client/container_stats.go

Comment on lines +107 to +108
if err := decoder.Decode(&stats); err == io.EOF {
return nil, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this situation that the response was empty when we queried for the stats? Do we know what could cause this? Should we return some sort of error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will bubble up an warning "no stats" later. A reason could be e.g. an empty response body, if the stats api was called before Collector container was started.

Mem string
Cpu float64
stats []executor.ContainerStat
statsCtx context.Context
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use this same context for the queries to the CRI and docker APIs? That way we can ensure everything is cancelled together.

It would require us to pass it down to the executor for that to work though.

}

// StopCollector will tear down the collector container and stop
// the MockSensor if it was started.
func (s *IntegrationTestSuiteBase) StopCollector() {
// Stop stats collection first
s.statsCancel()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the cancellation function does not mean the goroutine has successfully stopped, we might need some additional synchronization mechanism if we want to ensure the stats goroutine stops before collector to prevent gathering bogus data on the last measurement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing that worries me is the concurrency between getting new stat entries appended to the array and actually reading it when the tests are done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the cancellation function does not mean the goroutine has successfully stopped

Why not, the go routine checks the context channel:

    case <-s.statsCtx.Done():
        return

If the go routine for whatever reason still continues to run, it will get an empty response from the API, and will not add anything to the stats array (at least that's what happened when I caused this by mistake during development).

@@ -187,33 +187,23 @@ func (s *IntegrationTestSuiteBase) RegisterCleanup(containers ...string) {
})
}

func (s *IntegrationTestSuiteBase) GetContainerStats() []ContainerStat {
func (s *IntegrationTestSuiteBase) GetContainerStats() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to rename this method, looks like its behavior has changed significantly. For one, it's called Get... but it doesn't return anything and it now seems to capture stats in an array, rather than doing a single read like it used to. Maybe something like SnapshotContainerStats is more adequate?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants