test(cluster): split tests to improve parallelism #5383

Closed
wants to merge 1 commit into from

Collaborator

@ddebko ddebko commented Dec 18, 2024

Summary

Many tests in the cluster package must run sequentially because they manipulate global variables, but many others can safely run in parallel. The Go test tooling does not let us configure the order in which these tests run, so I split them into two subpackages. This has greatly improved the test runtime without introducing flaky behavior.

Before:
Screenshot 2024-12-18 at 2 25 05 PM

After:
Screenshot 2024-12-18 at 2 24 42 PM

Collaborator

@johanbrandhorst johanbrandhorst left a comment

Super excited about this change, but I'm a little nervous about running cleanup in goroutines which we are not waiting to finish. I don't think that's safe, generally.
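
To illustrate the concern, here is a minimal sketch (not the PR's code) of running cleanup concurrently while still waiting for it to finish. The `runCleanups` helper and the cleanup functions are hypothetical; only `sync.WaitGroup` is standard library.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runCleanups is a hypothetical helper: it runs every cleanup function in its
// own goroutine and blocks until all of them have returned.
func runCleanups(cleanups ...func()) {
	var wg sync.WaitGroup
	for _, c := range cleanups {
		wg.Add(1)
		go func(c func()) {
			defer wg.Done()
			c()
		}(c)
	}
	// Without this wait, the caller could return while cleanup is still running.
	wg.Wait()
}

func main() {
	var done atomic.Int32
	shutdown := func() { done.Add(1) } // stand-in for a real shutdown call
	runCleanups(shutdown, shutdown, shutdown)
	fmt.Println("cleanups finished:", done.Load())
}
```

The cleanups still overlap with each other, so the parallelism is kept, but nothing outlives the call that started them.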

internal/daemon/controller/testing.go (outdated, resolved)
internal/daemon/controller/testing.go (outdated, resolved)
internal/daemon/controller/testing.go (outdated, resolved)
internal/daemon/worker/testing.go (outdated, resolved)
internal/daemon/worker/testing.go (outdated, resolved)
internal/tests/cluster/sequential/session_cleanup_test.go (outdated, resolved)
internal/tests/helper/testing_helper.go (outdated, resolved)
Comment on lines +454 to +481
workerMap := map[string]*worker.TestWorker{}
for _, w := range workers {
	workerMap[w.Name()] = w
}
updateTimes.Range(func(k, v any) bool {
	require.NotNil(t, k)
	require.NotNil(t, v)
	if workerMap[k.(string)] == nil {
		// We don't remove from updateTimes currently so if we're not
		// expecting it we'll see an out-of-date entry
		return true
	}
	assert.WithinDuration(t, time.Now(), v.(time.Time), 30*time.Second)
	delete(workerMap, k.(string))
	return true
})
assert.Empty(t, workerMap)
Collaborator

I don't understand the use of this logic. It seems to populate the workerMap with no entries, since len(workers) == 0, then range over all the update times and return true for every entry (since workerMap is empty), then assert that workerMap is empty, which of course it is because we never added anything to it? I must be missing something?
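
The point can be reproduced with a simplified, self-contained sketch of the quoted logic. It uses plain strings instead of `*worker.TestWorker` and drops the `WithinDuration` assertion; `unseenWorkers` and `demo` are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// unseenWorkers mirrors the shape of the quoted logic: it returns the
// expected names that have no entry in updateTimes.
func unseenWorkers(updateTimes *sync.Map, expected []string) map[string]struct{} {
	remaining := map[string]struct{}{}
	for _, name := range expected {
		remaining[name] = struct{}{}
	}
	updateTimes.Range(func(k, _ any) bool {
		// Deleting a name that was never expected is a no-op, which matches
		// the quoted code skipping entries it is not expecting.
		delete(remaining, k.(string))
		return true
	})
	return remaining
}

// demo returns the number of unseen workers for an empty and a non-empty
// expectation against a map that holds only a stale entry.
func demo() (int, int) {
	var updateTimes sync.Map
	updateTimes.Store("stale-worker", time.Now())

	none := unseenWorkers(&updateTimes, nil)
	one := unseenWorkers(&updateTimes, []string{"w1"})
	return len(none), len(one)
}

func main() {
	none, one := demo()
	// With no expected workers the result is empty whatever the map holds.
	fmt.Println("no expected workers:", none)
	// With an expected worker that never reported, the check can fail.
	fmt.Println("one expected worker:", one)
}
```

In this simplified form, an empty expectation always passes, so the final emptiness assertion says nothing about whether workers are actually active.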

Collaborator Author

I think we should move away from the original implementation, specifically using the sleep command and clearing the worker status update times map.

I feel like we should create a different helper function called RemoveExpectedWorkers. In this function, if no workers are provided, we simply check whether WorkerStatusUpdateTime is empty. If workers are provided, we invoke the WaitForNextSuccessfulStatusUpdate method for each worker and then check whether WorkerStatusUpdateTime is empty. This should be a more accurate way to validate.

In the original ExpectWorkers, we will fail the test if the worker list is empty.

Collaborator

The old code is a little confusing, but it makes sense to me that ExpectWorkers with no workers would assert that there are no active workers. What does RemoveExpectedWorkers do? It can't just check if WorkerStatusUpdateTime is empty, because it'll contain updates from workers that were previously active. It could clear the map and check for workers, which is what the existing code does. I think the existing code relying on a sleep isn't great, so it'd be good to change that, but do we need another function?
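
One way to keep the "clear the map, then check" approach while avoiding a fixed sleep is to poll with a deadline. The following is a minimal sketch under assumptions, not the repository's helper: `clearUpdateTimes`, `waitForWorkers`, and `demo` are hypothetical names, and the map is a plain `sync.Map` keyed by worker name. It covers only the case where workers are expected; it does not address the zero-worker case.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// clearUpdateTimes removes every entry so that stale updates from previously
// active workers cannot satisfy a later check.
func clearUpdateTimes(m *sync.Map) {
	m.Range(func(k, _ any) bool {
		m.Delete(k)
		return true
	})
}

// waitForWorkers polls until every expected name has an entry in m or the
// timeout elapses, instead of sleeping for a fixed duration.
func waitForWorkers(m *sync.Map, expected []string, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		missing := 0
		for _, name := range expected {
			if _, ok := m.Load(name); !ok {
				missing++
			}
		}
		if missing == 0 {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}

// demo reports whether "w1" was seen after the clear and whether the stale
// entry survived it.
func demo() (bool, bool) {
	var m sync.Map
	m.Store("old-worker", time.Now()) // stale entry from an earlier phase
	clearUpdateTimes(&m)

	// Simulate "w1" reporting shortly after the map was cleared.
	go func() {
		time.Sleep(10 * time.Millisecond)
		m.Store("w1", time.Now())
	}()
	found := waitForWorkers(&m, []string{"w1"}, time.Second, 5*time.Millisecond)
	_, stale := m.Load("old-worker")
	return found, stale
}

func main() {
	found, stale := demo()
	fmt.Println("w1 reported:", found, "stale entry present:", stale)
}
```

Polling returns as soon as the expected workers have reported, so the test waits only as long as it needs to, up to the timeout.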

internal/tests/helper/testing_helper.go (outdated, resolved)
@johanbrandhorst johanbrandhorst added this to the 0.19.x milestone Dec 18, 2024
@ddebko ddebko force-pushed the ddebko-optimize-cluster-tests branch 4 times, most recently from ed25a19 to 1b0edba Compare December 19, 2024 04:42
@ddebko ddebko force-pushed the ddebko-optimize-cluster-tests branch from 1b0edba to 849b7c3 Compare December 19, 2024 05:26
Collaborator Author

ddebko commented Dec 20, 2024

Work was moved to this PR:
#5390

@ddebko ddebko closed this Dec 20, 2024