test(cluster): split tests to improve parallelism #5383

ddebko · 2024-12-18T22:14:48Z

Summary

A number of tests in the cluster package must run sequentially because they manipulate global variables, but many others can run in parallel. The Go SDK does not let us configure these tests to run in a specific order. Therefore, I split the tests into two subpackages. This has greatly improved the runtime of the tests without introducing flaky behavior.

Before:

After:

johanbrandhorst

Super excited about this change, but I'm a little nervous about running cleanup in goroutines which we are not waiting to finish. I don't think that's safe, generally.

internal/daemon/controller/testing.go

internal/daemon/worker/testing.go

internal/tests/cluster/sequential/session_cleanup_test.go

internal/tests/helper/testing_helper.go

johanbrandhorst · 2024-12-18T23:22:06Z

internal/tests/helper/testing_helper.go

+		workerMap := map[string]*worker.TestWorker{}
+		for _, w := range workers {
+			workerMap[w.Name()] = w
+		}
+		updateTimes.Range(func(k, v any) bool {
+			require.NotNil(t, k)
+			require.NotNil(t, v)
+			if workerMap[k.(string)] == nil {
+				// We don't remove from updateTimes currently so if we're not
+				// expecting it we'll see an out-of-date entry
+				return true
+			}
+			assert.WithinDuration(t, time.Now(), v.(time.Time), 30*time.Second)
+			delete(workerMap, k.(string))
+			return true
+		})
+		assert.Empty(t, workerMap)


I don't understand the use of this logic. It seems to populate the workerMap with no entries, since len(workers) == 0, then range over all the update times and return true for every entry (since workerMap is empty), then assert that workerMap is empty, which of course it is because we never added anything to it? I must be missing something?

ddebko · 2024-12-19

I think we should move away from the original implementation, specifically its reliance on a sleep and on clearing the worker status update times map.

I feel like we should create a different helper function called RemoveExpectedWorkers. In this function, if no workers are provided, we simply check whether WorkerStatusUpdateTime is empty. If workers are provided, we invoke the WaitForNextSuccessfulStatusUpdate method for each worker and then check whether WorkerStatusUpdateTime is empty. This should be a more accurate way to validate.

In the original ExpectWorkers, we will fail the test if the worker list is empty.

johanbrandhorst · 2024-12-19

The old code is a little confusing, but it makes sense to me that ExpectWorkers with no workers would assert that there are no active workers. What does RemoveExpectedWorkers do? It can't just check whether WorkerStatusUpdateTime is empty, because it will contain updates from workers that were previously active. It could clear the map and then check for workers, which is what the existing code does. I think the existing code relying on a sleep isn't great, so it would be good to change that, but do we need another function?

internal/tests/helper/testing_helper.go

ddebko · 2024-12-20T21:39:22Z

Work was moved to this PR:
#5390

ddebko requested a review from a team as a code owner December 18, 2024 22:14

github-actions bot added core core/daemon labels Dec 18, 2024

ddebko requested review from moduli and johanbrandhorst December 18, 2024 22:36

johanbrandhorst requested changes Dec 18, 2024


johanbrandhorst added this to the 0.19.x milestone Dec 18, 2024

ddebko force-pushed the ddebko-optimize-cluster-tests branch 4 times, most recently from ed25a19 to 1b0edba Compare December 19, 2024 04:42

test(cluster): split tests to improve parallelism

849b7c3

ddebko force-pushed the ddebko-optimize-cluster-tests branch from 1b0edba to 849b7c3 Compare December 19, 2024 05:26

ddebko closed this Dec 20, 2024


johanbrandhorst left a comment

johanbrandhorst Dec 18, 2024

ddebko Dec 19, 2024

johanbrandhorst Dec 19, 2024

ddebko commented Dec 20, 2024
