TestMachineSetReconciler is flaky #11722

sbueringer · 2025-01-21T10:38:57Z

Which jobs are flaking?

At least pull-cluster-api-test-main

Which tests are flaking?

sigs.k8s.io/cluster-api/internal/controllers/machineset: TestMachineSetReconciler_syncReplicas_WithErrors/should_hold_off_on_sync_replicas_when_create_Infrastructure_of_machine_failed_

Since when has it been flaking?

Since we merged this unit test

Testgrid link

No response

Reason for failure (if possible)

The race detector found a data race: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api/11718/pull-cluster-api-test-main/1881643904957157376

Anything else we need to know?

Test was merged just yesterday

Label(s) to be applied

/kind flake
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

sbueringer · 2025-01-21T10:39:01Z

/help

k8s-ci-robot · 2025-01-21T10:39:04Z

@sbueringer:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sbueringer · 2025-01-21T10:42:06Z

/triage accepted
/priority important-longterm

Probably we just have to use a separate scheme for TestMachineSetReconciler_syncReplicas_WithErrors

Karthik-K-N · 2025-01-21T13:39:00Z

I will work on it if no one has started yet.

enxebre · 2025-01-21T13:42:56Z

/assign @Karthik-K-N

Karthik-K-N · 2025-01-21T14:06:36Z

Even with the -race flag I'm not able to reproduce this locally. Any tips?

sbueringer · 2025-01-21T14:38:26Z

Not sure. It probably just doesn't happen that often, or only under CPU starvation. I think in general it's fine to just see if we can use a separate scheme for this unit test (and then check via the periodics in Prow over time whether the flake goes away).

(you can check some other tests / places how we create a fake client with a scheme)

sbueringer · 2025-01-21T14:40:11Z

CI link: https://storage.googleapis.com/k8s-triage/index.html?pr=1&job=.*cluster-api.*main&test=should_hold_off_on_sync_replicas_when_create_Infrastructure_of_machine_failed_&xjob=.*-e2e-.*%7C.*-provider-.*

Karthik-K-N · 2025-01-21T15:50:41Z

Got the point, will do, thanks.

cprivitere · 2025-01-21T21:07:52Z

The test this issue is about was added in PR #11211.

k8s-ci-robot added the kind/flake Categorizes issue or PR as related to a flaky test. label Jan 21, 2025

sbueringer mentioned this issue Jan 21, 2025

✨ Add MachineDrainRule "WaitCompleted" #11545

Merged

k8s-ci-robot assigned Karthik-K-N Jan 21, 2025

sbueringer mentioned this issue Jan 21, 2025

🌱 Add SSA cache metrics #11635

Merged

Karthik-K-N mentioned this issue Jan 22, 2025

🐛 Fix flake TestMachineSetReconciler test #11728

Merged

k8s-ci-robot closed this as completed in #11728 Jan 22, 2025
