Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix accidental ignition and stuck graceful termination #2212

Merged
merged 3 commits into from
Dec 11, 2024

Conversation

tnozicka
Copy link
Contributor

@tnozicka tnozicka commented Nov 28, 2024

Description of your changes:
This PR fixes the ignition touching the file before it's actually ignited and adds e2e coverage to prevent it from happening again. (Looks like it got broken in the early iterations.)

It also fixes stuck graceful termination before ignition is done because fixing one surfaces the other is broken and they can't be split without disabling some tests. (For this reason the ignition file is changed twice - best viewed as an overall diff then "by commit"). It also fixes graceful termination after ignition that was broken because of the manager agent container wasn't using exec. (@rzetelskik found that previously and it also failed in the test I wrote.)

Which issue is resolved by this Pull Request:
Resolves #2213

@tnozicka tnozicka added kind/bug Categorizes issue or PR as related to a bug. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Nov 28, 2024
Copy link
Contributor

@tnozicka: GitHub didn't allow me to request PR reviews from the following users: tnozicka.

Note that only scylladb members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Description of your changes:
This PR fixes the ignition touching the file before it's actually ignited and add e2e coverage to prevent it from happening again. (Looks like it got broken in the early iterations.)

Which issue is resolved by this Pull Request:
Resolves #

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@scylla-operator-bot scylla-operator-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Nov 28, 2024
@tnozicka
Copy link
Contributor Author

tnozicka commented Nov 28, 2024

(the e2e test on it's own failed as expected - the run was https://prow.scylla-operator.scylladb.com/view/gs/scylla-operator-prow/pr-logs/pull/scylladb_scylla-operator/2212/pull-scylla-operator-master-e2e-gke-serial/1862139318282227712#1:test-build-log.txt%3A994)

 NodeConfig Optimizations [It] should correctly project state for each scylla pod [Serial]
github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig/nodeconfig_optimizations.go:194
  [FAILED] container in Pod "basic-dr65w-us-east-1-us-east-1a-0" don't match the expected state
  Expected
      <map[string]bool | len:5>: {
          "sidecar-injection": true,
          "scylla": true,
          "scylla-manager-agent": true,
          "scylladb-api-status-probe": true,
          "scylladb-ignition": false,
      }
  to have {key: value}
      <map[interface {}]interface {} | len:1>: {
          <string>"scylla": <bool>false,
      }
  In [It] at: github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig/nodeconfig_optimizations.go:342 @ 11/28/24 14:57:40.478
  Full Stack Trace
    github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig.init.func2.4()
    	github.com/scylladb/scylla-operator/test/e2e/set/nodeconfig/nodeconfig_optimizations.go:342 +0x2353 

@tnozicka tnozicka changed the title [WIP] Fix accidental ignition Fix accidental ignition Nov 28, 2024
@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 28, 2024
@tnozicka tnozicka requested review from rzetelskik and zimnx and removed request for rzetelskik November 28, 2024 15:19
Copy link
Member

@rzetelskik rzetelskik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one typo
lgtm
/assign zimnx

o.HaveKeyWithValue(naming.ScyllaContainerName, false),
o.HaveKeyWithValue(naming.ScyllaDBIgnitionContainerName, false),
),
fmt.Sprintf("container in Pod %q don't match the expected state", pod.Name),
Copy link
Member

@rzetelskik rzetelskik Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
fmt.Sprintf("container in Pod %q don't match the expected state", pod.Name),
fmt.Sprintf("containers in Pod %q don't match the expected state", pod.Name),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I suppose container(s) as it's 1-N

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@tnozicka tnozicka changed the title Fix accidental ignition [WIP] Fix accidental ignition Nov 29, 2024
@scylla-operator-bot scylla-operator-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 29, 2024
@tnozicka
Copy link
Contributor Author

tnozicka commented Dec 2, 2024

blocked on #2216

@scylla-operator-bot scylla-operator-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 2, 2024
@scylla-operator-bot scylla-operator-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 3, 2024
@tnozicka tnozicka force-pushed the fix-ignition branch 2 times, most recently from 2c8f717 to baa2b8a Compare December 9, 2024 12:58
@tnozicka
Copy link
Contributor Author

tnozicka commented Dec 9, 2024

#2246 landed
/test all

@tnozicka tnozicka force-pushed the fix-ignition branch 3 times, most recently from 2d63a42 to 42ca696 Compare December 10, 2024 08:07
@scylla-operator-bot scylla-operator-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 10, 2024
@tnozicka tnozicka changed the title [WIP] Fix accidental ignition and stuck graceful termination Fix accidental ignition and stuck graceful termination Dec 10, 2024
@scylla-operator-bot scylla-operator-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 10, 2024
@tnozicka tnozicka requested review from rzetelskik and zimnx December 10, 2024 12:31
Copy link
Collaborator

@zimnx zimnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, otherwise lgtm. Please fix PR description, one sentence contains duplicated parts.

test/e2e/set/scyllacluster/scyllacluster_restarts.go Outdated Show resolved Hide resolved
@tnozicka
Copy link
Contributor Author

Please fix PR description, one sentence contains duplicated parts.

I fixed the typos and added more context, but one fix is for graceful termination before ignition and the other fix is after ignition.

Copy link
Collaborator

@zimnx zimnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm
/assign rzetelskik

Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rzetelskik, tnozicka, zimnx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@rzetelskik rzetelskik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment/question, rest lgtm

@rzetelskik
Copy link
Member

/lgtm

@scylla-operator-bot scylla-operator-bot bot added the lgtm Indicates that a PR is ready to be merged. label Dec 11, 2024
@tnozicka
Copy link
Contributor Author

bug #2249
/retest

@scylla-operator-bot scylla-operator-bot bot merged commit 91b1fe0 into scylladb:master Dec 11, 2024
12 checks passed
@tnozicka tnozicka deleted the fix-ignition branch December 11, 2024 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ScyllaDB container starts before it's ignited
3 participants