Skip to content
This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Refactor Elasticsearch controller to use a semi-state machine design & add ginkgo based e2e tests #228

Closed
wants to merge 26 commits into from

Conversation

munnerz
Copy link
Contributor

@munnerz munnerz commented Jan 29, 2018

What this PR does / why we need it:

This change refactors the Elasticsearch controller to use a state-machine 'style' of operation.

We introduce an 'Action' interface with a simple spec:

type Action interface {
	Name() string
	// Execute should attempt to execute the action. If it is not possible to
	// apply the specified changes (e.g. due to the cluster not being in a
	// 'ready state', or some transient error) then an error will be returned
	// so the action can be requeued. This allows for non-blocking blocking of
	// actions, with retries. The workqueues default scheduling and rate limit
	// will thus handle fairness within Navigator, and handle backing off on
	// retries.
	Execute(state *State) error
}

The cluster_control.go contains the 'state machine' like logic. For now, this is implemented as a simple function that uses if/else branches to determine which 'Action' to return depending on the current state of the cluster.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Fixes #215, fixes #156, fixes #32

Special notes for your reviewer:

I'm currently working on a general purpose unit testing fixture, StateFixture that can be used to mock the State structure. This should allow for easy testing of all of our Action implementations, between Elasticsearch and in future Cassandra.

Right now, this PR does not properly wait for document count to be zero during scale down, however it does manage a rolling upgrade across multiple statefulsets by careful management of the partition field on statefulset.spec.updateStrategy.rollingUpdate`.

Things still to do:

  • managed scale down
  • support changing node pool resources
  • support changing node pool roles
  • support changing sysctls
  • support changing pilot image
  • support changing securityContext

Release note:

Introduce a managed node pool upgrade process

/cc @mattbates @simonswine @wallrj

@jetstack-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
We suggest the following additional approver: wallrj

Assign the PR to them by writing /assign @wallrj in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@munnerz munnerz force-pushed the introduce-state-structure branch from e82260c to ed305d4 Compare February 1, 2018 17:31
@munnerz munnerz force-pushed the introduce-state-structure branch from 641d8c8 to 7a5236d Compare February 1, 2018 18:25
@munnerz munnerz force-pushed the introduce-state-structure branch from 7a5236d to 403c5d5 Compare February 1, 2018 19:52
@munnerz munnerz force-pushed the introduce-state-structure branch from f54d0bf to 1aca3a1 Compare February 1, 2018 23:16
@munnerz munnerz force-pushed the introduce-state-structure branch from fcc58f6 to baabd83 Compare February 2, 2018 16:32
@munnerz munnerz force-pushed the introduce-state-structure branch from baabd83 to a9b6f2e Compare February 2, 2018 17:14
@munnerz
Copy link
Contributor Author

munnerz commented Feb 3, 2018

/test verify

@munnerz
Copy link
Contributor Author

munnerz commented Feb 3, 2018

/test e2e v1.9

@munnerz munnerz force-pushed the introduce-state-structure branch from 7f8af77 to a62a64e Compare February 3, 2018 00:56
@munnerz munnerz changed the title WIP: Refactor Elasticsearch controller to use a semi-state machine design Refactor Elasticsearch controller to use a semi-state machine design & add ginkgo based e2e tests Feb 5, 2018
@munnerz
Copy link
Contributor Author

munnerz commented Feb 5, 2018

@wallrj this PR has grown large, so I'm not going to continue any more work here. Our e2e tests are passing and stand up an ElasticsearchCluster (both single and multi-node), plus wait for the cluster to transition into a Yellow/Green state respectively. I'll be following up with more PRs to address the remaining items:

  • managed scale down
  • support changing node pool resources
  • support changing node pool roles
  • support changing sysctls
  • support changing pilot image
  • support changing securityContext

I'll also be expanding the test suite further, to:

  • test performing an upgrade
  • test writing data to one node and reading from another
  • test reading & writing data whilst the cluster is in a degraded state
  • test updating resources/other fields in the cluster

This PR also introduces two new test frameworks:

  1. the StateFixture, which is useful for unit/integration testing controller.Actions. I've also added e2e tests for the upgrade and scale actions, that ensure the controller behaves as expected under various 'initial states'

  2. the ginkgo e2e test framework. This is based off the kubernetes/kubernetes test framework.

Once merged, we can start to migrate the remaining cassandra e2e tests to this new format too. It appears to be running stable, and cleans up after itself nice and quickly 😄

Please reach out if you need a hand with review/if there's anything I can do to help. Big PRs are big 😬

@munnerz
Copy link
Contributor Author

munnerz commented Feb 6, 2018

Closing in favour of #239, #240, #241, #242

@munnerz munnerz closed this Feb 6, 2018
jetstack-ci-bot added a commit that referenced this pull request Feb 15, 2018
Automatic merge from submit-queue.

Refactor Elasticsearch node pool controller into Actions structures

**What this PR does / why we need it**:

This gives us a generic way to implement 'Actions' against a given 'State' structure, and additionally introduces a unit testing framework that can be reused between controllers for testing Actions.

See #228 

fixes #215 

**Special notes for your reviewer**:

Stacks on #240, #239 

**Release note**:
```release-note
Updated Elasticsearch controller that carefully manages upgrade rollouts
```
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
2 participants