Refactor Elasticsearch controller to use a semi-state machine design & add ginkgo based e2e tests #228

munnerz · 2018-01-29T11:03:44Z

What this PR does / why we need it:

This change refactors the Elasticsearch controller to use a state-machine 'style' of operation.

We introduce an 'Action' interface with a simple spec:

type Action interface {
	Name() string
	// Execute should attempt to execute the action. If it is not possible to
	// apply the specified changes (e.g. due to the cluster not being in a
	// 'ready state', or some transient error) then an error will be returned
	// so the action can be requeued. This allows actions to wait on cluster
	// state, with retries, without blocking a worker. The workqueue's default
	// scheduling and rate limiting will thus handle fairness within Navigator,
	// and handle backing off on retries.
	Execute(state *State) error
}

cluster_control.go contains the state-machine-like logic. For now, this is implemented as a simple function that uses if/else branches to determine which 'Action' to return, depending on the current state of the cluster.
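To make the shape of this concrete, here is a minimal, self-contained sketch of the pattern. It is not the code in this PR: the `State` fields, the `Scale` and `UpdateVersion` types, the `nextAction` function and the order of the branches are all assumptions made for illustration. Only the `Action` interface matches the spec above.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical stand-in for the controller's State structure.
// The real structure is not shown in this description.
type State struct {
	DesiredReplicas int32
	CurrentReplicas int32
	DesiredVersion  string
	CurrentVersion  string
	Ready           bool
}

// Action mirrors the interface described above.
type Action interface {
	Name() string
	Execute(state *State) error
}

// errNotReady is returned so that the caller can requeue the action.
var errNotReady = errors.New("cluster not ready, requeue")

// Scale is an illustrative action that changes the replica count.
type Scale struct{ Replicas int32 }

func (s Scale) Name() string { return "Scale" }

func (s Scale) Execute(state *State) error {
	if !state.Ready {
		// Return instead of waiting, so the worker is never blocked.
		return errNotReady
	}
	state.CurrentReplicas = s.Replicas
	return nil
}

// UpdateVersion is an illustrative action that changes the version.
type UpdateVersion struct{ Version string }

func (u UpdateVersion) Name() string { return "UpdateVersion" }

func (u UpdateVersion) Execute(state *State) error {
	if !state.Ready {
		return errNotReady
	}
	state.CurrentVersion = u.Version
	return nil
}

// nextAction picks at most one action for the current state and returns
// nil when there is nothing to do. The branch order is an assumption.
func nextAction(s *State) Action {
	if s.CurrentVersion != s.DesiredVersion {
		return UpdateVersion{Version: s.DesiredVersion}
	} else if s.CurrentReplicas != s.DesiredReplicas {
		return Scale{Replicas: s.DesiredReplicas}
	}
	return nil
}

func main() {
	s := &State{
		DesiredReplicas: 3, CurrentReplicas: 1,
		DesiredVersion: "new", CurrentVersion: "old",
		Ready: true,
	}
	for a := nextAction(s); a != nil; a = nextAction(s) {
		if err := a.Execute(s); err != nil {
			// A real controller would requeue the key here.
			fmt.Printf("%s failed, would requeue: %v\n", a.Name(), err)
			return
		}
		fmt.Printf("%s applied\n", a.Name())
	}
}
```

Returning a single action per pass keeps each sync small: the controller applies one change, then re-evaluates the state on the next pass rather than planning several steps ahead.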

Which issue this PR fixes:

Fixes #215, fixes #156, fixes #32

Special notes for your reviewer:

I'm currently working on a general-purpose unit testing fixture, StateFixture, that can be used to mock the State structure. This should allow for easy testing of all of our Action implementations, for Elasticsearch now and for Cassandra in future.

Right now, this PR does not properly wait for the document count to be zero during scale down. However, it does manage a rolling upgrade across multiple statefulsets by careful management of the partition field on `statefulset.spec.updateStrategy.rollingUpdate`.
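As a rough illustration of the partition idea (not the logic in this PR), the sketch below lowers the partition one step at a time. It relies on the Kubernetes behaviour that only pods with an ordinal greater than or equal to the partition are updated. The `nextPartition` function and its `updatedReady` parameter are hypothetical names; how readiness is determined is left out.

```go
package main

import "fmt"

// nextPartition returns the partition value to set next on a StatefulSet
// with the given replica count. current is the partition currently set.
// updatedReady reports whether every pod with ordinal >= current is
// already running the new revision and is ready (hypothetical input).
func nextPartition(current, replicas int32, updatedReady bool) int32 {
	if current > replicas {
		// A partition above the replica count updates nothing, so
		// treat it as the starting point of the rollout.
		current = replicas
	}
	if current <= 0 {
		// Every pod is already covered by the update.
		return 0
	}
	if !updatedReady {
		// Hold the rollout until the pods updated so far are ready.
		return current
	}
	// Let the next-highest ordinal pick up the new revision.
	return current - 1
}

func main() {
	replicas := int32(3)
	partition := replicas
	for partition > 0 {
		partition = nextPartition(partition, replicas, true)
		fmt.Println("partition:", partition)
	}
}
```

Holding the partition while pods are not ready is what keeps the rollout to one pod at a time. Coordinating across several statefulsets would need an extra check that no other node pool is mid-update, which this sketch omits.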

Things still to do:

managed scale down
support changing node pool resources
support changing node pool roles
support changing sysctls
support changing pilot image
support changing securityContext

Release note:

Introduce a managed node pool upgrade process

/cc @mattbates @simonswine @wallrj

jetstack-bot · 2018-01-29T11:03:53Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
We suggest the following additional approver: wallrj

Assign the PR to them by writing /assign @wallrj in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

OWNERS

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

munnerz · 2018-02-03T00:16:29Z

/test verify

munnerz · 2018-02-03T00:27:40Z

/test e2e v1.9

munnerz · 2018-02-05T11:51:52Z

@wallrj this PR has grown large, so I'm not going to continue any more work here. Our e2e tests are passing and stand up an ElasticsearchCluster (both single and multi-node), plus wait for the cluster to transition into a Yellow/Green state respectively. I'll be following up with more PRs to address the remaining items:

managed scale down
support changing node pool resources
support changing node pool roles
support changing sysctls
support changing pilot image
support changing securityContext

I'll also be expanding the test suite further, to:

test performing an upgrade
test writing data to one node and reading from another
test reading & writing data whilst the cluster is in a degraded state
test updating resources/other fields in the cluster

This PR also introduces two new test frameworks:

the StateFixture, which is useful for unit/integration testing controller.Actions. I've also added e2e tests for the upgrade and scale actions, that ensure the controller behaves as expected under various 'initial states'
the ginkgo e2e test framework. This is based off the kubernetes/kubernetes test framework.

Once merged, we can start to migrate the remaining cassandra e2e tests to this new format too. It appears to be running stable, and cleans up after itself nice and quickly 😄

Please reach out if you need a hand with review/if there's anything I can do to help. Big PRs are big 😬

munnerz · 2018-02-06T17:55:20Z

Closing in favour of #239, #240, #241, #242

Automatic merge from submit-queue.

Refactor Elasticsearch node pool controller into Actions structures

**What this PR does / why we need it**:

This gives us a generic way to implement 'Actions' against a given 'State' structure, and additionally introduces a unit testing framework that can be reused between controllers for testing Actions.

See #228

fixes #215

**Special notes for your reviewer**:

Stacks on #240, #239

**Release note**:
```release-note
Updated Elasticsearch controller that carefully manages upgrade rollouts
```

jetstack-bot requested review from mattbates, simonswine and wallrj January 29, 2018 11:03

jetstack-bot added do-not-merge/work-in-progress release-note labels Jan 29, 2018

jetstack-bot added the size/XL label Jan 29, 2018

munnerz force-pushed the introduce-state-structure branch from e82260c to ed305d4 Compare February 1, 2018 17:31

jetstack-bot added size/XXL and removed size/XL labels Feb 1, 2018

munnerz force-pushed the introduce-state-structure branch from 641d8c8 to 7a5236d Compare February 1, 2018 18:25

James Munnelly added 6 commits February 1, 2018 19:51

Refactor Elasticsearch controller to use a semi-state machine design

3ff2fad

Add StateFixture testing framework

8cabcaf

Don't scale if pilots that may be effected do not exist

67de8a5

Add Scale action tests

f42b472

Run dep ensure

efdd1f4

Add update version action tests

403c5d5

munnerz force-pushed the introduce-state-structure branch from 7a5236d to 403c5d5 Compare February 1, 2018 19:52

James Munnelly added 5 commits February 1, 2018 19:58

Use action name when creating events

0f76460

Skip lister update test until we update to k8s 1.10 api machinery

201e7fa

Move actions into separate subpackage

675ed4a

Remove old unit test framework utils

1bddc85

Add pilot.status.elasticsearch.version field, remove nodepool package

1aca3a1

munnerz force-pushed the introduce-state-structure branch from f54d0bf to 1aca3a1 Compare February 1, 2018 23:16

James Munnelly added 6 commits February 2, 2018 12:35

Less spammy Event messages. Create 'generate' testutil package.

2f4c7fd

Add golang e2e test suite

b7fcbf6

Run dep ensure

a1cbf47

Fix Makefile env substitution. Set report-dir during e2e tests.

55bf215

Fix lint errors

b1f032c

Set context flag when running e2e tests

df44949

James Munnelly added 4 commits February 2, 2018 14:56

Update go_test target to not run e2e tests

b99ac03

Correctly set namespace when deploying navigator

5c2de4b

Fix pkg/util/api unit test

dcaf84e

Fix validation tests for int32 replicas

6070dbf

munnerz force-pushed the introduce-state-structure branch from fcc58f6 to baabd83 Compare February 2, 2018 16:32

James Munnelly added 2 commits February 2, 2018 17:14

Update e2e test resource requests/limits. Make Resources a non-ptr.

7c042f7

Run hack/update-client-gen.sh

a9b6f2e

munnerz force-pushed the introduce-state-structure branch from baabd83 to a9b6f2e Compare February 2, 2018 17:14

Set vm.max_map_count during es e2e tests

20c47af

James Munnelly added 2 commits February 3, 2018 00:44

Add node pool scale up e2e test

86b362e

Add DeleteAllStatefulSets function to framework

a62a64e

munnerz force-pushed the introduce-state-structure branch from 7f8af77 to a62a64e Compare February 3, 2018 00:56

munnerz mentioned this pull request Feb 5, 2018

ElasticSearch E2E tests fail on non-minikube cluster #156

Open

munnerz changed the title ~~WIP: Refactor Elasticsearch controller to use a semi-state machine design~~ Refactor Elasticsearch controller to use a semi-state machine design & add ginkgo based e2e tests Feb 5, 2018

jetstack-bot removed the do-not-merge/work-in-progress label Feb 5, 2018

munnerz mentioned this pull request Feb 6, 2018

Refactor Elasticsearch node pool controller into Actions structures #241

Merged

munnerz closed this Feb 6, 2018
