This repository was archived by the owner on Apr 4, 2023. It is now read-only.

Refactor Elasticsearch node pool controller into Actions structures #241

Merged
14 commits merged on Feb 15, 2018

Conversation

munnerz
Contributor

@munnerz munnerz commented Feb 6, 2018

What this PR does / why we need it:

This gives us a generic way to implement 'Actions' against a given 'State' structure, and additionally introduces a unit testing framework that can be reused between controllers for testing Actions.

See #228

fixes #215
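
To make the pattern described above concrete, here is a minimal sketch of the shapes involved. It is based on the Execute(state *controllers.State) error signature and the State fields visible in the review diffs below; the exact definitions in this branch may differ.

// Minimal sketch of the Action/State pattern, not the PR's exact definitions.
package controllers

import (
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/record"
)

// State bundles the clients, listers and event recorder that actions need.
type State struct {
    Clientset kubernetes.Interface
    Recorder  record.EventRecorder
    // StatefulSetLister, ConfigMapLister, NavigatorClientset, etc. also live here.
}

// Action is a single operation computed from observed cluster state and
// executed against the API server. Implementations in this PR include
// Scale, CreateNodePool and ExcludeShards.
type Action interface {
    Name() string
    Execute(state *State) error
}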

Special notes for your reviewer:

Stacks on #240, #239

Release note:

Updated Elasticsearch controller that carefully manages upgrade rollouts

@jetstack-bot jetstack-bot requested a review from wallrj February 6, 2018 11:36
@munnerz munnerz force-pushed the refactor-es-controller branch 2 times, most recently from ed142dc to 82941f9 Compare February 6, 2018 11:49
@munnerz munnerz force-pushed the refactor-es-controller branch 2 times, most recently from bc8dfa0 to 71296a7 Compare February 6, 2018 14:14
@munnerz
Contributor Author

munnerz commented Feb 6, 2018

/hold

Member

@wallrj wallrj left a comment

Hey @munnerz

I really like this change.
It's taken me a long time to absorb it and I haven't finished the review, but I'm being kicked out of the office now.
So here are some comments to consider.


func (e *ExcludeShards) Execute(state *controllers.State) error {
    return nil
}
Member

Seems incomplete.

Contributor Author

👍 removed

func (s *Scale) Execute(state *controllers.State) error {
    if s.NodePool == nil || s.Cluster == nil {
        return fmt.Errorf("internal error: nodepool and cluster must be set")
    }
Member

I feel like this should never happen in production and if these are somehow nil, we should let it panic and restart.
I don't see any such checks in the other actions... so far.

Contributor Author

I'd rather not have any 'leaf' bits of code throw panics. If the call-site wants to panic as a result of an error, then sure. But this really should be set. I was just trying to be safe here and catch it, but you are correct it should be all or nothing. I'll update other actions to do the same...

Member

I can't see any way that we'd create a scale action with a nil nodepool or nil cluster pointer.
IMO, if we're going to do this check it should be done in the nextAction function before we create the action, rather than having to do it every time we attempt to use the pointer.
Happy to change it in a followup branch if you agree.

Contributor Author

This is more just about making our structures safe - imo, anywhere you touch an exported field you should probably check if it's nil. We could 'gate' these fields behind a constructor function as an alternative.
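
For illustration, that constructor-gated alternative could look something like this. The name NewScale and the Replicas field are assumptions for the sketch, not code from this branch, and it assumes the fmt and v1alpha1 imports already present in the Scale action's file.

// Sketch only: a constructor that refuses to build a half-initialised Scale
// action, so Execute never sees a nil Cluster or NodePool.
func NewScale(cluster *v1alpha1.ElasticsearchCluster, nodePool *v1alpha1.ElasticsearchClusterNodePool, replicas int32) (*Scale, error) {
    if cluster == nil || nodePool == nil {
        return nil, fmt.Errorf("internal error: nodepool and cluster must be set")
    }
    return &Scale{Cluster: cluster, NodePool: nodePool, Replicas: replicas}, nil
}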

statefulSet, err := state.StatefulSetLister.StatefulSets(s.Cluster.Namespace).Get(statefulSetName)
if k8sErrors.IsNotFound(err) {
    state.Recorder.Eventf(s.Cluster, core.EventTypeWarning, "Err"+s.Name(), "Cannot get statefulset %q for node pool %q: %v", statefulSetName, s.NodePool.Name, err)
    return nil
Member

Shouldn't this return the error?
If we've got the state changes correct, we should have always created a statefulset before we attempt to scale, right?

Contributor Author

We return nil to prevent an endless loop in the case where the StatefulSet doesn't exist. The reasoning is that when the StatefulSet is added to the lister's store, it will trigger the informer's OnAdd handler to fire again, re-triggering this.

Contributor Author

we should have always created a statefulset before we attempt to scale, right?

Race issues could cause the StatefulSet to no longer exist after the initial existence check. Also, the lister may be stale, which could cause a not-found error.
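
For context, the re-trigger relies on standard client-go informer wiring; a rough sketch (names such as statefulSetInformer and queue are illustrative, not taken from this PR) of how a StatefulSet event puts work back on the queue:

// Illustrative only: once the StatefulSet appears in the informer cache, the
// AddFunc below re-enqueues a key and the controller re-runs its sync, giving
// the action another chance. In the real controller the handler would resolve
// the owning ElasticsearchCluster's key rather than the StatefulSet's own key.
statefulSetInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc: func(obj interface{}) {
        key, err := cache.MetaNamespaceKeyFunc(obj)
        if err != nil {
            return
        }
        queue.Add(key)
    },
})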

// TODO: not sure if we should treat nil as 1 or 0 instead of erroring
if currentReplicas == nil {
    return fmt.Errorf("current number of replicas on statefulset cannot be nil")
}
Member

The statefulset controller seems to dereference it... which I think ends up being 0 for a nil pointer.

Contributor Author

Dereferencing any nil pointer will cause a panic: https://play.golang.org/p/6eQA-QmxJWv

Member

Right, sorry. That was a stupid comment. But in that case, I take it that the statefulset controller doesn't handle a nil statefulSet.Spec.Replicas either.

In which case I assume there is a validation/defaulting mechanism in the API server which checks for nil/missing replicas, ensuring that it will always have a value by the time it reaches controllers.
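
If the nil check were ever relaxed, a small helper along these lines (a sketch, not code from the PR) could default the pointer instead of erroring; the API server's defaulting normally sets a StatefulSet's .spec.replicas to 1 when it is omitted, so a nil value should not reach the controller in practice.

// Sketch only: default a nil replica pointer rather than returning an error.
func replicasOrDefault(replicas *int32, def int32) int32 {
    if replicas == nil {
        return def
    }
    return *replicas
}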

// ensure that at least one data, ingest and master node exists already
if !utilapi.ContainsElasticsearchRole(s.NodePool.Roles, v1alpha1.ElasticsearchRoleData) {
    return true, nil
}
Member

It's a general point, but maybe it's useful to be able to scale down to 0.
We do this with GKE clusters, so why not with Navigator Clusters?

Contributor Author

Perhaps - but this could create unstable clusters, and could then cause a lot of churn in control loops (Pilots might be updating their resources with error conditions frequently, causing the navigator control loop to keep resyncing).

I'm not against scale-to-zero, but I don't think it needs to be incorporated right now/as part of this PR?

return false
}
return true
}
Member

👍 This looks useful.
Consider adding it to a public package so that we can use it in the implementation too.

Contributor Author

These packages can be used from any subpackage of github.com/jetstack/navigator. This specifically stops anything outside of our repo from importing them. We can use this to convey the fact that certain functions/interfaces, whilst exported, are NOT supported long-term. More info: https://docs.google.com/document/d/1e8kOo3r51b2BWtTs_1uADIA5djfXhPT36s6eHVRIvaU/edit
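
For reference, this relies on Go's internal-package rule; a layout sketch follows (the paths are illustrative, not necessarily the repo's exact tree).

// Illustrative layout:
//
//   github.com/jetstack/navigator/internal/test/...   <- shared test helpers
//
// The Go toolchain only permits an internal/... package to be imported by
// code rooted under its parent directory, i.e. anything inside
// github.com/jetstack/navigator/..., so external projects cannot take a
// dependency on these exported-but-unsupported helpers.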

},
},
}
}
Member

👍 This looks like a really nice pattern for generating the resources we need in tests.
Is this what they do in kubernetes too?

Contributor Author

Nope - it made things a lot cleaner though 😄

@@ -45,8 +45,7 @@ func ValidateElasticsearchClusterNodePool(np *navigator.ElasticsearchClusterNode
el = append(el, field.Invalid(fldPath.Child("replicas"), np.Replicas, "must be greater than zero"))
}
// TODO: call k8s.io/kubernetes/pkg/apis/core/validation.ValidateResourceRequirements on np.Resources
// this will require vendoring kubernetes/kubernetes and switching to use the corev1 ResourceRequirements
// struct
// this will require vendoring kubernetes/kubernetes.
Member

❓ Might be simpler with the latest version of dep, which shouldn't need to vendor the entire kubernetes tree... assuming these validation functions don't have too many dependencies.

_, err = state.Clientset.AppsV1beta1().StatefulSets(toCreate.Namespace).Create(toCreate)
if err != nil {
    return err
}
Member

Should we use IsAlreadyExists(err) here and return nil in that case?
I feel like it might be good if Execute were guaranteed to be idempotent.
I suspect (not tested or verified) that there might be cases where we create the stateful set, but it doesn't get immediately listed and the reconciliation code elsewhere might calculate another create action.
In that case, we don't want to return an error because that will result in the cluster being put back in the queue.
Not sure, but worth discussing I think.

Contributor Author

I'd rather hold on this one for now. It's an overbearing requirement to impose on all actions, and (right now) there would be no noticeable gain. Right now, if the call site wants to check if a CreateNodePool failed because it already exists, they can call IsAlreadyExists themselves and take appropriate action.

Once we start implementing more features with Actions, I hope it'll become clearer whether we want to require all actions be idempotent.
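
For reference, the variant wallrj describes would be a small change to the create call shown above (a sketch, assuming the k8sErrors alias used elsewhere in this package):

// Sketch of the idempotent-create variant discussed above: an AlreadyExists
// error (for example from a stale lister) is treated as success instead of
// re-queueing the cluster with an error.
_, err = state.Clientset.AppsV1beta1().StatefulSets(toCreate.Namespace).Create(toCreate)
if k8sErrors.IsAlreadyExists(err) {
    return nil
}
if err != nil {
    return err
}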

Resources: apiv1.ResourceRequirements{
Requests: np.Resources.Requests,
Limits: np.Resources.Limits,
InitialDelaySeconds: 240,
Member

This figure changed from 60 to 240. How come?

Contributor Author

During testing I was having issues where the liveness probe wasn't passing within 60s, causing crash loops.

Really, we should probably change the Elasticsearch Pilot liveness probe to just check that Pilot is running (i.e. Pilot should always return true, and not attempt to query ES itself).

Member

I think that's what it does already.

func (p *Pilot) LivenessCheck() probe.Check {
	return probe.CombineChecks(
		func() error {
			return nil
		},
	)
}

@munnerz munnerz force-pushed the refactor-es-controller branch from cd43d5a to 9c61c2b Compare February 6, 2018 20:37
@jetstack-ci-bot
Contributor

@munnerz PR needs rebase

Member

@wallrj wallrj left a comment

Some further comments.

// }
// if len(pilots) != 0 {
// t.Errorf("Expected pilot to be deleted, but Pilot is still present in lister: %v", pilots)
// }
Member

Although, if we're waiting for longer than the resync period in WaitForResync above, the lister should (most of the time) not list the deleted object.
How flaky is this in practice?
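
One less flaky alternative (a sketch only; pilotLister, namespace and the timings are illustrative, and it assumes the apimachinery wait, labels and time imports) would be to poll the lister rather than wait a fixed resync period:

// Sketch only: poll until the deleted Pilot disappears from the lister,
// instead of relying on WaitForResync plus a fixed wait.
err := wait.PollImmediate(100*time.Millisecond, 5*time.Second, func() (bool, error) {
    pilots, err := pilotLister.Pilots(namespace).List(labels.Everything())
    if err != nil {
        return false, err
    }
    return len(pilots) == 0, nil
})
if err != nil {
    t.Errorf("Expected pilot to be deleted, but Pilot is still present in lister: %v", err)
}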

elemsLen := elemsV.Len()
if listLen < elemsLen {
    return false
}
Member

This means that ContainsAll([1], [1,1]) == false, which might cause confusion.
It's a shame golang doesn't allow you to create generic sets.

Contributor Author

I guess this is just a property of the function - it could be said that [1] does not contain all the elements of [1, 1] (depending on how you're looking at it).
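
A tiny illustration of the caveat (a sketch, not a test from this branch):

// The length guard above makes a shorter list fail immediately:
if ContainsAll([]int{1}, []int{1, 1}) {
    t.Error("expected false: [1] is shorter than [1, 1], so it cannot contain all of its elements")
}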

}
if err != nil {
    return nil, err
}
Member

What happens if someone accidentally deletes a Navigator statefulset during an ongoing version upgrade from ES v2 to ES v4?

  • Navigator will attempt to recreate the statefulset, but
  • with a pod template containing the new version and
  • start the newer version of the pilot and
  • start the new version of ES
  • without starting the intermediate ES v3 pilot and waiting for it to complete data migration.

Perhaps instead, the CreateNodePool action needs to look at the last reported ESCluster.Status.Version (if it exists) and start that version.

Does a similar problem exist for an ongoing scale down?
If the statefulset is deleted, a new one will be recreated with the desired (smaller) replica count whereas it should probably be recreated with the last known good replica count and wait for the shards to be drained from the chosen node?

This all supposes that the PVs from the deleted statefulset would be re-used by the new statefulset. Is that how it works?

Member

Or perhaps in the case where a statefulset goes missing, navigator should just stop, log an error event and wait for a human to repair it.

Member

And then, what would those steps be?
If a CassandraCluster resource is accidentally deleted, you still have all the data on, now orphan, PVs.
How would you recreate the cluster and safely re-use the data on those PVs?

Contributor Author

What happens if someone accidentally deletes a Navigator statefulset during an ongoing version upgrade from ES v2 to ES v4?

This scenario is definitely something we are susceptible to. I think some examination of how the Deployment controller behaves in this case would be useful. If it does manage to handle it gracefully, my guess is that it's because it stores a lot of information about the upgrade/rolling update back on the Deployment resource itself. We can definitely do something similar here, borrowing a lot of logic from the Deployment controller itself.

I'm thinking (/hoping) that this doesn't need to be a release blocker however, as this actions change is quite architectural and it'd be good to get Cassandra working in a similar way too.

Does a similar problem exist for an ongoing scale down?

Yes

This all supposes that the PVs from the deleted statefulset would be re-used by the new statefulset. Is that how it works?

Yep, that is how it works

And then, what would those steps be?
If a CassandraCluster resource is accidentally deleted, you still have all the data on, now orphan, PVs.
How would you recreate the cluster and safely re-use the data on those PVs?

Creating a statefulset with the same name (and same PVC template) should do it 😄

NavigatorClientset: e.navigatorClient,
Recorder: e.recorder,
StatefulSetLister: e.statefulSetLister,
ConfigMapLister: e.configMapLister,
Member

Do any of the actions need to list ConfigMaps in this branch?
I can't find it used in any of the actions.

Contributor Author

Not right now - once this is merged I'll be incorporating it though.

@munnerz
Contributor Author

munnerz commented Feb 13, 2018

@wallrj PR is ready for another review. I've addressed the review comments in later commits to make it easier to see my newest changes.

/hold cancel

@munnerz munnerz force-pushed the refactor-es-controller branch from 2c726fb to 12e64a9 Compare February 13, 2018 13:53
@munnerz munnerz added this to the v0.1 milestone Feb 13, 2018

func FuzzyEqualLists(left, right interface{}) bool {
    return ContainsAll(left, right) && ContainsAll(right, left)
}
Member

Cool.

Member

@wallrj wallrj left a comment

Thanks @munnerz

The changes all look good to me and I'm looking forward to adapting the cassandra controller to this pattern.
I made a few further comments, but nothing that needs to be addressed here.

Merge at will!

/lgtm

@jetstack-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wallrj

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@jetstack-ci-bot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@jetstack-ci-bot
Contributor

Automatic merge from submit-queue.

Successfully merging this pull request may close these issues:

Design/plan finite state machine for ElasticsearchCluster management