Refactor Elasticsearch node pool controller into Actions structures #241
Conversation
/hold
Hey @munnerz
I really like this change.
It's taken me a long time to absorb it and I haven't finished the review, but I'm being kicked out of the office now.
So here are some comments to consider.
func (e *ExcludeShards) Execute(state *controllers.State) error {
	return nil
}
Seems incomplete.
👍 removed
func (s *Scale) Execute(state *controllers.State) error {
	if s.NodePool == nil || s.Cluster == nil {
		return fmt.Errorf("internal error: nodepool and cluster must be set")
	}
I feel like this should never happen in production and if these are somehow nil, we should let it panic and restart.
I don't see any such checks in the other actions... so far.
I'd rather not have any 'leaf' bits of code throw panics. If the call site wants to panic as a result of an error, then sure. But this really should be set. I was just trying to be safe here and catch it, but you're right that it should be all or nothing. I'll update the other actions to do the same...
I can't see any way that we'd create a scale action with a nil nodepool or nil cluster pointer.
IMO, if we're going to do this check it should be done in the nextAction function before we create the action, rather than having to do it every time we attempt to use the pointer.
Happy to change it in a followup branch if you agree.
This is more just about making our structures safe - imo, anywhere you touch an exported field you should probably check if it's nil. We could 'gate' these fields with a NewConstructor as an alternative.
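A minimal sketch of what that constructor gating could look like (the constructor name, the Replicas field and the parameter types are assumptions based on the snippets above, not the PR's actual code):

// NewScale is a hypothetical constructor that validates its inputs once,
// so Execute never needs to re-check the pointers.
func NewScale(cluster *v1alpha1.ElasticsearchCluster, nodePool *v1alpha1.ElasticsearchClusterNodePool, replicas int32) (*Scale, error) {
	if cluster == nil || nodePool == nil {
		return nil, fmt.Errorf("internal error: nodepool and cluster must be set")
	}
	return &Scale{Cluster: cluster, NodePool: nodePool, Replicas: replicas}, nil
}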
statefulSet, err := state.StatefulSetLister.StatefulSets(s.Cluster.Namespace).Get(statefulSetName)
if k8sErrors.IsNotFound(err) {
	state.Recorder.Eventf(s.Cluster, core.EventTypeWarning, "Err"+s.Name(), "Cannot get statefulset %q for node pool %q: %v", statefulSetName, s.NodePool.Name, err)
	return nil
Shouldn't this return the error?
If we've got the state changes correct, we should have always created a statefulset before we attempt to scale, right?
We return nil to prevent an endless loop in the case where the StatefulSet doesn't exist. The reasoning is that when the StatefulSet is added to the lister's store, it will trigger the informer's OnAdd to fire again, thus re-triggering this.
we should have always created a statefulset before we attempt to scale, right?
Race conditions could cause the StatefulSet to no longer exist after the initial existence check. The lister may also be stale, which could cause a 404.
// TODO: not sure if we should treat nil as 1 or 0 instead of erroring
if currentReplicas == nil {
	return fmt.Errorf("current number of replicas on statefulset cannot be nil")
}
The statefulset controller seems to dereference it... which I think ends up being 0 for a nil pointer.
Dereferencing any nil pointer will cause a panic: https://play.golang.org/p/6eQA-QmxJWv
Right, sorry. That was a stupid comment. But in that case, I take it that the statefulset controller doesn't handle a nil statefulSet.Spec.Replicas either.
In which case I assume that there is a validation / defaulting mechanism in the API server which checks for nil / missing replicas, ensuring that it will always have a value by the time it reaches the controllers.
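For reference, a small sketch of the nil-tolerant alternative mentioned in the TODO above (treating nil as a default instead of erroring); the default of 1 mirrors the apps API defaulting for StatefulSet replicas, but adopting it here would be a design choice, not something this PR does:

// replicasOrDefault is a hypothetical helper: rather than returning an
// error when the pointer is nil, fall back to a default value.
func replicasOrDefault(replicas *int32) int32 {
	if replicas == nil {
		return 1
	}
	return *replicas
}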
// ensure that at least one data, ingest and master node exists already
if !utilapi.ContainsElasticsearchRole(s.NodePool.Roles, v1alpha1.ElasticsearchRoleData) {
	return true, nil
}
It's a general point, but maybe it's useful to be able to scale down to 0.
We do this with GKE clusters, so why not with Navigator Clusters?
Perhaps - but this could create unstable clusters, and could then cause a lot of churn in control loops (Pilots might be updating their resources with error conditions frequently, causing the navigator control loop to keep resyncing).
I'm not against scale-to-zero, but I don't think it needs to be incorporated right now/as part of this PR?
		return false
	}
	return true
}
👍 This looks useful.
Consider adding it to a public package so that we can use it in the implementation too.
These packages can be used from any subpackage of github.com/jetstack/navigator. This specifically stops anything outside of our repo from importing it. We can use this to convey the fact that certain functions/interfaces, whilst exported, are NOT supported long-term. More info: https://docs.google.com/document/d/1e8kOo3r51b2BWtTs_1uADIA5djfXhPT36s6eHVRIvaU/edit
			},
		},
	}
}
👍 This looks like a really nice pattern for generating the resources we need in tests.
Is this what they do in kubernetes too?
Nope - it made things a lot cleaner though 😄
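A rough sketch of the fixture-generator pattern being discussed (the package and helper names are illustrative, not the PR's actual test helpers):

package framework

import (
	apps "k8s.io/api/apps/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// statefulSetFixture builds a StatefulSet for tests, so each test case
// only spells out the fields it actually asserts on.
func statefulSetFixture(namespace, name string, replicas int32) *apps.StatefulSet {
	return &apps.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: namespace,
			Name:      name,
		},
		Spec: apps.StatefulSetSpec{
			Replicas: &replicas,
		},
	}
}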
@@ -45,8 +45,7 @@ func ValidateElasticsearchClusterNodePool(np *navigator.ElasticsearchClusterNode
 	el = append(el, field.Invalid(fldPath.Child("replicas"), np.Replicas, "must be greater than zero"))
 }
 // TODO: call k8s.io/kubernetes/pkg/apis/core/validation.ValidateResourceRequirements on np.Resources
-// this will require vendoring kubernetes/kubernetes and switching to use the corev1 ResourceRequirements
-// struct
+// this will require vendoring kubernetes/kubernetes.
❓ Might be simpler with the latest version of dep, which shouldn't need to vendor the entire kubernetes tree... assuming these validation functions don't have too many dependencies.
_, err = state.Clientset.AppsV1beta1().StatefulSets(toCreate.Namespace).Create(toCreate)
if err != nil {
	return err
}
Should we use IsAlreadyExistsError(err) here and return nil in that case? I feel like it might be good if Execute was guaranteed to be idempotent.
I feel like (not tested or verified) there might be cases where we create the stateful set, but it doesn't get listed immediately, and the reconciliation code elsewhere might calculate another create action.
In that case, we don't want to return an error, because that will result in the cluster being put back in the queue.
Not sure, but worth discussing I think.
I'd rather hold on this one for now. It's an overbearing requirement to impose on all actions, and (right now) there would be no noticeable gain. Right now, if the call site wants to check whether a CreateNodePool failed because it already exists, it can call IsAlreadyExists itself and take appropriate action.
Once we start implementing more features with Actions, I hope it'll become clearer whether we want to require all actions to be idempotent.
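For the record, a minimal sketch of what the idempotent variant could look like if we did go that route (the helper name, package name and kubernetes.Interface parameter are assumptions; only the quoted Create call above is taken from the PR):

package actions

import (
	apps "k8s.io/api/apps/v1beta1"
	k8sErrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/kubernetes"
)

// createIfNotExists treats an AlreadyExists error from the API server as
// success, so repeated executions of the same create action are harmless.
func createIfNotExists(cl kubernetes.Interface, toCreate *apps.StatefulSet) error {
	_, err := cl.AppsV1beta1().StatefulSets(toCreate.Namespace).Create(toCreate)
	if k8sErrors.IsAlreadyExists(err) {
		return nil
	}
	return err
}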
Resources: apiv1.ResourceRequirements{
	Requests: np.Resources.Requests,
	Limits:   np.Resources.Limits,

InitialDelaySeconds: 240,
This figure changed from 60 to 240. How come?
During testing I was having issues where the liveness probe wasn't passing within 60s, causing crash loops.
Really, we should probably change the Elasticsearch Pilot liveness probe to just check that Pilot is running (i.e. Pilot should always return true, and not attempt to query ES itself)
I think that's what it does already.
func (p *Pilot) LivenessCheck() probe.Check {
	return probe.CombineChecks(
		func() error {
			return nil
		},
	)
}
@munnerz PR needs rebase
Some further comments.
// }
// if len(pilots) != 0 {
// 	t.Errorf("Expected pilot to be deleted, but Pilot is still present in lister: %v", pilots)
// }
Although, if we're waiting for longer than the resync period in WaitForResync above, the lister should (most of the time) not list the deleted object.
How flaky is this in practice?
elemsLen := elemsV.Len()
if listLen < elemsLen {
	return false
}
This means that ContainsAll([1], [1,1]) == false, which might cause confusion.
It's a shame golang doesn't allow you to create generic sets.
I guess this is just a property of the function - it could be said that [1] does not contain all the elements of [1, 1] (depending on how you're looking at it).
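To make the semantics concrete, a simplified int-only sketch (the PR's implementation works on interface{} values via reflection, but the length guard and its effect are the same):

// containsAll reports whether every element of elems appears in list.
// The length guard at the top is what makes
// containsAll([]int{1}, []int{1, 1}) return false, as discussed above.
func containsAll(list, elems []int) bool {
	if len(list) < len(elems) {
		return false
	}
	for _, e := range elems {
		found := false
		for _, l := range list {
			if l == e {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	return true
}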
}
if err != nil {
	return nil, err
}
What happens if someone accidentally deletes a Navigator statefulset during an ongoing version upgrade from ES v2 to ES v4?
- Navigator will attempt to recreate the statefulset, but
- with a pod template containing the new version and
- start the newer version of the pilot and
- start the new version of ES
- without starting the intermediate ES v3 pilot and waiting for it to complete data migration.
Perhaps instead, the CreateNodePool action needs to look at the last reported ESCluster.Status.Version (if it exists) and start that instead.
Does a similar problem exist for an ongoing scale down?
If the statefulset is deleted, a new one will be recreated with the desired (smaller) replica count whereas it should probably be recreated with the last known good replica count and wait for the shards to be drained from the chosen node?
This all supposes that the PVs from the deleted statefulset would be re-used by the new statefulset. Is that how it works?
Or perhaps in the case where a statefulset goes missing, navigator should just stop, log an error event and wait for a human to repair it.
And then, what would those steps be?
If a CassandraCluster resource is accidentally deleted, you still have all the data on, now orphan, PVs.
How would you recreate the cluster and safely re-use the data on those PVs?
What happens if someone accidentally deletes a Navigator statefulset during an ongoing version upgrade from ES v2 to ES v4?
This scenario is definitely something we are susceptible to. I think some examination of how the Deployment controller behaves in this case would be useful. If it does manage to handle it gracefully, my guess is because it stores a lot of information about the upgrade/rolling update back on the Deployment resource itself. We can definitely do something similar here.. borrowing a lot of logic from the Deployment controller itself.
I'm thinking (/hoping) that this doesn't need to be a release blocker however, as this actions change is quite architectural and it'd be good to get Cassandra working in a similar way too.
Does a similar problem exist for an ongoing scale down?
Yes
This all supposes that the PVs from the deleted statefulset would be re-used by the new statefulset. Is that how it works?
Yep, that is how it works
And then, what would those steps be?
If a CassandraCluster resource is accidentally deleted, you still have all the data on, now orphan, PVs.
How would you recreate the cluster and safely re-use the data on those PVs?
Creating a statefulset with the same name (and same PVC template) should do it 😄
NavigatorClientset: e.navigatorClient,
Recorder:           e.recorder,
StatefulSetLister:  e.statefulSetLister,
ConfigMapLister:    e.configMapLister,
Do any of the actions need to list ConfigMaps in this branch?
I can't find it used in any of the actions.
Not right now - once this is merged I'll be incorporating it though.
@wallrj PR ready for another review. I've addressed changes in later commits to make it easier to see my newest changes. /hold cancel
func FuzzyEqualLists(left, right interface{}) bool {
	return ContainsAll(left, right) && ContainsAll(right, left)
}
Cool.
Thanks @munnerz
The changes all look good to me and I'm looking forward to adapting the cassandra controller to this pattern.
I made a few further comments, but nothing that needs to be addressed here.
Merge at will!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: wallrj. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing /approve in a comment.
/test all [submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue.
What this PR does / why we need it:
This gives us a generic way to implement 'Actions' against a given 'State' structure, and additionally introduces a unit testing framework that can be reused between controllers for testing Actions.
See #228
fixes #215
Special notes for your reviewer:
Stacks on #240, #239
Release note: