[Bug] IP stays in exclude list when draining fails #128

otrosien · 2020-10-26T16:55:10Z

Expected Behavior

There are situations in which ES refuses to drain a given node, usually because of allocation constraints such as the maximum number of shards per index and node. This causes ES Operator to wait indefinitely for the draining to finish. At some point the scale-down event gets superseded by a scale-up event.
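
To illustrate why such a drain can never finish, here is a rough sketch of the capacity arithmetic for a single index. The function name and its parameters are made up for illustration, and it ignores every other allocation rule ES applies; it only counts shard copies against the per-node limit.

```python
def drain_can_complete(primary_shards: int, replicas: int,
                       total_shards_per_node: int, remaining_nodes: int) -> bool:
    """Return True if all shard copies of one index fit on the remaining nodes.

    Hypothetical helper: it only models the per-index, per-node shard limit.
    """
    shard_copies = primary_shards * (1 + replicas)
    capacity = total_shards_per_node * remaining_nodes
    return shard_copies <= capacity
```

With two shards, no replicas, a limit of one shard per node and a single remaining node, the check fails, so the node to be drained can never be emptied.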

This should result in the previously "to-be-drained" node being used again.

Actual Behavior

What happens instead is that the IP stays in the cluster.routing.allocation.exclude._ip setting, and the scale-up event only causes the statefulset to be updated, spawning new nodes. This leaves the node in a commissioned but unused state.
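
A minimal sketch of the cleanup that seems to be missing, assuming the exclude list is stored as a comma-separated string and is written back via a PUT to _cluster/settings. The helper name is hypothetical, and whether the operator uses the transient or the persistent section is an assumption here, not something this issue states.

```python
from typing import Optional

EXCLUDE_IP_KEY = "cluster.routing.allocation.exclude._ip"


def remove_excluded_ip(current: Optional[str], ip: str) -> dict:
    """Build a _cluster/settings request body with `ip` removed from the exclude list.

    `current` is the existing comma-separated value (or None if unset).
    The "transient" section is an assumption made for this sketch.
    """
    ips = [part.strip() for part in (current or "").split(",") if part.strip()]
    remaining = [part for part in ips if part != ip]
    # None serializes to JSON null, which resets the setting when nothing is left.
    value = ",".join(remaining) if remaining else None
    return {"transient": {EXCLUDE_IP_KEY: value}}
```

When a scale-up supersedes a stuck drain, the body returned for the node's IP could be sent to ES before the statefulset is updated, so that the existing node becomes eligible for shards again.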

Steps to Reproduce the Problem

1. Create a cluster with two nodes (minReplicas=1, maxReplicas=2, minIndexReplicas=0), then add one index with two shards, no replicas and "routing.allocation.total_shards_per_node: 1".
2. Wait for es-operator to start draining the second node. The drain will fail because ES refuses to place more than one shard of that index on the same node.
3. Trigger a scale-out event by putting some CPU load onto ES.
4. Check :9200/_cluster/settings to see that the IP is still listed there.
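
For the last step, a small sketch of how the response could be checked programmatically. It assumes the settings were fetched with the flat_settings=true query parameter, so that the key appears as one dotted name; the function name is hypothetical.

```python
def excluded_ips(settings: dict) -> set:
    """Collect all IPs listed under exclude._ip in a flat _cluster/settings response.

    Assumes the response was requested with ?flat_settings=true.
    """
    key = "cluster.routing.allocation.exclude._ip"
    found = set()
    for section in ("persistent", "transient"):
        value = settings.get(section, {}).get(key) or ""
        found.update(part.strip() for part in value.split(",") if part.strip())
    return found
```

If the IP of the node that failed to drain is still in the returned set after the scale-out, the bug has been reproduced.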

Specifications

Version: latest
Platform: any
Subsystem: any


mikkeloscar added the bug label Oct 27, 2020

otrosien mentioned this issue Apr 12, 2021

feat(esClientDrain): enhance Drain ES Client function #168

Open

1 task

A-Kamaee mentioned this issue Jun 11, 2024

Remove node IP from excluded list when draining fails #423

Draft

6 tasks
