Skip to content

Clickhouse keeper reconfiguration settings: testing and tweaks #6910

Open
@andrewjstone

Description

@andrewjstone

We trigger clickhouse keeper reconfigurations via writing new xml configuration files to each keeper and then relying on the current leader to reload the configuration, diff it with the active configuration and issue any necessary raft membership changes.

This all happens in the keeper code.

There is a little-documented parameter there called CoordinationSetting::configuration_change_tries_count. This can lead to reconfigurations lasting a long time, even if they will eventually fail. My guess is that we should instead set this to 1 or 2. Even if the configuration fails, on the next push it should succeed. On failure though, we'd likely need to remove the cached copy of settings from #6909 to allow the rewrite of the configuration to go through. The problem is that it's next to impossible to figure out when the reconfiguration failed. In essence, this issue is somewhat in conflict with #6909.

Ideally, we'd use the the reconfig command with keeper to block and see if reconfiguration has succeeded or failed immediately. However, this command is not available in the version of clickhouse we have deployed. We need to upgrade, which we should do anyway.

For now though, we need to test this thoroughly and make any necessary config changes.

More details can be found in the following clickhouse issue: ClickHouse/ClickHouse#69355

Also, while we are in here we should almost certainly configure quorum_reads=true for correctness.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions