- Unify the log format for easy collection and analysis by tools
- Simulator
- Make hotspot scheduling configurable #1412
- Add the store address as the dimension monitoring item to replace the previous Store ID #1429
- Optimize the `GetStores` overhead to speed up the Region inspection cycle #1410
- Add an interface to delete the Tombstone Store #1472
- Add `RegionStorage` to store Region metadata separately #1237
- Add shuffle hot Region scheduler #1361
- Add scheduling parameter related metrics #1406
- Add cluster label related metrics #1402
- Add the importing data simulator #1263
- Fix the `Watch` issue about leader election #1396
- Fix the Region information update issue about Region merge #1377
- Fix the issue that some configuration items cannot be set to `0` in the configuration file #1334
- Check the undefined configuration when starting PD #1362
- Avoid transferring the leader to a newly created peer, to optimize the possible delay #1339
- Fix the issue that `RaftCluster` cannot stop because of a deadlock #1370
- Optimize availability
    - Introduce the version control mechanism and support compatible rolling updates of the cluster
    - Enable `Raft PreVote` among PD nodes to avoid leader reelection when the network recovers after network isolation
    - Enable `raft learner` by default to lower the risk of unavailable data caused by machine failure during scheduling
    - TSO allocation is no longer affected by the system clock going backwards
    - Support the `Region merge` feature to reduce the overhead brought by metadata
- Optimize the scheduler
    - Optimize the processing of Down Store to speed up making up replicas
    - Optimize the hotspot scheduler to improve its adaptability when traffic statistics information jitters
    - Optimize the start of Coordinator to reduce the unnecessary scheduling caused by restarting PD
    - Optimize the issue that Balance Scheduler schedules small Regions frequently
    - Optimize Region merge to consider the number of rows within the Region
    - Add more commands to control the scheduling policy
    - Improve PD simulator to simulate the scheduling scenarios
- API and operation tools
    - Add the `GetPrevRegion` interface to support the TiDB reverse scan feature
    - Add the `BatchSplitRegion` interface to speed up TiKV Region splitting
    - Add the `GCSafePoint` interface to support distributed GC in TiDB
    - Add the `GetAllStores` interface to support distributed GC in TiDB
    - pd-ctl supports:
    - pd-recover doesn't need to provide the `max-replica` parameter
- Metrics
    - Add related metrics for `Filter`
    - Add metrics about etcd Raft state machine
- Performance
    - Optimize the performance of Region heartbeat to reduce the memory overhead brought by heartbeats
    - Optimize the Region tree performance
    - Optimize the performance of computing hotspot statistics
- Fix the issues related to `pd-ctl` reading the Region key #1298 #1299 #1308
- Fix the issue that the `regions/check` API returns the wrong result #1311
- Fix the issue that PD cannot restart join after a PD join failure #1279
- Fix the issue that `watch leader` might lose events in some cases #1317
- Fix the issue that the tombstone TiKV is not removed from Grafana #1261
- Fix the data race issue when grpc-go configures the status #1265
- Fix the issue that the PD server gets stuck because of an etcd startup failure #1267
- Fix the issue that data race might occur during leader switching #1273
- Fix the issue that extra warning logs might be output when TiKV becomes tombstone #1280
- Add the API to get the Region list by size in reverse order #1254
- Return more detailed information in the Region API #1252
- Fix the issue that `adjacent-region-scheduler` might lead to a crash after PD switches the leader #1250
- Support the `GetAllStores` interface
- Add the statistics of scheduling estimation in Simulator
- Optimize the handling process of down stores to make up replicas as soon as possible
- Optimize the start of Coordinator to reduce the unnecessary scheduling caused by restarting PD
- Optimize the memory usage to reduce the overhead caused by heartbeats
- Optimize error handling and improve the log information
- Support querying the Region information of a specific store in pd-ctl
- Support querying the topN Region information based on version
- Support more accurate TSO decoding in pd-ctl
- Fix the issue that pd-ctl exits wrongly when running the `hot store` command
- Introduce the version control mechanism and support rolling update of the cluster with compatibility
- Enable the `region merge` feature
- Support the `GetPrevRegion` interface
- Support splitting Regions in batch
- Support storing the GC safepoint
- Optimize the issue that TSO allocation is affected by the system clock going backwards
- Optimize the performance of handling Region heartbeats
- Optimize the Region tree performance
- Optimize the performance of computing hotspot statistics
- Optimize the error codes returned by the API
- Add options for controlling scheduling strategies
- Prohibit using special characters in `label`
- Improve the scheduling simulator
- Support splitting Regions using statistics in pd-ctl
- Support formatting JSON output by calling `jq` in pd-ctl
- Add metrics about etcd Raft state machine
- Fix the issue that the namespace is not reloaded after switching Leader
- Fix the issue that namespace scheduling exceeds the schedule limit
- Fix the issue that hotspot scheduling exceeds the schedule limit
- Fix the issue that wrong logs are output when the PD client closes
- Fix the wrong statistics of Region heartbeat latency
- Enable Raft PreVote between PD nodes to avoid leader reelection when the network recovers after network isolation
- Optimize the issue that Balance Scheduler schedules small Regions frequently
- Optimize the hotspot scheduler to improve its adaptability when traffic statistics information jitters
- Skip the Regions with a large number of rows when scheduling `region merge`
- Enable `raft learner` by default to lower the risk of unavailable data caused by machine failure during scheduling
- Remove `max-replica` from `pd-recover`
- Add `Filter` metrics
- Fix the issue that Region information is not updated after tikv-ctl unsafe recovery
- Fix the issue that TiKV disk space is used up because of replica migration in some scenarios
- Do not support rolling back to v2.0.x or earlier due to update of the new version storage engine
- Enable `raft learner` by default in the new version of PD. If the cluster is upgraded from 1.x to 2.1, either stop the machines before the upgrade or apply a rolling update to TiKV first and then to PD
- Improve the behavior of the unset scheduling argument `max-pending-peer-count` by changing it to no limit for the maximum number of `PendingPeer`s
- Fix the issue about scheduling of the obsolete Regions
- Fix the panic issue when collecting the hot-cache metrics in specific conditions
- Make the balance leader scheduler filter the disconnected nodes
- Make the tick interval of patrol Regions configurable
- Modify the timeout of the transfer leader operator to 10s
- Fix the issue that the label scheduler does not schedule when the cluster Regions are in an unhealthy state
- Fix the improper scheduling issue of `evict leader scheduler`
- Add the `Scatter Range` scheduler to balance Regions with the specified key range
- Optimize the scheduling of Merge Region to prevent the newly split Region from being merged
- Add Learner related metrics
- Fix the issue that the scheduler is mistakenly deleted after restart
- Fix the error that occurs when parsing the configuration file
- Fix the issue that the etcd leader and the PD leader are not synchronized
- Fix the issue that Learner still appears after it is closed
- Fix the issue that Regions fail to load because the packet size is too large
- Support using pd-ctl to scatter specified Regions for manually adjusting hotspot Regions in some cases
- Improve configuration check rules to prevent unreasonable scheduling configuration
- Optimize the scheduling strategy when a TiKV node has insufficient space so as to prevent the disk from being fully occupied
- Optimize hot-region scheduler execution efficiency and add more metrics
- Optimize Region health check logic to avoid generating redundant schedule operators
- Support adding the learner node
- Optimize the Balance Region Scheduler to reduce scheduling overhead
- Adjust the default value of the `schedule-limit` configuration
- Fix the compatibility issue when adding a new scheduler
- Fix the issue of allocating IDs frequently
- Support splitting Region manually to handle the hot spot in a single Region
- Optimize metrics
- Fix the issue that the label property is not displayed when `pdctl` runs `config show all`
- Support Region Merge, to merge empty Regions or small Regions after deleting data
- Ignore the nodes that have a lot of pending peers when adding replicas, to improve the speed of restoring replicas or taking nodes offline
- Optimize the scheduling speed of leader balance in scenarios of unbalanced resources within different labels
- Add more statistics about abnormal Regions
- Fix the frequent scheduling issue caused by a large number of empty Regions