Add an API to support splitting a given range into shards individually #11510
Thanks for your information. @DuanChangfeng0708 Is the skewed traffic primarily from update operations (or insertions following deletions) or from insert operations? Which FDB version are you using?
I'm using version 7.1.27. Sequential writes under heavy pressure can cause this problem, especially after a key range has been cleared over a long period of time. In the end, we found that all the key ranges written with the same prefix were concentrated on a few StorageServers, so those StorageServers' versions fell far behind. A process_behind error occurred while RateKeeper was fetching the ServerList, which drove LimitTPS to 0 and caused a service interruption.
Thanks! Currently, we have an experimental feature for manual shard split in release-7.1.
I have read the relevant implementation code, and I believe custom shards should be persistent and should never be merged. Consider a scenario (such as background tasks) where a batch of tasks sharing the same prefix is created at 1 am every day, and all tasks are completed and cleared by 2 am. In that case, keys written at 1 am will form a hotspot because they share the same prefix; after the deletion at 2 am, the related shards merge again, and the same situation recurs the next day. The goal of customizing shards to distribute the load evenly is not achieved.
Yes. On the main branch/release-7.3, we have introduced another experimental feature command.
If this requirement is to be implemented, one detail needs attention: I want ranges with the same prefix to be dispersed as much as possible across different teams, so I need a way to calculate the proportion of such a range held by each team.
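As a starting point for that measurement, here is a rough sketch using the public locality API in the Python bindings. The prefix and the per-address accounting are illustrative; client code only sees storage server addresses, so they are used below as a proxy for teams, which are not directly exposed to clients.

```python
import fdb
from collections import Counter

fdb.api_version(710)
db = fdb.open()

@fdb.transactional
def addresses_for_key(tr, key):
    # Storage servers currently responsible for `key` (its replica set).
    return list(fdb.locality.get_addresses_for_key(tr, key).wait())

def shard_distribution(db, begin, end):
    """Count, per storage server address, how many shards in [begin, end)
    it hosts -- a rough proxy for how evenly the range is spread."""
    counts = Counter()
    # One sample key per shard: the range's begin key plus each shard boundary.
    boundaries = [begin] + list(fdb.locality.get_boundary_keys(db, begin, end))
    for key in boundaries:
        for addr in addresses_for_key(db, key):
            counts[addr] += 1
    return counts

# Illustrative prefix; replace with the actual hot prefix.
prefix = b"task/"
print(shard_distribution(db, prefix, prefix + b"\xff"))
```

Because boundaries shift as data distribution splits and merges shards, the counts are only a point-in-time estimate of the placement ratio.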
Writing a continuous key space under heavy pressure can cause hot spots, driving up storage_server_write_queue_size or even triggering process_behind. Such hot spots may be known to the upper-layer services, yet they cannot be avoided today. I wish there were an API that partitions the shards ahead of time so that the write load is evenly distributed across multiple StorageServers. That way, the upper-layer business could proactively divide the key space to reduce hot spots.
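In the absence of such a pre-split API, one client-side mitigation is to break up the sequential key order yourself. The sketch below (Python bindings; NUM_BUCKETS, PREFIX, and the helper names are illustrative, not an existing FoundationDB API) salts each key with a small hash-derived bucket so writes under the same prefix land on many existing shards instead of one hot range; reads then fan out over the buckets.

```python
import fdb
import hashlib

fdb.api_version(710)
db = fdb.open()

NUM_BUCKETS = 16          # illustrative bucket count
PREFIX = b"task/"         # illustrative prefix

def bucketed_key(task_id: bytes) -> bytes:
    # Prepend a deterministic bucket byte so sequential task ids are
    # spread across NUM_BUCKETS disjoint key ranges.
    bucket = int(hashlib.md5(task_id).hexdigest(), 16) % NUM_BUCKETS
    return PREFIX + bytes([bucket]) + b"/" + task_id

@fdb.transactional
def write_task(tr, task_id: bytes, value: bytes):
    tr[bucketed_key(task_id)] = value

@fdb.transactional
def read_bucket(tr, bucket: int):
    # Reading a whole day's tasks requires scanning every bucket.
    return list(tr.get_range_startswith(PREFIX + bytes([bucket]) + b"/"))

write_task(db, b"2024-06-01/000123", b"payload")
all_tasks = [kv for b in range(NUM_BUCKETS) for kv in read_bucket(db, b)]
```

This trades ordered scans for parallelism and does not pin shard boundaries, so it is only a partial substitute for the pre-split API requested in this issue.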