Detect a slow raidz child during reads #16900
Open
+347
−0
Motivation and Context
A slow disk can bring down the overall read performance of raidz. This is a known concern and has been observed in practice. Currently in ZFS, a slow disk is detected by comparing the disk read latency to a custom threshold value, such as 30 seconds. This can be tuned to a lower threshold, but doing so requires understanding the context in which it will be applied, and hybrid pools can have a wide range of expected disk latencies.
A better approach might be to identify a slow disk as an outlier based on how far its latency is from the latencies of its peers. This would offer a more dynamic solution that can adapt to different types of media and workloads.
Description
The solution proposed here comes in two parts:
Detecting Outliers
The most recent latency value for each child is saved in the vdev_t. Then, periodically, the samples from all the children are sorted and a statistical outlier can be detected if present. The code uses Tukey's fence, with K = 2, for detecting extreme outliers. This rule defines extreme outliers as data points outside the fence of the third quartile plus two times the Interquartile Range (IQR), where the IQR is the distance between the first and third quartiles.
Sitting Out
After a vdev has encountered multiple outlier detections (> 50), it is placed into a sit out period that lasts 10 minutes by default.
Each time a slow disk is placed into a sit out period, its vdev_stat.vs_slow_ios count is incremented and a zevent of class ereport.fs.zfs.delay is posted.
The length of the sit out period can be changed using the raid_read_sit_out_secs module parameter. Setting it to zero disables slow outlier detection.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
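As an illustration of the two parts described above, here is a minimal standalone sketch in C. It is not the code from this PR. The names find_slow_outlier(), note_outlier(), sit_out_state_t, FENCE_K and OUTLIER_LIMIT are hypothetical, the quartiles are computed as medians of the lower and upper halves of the sorted samples (one of several common conventions, assumed here), and time is passed in as plain seconds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	FENCE_K		2	/* K from the description above */
#define	OUTLIER_LIMIT	50	/* detections tolerated before a sit out */

/* qsort comparator: ascending order of 64-bit latency samples. */
static int
latency_compare(const void *a, const void *b)
{
	uint64_t l = *(const uint64_t *)a, r = *(const uint64_t *)b;
	return ((l > r) - (l < r));
}

/* Median of an already sorted array of n samples. */
static uint64_t
median_sorted(const uint64_t *v, size_t n)
{
	assert(n > 0);
	if (n % 2 == 1)
		return (v[n / 2]);
	return ((v[n / 2 - 1] + v[n / 2]) / 2);
}

/*
 * Return the index of the slowest child if its latency lies above
 * Q3 + FENCE_K * IQR, or -1 if there is no such outlier.
 */
static int
find_slow_outlier(const uint64_t *lat, size_t n)
{
	if (n < 4)
		return (-1);	/* too few samples for quartiles (assumed) */

	uint64_t *sorted = malloc(n * sizeof (*sorted));
	if (sorted == NULL)
		return (-1);
	memcpy(sorted, lat, n * sizeof (*sorted));
	qsort(sorted, n, sizeof (*sorted), latency_compare);

	/* Q1 and Q3 as medians of the lower and upper halves. */
	size_t half = n / 2;
	uint64_t q1 = median_sorted(sorted, half);
	uint64_t q3 = median_sorted(sorted + (n - half), half);
	uint64_t fence = q3 + FENCE_K * (q3 - q1);
	free(sorted);

	size_t slowest = 0;
	for (size_t i = 1; i < n; i++) {
		if (lat[i] > lat[slowest])
			slowest = i;
	}
	return (lat[slowest] > fence ? (int)slowest : -1);
}

/* Hypothetical per-child bookkeeping for the sit out period. */
typedef struct sit_out_state {
	uint32_t outlier_count;	/* detections since the last sit out */
	uint64_t sit_out_until;	/* time (secs) when the sit out ends */
} sit_out_state_t;

/*
 * Record one outlier detection. Returns 1 if the child has just been
 * placed into a sit out period, 0 otherwise. A sit_out_secs of zero
 * disables the mechanism, mirroring the module parameter semantics.
 */
static int
note_outlier(sit_out_state_t *st, uint64_t now, uint64_t sit_out_secs)
{
	if (sit_out_secs == 0)
		return (0);
	if (++st->outlier_count <= OUTLIER_LIMIT)
		return (0);
	st->outlier_count = 0;
	st->sit_out_until = now + sit_out_secs;
	return (1);
}
```

With seven samples between 100 and 130 and one at 8000, the sorted halves give Q1 = 107 and Q3 = 127, so the fence is 127 + 2 * 20 = 167 and only the 8000 sample falls outside it. Because the quartiles come from the middle of each half, a single very slow child does not move the fence much, which is what makes the rule usable across mixed media.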
How Has This Been Tested?
Tested with various configs, including dRAID.
For an extreme example, an HDD was used in an 8-wide SSD raidz2 and it was compared to taking the HDD offline. This was using a fio(1) streaming read workload across 4 threads to 20GB files. Both the record size and IO request size were 1MB.
Also measured the cost over time of vdev_child_slow_outlier(), where the statistical analysis occurs (every 20ms).