-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ZED tunable for disk label wait during auto-replace #15327
Conversation
4c1a1f4
to
fd9c23e
Compare
# to show up before giving up. The time is in milliseconds and | ||
# defaults to 3000 if not specified here. | ||
# | ||
#ZED_DISK_LABEL_WAIT_TIME=3000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default is now 30,000 ms right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the default for ZED is still 3000 ms. And the default for zpool create
, i.e., DISK_LABEL_WAIT
, remains 30,000 ms. This PR currently is only making it tunable.
In the ZED case, it is waiting after consuming a disk add event for the partition (after the partition was made and sdh1
exists, but other links like under /dev/disk/by-id/
might be inflight) , whereas in the zpool create
case it is waiting just after issuing the partition request (before partitioning may have even started) so it makes sense that it is a bit longer.
But you bring up a good point, the wait time is the maximum time it will wait so why not make the default longer for an auto-replace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does seem like we are here fixing this because the wait is too short, so yeah, changing the default while here seems like a good idea
During an auto-replace of a VDEV device, the device is labeled asynchronously in ZED and it waits until the partition link appears before proceeding with the zpool_vdev_attach(). Added a new ZED configuration tunable, ZED_DISK_LABEL_WAIT_TIME which can override the previously hard-coded value of 3000 ms. Sponsored-By: OpenDrives Inc. Sponsored-By: Klara Inc. Signed-off-by: Don Brady <[email protected]>
fd9c23e
to
a49613d
Compare
|
||
static int | ||
zed_lookup_tunable(const char *zedlet_dir, const char *tuneable, | ||
unsigned int *wait_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is meant to be generic, maybe rename wait_time
to tunable_value
or something
Maybe also name the function for the type, zed_lookup_uint_tunable()
, incase there are other types in the future
#define CONFIG_LINE_MAX 100 | ||
|
||
static int | ||
zed_lookup_tunable(const char *zedlet_dir, const char *tuneable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zed_lookup_tunable()
is basically a lightweight shell parser. It's going to get tripped up if you do things like ZED_DISK_LABEL_WAIT = 1
or ZED_DISK_LABEL_WAIT=1
or:
if dmidecode | grep "system-that-needs-more-spin-up-time" ; then
ZED_DISK_LABEL_WAIT = 6000
else
ZED_DISK_LABEL_WAIT = 3000
fi
I was thinking you could just "execute" zed.rc and then print out all its variables. Or more specifically, have zed call libzfs_run_process_get_stdout_nopath()
with bash -c 'set -o allexport && source zed.rc && env'
as the command. That function will return an array of lines of output which you can extract the tuneable
from. You can use vdev_run_cmd()
as a template for how to use libzfs_run_process_get_stdout_nopath()
since it's a little complicated.
I am happy that with this PR we would now have a true zed-readable config file (not just for zedlets)! That's something we've wanted for a long time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sscanf
does correctly process whitespace on either side of the '='. I tested with many variations, including tab characters.
The format of zed.rc is only implied (name=value) but not otherwise defined AFAIKT. If it is intended to be a bash script then I'm more inclined to used a different config file for the non-zedlet tunables. I'd like to keep it simple and useable for other future non-zedlet tunables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's good that it's interpreting white-space correctly.
The format of zed.rc is only implied (name=value) but not otherwise defined AFAIKT.
The rc
(run command) extension typically implies the file is executed as configuration commands. Since we're already executing zed.rc
at the beginning of our zedlets, I think it's fair for users to assume it's a script and that they could put other shell commands in there.
I think you're right that a separate zed daemon config file might be the way to go.
Further investigations revealed that the issue causing auto-replace to fail was caused by a regression in ZED and not caused by a short waiting period. For VDEV paths that are under I'll open a different PR to address the regression and close this one. |
Motivation and Context
During an auto-replace of a VDEV device, the device is labeled asynchronously in ZED and it waits until the partition link appears before proceeding with the zpool_vdev_attach().
We have seen cases where ZED doesn't wait long enough for the partition link to appear and so the auto-replace fails. Below is the ZED log message observed:
Note that during a
zpool create
orzpool add
it waits up to 30 seconds when callingzpool_label_disk_wait()
.Description
Added a new ZED configuration tunable,
ZED_DISK_LABEL_WAIT_TIME
which can override the previously hard-coded value of 3000 ms.Also re-worded the log message when the timeout is reached before the link appears:
Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
How Has This Been Tested?
Tested manually and with ZTS
functional/fault/auto_replace_001_pos
Types of changes
Checklist:
Signed-off-by
.