NBD: "Possible stuck request ... " #230
Comments
The discard operation allows you to garbage collect (i.e., delete) S3 objects that are no longer used by the filesystem running on top of your s3backer slab (btrfs in this case). It looks like btrfs has requested to discard 0x40000000 bytes, which is 1 GiB, i.e., 4,096 of your 256k s3backer blocks. It's possible, perhaps due to network congestion, that this operation is taking more than 6 minutes and is therefore triggering the warning. So in summary, Linux is complaining about a slow disk, but the disk is really s3backer running over the network, so maybe that's not so surprising. There's no hard proof here yet that anything is actually wrong, though of course that's always possible. You'd need to run s3backer in the foreground in debug mode (--debug -f) to get more details on why it's being slow.
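The arithmetic behind that estimate can be checked with a short sketch. The discard length, the 256k block size, and the 360-second runtime come from this thread; the per-block time budget assumes deletions are issued strictly one at a time, which is an assumption for illustration and not a statement about how s3backer actually schedules them.

```python
# Figures from the thread: a 0x40000000-byte discard, 256 KiB s3backer blocks,
# and a kernel warning after 360 seconds of runtime.
DISCARD_BYTES = 0x40000000
BLOCK_SIZE = 256 * 1024
WARNING_SECONDS = 360


def blocks_in_range(length_bytes, block_size):
    """Number of blocks covering a block-aligned range (ceiling division)."""
    return -(-length_bytes // block_size)


blocks = blocks_in_range(DISCARD_BYTES, BLOCK_SIZE)
gib = DISCARD_BYTES / 2**30

# Average time available per block before the warning fires, ASSUMING the
# deletions run serially (hypothetical; they may well run in parallel).
budget_ms = WARNING_SECONDS / blocks * 1000

print(f"{DISCARD_BYTES} bytes = {gib:g} GiB = {blocks} blocks")
print(f"serial budget per block: {budget_ms:.1f} ms")
```

Under the serial assumption, each block would have to be handled in under about 88 ms on average to finish inside the 360-second window.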
Yes, I thought it likely that these timeouts were to be expected with a high data volume.
However, I would like to get the notices out of my logs so I don't miss a real issue. I can silence the write timeouts via /proc/sys/kernel/hung_task_timeout_secs, but I can find no way to set the nbd discard timeout duration. Do you have a suggestion? Would using a different block size help?
And do I really need to do garbage collection? The blocks will be reused when I write new data, won't they? They are just occupying space on the backend until then. Worst that can happen is I get billed for the whole 20TB I configured the disk for. Not sure I care. Wasabi is pretty inexpensive.
Paul King
In reply to Archie L. Cobbs, Monday, December 30, 2024 9:38 AM, Re: [archiecobbs/s3backer] NBD: "Possible stuck request ... " (Issue #230):
The discard operation allows you to garbage collect (i.e., delete) S3 objects that are no longer used by the filesystem running on top of your s3backer slab (btrfs in this case).
It looks like btrfs has requested to discard 0x40000000 bytes, which is 1 GiB, i.e., 4,096 of your 256k s3backer blocks. It's possible, perhaps due to network congestion, that this operation is taking more than 6 minutes and is therefore triggering the warning.
So in summary, Linux is complaining about a slow disk, but the disk is really s3backer running over the network, so maybe that's not so surprising.
There's no hard proof here yet that anything is actually wrong, though of course that's always possible. You'd need to run s3backer in the foreground in debug mode (--debug -f) to get more details on why it's being slow.
You can completely disable discard by mounting with
Possibly a different block size would help. If the problem is that it's just taking too long to delete each individual block, then a larger block size would mean fewer blocks to delete per trim operation.
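To make the block-size trade-off concrete, here is a rough sketch that estimates how long a 1 GiB trim would take at different block sizes. The 100 ms per-delete latency, the serial execution model, and the 1m and 4m candidate sizes are all hypothetical values chosen for illustration; only the discard length, the 256k block size, and the 360-second threshold come from this thread.

```python
DISCARD_BYTES = 1073741824    # discard length from the log line in this thread
WARNING_SECONDS = 360         # "Runtime 360 seconds" from the same log line
ASSUMED_DELETE_SECONDS = 0.1  # HYPOTHETICAL per-block delete latency


def serial_discard_estimate(discard_bytes, block_size, per_delete_seconds):
    """Return (block count, estimated seconds), assuming one delete at a time."""
    blocks = -(-discard_bytes // block_size)  # ceiling division
    return blocks, blocks * per_delete_seconds


results = {}
# 256k is the block size in the thread; 1m and 4m are illustrative alternatives.
for label, block_size in [("256k", 256 * 1024), ("1m", 1024**2), ("4m", 4 * 1024**2)]:
    blocks, seconds = serial_discard_estimate(
        DISCARD_BYTES, block_size, ASSUMED_DELETE_SECONDS
    )
    results[label] = (blocks, seconds)
    status = "over" if seconds > WARNING_SECONDS else "under"
    print(f"{label:>4}: {blocks:5d} deletes, ~{seconds:6.1f} s ({status} the threshold)")
```

The estimate is a worst case in that it treats every block in the range as needing its own delete. With these assumed numbers, only the 256k configuration crosses the threshold; real latencies would have to be measured, for example from the debug output.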
You are correct that garbage collection is optional: discard only helps to the extent that you don't want to pay for storage of blocks that were created but are no longer used. For example, if your disk stays mostly full all the time, then you probably don't need it.
I am evaluating s3backer in NBD mode but seeing lots of:
[64768.754431] block nbd0: Possible stuck request 00000000c5cc3961: control (trim/discard@16144924672,1073741824B). Runtime 360 seconds
and
Jun 05 16:21:03 mema kernel: block nbd0: Possible stuck request 000000004f708c80: control (write@11913195>
Jun 05 16:21:08 mema kernel: INFO: task kworker/u4:11:7659 blocked for more than 120 seconds.
Jun 05 16:21:08 mema kernel: Not tainted 6.1.0-21-amd64 #1 Debian 6.1.90-1
Jun 05 16:21:08 mema kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 05 16:21:08 mema kernel: task:kworker/u4:11 state:D stack:0 pid:7659 ppid:2 flags:0x00004>
Jun 05 16:21:08 mema kernel: Workqueue: btrfs-worker btrfs_work_helper [btrfs]
in
dmesg -k
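For anyone who wants to tally these warnings by operation type, here is a minimal parsing sketch. The regular expression is inferred from the single complete log line quoted above, so the message format is an assumption and may not match other kernel versions; `parse_stuck_request` is a hypothetical helper name, not part of s3backer or the kernel.

```python
import re

# Pattern inferred from the one complete "Possible stuck request" line in this
# thread; treat the format as an assumption, not a documented kernel interface.
STUCK_RE = re.compile(
    r"block (?P<dev>nbd\d+): Possible stuck request (?P<req>[0-9a-f]+): "
    r"control \((?P<op>[\w/]+)@(?P<offset>\d+),(?P<length>\d+)B\)\. "
    r"Runtime (?P<runtime>\d+) seconds"
)


def parse_stuck_request(line):
    """Return the fields of a stuck-request warning, or None if no match."""
    m = STUCK_RE.search(line)
    if m is None:
        return None
    return {
        "device": m["dev"],
        "request": m["req"],
        "op": m["op"],
        "offset": int(m["offset"]),
        "length": int(m["length"]),
        "runtime_s": int(m["runtime"]),
    }


line = (
    "[64768.754431] block nbd0: Possible stuck request 00000000c5cc3961: "
    "control (trim/discard@16144924672,1073741824B). Runtime 360 seconds"
)
info = parse_stuck_request(line)
print(info)
```

Lines that the journal viewer has truncated (those ending in `>`) will not match and come back as `None`, so they would need to be captured at full width first.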
Are these normal?
Why is discard even required?
My .conf file: