NBD: "Possible stuck request ... " #230

Open
Lord-Dimwit-Flathead-the-Excessive opened this issue Dec 30, 2024 · 3 comments

Lord-Dimwit-Flathead-the-Excessive commented Dec 30, 2024

I am evaluating s3backer in NBD mode but seeing lots of:
[64768.754431] block nbd0: Possible stuck request 00000000c5cc3961: control (trim/discard@16144924672,1073741824B). Runtime 360 seconds
and
Jun 05 16:21:03 mema kernel: block nbd0: Possible stuck request 000000004f708c80: control (write@11913195>
Jun 05 16:21:08 mema kernel: INFO: task kworker/u4:11:7659 blocked for more than 120 seconds.
Jun 05 16:21:08 mema kernel: Not tainted 6.1.0-21-amd64 #1 Debian 6.1.90-1
Jun 05 16:21:08 mema kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 05 16:21:08 mema kernel: task:kworker/u4:11 state:D stack:0 pid:7659 ppid:2 flags:0x00004>
Jun 05 16:21:08 mema kernel: Workqueue: btrfs-worker btrfs_work_helper [btrfs]

in the dmesg -k output.

Are these normal?
Why is discard even required?

My .conf file:

--baseURL=https://s3.us-central-1.wasabisys.com/
--accessFile=/etc/s3backer.creds
--size=20T
--blockSize=256K
--listBlocks
--listBlocksThreads=50
--ssl
--encrypt
--passwordFile=/etc/s3backer.pswd
--timeout=90
--blockCacheSize=10000
--blockCacheFile=/opt/s3backer/cachefile
--blockCacheWriteDelay=15000
--blockCacheThreads=10
--blockCacheRecoverDirtyBlocks
--blockCacheNumProtected=1000
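
For reference, a minimal sketch of the NBD attach sequence these flags would be used with, assuming the usual s3backer NBD-mode invocation (the bucket name, device, and mount point here are placeholders):

modprobe nbd                                       # load the NBD kernel module
s3backer --nbd [flags above] my-bucket /dev/nbd0   # attach the bucket as /dev/nbd0
mount /dev/nbd0 /mnt/s3                            # mount the btrfs filesystem on top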

archiecobbs (Owner) commented:

> control (trim/discard@16144924672,1073741824B). Runtime 360 seconds

The discard operation allows you to garbage collect (i.e., delete) S3 objects that are no longer used by the filesystem that is running on top of your s3backer slab (btrfs in this case).

It looks like btrfs has requested to discard 0x40000000 bytes, which is 1 GiB, i.e., 4,096 of your 256K s3backer blocks. It's possible, perhaps due to network congestion, that this operation is taking more than 6 minutes and therefore triggering the warning.
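
A quick sanity check of that arithmetic in shell:

echo $((0x40000000))                 # 1073741824 bytes = 1 GiB per discard request
echo $((0x40000000 / (256 * 1024))) # 4096 s3backer blocks of 256 KiB each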

So in summary, Linux is complaining about a slow disk, but the disk is really s3backer running over the network, so maybe that's not so surprising.

So there's no hard proof here yet that anything is actually wrong, though of course that's always possible. You'd need to run s3backer in the foreground in debug mode (--debug -f) to get more details on why it's being slow.
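
For example, something along these lines (the bucket name and device are placeholders, and the log redirection is optional):

s3backer --nbd --debug -f my-bucket /dev/nbd0 2>&1 | tee /tmp/s3backer-debug.log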

Lord-Dimwit-Flathead-the-Excessive (Author) commented:

archiecobbs (Owner) commented:

> I can ... find no way to set the nbd discard timeout duration.

You can completely disable discard by mounting with -o nodiscard. See https://btrfs.readthedocs.io/en/latest/btrfs-man5.html and also https://btrfs.readthedocs.io/en/latest/Trim.html.
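
For example (the device and mount point are placeholders), you could mount with discard disabled and reclaim space with an occasional manual trim instead:

mount -t btrfs -o nodiscard /dev/nbd0 /mnt/s3   # no inline discard on delete
fstrim -v /mnt/s3                               # trim on demand, e.g. off-peak
systemctl enable --now fstrim.timer             # or let systemd schedule it, where available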

> Would using a different block size help?

Possibly. If the problem is that it's just taking too long to delete each individual block, then a larger block size would mean fewer blocks to delete per trim operation.
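
For instance, with a hypothetical --blockSize=1M, the same 1 GiB discard request would span a quarter as many blocks:

echo $((0x40000000 / (1024 * 1024)))   # 1024 blocks at 1 MiB, vs. 4096 at 256 KiB

(Note that the block size would presumably need to be chosen when the backing store is first created, not changed under existing data.)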

> And do I really need to do garbage collection? The blocks will be reused when I write new data, won't they? They are just occupying space on the backend until then. Worst that can happen is I get billed for the whole 20TB I configured the disk for. Not sure I care.

You are correct - this only helps to the extent you don't want to pay for storage of blocks that were created but no longer used. For example, if your disk stays mostly full all the time then you probably don't need this.
