-
System information

Describe the problem you're observing

My L2ARC fill ratio stays at 2% and therefore makes the cache far less useful than it could be. The graphs below show that after a reboot (around 14:00), my L2ARC cache warms up and fills up completely. Then, over the next 24 hours, it bleeds out to only around 3-4 GB and remains there. I presume that the cache is expiring items, as the box is pretty idle. However, this behavior is wrong, as the L2ARC space isn't competing with anything (in contrast to ARC space, which competes with system memory). In short, I never want L2ARC items to expire without pressure. Am I missing a tunable for that? Note that my zpool is a raidz2 and currently a leg is missing:

Here are some graphs from my Prometheus node exporter:

Describe how to reproduce the problem

Create a raidz2 array, attach 2 cache devices, reboot, and monitor your L2ARC size over the next hours without pressure.

Include any warning/errors/backtraces from the system logs

ARC summary: arc_summary.txt

I would appreciate any hints.
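For reproducing the monitoring side without Prometheus, a minimal sketch that samples the same counters directly from the arcstats kstats (assuming OpenZFS on Linux, which exposes them under /proc/spl/kstat/zfs/arcstats):

```sh
# Sample L2ARC size and hit/miss counters once a minute.
# l2_size/l2_asize/l2_hits/l2_misses are standard arcstats fields;
# column 3 of the kstat file holds the value.
while true; do
    date
    awk '$1 ~ /^l2_(size|asize|hits|misses)$/ { print $1, $3 }' \
        /proc/spl/kstat/zfs/arcstats
    sleep 60
done
```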
-
I wonder what the 'zpool iostat -v' output looks like. I think there are 2 ways to expire the cache, unless there is a bug: a stream of new L2ARC writes, and deletions of files/snapshots/datasets. The iostat for the cache device may give a clue as to what is going on. I wonder if there is a small set of files which is constantly overwritten with new data, which makes the L2ARC rotate and get filled with deleted blocks.
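For example (with the pool name as a placeholder), something like:

```sh
# Per-vdev I/O statistics, including the cache (L2ARC) devices,
# refreshed every 5 seconds; watch the write columns for the cache vdevs.
zpool iostat -v <pool> 5
```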
-
TL;DR: A misconfigured qemu VM created the guest's memory as a file on a ZFS partition and generated tons of useless IO. The L2ARC filled up with short-lived blocks, and in the end all L2ARC blocks were dead. Can new L2ARC writes target dead block space instead of pressuring blocks that are still alive?

@IvanVolosyuk: Thanks for your poke towards iostat. My assertion that this is a system without pressure turned out to be wrong. I finally found that I had a VM instance running that was generating ZFS IO even when the VM was idle. It turned out that the backing file for shared memory access between the host and the guest landed not in /dev/shm but on my ZFS root fs. Shared memory is required for the virtiofsd access that I use between guest and host. Of course, the memory of a VM should not actually be a ZFS-backed file; /dev/shm (=tmpfs) is the right location for that. That my config ever worked is surprising.

So Ivan, your speculation that a small set of files (=1 in this case) gets constantly overwritten is exactly what happened here, probably filling up the L2ARC with blocks that soon after get deleted. What's surprising, though, is that L2ARC writes don't overwrite dead L2ARC blocks. Instead they seem to evict blocks that are alive, and eventually the L2ARC ends up with mostly dead blocks. Can we make dead block space the target of L2ARC writes?
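For anyone hitting the same thing, the fix is roughly to point the guest's shared memory at memfd (or /dev/shm) instead of a regular file. A sketch with plain qemu follows; the object id and sizes are placeholders, and libvirt users would set the equivalent memoryBacking options instead:

```sh
# Back guest RAM with memfd (anonymous shared memory) rather than a file
# on the ZFS root; share=on is what vhost-user backends like virtiofsd need.
qemu-system-x86_64 \
    -m 4G \
    -object memory-backend-memfd,id=mem0,size=4G,share=on \
    -numa node,memdev=mem0 \
    ...   # rest of the VM options unchanged
```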
-
@clefru thanks for following up. Glad you got to the root cause. The L2ARC is implemented as an on-disk ring buffer, which is why eventually all of the blocks were overwritten. This is at the heart of the design and wouldn't be easy to change.
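If you want to see the feed mechanism that drives the ring buffer, the relevant module parameters can be inspected at runtime; the paths below are the standard OpenZFS-on-Linux locations, and the values are informational only, not a tuning recommendation:

```sh
# Upper bound on what the L2ARC feed thread writes per interval, and the
# interval itself; every feed advances the ring buffer over older blocks.
cat /sys/module/zfs/parameters/l2arc_write_max
cat /sys/module/zfs/parameters/l2arc_feed_secs
```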