Replies: 1 comment 5 replies
I suspect it might be related to block cloning, which mv uses between datasets. In theory, once the source file is deleted, the corresponding records in the BRT should be deleted as well, bringing memory consumption back down. In the ideal case, if both happen within the same transaction group, the records should not even reach the disk. But I guess that with large files the window between cloning and deleting can be big enough to require on-disk BRT writes. Those may cause additional unaccounted memory overhead that may be impossible to evict on kernel request, which might trigger the OOMs you see. Last week I made a couple of PRs to reduce the memory overhead a bit, but that is a drop in the bucket. It needs a deeper look. PS: What recordsize do you use, and what are the sizes of the files you move?
Hello,
I'm quite new to ZFS (coming from btrfs) and am currently attempting to mv my data from a pool's root into a dataset:
The process starts fine, though a bit slow for my taste since it should be using reflinks rather than copying (bcloneused is rising). It then slows down more and more until it stalls the whole PC, and I get messages about processes getting killed due to going OOM.
In htop I can see that most of my CPU usage comes from the kernel and that the mv process is stuck in disk sleep most of the time. Weirdly, even starting something like a browser gets stuck in disk sleep, even though it's on a completely different disk (SSD) and even a different controller!
Memory usage in green (=used) also seems to be steadily climbing, with cache (orange) declining.
What I have tried so far (one fix active at a time):
zfs_arc_shrinker_limit=0
-> helps a bit, but on large mv operations it still runs OOM
min_ttl_ms=0
-> does not really help, OOM under 20 minutes
The last three fix attempts come from reading #10255
At this point, having heard all about ZFS stability and ease of use, I'm most certainly doing something wrong. It does not really help that nothing ZFS-related seems to show up as using memory in any of the monitoring tools I have tried so far.
Any ideas?
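For anyone following along: ZFS memory usually does not show up against any process, because the ARC lives in kernel memory. Below is a minimal sketch of reading the ARC counters directly, assuming Linux OpenZFS, where they are exposed in /proc/spl/kstat/zfs/arcstats as "name type data" rows. The helper name parse_kstat is made up for illustration. If the overhead really is unaccounted, as suspected in the reply, it may not appear in these counters either; this only rules the ARC in or out.

```python
from pathlib import Path

# Linux OpenZFS exposes ARC counters here (assumes the zfs module is loaded).
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def parse_kstat(text: str) -> dict[str, int]:
    """Parse 'name type data' rows of a kstat file into {name: data}.

    Hypothetical helper for illustration, not part of any ZFS tooling.
    """
    stats: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three fields; the first header line has more.
        if len(fields) != 3:
            continue
        try:
            stats[fields[0]] = int(fields[2])
        except ValueError:
            # The 'name type data' column header ends up here and is skipped.
            continue
    return stats


if __name__ == "__main__":
    if ARCSTATS.exists():
        stats = parse_kstat(ARCSTATS.read_text())
        # size = current ARC size, c = current target, c_max = upper limit.
        for key in ("size", "c", "c_max"):
            if key in stats:
                print(f"{key}: {stats[key] / 2**20:.0f} MiB")
    else:
        print(f"{ARCSTATS} not found (is the zfs module loaded?)")
```

Running this repeatedly during the mv and comparing "size" with the used memory shown in htop gives a rough idea of whether the growth is inside the ARC or somewhere else.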