Linux: sync mapped data on umount #16817

Status: Open. Wants to merge 1 commit into base: master.
module/os/linux/zfs/zfs_vfsops.c (16 additions, 0 deletions)
@@ -1546,9 +1546,25 @@ void
zfs_preumount(struct super_block *sb)
{
    zfsvfs_t *zfsvfs = sb->s_fs_info;
    znode_t *zp;

    /* zfsvfs is NULL when zfs_domount fails during mount */
    if (zfsvfs) {
        /*
         * Since we have to disable zpl_prune_sb when umounting,
         * because the shrinker gets freed before zpl_kill_sb is
         * ever called, the umount might be unable to sync open files.
         *
         * Let's do it here.
         */
        mutex_enter(&zfsvfs->z_znodes_lock);
        for (zp = list_head(&zfsvfs->z_all_znodes); zp;
            zp = list_next(&zfsvfs->z_all_znodes, zp)) {
            if (zp->z_sa_hdl)
                filemap_write_and_wait(ZTOI(zp)->i_mapping);
        }
        mutex_exit(&zfsvfs->z_znodes_lock);
Contributor commented:
I'd have expected this writeback to happen in iput() -> iput_final() when the last reference on the inode is dropped. Clearly that isn't happening; we'll need to get to the bottom of why.
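For readers following along, a rough sketch of the path being referred to, assuming generic Linux VFS behavior rather than anything specific to this PR: the writeback would be expected around the final iput(), i.e. iput() -> iput_final() -> evict() -> sb->s_op->evict_inode() (zpl_evict_inode() for ZFS). The hypothetical helper below (sketch only, not code from this change) shows the kind of flush that should already have happened by then.

#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Hypothetical helper, sketch only: flush and wait on all dirty pages
 * attached to an inode's address space.  This is the writeback the
 * comment above expects to have happened by the time the last
 * reference is dropped (iput() -> iput_final() -> evict()).
 */
static int sketch_flush_inode_pages(struct inode *ip)
{
    /* Write out every dirty page of this inode and wait for completion. */
    return (filemap_write_and_wait(ip->i_mapping));
}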

@snajpa (Contributor, Author) replied on Dec 3, 2024:
Would you have any hints on where to look for what might be relevant, given that I can only reproduce this on older pools? With a sufficiently new pool (circa 2022+) I can't reproduce it and the data consistently ends up on disk as it should...

It turns out this was the only difference between my dev setup, where it doesn't reproduce, and the rest, where it does. Production nodes tend to have pools dating from the install time of the machine (and sometimes that goes back a few HW generations). I just didn't see how it could be relevant, so I left it as the last thing to try - and boom :D

If I create a new pool with exactly the same set of features as the older pools, I get nothing, so it really must be an older pool. I tried meddling with xattr=on|sa too (this is all with xattr=on, FWIW).

Wasn't there a difference in how the root znode/dentry of a dataset is set up? Could that be relevant? What I don't understand is how it could be, if we're creating new datasets using the new code now.


        zfs_unlinked_drain_stop_wait(zfsvfs);
        zfsctl_destroy(sb->s_fs_info);
        /*
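For context on the in-diff comment about the shrinker: the ordering it refers to is that the superblock shrinker is torn down before the filesystem's kill_sb callback (zpl_kill_sb() for ZFS) ever runs, so zpl_prune_sb() can no longer be driven through the shrinker at that point, and any remaining dirty mapped pages have to be flushed earlier, e.g. in zfs_preumount() as this change does. Below is a condensed paraphrase of that VFS ordering, a sketch from memory rather than verbatim kernel code; details vary by kernel version.

/*
 * Condensed paraphrase of fs/super.c:deactivate_locked_super(),
 * illustrating the ordering only (sketch, not verbatim kernel code;
 * newer kernels free the shrinker via shrinker_free() instead).
 */
static void sketch_deactivate_locked_super(struct super_block *s)
{
    struct file_system_type *fs = s->s_type;

    if (atomic_dec_and_test(&s->s_active)) {
        unregister_shrinker(&s->s_shrink);  /* shrinker is gone first */
        fs->kill_sb(s);                     /* zpl_kill_sb() only runs after */
    }
}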