Linux: syncfs(2) should sync all cached files #16818
base: master
Conversation
67d6731 to 88d1c8e
This is from a run where I still had mistakenly used
While debugging a problem with data not being synced on umount, we've noticed that even syncfs(2) doesn't help to get the data written out. This is because it doesn't actually sync any of the live znodes. Signed-off-by: Pavel Snajdr <[email protected]>
So I'll use I was probably overthinking it anyway, as it seems that
Yes, don't go down that road on I'm not sure I understand the premise here. Granted, not easy to know what If it's just a request, seems like what we have should be fine. If it's a wait, then why can we not just call Though, it sounds like you might be talking about getting Ahh, now I see Yeah ok, I think your original is the right shape then (haven't thought about the locking). Flush out the maps if we can, then the rest of the filesystem.
@robn getting the dirty pages onto the ZPL so it can deal with them <somehow> would be what I would expect. It would seem that it doesn't matter if a file was open with
There's also this issue with snapshots - if you're taking a snapshot, wouldn't you expect it to contain all the dirty data at the moment it was taken?
Hmm, there's another option, I could just call
Signed-off-by: Pavel Snajdr <[email protected]>
I wouldn't necessarily from the manpage, but I've just read the Linux code and (via
Yes, I don't think As for So that's good: seems like we're doing the right thing wrt
I mean, I wouldn't expect it to contain anything that hasn't been explicitly committed, but I've got a bee in my bonnet about So I guess it's easy to argue that a txg sync should also push out dirty mapped pages. It's also easy to argue that the existing semantics are clear: until you Personally, I would be fine with flushing maps on
I'll look into this, thx!
Agreed.
@robn tbh I don't see why it wouldn't work from the code; what info are we supposed to fill out? I'll give it more than a few tens of minutes, of course - but if you already know why, it could save me some time.
mutex_enter(&zfsvfs->z_znodes_lock);
for (zp = list_head(&zfsvfs->z_all_znodes); zp;
    zp = list_next(&zfsvfs->z_all_znodes, zp)) {
        if (zp->z_sa_hdl)
                error = zpl_writepages(ZTOI(zp)->i_mapping, &wbc);
        if (error != 0)
                break;
}
mutex_exit(&zfsvfs->z_znodes_lock);
if (error == 0)
        error = -zfs_sync(sb, wait, cr);
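For reference, a minimal sketch of how the wbc passed to zpl_writepages() above might be populated; the exact fields this PR fills in may differ, but WB_SYNC_ALL is what a waiting sync path would normally use (from <linux/writeback.h>):

/*
 * Assumed setup for the writeback_control used above; not necessarily
 * what this PR does. WB_SYNC_ALL makes writeback wait on every page,
 * WB_SYNC_NONE only kicks the writeback off.
 */
struct writeback_control wbc = {
        .sync_mode   = wait ? WB_SYNC_ALL : WB_SYNC_NONE,
        .nr_to_write = LONG_MAX,        /* write out everything dirty */
        .range_start = 0,
        .range_end   = LLONG_MAX,       /* whole file */
};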
Without looking at anything else, I wonder if it would be better to sync as much as we can even in case of errors?
probably yeah, I'll just need to figure out a way that doesn't call zil_commit at all, because that can deadlock if there's an IO error, apparently
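As a rough sketch of "sync as much as we can", assuming the same znode walk and wbc as in the diff above, the loop could record the first failure and keep going instead of breaking out early:

/*
 * Sketch only: keep walking the znode list even if one file fails,
 * and report the first error afterwards rather than stopping early.
 */
int error = 0, err;

mutex_enter(&zfsvfs->z_znodes_lock);
for (zp = list_head(&zfsvfs->z_all_znodes); zp;
    zp = list_next(&zfsvfs->z_all_znodes, zp)) {
        if (zp->z_sa_hdl == NULL)
                continue;
        err = zpl_writepages(ZTOI(zp)->i_mapping, &wbc);
        if (err != 0 && error == 0)
                error = err;    /* remember the first error, keep going */
}
mutex_exit(&zfsvfs->z_znodes_lock);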
My external USB drive cages occasionally drop a connection to a disk (admittedly rare), which causes an I/O error, freezes the pool, and leaves me unable to reboot without a power cycle - so thank you for trying to avoid deadlocks on IO errors.
I've been reading code this afternoon, but it's very slow, and I don't have much time or brain for it, so I might be a while yet. Mostly, I'm trying to make sure we're definitely handling mmap and friends correctly.

Is there a small, isolated reproduction available? We'll need something we can put in the test suite anyway, but also, I haven't seen this myself, and I see mentions of Docker and/or containers in relation to this, which I'm not set up for. But I also wonder if it is only seen in containers; that might be part of the problem. Nothing concrete yet, but it does seem like there's a bit more complication in the kernel if an alternate memcg is in play. Enough to make me wonder if hmm, is this only about containers?

So yeah, some easy repro would be great, and I'll do more reading when I can.
@robn thanks for looking into it; a repro for this would be along the lines of this, only with
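For what it's worth, a hypothetical minimal reproducer (the path, size and the out-of-band durability check are made up for illustration) could dirty a file through mmap(2), deliberately skip msync(2), call syncfs(2) and then verify from outside whether the write actually reached disk:

/* Hypothetical reproducer sketch; /testpool/fs is an assumed ZFS mountpoint. */
#define _GNU_SOURCE             /* for syncfs() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int
main(void)
{
        const char *path = "/testpool/fs/mmap-dirty";
        int fd = open(path, O_RDWR | O_CREAT, 0644);

        if (fd < 0 || ftruncate(fd, 4096) != 0)
                return (1);

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                return (1);

        memcpy(p, "dirty data", 10);    /* dirty the page; no msync(2) on purpose */

        if (syncfs(fd) != 0)            /* should push the dirty page to disk */
                perror("syncfs");

        /*
         * Durability would then be checked out of band, e.g. via a forced
         * reset or the umount path from #16817, then reading the file back.
         */
        return (0);
}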
Motivation and Context
While debugging a problem with data not being synced on umount for #16817, we've noticed that even syncfs(2) doesn't help to get the data written out correctly. This is because it doesn't actually sync any of the live znodes.

Description
I've used zpl_fsync, unlike filemap_write_and_wait in #16817, since that one calls zil_commit and this syncfs(2) is about on-disk permanence. I think that if the file is opened with O_SYNC, all the dirtied pages should be flushed right away, so perhaps filemap_write_and_wait would suffice, but I'm not 100% sure (and this is how we fixed it for the time being on our infra).

edit: it seems that writes to mmapped pages aren't flushed immediately unless msync(2) is called - there is the MAP_SYNC mmap flag, but it's not supported on filesystems that don't support DAX

edit: see discussion below please
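A rough sketch of what that filemap_write_and_wait variant could look like, assuming the same znode walk as the diff in this PR (this is not what the change does, just the shape of the alternative):

/*
 * Sketch only: filemap_write_and_wait() writes out and waits on one
 * inode's dirty page cache without going through the ZIL, so it avoids
 * the zil_commit path entirely.
 */
if (zp->z_sa_hdl)
        error = filemap_write_and_wait(ZTOI(zp)->i_mapping);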
How Has This Been Tested?
Locally along with #16817
Types of changes
Checklist:
Signed-off-by.