Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement statx(STATX_DIOALIGN) so applications can discover correct O_DIRECT alignment #16972

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions include/sys/zfs_vnops.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
*/
/*
* Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2025, Rob Norris <[email protected]>
*/

#ifndef _SYS_FS_ZFS_VNOPS_H
Expand All @@ -42,6 +43,8 @@ extern int zfs_clone_range_replay(znode_t *, uint64_t, uint64_t, uint64_t,
extern int zfs_getsecattr(znode_t *, vsecattr_t *, int, cred_t *);
extern int zfs_setsecattr(znode_t *, vsecattr_t *, int, cred_t *);

extern int zfs_get_direct_alignment(znode_t *, uint64_t *);

extern int mappedread(znode_t *, int, zfs_uio_t *);
extern int mappedread_sf(znode_t *, int, zfs_uio_t *);
extern void update_pages(znode_t *, int64_t, int, objset_t *);
Expand Down
13 changes: 13 additions & 0 deletions module/os/linux/zfs/zpl_inode.c
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
/*
* Copyright (c) 2011, Lawrence Livermore National Security, LLC.
* Copyright (c) 2015 by Chunwei Chen. All rights reserved.
* Copyright (c) 2025, Rob Norris <[email protected]>
*/


Expand All @@ -30,6 +31,7 @@
#include <sys/zfs_vnops.h>
#include <sys/zfs_znode.h>
#include <sys/dmu_objset.h>
#include <sys/spa_impl.h>
#include <sys/vfs.h>
#include <sys/zpl.h>
#include <sys/file.h>
Expand Down Expand Up @@ -490,6 +492,17 @@ zpl_getattr_impl(const struct path *path, struct kstat *stat, u32 request_mask,
}
#endif

#ifdef STATX_DIOALIGN
if (request_mask & STATX_DIOALIGN) {
uint64_t align;
if (zfs_get_direct_alignment(zp, &align) == 0) {
stat->dio_mem_align = PAGE_SIZE;
stat->dio_offset_align = align;
stat->result_mask |= STATX_DIOALIGN;
}
}
#endif

#ifdef STATX_ATTR_IMMUTABLE
if (zp->z_pflags & ZFS_IMMUTABLE)
stat->attributes |= STATX_ATTR_IMMUTABLE;
Expand Down
19 changes: 19 additions & 0 deletions module/zfs/zfs_vnops.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
* Copyright (c) 2015 by Chunwei Chen. All rights reserved.
* Copyright 2017 Nexenta Systems, Inc.
* Copyright (c) 2021, 2022 by Pawel Jakub Dawidek
* Copyright (c) 2025, Rob Norris <[email protected]>
*/

/* Portions Copyright 2007 Jeremy Teo */
Expand Down Expand Up @@ -1083,6 +1084,24 @@ zfs_setsecattr(znode_t *zp, vsecattr_t *vsecp, int flag, cred_t *cr)
return (error);
}

/*
* Get the optimal alignment to ensure direct IO can be performed without
* incurring any RMW penalty on write. If direct IO is not enabled for this
* file, returns an error.
*/
int
zfs_get_direct_alignment(znode_t *zp, uint64_t *alignp)
{
zfsvfs_t *zfsvfs = ZTOZSB(zp);

if (!zfs_dio_enabled || zfsvfs->z_os->os_direct == ZFS_DIRECT_DISABLED)
return (SET_ERROR(EOPNOTSUPP));

*alignp = MAX(zp->z_blksz, PAGE_SIZE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may misremember, but I am not sure z_blksz is always a power of 2 for files of one block, and respectively always a multiple of PAGE_SIZE. I wonder if this still needs some logic from zfs_write().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be fine with z_blksz always being a power of two, even for files with a single block. But as mentioned before I think we'll see some confusion/problems with the file offset alignment restriction changing as that first block grows.

Thinking about this some more, it seems like the best thing to do is return MAX(zp->z_blksz, PAGE_SIZE) once multiple blocks have been allocated and thus the block size is fixed for the lifetime of the file. For files with only a single block we'd return MAX(zfsvfs->z_max_blksz, PAGE_SIZE). This has a couple advantages.

  1. It ensures that for new files, and files smaller than the max record size, the file offset alignment returned will work for Direct I/O even when the file size grows and multiple blocks are required.

  2. For existing files which already have multiple blocks we'll return the optimal size for the file.

The downside is that's a bit more restrictive that strictly required, but I think that's preferable to changing the alignment requirement after the application has checked for it using statx().

We'd definitely want to update statx_dioalign.ksh to include some version of this test.

  1. Set direct=standard.
  2. Create a zero-length file
  3. Open the file.
  4. Use statx(2) to grab the alignment.
  5. Append to the file using Direct IO and the provided alignments.
  6. Verify those writes never fail.


return (0);
}

#ifdef ZFS_DEBUG
static int zil_fault_io = 0;
#endif
Expand Down
2 changes: 1 addition & 1 deletion tests/runfiles/common.run
Original file line number Diff line number Diff line change
Expand Up @@ -975,7 +975,7 @@ tests = ['sparse_001_pos']
tags = ['functional', 'sparse']

[tests/functional/stat]
tests = ['stat_001_pos']
tests = ['stat_001_pos', 'statx_dioalign']
tags = ['functional', 'stat']

[tests/functional/suid]
Expand Down
9 changes: 8 additions & 1 deletion tests/test-runner/bin/zts-report.py.in
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,11 @@ idmap_reason = 'Idmapped mount needs kernel 5.12+'
#
cfr_reason = 'Kernel copy_file_range support required'

#
# Some statx fields are not supported by all kernels
#
statx_reason = 'Needed statx(2) field not supported on this kernel'

if sys.platform.startswith('freebsd'):
cfr_cross_reason = 'copy_file_range(2) cross-filesystem needs FreeBSD 14+'
else:
Expand Down Expand Up @@ -293,7 +298,8 @@ if sys.platform.startswith('freebsd'):
'block_cloning/block_cloning_cross_enc_dataset':
['SKIP', cfr_cross_reason],
'block_cloning/block_cloning_copyfilerange_cross_dataset':
['SKIP', cfr_cross_reason]
['SKIP', cfr_cross_reason],
'stat/statx_dioalign': ['SKIP', 'na_reason'],
})
elif sys.platform.startswith('linux'):
maybe.update({
Expand Down Expand Up @@ -361,6 +367,7 @@ elif sys.platform.startswith('linux'):
'mmp/mmp_active_import': ['FAIL', known_reason],
'mmp/mmp_exported_import': ['FAIL', known_reason],
'mmp/mmp_inactive_import': ['FAIL', known_reason],
'stat/statx_dioalign': ['SKIP', 'statx_reason'],
})


Expand Down
2 changes: 2 additions & 0 deletions tests/zfs-tests/cmd/.gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
/rename_dir
/rm_lnkcnt_zero_file
/send_doall
/statx
/stride_dd
/threadsappend
/user_ns_exec
Expand All @@ -53,3 +54,4 @@
/skein_test
/sha2_test
/idmap_util
/statx
1 change: 1 addition & 0 deletions tests/zfs-tests/cmd/Makefile.am
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,7 @@ if BUILD_LINUX
scripts_zfs_tests_bin_PROGRAMS += %D%/getversion
scripts_zfs_tests_bin_PROGRAMS += %D%/user_ns_exec
scripts_zfs_tests_bin_PROGRAMS += %D%/renameat2
scripts_zfs_tests_bin_PROGRAMS += %D%/statx
scripts_zfs_tests_bin_PROGRAMS += %D%/xattrtest
scripts_zfs_tests_bin_PROGRAMS += %D%/zed_fd_spill-zedlet
scripts_zfs_tests_bin_PROGRAMS += %D%/idmap_util
Expand Down
Loading
Loading