RFC: ABD chunk iterator #16848

robn · 2024-12-09T08:31:11Z

[Sponsors: Klara, Inc., Wasabi Technology, Inc.]

Motivation and Context

This is a prototype of "chunk" iterators for ABDs.

This attempts to solve the following problems (which of course I assert are problems; we can discuss that if you wish):

- using an iterator is difficult, requiring a callback function and a state structure. This in turn makes ABDs difficult to use; much harder than the simple character buffers they replace
- the need for a callback adds function call overhead for every iteration (granted, it's likely negligible in most cases)
- the iterator structure requires special knowledge of every ABD type, making adding new ABD types difficult
- iterators are byte-oriented, even though their underlying storage is not
  - we're rarely (never?) interested in arbitrarily-sized runs of bytes
  - advancing by bytes can land you in the "same" chunk, but the caller can't know this, so has to do an unmap/map cycle, which can be expensive
  - writing more exotic iterators (reverse, repeat) is extremely difficult because it's not easy to know how much an iteration will or should yield

This PR proposes an alternative iterator that expressly yields "chunks", that is, an object (struct) representing an arbitrary run of bytes. The caller can then perform operations on the chunk (request size, map, unmap, etc.) separately. All iterator movement then becomes simply moving to the next chunk. The much simpler housekeeping required allows an iterator to be used directly rather than through a control function, addressing the usability concerns.
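To make the loop shape concrete, here is a minimal sketch against a toy model, not the real ABD code. Every `toy_*` name is hypothetical, a "scatter ABD" is reduced to an array of segments, and the sketch assumes that "done" means "exhausted"; the real interface may differ on all of these points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a scatter ABD: a list of separately stored segments. */
typedef struct {
	const uint8_t *seg_data;
	size_t seg_len;
} toy_seg_t;

typedef struct {
	const toy_seg_t *ta_segs;
	size_t ta_nsegs;
} toy_abd_t;

/* Hypothetical chunk iterator: it only tracks which segment it is on. */
typedef struct {
	const toy_abd_t *tc_abd;
	size_t tc_idx;
} toy_chunk_t;

static toy_chunk_t
toy_chunk_start(const toy_abd_t *abd)
{
	toy_chunk_t ch = { abd, 0 };
	return (ch);
}

/* Assumption: nonzero means the iterator is exhausted. */
static int
toy_chunk_done(const toy_chunk_t *ch)
{
	return (ch->tc_idx >= ch->tc_abd->ta_nsegs);
}

static void
toy_chunk_advance(toy_chunk_t *ch)
{
	assert(!toy_chunk_done(ch));
	ch->tc_idx++;
}

static size_t
toy_chunk_size(const toy_chunk_t *ch)
{
	assert(!toy_chunk_done(ch));
	return (ch->tc_abd->ta_segs[ch->tc_idx].seg_len);
}

/* Mapping is trivial in the toy; in a real ABD it may be costly. */
static const uint8_t *
toy_chunk_map(const toy_chunk_t *ch)
{
	assert(!toy_chunk_done(ch));
	return (ch->tc_abd->ta_segs[ch->tc_idx].seg_data);
}

static void
toy_chunk_unmap(const toy_chunk_t *ch)
{
	(void) ch;	/* nothing to release in the toy model */
}

/* Sum every byte by moving chunk to chunk; no callback, no state struct. */
static uint64_t
toy_abd_sum(const toy_abd_t *abd)
{
	uint64_t sum = 0;

	for (toy_chunk_t ch = toy_chunk_start(abd); !toy_chunk_done(&ch);
	    toy_chunk_advance(&ch)) {
		const uint8_t *data = toy_chunk_map(&ch);
		size_t size = toy_chunk_size(&ch);

		for (size_t i = 0; i < size; i++)
			sum += data[i];
		toy_chunk_unmap(&ch);
	}
	return (sum);
}
```

The only iterator movement is "go to the next chunk"; everything else (size, map, unmap) is an operation on the current chunk.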

For this PR I'm looking for rough consensus on the goals and interface.

Description

For easy reading online:

See the individual commits and their comments for more details. They contain:

definition and description of chunks, iterators and their supporting functions
an implementation of abd_iterate_func in terms of chunk iterators
conversion of many callers of abd_iterate_func to use a simpler iterator loop
an implementation of abd_iterate_func2 in terms of chunk iterators
two possible implementations of an abd_for_each_chunk macro that hides the details of the most common style of iterator loop (map each chunk)

This PR is largely aimed at getting the interface right; the implementation details are unimportant for now. My intention when/if this passes muster is to entirely remove the existing iterator code, including the callback iterators, so none of this implementation will carry over. So don't worry about it, but do point out anything you think might not be implementable.

The only thing not here is a replacement for abd_iterate_page_func(). I don't think it's hard, just a bit fiddly - needs a way to get the page pointer and data offset/size within for a chunk, and a way to indicate that each chunk should be a page, even if it could be bigger (linear), and what to do with compound pages. Maybe abd_chunk_start() will gain a flags parameter. I'd see what falls out of the "real" implementation and what feels right. I don't want to overthink it, but I still want it to be usable - more easily sharing pages with the kernel is gonna be useful!

How Has This Been Tested?

Light sanity runs, as befits a prototype.

Types of changes

One or more of:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

The existing iterators do not have any way for an external user to get the size of the data within the current iteration. This lifts that out of abd_iter_map() and exposes it as abd_iter_size() for the chunk iterator to use. FreeBSD not included for now, since this is just for my own testing.

For now, these are implemented in terms of the old iterators.

There is no real overhead change here, because it still has to call the callback function, which has to manage its own state, but the code is much simpler.
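As a rough illustration of that shape, the sketch below wraps a chunk loop in a callback-style entry point. It is a toy model with hypothetical `toy_*` names, not the code in this commit; it assumes the callback stops the walk by returning nonzero, and the caller still has to carry its own state struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
	const uint8_t *seg_data;
	size_t seg_len;
} toy_seg_t;

typedef struct {
	const toy_seg_t *ta_segs;
	size_t ta_nsegs;
} toy_abd_t;

typedef struct {
	const toy_abd_t *tc_abd;
	size_t tc_idx;
} toy_chunk_t;

/* Hypothetical callback type: nonzero return stops the iteration. */
typedef int toy_iter_func_t(const void *buf, size_t len, void *priv);

static toy_chunk_t
toy_chunk_start(const toy_abd_t *abd)
{
	toy_chunk_t ch = { abd, 0 };
	return (ch);
}

static int
toy_chunk_done(const toy_chunk_t *ch)
{
	return (ch->tc_idx >= ch->tc_abd->ta_nsegs);
}

static void
toy_chunk_advance(toy_chunk_t *ch)
{
	ch->tc_idx++;
}

static size_t
toy_chunk_size(const toy_chunk_t *ch)
{
	assert(!toy_chunk_done(ch));
	return (ch->tc_abd->ta_segs[ch->tc_idx].seg_len);
}

static const uint8_t *
toy_chunk_map(const toy_chunk_t *ch)
{
	assert(!toy_chunk_done(ch));
	return (ch->tc_abd->ta_segs[ch->tc_idx].seg_data);
}

static void
toy_chunk_unmap(const toy_chunk_t *ch)
{
	(void) ch;	/* nothing to release in the toy model */
}

/* The wrapper itself is just a chunk loop around the callback. */
static int
toy_iterate_func(const toy_abd_t *abd, toy_iter_func_t *func, void *priv)
{
	int ret = 0;

	for (toy_chunk_t ch = toy_chunk_start(abd);
	    ret == 0 && !toy_chunk_done(&ch); toy_chunk_advance(&ch)) {
		ret = func(toy_chunk_map(&ch), toy_chunk_size(&ch), priv);
		toy_chunk_unmap(&ch);
	}
	return (ret);
}

/* The caller-side cost remains: a state struct plus a callback. */
typedef struct {
	size_t cs_bytes;	/* bytes seen so far */
	size_t cs_limit;	/* stop once this many have been seen */
} count_state_t;

static int
count_cb(const void *buf, size_t len, void *priv)
{
	count_state_t *cs = priv;

	(void) buf;
	cs->cs_bytes += len;
	return (cs->cs_bytes >= cs->cs_limit);
}
```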

This shows how this form of iterator can be much simpler to work with. No need to set up separate state struct and callback function, just write a loop.

This version is not really simpler. (Though it did help me to see a small efficiency gain: the existing version will remap each chunk when either iterator advances. This version does not). I suspect that users of abd_iterate_func2 will be able to do a better job by using chunk iterators directly, rather than through callbacks, because they know more about the incoming data format and how and when to advance it.
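The remapping point can be sketched with a toy model: two cursors walk in lockstep, each step consumes the smaller of the two remaining runs, and a chunk is mapped only when a cursor first touches it. All names here are hypothetical, and the `tc_nmaps` counter exists only to make the number of mappings observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
	const uint8_t *seg_data;
	size_t seg_len;
} toy_seg_t;

/* Hypothetical cursor: a chunk index plus an offset within that chunk. */
typedef struct {
	const toy_seg_t *tc_segs;
	size_t tc_nsegs;
	size_t tc_idx;		/* current chunk */
	size_t tc_off;		/* bytes already consumed in it */
	const uint8_t *tc_map;	/* current mapping, or NULL */
	unsigned tc_nmaps;	/* how many times we had to map */
} toy_cursor_t;

static int
cursor_done(const toy_cursor_t *c)
{
	return (c->tc_idx >= c->tc_nsegs);
}

static size_t
cursor_left(const toy_cursor_t *c)
{
	assert(!cursor_done(c));
	return (c->tc_segs[c->tc_idx].seg_len - c->tc_off);
}

/* Map lazily: only when this chunk has no mapping yet. */
static const uint8_t *
cursor_data(toy_cursor_t *c)
{
	if (c->tc_map == NULL) {
		c->tc_map = c->tc_segs[c->tc_idx].seg_data;
		c->tc_nmaps++;
	}
	return (c->tc_map + c->tc_off);
}

/* Consume n bytes; drop the mapping only when the chunk is used up. */
static void
cursor_consume(toy_cursor_t *c, size_t n)
{
	c->tc_off += n;
	if (c->tc_off == c->tc_segs[c->tc_idx].seg_len) {
		c->tc_map = NULL;
		c->tc_idx++;
		c->tc_off = 0;
	}
}

/* Compare two toy ABDs whose chunk boundaries need not line up. */
static int
toy_abd_cmp(toy_cursor_t *a, toy_cursor_t *b)
{
	while (!cursor_done(a) && !cursor_done(b)) {
		size_t la = cursor_left(a), lb = cursor_left(b);
		size_t n = la < lb ? la : lb;
		int r = memcmp(cursor_data(a), cursor_data(b), n);

		if (r != 0)
			return (r);
		cursor_consume(a, n);
		cursor_consume(b, n);
	}
	/* Whichever side ran out first is the shorter one. */
	return ((int)cursor_done(b) - (int)cursor_done(a));
}
```

With segments "abcd" + "ef" on one side and "ab" + "cdef" on the other, the walk takes three steps but each side maps only its two chunks, rather than remapping both sides on every step.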

This is where I really wanted to get to. Rather than needing the loop boilerplate every time, have a macro that hides all that away, so that a use looks more like a plain old for loop. This method has two downsides:
- the data and size vars have to be declared outside the loop
- an early exit (break or return) is not possible, as there's no place to unmap the chunk.
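A sketch of what a for-style macro of this kind could look like, on a toy model with hypothetical `toy_*` names (this is not the macro from the commit). The `ta_mapped` counter is an illustration aid: it counts outstanding mappings so that the early-exit problem becomes visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
	const uint8_t *seg_data;
	size_t seg_len;
} toy_seg_t;

typedef struct {
	const toy_seg_t *ta_segs;
	size_t ta_nsegs;
	int ta_mapped;		/* outstanding mappings; makes leaks visible */
} toy_abd_t;

typedef struct {
	toy_abd_t *tc_abd;
	size_t tc_idx;
} toy_chunk_t;

static toy_chunk_t
toy_chunk_start(toy_abd_t *abd)
{
	toy_chunk_t ch = { abd, 0 };
	return (ch);
}

static int
toy_chunk_done(const toy_chunk_t *ch)
{
	return (ch->tc_idx >= ch->tc_abd->ta_nsegs);
}

static void
toy_chunk_advance(toy_chunk_t *ch)
{
	ch->tc_idx++;
}

static size_t
toy_chunk_size(const toy_chunk_t *ch)
{
	return (ch->tc_abd->ta_segs[ch->tc_idx].seg_len);
}

static const uint8_t *
toy_chunk_map(toy_chunk_t *ch)
{
	ch->tc_abd->ta_mapped++;
	return (ch->tc_abd->ta_segs[ch->tc_idx].seg_data);
}

static void
toy_chunk_unmap(toy_chunk_t *ch)
{
	assert(ch->tc_abd->ta_mapped > 0);
	ch->tc_abd->ta_mapped--;
}

/*
 * Map in the loop condition, unmap in the loop step. The caller must
 * declare data and size, and a break skips the step, so it skips the unmap.
 */
#define	TOY_FOR_EACH_CHUNK(ch, abd, data, size)				\
	for (toy_chunk_t ch = toy_chunk_start(abd);			\
	    !toy_chunk_done(&ch) && ((data) = toy_chunk_map(&ch),	\
	    (size) = toy_chunk_size(&ch), 1);				\
	    toy_chunk_unmap(&ch), toy_chunk_advance(&ch))

static uint64_t
toy_sum(toy_abd_t *abd)
{
	const uint8_t *data;
	size_t size;
	uint64_t sum = 0;

	TOY_FOR_EACH_CHUNK(ch, abd, data, size) {
		for (size_t i = 0; i < size; i++)
			sum += data[i];
	}
	return (sum);
}

static int
toy_has_byte(toy_abd_t *abd, uint8_t want)
{
	const uint8_t *data;
	size_t size;
	int found = 0;

	TOY_FOR_EACH_CHUNK(ch, abd, data, size) {
		if (memchr(data, want, size) != NULL) {
			found = 1;
			break;	/* leaves the current chunk mapped */
		}
	}
	return (found);
}
```

In this toy, `toy_sum()` leaves `ta_mapped` at zero, while `toy_has_byte()` leaves one mapping outstanding whenever it finds the byte and breaks out.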

By making the loop block an arg to the macro, we can wrap it, and solve both the problems in the previous version. We can declare the data and size vars ourselves, so the caller doesn't have to, and we can run code at loop exit, so we can unmap the chunk properly (alas, we can't do it for an early return, but that's a much rarer thing to want than an early break). The cost is a tiny bit of readability, as a code block as a macro arg is a little less familiar than one following a loop operator. The extra safety seems worth it to me.
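One way a block-as-argument macro could be built is sketched below, again on a toy model with hypothetical `toy_*` names rather than the macro from the commit. The block is taken as a variadic argument so that commas inside it do not split the macro arguments, and a one-shot inner `for` is an assumed trick for telling `break` apart from normal completion or `continue`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
	const uint8_t *seg_data;
	size_t seg_len;
} toy_seg_t;

typedef struct {
	const toy_seg_t *ta_segs;
	size_t ta_nsegs;
	int ta_mapped;		/* outstanding mappings; makes leaks visible */
} toy_abd_t;

typedef struct {
	toy_abd_t *tc_abd;
	size_t tc_idx;
} toy_chunk_t;

static toy_chunk_t
toy_chunk_start(toy_abd_t *abd)
{
	toy_chunk_t ch = { abd, 0 };
	return (ch);
}

static int
toy_chunk_done(const toy_chunk_t *ch)
{
	return (ch->tc_idx >= ch->tc_abd->ta_nsegs);
}

static void
toy_chunk_advance(toy_chunk_t *ch)
{
	ch->tc_idx++;
}

static size_t
toy_chunk_size(const toy_chunk_t *ch)
{
	return (ch->tc_abd->ta_segs[ch->tc_idx].seg_len);
}

static const uint8_t *
toy_chunk_map(toy_chunk_t *ch)
{
	ch->tc_abd->ta_mapped++;
	return (ch->tc_abd->ta_segs[ch->tc_idx].seg_data);
}

static void
toy_chunk_unmap(toy_chunk_t *ch)
{
	assert(ch->tc_abd->ta_mapped > 0);
	ch->tc_abd->ta_mapped--;
}

/*
 * The macro declares data and size itself. The inner for runs the block
 * once: its step clears toy_brk_, but a break in the block skips the
 * step, so toy_brk_ stays set and the outer loop exits after the unmap.
 * An early return inside the block still skips the unmap.
 */
#define	TOY_FOR_EACH_CHUNK(abd, data, size, ...)			\
	do {								\
		toy_chunk_t toy_ch_ = toy_chunk_start(abd);		\
		while (!toy_chunk_done(&toy_ch_)) {			\
			const uint8_t *data = toy_chunk_map(&toy_ch_);	\
			size_t size = toy_chunk_size(&toy_ch_);		\
			int toy_brk_ = 1;				\
			for (int toy_once_ = 1; toy_once_;		\
			    toy_once_ = 0, toy_brk_ = 0)		\
				__VA_ARGS__				\
			toy_chunk_unmap(&toy_ch_);			\
			if (toy_brk_)					\
				break;					\
			toy_chunk_advance(&toy_ch_);			\
		}							\
	} while (0)

static int
toy_has_byte(toy_abd_t *abd, uint8_t want)
{
	int found = 0;

	TOY_FOR_EACH_CHUNK(abd, data, size, {
		if (memchr(data, want, size) != NULL) {
			found = 1;
			break;	/* the wrapper still unmaps this chunk */
		}
	});
	return (found);
}
```

Here `toy_has_byte()` leaves `ta_mapped` at zero whether or not it breaks early, which is the safety gain described above.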

amotin · 2024-12-11T20:45:11Z

include/sys/abd_impl.h

+ * - void abd_chunk_advance(abd_chunk_t *ch)
+ *
+ *   Move the iterator to the next chunk. If there is no next chunk, the
+ *   iterator is changed to the "done" state and is no longer useable.
+ *
+ * - boolean_t abd_chunk_done(abd_chunk_t *ch)
+ *
+ *   If true, the iterator is pointing to a valid chunk, and the underlying
+ *   memory can be accessed with the access functions. If false, the iterator
+ *   is exhausted and no longer useable.


The definitions of "done" in those paragraphs seem to be opposite.

amotin · 2024-12-12T16:53:04Z

using an iterator is difficult, requiring a callback function and a state structure.

The state structure does not seem to be going anywhere. The ability to avoid callbacks and do things inline indeed makes it much more flexible, but the macro style makes me shiver, much as it does the style checker. I like the direction, but not so much the specific implementation.

the iterator structure requires special knowledge of every ABD type, making adding new ABD types difficult

I am not sure you are doing much about it here. For now you just included the old iterator with its code as is, but I worry that once you try to integrate it we'll end up about where we are now. We do need some OS-specific ABD-specific storage for mapping, etc.

iterators are byte-oriented, even though their underlying storage is not

It seems the current code already operates in chunks of possible mappings. The fact that abd_iterate_func2() always remaps both abds seems like an implementation detail. I don't see what would stop it from handling offsets within a map and remapping only as necessary.

tuxoko · 2025-01-23T03:41:15Z

iterators are byte-oriented, even though their underlying storage is not
we're rarely (never?) interested in arbitrarily-sized runs of bytes

For context, when abd was still out of tree, abd was directly attached to dmu buf.
So arbitrary byte iteration was needed.

In fact, I think we should add it back again, at least for user data dmu buf.
Because right now, if arc buf is scatter, dmu will have to allocate extra buf and do memcpy.
This is inevitable for metadata, but for user data it's just extra memcpy for no good reason.

So for this reason I don't think we should remove the ability to do arbitrary byte iteration.

adamdmoss · 2025-02-07T18:06:44Z

Since this is an RFC I just have one C: yes I think that chunk iterators are a good idea! Years ago I started to make the decompressors ABD-aware to avoid one (or two?) potential memcpy's. It got too annoying and dull TBH but a chunk iterator would have been really nice.
I have no strong opinion about the proposed API details... looks reasonable enough at first glance.

robn added 7 commits December 9, 2024 16:04

abd: chunk iterators

07d2611

For now, these are implemented in terms of the old iterators.

abd: rework abd_iterate_func using chunk iterators

6ab44cd

There is no real overhead change here, because it still has to call the callback function, which has to manage its own state, but the code is much simpler.

abd: use chunk iterator for basic ops

ca4747f

This shows how this form of iterator can be much simpler to work with. No need to set up separate state struct and callback function, just write a loop.

amotin added the Status: Design Review Needed Architecture or design is under discussion label Dec 11, 2024

amotin reviewed Dec 11, 2024
