async/await compatible wrapper for librados AIO methods #79

Open
jcsp wants to merge 8 commits into master

Conversation

@jcsp (Contributor) commented Jun 15, 2021

This PR adds an async/await compatible interface for librados's AIO methods.

A new Completion type wraps Ceph's completions into a Future, including the cancel-on-drop behaviour expected of Rust futures.

For reading and writing large objects, there are streaming variants that return Stream/Sink compatible objects, with configurable chunk sizes and number of IOs in flight.
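
To make the cancel-on-drop point concrete, here is a minimal, self-contained sketch of the pattern. The AIO backend is simulated with shared state so it stands alone; the PR's real Completion drives librados completions over FFI and its details may differ.

```rust
// Simplified sketch of a Completion future with cancel-on-drop semantics.
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Waker};

struct CompletionState {
    result: Option<i32>,  // return code, set when the AIO op finishes
    waker: Option<Waker>, // task to wake from the completion callback
    cancelled: bool,      // set on drop so a late callback is ignored
}

pub struct Completion {
    state: Arc<Mutex<CompletionState>>,
}

impl CompletionState {
    // Invoked from the (simulated) AIO completion callback.
    fn complete(&mut self, rc: i32) {
        if self.cancelled {
            // The future was dropped: do not touch caller buffers.
            return;
        }
        self.result = Some(rc);
        if let Some(w) = self.waker.take() {
            w.wake();
        }
    }
}

impl Future for Completion {
    type Output = i32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<i32> {
        let mut state = self.state.lock().unwrap();
        match state.result {
            Some(rc) => Poll::Ready(rc),
            None => {
                // Not finished yet: register the waker so the callback can
                // wake this task when the backend signals completion.
                state.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

impl Drop for Completion {
    fn drop(&mut self) {
        // Cancel-on-drop: without this, a completion arriving after the
        // caller gave up could write into buffers that were already freed.
        self.state.lock().unwrap().cancelled = true;
    }
}
```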

cholcombe973 self-requested a review on June 15, 2021
@cholcombe973 (Contributor) commented Jun 15, 2021

Thanks for tackling this. I started on something like this a while back but didn't know where to go with it. I'm kinda leaning towards your first approach of passing around an Arc all over the place.

jcsp changed the title from "[Draft] async/await compatible wrapper for librados AIO methods" to "async/await compatible wrapper for librados AIO methods" on Jun 18, 2021
@jcsp (Contributor, Author) commented Jun 18, 2021

I've modified this to stop using Arcs for the Completion->IoCtx relationship. Originally I wanted Arcs to make it easier to detach Completions and move them around outside the original &IoCtx lifetime, but because we're returning async{}-generated futures in the public interface anyway, the public-facing futures already had a lifetime bound from their self argument. Using plain references makes everything quite a bit cleaner.

The async-no-cancel branch was a dead end: we definitely need cancellation to avoid use-after-free on buffers that have been passed to async rados calls.
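
To make the lifetime point concrete, here is a small sketch of that shape (the names are illustrative, not the PR's exact API): because the public methods are async fns taking &self, the futures they return already borrow the IoCtx, so plain references suffice and no Arc is needed.

```rust
// Illustrative only: the async fn below returns a future that captures
// `&self`, so the future cannot outlive the IoCtx it was created from.
// That borrow is the lifetime bound discussed above, which is why an
// Arc<IoCtx> inside the Completion is unnecessary.
pub struct IoCtx {
    // In the real bindings this wraps the librados ioctx handle.
    _handle: usize,
}

impl IoCtx {
    // Hypothetical method name and signature, for illustration.
    pub async fn async_object_read(&self, _object: &str, buf: &mut [u8]) -> Result<usize, i32> {
        let _ = buf; // the real code would hand this buffer to librados
        // ...create a Completion borrowing `self`, submit the AIO read,
        // and await the Completion here...
        Ok(0) // placeholder return so the sketch compiles
    }
}
```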

src/completion.rs (review thread resolved)
src/ceph.rs (review thread resolved, outdated)
jcsp added 8 commits on August 30, 2021. Excerpts from the commit messages:

- "It would also be possible for callers to do this for themselves with an equivalent background thread wrapper, but it's better to avoid forcing every user to reinvent that."
- "This helps writers who would like to stream data into a RADOS object without managing their own backlog of futures."
@jcsp (Contributor, Author) commented Aug 30, 2021

I've just pushed an updated version of this. I haven't worked on it for a few months, but I had some extra code lying around, so I took the time to clean it up a bit today. There are now streams for reads and writes of large objects, streams for object listing, and xattr methods.

I originally wrote this alongside a spare-time project (https://gitlab.com/putget-io/capra/-/blob/main/src/server/backend/rados.rs), and I have a lot less spare time right now, so I'm not sure it makes sense to merge this without a more concrete user, but I hope this code is useful to the next Rustacean who wants to talk to rados from an async application!
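
As a rough sketch of the consumer-side pattern those streams enable: the rados-backed objects are stubbed here with an in-memory stream and a drain sink so the example stands alone, since the PR's actual constructors and type names are not shown in this thread.

```rust
// Consumer-side sketch only: `stream::iter` and `sink::drain` stand in for
// the PR's rados-backed read stream and write sink, which would be
// constructed from an IoCtx with a chunk size and an IO depth.
use futures::{sink, stream, SinkExt, StreamExt};

async fn stream_sink_pattern() -> Result<(), std::convert::Infallible> {
    // Reading: a large object arrives as a stream of chunks.
    let mut read_stream = stream::iter(vec![vec![0u8; 4096], vec![1u8; 4096]]);
    let mut total = 0;
    while let Some(chunk) = read_stream.next().await {
        total += chunk.len(); // process each chunk as it arrives
    }
    println!("read {total} bytes");

    // Writing: chunks are pushed into a Sink; the real sink would keep a
    // bounded number of AIO writes in flight and flush on close.
    let mut write_sink = sink::drain();
    write_sink.send(vec![2u8; 4096]).await?;
    write_sink.close().await?;
    Ok(())
}
```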

@badone commented Sep 1, 2021

Thanks for this @jcsp

@dae (Contributor) commented Sep 6, 2021

I've been testing the July 29 version of this under fairly heavy load for a few weeks, and have hit the occasional segfault, plus a few instances where a call to rados_async_object_read() returned successfully but left the buffer in its initial zeroed state. I can't rule out Luminous as the cause, since I've only used it in blocking mode in the past. Once I find the time to update to a newer Ceph I'll see whether the issue remains and report back.

@jcsp (Contributor, Author) commented Sep 6, 2021

@dae thanks for the feedback - there were definitely some bugs in that earlier code (including segfaults) that I fixed in the intervening period while testing it under load with the read/write streams, so hopefully you'll see better stability with the latest version.

@dae (Contributor) commented Sep 6, 2021

Ah, thanks, that's good to hear - will give that a go first.

@dae (Contributor) commented Sep 8, 2021

Same issues after updating, unfortunately, so I'll still need to try updating Ceph - I'm getting segfaults with messages like "double free or corruption (!prev)" and "mismatching next->prev_size (unsorted)", and still the occasional invalid data returned from a read.

@dae (Contributor) commented Sep 11, 2021

Initial results from updating to a newer Ceph look promising - will let it bake for a week, but it looks like it may well have been a bug in the older librados.

Later edit: zero problems since the Ceph upgrade.

@leseb (Member) commented Feb 8, 2022

Is there anything missing for this to merge? Thanks!

@Ten0 (Contributor) commented Jul 29, 2023

I'm not super clear on what's blocking this at this point. Any insight?

@dae (Contributor) commented Jul 29, 2023

I've been using this successfully in production since my previous post, FWIW.

Xuanwo force-pushed the master branch 2 times, most recently from ead7751 to 9b39b74, on September 3, 2024