Actor I/O heap #147

Open
cevans87 opened this issue Oct 8, 2021 · 0 comments

Actors need the ability to submit asynchronous system calls and continue to make forward progress on other work while waiting for the results. While researching #141, we realized that all pointer arguments in an io_uring submission must point to memory that will not move while the kernel might access it. There is currently no facility for non-moving memory in Hemlock, so we've determined that we need to create one specifically for actor I/O.

Algorithm

Memory location

We plan to create an additional (8th) region in each actor context where we'll create these non-moving allocations.

Allocation

Allocation in the non-moving span-allocation area only happens during system call submission. It is part of the runtime implementation specifically for operations with io_uring and the API is not exposed for general use.

Individual allocations are made as close to the front of the memory region as possible. To limit fragmentation, we do not allow arbitrary allocation sizes. Every allocation uses one of the allowed size classes; a request for an arbitrary size is rounded up to the smallest size class that can fit it.

When an allocation is freed, the empty space is combined (coalesced) with any empty space before and after it.
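The allocation policy described above (lowest address first, coalescing on free) can be sketched with a toy model. This is only an illustration under assumptions: the class name `IoHeap`, its methods, and the hole-list representation are hypothetical and are not part of the Hemlock runtime.

```python
class IoHeap:
    """Toy model of a non-moving region: first fit plus coalescing on free."""

    def __init__(self, size):
        self.holes = [(0, size)]  # sorted list of (offset, length) free ranges
        self.live = {}            # offset -> length of live allocations

    def alloc(self, length):
        # First fit: take the lowest-addressed hole that is large enough,
        # which keeps allocations near the front of the region.
        for i, (off, hole) in enumerate(self.holes):
            if hole >= length:
                if hole == length:
                    del self.holes[i]
                else:
                    self.holes[i] = (off + length, hole - length)
                self.live[off] = length
                return off
        return None  # region exhausted; the caller decides what to do

    def free(self, off):
        length = self.live.pop(off)
        self.holes.append((off, length))
        self.holes.sort()
        # Coalesce: merge each hole with the previous one when they touch.
        merged = [self.holes[0]]
        for o, n in self.holes[1:]:
            prev_off, prev_len = merged[-1]
            if prev_off + prev_len == o:
                merged[-1] = (prev_off, prev_len + n)
            else:
                merged.append((o, n))
        self.holes = merged
```

In this model, freeing two adjacent 4 KiB allocations leaves a single 8 KiB hole, so a later 8 KiB request can reuse the space at the front of the region.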

Size classes

We expect the vast majority of system calls to be read and write for either socket I/O or file I/O. An actor may read from or write to multiple file descriptors at once. Appropriate buffer sizes for such calls tend to be powers of 2 in the 2 KiB to 2 MiB range. Since this is the common case, we'll provide multiple size classes in this range.

Some system calls require very little non-moving data (for example, a few bytes for the pathname argument in openat).
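As a rough sketch of how a request could be rounded up to a size class: the table below is an assumption made for illustration. The issue does not fix the exact classes; the single 256-byte small class and the one-class-per-power-of-two layout are hypothetical.

```python
KIB = 1024

# Hypothetical table: one small class for short arguments such as pathnames,
# then powers of two from 2 KiB through 2 MiB.
SIZE_CLASSES = [256] + [(2 * KIB) << i for i in range(11)]


def size_class(request):
    """Return the smallest size class that can hold `request` bytes."""
    for cls in SIZE_CLASSES:
        if request <= cls:
            return cls
    raise ValueError("request exceeds the largest size class")
```

With this table, a 3000-byte request would occupy a 4 KiB allocation, and a short pathname would occupy the small class.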

System call submission

When an actor makes a system call (via submission to the executor's io_uring instance), we allocate span-allocated memory for every reference (a C pointer) that we'll pass to the kernel. The data behind each such reference is copied into the span-allocated memory, and the reference passed to the kernel points to that copy. We then create a submission queue entry (SQE) in the executor's io_uring instance whose references point into non-moving span-allocated memory.
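The copy-then-reference step can be sketched as follows. Everything here is a simplified assumption: `PinnedRegion`, `stage`, and `make_openat_sqe` are hypothetical names, offsets stand in for C pointers, a dict stands in for the submission, and a bump pointer replaces the real allocation policy.

```python
class PinnedRegion:
    """Stand-in for an actor's non-moving I/O region."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.next = 0  # bump pointer; a real heap would reuse freed space

    def stage(self, data):
        """Copy `data` into the region and return the offset of the copy."""
        off = self.next
        if off + len(data) > len(self.mem):
            raise MemoryError("I/O region exhausted")
        self.mem[off:off + len(data)] = data
        self.next += len(data)
        return off


def make_openat_sqe(region, pathname):
    # The kernel must see a NUL-terminated path that will not move, so the
    # submission refers to the staged copy rather than the original string.
    path = pathname.encode() + b"\0"
    return {"opcode": "openat", "addr": region.stage(path), "len": len(path)}
```

The point of the sketch is only that the submission never refers to the actor's original (movable) data, just to the staged copy.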

There are three cases to handle when an actor makes a submission in the executor's io_uring instance.

  1. There is a free submission queue entry (SQE) and the actor is able to use it. We will not interrupt the actor while it reserves the SQE, fills it in, and submits it. Interrupting an actor that owns an SQE would make that SQE unusable by any other actor until the interrupted actor resumes and finishes its submission.
  2. There are no free SQEs. The actor calls io_uring_enter to flush the submission queue. Upon success, the remaining behavior is identical to case 1.
  3. Same as case 2, but the call to io_uring_enter fails with EBUSY. In this situation, the executor must process completion queue events before io_uring_enter can succeed. The actor yields to the executor so that it can reap completions and successfully call io_uring_enter. The suspended actor goes into a high-priority runnable queue. When the actor resumes, the remaining behavior is identical to case 1.
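The three cases can be modeled with a small simulation. This is a sketch, not the real io_uring API: `Ring`, `enter`, and `submit` are hypothetical names, and the EBUSY condition is simplified to "the completion queue is full".

```python
import errno


class Ring:
    """Toy stand-in for the executor's io_uring instance."""

    def __init__(self, sq_entries, cq_entries):
        self.sq_entries = sq_entries
        self.cq_entries = cq_entries
        self.sq = []  # entries filled in but not yet handed to the kernel
        self.cq = []  # completions the executor has not yet reaped

    def enter(self):
        # Simplified model of io_uring_enter: refuse while the completion
        # queue is full, otherwise flush the submission queue.
        if len(self.cq) >= self.cq_entries:
            return -errno.EBUSY
        submitted = len(self.sq)
        self.sq.clear()
        return submitted


def submit(ring, sqe):
    """Return "submitted", or "yield" if the actor must yield and retry."""
    if len(ring.sq) >= ring.sq_entries:       # cases 2 and 3: no free entry
        if ring.enter() == -errno.EBUSY:      # case 3: executor must reap first
            return "yield"
    ring.sq.append(sqe)                       # case 1, or the tail of case 2
    return "submitted"
```

An actor that receives "yield" would go onto the high-priority runnable queue and call `submit` again after the executor has reaped completions.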

System call completion

During every time-slice pause, the executor checks its io_uring completion queue for completions. Using the user_data, the executor determines which actor the completion is associated with. It then writes a message to that actor's mailbox. The content of the message is the completion queue event, which is copied out of the io_uring instance's completion queue. If the actor was not previously runnable, it becomes runnable.
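A minimal sketch of this completion path, under assumptions: `Actor`, `drain_completions`, and the `owners` lookup table are hypothetical, and a completion is modeled as a `(user_data, res)` pair rather than a real CQE.

```python
from collections import deque


class Actor:
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.runnable = False


def drain_completions(cq, owners, run_queue):
    """cq: deque of (user_data, res) pairs; owners: user_data -> Actor."""
    while cq:
        user_data, res = cq.popleft()
        actor = owners.pop(user_data)            # which actor submitted this?
        actor.mailbox.append((user_data, res))   # copy the event to the mailbox
        if not actor.runnable:                   # wake the actor if it was idle
            actor.runnable = True
            run_queue.append(actor)
```

Because the event is copied into the mailbox, the slot in the completion queue can be reused as soon as the loop moves on.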

SQE/CQE user_data

The user_data field points to a memory location within the actor's I/O heap. The disadvantage is that even something like a NOP needs an allocation. We do this so that we can associate specific submissions with specific completions.
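One way the executor could map a user_data address back to its actor is a range lookup over the registered I/O heaps. This is an assumption for illustration only: the issue does not say how the lookup is done, and `HeapIndex` and its methods are hypothetical names.

```python
import bisect


class HeapIndex:
    """Maps an address to the actor whose I/O heap contains it."""

    def __init__(self):
        self.bases = []    # sorted base addresses of registered heaps
        self.regions = []  # parallel list of (base, size, actor_id)

    def register(self, base, size, actor_id):
        i = bisect.bisect_left(self.bases, base)
        self.bases.insert(i, base)
        self.regions.insert(i, (base, size, actor_id))

    def actor_for(self, user_data):
        # Find the last heap whose base is <= user_data, then bounds-check.
        i = bisect.bisect_right(self.bases, user_data) - 1
        if i >= 0:
            base, size, actor_id = self.regions[i]
            if base <= user_data < base + size:
                return actor_id
        return None
```

Since every submission owns a distinct address in the heap, the same value also identifies the individual submission within the actor.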

Examples

Openat

TODO

Read

There are some interesting considerations here. The normal system call signature can be adapted to Hemlock with little effort, though we may be better off transforming it.

Without transformation: ssize_t read(int fd, void *buf, size_t count);

module File:
  type t
  val read: t -> !&buffer -> uns >os-> pending (unit >!&buffer-> int)

let pending_read = read my_file my_buf 1024
match inbox with
| error e -> ...
| fulfilled pending_read fulfilled_read ->  # We don't have syntax for mailbox matching yet. Just go with it.
  match fulfilled_read with
  | 0 -> # Do something with my_buf
  | _ -> # Handle the error code 

More likely, I'd make it so that we don't have to create a buffer beforehand.

module File:
  type t
  val read: t -> uns >os-> pending (unit -> result buffer)

let pending_read = read my_file 1024
match inbox with
| error e -> ...
| fulfilled pending_read fulfilled_read ->
  match fulfilled_read with
  | buffer my_buf -> # Do something with my_buf
  | error e -> # Handle the error code 

Poll + multiple reads

TODO

jasone added the design label Oct 8, 2021
cevans87 changed the title (in-progress) Actor I/O heap Oct 8, 2021