Actors need the ability to submit asynchronous system calls and continue to make forward progress on other work while waiting for the results. While researching #141, we realized that all pointer arguments in an io_uring submission must point to memory that will not move while the kernel might access it. There is currently no facility for non-moving memory in Hemlock, so we've determined that we need to create one specifically for actor I/O.
Algorithm
Memory location
We plan to create an additional (8th) region in each actor context where we'll create these non-moving allocations.
Allocation
Allocation in the non-moving span-allocation area only happens during system call submission. It is part of the runtime implementation specifically for operations with io_uring and the API is not exposed for general use.
Individual allocations are made as close to the front of the memory region as possible. To limit fragmentation, we do not allow arbitrary allocation sizes. Every allocation uses one of the allowed size classes; a request for any other size is rounded up to the smallest size class that can hold it.
When an allocation is freed, the empty space is combined (coalesced) with any empty space before and after it.
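The placement and coalescing rules above can be modeled in a few lines. This is a minimal Python sketch, not the Hemlock runtime's implementation: the `Region` class, its sorted free-list representation, and the first-fit search are all assumptions made for illustration, and sizes are assumed to have been rounded to a size class already.

```python
class Region:
    """Toy model of a non-moving region: lowest-address first fit plus coalescing."""

    def __init__(self, size):
        # Free extents as (offset, length) pairs, kept sorted by offset.
        self.free = [(0, size)]
        # Live allocations: offset -> length.
        self.live = {}

    def alloc(self, size):
        # Scan from the front so the allocation lands at the lowest offset that fits.
        for i, (off, length) in enumerate(self.free):
            if length >= size:
                if length == size:
                    del self.free[i]
                else:
                    self.free[i] = (off + size, length - size)
                self.live[off] = size
                return off
        return None  # Region exhausted; what the runtime does then is out of scope.

    def free_at(self, off):
        size = self.live.pop(off)
        start, end = off, off + size
        merged = []
        for f_off, f_len in self.free:
            if f_off + f_len == start:
                start = f_off            # Free extent directly before: absorb it.
            elif f_off == end:
                end = f_off + f_len      # Free extent directly after: absorb it.
            else:
                merged.append((f_off, f_len))
        merged.append((start, end - start))
        merged.sort()
        self.free = merged
```

Freeing the middle allocation of three, after its neighbors have been freed, collapses the free list back to a single extent covering the whole region.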
Size classes
We expect the vast majority of the system calls to be read and write for either socket I/O or file I/O. An actor may read from or write to multiple file descriptors at once. Appropriate buffer sizes for such calls tend to be powers of 2 in the 2 KiB to 2 MiB range. Since this is the common case, we'll provide multiple size classes in this range.
Some system calls require very little non-moving data (e.g., a few bytes for the pathname argument in openat).
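To make the rounding rule concrete, here is a small Python sketch. The exact set of size classes has not been decided in this proposal; the small classes (64, 256, 1024 bytes) and the choice of every power of two from 2 KiB through 2 MiB are hypothetical values picked for illustration, as is the `size_class` helper.

```python
KIB = 1024
MIB = 1024 * KIB

# Hypothetical size classes: a few small ones for arguments such as pathnames,
# then every power of two from 2 KiB through 2 MiB for read/write buffers.
SMALL_CLASSES = [64, 256, 1024]
BUFFER_CLASSES = [(2 * KIB) << i for i in range(11)]  # 2 KiB, 4 KiB, ..., 2 MiB
SIZE_CLASSES = SMALL_CLASSES + BUFFER_CLASSES


def size_class(request):
    """Return the smallest size class that can hold `request` bytes."""
    for cls in SIZE_CLASSES:
        if cls >= request:
            return cls
    raise ValueError(f"request of {request} bytes exceeds the largest size class")
```

With these assumed classes, a 12-byte pathname would occupy a 64-byte slot and a 3000-byte read buffer would occupy a 4 KiB slot.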
System call submission
Upon an actor making a system call (via submission to the executor's io_uring instance), we allocate span-allocated memory for any reference (a C pointer) that we'll pass to the kernel. The data behind each such reference is copied into the span-allocated memory, and the reference passed to the kernel points to that copy. We then create a submission queue entry (SQE) in the executor's io_uring instance with the new references that point into non-moving span-allocated memory.
There are three cases to handle when an actor makes a submission in the executor's io_uring instance.
1. There is a free submission queue entry (SQE) and the actor is able to use it. We will not interrupt the actor during the time it reserves an SQE, fills it in, and submits it. Interrupting the actor while it owns an SQE would make that SQE unusable for any other actor until the interrupted actor resumes and finishes its submission.
2. There are no free SQEs. The actor will attempt to call io_uring_enter to flush the submission queue. Upon success, the remaining behavior is identical to case 1.
3. Same as case 2, but the call to io_uring_enter fails with EBUSY. In this situation, the executor must deal with completion queue events before io_uring_enter can succeed. The actor yields to the executor so that it may clean up completions and successfully call io_uring_enter. The suspended actor goes into a high-priority runnable queue. Upon resuming actor execution, the remaining behavior is identical to case 1.
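The three cases can be walked through with a toy model. This Python sketch is not a binding to io_uring: `ToyRing`, its `enter` method, and the `yield_to_executor` callback are hypothetical stand-ins, and the EBUSY condition is simplified to "the completion queue is full". Other io_uring_enter errors are ignored here.

```python
import errno


class ToyRing:
    """Stand-in for the executor's io_uring instance (not a real binding)."""

    def __init__(self, sq_entries, cq_entries):
        self.sq_entries = sq_entries
        self.cq_entries = cq_entries
        self.sq = []  # SQEs filled in but not yet handed to the kernel.
        self.cq = []  # CQEs not yet reaped by the executor.

    def enter(self):
        # Models io_uring_enter flushing the submission queue. Simplification:
        # it fails with EBUSY whenever the completion queue is full.
        if len(self.cq) >= self.cq_entries:
            return -errno.EBUSY
        flushed = len(self.sq)
        self.sq.clear()
        return flushed


def submit(ring, sqe, yield_to_executor):
    """Queue `sqe` and return which of the three cases (1, 2, or 3) applied."""
    case = 1
    if len(ring.sq) >= ring.sq_entries:       # No free SQE: case 2 or 3.
        case = 2
        if ring.enter() == -errno.EBUSY:      # Case 3: completions must be reaped first.
            case = 3
            yield_to_executor()               # Executor drains the CQ and calls enter.
    # From here on the behavior matches case 1: reserve, fill in, submit.
    assert len(ring.sq) < ring.sq_entries
    ring.sq.append(sqe)
    return case
```

In the model, `yield_to_executor` returns only after the executor has reaped completions and flushed the submission queue, which mirrors the actor being resumed from the high-priority runnable queue.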
System call completion
During every time-slice pause, the executor checks its io_uring completion queue for completions. Using the user_data, the executor determines which actor the completion is associated with. Upon determining the actor, the executor writes a message to the actor's mailbox. The content of the message is the completion queue event, which is copied out of the io_uring instance's completion queue. If the actor was not previously runnable, it becomes runnable.
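The dispatch step can be sketched as follows. This is a Python model under stated assumptions, not the executor's code: CQEs are represented as `(user_data, res)` tuples, the `owners` dictionary that maps a `user_data` value to an actor is a hypothetical lookup mechanism, and each submission is assumed to produce exactly one completion.

```python
from collections import deque


class Actor:
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.runnable = False


def reap_completions(cq, owners):
    """Drain `cq` and deliver each CQE to the actor that made the submission.

    `cq` is a deque of (user_data, res) pairs standing in for CQEs; `owners`
    maps a user_data value to its Actor. Both representations are assumptions.
    """
    while cq:
        cqe = cq.popleft()             # Copy the event out of the completion queue.
        user_data, _res = cqe
        actor = owners.pop(user_data)  # One completion per submission in this model.
        actor.mailbox.append(cqe)      # The message is the completion queue event.
        actor.runnable = True          # No change if the actor was already runnable.
```

The executor would call something like `reap_completions` once per time-slice pause, before picking the next actor to run.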
SQE/CQE user_data
The user_data field points to a memory location within the actor's I/O heap. This has the disadvantage that even something like a NOP needs an allocation. We do this so that we can associate specific submissions with specific completions.
Examples
Openat
TODO
Read
There are some interesting considerations here. The normal system call signature can be adapted to Hemlock with little effort, though we may be better off transforming it.
Without transformation
ssize_t read(int fd, void *buf, size_t count);
module File:
    type t
    val read: t -> !&buffer -> uns >os-> pending (unit >!&buffer-> int)
let pending_read = read my_file my_buf 1024
match inbox with
  | error e -> ...
  | fulfilled pending_read fulfilled_read -> # We don't have syntax for mailbox matching yet. Just go with it.
    match fulfilled_read with
      | 0 -> # Do something with my_buf
      | _ -> # Handle the error code
More likely, I'd make it so that we don't have to create a buffer beforehand.
module File:
    type t
    val read: t -> uns >os-> pending (unit -> result buffer)
let pending_read = read my_file 1024
match inbox with
  | error e -> ...
  | fulfilled pending_read fulfilled_read ->
    match fulfilled_read with
      | buffer my_buf -> # Do something with my_buf
      | error e -> # Handle the error code
Poll + multiple reads
TODO