Refactor Bootstrap.File.seek #165

Closed
1 task
Tracked by #157
cevans87 opened this issue Oct 27, 2021 · 3 comments


cevans87 commented Oct 27, 2021

  • Since this task prompts a switch from kernel-side to Hemlock-side file position tracking, investigate how this might affect Socket.{read|write}. I'd originally thought the signatures would look exactly the same as those of File.{read|write}.
@cevans87 cevans87 mentioned this issue Oct 27, 2021
5 tasks
@cevans87 cevans87 self-assigned this Nov 1, 2021
cevans87 commented Nov 2, 2021

I ran into an unexpected problem implementing this, and my investigation makes me wonder whether we need to change how we keep track of position in file read/write operations.

First, there is no IORING_OP_LSEEK as I had expected. The existence of IORING_FEAT_RW_CUR_POS made me think that lseek(2) functionality must exist in io_uring, but it does not. We can still have the kernel keep track of the current file position for our read/write operations by setting sqe->off = -1. Such an operation reads or writes at the current file position and updates it. We can also specify an absolute position in the file to read/write, but doing so does not update the file position within the kernel.
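The same two behaviors have a synchronous analogue that can be checked without io_uring: read(2) advances the kernel's file position (comparable to sqe->off = -1), while pread(2) at an absolute offset leaves it untouched (comparable to an explicit sqe->off). The following is a minimal sketch using only POSIX calls, not io_uring itself; the helper names and the temp-file path are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: report the kernel's current position for fd. */
static off_t kernel_pos(int fd) {
    return lseek(fd, 0, SEEK_CUR);
}

/* Returns 0 if the observed positions match the behavior described above. */
int demo_kernel_position(void) {
    char path[] = "/tmp/seek_demo_XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path); /* the file goes away once fd is closed */

    if (write(fd, "0123456789", 10) != 10) { close(fd); return -1; }
    if (lseek(fd, 0, SEEK_SET) != 0) { close(fd); return -1; }

    char buf[4];
    int ok = 1;

    /* Absolute-offset read: the kernel position stays at 0. */
    ok &= pread(fd, buf, sizeof buf, 6) == 4;
    ok &= kernel_pos(fd) == 0;

    /* Current-position read: the kernel position advances by bytes read. */
    ok &= read(fd, buf, sizeof buf) == 4;
    ok &= kernel_pos(fd) == 4;

    close(fd);
    return ok ? 0 : 1;
}
```

Calling demo_kernel_position() should return 0 on a POSIX system with a writable /tmp.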

These findings make me think we have a few options moving forward.

  1. Keep our current seek implementation. It's still valid to use the raw lseek(2) system call. It simply doesn't use io_uring. It also may not fit with Hemlock's model of completely asynchronous I/O. I don't think the system call can block, but more investigation is needed.
  2. We can simulate lseek(2) by keeping track of our own file position. So, our File.t might become
    type t = {
      fd: uns
      off: uns
    }
    
    We'd then have to update the offset internally after every read/write, and we'd also have to pass it to each submission via sqe->off = off. We'd completely get away from having the kernel keep track of file position. With that, simulating SEEK_SET and SEEK_CUR would be trivial. SEEK_END would require us to stat the file to get the end position. SEEK_DATA and SEEK_HOLE would still require calling into lseek(2), so we may not be able to support them.
  3. Get rid of our seek implementation entirely for now. Continue letting the kernel track file position.
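Option 2 can be sketched in C to make the bookkeeping concrete. This is only an illustration under stated assumptions, not Hemlock's implementation: file_t, file_seek, file_read, and file_write are hypothetical names, and pread(2)/pwrite(2) stand in for io_uring submissions that would carry sqe->off = off.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical C mirror of the proposed File.t: fd plus a user-side offset. */
typedef struct {
    int fd;
    uint64_t off;
} file_t;

/* Simulated lseek: never touches the kernel's file position. */
int file_seek(file_t *f, int64_t rel, int whence) {
    int64_t base;
    struct stat st;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (int64_t)f->off; break;
    case SEEK_END: /* needs a stat to learn the end position */
        if (fstat(f->fd, &st) != 0) return -1;
        base = (int64_t)st.st_size;
        break;
    default: /* SEEK_DATA and SEEK_HOLE are not simulated here */
        errno = EINVAL;
        return -1;
    }
    if (rel > 0 && base > INT64_MAX - rel) { errno = EOVERFLOW; return -1; }
    if (base + rel < 0) { errno = EINVAL; return -1; }
    f->off = (uint64_t)(base + rel);
    return 0;
}

/* Read at the tracked offset, then advance it by the bytes transferred. */
ssize_t file_read(file_t *f, void *buf, size_t n) {
    ssize_t got = pread(f->fd, buf, n, (off_t)f->off);
    if (got > 0) f->off += (uint64_t)got;
    return got;
}

/* Write at the tracked offset, then advance it by the bytes transferred. */
ssize_t file_write(file_t *f, const void *buf, size_t n) {
    ssize_t put = pwrite(f->fd, buf, n, (off_t)f->off);
    if (put > 0) f->off += (uint64_t)put;
    return put;
}

/* Returns 0 if the tracked offset behaves as described in option 2. */
int demo_user_offset(void) {
    char path[] = "/tmp/file_t_demo_XXXXXX";
    file_t f = { mkstemp(path), 0 };
    assert(f.fd >= 0);
    unlink(path);

    char buf[3] = {0};
    int ok = 1;
    ok &= file_write(&f, "hello world", 11) == 11 && f.off == 11;
    ok &= file_seek(&f, -5, SEEK_END) == 0 && f.off == 6;
    ok &= file_read(&f, buf, 3) == 3 && f.off == 9;
    ok &= memcmp(buf, "wor", 3) == 0;
    ok &= file_seek(&f, -9, SEEK_CUR) == 0 && f.off == 0;
    ok &= lseek(f.fd, 0, SEEK_CUR) == 0; /* kernel position never moved */
    close(f.fd);
    return ok ? 0 : 1;
}
```

Because every operation names its own offset, the kernel's position for the descriptor stays wherever it started, which is the property option 2 relies on.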

cevans87 commented Nov 3, 2021

After discussing with @jasone, option 2 seems best. Letting the kernel track our file position has some serious footguns, some of which are made worse by the concurrent nature of io_uring (two back-to-back read submissions that are not linked will not be explicitly serialized by the kernel, so read results are unstable).

Doing away with kernel-side file position tracking makes me wonder what read/write will look like for socket I/O. I need to look at some examples before moving forward.

@cevans87

Opened #169.
