Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux's io_uring IO interface (2x performance vs libevent) #10768

Draft
wants to merge 88 commits into
base: master
Choose a base branch
from

Conversation

lbguilherme
Copy link
Contributor

@lbguilherme lbguilherme commented May 30, 2021

This is a WIP implementation of the io_uring interface into Crystal's scheduler. I'm opening this PR for early review.

This was only tested with Linux 5.12 on an x86_64 machine but should work on Linux 5.4+.

Here is a benchmark to demonstrate the current performance gains:

require "http/server"

{% if flag?(:preview_iouring) %}
  require "./stdlib_patch"
{% end %}

server = HTTP::Server.new do |context|
  context.response.content_type = "text/plain"
  context.response.print "Hello world!"
end

address = server.bind_tcp 8080
puts "Listening on http://#{address}"
server.listen

Before:

$ wrk -t12 -c100 -d60s http://127.0.0.1:8080
Running 1m test @ http://127.0.0.1:8080
  12 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.85ms  268.64us  11.66ms   88.43%
    Req/Sec     9.51k   344.83    16.35k    74.86%
  6814608 requests in 1.00m, 656.39MB read
Requests/sec: 113544.61
Transfer/sec:     10.94MB

After (with -Dpreview_iouring):

$ wrk -t12 -c100 -d60s http://127.0.0.1:8080
Running 1m test @ http://127.0.0.1:8080
  12 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   422.39us  196.79us  14.97ms   95.13%
    Req/Sec    18.96k     1.96k  156.16k    92.56%
  13583654 requests in 1.00m, 1.28GB read
Requests/sec: 226021.87
Transfer/sec:     21.77MB

(Linux 5.12, Intel Xeon E-2174G, inside a Docker container)

Hopefully, this can come in time for Crystal 1.1.0 :D

Fixes #10740. Informs #10766.

@straight-shoota
Copy link
Member

Wow, this looks great 👍

It's a lot of stuff 😄 We should try to extract some parts into preliminary PRs.

The syscall feature should definitely be a separate item.

Copy link
Contributor

@yxhuvud yxhuvud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will say this: I'm not convinced using raw syscalls instead of liburing is the right way to go. There is a lot of ways to get small details wrong, and liburing exposes a much friendlier interface than the raw calls.

src/crystal/system/unix/syscall/aarch64-linux.cr Outdated Show resolved Hide resolved
src/crystal/system/unix/io_uring.cr Show resolved Hide resolved
src/crystal/system/unix/syscall/linux.cr Outdated Show resolved Hide resolved
src/crystal/system/unix/syscall/linux.cr Outdated Show resolved Hide resolved
src/crystal/system/unix/syscall/linux.cr Outdated Show resolved Hide resolved
src/crystal/system/unix/io_uring.cr Outdated Show resolved Hide resolved
src/crystal/system/unix/io_uring.cr Outdated Show resolved Hide resolved
src/crystal/system/unix/io_uring.cr Show resolved Hide resolved
…to keep the current event_loop interface working (upcomming commit)
@lbguilherme
Copy link
Contributor Author

As a reminder, there is no public API changes and everything is behind a "preview" compile flag, so there is very little risk of breaking something in production.

Either way I believe the primary concern is about manutenability in the long run, as this is a fairly complex code to adopt. It is understandable for a change like this to be not so fast to accept.

If there is anything I can do, I'm available.

@carlhoerberg
Copy link
Contributor

In /usr/share/crystal/src/crystal/system/unix/event_libevent.cr:19:11

 19 | def add(timeout : Time::Span?) : Nil
              ^------
Warning: positional parameter 'timeout' corresponds to parameter 'time_span' of the overridden method Crystal::Event#add(time_span : Time::Span | ::Nil), which has a different name and may affect named argument passing

src/crystal/system/unix/event_io_uring.cr Outdated Show resolved Hide resolved
src/crystal/system/unix/event_loop_io_uring.cr Outdated Show resolved Hide resolved
src/kernel.cr Outdated
->Crystal::EventLoop.after_fork,
->{ Crystal::Scheduler.event_loop.after_fork },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: why is that change needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The event loop is now a class variable instead of a module. This is because the event loop implementation is detected at runtime, so it can't be a module anymore.

@lbguilherme
Copy link
Contributor Author

The latest merges from master I did introduced some bug or flaky behavior on the CI tests and I'll have to investigate what's going on to fix that.

Before I put any more effort into this work I want to ask the core maintainer: Is this PR something that would eventually be accepted into Crystal or the added complexity will do more harm than good and this shouldn't be merged anytime soon? If it's the later, let's close this.

@straight-shoota
Copy link
Member

I would very much like to merge this. More in #10740 (comment)

Thanks for your great work pushing it forward! I think with the event loop for windows now in place we should be able to continue with the refactoring necessary for io_uring integration.

@lbguilherme lbguilherme marked this pull request as draft November 9, 2022 16:41
in .writable_fd?
@io_uring.wait_writable(@fd, @callback, timeout: timeout)
@io_uring.submit_poll_add(@fd, Crystal::System::Syscall::POLLOUT, action_id: @action_id, timeout: timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So hmm, if I understand this correctly, this make it so that instead of going through events and wait_read/writeable it just submits a poll instead? I see one issue with that and that is the case when several fibers wait on the same file descriptor - it will wake every fiber that has submitted a poll instead of just waking up the first fiber. This is a difference to how the non-uring fiber loop works and I'm not certain that behavior is something that is good enough as it may mean that when a fiber actually gets woken up the fd is no longer actionable as some other fiber may have acted on it.

Also known as "thundering herd problem"

Copy link
Contributor Author

@lbguilherme lbguilherme Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the goal is to mimic the exact same behavior as libevent. This action_id there points to a callback (see create_fd_write_event). It will call io.resume_write that will wake only one single fiber.

If the action_id were a Fiber, then it would wake up that fiber upon completion. It is more suitable for other operations like read/write/connect/send/etc. Currently only we only do poll with a callback.

@z64
Copy link
Contributor

z64 commented Dec 27, 2022

Hey there @lbguilherme @straight-shoota - highly looking forward to having this for Crystal! ❤️

What are the next steps here? Could we update the tracking issue #10740 with a roadmap, if necessary?

@lbguilherme
Copy link
Contributor Author

Hi @z64! I'm having very little time to work on this lately, so things are going slowly. The current status is that the code done and has been refactored in-line with current master. It should be working, but there is a bug causing the CI to seg fault and couldn't find the cause yet. Help is always appreciated, ofc.

@HertzDevil
Copy link
Contributor

Apparently there are also "I/O rings" on Windows since 21H2: https://learn.microsoft.com/en-us/windows/win32/api/ioringapi/

They are apparently very similar to io_uring: https://windows-internals.com/ioring-vs-io_uring-a-comparison-of-windows-and-linux-implementations/

@yxhuvud
Copy link
Contributor

yxhuvud commented May 26, 2023

There is, but they are different enough that it would still take a lot of work to support windows. And I don't know how much is supported, I only that original article about it and that was not really sufficient when it comes to what operations are there

@paulocoghi
Copy link

It should be working, but there is a bug causing the CI to seg fault and couldn't find the cause yet.

Could any Crystal maintainer take a look? This improvement is of great importance.

@crysbot
Copy link
Collaborator

crysbot commented May 7, 2024

This pull request has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/curious-about-the-eventloop-updates/6825/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linux's IO_Uring interface (2x IO performance!)
10 participants