spec/std/channel_spec.cr gets stuck on Windows with -Dpreview_mt #14222

Open
HertzDevil opened this issue Jan 11, 2024 · 3 comments · May be fixed by #14949
Labels
kind:bug, platform:windows, topic:multithreading, topic:stdlib:concurrency

Comments

@HertzDevil
Contributor

Running that spec file blocks at the "raises if channel was closed" test. Increasing CRYSTAL_WORKERS from the default of 4 makes two more specs run and pass, but after that the "awakes all waiting selects" test still blocks.

This reduction, probably the first Windows-specific one we have, similarly blocks on the (CRYSTAL_WORKERS + 1)-th iteration of the loop:

100.times do |i|
  p i

  x = Atomic(Int32).new(0)

  spawn do
    x.set(1)

    spawn(same_thread: true) do
      while true
        Fiber.yield
      end
    end
  end

  while x.get == 0
    Fiber.yield
  end
end

The snippet doesn't block when run without -Dpreview_mt, nor when run on Linux.
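
(For reference, I'm building the snippet with something along the lines of crystal build -Dpreview_mt repro.cr, where the file name is just for illustration, and running the resulting executable, optionally with a larger CRYSTAL_WORKERS value set in the environment.)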

@HertzDevil added the kind:bug, topic:stdlib:concurrency, platform:windows, and topic:multithreading labels on Jan 11, 2024
@HertzDevil
Contributor Author

It looks like all the worker threads eventually got stuck on the fiber_channel.receive line inside Crystal::Scheduler#run_loop. This fiber_channel is a Crystal::FiberChannel backed by ::IO.pipe. This honestly looks suspicious to me, since non-blocking pipes themselves depend on the event loop. Is a more lightweight option available here? cc @ysbaddaden @oprypin
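
For reference, the pipe-backed channel is roughly this (simplified sketch, not the verbatim stdlib source):

# Rough sketch of Crystal::FiberChannel; details may differ from the actual source.
struct Crystal::FiberChannel
  def initialize
    # read end (@worker_out) and write end (@worker_in) of a pipe
    @worker_out, @worker_in = IO.pipe
    @worker_in.sync = true
  end

  def send(fiber : Fiber)
    # enqueue a fiber by writing its pointer (object_id) into the pipe
    @worker_in.write_bytes(fiber.object_id)
  end

  def receive
    # block until some scheduler sends a fiber, then turn the bytes back into a Fiber
    oid = @worker_out.read_bytes(UInt64)
    Pointer(Fiber).new(oid).as(Fiber)
  end
end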

@ysbaddaden
Contributor

All threads waiting on the FiberChannel means all threads "parked" themselves; however, they shouldn't end up blocked on the channel itself, but on the EventLoop (waiting for completions).
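
To illustrate what "parked" means, the -Dpreview_mt run loop is roughly the following (simplified and approximate, not the verbatim source):

# Simplified sketch of Crystal::Scheduler#run_loop under -Dpreview_mt (approximate):
def run_loop
  spawn_stack_pool_collector

  loop do
    @lock.lock

    if runnable = @runnables.shift?
      @runnables << Fiber.current
      @lock.unlock
      resume(runnable)
    else
      # Nothing runnable: the thread "parks" by blocking on the pipe-backed
      # FiberChannel. Because the pipe is non-blocking, this read is supposed
      # to suspend into the event loop until another thread sends a fiber.
      @lock.unlock
      fiber = fiber_channel.receive

      @lock.lock
      @runnables << Fiber.current
      @lock.unlock
      resume(fiber)
    end
  end
end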

So yeah, suspicious.

Note: with ExecutionContext the FiberChannel is no longer used.

@HertzDevil
Contributor Author

HertzDevil commented Aug 28, 2024

With two changes I now manage to get the OP snippet and the specs passing:

module Crystal::System::FileDescriptor
  def self.pipe(read_blocking, write_blocking)
    # ...

    r = IO::FileDescriptor.new(r_pipe.address, read_blocking)
    w = IO::FileDescriptor.new(w_pipe.address, write_blocking)
    r.read_buffering = false # disable buffering on the read end
    w.sync = true

    {r, w}
  end
end

lib LibC
  fun PeekNamedPipe(hNamedPipe : HANDLE, lpBuffer : Void*, nBufferSize : DWORD, lpBytesRead : DWORD*, lpTotalBytesAvail : DWORD*, lpBytesLeftThisMessage : DWORD*) : BOOL
end

struct Crystal::FiberChannel
  def receive
    # explicitly peek the pipe until there is data available
    while true
      LibC.PeekNamedPipe(LibC::HANDLE.new(@worker_out.fd), nil, 0, nil, out bytes_avail, nil)
      break if bytes_avail >= sizeof(UInt64)
      Fiber.yield
    end

    oid = @worker_out.read_bytes(UInt64)
    Pointer(Fiber).new(oid).as(Fiber)
  end
end

What's interesting is that both ends of the IO.pipe can now be blocking, which makes me believe pipes are indeed unnecessary here. Everything still works if I write:

struct Crystal::FiberChannel
  @lock = Crystal::SpinLock.new

  def initialize
    @queue = Deque(Fiber).new
  end

  def send(fiber : Fiber)
    @lock.sync { @queue << fiber }
  end

  def receive
    while true
      fiber = @lock.sync { @queue.shift? }
      return fiber if fiber
      Fiber.yield
    end
  end
end

And eventually:

class Crystal::Scheduler
  {% if flag?(:preview_mt) %}
    def run_loop
      spawn_stack_pool_collector

      loop do
        @lock.lock

        if runnable = @runnables.shift?
          @runnables << Fiber.current
          @lock.unlock
          resume(runnable)
        else
          @lock.unlock
          Fiber.yield
        end
      end
    end

    def send_fiber(fiber : Fiber)
      @lock.sync { @runnables << fiber }
    end
  {% end %}
end

Has the codebase changed so much over the years that we don't need fiber_channel anymore...? The pipe has been there since b52bfb7 as part of #8112, but there was no mention of the rationale behind using a pipe.

On the other hand, I wonder if this also implies IO.pipe's semantics are underspecified.

@HertzDevil HertzDevil linked a pull request Aug 28, 2024 that will close this issue