
read() and read_multiple() block when consuming from python-managed pipes or sockets #354

Open

Description

@fungs

Hi, I know there is ongoing work to remove or reduce the dependency on raw file descriptors in #283 and #311. However, the issue I describe here concerns the existing implementation (as of 2.0.0b2) and how it interacts with pipe (os.pipe) and socket (socket.socket) objects in Python, both of which expose usable raw file descriptors via fileno().

We can construct a pipe using

import os

pipe_read, pipe_write = os.pipe()

Data can be generated or read in one thread (or process) like this:

chunk_size = 1024**2  # 1 MiB
with os.fdopen(pipe_write, mode="wb") as write_file:
    while chunk := file_like_source.read(chunk_size):
        write_file.write(chunk)

and consumed in another thread like this:

with os.fdopen(pipe_read, mode="rb") as read_file:
    for item in MyStruct.read_multiple(read_file):
        do_something(item)
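
For reference, the complete two-thread wiring that reproduces the hang for me looks roughly like this (MyStruct, do_something, and file_like_source are the same placeholders as above):

import os
import threading

pipe_read, pipe_write = os.pipe()
chunk_size = 1024**2  # 1 MiB

def produce():
    # feed the pipe in fixed-size chunks until the source is exhausted
    with os.fdopen(pipe_write, mode="wb") as write_file:
        while chunk := file_like_source.read(chunk_size):
            write_file.write(chunk)

producer = threading.Thread(target=produce)
producer.start()

# consumer side: this blocks as soon as chunk_size is smaller
# than one serialized item
with os.fdopen(pipe_read, mode="rb") as read_file:
    for item in MyStruct.read_multiple(read_file):
        do_something(item)

producer.join()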

What I observe is that both MyStruct.read_multiple(read_file) and write_file.write(chunk) block if chunk_size is smaller than the serialized struct item. My hypothesis is that this has to do with how the reader peeks into the data, which is in fact a stream, without actually consuming it, but I don't know.
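
One stdlib behaviour that could produce exactly this symptom (just a hypothesis, not verified against the library's reader): io.BufferedReader.read(n), which is what os.fdopen(..., "rb") hands out, keeps issuing raw reads until it has collected n bytes or hits EOF, whereas a raw os.read() happily returns a short read. A minimal stdlib-only sketch of the difference:

import os
import threading

pipe_read, pipe_write = os.pipe()

def writer():
    os.write(pipe_write, b"abc")  # only 3 bytes; fd stays open, so no EOF

threading.Thread(target=writer, daemon=True).start()

# raw read: returns the available bytes immediately
print(os.read(pipe_read, 1024))  # b'abc'

# buffered read: would keep waiting for 1024 bytes or EOF
# with os.fdopen(pipe_read, mode="rb") as f:
#     f.read(1024)  # hangs until more data arrives or the writer closes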

Strangely, if a process outside Python generates the stream via process = subprocess.Popen() and writes it to a pipe through standard output (stdout=subprocess.PIPE), read_multiple() can consume process.stdout without issues.
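
For comparison, the working subprocess variant looks roughly like this ("producer-cmd" is a stand-in for the external program):

import subprocess

# external producer writing serialized items to its stdout
process = subprocess.Popen(["producer-cmd"], stdout=subprocess.PIPE)

with process.stdout as read_file:
    for item in MyStruct.read_multiple(read_file):
        do_something(item)

process.wait()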

Maybe someone has an idea why this happens and how it could be circumvented or fixed? Happy to hear your thoughts.
