Description
Hi, I know that there is ongoing work to remove or reduce the dependency on raw file descriptors in #283 and #311. However, my issue is with the existing implementation (reference 2.0.0b2) and how it works with pipe (`os.pipe`) and socket (`socket.socket`) objects in Python, which do expose usable raw file descriptors via `fileno()`.
We can construct a pipe using

```python
pipe_read, pipe_write = os.pipe()
```
Data can be generated or read in one thread (or process) like

```python
chunk_size = 1024**2  # 1 MiB
with os.fdopen(pipe_write, mode="wb") as write_file:
    while chunk := file_like_source.read(chunk_size):
        write_file.write(chunk)
```
and be consumed in another thread like

```python
with os.fdopen(pipe_read, mode="rb") as read_file:
    for item in MyStruct.read_multiple(read_file):
        do_something(item)
```
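For context, the pattern above can be exercised end to end with a plain fixed-size record standing in for `MyStruct` (the `struct` format here is my own placeholder, not the real serialization); with exact-size reads there is no deadlock:

```python
import os
import struct
import threading

# Hypothetical stand-in for the serialized struct item:
# two little-endian uint32 fields, 8 bytes per record.
ITEM = struct.Struct("<II")

pipe_read, pipe_write = os.pipe()

def produce(n):
    # Writer thread: serialize n items into the pipe, then close the
    # write end so the reader eventually sees EOF.
    with os.fdopen(pipe_write, mode="wb") as write_file:
        for i in range(n):
            write_file.write(ITEM.pack(i, i * 2))

writer = threading.Thread(target=produce, args=(1000,))
writer.start()

items = []
with os.fdopen(pipe_read, mode="rb") as read_file:
    # Reader: consume fixed-size records until EOF.
    while chunk := read_file.read(ITEM.size):
        items.append(ITEM.unpack(chunk))

writer.join()
print(len(items))  # 1000
```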
What I observe is that both `MyStruct.read_multiple(read_file)` and `write_file.write(chunk)` block if `chunk_size` is smaller than the serialized struct item. My hypothesis is that this has to do with how the reader peeks into the data, which is in fact a stream, without actually consuming it, but I don't know.
Strangely, if a process outside Python generates the stream via `process = subprocess.Popen()` and writes it into a pipe via standard output using `stdout=subprocess.PIPE`, `read_multiple()` can read it without issues using `process.stdout`.
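For comparison, the subprocess variant that works looks roughly like this (the child program and the record format are placeholders I made up; `process.stdout` is the same kind of `BufferedReader` that `os.fdopen(pipe_read, "rb")` returns):

```python
import struct
import subprocess
import sys

# Child process writes three fixed-size records to stdout; the format
# is a placeholder for whatever MyStruct serializes to.
child_code = (
    "import struct, sys\n"
    "for i in range(3):\n"
    "    sys.stdout.buffer.write(struct.pack('<II', i, i))\n"
)

process = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
)

rec = struct.Struct("<II")
items = []
# Read fixed-size records from the child's stdout pipe until EOF.
while chunk := process.stdout.read(rec.size):
    items.append(rec.unpack(chunk))
process.wait()
print(items)  # [(0, 0), (1, 1), (2, 2)]
```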
Maybe someone has an idea why this happens and how it could be circumvented or fixed? Happy to hear your thoughts.