-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Reduce dependency on file descriptors #311
Comments
This is @haata's call but FWIW this plan makes a lot of sense to me. |
Yeah, I think this makes sense (it definitely would make things a lot more straightforward).
I haven't investigated these APIs specifically, but it's usually possible to call the CPython apis directly. If done the right way, you shouldn't have to pay the Python interpreter tax (or we may need to add python dependency with a C version of that specific api). Before you spend too much effort, it might be worth measuring the performance of read/write operations with some small POC code. |
Surely you can have an optimization where you detect if the stream is backed by a raw FD and, if so, use it raw, but if not, then use the streams that call back into Python? |
I'm not aware of any C-level or Cython api's for either file objects or sockets. Everything goes through io and socket for sync communication and transports and protocols or streams for async communication. None of those seem to expose a C-level API. But then, there certainly is no low-level API for the Gzip and SSL wrapper.
It is certainly possible to just pass in a raw fd ( Personally, I would be inclined to forego such heuristics unless the resulting performance is absolutely abysmal. Python is a notoriously extremely slow language, and most python developers prefer convenience over speed (to my dismay). The people that want maximum speed can still just pass in a raw fd int. |
Oh man. Yeah that makes |
Now that #312 exists, I'd like to take this proposal one step further. I propose to entirely remove the ability to run capnproto RPC without Python's asyncio loop. In my opinion, the existing functionality does not work with capnproto's philosophy, which is fundamentally based on a well-integrated event loop, and leads to many bugs. Take for example thread_server.py and thread_client.py:
When we remove this functionality, lots of other code can also be removed. So, I propose the following:
This would obviously be a large breaking change. But #310 is already breaking and I think we might as well double down. I don't personally have the cycles to ensure a lengthy deprecation phase. And I suspect nobody does... Thoughts? |
I don't know anything about Python asyncio but abstractly speaking this proposal sounds like the right thing. |
Yeah, this is a breaking change. But I agree with it. Asyncio was the conclusion I reached after lots of fighting with libcapnproto file descriptors. If we're going to be incrementing to v2.0.0 anyways, this is the time to do it. Making pycapnp more pythonic helps everyone in the long run. The simplification is also a huge pro IMO. |
In that case, I'd suggest that a |
With #315, I believe that the only thing remaining in this issue is to clean up the |
There is bug for this in the Python issue tracker: python/cpython#100066 . I hope it will go away, but a similar bug exists since 2015. |
I'm also looking forward to having a simple way to use (sync) file object wrappers to read and write messages. This is important for integration with other Python functionality for compression, encryption, databases etc. My current, ugly workaround is to spawn a separate process to read the source data stream and feed it into a pipe or socket, which is then consumed in the parent process. UNIX pipes and sockets are OS low level tools which provide a working |
Pycapnp mostly works by passing file descriptors into the capnproto c++ library. This causes some problems:
KJ_SYSCALL
capnproto#1542.To remedy this, I propose that instead of passing file descriptors to capnproto, we create wrappers around Python's IO facilities.
kj::InputStream
andkj::OutputStream
that wraps a Python file objectkj::AsyncInputStream
andkj:AsyncOutputStream
that wraps a StreamReader and StreamWriter.With these facilities, all classes like
TwoPartyClient
andTwoPartyServer
will be modified to receive one of these wrapper classes. That should resolve all of the aforementioned problems.A potential downside is that everything might slow down, because read and write operations have to be routed through the Python interpreter. I haven't measured the impact of this. If this is a problem, I guess we can also keep the file descriptor API (with it's known deficiencies). Downside of this, is that the file descriptor API requires a fairly large amount of code to work (see
asyncProvider.cpp
in #310)The text was updated successfully, but these errors were encountered: