Feature proposal: implement native parsing and serialization #214

Open
antonagestam opened this issue Oct 13, 2024 · 0 comments

antonagestam commented Oct 13, 2024

I have been looking into this periodically since very early in the life of kio. This ticket is for tracking progress, discussing nuances, and general brain-dumping of ideas and findings.

The idea is simple to describe. To speed up serde, we would re-implement all parsing and serialization in Rust using PyO3. The main entry point for parsing is entity_reader and its internal read_entity. Both would be implemented as Rust functions that, just like the current implementation, introspect an entity class and use that information to parse a stream of bytes.
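To make "introspect an entity class and parse a stream of bytes" concrete, here is a minimal Python sketch of the general shape, reading from IO[bytes] as the current implementation does. It reuses the names entity_reader and read_entity from the text, but it is not kio's code: the Broker entity, the `wire_type` field-metadata key, and the big-endian, int16-length-prefixed primitives are hypothetical choices made for illustration.

```python
import io
import struct
from dataclasses import dataclass, field, fields
from typing import IO, Any, Callable, Dict, Type, TypeVar

T = TypeVar("T")


# Hypothetical big-endian primitives, reading from a stream that tracks its own position.
def read_int16(stream: IO[bytes]) -> int:
    return struct.unpack(">h", stream.read(2))[0]


def read_int32(stream: IO[bytes]) -> int:
    return struct.unpack(">i", stream.read(4))[0]


def read_string(stream: IO[bytes]) -> str:
    # An int16 length prefix followed by that many UTF-8 bytes.
    length = read_int16(stream)
    return stream.read(length).decode("utf-8")


PRIMITIVES: Dict[str, Callable[[IO[bytes]], Any]] = {
    "int16": read_int16,
    "int32": read_int32,
    "string": read_string,
}


def entity_reader(cls: Type[T]) -> Callable[[IO[bytes]], T]:
    # Introspect the class once, up front, mapping each field to a primitive reader.
    field_readers = [(f.name, PRIMITIVES[f.metadata["wire_type"]]) for f in fields(cls)]

    def read_entity(stream: IO[bytes]) -> T:
        # Read fields in declaration order, then construct the entity.
        values = {name: read(stream) for name, read in field_readers}
        return cls(**values)

    return read_entity


@dataclass(frozen=True)
class Broker:  # hypothetical entity, for illustration only
    node_id: int = field(metadata={"wire_type": "int32"})
    host: str = field(metadata={"wire_type": "string"})
    port: int = field(metadata={"wire_type": "int32"})


payload = struct.pack(">ih", 1, 9) + b"localhost" + struct.pack(">i", 9092)
broker = entity_reader(Broker)(io.BytesIO(payload))
```

A native version would keep this split: the introspection happens once per class, and the per-message work is the loop inside read_entity, which is the part that would benefit most from moving to Rust.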

The tooling for this kind of setup is mature. There is some boilerplate to set it up initially, but not a lot.

Testing strategy

We already have a very solid test suite, and we should reap the rewards of that investment. We have high confidence that the current suite asserts correctness. Therefore, as we rewrite implementations in Rust, it will be valuable to keep running the same test suite from Python.

There might also be cases where we want to add tests at the Rust level, but as I see it, that would mostly be to cover utilities that are not exposed to Python.

memoryview instead of IO[bytes]

To achieve zero-copy semantics in client code, we need to rewrite the current implementation to read bytes from a memoryview rather than from IO[bytes]. Since a memoryview doesn't maintain a position in the stream, this necessarily changes the interface of all parsing functions.
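The difference between the two interfaces can be seen with the standard library alone. In this small sketch (the buffer contents are arbitrary), `io.BytesIO` owns a cursor and `read()` returns a fresh `bytes` copy, while a memoryview has no cursor: the caller chooses where to slice, and each slice is a new view onto the same underlying buffer.

```python
import io

data = bytearray(b"\x00\x05hello")

# IO[bytes]: the stream object owns a cursor, and read() returns a new bytes copy.
stream = io.BytesIO(data)
length_bytes = stream.read(2)
position = stream.tell()  # 2: the position lives inside the stream object

# memoryview: no cursor. The caller decides where to slice, and every slice
# is a view onto the original bytearray rather than a copy of it.
view = memoryview(data)
length_view, rest = view[:2], view[2:]
shares_buffer = rest.obj is data  # True: the slice still refers to `data`
```

Because the memoryview carries no position, "how far have we read" has to travel some other way, which is what motivates the signature change discussed next.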

My proposed solution is that, instead of signatures like (IO[bytes]) -> T for a parser of T, parsers take the form (memoryview) -> (memoryview, T): in addition to the parsed value, every function also returns a new memoryview of the remaining bytes that are yet to be parsed. Creating new memoryviews from existing ones this way is cheap and preserves zero-copy semantics. It also seems more ergonomic than, for instance, having every function return the number of consumed bytes as an integer.
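Here is a minimal Python sketch of the (memoryview) -> (memoryview, T) shape, composed into an entity parser. The primitive encodings (big-endian integers, int16-length-prefixed UTF-8 strings) and the read_broker function are hypothetical stand-ins chosen for illustration, not kio's real definitions.

```python
import struct
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical parser shape: take the unparsed bytes, return (remaining bytes, value).
Parser = Callable[[memoryview], Tuple[memoryview, T]]


def read_int16(buffer: memoryview) -> Tuple[memoryview, int]:
    (value,) = struct.unpack_from(">h", buffer)
    return buffer[2:], value


def read_int32(buffer: memoryview) -> Tuple[memoryview, int]:
    (value,) = struct.unpack_from(">i", buffer)
    return buffer[4:], value


def read_string(buffer: memoryview) -> Tuple[memoryview, str]:
    buffer, length = read_int16(buffer)
    if length < 0 or length > len(buffer):
        raise ValueError(f"invalid string length {length}")
    # Slicing copies no bytes; only decoding into a str copies.
    return buffer[length:], str(buffer[:length], "utf-8")


def read_broker(buffer: memoryview) -> Tuple[memoryview, Tuple[int, str, int]]:
    # Each call hands back the rest of the buffer, which is threaded into the next call.
    buffer, node_id = read_int32(buffer)
    buffer, host = read_string(buffer)
    buffer, port = read_int32(buffer)
    return buffer, (node_id, host, port)


payload = struct.pack(">ih", 1, 9) + b"localhost" + struct.pack(">i", 9092) + b"\xff"
rest, broker = read_broker(memoryview(payload))
```

After parsing, `rest` holds only the trailing byte that no parser consumed. Compared with returning a consumed-byte count, threading the remaining view means no parser needs to know its absolute offset within the message.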

Using memoryview does come with issues, though [1][2]. It's not yet clear to me how best to approach them, or whether best practices have improved recently. I'm currently looking into it more closely.
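Two standard memoryview behaviors illustrate the kind of trouble involved. That these are the specific concerns of the footnoted posts is an assumption on my part. A view is live, so the exporting object can be mutated in place underneath a parser. While any view exists, a bytearray also refuses to be resized until every view, including slices, is released.

```python
data = bytearray(b"hello world")
view = memoryview(data)
rest = view[6:]

# 1. Views are live: an in-place write to the bytearray is visible through
#    every view onto it, so bytes already handed to a parser can change.
data[6:11] = b"WORLD"  # same length, so the bytearray is not resized
seen_through_view = bytes(rest)

# 2. While any view is exported, the bytearray cannot be resized.
try:
    data.extend(b"!")
    resized = True
except BufferError:
    resized = False

# 3. Once every view (including the slice) is released, resizing works again.
rest.release()
view.release()
data.extend(b"!")
```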

Footnotes

  1. https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/

  2. https://discuss.python.org/t/pep-draft-safer-mutability-semantics-for-the-buffer-protocol/42346/5
