new feature: Python binding in multiprocessing context #5316

Open
1 task done
TennyZhuang opened this issue Nov 13, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@TennyZhuang
Contributor

TennyZhuang commented Nov 13, 2024

Feature Description

The operator is stateless, so there is no reason to prevent its use with Python's multiprocessing.

Problem and Solution

Python uses pickle to serialize and deserialize objects that are passed between processes. Native Python objects are generally supported by pickle out of the box. Objects defined in an extension module are not, so we must implement the support manually.

There are two hooks for the purpose, __setstate__ and __getstate__.

However, these two methods mean that an Operator must expose all the information needed to construct it. Currently, the public interface can't do that.
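To make the constraint concrete, here is a minimal pure-Python sketch. Every name in it (`NativeHandle`, `OpaqueOperator`, `PicklableOperator`, `can_pickle`) is hypothetical, and a `threading.Lock` stands in for the Rust-backed state, since neither can be pickled. It is not the binding's actual code; it only shows why the hooks force the object to remember how it was built.

```python
import pickle
import threading


class NativeHandle:
    """Hypothetical stand-in for the Rust-backed state inside an extension object."""

    def __init__(self, scheme, options):
        self.scheme = scheme
        self.options = dict(options)
        # A lock cannot be pickled, much like a pointer into native memory.
        self._lock = threading.Lock()


class OpaqueOperator:
    """No pickle hooks: pickle tries to copy the handle and fails."""

    def __init__(self, scheme, **options):
        self._handle = NativeHandle(scheme, options)


class PicklableOperator(OpaqueOperator):
    """Same object plus the two hooks; it has to keep its constructor inputs."""

    def __init__(self, scheme, **options):
        super().__init__(scheme, **options)
        self._scheme = scheme
        self._options = dict(options)

    def __getstate__(self):
        # Expose only what is needed to rebuild, never the handle itself.
        return {"scheme": self._scheme, "options": self._options}

    def __setstate__(self, state):
        # pickle bypasses __init__ on load, so rebuild the handle here.
        self.__init__(state["scheme"], **state["options"])


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False
```

In this sketch `can_pickle(OpaqueOperator("memory"))` is false, while a `PicklableOperator` survives a `pickle.dumps` / `pickle.loads` round trip with a freshly built handle.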

@Zheaoli Do you have any ideas on how to achieve this?

Additional Context

Here are examples in polars:

https://github.com/pola-rs/polars/blob/18786acd8d1eb68fc87982b07ce29ecbae0923f0/crates/polars-python/src/lazyframe/serde.rs#L16-L36

Are you willing to contribute to the development of this feature?

  • Yes, I am willing to contribute to the development of this feature.
@TennyZhuang TennyZhuang added the enhancement New feature or request label Nov 13, 2024
@Xuanwo
Member

Xuanwo commented Nov 13, 2024

> There are two hooks for the purpose, __setstate__ and __getstate__.

We could store the configuration provided by users during the operator build process and reconstruct it when needed.

@TennyZhuang
Contributor Author

> > There are two hooks for the purpose, __setstate__ and __getstate__.
>
> We could store the configuration provided by users during the operator build process and reconstruct it when needed.

To achieve this, we must add new APIs in the core.

@Xuanwo
Member

Xuanwo commented Nov 13, 2024

I'm guessing we can save the scheme and map on the Python side:

let mut op = ocore::Operator::via_iter(scheme, map).map_err(format_pyerr)?;
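A rough Python-level sketch of that idea, assuming the wrapper keeps the `scheme` and the option map it was constructed with. The `Operator` class and the `_rebuild` helper below are hypothetical pure-Python stand-ins (the real class lives in Rust), and `__reduce__` is just one possible hook for this; it tells pickle to rebuild the object by calling the constructor again in the receiving process.

```python
import pickle


class Operator:
    """Hypothetical Python-side wrapper; the real class is implemented in Rust."""

    def __init__(self, scheme, **options):
        # Keep the constructor inputs so the operator can be rebuilt later.
        self._scheme = scheme
        self._map = {key: str(value) for key, value in options.items()}
        # Stand-in for the native operator built from (scheme, map).
        self._core = object()

    def __reduce__(self):
        # Ask pickle to call _rebuild(scheme, map) on the receiving side
        # instead of copying the native state.
        return (_rebuild, (self._scheme, self._map))


def _rebuild(scheme, options):
    """Module-level helper so that pickle can locate it by name."""
    return Operator(scheme, **options)


op = Operator("s3", bucket="demo")
clone = pickle.loads(pickle.dumps(op))
```

Because multiprocessing pickles arguments and return values the same way, an object that round-trips like this could be handed to worker processes. One thing to keep in mind with this approach is that the pickled payload contains everything in the map, including any credentials passed as options.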
