-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add physical plan properties to protobuf definition #13136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @timsaucer -- I have had a chance to ponder this PR a bit more.
As I understand it, the reason we are thinking this code could help is that we could change the FFI bindings to return a serialized PlanProperties rather than a FFI struct.
However, given our current lackadaisical approach to protobuf compatibility https://docs.rs/datafusion-proto/latest/datafusion_proto/#version-compatibility I think we may run into trouble with this approach
Thus I think we should proceed with the FFI / API based approach rather than try to use serialized protobuf for the stable interface
That being said, I think this PR has potential value in its own right as a way to potentially improve performance (avoid recomputing the properties) though I am not sure is deserializing is faster than recomputing them 🤔
I did notice we don't really have unit tests for the other methods, so I didn't and any. I'm willing to do that, if desired.
I do think we should have some basic testing, otherwise there is a real danger this code will be broken accidentally in a future refactoring.
I think the existing tests are in https://github.com/apache/datafusion/blob/10af8a73662f4f6aac09a34157b7cf5fee034502/datafusion/proto/tests/cases/roundtrip_physical_plan.rs#L107-L106
However, the way most of the current serialization worked is that they re-compute the plan properties
to recompute the properties
One thing we could potentially do is to improve the protobuf format to actually serialize the PlanProperties rather than re-compute them when re-creating the plan
That would also result in test coverage of the new code
So now that we merged #12920 do we still need this PR 🤔 |
I don't mind closing it. I hadn't yet because I wanted to think more about the value you thought could be added by avoiding the call to recompute. |
In general I would tend towards less code and more re-compute unless someone finds / reports the properties are very expensive to compute. Even if they do I would personally prefer to try and make it faster to re-compute the properties rather than serialize them as that would benefit planning for all queries (not just ones that were serialzed) |
Sounds good to me. |
Which issue does this PR close?
This is to support apache/datafusion-python#823 and to address this comment on #12920
Rationale for this change
This is a pure addition to the protobuf definition to allow for transferring
PlanProperties
in a serialized manner.What changes are included in this PR?
Are these changes tested?
Tested locally against my code for #12920
I did notice we don't really have unit tests for the other methods, so I didn't and any. I'm willing to do that, if desired.
Are there any user-facing changes?
No changes. Additional message definitions are available for use downstream, and particularly for the upcoming FFI work.