Pyspark DecisionTreeRegressionModel bundle does not include all attributes #871
Comments
Your example is using mleap 0.19.0. Does this go away if you use the latest version? Also note that v0.23.1 is tested against Spark 3.4. I'd suspect it still works with Spark 3.3, but that combination is untested and unsupported.
Hello @jsleight, I have tested using the correct jar:
The results remain the same: the attributes are still lost.
Looks like the op isn't serializing the impurities right now. Looking at what the |
Hello @jsleight, the impurities are important, for example for explainability.
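To illustrate why the impurities matter for explainability, here is a minimal pure-Python sketch of impurity-based feature importance (the weighted impurity decrease summed per feature). The `Node` class and the `feature_importances` function are hypothetical stand-ins written for this example; they are not the pyspark or mleap API, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    # Hypothetical tree node, not a pyspark or mleap class.
    impurity: float
    count: int
    feature: Optional[int] = None   # split feature; None for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def feature_importances(root: Node) -> Dict[int, float]:
    """Sum the weighted impurity decrease of every split, per feature."""
    totals: Dict[int, float] = {}

    def visit(node: Node) -> None:
        if node.left is None or node.right is None:
            return  # leaf: no split, nothing to attribute
        decrease = (
            node.count * node.impurity
            - node.left.count * node.left.impurity
            - node.right.count * node.right.impurity
        )
        totals[node.feature] = totals.get(node.feature, 0.0) + decrease
        visit(node.left)
        visit(node.right)

    visit(root)
    total = sum(totals.values())
    # Normalize so the importances sum to 1 when there is any signal.
    return {f: v / total for f, v in totals.items()} if total > 0 else totals


def build_tree(keep_impurity: bool) -> Node:
    # Same structure twice; impurities zeroed to mimic the lossy round trip.
    imp = (lambda x: x) if keep_impurity else (lambda x: 0.0)
    left = Node(imp(1.0), 6, feature=1,
                left=Node(imp(0.0), 3), right=Node(imp(0.0), 3))
    return Node(imp(4.0), 10, feature=0, left=left, right=Node(imp(0.5), 4))


original = feature_importances(build_tree(keep_impurity=True))
reloaded = feature_importances(build_tree(keep_impurity=False))
print(original)
print(reloaded)
```

With the impurities zeroed out, every split contributes nothing, so the importances can no longer be recovered from the reloaded tree.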
Yeah for sure. But I'd argue that serializing to mleap is for inference tasks. To do evaluation and introspection you could just
using spark's built in functions. Then |
Hello @jsleight, I have no knowledge of Scala, but I think I understand how objects are serialized internally. What do you think about adding an optional parameter to serializeToBundle and deserializeFromBundle that lets us pass a Map with: And then, in the BundleRegistry, check whether a class is in the new map and, if it is not, use the defaults. With this, users could perhaps create their own ops and add or change attributes. |
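The lookup proposed here (consult a user-supplied map first, fall back to the defaults otherwise) could be sketched in plain Python as follows. Everything in it is hypothetical and only illustrates the idea: `DEFAULT_OPS`, `resolve_op`, and the two op functions are invented names, and this is not how mleap's BundleRegistry is actually implemented.

```python
from typing import Callable, Dict, Optional

# Hypothetical: an "op" is a function that turns a model into the
# attributes that get written to the bundle.
Op = Callable[[dict], dict]


def default_tree_op(model: dict) -> dict:
    # Mimics the behaviour reported in this issue: keep only what
    # inference needs.
    return {"prediction": model["prediction"]}


def tree_op_with_impurity(model: dict) -> dict:
    # A user-supplied op that also keeps the impurity.
    return {"prediction": model["prediction"], "impurity": model["impurity"]}


# Hypothetical default registry, keyed by model class name.
DEFAULT_OPS: Dict[str, Op] = {"DecisionTreeRegressionModel": default_tree_op}


def resolve_op(class_name: str, overrides: Optional[Dict[str, Op]] = None) -> Op:
    # Use the caller's map if it has an entry, otherwise the defaults.
    if overrides and class_name in overrides:
        return overrides[class_name]
    return DEFAULT_OPS[class_name]


model = {"prediction": 1.5, "impurity": 0.25}

default_bundle = resolve_op("DecisionTreeRegressionModel")(model)
custom_bundle = resolve_op(
    "DecisionTreeRegressionModel",
    {"DecisionTreeRegressionModel": tree_op_with_impurity},
)(model)

print(default_bundle)
print(custom_bundle)
```

Keeping the override map optional means existing callers would see no change in behaviour unless they opt in.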
Ah, in mleap you can do that exact process by altering the ops registry. We use it for xgboost in order to allow xgboost models to be serialized in different ways depending how you want to serve them. See this readme and associated xgboost-runtime code as an example. Using this process, your approach would be to:
|
Issue Description
A Pyspark DecisionTreeRegressionModel loses some of its attribute values after it is serialized to a bundle and loaded back.
Minimal Reproducible Example
mleap version: 0.23.1
pyspark version: 3.3.0
Python version: 3.10.6
If we take a look at the created model, we can see that the nodes have different attributes.
If I save and load the model the results are: