Input data type inference behavior change in 1.14.1 #1150
Comments
The release notes include significant changes. I did not expect the changes you mention to have an impact, but I obviously made a mistake. Is it possible to have a short example I can use to reproduce your discrepancies and see how I can fix it? In ai.onnx.ml v3, doubles are not fully supported by TreeEnsembleClassifier. v5 fuses both TreeEnsembleClassifier and TreeEnsembleRegressor into a single operator, TreeEnsemble. Switching to this would give us more freedom.
Thanks for the reply! When converting a model from sklearn to ONNX, does the conversion code know the type associated with the input? If so, could we add a
You're referring to TreeEnsemble, no? Updating the library to support that makes sense to me.
We know the input type at conversion time and it is float32; the Cast may not be added, but then users must always use float32 when running the ONNX model. I'm referring to TreeEnsemble. This new operator also supports rules such as x in {set of values}. The fact that TreeEnsembleClassifier did not support float64 made the conversion impossible in some cases. That should be better, but I need some time to update the library.
Got it. Thanks for the consideration. No rush from me on switching to |
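The precision issue behind this discussion — split thresholds in the ai.onnx.ml tree operators are stored as float32, so whether a float64 input is cast before the comparison can flip a split decision — can be illustrated with a minimal numpy sketch (the specific values are chosen purely for illustration):

```python
import numpy as np

# Tree split thresholds are stored as float32 in ai.onnx.ml tree operators.
threshold = np.float32(0.1)

# A float64 feature value that lies just above float32(0.1) but rounds
# down to it when cast to float32 (value chosen for illustration).
x64 = np.float64(0.1000000015)
x32 = np.float32(x64)  # the same value after a Cast-to-float32 node

# Comparing in float64 vs. float32 sends the sample down different branches.
print(x64 > threshold)  # True:  0.1000000015 exceeds float64(float32(0.1))
print(x32 > threshold)  # False: the cast rounds x down to exactly float32(0.1)
```

A single flipped split like this propagates through the ensemble, which is how a "minor" dtype change produces visible score differences.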
Hello and thanks for this wonderful library! For the past 18 months, we've been using version 1.14.0 to convert an sklearn XGBoostClassifier pipeline model to ONNX. Everything has worked great. Recently we needed to upgrade to newer versions of onnx and onnxruntime. Upgrading to the latest versions of those libraries as well as the latest version of this library resulted in score mismatch issues between the sklearn pipeline version of our model and the ONNX version of our model. After much head scratching and searching, I narrowed down the issue to version 1.14.1 of skl2onnx. If I keep the old versions of `onnxruntime` and `onnx` that we've been using for the last 18 months and only switch from `skl2onnx==1.14.0` to `skl2onnx==1.14.1`, I can reproduce this score mismatch error. (I can also reproduce the issue when using any newer version of onnx, onnxruntime, or `skl2onnx>=1.14.1`.)
After inspecting the models in Netron, it looks like the underlying structure has changed a bit. For context, our sklearn model pipeline takes in a combination of float64 and string inputs. The string inputs are all one-hot encoded by the model pipeline. The float64 inputs are run through the venerable sklearn `passthrough` transformer. These values are then fed to the `XGBClassifier`.

In version 1.14.0, the numeric float64 inputs feed into a `Concat` node and then into a `Cast` node which converts them all to float32. The string inputs go one-hot encoding -> `Concat` -> `Reshape` and then meet up with the numeric inputs at a `Concat` node.

In version 1.14.1, however, the numeric inputs are not being cast to float32; instead, the `Reshape` output of the string inputs is being cast to float64.

I believe this indicates that the input to my `TreeEnsembleClassifier` node in 1.14.0 was an array of float32, but in 1.14.1 (and beyond) it's an array of float64. (I tried loading the model into memory and running shape inference to generate the various data types, but it wasn't working in the way I expected.)

For a sample dataset of 1k rows, this seemingly minor change results in 33% of the scores having a difference greater than 10E-5 between the sklearn and ONNX versions of the model. The greatest difference I observed with my small test dataset was 0.04.
Workarounds:

- Instead of `passthrough` in the sklearn model, I created a custom sklearn transformer that converts its input to float32. I registered a corresponding ONNX converter which translates this into a basic `Cast` node. This works.

It's been 18 months since the release of 1.14.1 and I couldn't find any similar issues. Is this a bug? Or were we simply getting lucky before, in that scores were matching perfectly between sklearn and ONNX on 1.14.0? FWIW, we pass float64 to the sklearn model when generating scores, so it seems wrong that we'd get one score value back from the sklearn model, but a different score back from the ONNX model when passing in the same float64 inputs.

Can you provide any insight on what caused this change? I've pored over the code in this library and the changes in the 1.14.1 release, and it's not immediately obvious to me which one caused it. My guess is that it's this one.

Also, I think this commit was included in that release even though it wasn't in the release notes.
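The cast-to-float32 transformer workaround described above can be sketched as follows; the class name is hypothetical and only the sklearn side is shown:

```python
import numpy as np

class CastToFloat32:
    """Hypothetical drop-in replacement for the 'passthrough' transformer:
    identical to passthrough except that it casts its input to float32.
    (In a real pipeline this would subclass sklearn's BaseEstimator and
    TransformerMixin so it can be cloned and used in ColumnTransformer.)"""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        # The whole point of the transformer: force float32 on the way in.
        return np.asarray(X, dtype=np.float32)

caster = CastToFloat32().fit(None)
out = caster.transform(np.array([[1.5, 2.5]], dtype=np.float64))
print(out.dtype)  # float32
```

On the ONNX side, a converter registered through skl2onnx's `update_registered_converter` can translate this transformer into a single `Cast` node to float, matching the `Cast` that 1.14.0 used to insert automatically ahead of the `TreeEnsembleClassifier`.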