Description
Hello and thanks for this wonderful library! For the past 18 months, we've been using version 1.14.0 to convert a scikit-learn pipeline model containing an XGBClassifier to ONNX. Everything has worked great. Recently we needed to upgrade to newer versions of onnx and onnxruntime. Upgrading to the latest versions of those libraries, as well as the latest version of this library, resulted in score mismatches between the sklearn pipeline version of our model and the ONNX version. After much head scratching and searching, I narrowed the issue down to version 1.14.1 of skl2onnx. If I keep the old versions of onnxruntime and onnx that we've been using for the last 18 months and only switch from skl2onnx==1.14.0 to skl2onnx==1.14.1, I can reproduce the score mismatch. (I can also reproduce the issue with any newer version of onnx, onnxruntime, or skl2onnx>=1.14.1.)
After inspecting the models in Netron, it looks like the underlying structure has changed a bit. For context, our sklearn model pipeline takes in a combination of float64 and string inputs. The string inputs are all one-hot encoded by the pipeline, while the float64 inputs are run through the venerable sklearn passthrough transformer. These values are then fed to the XGBClassifier.
In version 1.14.0, the numeric float64 inputs feed into a Concat node and then a Cast node which converts them all to float32. The string inputs go one-hot encoding -> Concat -> Reshape and then meet up with the numeric inputs at a final Concat node.
In version 1.14.1, however, the numeric inputs are not cast to float32; instead, the Reshape output of the string inputs is cast to float64.
I believe this indicates that the input to my TreeEnsembleClassifier node in 1.14.0 was an array of float32, but in 1.14.1 (and beyond) it's an array of float64. (I tried loading the model into memory and running shape inference to confirm the data types, but it didn't work the way I expected.)
For a sample dataset of 1k rows, this seemingly minor change results in 33% of the scores differing by more than 1e-5 between the sklearn and ONNX versions of the model. The greatest difference I observed with my small test dataset was 0.04.
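For anyone curious why the input dtype alone can move scores this much: XGBoost stores split thresholds as float32, so whether the input is compared in float32 or float64 can route a sample down a different branch. A numpy-only sketch (the threshold 0.1 is invented for illustration):

```python
import numpy as np

# XGBoost stores split thresholds as float32. Suppose a tree split is
# "go left if x < t" with a learned threshold of 0.1, stored as float32.
t = np.float32(0.1)           # actually 0.10000000149011612

x = np.float64(0.1)           # the same feature value, arriving as float64

# If the graph casts the input to float32 first (the 1.14.0 behaviour),
# the comparison sees two identical float32 values:
left_f32 = np.float32(x) < t  # False -> go right

# If the input stays float64 and the threshold is promoted to float64
# (the 1.14.1 behaviour described above), the comparison flips:
left_f64 = x < np.float64(t)  # True -> go left

print(left_f32, left_f64)     # False True
```

A single flipped split like this changes which leaf value a sample receives, which is consistent with the occasional large (0.04) score differences rather than uniform rounding noise.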
Workarounds:
- Change the data type on the numeric inputs from float64 to float32 upfront. This resolves the score mismatch issue, but won't be accepted by our model serving environment (it insists on passing float64 for reasons outside of our control).
- Instead of using passthrough in the sklearn model, I created a custom sklearn transformer that converts its input to float32, and registered a corresponding ONNX converter which translates it into a basic Cast node. This works.
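For reference, the sklearn side of the second workaround looks roughly like this (a sketch; the class name is mine, and the matching ONNX converter, registered via skl2onnx's update_registered_converter, simply emits a Cast to float32):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class CastToFloat32(BaseEstimator, TransformerMixin):
    """Drop-in replacement for 'passthrough' that downcasts to float32,
    so the ONNX graph sees the same dtype the trees compare against."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.asarray(X, dtype=np.float32)

X = np.array([[0.1, 2.5]], dtype=np.float64)
out = CastToFloat32().fit_transform(X)
print(out.dtype)  # float32
```

This lets the serving environment keep sending float64 while guaranteeing the tree ensemble receives float32 in both the sklearn and ONNX versions.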
It's been 18 months since the release of 1.14.1 and I couldn't find any similar issues. Is this a bug? Or were we simply getting lucky before, with scores matching perfectly between sklearn and ONNX on 1.14.0? FWIW, we pass float64 to the sklearn model when generating scores, so it seems wrong that we'd get one score back from the sklearn model but a different score back from the ONNX model when passing in the same float64 inputs.
Can you provide any insight into what caused this change? I've pored over the code in this library and the changes in the 1.14.1 release, and it's not immediately obvious to me which one caused it. My guess is that it's this one.
Also, I think this commit was included in that release even though it wasn't in the release notes.