I think I discovered a bug in the current gRPC code in mlserver. I have a model that returns float16 arrays, and I tried to get predictions via gRPC. I was able to narrow the issue down to a small example without any client-server complexity.

Reproduce error

Converting an inference response that contains an FP16 output tensor to its gRPC representation fails. The last line yields:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.12/site-packages/mlserver/grpc/converters.py", line 380, in from_types
    InferOutputTensorConverter.from_types(output)
  File "/usr/local/lib/python3.12/site-packages/mlserver/grpc/converters.py", line 425, in from_types
    contents=InferTensorContentsConverter.from_types(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/mlserver/grpc/converters.py", line 335, in from_types
    return pb.InferTensorContents(**contents)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected bytes, float found
Root cause

I think the root cause is in the gRPC type-to-field mapping: the code maps FP16 inputs to the bytes field of the dataplane contents, but the tensor data arrives as Python floats. The dataplane doesn't even offer an fp16_contents field that could be used for this purpose. (Is that because protobuf doesn't support fp16 by default?)
Potential fix

I think in this case, fp32_contents should be used in the gRPC type-to-field mapping, although this wastes half of the bandwidth.
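A quick standard-library sketch of what the upcast workaround means for payload size (the values are illustrative; this is not mlserver code):

```python
import struct

# Hypothetical illustration of the proposed workaround: upcast FP16
# values to FP32 so they fit the dataplane's fp32_contents field.
fp16_values = [1.5, -2.0, 0.25]  # values exactly representable in FP16

# FP16 on the wire: 2 bytes per element ('e' = IEEE 754 half precision)
fp16_raw = struct.pack(f"<{len(fp16_values)}e", *fp16_values)

# Upcast path: the same values sent as FP32 use 4 bytes per element
fp32_raw = struct.pack(f"<{len(fp16_values)}f", *fp16_values)

# The upcast representation is exactly twice the size
assert len(fp32_raw) == 2 * len(fp16_raw)
```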
message ModelInferResponse
{
  // ...

  // The output tensors holding inference results.
  repeated InferOutputTensor outputs = 5;

  // The data contained in an output tensor can be represented in
  // "raw" bytes form or in the repeated type that matches the
  // tensor's data type. To use the raw representation 'raw_output_contents'
  // must be initialized with data for each tensor in the same order as
  // 'outputs'. For each tensor, the size of this content must match
  // what is expected by the tensor's shape and data type. The raw
  // data must be the flattened, one-dimensional, row-major order of
  // the tensor elements without any stride or padding between the
  // elements. Note that the FP16 and BF16 data types must be represented as
  // raw content as there is no specific data type for a 16-bit float type.
  //
  // If this field is specified then InferOutputTensor::contents must
  // not be specified for any output tensor.
  repeated bytes raw_output_contents = 6;
}
So, 16-bit floats should actually go to raw_output_contents. I'm not sure why this didn't work in my case.
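Per the proto comment above, an FP16 tensor goes over the wire as flattened, row-major raw bytes with no stride or padding. A minimal standard-library sketch of that packing (the tensor values are made up):

```python
import struct

# Sketch of what raw_output_contents expects for an FP16 tensor:
# flattened, one-dimensional, row-major bytes, no stride or padding.
tensor = [[1.0, 2.5], [0.5, -1.0]]          # shape [2, 2], FP16-safe values
flat = [v for row in tensor for v in row]   # row-major flattening
raw = struct.pack(f"<{len(flat)}e", *flat)  # 'e' = IEEE 754 half precision

# Size must match shape * element size: 4 elements * 2 bytes
assert len(raw) == 2 * len(flat)

# Decoding reverses the packing
decoded = struct.unpack(f"<{len(raw) // 2}e", raw)
assert list(decoded) == flat
```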