Failing with Generic Error message: Failed to obtain stable measurement. #777

Open · Kanupriyagoyal opened this issue Aug 20, 2024 · 11 comments

@Kanupriyagoyal

I am testing with basic models. Each model takes an input and returns the same output with the same datatype.

Inference is working:
2024-08-20 09:35:15,923 - INFO - array_final: array([[103]], dtype=uint8)
array_final: [[103]]

perf_analyzer -m model_equals_b_uint8 --measurement-mode count_windows --measurement-request-count 5 -v
*** Measurement Settings ***
Batch size: 1
Service Kind: TRITON
Using "count_windows" mode for stabilization
Stabilizing using average latency and throughput
Minimum number of samples in each window: 5
Using synchronous calls for inference

Request concurrency: 1
Pass [1] throughput: 478.67 infer/sec. Avg latency: 2059 usec (std 3372 usec).
Pass [2] throughput: 621.713 infer/sec. Avg latency: 1625 usec (std 3008 usec).
Pass [3] throughput: 491.884 infer/sec. Avg latency: 2027 usec (std 13098 usec).
Pass [4] throughput: 18.0441 infer/sec. Avg latency: 54594 usec (std 80657 usec).
Pass [5] throughput: 16.6456 infer/sec. Avg latency: 61007 usec (std 68590 usec).
Pass [6] throughput: 62.8963 infer/sec. Avg latency: 5822 usec (std 7896 usec).
Pass [7] throughput: 16.871 infer/sec. Avg latency: 60256 usec (std 112842 usec).
Pass [8] throughput: 15.6989 infer/sec. Avg latency: 63212 usec (std 110034 usec).
Pass [9] throughput: 15.1902 infer/sec. Avg latency: 65797 usec (std 87972 usec).
Pass [10] throughput: 14.0266 infer/sec. Avg latency: 72140 usec (std 93986 usec).
Failed to obtain stable measurement within 10 measurement windows for concurrency 1. Please try to increase the --measurement-request-count.
Failed to obtain stable measurement.

perf_analyzer -m model_equals_b_uint8 --measurement-mode count_windows --measurement-request-count 50 -v
*** Measurement Settings ***
Batch size: 1
Service Kind: TRITON
Using "count_windows" mode for stabilization
Stabilizing using average latency and throughput
Minimum number of samples in each window: 50
Using synchronous calls for inference

Request concurrency: 1
Pass [1] throughput: 23.4639 infer/sec. Avg latency: 42614 usec (std 182802 usec).
Pass [2] throughput: 141.78 infer/sec. Avg latency: 3437 usec (std 5377 usec).
Pass [3] throughput: 14.8405 infer/sec. Avg latency: 67552 usec (std 97666 usec).
Pass [4] throughput: 12.2003 infer/sec. Avg latency: 82423 usec (std 75027 usec).
Pass [5] throughput: 14.2399 infer/sec. Avg latency: 70712 usec (std 120651 usec).
Pass [6] throughput: 86.8397 infer/sec. Avg latency: 2083 usec (std 2502 usec).
Pass [7] throughput: 22.6803 infer/sec. Avg latency: 45020 usec (std 178493 usec).
Pass [8] throughput: 17.8704 infer/sec. Avg latency: 56233 usec (std 175833 usec).
Pass [9] throughput: 23.0646 infer/sec. Avg latency: 43166 usec (std 148978 usec).
Pass [10] throughput: 18.234 infer/sec. Avg latency: 55330 usec (std 102755 usec).
Failed to obtain stable measurement within 10 measurement windows for concurrency 1. Please try to increase the --measurement-request-count.
Failed to obtain stable measurement.

perf_analyzer -m model_equals_b_uint8 --measurement-mode count_windows --measurement-request-count 75 -v
*** Measurement Settings ***
Batch size: 1
Service Kind: TRITON
Using "count_windows" mode for stabilization
Stabilizing using average latency and throughput
Minimum number of samples in each window: 75
Using synchronous calls for inference

Request concurrency: 1
Pass [1] throughput: 428.863 infer/sec. Avg latency: 2328 usec (std 3510 usec).
Pass [2] throughput: 494.642 infer/sec. Avg latency: 2018 usec (std 3441 usec).
Pass [3] throughput: 308.695 infer/sec. Avg latency: 3156 usec (std 13751 usec).
Pass [4] throughput: 340.429 infer/sec. Avg latency: 1828 usec (std 3966 usec).
Pass [5] throughput: 21.0775 infer/sec. Avg latency: 47814 usec (std 168738 usec).
Pass [6] throughput: 18.7684 infer/sec. Avg latency: 53730 usec (std 65595 usec).
Pass [7] throughput: 16.0608 infer/sec. Avg latency: 62265 usec (std 63152 usec).
Pass [8] throughput: 3.68812 infer/sec. Avg latency: 271139 usec (std 363750 usec).
Pass [9] throughput: 203.656 infer/sec. Avg latency: 4908 usec (std 6825 usec).
Pass [10] throughput: 214.693 infer/sec. Avg latency: 2469 usec (std 3830 usec).
Failed to obtain stable measurement within 10 measurement windows for concurrency 1. Please try to increase the --measurement-request-count.
Failed to obtain stable measurement.

perf_analyzer -m model_equals_b_uint8 --measurement-mode count_windows --measurement-request-count 100 -v
*** Measurement Settings ***
Batch size: 1
Service Kind: TRITON
Using "count_windows" mode for stabilization
Stabilizing using average latency and throughput
Minimum number of samples in each window: 100
Using synchronous calls for inference

Request concurrency: 1
Pass [1] throughput: 423.137 infer/sec. Avg latency: 2331 usec (std 2866 usec).
Pass [2] throughput: 99.6489 infer/sec. Avg latency: 10037 usec (std 135019 usec).
Pass [3] throughput: 253.617 infer/sec. Avg latency: 1639 usec (std 1605 usec).
Pass [4] throughput: 16.316 infer/sec. Avg latency: 62273 usec (std 161047 usec).
Pass [5] throughput: 22.5236 infer/sec. Avg latency: 44084 usec (std 143282 usec).
Pass [6] throughput: 13.3747 infer/sec. Avg latency: 75319 usec (std 81540 usec).
Pass [7] throughput: 15.3824 infer/sec. Avg latency: 65006 usec (std 130209 usec).
Pass [8] throughput: 2.24593 infer/sec. Avg latency: 445246 usec (std 205477 usec).
Pass [9] throughput: 145.757 infer/sec. Avg latency: 2459 usec (std 4845 usec).
Pass [10] throughput: 15.9015 infer/sec. Avg latency: 63902 usec (std 89986 usec).
Failed to obtain stable measurement within 10 measurement windows for concurrency 1. Please try to increase the --measurement-request-count.
Failed to obtain stable measurement.

perf_analyzer -b 1 -m model_equals_b_uint16 -v
*** Measurement Settings ***
Batch size: 1
Service Kind: TRITON
Using "time_windows" mode for stabilization
Stabilizing using average latency and throughput
Measurement window: 5000 msec
Using synchronous calls for inference

Request concurrency: 1
Pass [1] throughput: 264.819 infer/sec. Avg latency: 2428 usec (std 19166 usec).
Pass [2] throughput: 19.249 infer/sec. Avg latency: 45776 usec (std 59715 usec).
Pass [3] throughput: 11.4458 infer/sec. Avg latency: 87830 usec (std 55669 usec).
Pass [4] throughput: 13.7479 infer/sec. Avg latency: 73070 usec (std 177674 usec).
Pass [5] throughput: 16.5643 infer/sec. Avg latency: 59318 usec (std 166888 usec).
Pass [6] throughput: 11.5103 infer/sec. Avg latency: 86986 usec (std 188720 usec).
Pass [7] throughput: 32.5302 infer/sec. Avg latency: 31859 usec (std 184371 usec).
Pass [8] throughput: 23.3457 infer/sec. Avg latency: 42082 usec (std 186189 usec).
Pass [9] throughput: 14.2139 infer/sec. Avg latency: 70781 usec (std 194576 usec).
Pass [10] throughput: 14.5149 infer/sec. Avg latency: 68353 usec (std 190451 usec).
Failed to obtain stable measurement within 10 measurement windows for concurrency 1. Please try to increase the --measurement-interval.
Failed to obtain stable measurement.

Every time I get the same generic error message, even though I keep increasing --measurement-request-count.

@ganeshku1

@Kanupriyagoyal Can you please provide complete reproduction steps, including the Triton SDK container version and your server setup instructions, so we can reproduce the issue you are observing?

ganeshku1 self-assigned this Aug 20, 2024
@Kanupriyagoyal (Author) commented Aug 20, 2024

Using perf_analyzer built from source:

RUN git clone https://github.com/triton-inference-server/client -b r24.07
RUN mkdir client/build && \
    cd client/build && \
    cmake -DTRITON_ENABLE_PERF_ANALYZER=ON .. && \
    make -j8 cc-clients

Server: r24.07

I loaded multiple models with different data types, both batched and non-batched, to see which data types perf_analyzer supports.

I0814 14:12:56.254085 68896 server.cc:674]
+-------------------------+---------+--------+
| Model | Version | Status |
+-------------------------+---------+--------+
| model_equals_b_bytes | 1 | READY |
| model_equals_b_float16 | 1 | READY |
| model_equals_b_float32 | 1 | READY |
| model_equals_b_float64 | 1 | READY |
| model_equals_b_int16 | 1 | READY |
| model_equals_b_int32 | 1 | READY |
| model_equals_b_int64 | 1 | READY |
| model_equals_b_int8 | 1 | READY |
| model_equals_b_uint16 | 1 | READY |
| model_equals_b_uint32 | 1 | READY |
| model_equals_b_uint64 | 1 | READY |
| model_equals_b_uint8 | 1 | READY |
| model_equals_nb_bytes | 1 | READY |
| model_equals_nb_float16 | 1 | READY |
| model_equals_nb_float32 | 1 | READY |
| model_equals_nb_float64 | 1 | READY |
| model_equals_nb_int16 | 1 | READY |
| model_equals_nb_int32 | 1 | READY |
| model_equals_nb_int64 | 1 | READY |
| model_equals_nb_int8 | 1 | READY |
| model_equals_nb_uint16 | 1 | READY |
| model_equals_nb_uint32 | 1 | READY |
| model_equals_nb_uint64 | 1 | READY |
| model_equals_nb_uint8 | 1 | READY |
| model_string | 1 | READY |
+-------------------------+---------+--------+

Please also take a look at this:
triton-inference-server/server#7526

@Kanupriyagoyal (Author)

If I want to pass specific data to perf_analyzer using --input-data and pass JSON in the pattern below, I get the error: Failed to init manager inputs: unable to find float data in json.

{"data":
      [
        {
          "IN0":
            {
                "content": [[54.88135039273247, 71.51893663724195, 60.276337607164386, 54.48831829968969, 42.36547993389047, 64.58941130666561, 43.75872112626925, 89.17730007820798, 96.36627605010293, 38.34415188257777, 79.17250380826646, 52.88949197529045, 56.80445610939323, 92.5596638292661, 7.103605819788694], [49.04588086175671, 22.741462797332325, 25.435648177039294, 5.802916032387562, 43.44166255581208, 31.179588199410258, 69.63434888154595, 37.775183929248094, 17.960367755963482, 2.467872839133123, 6.724963146324859, 67.93927734985672, 45.36968445560453, 53.65792111087222, 89.6671293040342]],
                "shape": [2, 15]
            }
        }]}

Would you please give an example of how to create the JSON? I want perf_analyzer to use specific input data rather than round-robin random data.

@debermudez (Contributor)

@Kanupriyagoyal I think this is a reasonable error from the tool. The latencies, and thus the throughputs, are bouncing all over the place.
It looks like you are driving inputs round robin through the list of 25 models. Since each has different performance, it's hard to get a stabilized measurement over several passes.

I need to investigate your follow-up question further. For now, you can get unblocked by loading only a couple of models per server instance to profile.
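
For example, a minimal sketch (the repository path, model name, and counts here are placeholders): start the server in explicit mode with a single model loaded, then profile it, optionally relaxing the stability threshold with perf_analyzer's -s/--stability-percentage flag (the default is 10, i.e. the last few windows must agree within ±10%):

tritonserver --model-repository=/models --model-control-mode=explicit \
    --load-model=model_equals_b_uint8

perf_analyzer -m model_equals_b_uint8 --measurement-mode count_windows \
    --measurement-request-count 500 -s 20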

@Kanupriyagoyal (Author) commented Aug 21, 2024

@debermudez I loaded and unloaded models in explicit mode, so now only one model is loaded at a time. I am still seeing these issues:

root@092f07c34488:/home/ibm-user# curl -X POST -v localhost:8000/v2/repository/models/model_equals_b_uint64/load
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> POST /v2/repository/models/model_equals_b_uint64/load HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 0
< 
* Connection #0 to host localhost left intact
root@092f07c34488:/home/ibm-user# curl -X POST -v localhost:8000/v2/repository/models/model_equals_b_uint32/unload
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> POST /v2/repository/models/model_equals_b_uint32/unload HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.81.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 0
< 
* Connection #0 to host localhost left intact
 perf_analyzer -m model_equals_b_uint64
*** Measurement Settings ***
  Batch size: 1
  Service Kind: TRITON
  Using "time_windows" mode for stabilization
  Stabilizing using average latency and throughput
  Measurement window: 5000 msec
  Using synchronous calls for inference

Request concurrency: 1
Failed to obtain stable measurement within 10 measurement windows for concurrency 1. Please try to increase the --measurement-interval.
Failed to obtain stable measurement.

root@092f07c34488:/home/ibm-user# perf_analyzer -m model_equals_b_uint64 -v
*** Measurement Settings ***
  Batch size: 1
  Service Kind: TRITON
  Using "time_windows" mode for stabilization
  Stabilizing using average latency and throughput
  Measurement window: 5000 msec
  Using synchronous calls for inference

Request concurrency: 1
  Pass [1] throughput: 259.822 infer/sec. Avg latency: 2056 usec (std 7170 usec). 
  Pass [2] throughput: 14.5524 infer/sec. Avg latency: 63619 usec (std 156519 usec). 
  Pass [3] throughput: 13.6066 infer/sec. Avg latency: 73071 usec (std 74391 usec). 
  Pass [4] throughput: 12.2582 infer/sec. Avg latency: 81547 usec (std 129094 usec). 
  Pass [5] throughput: 16.6463 infer/sec. Avg latency: 60070 usec (std 17517 usec). 
  Pass [6] throughput: 12.5688 infer/sec. Avg latency: 79944 usec (std 95460 usec). 
  Pass [7] throughput: 13.47 infer/sec. Avg latency: 74308 usec (std 51297 usec). 
  Pass [8] throughput: 10.7772 infer/sec. Avg latency: 92812 usec (std 49877 usec). 
  Pass [9] throughput: 12.6648 infer/sec. Avg latency: 79684 usec (std 143922 usec). 
  Pass [10] throughput: 14.178 infer/sec. Avg latency: 69556 usec (std 115979 usec). 
Failed to obtain stable measurement within 10 measurement windows for concurrency 1. Please try to increase the --measurement-interval.
Failed to obtain stable measurement.


curl -X POST -v localhost:8000/v2/repository/index
{"name":"model_equals_b_uint64","version":"1","state":"UNAVAILABLE","reason":"unloaded"},{"name":"model_equals_b_uint8","version":"1","state":"READY"}

But when I restart the server and load the model, it works.
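
(For reference, a quick sanity check that a freshly loaded model is actually ready before profiling, assuming the default HTTP port:

curl -v localhost:8000/v2/models/model_equals_b_uint64/ready

A 200 response means the model is ready to serve requests.)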

@nv-hwoo (Contributor) commented Aug 21, 2024

If I want to pass specific data to perf_analyzer using --input-data and pass JSON in this pattern, I get the error: Failed to init manager inputs: unable to find float data in json.

@Kanupriyagoyal You need to match the input name, type, and shape. Could you share the model config of the model you want to query?

@Kanupriyagoyal (Author)

Model config.pbtxt:

name: "gbm_model"
backend: "python"
max_batch_size: 32
input [
  {
    name: "IN0"
    data_type: TYPE_FP32
    dims: [ 30 ]
  }
]
output [
  {
    name: "OUT0"
    data_type: TYPE_FP64
    dims: [ 1 ]
  }
]

version_policy: { all { }}

instance_group [
  {
    count: 2
    kind: KIND_CPU
  }
]

dynamic_batching {
  max_queue_delay_microseconds: 500
}

input_gbm.json passed:

{"data":
      [
        {
          "IN0":
            {
                "content": [17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,1.095,0.9053,8.589,153.4,0.006399,0.04904,0.05373,0.01587,0.03003,0.006193,25.38,17.33,184.6,2019,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189],
                "shape": [30]
            }
        }]}

Running perf_analyzer:

perf_analyzer -m gbm_model --service-kind=triton --model-repository=/models -b 1 -u localhost:8001 -i grpc -f gdm_model_1-results.csv  --verbose-csv --concurrency-range 1 --measurement-mode count_windows --input-data input_gbm.json --collect-metrics --metrics-url http://localhost:8002/metrics --metrics-interval 1000
error: Failed to init manager inputs: unable to find float data in json

@debermudez (Contributor)

@Kanupriyagoyal I pinged another teammate to investigate further.

Part of the issue is the format of the content. Looking at the doc here, the tensors need to be flattened in row-major format; the shape stays [2, 15], however. Can you give that a shot?
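
For example, a sketch of your earlier JSON with the content flattened: the two rows are concatenated row by row into one 30-element list, while the shape stays [2, 15].

{"data":
      [
        {
          "IN0":
            {
                "content": [54.88135039273247, 71.51893663724195, 60.276337607164386, 54.48831829968969, 42.36547993389047, 64.58941130666561, 43.75872112626925, 89.17730007820798, 96.36627605010293, 38.34415188257777, 79.17250380826646, 52.88949197529045, 56.80445610939323, 92.5596638292661, 7.103605819788694, 49.04588086175671, 22.741462797332325, 25.435648177039294, 5.802916032387562, 43.44166255581208, 31.179588199410258, 69.63434888154595, 37.775183929248094, 17.960367755963482, 2.467872839133123, 6.724963146324859, 67.93927734985672, 45.36968445560453, 53.65792111087222, 89.6671293040342],
                "shape": [2, 15]
            }
        }]}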

@debermudez (Contributor)

@Kanupriyagoyal any luck?

@Kanupriyagoyal (Author) commented Aug 28, 2024

@Kanupriyagoyal any luck?

Yes, flattening the data in row-major format made it work for float data, but I am still facing the error for string data. See the issue below:
triton-inference-server/server#7526

@Kanupriyagoyal (Author) commented Sep 4, 2024

@debermudez @nv-hwoo would you please suggest how to handle BYTES data? Any example?

import numpy as np
import tritonclient.grpc as grpcclient


if __name__ == "__main__":
    inputs = []
    # 1-D BYTES input named "IN0" with 16 elements
    inputs.append(grpcclient.InferInput("IN0", [16], "BYTES"))
    input0_data = np.array(["17", "2", "2007", "6", "30", "16", "15", "0", "5.4", "Swipe Transaction", "-6571010470072147219", "Bloomville", "OH", "44818",  "5499", "Bad PIN"], dtype=np.dtype("bytes"))
    # serialize the numpy array into the input tensor
    inputs[0].set_data_from_numpy(input0_data)
    # inspect the serialized bytes (_raw_content is a private attribute)
    print(inputs[0]._raw_content)
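
For the --input-data JSON, the input-data docs describe passing the strings directly in "content" (or base64-encoded bytes under a "b64" key). Below is a sketch reusing the values above for a 16-element BYTES input named IN0; note that, per the linked issue triton-inference-server/server#7526, string inputs may still hit an error:

{"data":
      [
        {
          "IN0":
            {
                "content": ["17", "2", "2007", "6", "30", "16", "15", "0", "5.4", "Swipe Transaction", "-6571010470072147219", "Bloomville", "OH", "44818", "5499", "Bad PIN"],
                "shape": [16]
            }
        }]}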
