Talk details

Tata Ganesh edited this page Oct 10, 2019 · 2 revisions

Short overview of Model Deployment and its challenges

  • The deployment process differs from the training process
  • Handling multiple versions of a model
  • Hot-swapping in new versions
  • Writing wrapper code for the API + code maintenance
  • Retry mechanisms

...and many more

Short introduction to TensorFlow Serving

  • A flexible, high-performance serving system for machine learning models, designed for production environments
  • Battle-tested in Google's internal systems
  • Let us talk about its features!

Basics of serving a model and interacting with it using a REST endpoint and gRPC

Short brief about the SavedModel format (Show code if possible)
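A SavedModel export is just a versioned directory; a typical layout looks like this (a sketch — the exact files depend on the model, and `inference_graph` matches the model name used in the Docker command below):

```
inference_graph/
└── 1/                                      # version number (integer directory)
    ├── saved_model.pb                      # serialized graph + signatures
    ├── variables/
    │   ├── variables.data-00000-of-00001   # trained weights
    │   └── variables.index
    └── assets/                             # optional, e.g. vocab files
```

The signatures inside can be inspected with `saved_model_cli show --dir inference_graph/1 --all`.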

Easiest way to use TensorFlow Serving - Docker

docker pull tensorflow/serving

sudo docker run -p 8501:8500 -p 9000:8501 \
  --mount type=bind,source=/home/tata/Projects/hand_detector/inference_graph,target=/models/inference_graph \
  --mount type=bind,source=/home/tata/Projects/hand_detector/inference_graph/model_config.config,target=/models/model_config.config \
  -t -i tensorflow/serving --model_config_file=/models/model_config.config

  • Break apart the command to explain individual components
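For reference, the `model_config.config` mounted above might contain something like this (a minimal sketch; the `name` and `base_path` are assumptions consistent with the mount target):

```
model_config_list {
  config {
    name: "inference_graph"
    base_path: "/models/inference_graph"
    model_platform: "tensorflow"
  }
}
```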

gRPC API example

Short brief about the anatomy of the request + Code ( Show part of the class in slide ) + Demo
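A minimal gRPC client sketch (assumes the `grpcio` and `tensorflow-serving-api` packages, a server started with the Docker command above — where host port 8501 maps to the container's gRPC port 8500 — and an input tensor named `inputs`; adjust to the model's actual signature):

```python
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Host port 8501 is mapped to container port 8500 (gRPC) in the docker run above.
channel = grpc.insecure_channel("localhost:8501")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the PredictRequest: model name, signature, and input tensors.
# The input name "inputs" and the shape are assumptions for illustration.
request = predict_pb2.PredictRequest()
request.model_spec.name = "inference_graph"
request.model_spec.signature_name = "serving_default"
dummy_image = np.zeros((1, 300, 300, 3), dtype=np.float32)
request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(dummy_image))

# Blocking call with a 10-second deadline.
response = stub.Predict(request, 10.0)
print(response.outputs)
```

The request is a protobuf message, so tensors travel in binary form rather than as JSON — one reason gRPC tends to be faster for large inputs like images.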

Rest API example

Predict API
POST http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:predict

Body
{
  // (Optional) Serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Input Tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them but NOT both.
  "instances": <value>|<(nested)list>|<list-of-objects>
  "inputs": <value>|<(nested)list>|<object>
}

Short brief about the anatomy of the request + Code ( Show part of the class in slide )
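A small sketch of building and sending the request body above (the model name and the toy instance are assumptions; per the Docker command above, host port 9000 maps to the container's REST port 8501):

```python
import json
import urllib.request

def build_predict_body(instances, signature_name="serving_default"):
    """Build the row-format ("instances") Predict request body."""
    return json.dumps({"signature_name": signature_name,
                       "instances": instances})

body = build_predict_body([[1.0, 2.0, 3.0]])

# POST to the REST endpoint; uncomment once a server is running:
# req = urllib.request.Request(
#     "http://localhost:9000/v1/models/inference_graph:predict",
#     data=body.encode("utf-8"),
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```

Note that inputs are serialized to JSON text, so large tensors (e.g. images) inflate the payload compared to gRPC's binary protobufs.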

gRPC vs REST comparison (use-case specific: investigating performance on live images and understanding the reason behind the slowdown)

Using model version policy

  • How does TF Serving handle multiple versions?
  • Significance of --model_config_file_poll_wait_seconds
  • How to roll back to a stable version (hot-swapping)
  • Canarying - serving a stable model and a canary model; custom logic on the client side redirects a slice of traffic to the canary model (Pending)
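One way to pin both a stable and a canary version in the model config file (a sketch; the version numbers and labels are placeholders):

```
model_config_list {
  config {
    name: "inference_graph"
    base_path: "/models/inference_graph"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1   # stable
        versions: 2   # canary
      }
    }
    # Optional: address versions by label instead of number in requests
    version_labels { key: "stable" value: 1 }
    version_labels { key: "canary" value: 2 }
  }
}
```

Rolling back is then just editing this file to point the label (or the specific version) back at the stable version; with `--model_config_file_poll_wait_seconds` set, the server picks up the change without a restart.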

Model Warmup

  • Pre-requisite: Understanding the grpc request format
  • Necessity for Model warmup
  • Code + Demo for generating warmup files - show the difference between a warmed-up model's first request and a non-warmed-up model's
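Warmup requests live in a TFRecord file under the SavedModel's `assets.extra/` directory; a sketch of generating one (assumes `tensorflow` and `tensorflow-serving-api`; the model name, input name, and shape are assumptions matching the gRPC example above):

```python
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_log_pb2

# TF Serving looks for this exact filename inside assets.extra/.
warmup_file = "inference_graph/1/assets.extra/tf_serving_warmup_requests"

# Build a representative PredictRequest (same anatomy as a live request).
request = predict_pb2.PredictRequest()
request.model_spec.name = "inference_graph"
request.model_spec.signature_name = "serving_default"
dummy_image = np.zeros((1, 300, 300, 3), dtype=np.float32)
request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(dummy_image))

# Wrap it in a PredictionLog and write it as a TFRecord.
log = prediction_log_pb2.PredictionLog(
    predict_log=prediction_log_pb2.PredictLog(request=request))
with tf.io.TFRecordWriter(warmup_file) as writer:
    writer.write(log.SerializeToString())
```

On load, the server replays these requests before marking the version as available, so the first real request doesn't pay the lazy-initialization cost.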

Monitoring using Grafana + Prometheus (Pending)

  • Dumping of metrics
  • Prometheus and PromQL
  • Grafana basics
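A sketch of the wiring (paths and ports are assumptions consistent with the Docker command above): TF Serving exports metrics when started with `--monitoring_config_file` pointing at a config like

```
prometheus_config {
  enable: true
  path: "/monitoring/prometheus/metrics"
}
```

and Prometheus then scrapes that endpoint via its own `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: "tensorflow-serving"
    scrape_interval: 5s
    metrics_path: /monitoring/prometheus/metrics
    static_configs:
      - targets: ["localhost:9000"]  # REST port per the docker mapping above
```

Grafana is then pointed at Prometheus as a data source and dashboards are built with PromQL queries.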