Talk details

Tata Ganesh edited this page Oct 10, 2019 · 2 revisions

Short overview of Model Deployment and its challenges

  • The deployment process differs from the training process
  • Handling multiple versions of a model
  • Hot-swapping in new versions
  • Writing wrapper code for the API + code maintenance
  • Retry mechanisms

...and many more

Short introduction to TensorFlow Serving

  • A flexible, high-performance serving system for machine learning models, designed for production environments
  • Battle-tested in Google's internal systems
  • Let us talk about its features!

Basics of serving a model and interacting with it using a REST endpoint and gRPC

Short brief about the SavedModel format (Show code if possible)
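A SavedModel export is just a versioned directory; a typical layout looks like this (a sketch — the exact files depend on the model, and `inference_graph` matches the model name used in the Docker command below):

```
inference_graph/
└── 1/                                      # version number (integer directory)
    ├── saved_model.pb                      # serialized graph + signatures
    ├── variables/
    │   ├── variables.data-00000-of-00001   # trained weights
    │   └── variables.index
    └── assets/                             # optional, e.g. vocab files
```

The signatures inside can be inspected with `saved_model_cli show --dir inference_graph/1 --all`.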

Easiest way to use TensorFlow Serving - Docker

docker pull tensorflow/serving

sudo docker run -p 8501:8500 -p 9000:8501 \
  --mount type=bind,source=/home/tata/Projects/hand_detector/inference_graph,target=/models/inference_graph \
  --mount type=bind,source=/home/tata/Projects/hand_detector/inference_graph/model_config.config,target=/models/model_config.config \
  -t -i tensorflow/serving --model_config_file=/models/model_config.config

  • Break apart the command to explain individual components
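For reference, the `model_config.config` mounted above might contain something like this (a minimal sketch; the `name` and `base_path` are assumptions consistent with the mount target):

```
model_config_list {
  config {
    name: "inference_graph"
    base_path: "/models/inference_graph"
    model_platform: "tensorflow"
  }
}
```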

gRPC API example

Short brief about the anatomy of the request + Code ( Show part of the class in slide ) + Demo
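A minimal gRPC client sketch (assumes the `grpcio` and `tensorflow-serving-api` packages, a server started with the Docker command above — where host port 8501 maps to the container's gRPC port 8500 — and an input tensor named `inputs`; adjust to the model's actual signature):

```python
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Host port 8501 is mapped to container port 8500 (gRPC) in the docker run above.
channel = grpc.insecure_channel("localhost:8501")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the PredictRequest: model name, signature, and input tensors.
# The input name "inputs" and the shape are assumptions for illustration.
request = predict_pb2.PredictRequest()
request.model_spec.name = "inference_graph"
request.model_spec.signature_name = "serving_default"
dummy_image = np.zeros((1, 300, 300, 3), dtype=np.float32)
request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(dummy_image))

# Blocking call with a 10-second deadline.
response = stub.Predict(request, 10.0)
print(response.outputs)
```

The request is a protobuf message, so tensors travel in binary form rather than as JSON — one reason gRPC tends to be faster for large inputs like images.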

Rest API example

Predict API
POST http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:predict

Body
{
  // (Optional) Serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Input Tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them but NOT both.
  "instances": <value>|<(nested)list>|<list-of-objects>
  "inputs": <value>|<(nested)list>|<object>
}

Short brief about the anatomy of the request + Code ( Show part of the class in slide )
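A small sketch of building and sending the request body above (the model name and the toy instance are assumptions; per the Docker command above, host port 9000 maps to the container's REST port 8501):

```python
import json
import urllib.request

def build_predict_body(instances, signature_name="serving_default"):
    """Build the row-format ("instances") Predict request body."""
    return json.dumps({"signature_name": signature_name,
                       "instances": instances})

body = build_predict_body([[1.0, 2.0, 3.0]])

# POST to the REST endpoint; uncomment once a server is running:
# req = urllib.request.Request(
#     "http://localhost:9000/v1/models/inference_graph:predict",
#     data=body.encode("utf-8"),
#     headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```

Note that inputs are serialized to JSON text, so large tensors (e.g. images) inflate the payload compared to gRPC's binary protobufs.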

gRPC vs REST comparison (use-case specific: investigating performance on live images and understanding the reason behind the slowdown)

Using model version policy

  • How does TF Serving handle multiple versions?
  • Significance of --model_config_file_poll_wait_seconds
  • How to roll back to a stable version (hot-swapping)
  • Canarying - serving a stable model and a canary model; custom logic on the client side redirects a slice of traffic to the canary model (Pending)
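One way to pin both a stable and a canary version in the model config file (a sketch; the version numbers and labels are placeholders):

```
model_config_list {
  config {
    name: "inference_graph"
    base_path: "/models/inference_graph"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1   # stable
        versions: 2   # canary
      }
    }
    # Optional: address versions by label instead of number in requests
    version_labels { key: "stable" value: 1 }
    version_labels { key: "canary" value: 2 }
  }
}
```

Rolling back is then just editing this file to point the label (or the specific version) back at the stable version; with `--model_config_file_poll_wait_seconds` set, the server picks up the change without a restart.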

Model Warmup

  • Pre-requisite: Understanding the grpc request format
  • Necessity for Model warmup
  • Code + Demo for generating warmup files - show the difference between a warmed-up model's first request and a non-warmed-up model's
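Warmup requests live in a TFRecord file under the SavedModel's `assets.extra/` directory; a sketch of generating one (assumes `tensorflow` and `tensorflow-serving-api`; the model name, input name, and shape are assumptions matching the gRPC example above):

```python
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_log_pb2

# TF Serving looks for this exact filename inside assets.extra/.
warmup_file = "inference_graph/1/assets.extra/tf_serving_warmup_requests"

# Build a representative PredictRequest (same anatomy as a live request).
request = predict_pb2.PredictRequest()
request.model_spec.name = "inference_graph"
request.model_spec.signature_name = "serving_default"
dummy_image = np.zeros((1, 300, 300, 3), dtype=np.float32)
request.inputs["inputs"].CopyFrom(tf.make_tensor_proto(dummy_image))

# Wrap it in a PredictionLog and write it as a TFRecord.
log = prediction_log_pb2.PredictionLog(
    predict_log=prediction_log_pb2.PredictLog(request=request))
with tf.io.TFRecordWriter(warmup_file) as writer:
    writer.write(log.SerializeToString())
```

On load, the server replays these requests before marking the version as available, so the first real request doesn't pay the lazy-initialization cost.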

Monitoring using Grafana + Prometheus (Pending)

  • Dumping of metrics
  • Prometheus and PromQL
  • Grafana basics
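A sketch of the wiring (paths and ports are assumptions consistent with the Docker command above): TF Serving exports metrics when started with `--monitoring_config_file` pointing at a config like

```
prometheus_config {
  enable: true
  path: "/monitoring/prometheus/metrics"
}
```

and Prometheus then scrapes that endpoint via its own `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: "tensorflow-serving"
    scrape_interval: 5s
    metrics_path: /monitoring/prometheus/metrics
    static_configs:
      - targets: ["localhost:9000"]  # REST port per the docker mapping above
```

Grafana is then pointed at Prometheus as a data source and dashboards are built with PromQL queries.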