Talk details
Tata Ganesh edited this page Oct 10, 2019
- Deployment process is different from training
- Handling multiple versions of the model
- Hot-swapping new versions
- Writing wrapper code for the API + code maintenance
- Retry mechanisms
...and many more
- Flexible, high-performance serving system for machine learning models, designed for production environments.
- Battle-tested in Google's systems
- Let us talk about its features!
```shell
# Pull the TF Serving image
docker pull tensorflow/serving

# Host 8501 -> container 8500 (gRPC), host 9000 -> container 8501 (REST)
sudo docker run -p 8501:8500 -p 9000:8501 \
  --mount type=bind,source=/home/tata/Projects/hand_detector/inference_graph,target=/models/inference_graph \
  --mount type=bind,source=/home/tata/Projects/hand_detector/inference_graph/model_config.config,target=/models/model_config.config \
  -t -i tensorflow/serving --model_config_file=/models/model_config.config
```
- Break the command apart to explain its individual components
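For reference, a minimal `model_config.config` matching the mount above might look like this (the model name and base path are taken from the docker command; the exact contents of the real file are an assumption):

```
model_config_list {
  config {
    name: "inference_graph"
    base_path: "/models/inference_graph"
    model_platform: "tensorflow"
  }
}
```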
Brief overview of the anatomy of the request + code (show part of the class in a slide) + demo
Predict API

```
POST http://host:port/v1/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]:predict
```

Body:

```javascript
{
  // (Optional) Serving signature to use.
  // If unspecified, the default serving signature is used.
  "signature_name": <string>,

  // Input tensors in row ("instances") or columnar ("inputs") format.
  // A request can have either of them but NOT both.
  "instances": <value>|<(nested)list>|<list-of-objects>
  "inputs": <value>|<(nested)list>|<object>
}
```
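To make the row vs columnar distinction concrete, here is a small sketch that builds both request bodies with the standard library. The input name `image`, the signature name, and the sample values are all illustrative assumptions, not taken from the actual hand-detector model:

```python
import json

# Row format ("instances"): one object per example,
# field names matching the signature's input tensors.
row_body = {
    "signature_name": "serving_default",  # assumed signature name
    "instances": [
        {"image": [[0.0, 0.5], [0.5, 1.0]]},  # made-up input values
    ],
}

# Columnar format ("inputs"): one entry per input tensor,
# batched along the first dimension.
col_body = {
    "signature_name": "serving_default",
    "inputs": {"image": [[[0.0, 0.5], [0.5, 1.0]]]},
}

# A request may contain "instances" or "inputs", never both.
assert not ("instances" in row_body and "inputs" in row_body)

payload = json.dumps(row_body)
print(payload)
```

The serialized `payload` would then be POSTed to `http://host:port/v1/models/<name>:predict` with any HTTP client.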
Brief overview of the anatomy of the request + code (show part of the class in a slide)
gRPC vs REST comparison (use-case specific: investigating performance on live images and understanding the reason behind the slowdown)
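For the comparison, a small timing harness like the following could be used; the actual predict call (a REST POST to `:predict`, or a gRPC `PredictionService.Predict` call) would be passed in as a callable, so this sketch uses only the standard library and a stand-in function:

```python
import time
import statistics

def benchmark(predict_fn, warmup=2, runs=10):
    """Time repeated calls to predict_fn and report latency stats in ms.

    predict_fn would wrap either a REST or a gRPC predict request;
    here it is any zero-argument callable.
    """
    for _ in range(warmup):  # discard warm-up calls
        predict_fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

# Stand-in for a real predict call, just to show the harness running:
stats = benchmark(lambda: sum(range(1000)))
print(stats)
```

Running the same harness against both transports with identical inputs makes the slowdown measurable rather than anecdotal.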
- How does TF Serving handle multiple versions?
- Significance of `--model_config_file_poll_wait_seconds`
- How to roll back to a stable version (hot-swapping)
- Canarying: serving a stable and a canary model; custom logic at the client end to redirect a slice of traffic to the canary model (Pending)
- Pre-requisite: understanding the gRPC request format
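The client-side canarying logic can be sketched as a simple traffic splitter; the endpoint URLs, model name, version number, and 5% split below are all hypothetical:

```python
import random

# Hypothetical endpoints: the stable model and a canary version 2.
STABLE_URL = "http://host:8501/v1/models/inference_graph:predict"
CANARY_URL = "http://host:8501/v1/models/inference_graph/versions/2:predict"
CANARY_FRACTION = 0.05  # route ~5% of requests to the canary

def pick_endpoint(rng):
    """Client-side canarying: send a small slice of traffic to the canary."""
    return CANARY_URL if rng.random() < CANARY_FRACTION else STABLE_URL

rng = random.Random(0)  # seeded so the demo is reproducible
routed = [pick_endpoint(rng) for _ in range(1000)]
canary_share = routed.count(CANARY_URL) / len(routed)
print(f"canary share: {canary_share:.3f}")
```

Keeping the split in the client keeps TF Serving itself unchanged: both versions are served side by side, and the client decides which one each request hits.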
- Necessity for model warmup
- Code + demo for generating warmup files; show the difference between the first request to a warmed-up model vs a non-warmed-up model
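TF Serving looks for warmup requests in the SavedModel's `assets.extra/tf_serving_warmup_requests` file, a TFRecord of `PredictionLog` protos. A pseudocode sketch of the generation script (the real one uses the protos from the tensorflow-serving-api package):

```
# pseudocode: generate tf_serving_warmup_requests
open a TFRecord writer at
    <saved_model_dir>/assets.extra/tf_serving_warmup_requests
for each representative sample input:
    request = PredictRequest(model name, signature, sample input tensors)
    log     = PredictionLog(predict_log = PredictLog(request = request))
    write the serialized log to the TFRecord
close the writer
# on model load, TF Serving replays these requests before serving traffic
```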
- Dumping of metrics
- Prometheus and PromQL
- Grafana basics
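For the metrics part, TF Serving can expose a Prometheus endpoint when started with `--monitoring_config_file`; a minimal monitoring config might look like the following (the Prometheus scrape job and Grafana data source are assumed to be configured separately):

```
prometheus_config {
  enable: true
  path: "/monitoring/prometheus/metrics"
}
```

Prometheus can then scrape that path, and queries such as `rate(:tensorflow:serving:request_count[5m])` (exact metric names vary by TF Serving version) feed the Grafana dashboards.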