This folder contains the source code and configuration needed to serve the model on Vertex AI. The implementation follows the container architecture described below.
The serving container can be used in both online and batch prediction workflows (a usage sketch with the Vertex AI Python SDK follows this list):

- Online predictions: Deploy the container to a REST endpoint, such as a Vertex AI endpoint. This lets you request real-time predictions from the model via the REST Application Programming Interface (API).
- Batch predictions: Use the container to run large-scale Vertex AI batch prediction jobs that process many inputs at once.
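The snippet below is a minimal sketch of both workflows using the `google-cloud-aiplatform` Python SDK. The project, region, resource IDs, Cloud Storage paths, and the instance payload are placeholders; the actual payload must match the PredictSchemata defined in `vertex_schemata/`.

```python
# Minimal sketch: calling the serving container through Vertex AI.
# All project, location, resource IDs, and GCS paths are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Online prediction: send instances to a Vertex AI endpoint hosting this container.
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)
response = endpoint.predict(instances=[{"input": "..."}])  # shape defined by the PredictSchemata
print(response.predictions)

# Batch prediction: run a large-scale job against the uploaded model resource.
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/0987654321"
)
batch_job = model.batch_predict(
    job_display_name="example-batch-prediction",
    gcs_source="gs://my-bucket/inputs.jsonl",
    gcs_destination_prefix="gs://my-bucket/outputs/",
    machine_type="n1-standard-4",
)
batch_job.wait()
```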
The folder contains the following files and subdirectories:

- `serving_framework/`: A library for implementing Vertex AI-compatible HTTP servers.
- `vertex_schemata/`: Folder containing YAML files that define the PredictSchemata for Vertex AI endpoints.
- `Dockerfile`: Defines the Docker image for serving the model.
- `entrypoint.sh`: A bash script used as the Docker entrypoint. It sets up the necessary environment variables, copies the TensorFlow SavedModel(s) locally, and launches the TensorFlow Model Server and the frontend HTTP server.
- `model_config.txtpb`: The protocol buffer message used in `entrypoint.sh` to configure the TensorFlow Model Server to run multiple models (see the example after this list).
- `predictor.py`: Prepares model input, calls the models, and post-processes the output into the final response (see the sketch after this list).
- `requirements.txt`: Lists the required Python packages.
- `server_gunicorn.py`: Creates the HTTP server that launches the prediction executor.
- `data_processing/`: A library for data retrieval and processing.
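For context on `model_config.txtpb`: the TensorFlow Model Server reads a `ModelServerConfig` message in text-proto form when serving multiple models. The snippet below is an illustrative example of that format, not the actual contents of the file in this folder; the model names and base paths are made up.

```
model_config_list {
  config {
    name: "model_a"
    base_path: "/models/model_a"
    model_platform: "tensorflow"
  }
  config {
    name: "model_b"
    base_path: "/models/model_b"
    model_platform: "tensorflow"
  }
}
```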
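The end-to-end request path through `predictor.py` can be pictured roughly as follows. This is a sketch only, under the assumption that the frontend HTTP server forwards requests to the locally running TensorFlow Model Server over its REST API on the default port 8501; the model name, payload shapes, and helper functions are hypothetical, and the real pre- and post-processing lives in `predictor.py` and `data_processing/`.

```python
# Rough sketch of the serving flow: preprocess -> TensorFlow Model Server -> postprocess.
# Model name, port, and payload shapes are assumptions for illustration only.
import requests

TF_SERVING_URL = "http://localhost:8501/v1/models/model_a:predict"  # hypothetical model name

def preprocess(instance: dict) -> dict:
    # Convert a request instance into the tensor inputs the SavedModel expects.
    return {"inputs": instance["input"]}

def postprocess(outputs: dict) -> dict:
    # Turn raw model outputs into the response structure defined by the PredictSchemata.
    return {"prediction": outputs}

def predict(instances: list[dict]) -> list[dict]:
    # Forward preprocessed instances to the local TensorFlow Model Server REST API.
    payload = {"instances": [preprocess(i) for i in instances]}
    response = requests.post(TF_SERVING_URL, json=payload, timeout=60)
    response.raise_for_status()
    return [postprocess(o) for o in response.json()["predictions"]]
```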