Launch object detection batch predict jobs using GPU

Setup

Requirements:

  • K8s cluster with GPUs
  • Docker and Docker Registry
  • A TensorFlow model in SavedModel format

This example shows how to run batch prediction for object detection with a pre-trained model in a K8s cluster using GPUs. See this guide to customize your Kubeflow deployment to add GPU nodes.

Kubeflow batch-predict is based on Apache Beam, and this example uses the local runner to run the job in the K8s cluster. You can, however, choose a different runner to run the job remotely, such as on the Google Dataflow service. As of July 2018, Google Dataflow does NOT support GPUs.

Build and push the image

Use the pre-built image gcr.io/kubeflow-examples/batch-predict from Google Container Registry (GCR), or build your own Kubeflow batch-predict image. The following example builds the image and hosts it in GCR.

IMAGE="gcr.io/${YOUR_GCP_PROJECT}/batch-predict"

docker build -t ${IMAGE} -f ./Dockerfile.batch-predict .

gcloud docker -- push ${IMAGE}

Prepare models

The model used in this example can be downloaded from gs://kubeflow-examples-data/object-detection-coco/image_string_model/saved_model. It is a slightly modified version of the Faster R-CNN object detection model from the TensorFlow model zoo. The model in this example accepts JPEG bytes as its input data.

Alternatively, you can follow the steps to export your own model from the checkpoint files of your training jobs, or download a pre-trained model in SavedModel format from the model zoo. The latter accepts numpy arrays of image data instead of JPEG bytes.

Refer to this blog for exporting a model in SavedModel format, in particular for how to change the input format from the default.
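Because the two kinds of model expect different inputs (JPEG bytes versus numeric arrays), it can help to inspect a model's serving signature before preparing data. The following is a minimal sketch, not part of the original example. It assumes gsutil and saved_model_cli (installed with the TensorFlow pip package) are on your PATH, and that the GCS path above directly contains the SavedModel files. The function name fetch_and_inspect_model is hypothetical.

```shell
# Hypothetical helper: copy the example SavedModel to a local directory and
# print its signatures so you can see which input tensor type it expects.
# Assumes the GCS path directly contains the SavedModel files.
fetch_and_inspect_model() {
  local dest="${1:-./saved_model}"
  mkdir -p "${dest}" || return 1
  # gsutil expands the wildcard itself, so it is quoted to avoid shell globbing.
  gsutil -m cp -r \
    "gs://kubeflow-examples-data/object-detection-coco/image_string_model/saved_model/*" \
    "${dest}/" || return 1
  # List every signature with its input and output tensors.
  saved_model_cli show --dir "${dest}" --all
}
```

Call it as fetch_and_inspect_model ./saved_model. The signature listing should indicate whether the input tensor is a string type or a numeric type, which in turn determines the input file format you need.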

Prepare input data

In this example, the input tensor holds JPEG image strings, so we pack the image bytes into TF records. The input file contains 150 images and can be downloaded from gs://kubeflow-examples-data/object-detection-coco/data/object-detection-images.tfrecord.

Refer to this blog for converting images into input data the model can consume.

Launch batch prediction job

Customize batch-predict.yaml to set the path to the model, your input image files, the input format, and the output locations. Then run:

kubectl -n <your_name_space> apply -f ./batch-predict/batch-predict.yaml

Arguments

  • input_file_patterns The list of input files or file patterns, separated by commas.

  • input_file_format One of the following formats: json, tfrecord, or tfrecord_gzip. The model in this example takes a JPEG-encoded image string tensor as input, so the input file contains TF records of JPEG bytes. If you use a model from the model zoo directly, the input is a numpy array instead; in that case, your input file should contain numeric arrays and the input format should be json. Here is such a sample input, which contains two images.

  • model_dir The directory that contains the model files in SavedModel format.

  • batch_size Number of records in one batch of the input data. Depending on the memory in your machine, a value from 1 to 4 is recommended, with up to 8 possible in this example.

  • output_result_prefix Output path to save the prediction results.

  • output_error_prefix Output path to save the prediction errors.

For other flags available to Kubeflow batch predict, please refer to the source code.
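Before editing batch-predict.yaml, you may want to sanity-check the values you plan to use. The sketch below is a hypothetical pre-flight check and is not part of Kubeflow batch-predict. It only encodes the constraints stated in the argument list above: the three accepted input formats and the batch size range of 1 to 8 suggested for this example.

```shell
# Hypothetical pre-flight check; not part of Kubeflow batch-predict.
# Usage: check_batch_predict_args <input_file_format> <batch_size>
check_batch_predict_args() {
  local format="$1" batch_size="$2"
  # Only the formats listed in this document are accepted.
  case "${format}" in
    json|tfrecord|tfrecord_gzip) ;;
    *) echo "unsupported input_file_format: ${format}" >&2; return 1 ;;
  esac
  # Reject empty or non-numeric batch sizes.
  case "${batch_size}" in
    ''|*[!0-9]*) echo "batch_size must be a positive integer" >&2; return 1 ;;
  esac
  # 1 to 8 is the range suggested for this example, not a hard limit of the tool.
  if [ "${batch_size}" -lt 1 ] || [ "${batch_size}" -gt 8 ]; then
    echo "batch_size ${batch_size} is outside the 1-8 range suggested for this example" >&2
    return 1
  fi
}
```

For instance, check_batch_predict_args tfrecord 4 returns success, while check_batch_predict_args csv 4 prints an error and returns a non-zero status.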

Monitor jobs

Once you submit the job, you can check the status and logs of the pod that runs the batch-predict job:

kubectl logs <your-pod-name>

When the pod status is "Completed", check the result and error files you specified when starting the job to see whether the job succeeded.
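If you would rather wait for the job from a script than check by hand, you can poll the pod's phase. The following is a minimal sketch with a hypothetical helper name, wait_for_pod. It relies on the standard kubectl get pod command with a jsonpath output. Note that the STATUS column of kubectl get pods shows Completed for a pod whose status.phase is Succeeded.

```shell
# Hypothetical helper: poll a pod until it reaches a terminal phase.
# Usage: wait_for_pod <namespace> <pod-name> [poll-interval-seconds]
wait_for_pod() {
  local namespace="$1" pod="$2" interval="${3:-10}" phase
  while true; do
    # status.phase is one of Pending, Running, Succeeded, Failed, or Unknown.
    phase="$(kubectl -n "${namespace}" get pod "${pod}" \
      -o jsonpath='{.status.phase}')" || return 2
    case "${phase}" in
      Succeeded) echo "pod ${pod} succeeded"; return 0 ;;
      Failed)    echo "pod ${pod} failed" >&2; return 1 ;;
      *)         sleep "${interval}" ;;
    esac
  done
}
```

For example, wait_for_pod <your_name_space> <your-pod-name> 30 checks every 30 seconds. A Succeeded phase only means the container exited cleanly, so you should still inspect the result and error files.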

Visualize detection results

You can use this script to visualize the detection boxes from the prediction results files on images.