For the real-time 3D and 2D detection challenges, users need to submit not only their results in a Submission proto, but also an executable version of their model that can be timed on an NVIDIA Tesla V100 GPU. This directory contains the code that will be used to evaluate submitted models, along with example submissions.
Users will submit docker images that contain their models. Docker is used to abstract over dependency management, including differing versions of CUDA drivers. As such, users can use arbitrary versions of TensorFlow or PyTorch for their submissions with ease. In particular, we have tested TensorFlow 2.3.0 and 2.4.1, PyTorch 1.3 and 1.7.1, CUDA 10.0 and 11.0, and cuDNN 7 and 8, but other versions, especially older versions, should likely work with little issue. The only required dependency that users must have installed is numpy.
User-submitted models will take the form of a Python module named `wod_latency_submission` that can be imported from our evaluation script (i.e. it is in the image's PYTHONPATH or pip-installed). The module must contain the following elements (a minimal sketch follows this list):

- `initialize_model`: A function, taking no arguments, that loads and initializes the user's model. It will be run at the beginning of the evaluation script, before any data is passed to the model.
- `run_model`: A function that takes in various numpy ndarrays whose names match those of the data fields (see below). This function runs model inference on the passed-in data and returns a dictionary from string to numpy ndarray with the following key-value pairs:
  - `boxes`: N x 7 float32 array with the center x, center y, center z, length, width, height, and heading of each detection box. NOTE: If you are participating in the 2D object detection challenge instead, this should be an N x 4 float32 array with the center x, center y, length, and width of each detection box.
  - `scores`: N-length float32 array with the confidence score in [0, 1] for each detection box.
  - `classes`: N-length uint8 array with the type ID of each detection box.
- `DATA_FIELDS`: A list of strings indicating which data fields the model requires. See below for more details.
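For concreteness, here is a minimal sketch of such a module. The "model" is a placeholder that returns no detections; only the names `initialize_model`, `run_model`, `DATA_FIELDS`, and the output dictionary keys are dictated by the interface, and a real submission would load and run an actual detector.

```python
# wod_latency_submission/__init__.py -- minimal sketch of the required interface.
import numpy as np

# Fields requested from the evaluation harness (see the data formats below).
DATA_FIELDS = ['TOP_RANGE_IMAGE_FIRST_RETURN']

model = None


def initialize_model():
    """Loads and initializes the model; called once before any data is passed in."""
    global model
    model = object()  # placeholder for e.g. a restored SavedModel or torch module


def run_model(TOP_RANGE_IMAGE_FIRST_RETURN):
    """Runs inference on one frame's data and returns detections."""
    # TOP_RANGE_IMAGE_FIRST_RETURN has shape (H, W, 6):
    # range, intensity, elongation, x, y, z (vehicle frame).
    del TOP_RANGE_IMAGE_FIRST_RETURN  # the placeholder does not use the input
    return {
        # N x 7 (center x/y/z, length, width, height, heading); N x 4 for the 2D challenge.
        'boxes': np.zeros((0, 7), dtype=np.float32),
        # N confidence scores in [0, 1].
        'scores': np.zeros((0,), dtype=np.float32),
        # N type IDs.
        'classes': np.zeros((0,), dtype=np.uint8),
    }
```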
Converting from Frame protos to usable point clouds/images can be non-trivially expensive (involving various unzippings and transforms) and does not reflect a workflow that would realistically be present in an autonomous driving scenario. Thus, our evaluation of submitted models does not time the conversion from Frame proto to tensor. Instead, we have pre-extracted the dataset into numpy ndarrays. The keys, shapes, and data types are:
- `POSE`: 4x4 float32 array with the vehicle pose.
- `TIMESTAMP`: int64 scalar with the timestamp of the frame in microseconds.
- For each lidar:
  - `<LIDAR_NAME>_RANGE_IMAGE_FIRST_RETURN`: HxWx6 float32 array with the range image of the first return for this lidar. The six channels are range, intensity, elongation, x, y, and z; the x, y, and z values are in vehicle frame. Pixels with range 0 are not valid points (see the extraction sketch below).
  - `<LIDAR_NAME>_RANGE_IMAGE_SECOND_RETURN`: HxWx6 float32 array with the range image of the second return for this lidar. Same channels as the first return range image.
  - `<LIDAR_NAME>_BEAM_INCLINATION`: H-length float32 array with the beam inclination for each row of the range image for this lidar.
  - `<LIDAR_NAME>_LIDAR_EXTRINSIC`: 4x4 float32 array with the extrinsic matrix for this lidar.
  - `<LIDAR_NAME>_CAM_PROJ_FIRST_RETURN`: HxWx6 int64 array with the lidar point to camera image projections for the first return of this lidar. See the documentation for `RangeImage.camera_projection_compressed` in dataset.proto for details.
  - `<LIDAR_NAME>_CAM_PROJ_SECOND_RETURN`: HxWx6 float32 array with the lidar point to camera image projections for the second return of this lidar. See the documentation for `RangeImage.camera_projection_compressed` in dataset.proto for details.
  - (top lidar only) `TOP_RANGE_IMAGE_POSE`: HxWx6 float32 array with the transform from vehicle frame to global frame for every pixel in the range image of the TOP lidar. See the documentation for `RangeImage.range_image_pose_compressed` in dataset.proto for details.
- For each camera:
  - `<CAMERA_NAME>_IMAGE`: HxWx3 uint8 array with the RGB image from this camera.
  - `<CAMERA_NAME>_INTRINSIC`: 9-element float32 array with the intrinsics of this camera. See the documentation for `CameraCalibration.intrinsic` in dataset.proto for details.
  - `<CAMERA_NAME>_EXTRINSIC`: 4x4 float32 array with the extrinsic matrix for this camera.
  - `<CAMERA_NAME>_WIDTH`: int64 scalar with the width of this camera image.
  - `<CAMERA_NAME>_HEIGHT`: int64 scalar with the height of this camera image.
  - `<CAMERA_NAME>_POSE`: 4x4 float32 array with the vehicle pose at the timestamp of this camera image.
  - `<CAMERA_NAME>_POSE_TIMESTAMP`: float32 scalar with the timestamp in seconds for the image (i.e. the timestamp at which `<CAMERA_NAME>_POSE` is valid).
  - `<CAMERA_NAME>_ROLLING_SHUTTER_DURATION`: float32 scalar with the duration of the rolling shutter in seconds. See the documentation for `CameraImage.shutter` in dataset.proto for details.
  - `<CAMERA_NAME>_ROLLING_SHUTTER_DIRECTION`: int64 scalar with the direction of the rolling shutter, expressed as the int value of a `CameraCalibration.RollingShutterReadOutDirection` enum.
  - `<CAMERA_NAME>_CAMERA_TRIGGER_TIME`: float32 scalar with the time when the camera was triggered.
  - `<CAMERA_NAME>_CAMERA_READOUT_DONE_TIME`: float32 scalar with the time when the last readout finished. The difference between this and the trigger time includes the exposure time and the actual sensor readout time.
See the `LaserName.Name` and `CameraName.Name` enums in dataset.proto for the valid lidar and camera name strings.
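As a concrete illustration of the range image layout, the sketch below pulls the valid vehicle-frame points out of a `<LIDAR_NAME>_RANGE_IMAGE_*` array. The function name is ours, not part of the interface; it relies only on the documented channel order (range, intensity, elongation, x, y, z) and on the rule that pixels with range 0 are not valid.

```python
import numpy as np


def range_image_to_points(range_image: np.ndarray) -> np.ndarray:
    """Extracts valid vehicle-frame xyz points from an (H, W, 6) range image.

    The channels are [range, intensity, elongation, x, y, z]; pixels whose
    range is 0 do not correspond to real lidar returns and are dropped.
    """
    valid = range_image[..., 0] > 0
    return range_image[valid][:, 3:6].astype(np.float32)
```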
To request a field from the previous frame, add `_1` to the end of the field name; for example, `TOP_RANGE_IMAGE_FIRST_RETURN_1` is the range image for the top lidar from the previous frame. Likewise, to request a field from two frames ago, add `_2` to the end of the field name (e.g. `TOP_RANGE_IMAGE_FIRST_RETURN_2`). Note that only the two previous frames (in addition to the current frame, which does not require a suffix) can be requested.
Users specify which of these arrays they would like their models to receive using the `DATA_FIELDS` list of strings in their `wod_latency_submission` module. The requested arrays will then be passed to `run_model` as keyword arguments under their original names (e.g. `TOP_RANGE_IMAGE_FIRST_RETURN`).
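For example, a submission that wants the current and two previous TOP range images plus the current front camera image might declare the following. The particular field choice is purely illustrative, and a real `run_model` would return the boxes/scores/classes dictionary described earlier.

```python
DATA_FIELDS = [
    'TOP_RANGE_IMAGE_FIRST_RETURN',    # current frame (no suffix)
    'TOP_RANGE_IMAGE_FIRST_RETURN_1',  # one frame ago
    'TOP_RANGE_IMAGE_FIRST_RETURN_2',  # two frames ago
    'FRONT_IMAGE',                     # current front camera image
]


def run_model(TOP_RANGE_IMAGE_FIRST_RETURN,
              TOP_RANGE_IMAGE_FIRST_RETURN_1,
              TOP_RANGE_IMAGE_FIRST_RETURN_2,
              FRONT_IMAGE):
    # Each requested array arrives as a keyword argument under its original name.
    ...
```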
Note that the `convert_frame_to_dict` function in utils/frame_utils.py will convert a Frame proto into a dictionary with the same keys and values defined above. However, it will not add the `_1` or `_2` suffixes to the keys from earlier frames for multi-frame input.
These examples all show different ways of making Docker images that contain submission models that comply with the input and output formats for the challenges. The repo contains a TensorFlow and a PyTorch example for each of the real-time detection challenges (2D and 3D).
- `tensorflow/from_saved_model`: Contains a basic PointPillars model that is executed from a SavedModel for the 3D detection challenge.
- `pytorch/from_saved_model`: Contains a PV-RCNN model that loads saved weights and runs model inference using the input/output formats specified above for the 3D detection challenge.
- `tensorflow/multiframe`: Contains a basic PointNet model that shows how to use multiple frames as input.
- `2d_challenge/tensorflow`: Contains a pretrained EfficientDet model loaded from the TF Model Zoo that outputs detections for the 2D detection challenge.
- `2d_challenge/pytorch`: Contains a pretrained Faster R-CNN model loaded from torchvision that outputs detections for the 2D detection challenge.
Upload your docker image to a bucket on Google Cloud Storage or to Google Container/Artifact Registry and indicate the resource location using the `docker_image_source` field of the submission proto. Please note that latency will only be benchmarked if a valid docker image path is provided.
You can use any bucket configuration, but to minimize incurred costs it is recommended to create a bucket with the following settings: Location: Region, Region: us-west1 (Oregon).
`docker_image_source` should contain the full path to the submission image, starting with the `gs://` protocol, i.e. `docker_image_source: "gs://example_bucket_name/example_folder/example_docker_image.tar.gz"`.
You can produce a compatible Docker image via the following command:
```
docker save --output="example_docker_image.tar" ID_OF_THE_IMAGE
```

More info on `docker save` is available in the official documentation.
To reduce the image size and upload time, you can additionally compress the image:

```
gzip example_docker_image.tar
```
To upload the image to the bucket, you can use the Google Cloud Storage web interface or the `gsutil` command:

```
gsutil cp example_docker_image.tar.gz gs://example_bucket_name/example_folder/
```

More info on the `gsutil` command is available in the official documentation.
You will need to grant Waymo's service account [email protected] permission to read from your submission bucket (for this reason, we recommend using a separate bucket just for submissions).
- It is recommended to use a unique name for every submitted docker image, as the evaluation server may fetch the submission with a delay from the actual submission time.
- Please note that the resulting docker images, even after compression, may be larger than 1 GB, so using Google Container Registry or Google Artifact Registry may be preferable.
There is no difference between using Google Container Registry or Google Artifact Registry for submissions.
- Google Container Registry and Google Artifact Registry use the same underlying bucket-based storage system, so the recommended bucket settings apply here too: Location: Region, Region: us-west1 (Oregon).
- `docker_image_source` should contain the full ID of the submission image within the *docker.pkg.dev domain, i.e. `docker_image_source: "us-west1-docker.pkg.dev/example-registry-name/example-folder/example-image@sha256:example-sha256-hash"`.
You can upload the Docker image via the following commands:

```
docker tag example-image-hash us-west1-docker.pkg.dev/example-registry-name/example-folder/example-image
docker push us-west1-docker.pkg.dev/example-registry-name/example-folder/example-image
```

More info on `docker push` is available in the official documentation.
You will need to grant Waymo's service account [email protected] permission to read from your Google Container Registry or Google Artifact Registry (for this reason, we recommend using a separate repository just for submissions).
It is recommended to always submit using the sha256 digest of the image being submitted rather than the 'latest' tag, as the evaluation server may fetch the submission with a delay from the actual submission time.