Merge pull request #14 from zooniverse/mdv5-zooniverse-poc
Update CameraTraps Batch API to utilize Megadetector V5
yuenmichelle1 authored Dec 4, 2023
2 parents 8044b97 + 02b2a27 commit 3ee7222
Showing 663 changed files with 30,219 additions and 102,231 deletions.
52 changes: 52 additions & 0 deletions .github/workflows/deploy_app.yml
@@ -0,0 +1,52 @@
name: Deploy to Production

on:
  push:
    branches:
      - zooniverse-deployment
  workflow_dispatch:

jobs:
  build_and_push_image:
    name: Build and Push Image
    uses: zooniverse/ci-cd/.github/workflows/build_and_push_image.yaml@main
    with:
      repo_name: camera-traps-api
      commit_id: ${{ github.sha }}
      file: ./api/batch_processing/api_core/Dockerfile
      latest: true

  deploy_app:
    runs-on: ubuntu-latest
    needs: [build_and_push_image]
    steps:
      - name: Checkout
        uses: actions/[email protected]

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_AKS }}

      - uses: azure/aks-set-context@v3
        with:
          resource-group: kubernetes
          cluster-name: microservices

      - name: Dry run deployments
        run: |
          sed "s/__IMAGE_TAG__/${{ github.sha }}/g" ./api/batch_processing/api_core/kubernetes/deployment.tmpl \
            | kubectl --context azure apply --dry-run=client --record -f -

      - name: Modify & apply template
        run: |
          sed "s/__IMAGE_TAG__/${{ github.sha }}/g" ./api/batch_processing/api_core/kubernetes/deployment.tmpl \
            | kubectl --context azure apply --record -f -
6 changes: 6 additions & 0 deletions .gitignore
@@ -54,3 +54,9 @@ api_config*.py
*.pth
*.o
debug.log
*.swp

# Things created when building the sync API
yolov5
api/synchronous/api_core/animal_detection_api/detection

6 changes: 0 additions & 6 deletions .spyproject/codestyle.ini

This file was deleted.

6 changes: 0 additions & 6 deletions .spyproject/encoding.ini

This file was deleted.

7 changes: 0 additions & 7 deletions .spyproject/vcs.ini

This file was deleted.

44 changes: 0 additions & 44 deletions Jenkinsfile

This file was deleted.

313 changes: 154 additions & 159 deletions README.md

Large diffs are not rendered by default.

41 changes: 41 additions & 0 deletions SECURITY.md
@@ -0,0 +1,41 @@
<!-- BEGIN MICROSOFT SECURITY.MD V0.0.8 BLOCK -->

## Security

Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).

If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below.

## Reporting Security Issues

**Please do not report security vulnerabilities through public GitHub issues.**

Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report).

If you prefer to submit without logging in, send email to [[email protected]](mailto:[email protected]). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

* Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
* Full paths of source file(s) related to the manifestation of the issue
* The location of the affected source code (tag/branch/commit or direct URL)
* Any special configuration required to reproduce the issue
* Step-by-step instructions to reproduce the issue
* Proof-of-concept or exploit code (if possible)
* Impact of the issue, including how an attacker might exploit the issue

This information will help us triage your report more quickly.

If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs.

## Preferred Languages

We prefer all communications to be in English.

## Policy

Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd).

<!-- END MICROSOFT SECURITY.MD BLOCK -->
14 changes: 2 additions & 12 deletions api/README.md
@@ -3,21 +3,11 @@
Though most of our users either use the MegaDetector model directly or work with us to run MegaDetector on the cloud, we also package useful components developed in the Camera Traps project into APIs that users can operate (on the cloud or on local computers) to process camera trap images in a variety of scenarios. This folder contains the source code of the APIs and documentation on how to set them up.


## Detector

Our animal detection model ([MegaDetector](https://github.com/Microsoft/CameraTraps#megadetector)) trained on camera trap images from a variety of ecosystems can be served via two APIs, one for real-time applications or small batches of test images (synchronous API), and one for processing large collections of images (batch processing API). These APIs can be adapted to deploy any algorithms or models – see our tutorial in the [AI for Earth API Framework](https://github.com/Microsoft/AIforEarth-API-Development) repo.


### Synchronous API

This API's `/detect` endpoint processes up to 8 images at a time, and optionally returns copies of the input images annotated with the detection bounding boxes. This API powers the [demo](../demo) web app.

To build the API, first download the [MegaDetector](https://github.com/Microsoft/CameraTraps#megadetector) model file to `detector_synchronous/api/animal_detection_api/model`.

This API is intended for real-time scenarios where a small number of images are processed at a time and latency is a priority. See documentation [here](synchronous).
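The 8-image limit can be enforced client-side before posting to `/detect`. A minimal sketch follows; the `render` parameter name and the `image` field name are assumptions, not part of this README, so check the synchronous API documentation for the actual request shape.

```python
MAX_IMAGES = 8  # documented per-request limit of the /detect endpoint


def build_detect_request(image_paths, render_boxes=False):
    """Prepare the pieces of a POST to /detect.

    Returns a dict that maps onto requests.post(url, params=..., files=...);
    the caller is responsible for opening the files. The 'render' parameter
    (ask the API for annotated copies of the inputs) is an assumed name.
    """
    if not image_paths:
        raise ValueError("at least one image is required")
    if len(image_paths) > MAX_IMAGES:
        raise ValueError(f"/detect accepts at most {MAX_IMAGES} images per request")
    return {
        "params": {"render": str(render_boxes).lower()},
        "file_fields": [("image", path) for path in image_paths],
    }


# Example: two images, requesting annotated copies back
req = build_detect_request(["cam1/img001.jpg", "cam1/img002.jpg"], render_boxes=True)
```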

### Batch processing API

This API runs the detector on up to two million images in one request using [Azure Batch](https://azure.microsoft.com/en-us/services/batch/). To use this API, the input images need to be copied to Azure [Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/). Please see the [user guide](./batch_processing/README.md) and get in touch with us if you're interested in standing up your own instance of the batch processing API.

The [batch_processing](batch_processing) folder includes the source for the API itself, tools for working with the results the API generates, and support for integrating our API output with other tools.
This API runs the detector on lots of images (typically millions) and distributes the work over potentially many nodes using [Azure Batch](https://azure.microsoft.com/en-us/services/batch/). See documentation [here](batch_processing).

42 changes: 29 additions & 13 deletions api/batch_processing/README.md
@@ -1,6 +1,6 @@
# Camera trap batch processing API user guide

Though most of our users either use the [MegaDetector](https://github.com/Microsoft/CameraTraps#megadetector) model directly or work with us to run MegaDetector on the cloud, we also offer an open-source reference implementation for an API that processes a large quantity of camera trap images, to support a variety of online scenarios. The output is most helpful for separating empty from non-empty images based on a detector confidence threshold that you select, and putting bounding boxes around animals, people, and vehicles to help manual review proceed more quickly. If you are interested in setting up an endpoint to process very small numbers of images for real-time applications (e.g. for anti-poaching applications), see the source for our [real-time camera trap image processing API](https://aiforearth.portal.azure-api.net/docs/services/ai-for-earth-camera-trap-detection-api/).
Though most of our users either use the [MegaDetector](https://github.com/ecologize/CameraTraps#megadetector) model directly or work with us to run MegaDetector on the cloud, we also offer an open-source reference implementation for an API that processes a large quantity of camera trap images, to support a variety of online scenarios. The output is most helpful for separating empty from non-empty images based on a detector confidence threshold that you select, and putting bounding boxes around animals, people, and vehicles to help manual review proceed more quickly. If you are interested in setting up an endpoint to process very small numbers of images for real-time applications (e.g. for anti-poaching applications), see the source for our [real-time camera trap image processing API](https://github.com/ecologize/CameraTraps/tree/main/api/synchronous).

With the batch processing API, you can process a batch of up to a few million images in one request to the API. If in addition you have some images that are labeled, we can evaluate the performance of the MegaDetector on your labeled images (see [Post-processing tools](#post-processing-tools)).

@@ -191,7 +191,7 @@ Note that the field `Status` in the returned body is capitalized (since July 202

The URL to the output file is valid for 180 days from the time the request has finished. If you do not retrieve it before the link expires, contact us with the RequestID and we can send the results to you.

The output file is a JSON in the format described below, last updated in February 2021 (`"format_version": "1.1"`).
The output file is a JSON in the format described below.


#### Batch processing API output format
@@ -205,11 +205,19 @@ Example output with both detection and classification results:
```json
{
"info": {
"detector": "megadetector_v3",
"format_version": "1.3",
"detector": "md_v4.1.0.pb",
"detection_completion_time": "2019-05-22 02:12:19",
"classifier": "ecosystem1_v2",
"classification_completion_time": "2019-05-26 01:52:08",
"format_version": "1.1"
"detector_metadata": {
"megadetector_version": "v4.1.0",
"typical_detection_threshold": 0.8,
"conservative_detection_threshold": 0.6
},
"classifier_metadata": {
"typical_classification_threshold": 0.75
}
},
"detection_categories": {
"1": "animal",
@@ -225,9 +233,8 @@ Example output with both detection and classification results:
},
"images": [
{
"file": "path/from/base/dir/image1.jpg",
"meta": "a string of metadata if it was available in the list at images_requested_json_sas",
"max_detection_conf": 0.926,
"file": "path/from/base/dir/image_with_animal.jpg",
"meta": "optional free-text metadata",
"detections": [
{
"category": "1",
@@ -247,21 +254,32 @@ Example output with both detection and classification results:
]
},
{
"file": "/path/from/base/dir/image2.jpg",
"file": "/path/from/base/dir/empty_image.jpg",
"meta": "",
"max_detection_conf": 0,
"detections": []
},
{
"file": "/path/from/base/dir2/corrupted.jpg",
"file": "/path/from/base/dir2/corrupted_image.jpg",
"failure": "Failure image access"
}
]
}
```

A full output example computed on the Snapshot Serengeti data can be found [here](http://dolphinvm.westus2.cloudapp.azure.com/data/snapshot_serengeti/serengeti_val_detections_from_pkl_MDv1_20190528_w_classifications.json).
##### Model metadata

The `detector` field (within the `info` field) specifies the filename of the detector model that produced this results file. It was omitted in older files generated with `run_detector_batch.py`, so if this field is absent, you can safely assume the file was generated with MegaDetector v4.

In newer files, this field contains the filename (base name only) of the model file, which will typically be one of:

* `megadetector_v4.1` (MegaDetector v4, run via the batch API)
* `md_v4.1.0.pb` (MegaDetector v4, run locally)
* `md_v5a.0.0.pt` (MegaDetector v5a)
* `md_v5b.0.0.pt` (MegaDetector v5b)

This string is used by some tools to choose appropriate default confidence values, which depend on the model version. If you change the name of the MegaDetector file, you will break this convention, and YMMV.

The `detector_metadata` and `classifier_metadata` fields are optional additions as of format version 1.2. These currently contain useful default confidence values for downstream tools (particularly Timelapse), but we strongly recommend against blindly trusting these defaults; always explore your data before choosing a confidence threshold, as the optimal value can vary widely.
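As a concrete illustration of this lookup order, the sketch below prefers `detector_metadata` when present and falls back to per-filename defaults. The v4 threshold matches the example output above; the v5 value of 0.2 is an assumption, and as noted, any default should be validated against your own data.

```python
# Illustrative per-model fallbacks; the v4 value matches the example
# results file above, the v5 values are assumptions.
FALLBACK_THRESHOLDS = {
    "megadetector_v4.1": 0.8,
    "md_v4.1.0.pb": 0.8,
    "md_v5a.0.0.pt": 0.2,
    "md_v5b.0.0.pt": 0.2,
}


def default_detection_threshold(results, fallback=0.8):
    """Choose a starting confidence threshold for a results dict.

    Prefers info.detector_metadata (format >= 1.2); otherwise falls back
    to a per-model default keyed on the detector filename. A missing
    'detector' field implies a MegaDetector v4 file, per the convention
    described above.
    """
    info = results.get("info", {})
    meta = info.get("detector_metadata", {})
    if "typical_detection_threshold" in meta:
        return meta["typical_detection_threshold"]
    return FALLBACK_THRESHOLDS.get(info.get("detector"), fallback)
```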

##### Detector outputs

@@ -279,8 +297,6 @@ Detection categories not listed here are allowed by this format specification, b

When the detector model detects no animal (or person or vehicle), the confidence `conf` is shown as 0.0 (not confident that there is an object of interest) and the `detections` field is an empty list.

All detections above the confidence threshold of 0.1 are recorded in the output file.

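Given these conventions, separating empty from non-empty (and failed) images is a short pass over the `images` list. This is an illustrative sketch over the format described above, not part of the API itself:

```python
def partition_images(results, threshold):
    """Split a results dict into non-empty, empty, and failed image lists.

    An image is non-empty if any detection meets the caller's confidence
    threshold; images with a 'failure' field are reported separately.
    """
    non_empty, empty, failed = [], [], []
    for im in results["images"]:
        if "failure" in im:
            failed.append(im["file"])
        elif any(d["conf"] >= threshold for d in im.get("detections", [])):
            non_empty.append(im["file"])
        else:
            empty.append(im["file"])
    return non_empty, empty, failed
```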

##### Classifier outputs

22 changes: 13 additions & 9 deletions api/batch_processing/api_core/README.md
@@ -3,25 +3,29 @@

## Build the Docker image for Batch node pools

We need to build a Docker image with the necessary packages (mainly TensorFlow) to run the scoring script. Azure Batch will pull this image from a private container registry, which needs to be in the same region as the Batch account.
We need to build a Docker image with the necessary packages (mainly PyTorch) to run the scoring script. Azure Batch will pull this image from a private container registry, which needs to be in the same region as the Batch account.

Navigate to the subdirectory `batch_service` (otherwise you need to specify the Docker context).

```
cd batch_service
```

Build the image from the Dockerfile in this folder:
```commandline
export IMAGE_NAME=zooniversecameratraps.azurecr.io/tensorflow:1.14.0-gpu-py3
export IMAGE_NAME=zooniversecameratraps.azurecr.io/pytorch:2.1.0-cuda12.1-cudnn8-runtime
export REGISTRY_NAME=zooniversecameratraps
sudo docker image build --rm --tag $IMAGE_NAME --file ./Dockerfile .
```

Test that TensorFlow can use the GPU in an interactive Python session:
Test that PyTorch can use the GPU (CUDA) in an interactive Python session:
```commandline
sudo docker run --gpus all -it --rm $IMAGE_NAME /bin/bash
python
import tensorflow as tf
print('tensorflow version:', tf.__version__)
print('tf.test.is_gpu_available:', tf.test.is_gpu_available())
import torch
print('pytorch version:', torch.__version__)
print('torch.cuda.is_available:', torch.cuda.is_available())
quit()
```
You can now exit/stop the container.
@@ -42,11 +46,11 @@ Follow the `examples/create_batch_pool.ipynb` notebook in the PR at [create_batc

## Upload the MegaDetector model for use in the Batch node pool

The TF `.pb` model file needs to be available for the Node Batch pool VMs, via mounted blob containers from the newly setup storage accounts in step above.
MegaDetector V5 is a PyTorch `.pt` model and needs to be available to the Batch node pool VMs via blob containers mounted from the storage accounts set up in the step above.

Download the `v4` version of the model `.pb` file from the download links in [megadetector.md](../../../megadetector.md) and upload to the correct paths in the `models` storage account setup in batch node setup above.
Download the `v5a` version of the model `.pt` file from the download links in [megadetector.md](../../../megadetector.md) and upload it to the correct paths in the `models` storage account set up in the batch node setup above.

Location to upload can be found in [score.py](batch_service/score.py#L414) noting the use of env.DETECTOR_REL_PATH which is setup by MD_VERSIONS_TO_REL_PATH in [server_api_config.py](server_api_config.py#L51). Default for `v4.1` model is `models/megadetector_copies/megadetector_v4_1/md_v4.1.0.pb`
The location to upload to can be found in [score_v5.py](batch_service/score_v5.py#L246); note the use of `env.DETECTOR_REL_PATH`, which is set up by `MD_VERSIONS_TO_REL_PATH` in [server_api_config.py](server_api_config.py#L51). The default for the `v5a` model is `models/megadetector_copies/megadetector_v5a/md_v5a.0.0.pt`
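The version-to-path convention can be sketched as follows. `MD_VERSIONS_TO_REL_PATH` and `DETECTOR_REL_PATH` are names the README cites, but this mapping is a hypothetical reconstruction; only the `v5a` default path is documented above.

```python
import os

# Hypothetical reconstruction of the mapping in server_api_config.py;
# only the v5a entry is documented in this README.
MD_VERSIONS_TO_REL_PATH = {
    "v5a": "megadetector_copies/megadetector_v5a/md_v5a.0.0.pt",
}


def detector_path(models_mount, version="v5a"):
    """Join the mounted 'models' container root with the per-version
    relative path (the role env.DETECTOR_REL_PATH plays in score_v5.py)."""
    rel = MD_VERSIONS_TO_REL_PATH[version]
    return os.path.join(models_mount, rel)
```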

## Flask app

13 changes: 11 additions & 2 deletions api/batch_processing/api_core/batch_service/Dockerfile
@@ -1,5 +1,14 @@
FROM tensorflow/tensorflow:1.14.0-gpu-py3
FROM pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
# Python version in this base image is 3.10
RUN apt-get update && apt-get -y upgrade && \
    apt-get install --no-install-recommends -y \
    git \
    libxext6 \
    libglib2.0-0 \
    libgl1

RUN git clone https://github.com/ultralytics/yolov5/

RUN pip install ultralytics-yolov5 yolov5 ultralytics
RUN pip install --upgrade pip
RUN pip install azure-storage-blob==12.7.1 pillow numpy requests
RUN pip install azure-storage-blob==12.7.1 pillow numpy requests jsonpickle
