feat: RND-116: YOLOv8 ML Backend (#607)

Co-authored-by: Micaela Kaplan <[email protected]>
Co-authored-by: caitlinwheeless <[email protected]>
Co-authored-by: micaelakaplan <[email protected]>
HumanSignal · Sep 6, 2024 · 01b0516
1 parent 49e8929
commit 01b0516
Showing 53 changed files with 8,928 additions and 27 deletions.
diff --git a/README.md b/README.md
@@ -61,6 +61,8 @@ Check the **Required parameters** column to see if you need to set any additiona
 | [spacy](/label_studio_ml/examples/spacy)                                                   | NER by [SpaCy](https://spacy.io/)                                                                                                                    | ✅              | ❌                | ❌        | None                       | Set      [(see documentation)](https://spacy.io/usage/linguistic-features) |
 | [tesseract](/label_studio_ml/examples/tesseract)                                           | Interactive OCR. [Details](https://github.com/tesseract-ocr/tesseract)                                                                               | ❌              | ✅                | ❌        | None                       | Set (characters)                                                           | 
 | [watsonX](/label_studio_ml/exampels/watsonx)| LLM inference with [WatsonX](https://www.ibm.com/products/watsonx-ai) and integration with [WatsonX.data](watsonx.data)| ✅ | ✅| ❌ | None| Arbitrary|
+| [yolo](/label_studio_ml/examples/yolo)                                                     | Object detection with [YOLO](https://docs.ultralytics.com/tasks/) | ✅ | ❌ | ❌ | None | Arbitrary |
+
 # (Advanced usage) Develop your model
 
 To start developing your own ML backend, follow the instructions below.

diff --git a/label_studio_ml/examples/mmdetection-3/mmdetection.py b/label_studio_ml/examples/mmdetection-3/mmdetection.py
@@ -100,6 +100,7 @@ def build_labels_from_labeling_config(self, schema):
             for ls_label, label_attrs in self.labels_attrs.items():
                 predicted_values = label_attrs.get("predicted_values", "").split(",")
                 for predicted_value in predicted_values:
+                    predicted_value = predicted_value.strip()  # remove spaces at the beginning and at the end
                     if predicted_value:  # it shouldn't be empty (like '')
                         if predicted_value not in mmdet_labels:
                             print(
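
Review note on the hunk above: the added `strip()` matters because splitting a comma-separated attribute such as `"car, truck"` leaves a leading space on the second entry, which then fails the label lookup. The snippet below is a minimal, standalone sketch of that behaviour; `parse_predicted_values` is a hypothetical helper written for illustration and is not part of this commit.

```python
from typing import List


def parse_predicted_values(raw: str) -> List[str]:
    """Split a comma-separated attribute, trim whitespace, drop empty entries.

    Hypothetical helper mirroring the loop in build_labels_from_labeling_config.
    """
    values = []
    for value in raw.split(","):
        value = value.strip()  # without this, " truck " would not match "truck"
        if value:  # skip empty strings, e.g. from a trailing comma
            values.append(value)
    return values


print(parse_predicted_values("car, truck ,bus,"))  # ['car', 'truck', 'bus']
```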

diff --git a/label_studio_ml/examples/mmdetection-3/requirements-base.txt b/label_studio_ml/examples/mmdetection-3/requirements-base.txt
@@ -1,2 +1,2 @@
 gunicorn==22.0.0
-label-studio-ml @ git+https://github.com/HumanSignal/label-studio-ml-backend.git
+label-studio-ml @ git+https://github.com/HumanSignal/label-studio-ml-backend.git@fix/rnd-117
diff --git a/label_studio_ml/examples/mmdetection-3/test_model.py b/label_studio_ml/examples/mmdetection-3/test_model.py
@@ -2,7 +2,7 @@
 
 from mmdetection import MMDetection
 
-from pytest import approx
+from label_studio_ml.utils import compare_nested_structures
 
 label_config = """
 <View>
@@ -41,22 +41,6 @@
 ]
 
 
-def compare_nested_structures(a, b, path=""):
-    """Compare two dicts or list with approx() for float values"""
-    if isinstance(a, dict) and isinstance(b, dict):
-        assert a.keys() == b.keys(), f"Keys mismatch at {path}"
-        for key in a.keys():
-            compare_nested_structures(a[key], b[key], path + "." + str(key))
-    elif isinstance(a, list) and isinstance(b, list):
-        assert len(a) == len(b), f"List size mismatch at {path}"
-        for i, (act_item, exp_item) in enumerate(zip(a, b)):
-            compare_nested_structures(act_item, exp_item, path + f"[{i}]")
-    elif isinstance(a, float) and isinstance(b, float):
-        assert a == approx(b), f"Mismatch at {path}"
-    else:
-        assert a == b, f"Mismatch at {path}"
-
-
 def test_mmdetection_model_predict():
     model = MMDetection(label_config=label_config)
     predictions = model.predict([task])

diff --git a/label_studio_ml/examples/yolo/.dockerignore b/label_studio_ml/examples/yolo/.dockerignore
@@ -0,0 +1,22 @@
+# Exclude everything
+**
+
+# Include Dockerfile and docker-compose for reference (optional, decide based on your use case)
+!Dockerfile
+!docker-compose.yml
+
+# Include Python application files
+!*.py
+!*.yaml
+!tests/*
+!control_models/*
+!models/*
+
+# Include requirements files
+!requirements*.txt
+
+# Include script
+!*.sh
+
+# Exclude specific requirements if necessary
+# requirements-test.txt (Uncomment if you decide to exclude this)
diff --git a/label_studio_ml/examples/yolo/Dockerfile b/label_studio_ml/examples/yolo/Dockerfile
@@ -0,0 +1,62 @@
+FROM pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime
+ARG DEBIAN_FRONTEND=noninteractive
+ARG TEST_ENV
+
+WORKDIR /app
+
+RUN conda update conda -y
+
+RUN --mount=type=cache,target="/var/cache/apt",sharing=locked \
+    --mount=type=cache,target="/var/lib/apt/lists",sharing=locked \
+    apt-get -y update \
+    && apt-get install -y git \
+    && apt-get install -y wget \
+    && apt-get install -y g++ freeglut3-dev build-essential libx11-dev \
+    libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev libfreeimage-dev \
+    && apt-get -y install ffmpeg libsm6 libxext6 libffi-dev python3-dev python3-pip gcc
+
+ENV PYTHONUNBUFFERED=1 \
+    PYTHONDONTWRITEBYTECODE=1 \
+    PIP_CACHE_DIR=/.cache \
+    PORT=9090 \
+    WORKERS=2 \
+    THREADS=4 \
+    CUDA_HOME=/usr/local/cuda
+
+RUN conda install -c "nvidia/label/cuda-12.1.1" cuda -y
+ENV CUDA_HOME=/opt/conda \
+    TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX;8.9;9.0"
+
+# install base requirements
+COPY requirements-base.txt .
+RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
+    pip install -r requirements-base.txt
+
+# install model requirements
+COPY requirements.txt .
+RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
+    pip3 install -r requirements.txt
+
+# install test requirements if needed
+COPY requirements-test.txt .
+# build only when TEST_ENV="true"
+RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
+    if [ "$TEST_ENV" = "true" ]; then \
+      pip3 install -r requirements-test.txt; \
+    fi
+
+WORKDIR /app
+
+COPY . ./
+
+WORKDIR /app/models
+
+# Download the YOLO models
+RUN yolo predict model=yolov8m.pt source=/app/tests/car.jpg \
+    && yolo predict model=yolov8n.pt source=/app/tests/car.jpg \
+    && yolo predict model=yolov8n-cls.pt source=/app/tests/car.jpg \
+    && yolo predict model=yolov8n-seg.pt source=/app/tests/car.jpg
+
+WORKDIR /app
+
+CMD ["/app/start.sh"]
diff --git a/label_studio_ml/examples/yolo/README.md b/label_studio_ml/examples/yolo/README.md
diff --git a/label_studio_ml/examples/yolo/README_DEVELOP.md b/label_studio_ml/examples/yolo/README_DEVELOP.md
@@ -0,0 +1,131 @@
+```mermaid
+classDiagram
+    class ControlModel {
+        +str type
+        +ControlTag control
+        +str from_name
+        +str to_name
+        +str value
+        +YOLO model
+        +float model_score_threshold
+        +Optional[Dict[str, str]] label_map
+        +LabelStudioMLBase label_studio_ml_backend
+        +get_cached_model(path: str) YOLO
+        +create(cls, mlbackend: LabelStudioMLBase, control: ControlTag) ControlModel
+        +predict_regions(path: str) List[Dict]
+        +debug_plot(image)
+    }
+
+    class RectangleLabelsModel {
+        +predict_regions(path: str) List[Dict]
+        +create_rectangles(results, path) List[Dict]
+    }
+
+    class RectangleLabelsObbModel {
+        +predict_regions(path: str) List[Dict]
+        +create_rotated_rectangles(results, path) List[Dict]
+    }    
+    
+
+    class PolygonLabelsModel {
+        +predict_regions(path: str) List[Dict]
+        +create_polygons(results, path) List[Dict]
+    }
+    
+    class KeyPointLabelsModel {
+        +predict_regions(path: str) List[Dict]
+        +create_keypoints(results, path) List[Dict]
+    }
+
+    class ChoicesModel {
+        +predict_regions(path: str) List[Dict]
+        +create_choices(results, path) List[Dict]
+    }
+
+    class VideoRectangleModel {
+        +predict_regions(path: str) List[Dict]
+        +create_video_rectangles(results, path) List[Dict]
+        +update_tracker_params(yaml_path: str, prefix: str) str | None
+    }
+
+    ControlModel <|-- RectangleLabelsModel
+    ControlModel <|-- RectangleLabelsObbModel
+    ControlModel <|-- PolygonLabelsModel
+    ControlModel <|-- ChoicesModel
+    ControlModel <|-- KeyPointLabelsModel
+    ControlModel <|-- VideoRectangleModel
+    
+```
+
+### 1. **Architecture Overview**
+
+The project is modular and centers on integrating YOLO-based models with Label Studio to automate the labeling of images and videos. It is organized into several Python modules that work together; the main components are:
+
+1. **Main YOLO Integration Module (`model.py`)**:
+   - This is the central module that connects Label Studio with YOLO models. It handles the overall process of detecting control tags from Label Studio’s configuration, running predictions on tasks, and returning the predictions in the format that Label Studio expects.
+
+2. **Control Models (`control_models/`)**:
+   - The control models are specialized modules that correspond to different annotation types in Label Studio (e.g., RectangleLabels, PolygonLabels, Choices, VideoRectangle). Each control model is responsible for handling specific types of annotations by using the YOLO model to predict the necessary regions or labels.
+
+3. **Base Control Model (`control_models/base.py`)**:
+   - This is an abstract base class that provides common functionality for all control models. It handles tasks like loading the YOLO model, caching it for efficiency, and providing a template for the predict and create methods.
+
+4. **Specific Control Models**:
+   - **RectangleLabelsModel (`control_models/rectanglelabels.py`)**: Handles bounding boxes (both simple and oriented bounding boxes) for images.
+   - **PolygonLabelsModel (`control_models/polygonlabels.py`)**: Deals with polygon annotations, typically used for segmentation tasks.
+   - **ChoicesModel (`control_models/choices.py`)**: Manages classification tasks where the model predicts one or more labels for the entire image.
+   - **KeyPointLabelsModel (`control_models/keypointlabels.py`)**: Supports keypoint annotations, where the model predicts the locations of keypoints on an image.
+   - **VideoRectangleModel (`control_models/videorectangle.py`)**: Focuses on tracking objects across video frames, generating bounding boxes for each frame.
+
+### 2. **Module Descriptions**
+
+1. **`model.py` (Main YOLO Integration Module)**:
+   - **Purpose**: This module serves as the entry point for integrating YOLO models with Label Studio. It is responsible for setting up the YOLO model, detecting which control models are needed based on the Label Studio configuration, running predictions on tasks, and returning the results in the required format.
+   - **Key Functions**:
+     - `setup()`: Initializes the YOLO model parameters.
+     - `detect_control_models()`: Scans the Label Studio configuration to determine which control models to use.
+     - `predict()`: Runs predictions on a batch of tasks and formats the results for Label Studio.
+     - `fit()`: (Not implemented) Placeholder for updating the model based on new annotations.
+
+2. **`control_models/base.py` (Base Control Model)**:
+   - **Purpose**: Provides a common interface and shared functionality for all specific control models. It includes methods for loading and caching the YOLO model, plotting results for debugging, and abstract methods that need to be implemented by subclasses.
+   - **Key Functions**:
+     - `get_cached_model()`: Retrieves a YOLO model from cache or loads it if not cached.
+     - `create()`: Factory method to instantiate a control model.
+     - `predict_regions()`: Abstract method to be implemented by subclasses to perform predictions.
+
+3. **`control_models/choices.py` (ChoicesModel)**:
+   - **Purpose**: Handles classification tasks where the model predicts one or more labels for an image. It converts the YOLO model’s classification output into Label Studio’s choices format.
+   - **Key Functions**:
+     - `create_choices()`: Processes the YOLO model’s output and maps it to the Label Studio choices format.
+
+4. **`control_models/rectanglelabels.py` (RectangleLabelsModel)**:
+   - **Purpose**: Manages the creation of bounding box annotations, both simple (axis-aligned) and oriented (rotated), from the YOLO model’s output.
+   - **Key Functions**:
+     - `create_rectangles()`: Converts the YOLO model’s bounding box predictions into Label Studio’s rectangle labels format.
+     - `create_rotated_rectangles()`: Handles oriented bounding boxes (OBB) by processing rotation angles and converting them to the required format.
+
+5. **`control_models/polygonlabels.py` (PolygonLabelsModel)**:
+   - **Purpose**: Converts segmentation masks generated by the YOLO model into polygon annotations for Label Studio. This is useful for tasks where precise boundaries around objects are required.
+   - **Key Functions**:
+     - `create_polygons()`: Transforms the YOLO model’s segmentation output into polygon annotations.
+
+6. **`control_models/keypointlabels.py` (KeyPointLabelsModel)**:
+   - **Purpose**: Supports keypoint annotations by predicting the locations of keypoints on an image using the pose YOLO model.
+   - **Key Functions**:
+     - `create_keypoints()`: Processes the YOLO model’s keypoint predictions and converts them into Label Studio’s keypoint labels format.
+
+7. **`control_models/videorectangle.py` (VideoRectangleModel)**:
+   - **Purpose**: Focuses on tracking objects across video frames, using YOLO’s tracking capabilities to generate bounding box annotations for each frame in a video sequence.
+   - **Key Functions**:
+     - `predict_regions()`: Runs YOLO’s tracking model on a video and converts the results into Label Studio’s video rectangle format.
+     - `create_video_rectangles()`: Processes the output of the tracking model to create a sequence of bounding boxes across video frames.
+     - `update_tracker_params()`: Customizes the tracking parameters based on settings in Label Studio’s configuration.
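
Review note: the "retrieve from cache or load" behaviour described for `get_cached_model()` can be sketched as below. This is an illustration only, not the code added in this commit: the module-level dictionary and the `loader` parameter are assumptions, and the stand-in loader replaces whatever call actually constructs the YOLO model.

```python
from typing import Callable, Dict

# Assumed cache layout: one loaded model object per checkpoint path.
_model_cache: Dict[str, object] = {}


def get_cached_model(path: str, loader: Callable[[str], object]) -> object:
    """Return the model for `path`, calling `loader` only on the first request."""
    if path not in _model_cache:
        _model_cache[path] = loader(path)
    return _model_cache[path]


# Stand-in loader that records how often it is invoked (hypothetical).
load_calls = []


def fake_loader(path: str) -> object:
    load_calls.append(path)
    return {"weights": path}


first = get_cached_model("yolov8n.pt", fake_loader)
second = get_cached_model("yolov8n.pt", fake_loader)
print(first is second, load_calls)
```

Keying the cache by path means control models that point at the same checkpoint share one loaded model instead of loading it once per control tag.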
+
+### 3. **Module Interaction**
+
+- **Workflow**: The main workflow begins with `model.py`, which reads tasks and the Label Studio configuration to detect and instantiate the appropriate control models. These control models are responsible for making predictions using the YOLO model and converting the results into a format that Label Studio can use for annotations.
+
+- **Inter-Module Communication**: Each control model inherits from `ControlModel` in `base.py`, ensuring that they all share common methods for loading the YOLO model, handling predictions, and caching. The specific control models (e.g., RectangleLabelsModel, PolygonLabelsModel) implement the abstract methods defined in `ControlModel` to provide the specialized behavior needed for different types of annotations.
+
+This modular structure makes the backend easy to extend: new control models can be added to handle additional annotation types or new model architectures.
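
Review note: the workflow described above (detect control tags, instantiate the matching control model, collect regions) can be sketched without YOLO or Label Studio installed. Everything below is a simplified assumption for illustration: the constructor signature, the `controls` dictionaries, and the fixed return values are hypothetical, and the real classes in `control_models/` carry more state (model, thresholds, label map).

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ControlModel(ABC):
    """Simplified stand-in for the base class described in control_models/base.py."""

    type: str = ""  # Label Studio control tag name this model handles

    def __init__(self, from_name: str, to_name: str):
        self.from_name = from_name
        self.to_name = to_name

    @abstractmethod
    def predict_regions(self, path: str) -> List[Dict]:
        """Return Label Studio regions for the file at `path`."""


class RectangleLabelsModel(ControlModel):
    type = "RectangleLabels"

    def predict_regions(self, path: str) -> List[Dict]:
        # A real model would run detection here; this returns one fixed box.
        return [{
            "from_name": self.from_name,
            "to_name": self.to_name,
            "type": "rectanglelabels",
            "value": {"x": 10.0, "y": 20.0, "width": 30.0, "height": 40.0,
                      "rectanglelabels": ["car"]},
        }]


class ChoicesModel(ControlModel):
    type = "Choices"

    def predict_regions(self, path: str) -> List[Dict]:
        # A real model would run classification here; this returns one fixed label.
        return [{
            "from_name": self.from_name,
            "to_name": self.to_name,
            "type": "choices",
            "value": {"choices": ["car"]},
        }]


def detect_control_models(controls: List[Dict[str, str]]) -> List[ControlModel]:
    """Instantiate a control model for every supported tag; skip the rest."""
    registry = {cls.type: cls for cls in (RectangleLabelsModel, ChoicesModel)}
    models = []
    for control in controls:
        cls = registry.get(control["tag"])
        if cls is not None:
            models.append(cls(control["name"], control["to_name"]))
    return models


def predict(path: str, controls: List[Dict[str, str]]) -> List[Dict]:
    """Collect regions from every detected control model for one file."""
    regions: List[Dict] = []
    for model in detect_control_models(controls):
        regions.extend(model.predict_regions(path))
    return regions


controls = [
    {"tag": "RectangleLabels", "name": "label", "to_name": "image"},
    {"tag": "TextArea", "name": "notes", "to_name": "image"},  # unsupported here
]
print(predict("car.jpg", controls))
```

Under this layout, supporting a new annotation type means adding one subclass and one registry entry, which is the extension path the paragraph above describes.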