Han Bui edited this page Oct 2, 2024 · 1 revision

This guide explains how to set up and configure the tiling-based neural network pipeline using DepthAI. The pipeline processes video frames, splits them into tiles, runs inference on each tile using a neural network, and merges the results.

Prerequisites

  • DepthAI Python library (depthai)
  • OAK camera
  • Python 3.6+

Installation

  1. Clone the repository and install the required dependencies.

    git clone git@github.com:luxonis/nn-tiling.git
    cd nn-tiling
    pip install -r requirements.txt
  2. Download the model: the util/download.py script downloads the pre-trained YOLOv5 model. Skip this step if you have your own blob.

    python util/download.py

Configure the pipeline

Once the model is downloaded, configure the pipeline by adjusting the following parameters in your main.py script, then run it. The key parameters are:

  • neural network model (path to the YOLOv5 model blob you just downloaded from Drive, or to your own blob), e.g.:

    nn_path = 'models/yolov5s_default_openvino_2021.4_6shave.blob'
  • input source

    • OAK camera
    • A prerecorded video using ReplayVideo node, e.g.:
    replay.setReplayVideoFile(Path('videos/4k_traffic.mp4'))
  • confidence threshold (filters out neural network output whose confidence is below this value), e.g.:

    conf_thresh = 0.4
  • IoU threshold (Intersection over Union threshold), e.g.:

    iou_thresh = 0.4
    

    For background on this threshold, see this Medium post.

  • tiling settings

    • grid_size - number of tiles horizontally and vertically, e.g.:
    grid_size = (3, 2)
    • overlap - amount of overlap between tiles (default is 0.2), e.g.:
    overlap = 0.2
  • grid_matrix - a 2D list that defines how the image is split into tiles and, optionally, which tiles are merged. Assigning the same integer to neighboring tiles groups them into a single tile for inference.

    The grid_matrix must match the dimensions defined by the grid_size. For example, if the grid size is (3, 2) (3 tiles horizontally, 2 tiles vertically), the grid_matrix will have 3 columns and 2 rows.

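    # no merging: every tile has its own integer
    grid_matrix_no_merging = [
        [0, 1, 2],
        [3, 4, 5]
    ]
    # equivalent setup: no two neighboring tiles share an integer, so nothing is merged
    grid_matrix_no_merging = [
        [0, 1, 0],
        [1, 0, 1]
    ]
    # 1 merge: the two neighboring tiles labeled 2 are combined
    grid_matrix_with_merging_1 = [
        [0, 1, 0],
        [2, 2, 1]
    ]
    # 2 merges: the tiles labeled 3 are combined, and so are the tiles labeled 2
    grid_matrix_with_merging_2 = [
        [3, 3, 0],
        [2, 2, 1]
    ]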

    Note: the integer values in the grid matrix carry no meaning of their own. They only indicate whether a tile and its neighbour belong to the same group (same integer) or not (different integers).

  • IMG_SHAPE - the dimensions (width, height) of your video input, e.g.:

    IMG_SHAPE = (3840, 2160) # 4k video
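
To build some intuition for how IMG_SHAPE, grid_size, overlap, and grid_matrix interact, here is a minimal standalone sketch that turns them into pixel regions. It is not the code used by nn-tiling. The helper names (tile_boxes, merged_regions) are hypothetical, and the sketch rests on three assumptions: overlap is a fraction of the tile size, only horizontally or vertically adjacent tiles with the same integer are merged, and a merged group is represented by the bounding box of its tiles.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def tile_boxes(img_shape: Tuple[int, int],
               grid_size: Tuple[int, int],
               overlap: float) -> List[List[Box]]:
    """Return one box per grid cell, indexed as boxes[row][col].

    Assumption: `overlap` is the fraction of a tile shared with its
    neighbour, so `cols` tiles of width w cover the image when
    cols * w - (cols - 1) * overlap * w == image width.
    """
    width, height = img_shape
    cols, rows = grid_size
    tile_w = width / (cols - (cols - 1) * overlap)
    tile_h = height / (rows - (rows - 1) * overlap)
    step_x = tile_w * (1 - overlap)  # distance between tile origins
    step_y = tile_h * (1 - overlap)
    boxes = []
    for r in range(rows):
        row = []
        for c in range(cols):
            x0, y0 = c * step_x, r * step_y
            row.append((round(x0), round(y0),
                        round(x0 + tile_w), round(y0 + tile_h)))
        boxes.append(row)
    return boxes


def merged_regions(grid_matrix: List[List[int]],
                   boxes: List[List[Box]]) -> List[Box]:
    """Group neighbouring cells that share an integer (flood fill) and
    return the bounding box of each group (an assumption of this sketch)."""
    rows, cols = len(boxes), len(boxes[0])
    if len(grid_matrix) != rows or any(len(line) != cols for line in grid_matrix):
        raise ValueError("grid_matrix must have the rows and columns given by grid_size")
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label = grid_matrix[r][c]
            seen[r][c] = True
            stack, group = [(r, c)], []
            while stack:
                y, x = stack.pop()
                group.append(boxes[y][x])
                # Visit the up, down, left, and right neighbours only.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and grid_matrix[ny][nx] == label):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append((min(b[0] for b in group), min(b[1] for b in group),
                            max(b[2] for b in group), max(b[3] for b in group)))
    return regions


if __name__ == "__main__":
    IMG_SHAPE = (3840, 2160)
    grid_size = (3, 2)
    overlap = 0.2
    grid_matrix_with_merging_2 = [
        [3, 3, 0],
        [2, 2, 1],
    ]
    boxes = tile_boxes(IMG_SHAPE, grid_size, overlap)
    for region in merged_regions(grid_matrix_with_merging_2, boxes):
        print(region)
```

Under these assumptions, the six cells of grid_matrix_with_merging_2 collapse into four regions, while both no-merging matrices keep all six tiles separate. A grid_matrix whose shape does not match grid_size raises a ValueError, which mirrors the requirement stated above.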