Welcome to the Ultralytics xview-yolov3 repository! This project provides the code and instructions needed to train the Ultralytics YOLOv3 object detection model on the challenging xView dataset. Its primary goal is to support participants in the xView Challenge, which focuses on advancing the state of the art in detecting objects within satellite imagery, a critical application of computer vision in remote sensing.
To successfully run this project, ensure your environment meets the following prerequisites:
- Python: Version 3.6 or later. You can download Python from the official Python website.
- Dependencies: Install the required packages using pip. It's recommended to use a virtual environment.
```bash
pip3 install -U -r requirements.txt
```
Key dependencies include:

- `numpy`: Essential for numerical operations in Python.
- `scipy`: Provides algorithms for scientific and technical computing.
- `torch`: The core PyTorch library for deep learning.
- `opencv-python`: The OpenCV library for computer vision tasks.
- `h5py`: Enables interaction with data stored in HDF5 format.
- `tqdm`: A utility for displaying progress bars in loops and command-line interfaces.
Begin by downloading the necessary xView dataset files. You can obtain the data directly from the xView Challenge data download page. Ensure you have sufficient storage space, as satellite imagery datasets can be quite large.
Training the YOLOv3 model on the xView dataset involves preprocessing the data and then running the training script.
Before initiating the training process, we perform several preprocessing steps on the target labels to enhance model performance:
- Outlier Removal: Outliers in the target labels are identified and removed using sigma-rejection to clean the data (see the sketch after this list).
- Anchor Generation: A new set of 30 k-means anchors is generated, tailored specifically to the `c60_a30symmetric.cfg` configuration file, using the MATLAB script `utils/analysis.m`. These anchors help the model better predict bounding boxes of the various sizes and aspect ratios present in the xView dataset.
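As a rough illustration of these two preprocessing steps, here is a minimal Python sketch (the repository generates its anchors in MATLAB via `utils/analysis.m`; the function names and synthetic data below are illustrative assumptions, not the project's actual API):

```python
import numpy as np
from scipy.cluster.vq import kmeans

def sigma_reject(wh, sigma=3.0):
    """Drop width/height pairs lying more than `sigma` standard
    deviations from the mean (simple sigma-rejection)."""
    mu, std = wh.mean(axis=0), wh.std(axis=0)
    keep = (np.abs(wh - mu) < sigma * std).all(axis=1)
    return wh[keep]

def make_anchors(wh, n=30):
    """Cluster box width/height pairs into n k-means anchors,
    sorted by area from smallest to largest."""
    centroids, _ = kmeans(wh.astype(float), n, iter=30)
    return centroids[np.argsort(centroids.prod(axis=1))]

# Synthetic stand-in for real xView label width/height pairs
wh = np.random.lognormal(mean=3.0, sigma=0.8, size=(5000, 2))
anchors = make_anchors(sigma_reject(wh), n=30)
print(anchors.shape)  # roughly (30, 2); k-means may drop empty clusters
```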
Once the xView data is downloaded and placed in the expected directory, you can start training by executing the `train.py` script. You will need to configure the path to your xView data within the script:
- Modify line 41 for local machine execution.
- Modify line 43 if you are training in a cloud environment like Google Colab or Kaggle.
```bash
python train.py
```
If your training session is interrupted, you can easily resume from the last saved checkpoint by passing the `--resume` flag:
```bash
python train.py --resume 1
```
The script will automatically load the weights from the `latest.pt` file in the `checkpoints/` directory and continue the training process.
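Conceptually, resuming looks something like the sketch below. The checkpoint keys (`'model'`, `'optimizer'`, `'epoch'`) are assumptions for illustration and may not match the exact schema `train.py` uses:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # tiny stand-in for the YOLOv3 model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Save a checkpoint the way a training loop might
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict(),
            'epoch': 42}, 'latest.pt')

# Resume: restore weights, optimizer state, and the epoch counter
ckpt = torch.load('latest.pt', map_location='cpu')
model.load_state_dict(ckpt['model'])
optimizer.load_state_dict(ckpt['optimizer'])
start_epoch = ckpt['epoch'] + 1
print(f'resuming at epoch {start_epoch}')
```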
During each training epoch, the system processes 8 randomly sampled 608x608 pixel chips extracted from each full-resolution image in the dataset. On hardware like an Nvidia GTX 1080 Ti, you can typically complete around 100 epochs per day.
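For intuition, random chip extraction can be sketched as follows. The chip count and size come from the description above; the function itself is an illustrative assumption, not the repository's data loader:

```python
import numpy as np

def random_chips(image, n=8, size=608):
    """Crop n random size x size chips from a full-resolution
    HWC image array (illustrative sketch only)."""
    h, w = image.shape[:2]
    chips = []
    for _ in range(n):
        y = np.random.randint(0, max(h - size, 0) + 1)
        x = np.random.randint(0, max(w - size, 0) + 1)
        chips.append(image[y:y + size, x:x + size])
    return chips

# Example on a dummy 3000x3000 RGB image
chips = random_chips(np.zeros((3000, 3000, 3), dtype=np.uint8))
print(len(chips), chips[0].shape)  # 8 (608, 608, 3)
```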
Be mindful of overfitting, which can become a significant issue after approximately 200 epochs. Monitoring validation metrics is crucial. The best observed validation mean Average Precision (mAP) in experiments was 0.16 after 300 epochs (roughly 3 days of training), corresponding to a training mAP of 0.30.
Monitor the training progress by observing the loss plots for bounding box regression, objectness, and class confidence; all three should ideally trend downward as training proceeds.
To improve model robustness and generalization, the `datasets.py` script applies various data augmentations to the full-resolution input images during training using OpenCV. The specific augmentations and their parameters are:
| Augmentation   | Description                                |
| -------------- | ------------------------------------------ |
| Translation    | +/- 1% (vertical and horizontal)           |
| Rotation       | +/- 20 degrees                             |
| Shear          | +/- 3 degrees (vertical and horizontal)    |
| Scale          | +/- 30%                                    |
| Reflection     | 50% probability (vertical and horizontal)  |
| HSV Saturation | +/- 50%                                    |
| HSV Intensity  | +/- 50%                                    |
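As a rough sketch of the two HSV rows above, saturation and intensity jitter with OpenCV might look like this (a simplified illustration, not the repository's exact implementation):

```python
import cv2
import numpy as np

def augment_hsv(img, s_gain=0.5, v_gain=0.5):
    """Randomly scale saturation and value (intensity) by up to
    +/- 50%; takes and returns a BGR uint8 image."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= 1 + np.random.uniform(-s_gain, s_gain)  # saturation
    hsv[..., 2] *= 1 + np.random.uniform(-v_gain, v_gain)  # intensity
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

aug = augment_hsv(np.full((608, 608, 3), 127, dtype=np.uint8))
print(aug.shape, aug.dtype)  # (608, 608, 3) uint8
```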
Note: Augmentation is applied only during the training phase. During inference or validation, the original, unaugmented images are used. The corresponding bounding box coordinates are automatically adjusted to match the transformations applied to the images. Explore more augmentation techniques with Albumentations.
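And a minimal sketch of how an affine augmentation keeps boxes aligned with the image. This covers rotation, translation, and scale only (shear omitted for brevity) and illustrates the idea rather than the actual `datasets.py` code:

```python
import cv2
import numpy as np

def random_affine(img, boxes, degrees=20, translate=0.01, scale=0.30):
    """Rotate/translate/scale an image and transform its xyxy boxes
    to match (simplified sketch)."""
    h, w = img.shape[:2]
    angle = np.random.uniform(-degrees, degrees)
    s = 1 + np.random.uniform(-scale, scale)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, s)  # 2x3 affine
    M[:, 2] += np.random.uniform(-translate, translate, 2) * (w, h)
    out = cv2.warpAffine(img, M, (w, h), borderValue=(127, 127, 127))

    # Transform the 4 corners of each box, then take the new extents
    n = len(boxes)
    corners = np.ones((n * 4, 3))
    corners[:, :2] = boxes[:, [0, 1, 2, 1, 2, 3, 0, 3]].reshape(n * 4, 2)
    corners = (corners @ M.T).reshape(n, 8)
    x, y = corners[:, 0::2], corners[:, 1::2]
    new = np.stack([x.min(1), y.min(1), x.max(1), y.max(1)], 1)
    return out, new.clip([0, 0, 0, 0], [w, h, w, h])

img = np.zeros((608, 608, 3), dtype=np.uint8)
boxes = np.array([[100.0, 100.0, 200.0, 250.0]])  # one xyxy box
aug_img, aug_boxes = random_affine(img, boxes)
print(aug_boxes)
```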
After training completes, the model checkpoints (`.pt` files) containing the learned weights are saved in the `checkpoints/` directory. You can use the `detect.py` script to perform inference on new or existing xView images with your trained model.
For example, to run detection on the image `5.tif` from the training set using the best-performing weights (`best.pt`), run:
```bash
python detect.py --weights checkpoints/best.pt --source path/to/5.tif
```
The script will process the image, detect objects, draw bounding boxes, and save the annotated output image.
If you find this repository, the associated tools, or the xView dataset useful in your research or work, please consider citing the relevant sources:
For the xView dataset itself, please refer to the citation guidelines provided on the xView Challenge website.
We thrive on community contributions! Open-source projects like this benefit greatly from your input. Whether it's fixing bugs, adding features, or improving documentation, your help is valuable. Please see our Contributing Guide for more details on how to get started.
We also invite you to share your feedback through our Survey. Your insights help shape the future of Ultralytics projects.
A huge thank you to all our contributors for making our community vibrant and innovative!
Ultralytics offers two licensing options to accommodate different needs:
- AGPL-3.0 License: Ideal for students, researchers, and enthusiasts, this OSI-approved open-source license promotes open collaboration and knowledge sharing. See the LICENSE file for full details.
- Enterprise License: Designed for commercial use, this license allows integration of Ultralytics software and AI models into commercial products and services without the open-source requirements of AGPL-3.0. If your project requires an Enterprise License, please contact us via Ultralytics Licensing.
For bug reports, feature requests, or suggestions, please use the GitHub Issues page. For general questions, discussions, and community interaction, join our Discord server!