A collection of small Python scripts that make it easier to prepare a Pascal VOC dataset for TensorFlow.
- Dataset Splitter - Randomly splits a dataset into train, test and validation sets in an 80:10:10 ratio.
- CSV Generator - Generates a CSV file from a Pascal VOC dataset.
- TFRecord Generator - Generates a TFRecord file from the CSV file of a Pascal VOC dataset.
Steps to split a Pascal VOC dataset in Colab:
A sample dataset structure looks like this:
images.zip
-- img1.jpg
-- img1.xml
-- img2.jpg
-- img2.xml
...
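Before splitting, it can help to confirm that every image has a matching annotation file with the same base name, as in the structure above. A minimal sketch, assuming images use the .jpg, .jpeg or .png extension; `find_pairs` is a hypothetical helper, not part of this repository:

```python
from pathlib import Path

# Assumed image extensions; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_pairs(folder):
    """Return (pairs, missing): image/XML pairs sharing a base name, and images with no XML."""
    folder = Path(folder)
    pairs, missing = [], []
    for img in sorted(folder.iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip the .xml files and anything else
        xml = img.with_suffix(".xml")  # img1.jpg -> img1.xml
        if xml.exists():
            pairs.append((img, xml))
        else:
            missing.append(img)
    return pairs, missing
```

After unzipping, `find_pairs("/content/images/all")` returns the annotated pairs plus any images that have no annotation file.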
- Create the images directory.
!mkdir /content/images
- (Optional) Unzip your dataset if it is in a zip archive. (Note: if your dataset is already unzipped, make sure all your files are in the /content/images/all directory.)
!unzip -q images.zip -d /content/images/all
- Create the train, test and validation directories.
!mkdir /content/images/train /content/images/test /content/images/validation
- Download the DatasetSplitter.py script.
!wget https://raw.githubusercontent.com/MasoomBadi/DatasetHelper/main/DatasetSplitter.py
- Run the script.
!python DatasetSplitter.py
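The random 80:10:10 split can be sketched as follows. This is not the actual code of DatasetSplitter.py: `split_dataset` is a hypothetical helper, and it assumes lowercase .jpg images, copies files rather than moving them, and uses a fixed seed so the split is reproducible.

```python
import random
import shutil
from pathlib import Path


def split_dataset(src, dst, ratios=(0.8, 0.1, 0.1), seed=42):
    """Randomly copy image/XML pairs from src into dst/train, dst/test and dst/validation."""
    src, dst = Path(src), Path(dst)
    # Assumption: only .jpg images that have a matching .xml annotation are used.
    images = sorted(p for p in src.glob("*.jpg") if p.with_suffix(".xml").exists())
    random.Random(seed).shuffle(images)

    n = len(images)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    splits = {
        "train": images[:n_train],
        "test": images[n_train:n_train + n_test],
        # The remainder goes to validation, so rounding never drops a file.
        "validation": images[n_train + n_test:],
    }
    for name, files in splits.items():
        out = dst / name
        out.mkdir(parents=True, exist_ok=True)
        for img in files:
            xml = img.with_suffix(".xml")
            shutil.copy2(img, out / img.name)  # copy the image...
            shutil.copy2(xml, out / xml.name)  # ...and its annotation together
    return {name: len(files) for name, files in splits.items()}
```

For example, `split_dataset("/content/images/all", "/content/images")` would fill the three directories created above. Copying instead of moving keeps /content/images/all intact, so the split can be redone with a different seed.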
Once your dataset is split, run the following scripts to generate CSV files from the Pascal VOC annotations and create TFRecord files from them.
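The conversion from Pascal VOC XML to CSV can be sketched with Python's built-in `xml.etree.ElementTree` and `csv` modules. The column layout below (one row per bounding box) is an assumption based on a common convention; the columns CSVGenerator.py actually writes may differ, and `voc_to_rows` and `folder_to_csv` are hypothetical helpers:

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed column layout: one row per bounding box.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def voc_to_rows(xml_path):
    """Yield one CSV row per <object> in a Pascal VOC annotation file."""
    root = ET.parse(xml_path).getroot()
    filename = root.findtext("filename")
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # Some tools write coordinates as floats (e.g. "110.0"), so parse via float.
        coords = [int(float(box.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax")]
        yield [filename, width, height, obj.findtext("name"), *coords]


def folder_to_csv(folder, csv_path):
    """Write a header plus one row per box for every .xml file in folder."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        for xml_path in sorted(Path(folder).glob("*.xml")):
            writer.writerows(voc_to_rows(xml_path))
```

Running `folder_to_csv("/content/images/train", "/content/images/train_labels.csv")` would produce a file at the path the TFRecord commands below expect.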
- (Optional) If you don't have a labelmap.txt file yet, you can create one with the script below. Replace Class1 to Class4 with your own class names.
labelmap.txt lists the classes used in your dataset, one per line.
%%bash
cat <<EOF > /content/labelmap.txt
Class1
Class2
Class3
Class4
EOF
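TFRecord generation needs each class name mapped to an integer id. A minimal sketch of reading labelmap.txt, assuming ids are assigned in file order starting at 1 (the TF Object Detection API reserves 0 for the background class); how TFRecordGenerator.py actually numbers classes is not documented here, and `load_labelmap` is a hypothetical helper:

```python
from pathlib import Path


def load_labelmap(path):
    """Map each class name in labelmap.txt to an integer id, starting at 1."""
    names = [line.strip() for line in Path(path).read_text().splitlines()]
    names = [n for n in names if n]  # ignore blank lines
    if len(names) != len(set(names)):
        raise ValueError("labelmap.txt contains duplicate class names")
    # Assumption: id 0 is left for the background class.
    return {name: i for i, name in enumerate(names, start=1)}
```

The duplicate check catches a label map that was accidentally written twice, for example by re-running a `cat <<EOF >>` cell, which appends instead of overwriting.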
- Download the scripts.
!wget https://raw.githubusercontent.com/MasoomBadi/DatasetHelper/main/CSVGenerator.py
!wget https://raw.githubusercontent.com/MasoomBadi/DatasetHelper/main/TFRecordGenerator.py
- Run the scripts to create the CSV files, then the TFRecord files for the train and validation sets.
!python3 CSVGenerator.py
!python3 TFRecordGenerator.py --csv_input=images/train_labels.csv --labelmap=labelmap.txt --image_dir=images/train --output_path=train.tfrecord
!python3 TFRecordGenerator.py --csv_input=images/validation_labels.csv --labelmap=labelmap.txt --image_dir=images/validation --output_path=val.tfrecord
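Conceptually, a TFRecord generator groups the CSV rows by image and stores each image's boxes as lists of coordinates normalized to [0, 1], which is what the TF Object Detection API expects for feature keys such as `image/object/bbox/xmin`. The sketch below covers only that grouping step so that it runs without TensorFlow; serializing each group into a `tf.train.Example` and writing it with `tf.io.TFRecordWriter` is omitted. `group_boxes` is a hypothetical helper, and the CSV column names are assumed rather than taken from CSVGenerator.py:

```python
import csv
from collections import defaultdict


def group_boxes(csv_path, class_ids):
    """Group CSV rows by image filename and normalize box corners to [0, 1].

    class_ids maps each class name to its integer id (assumed to come from labelmap.txt).
    """
    grouped = defaultdict(lambda: {"xmins": [], "xmaxs": [], "ymins": [], "ymaxs": [],
                                   "classes_text": [], "classes": []})
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            g = grouped[row["filename"]]
            w, h = float(row["width"]), float(row["height"])
            # x coordinates are divided by width, y coordinates by height.
            g["xmins"].append(float(row["xmin"]) / w)
            g["xmaxs"].append(float(row["xmax"]) / w)
            g["ymins"].append(float(row["ymin"]) / h)
            g["ymaxs"].append(float(row["ymax"]) / h)
            g["classes_text"].append(row["class"])
            g["classes"].append(class_ids[row["class"]])
    return dict(grouped)
```

Grouping by filename matters because a Pascal VOC image can contain several objects, and each TFRecord example holds one image with all of its boxes. The `class_ids` argument is a name-to-id mapping such as `{"Class1": 1, "Class2": 2}`.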