Added support for HDF5 dataset and an HDF5 creation tool #1468
base: main
Conversation
keras_retinanet/bin/build_hdf5.py (Outdated)

import h5py
import numpy as np
from tqdm import tqdm
Suggested change:
import progressbar
There are other scripts using progressbar2 instead of tqdm. I think we should choose one of them.
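For reference, the switch is a one-line change in the loop. A minimal sketch, assuming the build script iterates over `generator.size()` indices as the other keras-retinanet scripts do:

```python
import progressbar

# before: for i in tqdm(range(generator.size())):
for i in progressbar.progressbar(range(generator.size()), prefix='Writing images: '):
    ...  # existing loop body (writing image i into the HDF5 file) stays unchanged
```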
Agreed.
Already changed and pushed the changes :)
Very nice, I've been meaning to try something like this for a while.
Do you have measurements in general for how much time is gained by using the HDF5 format?
return {'labels': self.labels[image_index],
        'bboxes': self.bboxes[image_index]}

def compute_input_output(self, group):
You override this to remove the filtering, right? Does it have a large computational impact? I'd expect it to be minimal, in which case it would be cleaner to not override this function. Do you have a measurement for this?
Yes, I removed the filtering because it already happens when the HDF5 file is created. That process relies on the CSVGenerator class, which filters the annotations, so filtering again in the generator seemed redundant and I removed it.
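For context, a minimal sketch of what such an override can look like, reusing the helper names from the base Generator in keras_retinanet/preprocessing/generator.py (this is an illustration, not the exact code in this PR):

```python
def compute_input_output(self, group):
    """ Same flow as the base Generator, minus filtering: the annotations
    stored in the HDF5 file were already filtered by the CSVGenerator
    when the file was built.
    """
    image_group       = self.load_image_group(group)
    annotations_group = self.load_annotations_group(group)

    # note: no self.filter_annotations(...) call here

    image_group, annotations_group = self.preprocess_group(image_group, annotations_group)

    inputs  = self.compute_inputs(image_group)
    targets = self.compute_targets(image_group, annotations_group)
    return inputs, targets
```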
I haven't done much testing yet, since I haven't had the time for it, but I was getting roughly 6x faster epochs. I should say that my dataset is composed of large images (around 3000x2000), so with smaller images the speedup will be less significant.
I would be interested to see the difference on a more "normal" dataset like COCO. I expect the gap will be much smaller there because, AFAIK, most of the time is spent on anchor target generation, not data loading.
Added a retinanet-build-hdf5 entry point, which allows creating datasets in the HDF5 format, and a new 'hdf5' option for retinanet-train. This allows the dataset to be kept in main memory the whole time and drastically reduces training times.
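For anyone curious how such a build tool works, the core is small. A hypothetical sketch, assuming one HDF5 group per image and reusing CSVGenerator.load_image / load_annotations for reading the source data; the actual layout and helper names in this PR may differ:

```python
import h5py
import progressbar

def build_hdf5(generator, output_path):
    """ Dump every image and its (already filtered) annotations from a
    CSVGenerator into a single HDF5 file, one group per image.
    """
    with h5py.File(output_path, 'w') as hf:
        for i in progressbar.progressbar(range(generator.size()), prefix='Building HDF5: '):
            image       = generator.load_image(i)
            annotations = generator.load_annotations(i)  # {'labels': ..., 'bboxes': ...}

            group = hf.create_group(str(i))
            group.create_dataset('image',  data=image, compression='gzip')
            group.create_dataset('labels', data=annotations['labels'])
            group.create_dataset('bboxes', data=annotations['bboxes'])
```

The 'hdf5' generator can then read those groups back, or load the whole file into memory up front, which is where the reported speedup comes from.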