Added support for HDF5 dataset and an HDF5 creation tool #1468

Open: wants to merge 4 commits into base: main

madisi98 commented Oct 4, 2020

Adds a `retinanet-build-hdf5` entry point that creates datasets in HDF5 format, and a new `hdf5` dataset option for `retinanet-train`. With this option the whole dataset stays in main memory for the duration of training, which drastically reduces training time.
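To make the idea concrete, here is a minimal sketch of building an HDF5 file and reading it back fully into memory. It assumes h5py and NumPy (both imported by the PR's build script) and a hypothetical one-dataset-per-image layout; the function names `build_hdf5` and `load_hdf5_in_memory` and the `/images/<i>`, `/bboxes/<i>`, `/labels/<i>` layout are illustrative, not the PR's actual code.

```python
import h5py
import numpy as np


def build_hdf5(path, images, bboxes, labels):
    """Write variable-size uint8 images and their annotations to one HDF5 file.

    Hypothetical layout (not necessarily the PR's): one dataset per image under
    /images/<i>, with boxes in /bboxes/<i> (x1, y1, x2, y2) and /labels/<i>.
    """
    with h5py.File(path, "w") as f:
        for i, (img, box, lab) in enumerate(zip(images, bboxes, labels)):
            # h5py creates the intermediate "images"/"bboxes"/"labels" groups automatically.
            f.create_dataset(f"images/{i}", data=np.asarray(img, dtype=np.uint8))
            f.create_dataset(f"bboxes/{i}", data=np.asarray(box, dtype=np.float64).reshape(-1, 4))
            f.create_dataset(f"labels/{i}", data=np.asarray(lab, dtype=np.int64))
        f.attrs["num_images"] = len(images)


def load_hdf5_in_memory(path):
    """Read every dataset into NumPy arrays so training never touches the disk again."""
    with h5py.File(path, "r") as f:
        n = int(f.attrs["num_images"])
        images = [f[f"images/{i}"][()] for i in range(n)]
        bboxes = [f[f"bboxes/{i}"][()] for i in range(n)]
        labels = [f[f"labels/{i}"][()] for i in range(n)]
    return images, bboxes, labels
```

During training, indexing into the returned lists is a pure in-memory lookup with no image decoding, which is where a speedup like the one described would come from. The trade-off is that the whole dataset has to fit in RAM.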


import h5py
import numpy as np
from tqdm import tqdm
Contributor:

Other scripts in the repository use progressbar2 rather than tqdm. I think we should standardize on one of them.

Author:

Agreed. I've already made the change and pushed it :)

hgaiser (Contributor) left a comment

Very nice, I've been meaning to try something like this for a while.

Do you have any general measurements of how much time the HDF5 format saves?
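One simple way to get such a measurement is to time per-image loading for each backend under the same conditions. The sketch below assumes hypothetical loader callables (for example one that reads and decodes an image from disk, and one that indexes into arrays loaded from the HDF5 file); `time_loaders` is an illustrative helper, not part of keras-retinanet.

```python
import time
from statistics import mean


def time_loaders(loaders, indices, repeats=3):
    """Return the mean seconds per call for each named loader.

    `loaders` maps a name (e.g. "csv", "hdf5") to a hypothetical callable that
    takes an image index and returns the loaded image.
    """
    results = {}
    for name, load in loaders.items():
        samples = []
        for _ in range(repeats):
            for i in indices:
                t0 = time.perf_counter()
                load(i)
                samples.append(time.perf_counter() - t0)
        results[name] = mean(samples)
    return results
```

The loading speedup is then `results["csv"] / results["hdf5"]`. Note that this isolates data loading only; an end-to-end epoch time also includes preprocessing, anchor target generation and the training step itself.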

return {'labels': self.labels[image_index],
        'bboxes': self.bboxes[image_index]}

def compute_input_output(self, group):
Contributor:

You override this to remove the filtering, right? Does the filtering have a large computational impact? I'd expect it to be minimal, in which case it would be cleaner not to override this function. Do you have a measurement for this?

Author:

Yes, I removed the filtering because it already happens when the HDF5 file is created. That step relies on the CSVGenerator class, which already filters the annotations, so I removed the second filtering pass.
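For a sense of how cheap this kind of filtering can be, here is a vectorized NumPy sketch that drops degenerate or out-of-image boxes. The conditions are in the spirit of the checks the base generator performs, but they are an assumption and may not match keras-retinanet's implementation exactly; `filter_invalid_boxes` is a hypothetical name.

```python
import numpy as np


def filter_invalid_boxes(bboxes, labels, image_shape):
    """Drop boxes that are degenerate or extend outside the image.

    bboxes: (N, 4) array of x1, y1, x2, y2; labels: (N,); image_shape: (H, W[, C]).
    """
    bboxes = np.asarray(bboxes, dtype=float).reshape(-1, 4)
    labels = np.asarray(labels)
    height, width = image_shape[:2]
    invalid = (
        (bboxes[:, 2] <= bboxes[:, 0])      # zero or negative width
        | (bboxes[:, 3] <= bboxes[:, 1])    # zero or negative height
        | (bboxes[:, 0] < 0)                # starts left of the image
        | (bboxes[:, 1] < 0)                # starts above the image
        | (bboxes[:, 2] > width)            # ends right of the image
        | (bboxes[:, 3] > height)           # ends below the image
    )
    keep = ~invalid
    return bboxes[keep], labels[keep]
```

This is a single pass over the N boxes of one image, so its cost would usually be small next to image decoding or anchor target generation; only a measurement would confirm that for a given dataset.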

madisi98 (Author) commented Oct 6, 2020

I haven’t done much testing yet, since I haven’t had the time, but I was getting epochs roughly 6x faster. Note that my dataset consists of large images (around 3000x2000), so with smaller images the speedup will likely be less significant.
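Keeping a dataset of images this size in memory has a real RAM cost. A quick estimate, assuming images are stored decoded as 8-bit RGB (1 byte per value); if the HDF5 file stored compressed bytes instead, the footprint would be much smaller. The helper name is illustrative.

```python
def in_memory_bytes(height, width, channels=3, num_images=1, bytes_per_value=1):
    """Bytes needed to hold num_images decoded images of the given size."""
    return height * width * channels * bytes_per_value * num_images


per_image = in_memory_bytes(2000, 3000)                    # one ~3000x2000 RGB uint8 image
per_1000 = in_memory_bytes(2000, 3000, num_images=1000)    # a hypothetical 1,000-image dataset
```

That is 2000 × 3000 × 3 = 18,000,000 bytes (about 18 MB) per image, or about 18 GB for 1,000 such images; storing them as float32 (`bytes_per_value=4`) would quadruple that.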

hgaiser (Contributor) commented Oct 6, 2020

I would be interested to see the difference on a more "normal" dataset like COCO. I expect it will be much smaller there because, AFAIK, most of the time there is spent on anchor target generation, not on data loading.
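To illustrate why anchor target generation can dominate: assigning targets needs the overlap (IoU) between every anchor and every ground-truth box in the image. RetinaNet typically uses 9 anchors per feature-map location, so a large input yields on the order of 10^5 anchors. The plain NumPy sketch below is only an illustration; keras-retinanet has its own overlap routine, and its box conventions may differ.

```python
import numpy as np


def pairwise_iou(anchors, gt_boxes):
    """IoU between every anchor and every ground-truth box, boxes as (x1, y1, x2, y2).

    Returns an (A, G) matrix, so the work grows with (number of anchors) x (number of boxes).
    """
    anchors = np.asarray(anchors, dtype=float)
    gt = np.asarray(gt_boxes, dtype=float)
    # Intersection rectangle for every (anchor, box) pair via broadcasting: (A, 1) vs (1, G).
    ix1 = np.maximum(anchors[:, None, 0], gt[None, :, 0])
    iy1 = np.maximum(anchors[:, None, 1], gt[None, :, 1])
    ix2 = np.minimum(anchors[:, None, 2], gt[None, :, 2])
    iy2 = np.minimum(anchors[:, None, 3], gt[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    union = area_a[:, None] + area_g[None, :] - inter
    return inter / union
```

This cost is paid for every image regardless of how it is loaded, which is why faster loading would matter less when anchor target generation already takes most of the time.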

hsahin changed the base branch from master to main on June 17, 2021 13:43
3 participants