Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data loader optimizations #293

Merged
merged 4 commits into from
Sep 6, 2021
Merged

Data loader optimizations #293

merged 4 commits into from
Sep 6, 2021

Conversation

turian
Copy link
Contributor

@turian turian commented Sep 5, 2021

A handful of optimizations that load data more quickly

@turian turian changed the title [WIP] Allow in_memory Data loader optimizations Sep 5, 2021
# For timestamp (event) embedding tasks,
# the metadata for each instance is {filename: , timestamp: }.
if self.embedding_type == "event":
if self.embedding_type == "event" and metadata:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be for a later issue, but I'm finding myself scratching my head a bit trying to remember how the metadata works here for event embeddings as well as the labels. Would be good to include in the docstring a bit of info on why we need metadata for event and how that is structured / should be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a bit of a headscratcher.

shuffle=False,
pin_memory=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any consideration with the metadata? With this I think the embeddings and labels will be transferred to CUDA, but the metadata won't (I think b/c they aren't tensors). I think it will be fine, just curious if there are any gotchas there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turian turian merged commit c2c0124 into main Sep 6, 2021
@turian turian deleted the splits branch September 6, 2021 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants