Data loader optimizations #293
# For timestamp (event) embedding tasks,
# the metadata for each instance is {filename: , timestamp: }.
if self.embedding_type == "event":
if self.embedding_type == "event" and metadata:
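The `and metadata` condition in the line above relies on Python truthiness. A minimal sketch of its effect, assuming the metadata is a plain dict shaped like the comment describes; the helper name `uses_event_metadata`, the `"scene"` type, and the sample values are hypothetical and not taken from the PR:

```python
from typing import Optional


def uses_event_metadata(embedding_type: str, metadata: Optional[dict]) -> bool:
    """Mirror the guard from the diff: event task AND non-empty metadata."""
    # An empty dict and None are both falsy, so the event branch is skipped for them.
    return embedding_type == "event" and bool(metadata)


# Hypothetical instance metadata following the {filename, timestamp} shape.
sample = {"filename": "clip_0.wav", "timestamp": 1.5}

print(uses_event_metadata("event", sample))
print(uses_event_metadata("event", {}))
print(uses_event_metadata("event", None))
print(uses_event_metadata("scene", sample))
```

With the guard in place, an event-type instance that arrives without metadata falls through instead of entering the event branch.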
This can be for a later issue, but I'm finding myself scratching my head a bit trying to remember how the metadata works here for event embeddings as well as the labels. Would be good to include in the docstring a bit of info on why we need metadata for event and how that is structured / should be used.
Yeah it's a bit of a headscratcher.
shuffle=False,
pin_memory=True,
Is there any consideration with the metadata? With this I think the embeddings and labels will be transferred to CUDA, but the metadata won't (I think b/c they aren't tensors). I think it will be fine, just curious if there are any gotchas there.
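For reference on the question above: `pin_memory=True` does not move anything to CUDA by itself. It copies the tensors in each batch into page-locked host memory so a later `.to(device, non_blocking=True)` can be faster, and non-tensor items such as strings pass through unchanged. A minimal sketch, assuming a toy dataset that returns `(embedding, label, filename)`; the `ToyEmbeddings` class and its contents are hypothetical and not the project's dataset:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyEmbeddings(Dataset):
    """Hypothetical dataset: one (embedding, label, filename) tuple per instance."""

    def __init__(self, n: int = 4, dim: int = 8):
        self.embeddings = torch.randn(n, dim)
        self.labels = torch.zeros(n, dtype=torch.long)
        self.filenames = [f"clip_{i}.wav" for i in range(n)]

    def __len__(self) -> int:
        return len(self.filenames)

    def __getitem__(self, idx: int):
        return self.embeddings[idx], self.labels[idx], self.filenames[idx]


use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Only request pinning when a GPU is present; on CPU-only machines it has no benefit.
loader = DataLoader(ToyEmbeddings(), batch_size=2, shuffle=False, pin_memory=use_cuda)

embeddings, labels, filenames = next(iter(loader))

# Tensors are moved explicitly; the filenames stay ordinary Python strings on the host.
embeddings = embeddings.to(device, non_blocking=True)
labels = labels.to(device, non_blocking=True)
```

Note that the default collate function converts numeric metadata (for example float timestamps) into tensors, so whether the metadata is pinned depends on how the batch is collated.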
A handful of optimizations that load data more quickly