Adjustable multibucket sizes #7

Open
tdozat opened this issue Jun 18, 2017 · 0 comments
tdozat commented Jun 18, 2017

Currently, the model raises an error if not all buckets can be filled. This is hardly a problem when training, or when parsing a sequence of suitably large files. However, it causes problems when the list of files to parse contains a mix of large and small files: to parse the large files quickly without large memory consumption, you need to sort each one into multiple buckets, but a file with only one or two sentences can't fill more than one or two buckets.

To handle a mix of large and small files, the system needs a way of setting up buckets that are allowed to stay empty.
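
As a rough illustration of the idea (not the parser's actual bucketing code, which may work quite differently), here is a minimal sketch of a splitter that always returns the requested number of buckets and simply leaves the surplus ones empty. The function name `assign_buckets` and its arguments are hypothetical.

```python
def assign_buckets(lengths, n_buckets):
    """Split sentence indices into exactly n_buckets groups of similar length.

    Hypothetical helper: `lengths` is a list of sentence lengths for one file.
    Buckets that cannot be filled are returned as empty lists instead of
    triggering an error.
    """
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    # Sort sentence indices by length so each bucket holds similar lengths.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    # Ceiling division, floored at 1 so the slicing also works for empty input.
    size = max(1, -(-len(order) // n_buckets))
    # Slices past the end of `order` yield [], i.e. empty buckets.
    return [order[k * size:(k + 1) * size] for k in range(n_buckets)]


# A two-sentence file with four requested buckets: two buckets stay empty.
small = assign_buckets([5, 3], 4)
# A larger file fills every bucket.
large = assign_buckets([7, 2, 9, 4, 4, 12], 3)
# Downstream code would skip the empty buckets when building batches.
non_empty = [b for b in small if b]
```

With this shape, the same bucket count can be used for every file in the list, and only the batching step needs to know that a bucket may contain nothing.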

@tdozat tdozat self-assigned this Jun 18, 2017