Question about batch size and test-set evaluation #303

Open
stevenreich47 opened this issue Feb 20, 2021 · 1 comment

@stevenreich47 commented Feb 20, 2021

Hi there!

I noticed something a little odd while evaluating an ensemble with baselines/cifar/ensemble.py: evaluation seems to be performed only on the test set rounded down to a multiple of the batch size, rather than on the full set. I noticed this because the numpy arrays that store the predictions have shape (9984, 10); that script uses an effective batch size of 64, and 9984 is the largest multiple of 64 that fits in the 10000-example test set.

I believe this might be the case in the other training/eval scripts as well; as I read it, the test iterator is only stepped for the first TEST_IMAGES // BATCH_SIZE batches, which leaves the final partial batch unevaluated whenever the batch size doesn't evenly divide the number of test examples.
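
For concreteness, here is a minimal sketch of the pattern I mean (the dataset and variable names are my own illustration, not the exact code in this repo):

```python
import tensorflow as tf

TEST_IMAGES = 10000
BATCH_SIZE = 64

# Hypothetical stand-in for the CIFAR test input pipeline.
test_dataset = tf.data.Dataset.range(TEST_IMAGES).batch(BATCH_SIZE)
test_iterator = iter(test_dataset)

# Integer division rounds down, so the trailing partial batch is never pulled:
# 10000 // 64 = 156 full batches -> 156 * 64 = 9984 examples evaluated.
steps_per_eval = TEST_IMAGES // BATCH_SIZE
seen = 0
for _ in range(steps_per_eval):
  batch = next(test_iterator)
  seen += int(tf.shape(batch)[0])
print(seen)  # 9984, not 10000
```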

Please let me know if I'm mistaken about this. If you find it is accurate, do the reported results need to be re-evaluated? If they were run with the current default effective batch size of 512, I believe 272 test examples out of 10000 were missed.
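
The arithmetic behind that count, assuming the same rounded-down step count as above:

```python
TEST_IMAGES, BATCH_SIZE = 10000, 512
steps = TEST_IMAGES // BATCH_SIZE        # 19 full batches
print(TEST_IMAGES - steps * BATCH_SIZE)  # 272 examples never evaluated
```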

@znado (Collaborator) commented May 11, 2021

Yeah, this is an unfortunate issue. The papers this codebase was implemented from all dropped the last partial batch, so that convention was kept. That said, we would like to pad the last partial batch in the near future so that evals run over the full test set (we mostly run on TPUs internally, and TPUs require a fixed batch size, which makes handling the final partial batch nontrivial).
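
A minimal sketch of that kind of padding fix (the helper name and mask plumbing here are illustrative, not committed code): pad the final batch up to the fixed batch size so every batch keeps a TPU-friendly static shape, and carry a boolean mask through eval so the padded rows are dropped before metrics are computed.

```python
import tensorflow as tf

BATCH_SIZE = 512  # fixed batch size required by the TPU

def pad_to_batch(images, labels):
  """Pad a (possibly partial) batch up to BATCH_SIZE.

  Returns the padded tensors plus a mask that is True for real examples
  and False for the zero-padded rows.
  """
  n = tf.shape(images)[0]
  pad = BATCH_SIZE - n
  images = tf.pad(images, [[0, pad], [0, 0], [0, 0], [0, 0]])
  labels = tf.pad(labels, [[0, pad]])
  mask = tf.range(BATCH_SIZE) < n
  return images, labels, mask

# At eval time, iterate ceil(TEST_IMAGES / BATCH_SIZE) steps with
# drop_remainder=False, then discard the padding before computing metrics:
#   logits = model(images, training=False)
#   logits = tf.boolean_mask(logits, mask)
#   labels = tf.boolean_mask(labels, mask)
```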
