The MNIST example is too nice insofar as both the train and test sets have a number of samples divisible by the batch size of 100. In general we should not assume this is the case. Our abstract model API should support some form of dynamic batch size.
I am not sure what the best approach for this is yet. However, if we assume that the model executes on a fixed batch size and receives a differing batch size at most once per epoch (the final partial batch), then when we average the loss we could multiply by a mask along the batch axis (1s where there are samples, 0s where there are not) and divide by the sum of the mask; a small sketch follows below. If we do it like this, maybe it should be opt-in, since it does introduce a few extra floating point operations.
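A minimal sketch of the masked-average idea, written with NumPy just to illustrate the arithmetic. The helpers `pad_batch` and `masked_mean` are hypothetical names, not part of any existing API here; the point is only that the padded rows contribute nothing to the loss because the mask zeroes them out and the divisor is the count of real samples.

```python
import numpy as np

BATCH_SIZE = 100  # the fixed batch size the model is compiled/traced for


def pad_batch(x, batch_size=BATCH_SIZE):
    """Pad a partial batch up to `batch_size` and return (padded, mask)."""
    n = x.shape[0]
    mask = np.zeros(batch_size, dtype=x.dtype)
    mask[:n] = 1.0  # 1s where there are real samples, 0s where there are not
    if n < batch_size:
        pad_width = [(0, batch_size - n)] + [(0, 0)] * (x.ndim - 1)
        x = np.pad(x, pad_width)
    return x, mask


def masked_mean(per_sample_loss, mask):
    """Average the loss over real samples only."""
    return np.sum(per_sample_loss * mask) / np.sum(mask)


# Example: the last batch of an epoch has only 37 samples.
partial = np.random.randn(37, 784).astype(np.float32)
padded, mask = pad_batch(partial)
per_sample_loss = np.square(padded).mean(axis=-1)  # stand-in for a real loss
print(masked_mean(per_sample_loss, mask))  # the 63 padded rows are ignored
```

The cost is one elementwise multiply and one extra sum per step, which is why making it opt-in seems reasonable.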