Possible bug in AdaptivePool #641
Not sure if I'm missing something here; I just stumbled upon this while trying to convert an Equinox model to PyTorch. I tried going over AdaptivePool but couldn't really figure out the problem. The same thing happens when using 2d pooling, by the way. Happy to make a PR fixing the issue with some guidance. The following code seems to fail:
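(A minimal reconstruction of that kind of failing comparison; the original snippet isn't shown here, so the shapes, values, and target size below are illustrative assumptions.)

```python
import numpy as np
import torch
import jax.numpy as jnp
import equinox as eqx

# One channel, ten elements: easy to eyeball the pooled averages.
x = np.arange(10.0, dtype=np.float32).reshape(1, 10)  # (channels, dim)

eqx_pool = eqx.nn.AdaptiveAvgPool1d(4)      # Equinox adaptive average pooling
torch_pool = torch.nn.AdaptiveAvgPool1d(4)  # PyTorch equivalent

eqx_out = np.asarray(eqx_pool(jnp.asarray(x)))
torch_out = torch_pool(torch.from_numpy(x)).numpy()

print(eqx_out)
print(torch_out)  # PyTorch gives [[1., 3., 6., 8.]] here
# The outputs differ because the two libraries choose different pooling
# regions, as discussed in the comments below.
```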
I think this is expected: the adaptive part of our pooling works in a slightly different way, as it was more computationally efficient for us to do it that way. (Adaptive pooling was added back in #129, although I don't see any discussion of PyTorch consistency there.) Tagging @paganpasta in case they recall any other information. Perhaps we should simply document this discrepancy?
Going through the corresponding issue (#121), it looks like we implemented adaptive pooling differently from PyTorch in favor of a (potential) speed-up. The main difference is how the pooling regions are carved out of the original tensor. Hope this helps. FWIW, I did not see a "significant" difference in the performance of (classification) models converted from PyTorch.
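For reference, below is the rule I believe PyTorch uses to pick its adaptive pooling regions (region i spans [floor(i*L/out), ceil((i+1)*L/out))), shown next to one possible fixed-size carving. The second function is purely illustrative; it is not Equinox's actual implementation:

```python
def torch_style_regions(length: int, out: int):
    # PyTorch (I believe): region i covers
    # [floor(i * L / out), ceil((i + 1) * L / out)).
    return [(i * length // out, -(-((i + 1) * length) // out)) for i in range(out)]

def uniform_regions(length: int, out: int):
    # One way to carve fixed-size regions instead: a single kernel size and
    # stride shared by every region. Illustrative only.
    kernel = -(-length // out)  # ceil(L / out)
    stride = max(1, (length - kernel) // max(1, out - 1))
    return [(i * stride, i * stride + kernel) for i in range(out)]

print(torch_style_regions(10, 4))  # [(0, 3), (2, 5), (5, 8), (7, 10)]: variable sizes
print(uniform_regions(10, 4))      # [(0, 3), (2, 5), (4, 7), (6, 9)]: one size, one stride
```

Averaging over differently chosen regions is what produces the mismatched outputs, even though both are reasonable "adaptive" pools.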
Right! Thank you @paganpasta :) Regarding performance: I think we achieved performance similar to PyTorch's precisely because we implemented it slightly differently; if we'd matched PyTorch exactly (i.e. chosen different pooling regions), then we'd have ended up slightly slower. If it does turn out to be possible to change things to match PyTorch without a significant speed difference, then I'd be happy to take a PR on that. Failing that, I'd suggest we document this difference and accept that we do things slightly differently.
@paganpasta yeah, I got similar performance for both models (the PyTorch one vs the Equinox one). I just tried porting my model to PyTorch (collaborators still use torch) and a sanity test of getting the same output failed. IMHO a sentence in the docs would save lost souls a couple of debugging hours, and should be more than enough.
Any idea how to get the same output as the PyTorch model, though? Can you confirm whether this issue is present in all the different pool layers as well? I just tried MaxPool1d (without the adaptive part) and noticed different outputs for PyTorch and Equinox. Also, which other layers might behave differently from PyTorch due to implementation differences?
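For what it's worth, a quick way to check the non-adaptive case directly is sketched below. Note that the stride is set explicitly on both sides, since I believe the defaults differ (PyTorch's MaxPool1d defaults stride to kernel_size, while Equinox's pooling layers default it to 1; treat that as an assumption to verify):

```python
import numpy as np
import torch
import jax.numpy as jnp
import equinox as eqx

x = np.random.default_rng(0).standard_normal((1, 12)).astype(np.float32)  # (channels, dim)

# Set kernel_size and stride explicitly in both libraries so the windows match.
eqx_pool = eqx.nn.MaxPool1d(kernel_size=3, stride=3)
torch_pool = torch.nn.MaxPool1d(kernel_size=3, stride=3)

eqx_out = np.asarray(eqx_pool(jnp.asarray(x)))
torch_out = torch_pool(torch.from_numpy(x)).numpy()

# Max pooling over identical windows should agree up to floating point.
print(np.allclose(eqx_out, torch_out, atol=1e-6))
```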
I'm afraid getting precise compatibility between the frameworks is pretty hard. Even for the same algorithm, one sometimes gets meaningful differences in output just due to differences in the underlying compilers etc. I think adaptive pooling and batch norm are the two places where we do something algorithmically different, although we do have a PR out to change the latter; see #675. (Which I really need to review...) For something like max pool I thought we were the same; there's only one way to compute a maximum, after all. In general we don't tend to diverge from PyTorch without a good reason, so I think most other layers should be compatible. FWIW, I translate models PyTorch->Equinox (in my non-open-source, actually-paid work!) and generally observe the same behaviour between the two libraries. When this is important, I make sure to include that comparison as part of the tests for my Equinox implementation.
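A minimal sketch of that kind of parity test, copying parameters from a PyTorch layer into an Equinox one and comparing outputs within a tolerance; the Linear layer and the tolerances here are illustrative choices:

```python
import numpy as np
import torch
import jax.numpy as jnp
import jax.random as jr
import equinox as eqx

torch_lin = torch.nn.Linear(4, 3)
eqx_lin = eqx.nn.Linear(4, 3, key=jr.PRNGKey(0))

# Overwrite the Equinox parameters with the PyTorch ones so the two layers
# represent the same function.
eqx_lin = eqx.tree_at(
    lambda m: (m.weight, m.bias),
    eqx_lin,
    (
        jnp.asarray(torch_lin.weight.detach().numpy()),
        jnp.asarray(torch_lin.bias.detach().numpy()),
    ),
)

x = np.random.default_rng(0).standard_normal(4).astype(np.float32)

# Compare within a tolerance rather than expecting bit-identical outputs;
# compiler differences alone can produce small discrepancies.
np.testing.assert_allclose(
    np.asarray(eqx_lin(jnp.asarray(x))),
    torch_lin(torch.from_numpy(x)).detach().numpy(),
    rtol=1e-5,
    atol=1e-6,
)
```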