-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run GitHub CI on MacOS #598
base: dev
Are you sure you want to change the base?
Conversation
7bc1e6d
to
da6fb8a
Compare
da6fb8a
to
68bfbdb
Compare
ab31b8e
to
e818a24
Compare
Thank you for spending the time working on this. I do think there is value in this, however I think it should be changed slightly. |
Sure, but my best guess is that GitHub runners don't have / can't use MPS! bmillwood#3 shows that
In fact it may be the other way around -- https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories only explicitly gives M1 as the CPU for |
I reviewed the results you referenced, and I have a few thoughts here. The tests that you ran were all done on python 3.8, 3.9, and 3.10. After looking around in the code a bit further, I realized that TransformerLens probably only supports arm Macs in python 3.11. When the library selects what device to use, it only selects mps, if the installed torch version is over 2.0. Torch was release in March 2023, which was about half a year after python 3.11 was released. I know that at least the first year of arm Macs had very rocky support for a huge range of things, so the possibility of incompatibilities of python+torch+mps increases greatly with every version you go backwards from 3.11. Perhaps the issues with the runners failing on order versions could be somehow related to this? There are still a lot of incompatibilities of certain features of torch with MPS. It is improving rapidly, but seeing that the runner failed on older versions of python is not really all that surprising. |
This on the theory that it's only MPS specifically that doesn't work, and MacOS 13 maybe (?) doesn't have MPS
e818a24
to
7328ebd
Compare
I don't think Python version affects torch version in the way that you imply. Even the latest stable version of PyTorch (2.3.0) is still compatible with Python versions all the way back to 3.8. The torch version is pinned in Nevertheless, I rebased and added 3.11.
|
Yeah I did some digging earlier in the day as well, and as you have deduced it seems like MPS is simply not possible at this time. I am not sure if you have already seen this comment actions/runner-images#9254 (comment), but it seems like it's an issue that has been brought to the attention of GitHub. I opened an issue for it actions/runner-images#9918, so hopefully someone at least looks at it. I don't understand why would have not made it available. Regardless, for our purposes, I think we should just leave this until it is added. There really isn't any reason to run this on Mac OS CPU devices, since the possibilities of having issues there different from Ubuntu are incredibly slim. There is a ton of value if we can get this running on MPS though. Unfortunately, we are just going to have to wait. Maybe with macOS 15 next month, they will allow MPS to be used finally. Thank you for putting the time into experimenting with this, and fingers crossed that we will be able to close this out and merge it soon. |
Description
Full disclosure, I intended to open this PR against my own fork and accidentally opened it here. Having done so, I after-the-fact said "well, why not see if folks want it anyway?" and here we are.
While looking into something that was failing on Mac MPS hardware, I decided to try to set up Mac CI, since I don't have an MPS device myself.
I think this PR does not solve my problem, because the GitHub Action runners appear not to use MPS. The
macos-latest
runner fails CI with this kind of message:Notice in particular the numbers very strongly suggest there should be enough memory for this allocation. I think what's happening is that MPS is not actually available on GitHub Actions runners (this is rumoured e.g. in pytorch/pytorch#111449 (comment) but I don't see documentation about this from GitHub themselves), yet somehow the code tries to use MPS anyway.
This PR uses the
macos-13
runner, which for whatever reason works just fine. Apple's docs suggest MPS has been available since macOS 12.3, but bmillwood#3 suggests the runners don't have it. Possibly they're just better at noticing they don't have it, and using a different code path?Therefore there are two key questions raised by this PR:
macos-13
runner works whilemacos-latest
doesn't, and are we going to be able to upgrade it some day in the future, or are we eventually going to have to drop it as it ages out of existence?Regardless of whether these tests are good proxies for Mac users, you could reasonably object that this isn't worth the extra time / complexity added to CI runs. If that's your concern, and you'd be mollified by running the macos tests on 3.10 only, let me know and I can make that change.
This PR also currently doesn't run the notebook tests on macos, since the pandoc install step would need updating, and I thought I might as well take temperature on the change as a whole before making any more efforts in that direction.
Type of change
Screenshots
I think the PR speaks for itself in this regard.
Checklist: