RuntimeError: Triton Error [CUDA]: invalid argument #237
Comments
Hi @sameerreddy13, thanks for bringing this to our attention! Can you help us understand how your environment is set up? We're wondering if somehow the wrong […]
Yep! I can't use Docker in my env, but other than that this is the environment:
- GPUs: 8x NVIDIA A40
- PyTorch: […]
- Python: […]
- Conda env: […]
@sameerreddy13 could you see if this happens when using Python 3.9.x? We haven't tested 3.10 much.
Still happens with Python 3.9.16.
So I upgraded to torch 2.0 and dropped in the new flash attention module, and it works without a hitch. It might be simpler to switch to this, for BERT at least. I can make a PR for this if it's wanted. This was after spending a while trying to debug the kernel issue.
@sameerreddy13 I am having the same issue. Would you be able to share the diffs for "dropped in the new flash attention module and it works without a hitch"?
Hey @eldarkurtic, here is the main diff (sketched below). You can drop the if condition (I put the `or True` in and forgot to remove it on my fork).
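A minimal sketch of the kind of change being described, assuming torch >= 2.0 and a standard BERT-style self-attention block; the class and tensor names here are illustrative, not taken from the fork. The idea is that the Triton flash-attention call is replaced by torch's built-in fused attention:

```python
# Hypothetical replacement module: not the actual diff from the fork,
# just the shape of the change. Padding masks are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BertSelfAttention(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, dropout_p: float = 0.1):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
        self.out = nn.Linear(hidden_size, hidden_size)
        self.dropout_p = dropout_p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        # Project once, then split into three (batch, heads, seq, head_dim) tensors.
        qkv = self.qkv(x).view(b, s, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        # torch 2.0's fused attention dispatches to a flash-attention
        # kernel when dtype/hardware allow, replacing the Triton kernel.
        ctx = F.scaled_dot_product_attention(
            q, k, v, dropout_p=self.dropout_p if self.training else 0.0
        )
        return self.out(ctx.transpose(1, 2).reshape(b, s, d))
```

One nice property of this swap is that `F.scaled_dot_product_attention` falls back to a plain math implementation when the flash kernel can't run, so it degrades gracefully instead of raising a Triton error.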
And here is a file to just test if flash attention is available in your env (sketched below).
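A minimal sketch of such a check, assuming torch >= 2.0 with a CUDA device: it forces the flash backend on and sees whether a small fp16 call succeeds.

```python
# Hedged sketch of a flash-attention availability probe (torch >= 2.0).
import torch
import torch.nn.functional as F
from torch.backends.cuda import sdp_kernel

assert torch.cuda.is_available(), "needs a CUDA device"
# Flash attention needs fp16/bf16 and a supported head dim; 64 is safe.
q = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")
# Disable the fallback backends so a failure means "no flash kernel".
with sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, q, q)
        print("flash attention is available")
    except RuntimeError as err:
        print(f"flash attention unavailable: {err}")
```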
Thanks a lot @sameerreddy13!
In case it's useful to anyone, I'm strangely getting this error when running the GLUE test script. FWIW, I'm on Tesla GPUs since that's what I can access at this moment. EDIT: I believe I'm getting this with fp32, i.e. with:

```yaml
# Basic run configuration, additional details will be added to this name for each GLUE task, and each random seed
base_run_name: glue-finetuning-benchmark-test
default_seed: 1111
precision: fp32
```
Update: on a slightly different environment (listed below; I needed to make some tweaks to build apex on my system) I'm getting this issue using […]. My hunch is that this issue is related to triton-lang/triton#1512.

Relevant pieces of environment for this post:
- torch: 1.12.1, compiled for CUDA 11.3
Yep, looks like a similar error to what I'm seeing. Still haven't been able to resolve mine; if anybody has any advice it would be much appreciated (triton-lang/triton#1512).
I have a feeling this is some combination of CUDA/triton/torch versions... but it's not something we have encountered at all. The MosaicBERT work was done mostly on torch 1.12.1+cu116, I believe, and we've since run the code with torch 1.13.1+cu117. @mitchellnw, what does your environment look like?
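For anyone comparing setups, a small introspection snippet like the following (plain torch/triton calls, nothing repo-specific) prints the pieces that keep coming up in this thread; compute capability matters too, since the available attention kernels depend on GPU generation:

```python
# Environment introspection for debugging version-combination issues.
import torch
import triton

print("torch :", torch.__version__, "- built for CUDA", torch.version.cuda)
print("triton:", triton.__version__)
if torch.cuda.is_available():
    print("GPU   :", torch.cuda.get_device_name(0),
          "- compute capability", torch.cuda.get_device_capability(0))
```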
Thanks, yeah, you're probably right. I'm on torch 2.0.0+cu118 with triton 2.0.0. I'll try torch 1.13.1+cu117 and see if that works.
We've also been using triton […]
Thanks, really appreciate it! I'll mess around with versions (probably later this week) and see if that fixes things.
I actually tried switching my triton version to 2.0.0 (with everything else in the environment I listed above the same, if I recall correctly) and got a completely different error ('invalid source', essentially the same error appearing at triton-lang/triton#1098). My guess is that the syntax of triton's dot function changed between 2.0.0.dev20221103 and 2.0.0?
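For context, a hedged illustration of the suspected drift in `triton.language.dot` between those versions; this reflects a general understanding of the 2.0 API change, not a bisect of this repo's kernel:

```python
# Suspected API drift in triton.language.dot (hedged; illustrative only).
import triton.language as tl  # noqa: F401

# triton 2.0.0.dev builds accepted transpose flags inside a kernel:
#     acc += tl.dot(q, k, trans_b=True)
# The 2.0.0 release dropped those kwargs; the operand is transposed
# explicitly instead:
#     acc += tl.dot(q, tl.trans(k))
```

A kernel written against the dev-build signature would fail to compile on the release, which would be consistent with the 'invalid source' error.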
Any news on this issue? The error does not occur when using fp32 precision, but how can it be fixed for bf16?
Getting the same issue when running the mosaic-bert recipe. It only happens with bf16; it works with fp32.