Get dynamic shapes to work with Phi-3-mini-128k-instruct #1579
Comments
Sorry for the slow turnaround. Got busy with some other work the last couple of weeks. Here's a smaller repro of this issue.
This one is coming from a repetitive expression being optimized by DCE. A quick WAR is to run CSE before the DCE pass in the grad transform; a toy sketch of that ordering follows.
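To make the suggested ordering concrete, here is a minimal, self-contained toy, not Thunder's actual trace or pass API: the `cse`/`dce` functions and the triple-based trace are illustrative assumptions. It shows CSE deduplicating a repeated subexpression before DCE does its liveness walk.

```python
# Toy illustration only: Thunder's real CSE/DCE passes operate on traces,
# not on these (result, op, args) triples.

def cse(trace):
    """Deduplicate identical (op, args) computations."""
    seen, alias, out = {}, {}, []
    for res, op, args in trace:
        key = (op, tuple(alias.get(a, a) for a in args))
        if key in seen:
            alias[res] = seen[key]  # later duplicates reuse the first result
        else:
            seen[key] = res
            out.append((res, op, key[1]))
    return out

def dce(trace, outputs):
    """Drop ops whose results are never (transitively) used."""
    live, kept = set(outputs), []
    for res, op, args in reversed(trace):
        if res in live:
            kept.append((res, op, args))
            live.update(args)
    return kept[::-1]

trace = [
    ("t0", "mul", ("a", "a")),
    ("t1", "mul", ("a", "a")),  # repeated subexpression
    ("t2", "add", ("t0", "t1")),
]
# Running CSE first collapses t1 into t0, so DCE sees each computation once.
print(dce(cse(trace), outputs={"t2"}))
```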
I'm now seeing this one error out. Looks like we are missing a
I created a histogram of all the sequence lengths we see in one dataset example: [histogram image]. Of course, the ideal is to handle this generally, but a temporary solution might be bucketing. There are two clusters at the upper end (near 400 and at ~480) that could be grouped into a bucket. For the lower region, we might consider more buckets. Given vectorization widths etc., my recommendation would be to bucket every 16 elements.
Thanks for the distribution! At a minimum, bucket intervals should be 128-bit aligned, so yes, that makes a lot of sense. At the smaller sizes we might do well with larger buckets than at the larger sizes, since the perf delta won't be as large for small sequences. CC @IvanYashchuk, since I was chatting with him this morning about bucketing and padding approaches.
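As a rough sketch of the bucketing idea under the assumptions above (16-element buckets, which for 2-byte dtypes like bf16 step by 256 bits, comfortably over the 128-bit alignment floor); the helper name `bucket_len` is hypothetical:

```python
# Hypothetical helper: round a sequence length up to the next bucket
# boundary so only a small, fixed set of shapes reaches the compiler.
def bucket_len(seq_len: int, bucket: int = 16) -> int:
    return ((seq_len + bucket - 1) // bucket) * bucket

assert bucket_len(401) == 416  # lengths 401-416 all share one compiled shape
assert bucket_len(480) == 480  # the ~480 cluster is already aligned
```

Coarser buckets at the low end would just mean passing a larger `bucket` for small sequence lengths.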
🚀 Feature
The program below fails due to the use of `cache=thunder.core.options.CACHE_OPTIONS.SYMBOLIC_VALUES`.
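The original failing program was not captured in this thread; the following is a minimal stand-in sketch, assuming `thunder.jit` accepts the cache option named above and using a toy function in place of Phi-3:

```python
import torch
import thunder
from thunder.core.options import CACHE_OPTIONS

# Toy stand-in for the real model; the original repro used Phi-3-mini-128k-instruct.
def fn(x):
    return torch.nn.functional.relu(x) * 2

jfn = thunder.jit(fn, cache=CACHE_OPTIONS.SYMBOLIC_VALUES)

# Varying sequence lengths, as in fine-tuning with dynamic shapes.
for seq_len in (384, 401, 480):
    jfn(torch.randn(1, seq_len, 64))
```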
Motivation
With NeMo, we are starting to test fine-tuning with varying sequence lengths, and thus the tensor sizes are changing every step.
Pitch
We do not give an error :-)
Alternatives
The alternative is probably to pad the tensor up to a power of two and compile that.
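A sketch of that alternative, assuming we pad the sequence dimension with zeros (the helper names are hypothetical):

```python
import torch
import torch.nn.functional as F

def next_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()

def pad_seq(x: torch.Tensor, dim: int = 1) -> torch.Tensor:
    """Pad dimension `dim` up to the next power of two with zeros."""
    target = next_pow2(x.shape[dim])
    # F.pad's pad tuple runs from the last dimension backwards.
    pads = [0, 0] * (x.ndim - dim - 1) + [0, target - x.shape[dim]]
    return F.pad(x, pads)

x = torch.randn(1, 401, 64)
print(pad_seq(x).shape)  # torch.Size([1, 512, 64])
```

Note that padded positions would also need masking in attention, which this sketch omits.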
Additional context
cc @tfogal