Skip to content

Allow buffer typevar in CuArray type #1062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 11, 2021

Conversation

DhairyaLGandhi
Copy link
Member

No description provided.

@DhairyaLGandhi
Copy link
Member Author

We'll need to guard the additional type param against < CUDA v3.4

@DhairyaLGandhi
Copy link
Member Author

Yup, that seem like its happy with CUDA tests. Thanks @maleadt !

@DhairyaLGandhi
Copy link
Member Author

DhairyaLGandhi commented Sep 6, 2021

Supersedes FluxML/Flux.jl#1704, so at least we can rid of the rrules for cpu and gpu, and the FillArrays weren't the bottleneck in that computation, hopefully that can be decided at JuliaGPU/Adapt.jl#47

@DhairyaLGandhi
Copy link
Member Author

Julia 1.3 error is unrelated to this PR

@DhairyaLGandhi
Copy link
Member Author

The buildkite error seems to be coming from ChainRules @oxinabox

@oxinabox
Copy link
Member

oxinabox commented Sep 7, 2021

Neat.
So reason this wasn't caught in our downstream dependency testing,
is that this is a bug (/missing featue) in Julia v1.5
@YingboMa fixed it in JuliaLang/julia#36030
As I understand it, ForwardDiff doesn't support Complex numbers unless you are using Julia v1.6+.

A simple demonstration is:
log(complex(ForwardDiff.Dual(1.0))), which works fine in Julia 1.6.
but is a StackOverflow (as is seen in the logs) in Julia 1.5

I am not sure where, and if, this should be worked around.

cc @mcabbott

@mcabbott
Copy link
Member

mcabbott commented Sep 7, 2021

I'm mostly puzzled why this wasn't triggered before, since the ^ gradient has contained log(complex(... for ages. The test which triggers this is skipped here https://github.com/FluxML/Zygote.jl/pull/1044/files#diff-3483c521f73b08fdb6b00f014614cc0d69f87ea7b098a24aff838cbdc812704dR25-R28 for now. Haven't tried to solve it beyond that.

@mcabbott mcabbott changed the title Add buffer typevar Allow buffer typevar in CuArray type Sep 7, 2021
@DhairyaLGandhi
Copy link
Member Author

Does seem something CR should address, but seems orthogonal to this PR. Mind if I merge @oxinabox ?

@DhairyaLGandhi
Copy link
Member Author

The other option is to drop 1.3 compatibility, but seems odd to need to do that on something that used to pass.

@DhairyaLGandhi
Copy link
Member Author

We already have the tests we need - buildkite is failing on master.

@oxinabox
Copy link
Member

oxinabox commented Sep 7, 2021

Realistically it doesn't seem like this is a major issue.

It is a weird code path that is only explored if:

  • using an old julia version (very rare)
  • using complex numbers (rare)
  • using CUDA, or I think broadcasting since much broadcasting goes through forward (not so rare)

The other option is to drop 1.3 compatibility, but seems odd to need to do that on something that used to pass.

This one fails up until 1.6.
It might be acceptable to just stop running buildkite on 1.5, since it is an unmaintained release.
And if people who are using julia 1.5 run into issues, they can just update.
I think because this is such a corner case we might not ever have that happen.

Does seem something CR should address, but seems orthogonal to this PR. Mind if I merge @oxinabox ?

I am good if @mcabbott is good.

I'm mostly puzzled why this wasn't triggered before, since the ^ gradient has contained log(complex(... for ages

Could be that Zygote was broken with CUDA for quite a while and so those tests were failing before they got that far.

@DhairyaLGandhi
Copy link
Member Author

DhairyaLGandhi commented Sep 8, 2021

Its not only restricted to 1.5 buildkite though, similar issues can be seen in 1.3 on GHA, plus I don't think 1.5 is an old enough release.

We test Zygote with CUDA in many different places, including all the layers with different losses on GPU on Flux, and even FluxBench with different modelling types. Unlikely Zygote was broken until recently where the typevar in CuArray was introduced, which this PR fixes.

@DhairyaLGandhi
Copy link
Member Author

Bump

@DhairyaLGandhi DhairyaLGandhi merged commit b33988e into FluxML:master Sep 11, 2021
@DhairyaLGandhi DhairyaLGandhi deleted the dg/cutypevar branch September 11, 2021 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants