Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding Union complexF type in cor #65

Conversation

josemanuel22
Copy link
Contributor

@josemanuel22 josemanuel22 commented Jan 16, 2021

Solves ticket #63

@codecov
Copy link

codecov bot commented Jan 16, 2021

Codecov Report

Merging #65 (f8b85ef) into master (ba90d86) will not change coverage.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #65   +/-   ##
=======================================
  Coverage   97.37%   97.37%           
=======================================
  Files           1        1           
  Lines         381      381           
=======================================
  Hits          371      371           
  Misses         10       10           
Impacted Files Coverage Δ
src/Statistics.jl 97.37% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba90d86...f8b85ef. Read the comment docs.

@josemanuel22 josemanuel22 changed the title adding Union complexF type in cor adding Union complexF type in cor #63 Jan 16, 2021
@josemanuel22 josemanuel22 changed the title adding Union complexF type in cor #63 adding Union complexF type in cor Jan 16, 2021
@nalimilan
Copy link
Member

Do you really mean #63? I don't see why we would use complex numbers to fix that issue.

@josemanuel22
Copy link
Contributor Author

josemanuel22 commented Jan 16, 2021

Do you really mean #63? I don't see why we would use complex numbers to fix that issue.

I'm sorry I meant #63, I don't know how to change it. I used complex numbers since there is a test that uses them and it failed me with floats:

@testset "variance of complex arrays (#13309)" begin
    z = rand(ComplexF64, 10)
    @test var(z) ≈ invoke(var, Tuple{Any}, z) ≈ cov(z) ≈ var(z,dims=1)[1] ≈ sum(abs2, z .- mean(z))/9
    @test isa(var(z), Float64)
    @test isa(invoke(var, Tuple{Any}, z), Float64)
    @test isa(cov(z), Float64)
    @test isa(var(z,dims=1), Vector{Float64})
    @test varm(z, 0.0) ≈ invoke(varm, Tuple{Any,Float64}, z, 0.0) ≈ sum(abs2, z)/9
    @test isa(varm(z, 0.0), Float64)
    @test isa(invoke(varm, Tuple{Any,Float64}, z, 0.0), Float64)
    @test cor(z) === 1.0
    v = varm([1.0+2.0im], 0; corrected = false)
    @test v ≈ 5
    @test isa(v, Float64)
end

@nalimilan
Copy link
Member

What part of #63 are you trying to address exactly and what's the intended result? I think what @bkamins meant there is that cor([missing]) and cor(Union{Float64, Missing}[missing]) should give the same result.

@josemanuel22
Copy link
Contributor Author

josemanuel22 commented Jan 16, 2021

What part of #63 are you trying to address exactly and what's the intended result? I think what @bkamins meant there is that cor([missing]) and cor(Union{Float64, Missing}[missing]) should give the same result.

Sure enough, that's what I understood, the result should be the same and equal to 1.0. That's what you get with the Union. This also makes cor(Any [missing]) equal to one and cor(Any[1]) also equal to one.

@bkamins
Copy link
Contributor

bkamins commented Jan 16, 2021

This code does not seem to produce a correct result if you e.g. pass a vector of BigFloat to cor.

@josemanuel22
Copy link
Contributor Author

Can you please pass me the test with bigFloat that has failed?

@bkamins
Copy link
Contributor

bkamins commented Jan 17, 2021

julia> x = BigFloat.([1,2,3])
3-element Array{BigFloat,1}:
 1.0
 2.0
 3.0

julia> typeof(one(real(eltype(convert(AbstractVector{Union{ComplexF64, Missing}}, x)))))
Float64

and should be BigFloat.

Same with e.g.:

julia> x = Float16.([1,2,3])
3-element Array{Float16,1}:
 1.0
 2.0
 3.0

julia> typeof(one(real(eltype(convert(AbstractVector{Union{ComplexF64, Missing}}, x)))))
Float64

and again we get a wrong type.

(note that what you propose is not only "conceptually" wrong, but it is also breaking)

Additionally your approach allocates, which should be avoided.

@bkamins
Copy link
Contributor

bkamins commented Jan 17, 2021

If you would like to go forward with this PR I would recommend:

  • first wait till https://github.com/JuliaLang/Statistics.jl/pull/61 is merged; then rebase your branch
  • on top of change real to float in cor of a single collection #61 check if T is missing and special case it; otherwise check if nonmissingtype(T) is concrete, in this case use the current path; if nonmissingtype(T) is not concrete then first a common type of elements of the vector using promote_type should be computed, and then the analysis as above (checking if it is missing and taking nonmissingtype steps should be done for it)

@nalimilan
Copy link
Member

@josemanuel22 More fundamentally, as you can see I marked #63 with the 2.0 label as it would change the existing behavior, so we can't do that until Julia 2.0 (or at least Statistics 2.0). But even then I'm not sure what would be the correct solution (we might even drop that method which doesn't sound too useful).

@nalimilan
Copy link
Member

on top of #61 check if T is missing and special case it; otherwise check if nonmissingtype(T) is concrete, in this case use the current path; if nonmissingtype(T) is not concrete then first a common type of elements of the vector using promote_type should be computed, and then the analysis as above (checking if it is missing and taking nonmissingtype steps should be done for it)

Ah I hadn't seen your new posts. Yeah that would probably work. But it is really worth it? Going over all elements is going to be very costly just to return the number one of the right type... Is that function really useful? I used it at JuliaStats/StatsBase.jl#627 but maybe we can find another approach.

@bkamins
Copy link
Contributor

bkamins commented Jan 17, 2021

But it is really worth it?

I agree. I just say what would be the "most correct way to do it". Alternatively we could decide to throw an error. This is something for 2.0 release anyway as you have commented.

@josemanuel22
Copy link
Contributor Author

Perfect, thanks a lot for the help. It will then be looked at later.

@josemanuel22 josemanuel22 deleted the inconsistencies_in_cor_calculation branch January 19, 2021 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants