Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrote corkendall (issue 634) #647

Merged
merged 23 commits into from
Feb 8, 2021
Merged

Rewrote corkendall (issue 634) #647

merged 23 commits into from
Feb 8, 2021

Conversation

PGS62
Copy link
Contributor

@PGS62 PGS62 commented Jan 26, 2021

New version of corkendall is approx 4 times faster if both arguments are vectors and 7 times faster if at least one is a matrix. See issue #634 for details. Passes all tests except against Julia nightly, which has unconnected failures (in hist.jl).

New version of corkendall is approx 4 times faster if both arguments are vectors and 7 times faster if at least one is a matrix. See issue 634 for details.
…identifiers over Unicode equivalents whenever possible"
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks mostly good, the main issue to sort out is overflow I think

src/rankcorr.jl Outdated
Comment on lines 31 to 33
#
# Kendall correlation
#
#
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, that was VSCode auto formatter. Fixed now.

src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
test/rankcorr.jl Outdated Show resolved Hide resolved
test/rankcorr.jl Outdated Show resolved Hide resolved
test/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
PGS62 and others added 7 commits January 31, 2021 15:55
@PGS62
Copy link
Contributor Author

PGS62 commented Feb 1, 2021

I will do another commit that incorporates all your suggestions except:

  1. Suggestions you made to avoid copies where the copy was in fact necessary.
  2. Enhancements to tests. I will look at those next.

Please don't merge the pull request just yet even if the next batch of tests pass.

Also I feel I've not been using GitHub quite as it's intended. I probably should have made more use of the "Commit Suggestion" buttons above, and I'm not sure what the "Add suggestion to batch" buttons do so I didn't use them either...

Incorporated suggestions from nalimilan. Also further changes to avoid overflow errors
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
src/rankcorr.jl Show resolved Hide resolved
src/rankcorr.jl Show resolved Hide resolved
src/rankcorr.jl Outdated Show resolved Hide resolved
PGS62 and others added 8 commits February 1, 2021 13:42
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
Co-authored-by: Milan Bouchet-Valat <[email protected]>
@PGS62
Copy link
Contributor Author

PGS62 commented Feb 1, 2021

I've made changes you suggested to test/rankcorr.jl.

I don't think there's any outstanding reason to not merge this PR now. Do you agree? I've clicked all the "Resolve conversation" buttons but mysteriously some of them still read "Resolve conversation" (rather than their other state of "Unresolve conversation"). Do both parties to a conversation need to click the button?

@PGS62
Copy link
Contributor Author

PGS62 commented Feb 8, 2021

I would think that once #656 is merged in then all tests should pass for this PR as well. Any chance it can be merged in too?

@nalimilan
Copy link
Member

CI failures on nightly are only due to #656 so I'll merge. Thanks for this work! Once I'll have merged my pairwise PR I think it will be interesting to add specialized pairwise methods for corkendall to compute sortperm(x) only once for each variable. It would also be interesting to use multithreading, e.g. spawning one thread for each cell in the correlation table.

@nalimilan nalimilan merged commit 11ac5b5 into JuliaStats:master Feb 8, 2021
@PGS62
Copy link
Contributor Author

PGS62 commented Feb 8, 2021

I enjoyed working on it and learned a great deal from your feedback, for which thank you!

I think one needs to be a bit careful with threading, on which I did do some work but never got to a stage that I was happy with. My original objective was to calculate corkendall for an input matrix with 10,000 columns so the output would be a 10,000 x 10,000 symmetric matrix. I couldn't get spawning 50,000,000 threads to work at all - Julia crashed. So I also tried one thread per column of the output (seemed to work OK ) and one thread per "chunk of columns" (this approach still a work in progress).

As we briefly discussed, I also need to be able to handle missing values where to calculate the correlation between two vectors one first filters out pairs for which either vector has missing elements. I hope the sortperm trick can be tweaked to work in this case, but I haven't thought about it yet.

@nalimilan
Copy link
Member

Yes we should probably never create much more tasks than Threads.nthreads().

As we briefly discussed, I also need to be able to handle missing values where to calculate the correlation between two vectors one first filters out pairs for which either vector has missing elements. I hope the sortperm trick can be tweaked to work in this case, but I haven't thought about it yet.

At least for the listwise method at #627 it shouldn't be hard since it calls _pairwise! one views of vectors without missing entries, so everything should work automatically. When skipping missing values in a pairwise fashion the implementation would be a bit trickier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants