Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SortedVector set type #38

Merged
merged 27 commits into from
May 3, 2024
Merged

Add SortedVector set type #38

merged 27 commits into from
May 3, 2024

Conversation

gdalle
Copy link
Collaborator

@gdalle gdalle commented May 3, 2024

Add a new pseudoset type SortedVector which is uniformly faster than both BitSet and Set up to 10% sparsity (see Discourse)

@codecov-commenter
Copy link

codecov-commenter commented May 3, 2024

Codecov Report

Attention: Patch coverage is 98.18182% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 73.98%. Comparing base (83cb0b0) to head (001b97f).
Report is 1 commits behind head on main.

Files Patch % Lines
src/sortedvector.jl 97.67% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #38      +/-   ##
==========================================
+ Coverage   72.30%   73.98%   +1.67%     
==========================================
  Files           8        9       +1     
  Lines         278      319      +41     
==========================================
+ Hits          201      236      +35     
- Misses         77       83       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

src/sortedvector.jl Outdated Show resolved Hide resolved
src/sortedvector.jl Show resolved Hide resolved
src/sortedvector.jl Outdated Show resolved Hide resolved
test/sortedvector.jl Outdated Show resolved Hide resolved
@gdalle gdalle mentioned this pull request May 3, 2024
src/sortedvector.jl Outdated Show resolved Hide resolved
src/sortedvector.jl Outdated Show resolved Hide resolved
@adrhill
Copy link
Owner

adrhill commented May 3, 2024

Out of curiosity: Do we need to update #4 again? 😉

@gdalle
Copy link
Collaborator Author

gdalle commented May 3, 2024

Out of curiosity: Do we need to update #4 again? 😉

Oh definitely. They've been blown so far out of the water they don't even see the water

@gdalle
Copy link
Collaborator Author

gdalle commented May 3, 2024

EDIT: we're not uniformly faster than BitSet, see https://discourse.julialang.org/t/best-data-structure-for-fast-unions-of-large-sets-of-integers/113785/33?u=gdalle

But it's okay cause the user can choose

@adrhill
Copy link
Owner

adrhill commented May 3, 2024

In runtests.jl, there are a couple of loops over different set types:

@testset "First order" begin
for S in (BitSet, Set{UInt64})
@testset "Set type $S" begin

@testset "Real-world tests" begin
include("brusselator.jl")
for S in (BitSet, Set{UInt64})
@testset "Set type $S" begin

SortedVector should be tested there as well.

And once HessianTracer is supported via distributive_merge, the same applies to HessianTracer:

@testset "Second order" begin
for S in (BitSet, Set{UInt64})
@testset "Set type $S" begin

Once this passes, we make SortedVector the DEFAULT_SET_TYPE in src/pattern.jl.

@gdalle
Copy link
Collaborator Author

gdalle commented May 3, 2024

Once this passes, we make SortedVector the DEFAULT_SET_TYPE in src/pattern.jl.

Keep in mind that BitSet is still faster for denser cases. Should we be asymptotically optimal and let the user tweak for smaller sizes? I think so but it's your call

@gdalle
Copy link
Collaborator Author

gdalle commented May 3, 2024

Should I get rid of the parametric vector type and just use Vector for simplicity? It's a bit cumbersome and I don't see us using anything else in the short term (as in, next week)

@adrhill
Copy link
Owner

adrhill commented May 3, 2024

Keep in mind that BitSet is still faster for denser cases. Should we be asymptotically optimal and let the user tweak for smaller sizes?

The benchmarks on Discourse were primarily on huge input dimensionalities (10e6), whereas I feel the average user interested in sparsity probably deals with smaller problems. I think to select the default, we should benchmark on inputs of dimensionality 10 to 1000. Maybe there is a clear winner regardless of sparsity.

@gdalle
Copy link
Collaborator Author

gdalle commented May 3, 2024

Depends on #42, merge that one first

@adrhill adrhill changed the title SortedVector Add SortedVector set type May 3, 2024
@adrhill adrhill mentioned this pull request May 3, 2024
@adrhill
Copy link
Owner

adrhill commented May 3, 2024

Closes #41.

@adrhill adrhill merged commit 79cf7b9 into main May 3, 2024
2 checks passed
@adrhill adrhill deleted the gd/sortedvec branch May 3, 2024 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants