FEA ridge benchmarks #18

Merged Feb 21, 2024 (43 commits)
Changes from 31 commits
Commits
8f06185
WIP: add benchmark setup for Ridge
fcharras Jan 11, 2024
08cc70f
scikit-learn benchmark works
fcharras Jan 11, 2024
9de7154
smaller dimensions / consolidation script works
fcharras Jan 11, 2024
d1d8760
add results.csv file
fcharras Jan 12, 2024
893f67e
Increase dimensions / update results
fcharras Jan 12, 2024
ca43909
Order sheets alphabetically, ignore casing
fcharras Jan 12, 2024
2be4ee7
do not display alpha column in the spreadsheet
fcharras Jan 12, 2024
0480354
Fix solver column
fcharras Jan 12, 2024
00005f0
move data copy and objective evaluation in objective dedicated methods
fcharras Jan 12, 2024
0bcb890
Update results.csv
fcharras Jan 12, 2024
82b3ec5
update yaml
fcharras Jan 12, 2024
0ce2013
Trigger ci after fix
fcharras Jan 12, 2024
6bb0aa5
Address review
fcharras Jan 12, 2024
5fddbac
Tune down dataset size and update results
fcharras Jan 12, 2024
ac5a5c9
Add other scikit-learn solvers
fcharras Jan 15, 2024
50a2cd5
Skip sag, saga, sparse_cg and lbfgs ~ much too slow ?
fcharras Jan 15, 2024
ca4077f
Add number of iterations
fcharras Jan 15, 2024
2f9d369
fix n_iter report
fcharras Jan 15, 2024
1cf6569
fix n_iter report
fcharras Jan 15, 2024
40fb097
max_iter and tol out of benchmark unicity key ~ enable comparing solv…
fcharras Jan 15, 2024
60556a8
update results sheet
fcharras Jan 15, 2024
c5cb0bd
linting
fcharras Jan 15, 2024
658bc82
Add cuml solver
fcharras Jan 15, 2024
cf662eb
Add cuml results
fcharras Jan 15, 2024
f3e9765
Also add warm-up for scikit-learn
fcharras Jan 15, 2024
c2f2479
Faster warmup
fcharras Jan 15, 2024
edd2e90
Add sklearn+array_api+torch solver
fcharras Jan 16, 2024
4e1a2c0
Fix sklearn+array api+torch solver, and add results
fcharras Jan 16, 2024
70ac835
Add scikit-learn-intelex solver
fcharras Jan 16, 2024
9e808cd
Add scikit-learn-intelex result even if actual solver is unknown
fcharras Jan 16, 2024
d6cc692
Skip if array_api_support for Ridge not available
fcharras Jan 16, 2024
6bcf4d0
wip: iterate on benchmark parameters, following review suggestions
fcharras Jan 17, 2024
bd78618
Nits
fcharras Jan 17, 2024
35a40d0
Fix lsqr
fcharras Jan 17, 2024
9344602
Update results (wip: cuml missing)
fcharras Jan 17, 2024
6d021d0
revert unrelated kmeans changes
fcharras Jan 17, 2024
e7a1399
Add more solvers and dimensions, work around issues with incompatible…
fcharras Jan 18, 2024
8144d23
fixup
fcharras Jan 18, 2024
d5bab3f
fixup cuml
fcharras Jan 18, 2024
d2ccf5b
fixup torch
fcharras Jan 18, 2024
5a0da41
Remove lbfgs (because it enforces positive=True)
fcharras Jan 18, 2024
7b4c52e
Update all results.
fcharras Jan 18, 2024
92aa833
Fix a bug that didn't reset bold settings after a sync refresh
fcharras Feb 21, 2024
@@ -19,3 +19,4 @@ jobs:
 run: |
   python ./benchmarks/kmeans/consolidate_result_csv.py ./benchmarks/kmeans/results.csv --check-csv
   python ./benchmarks/pca/consolidate_result_csv.py ./benchmarks/pca/results.csv --check-csv
+  python ./benchmarks/ridge/consolidate_result_csv.py ./benchmarks/ridge/results.csv --check-csv
3 changes: 3 additions & 0 deletions .github/workflows/sync_benchmark_files_to_gsheet.yaml
@@ -24,8 +24,11 @@ jobs:
 run: |
   python ./benchmarks/kmeans/consolidate_result_csv.py ./benchmarks/kmeans/results.csv --check-csv
   python ./benchmarks/pca/consolidate_result_csv.py ./benchmarks/pca/results.csv --check-csv
+  python ./benchmarks/ridge/consolidate_result_csv.py ./benchmarks/ridge/results.csv --check-csv
   echo "$GSPREAD_SERVICE_ACCOUNT_AUTH_KEY" > service_account.json
   python ./benchmarks/kmeans/consolidate_result_csv.py ./benchmarks/kmeans/results.csv \
     --sync-to-gspread --gspread-url $GSPREAD_URL --gspread-auth-key ./service_account.json
   python ./benchmarks/pca/consolidate_result_csv.py ./benchmarks/pca/results.csv \
     --sync-to-gspread --gspread-url $GSPREAD_URL --gspread-auth-key ./service_account.json
+  python ./benchmarks/ridge/consolidate_result_csv.py ./benchmarks/ridge/results.csv \
+    --sync-to-gspread --gspread-url $GSPREAD_URL --gspread-auth-key ./service_account.json
2 changes: 2 additions & 0 deletions .github/workflows/test_cpu_benchmarks.yaml
@@ -143,3 +143,5 @@ jobs:
 PYTHONPATH=$PYTHONPATH:$(realpath ../../kmeans_dpcpp/) benchopt run --no-plot -l -d Simulated_correlated_data[n_samples=1000,n_features=14]
 cd ../pca
 benchopt run --no-plot -l -d Simulated_correlated_data[n_samples=100,n_features=100]
+cd ../ridge
+benchopt run --no-plot -l -d Simulated_correlated_data[n_samples=100,n_features=100,n_targets=2]
1 change: 1 addition & 0 deletions README.md
@@ -18,6 +18,7 @@ hardware.
 Benchmarks are currently available for the following algorithms:
 - [k-means](https://github.com/soda-inria/sklearn-engine-benchmarks/tree/main/benchmarks/kmeans)
 - [PCA](https://github.com/soda-inria/sklearn-engine-benchmarks/tree/main/benchmarks/pca)
+- [Ridge](https://github.com/soda-inria/sklearn-engine-benchmarks/tree/main/benchmarks/ridge)

 Here is a (non-exhaustive) list of libraries that are compared in the benchmarks:
 - [scikit-learn](https://scikit-learn.org/stable/index.html)
3 changes: 1 addition & 2 deletions benchmarks/kmeans/consolidate_result_csv.py
@@ -2,7 +2,6 @@
 from functools import partial
 from io import BytesIO
 from itertools import zip_longest
-from operator import attrgetter

 import numpy as np
 import pandas as pd
@@ -403,7 +402,7 @@ def _gspread_sync(source, gspread_url, gspread_auth_key):
     )
     # ensure worksheets are sorted alphabetically, ignoring case
     sheet.reorder_worksheets(
-        sorted(sheet.worksheets(), key=attrgetter("title"), reverse=True)
+        sorted(sheet.worksheets(), key=lambda worksheet: worksheet.title.lower())
     )

     # upload all values
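
The effect of the new sort key can be reproduced without gspread. The sketch below is illustrative only: `FakeWorksheet` and the sheet titles are hypothetical stand-ins for the objects returned by `sheet.worksheets()`, and only the two `sorted(...)` expressions are taken from the diff.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class FakeWorksheet:
    """Hypothetical stand-in exposing only the `title` attribute."""

    title: str


# Hypothetical titles mixing upper and lower case.
worksheets = [FakeWorksheet("Ridge"), FakeWorksheet("k-means"), FakeWorksheet("PCA")]

# Previous behavior: reverse code-point order on the raw title.
old_order = [
    ws.title for ws in sorted(worksheets, key=attrgetter("title"), reverse=True)
]

# New behavior: ascending order on the lowercased title.
new_order = [
    ws.title for ws in sorted(worksheets, key=lambda worksheet: worksheet.title.lower())
]

print(old_order)  # ['k-means', 'Ridge', 'PCA']
print(new_order)  # ['k-means', 'PCA', 'Ridge']
```

Comparing raw titles places every uppercase letter before every lowercase one, so lowercasing the key is what makes the order independent of casing.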
11 changes: 6 additions & 5 deletions benchmarks/kmeans/objective.py
@@ -32,24 +32,25 @@ def set_data(self, X, **dataset_parameters):
         self.X = X
         dtype = X.dtype

-        if self.init == "random" or self.sample_weight == "random":
-            rng = np.random.default_rng(self.random_state)
-
         if self.sample_weight == "None":
             sample_weight = None
         elif self.sample_weight == "unary":
             sample_weight = np.ones(len(X), dtype=dtype)
         elif self.sample_weight == "random":
-            sample_weight = rng.random(size=len(X)).astype(dtype)
+            rng_sample_weight = np.random.default_rng(
+                dataset_parameters["random_state"] + 1
+            )
+            sample_weight = rng_sample_weight.random(size=len(X)).astype(dtype)
         else:
             raise ValueError(
                 "Expected 'sample_weight' parameter to be either equal to 'None', "
                 f"'unary' or 'random', but got {self.sample_weight}."
             )

         if self.init == "random":
+            rng_init = np.random.default_rng(self.random_state)
             init = np.array(
-                rng.choice(X, self.n_clusters, replace=False), dtype=X.dtype
+                rng_init.choice(X, self.n_clusters, replace=False), dtype=X.dtype
             )
         elif self.init == "k-means++":
             init = self.init
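
One plausible reading of this change is that the initial centroids should no longer depend on whether random sample weights were drawn first. The sketch below illustrates that idea under stated assumptions: it uses the standard library's `random.Random` as a dependency-free stand-in for `np.random.default_rng`, and the function names, sizes, and seeds are hypothetical. Only the seeding pattern (one generator per purpose, with the sample-weight seed offset by one) mirrors the diff.

```python
import random


def draw_shared(seed, n_samples, n_clusters, random_weights):
    """Old pattern: a single generator feeds both the weights and the init."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_samples)] if random_weights else None
    # The init draw consumes whatever state the weight draw left behind.
    init = rng.sample(range(n_samples), n_clusters)
    return weights, init


def draw_separate(objective_seed, dataset_seed, n_samples, n_clusters, random_weights):
    """New pattern: one dedicated generator per purpose."""
    rng_weights = random.Random(dataset_seed + 1)
    weights = (
        [rng_weights.random() for _ in range(n_samples)] if random_weights else None
    )
    rng_init = random.Random(objective_seed)
    init = rng_init.sample(range(n_samples), n_clusters)
    return weights, init


# With separate generators, the init indices are the same with or without weights.
_, init_with_weights = draw_separate(0, 0, 50, 3, random_weights=True)
_, init_without_weights = draw_separate(0, 0, 50, 3, random_weights=False)
print(init_with_weights == init_without_weights)  # True
```

With `draw_shared`, the init indices generally change as soon as weights are drawn, because both draws advance the same generator state.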
3 changes: 1 addition & 2 deletions benchmarks/pca/consolidate_result_csv.py
@@ -2,7 +2,6 @@
 from functools import partial
 from io import BytesIO
 from itertools import zip_longest
-from operator import attrgetter

 import numpy as np
 import pandas as pd
@@ -401,7 +400,7 @@ def _gspread_sync(source, gspread_url, gspread_auth_key):
     )
     # ensure worksheets are sorted alphabetically, ignoring case
     sheet.reorder_worksheets(
-        sorted(sheet.worksheets(), key=attrgetter("title"), reverse=True)
+        sorted(sheet.worksheets(), key=lambda worksheet: worksheet.title.lower())
     )

     # upload all values