# Update example commands in README #182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

**Open** — wants to merge 2 commits into `main`
33 changes: 27 additions & 6 deletions README.md
````diff
@@ -44,22 +44,41 @@ conda env create -n rapids --solver=libmamba -f envs/conda-env-rapids.yml

 ### Benchmarks Runner
 
-How to run benchmarks using the `sklbench` module and a specific configuration:
+How to run sklearnex benchmarks on CPU using the `sklbench` module and the regular scope of benchmarking cases:
 
 ```bash
-python -m sklbench --config configs/sklearn_example.json
+python -m sklbench --configs configs/regular \
+    --filters algorithm:library=sklearnex algorithm:device=cpu \
+    --environment-name ENV_NAME --result-file result_sklearnex_cpu_regular.json
+# Same command with shorter argument aliases for typing convenience
+python -m sklbench -c configs/regular \
+    -f algorithm:library=sklearnex algorithm:device=cpu \
+    -e ENV_NAME -r result_sklearnex_cpu_regular.json
 ```
````

> **Contributor:** This `ENV_NAME` is very unclear to me.
>
> The docs say:
>
> > Environment name to use instead of its configuration hash.
>
> But that doesn't tell me what the environment is or what it is used for. Should `ENV_NAME` be substituted with something else? Is it required?

> Same question.
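A possible concrete substitution, reading the quoted docs line as saying `ENV_NAME` is simply a free-form label stored in results in place of the environment's configuration hash — the `rapids` name below reuses the conda environment created in the setup step and is purely illustrative:

```bash
# Illustrative sketch: "-e rapids" assumes ENV_NAME is a free-form label,
# e.g. the name of the conda environment created during setup
python -m sklbench -c configs/regular \
    -f algorithm:library=sklearnex algorithm:device=cpu \
    -e rapids -r result_sklearnex_cpu_regular.json
```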

````diff
 The default output is a file with JSON-formatted results of benchmarking cases. To generate a more human-readable report, use the following command:
 
 ```bash
-python -m sklbench --config configs/sklearn_example.json --report
+python -m sklbench -c configs/regular \
+    -f algorithm:library=sklearnex algorithm:device=cpu \
+    -e ENV_NAME -r result_sklearnex_cpu_regular.json \
+    --report --report-file report-sklearnex-cpu-regular.xlsx
 ```
````

````diff
-By default, output and report file paths are `result.json` and `report.xlsx`. To specify custom file paths, run:
+To optimize dataset downloading (prefetching) and get more verbose output, use the `--prefetch-datasets` and `-l INFO` arguments:
````
> **Contributor:** Please remember to provide info about the requirements for Kaggle data.

> Why not use `--prefetch-datasets` as the default recommendation for regular runs? Does it have any drawbacks?

````diff
 ```bash
+python -m sklbench -c configs/regular \
+    -f algorithm:library=sklearnex algorithm:device=cpu \
+    -e ENV_NAME -r result_sklearnex_cpu_regular.json \
+    --report --report-file report-sklearnex-cpu-regular.xlsx \
+    --prefetch-datasets -l INFO
 ```
````

````diff
+To select measurement for few algorithms only, extend filter (`-f`) argument:
````

> **Contributor:** Suggested change:
>
> ```diff
> -To select measurement for few algorithms only, extend filter (`-f`) argument:
> +To run benchmarks for a few algorithms only, extend the filter (`-f`) argument:
> ```

````diff
 ```bash
-python -m sklbench --config configs/sklearn_example.json --report --result-file result_example.json --report-file report_example.xlsx
+# ...
+-f algorithm:library=sklearnex algorithm:device=cpu algorithm:estimator=PCA,KMeans
+# ...
 ```
````

> **Contributor:** Here it could mention that these algorithms need to be in the config JSON.
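For instance, the extended filter might slot into a full command like this — a sketch reusing only flags shown above; the result file name is illustrative, and, per the comment above, PCA and KMeans must actually be present in the chosen config:

```bash
# Illustrative sketch: assumes PCA and KMeans are covered by configs/regular
python -m sklbench -c configs/regular \
    -f algorithm:library=sklearnex algorithm:device=cpu algorithm:estimator=PCA,KMeans \
    -e ENV_NAME -r result_sklearnex_cpu_pca_kmeans.json
```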

For a description of all benchmarks runner arguments, refer to the [documentation](sklbench/runner/README.md#arguments).
````diff
@@ -69,7 +88,9 @@ For a description of all benchmarks runner arguments, refer to [documentation](sklbench/runner/README.md#arguments).
 To combine raw result files gathered from different environments, call the report generator:
 
 ```bash
-python -m sklbench.report --result-files result_1.json result_2.json --report-file report_example.xlsx
+python -m sklbench.report \
+    --result-files result_1.json result_2.json \
+    --report-file report_example.xlsx
 ```
````

> **Contributor:** I think this one is missing the flag that's needed when mixing sklearn and sklearnex.
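As a sketch of the intended use, `result_1.json` and `result_2.json` stand for result files produced in different environments; the file names below are hypothetical, and note the comment above that an extra flag may be needed when mixing sklearn and sklearnex results:

```bash
# Illustrative sketch: combine the sklearnex CPU result from above with a
# result gathered in another environment (second file name is hypothetical)
python -m sklbench.report \
    --result-files result_sklearnex_cpu_regular.json result_other_env.json \
    --report-file report_combined.xlsx
```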

For a description of all report generator arguments, refer to the [documentation](sklbench/report/README.md#arguments).