-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: continuous s3 benchmarking #2355
Conversation
86f3073
to
677f504
Compare
Benchmarks: TPC-H on S3Table of Results
|
085546c
to
b5ff4a2
Compare
Deploying vortex-bench with
|
Latest commit: |
37c740d
|
Status: | ✅ Deploy successful! |
Preview URL: | https://4822acd2.vortex-bench.pages.dev |
Branch Preview URL: | https://dk-continuous-s3-benchmarkin.vortex-bench.pages.dev |
b5ff4a2
to
36aa83a
Compare
.github/workflows/bench-pr.yml
Outdated
run: | | ||
cargo run --bin tpch_benchmark --release -- \ | ||
--use-remote-data-dir s3://vortex-bench-dev/tpch-sf1/ \ | ||
--exclude-queries 15 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Query 15 basically has a mismatched row count (0 instead of the correct 1) on Parquet in S3.
@@ -241,6 +253,7 @@ async fn bench_main( | |||
.zip_eq(EXPECTED_ROW_COUNTS) | |||
.enumerate() | |||
.filter(|(idx, _)| queries.as_ref().map(|q| q.contains(idx)).unwrap_or(true)) | |||
.filter(|(idx, _)| exclude_queries.as_ref().map(|excluded| !excluded.contains(idx)).unwrap_or(true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has always been missing. If you exclude a query its row count is zero which fails the check unless the expected row count is zero.
benchmarks-website/code.js
Outdated
|
||
"vortex-file-compressed": '#23d100', | ||
"vortex-file-compressed (nvme)": '#23d100', | ||
"vortex-file-compressed (s3)": '#7ad169', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The S3 colors are less saturated versions of the NVME colors.
Benchmarks: TPC-H on NVMETable of Results
|
Benchmarks: random_access |
Benchmarks: Clickbench on NVMETable of Results
|
Benchmarks: compressTable of Results
|
The TPC-H benchmark JSONs now include a "storage" field which is either "s3", "nvme", or not present. When we show PR comments, we join on the field so that we don't compare s3 to nvme. When we show benchmarks in plots, we split nvme and s3 into distinct series.
2ad6d8c
to
78204e9
Compare
@robert3005 I think you were the one who asked to be tagged. |
.github/workflows/bench-pr.yml
Outdated
run: | | ||
echo "TMPDIR=/work" >> $GITHUB_ENV | ||
|
||
- name: Run TPC-H benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this moved to a different file
let storage = match url.scheme() { | ||
"s3" => "s3", | ||
"gcs" => "gcs", | ||
"file" => "nvme", | ||
otherwise => { | ||
println!("unknown URL scheme: {}", otherwise); | ||
return ExitCode::FAILURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - there must be a nicer way, maybe supported @ "s3" | "gcs" | "file" => support.to_owned() ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want file
to become nvme
though :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatesn wanted to split out the graphs + there have been some other changes in the CI that cause a conflict, but otherwise LGTM
.github/workflows/bench-pr.yml
Outdated
@@ -218,7 +156,7 @@ jobs: | |||
run: | | |||
echo "TMPDIR=/work" >> $GITHUB_ENV | |||
|
|||
- name: Run TPC-H benchmark | |||
- name: Run benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add ${{ matrix.benchmark.id }}
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This slipped in in my continuous S3 benchmarks PR #2355.
This slipped in in my continuous S3 benchmarks PR #2355.
No description provided.