Next skip #84

Merged
merged 10 commits into main from next_skip
Apr 23, 2024

samuelcolvin (Member) commented Apr 21, 2024

Support skipping values with Jiter; this is specifically for apache/datafusion#7845.

Performance looks good: in the benchmarks, skipping values is significantly faster than parsing them.
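To illustrate why skipping can be cheaper than parsing, here is a minimal, self-contained sketch of the general idea: walk past a JSON value by tracking only string boundaries and bracket depth, without decoding numbers or allocating strings. This is not Jiter's implementation or API. The helpers `skip_ws`, `skip_string`, `skip_value` and `contains_key` are hypothetical names invented for this example, and the code assumes well-formed JSON input and compares keys as raw bytes without unescaping.

```rust
/// Advance past JSON whitespace. (Hypothetical helper, not part of Jiter.)
fn skip_ws(data: &[u8], mut pos: usize) -> usize {
    while pos < data.len() && matches!(data[pos], b' ' | b'\n' | b'\t' | b'\r') {
        pos += 1;
    }
    pos
}

/// `start` must point at an opening quote; returns the index just past the closing quote.
fn skip_string(data: &[u8], start: usize) -> Option<usize> {
    let mut pos = start + 1; // step past the opening quote
    while pos < data.len() {
        match data[pos] {
            b'\\' => pos += 2, // skip the escaped character, never decode it
            b'"' => return Some(pos + 1),
            _ => pos += 1,
        }
    }
    None // unterminated string
}

/// Skip one JSON value starting at `pos`; returns the index just past it.
/// Assumption: the input is well-formed, so nothing is validated here.
fn skip_value(data: &[u8], pos: usize) -> Option<usize> {
    let mut pos = skip_ws(data, pos);
    match *data.get(pos)? {
        b'"' => skip_string(data, pos),
        b'{' | b'[' => {
            // Only nesting depth and string boundaries are tracked.
            let mut depth = 0usize;
            while pos < data.len() {
                match data[pos] {
                    b'"' => {
                        pos = skip_string(data, pos)?;
                        continue;
                    }
                    b'{' | b'[' => depth += 1,
                    b'}' | b']' => {
                        depth -= 1;
                        if depth == 0 {
                            return Some(pos + 1);
                        }
                    }
                    _ => {}
                }
                pos += 1;
            }
            None // unbalanced brackets
        }
        _ => {
            // Number, true, false or null: scan to the next delimiter.
            while pos < data.len()
                && !matches!(data[pos], b',' | b'}' | b']' | b' ' | b'\n' | b'\t' | b'\r')
            {
                pos += 1;
            }
            Some(pos)
        }
    }
}

/// Check whether a top-level JSON object has `key`, skipping every value.
fn contains_key(json: &str, key: &str) -> bool {
    let data = json.as_bytes();
    let mut pos = skip_ws(data, 0);
    if data.get(pos).copied() != Some(b'{') {
        return false;
    }
    pos += 1;
    loop {
        pos = skip_ws(data, pos);
        if data.get(pos).copied() != Some(b'"') {
            return false; // closing '}' or malformed input
        }
        let Some(key_end) = skip_string(data, pos) else { return false };
        // Raw byte comparison: keys containing escapes are not unescaped.
        if &data[pos + 1..key_end - 1] == key.as_bytes() {
            return true;
        }
        pos = skip_ws(data, key_end);
        if data.get(pos).copied() != Some(b':') {
            return false;
        }
        let Some(next) = skip_value(data, pos + 1) else { return false };
        pos = skip_ws(data, next);
        if data.get(pos).copied() != Some(b',') {
            return false; // end of object without a match
        }
        pos += 1;
    }
}
```

With this sketch, `contains_key(r#"{"a": {"size": 1}, "size": 2}"#, "size")` finds the top-level key only after skipping the nested object in one pass, which is roughly the shape of work a `json_contains`-style check needs.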

Running the query from apache/datafusion#7845 (comment) gives:

-- datafusion with next_value
SELECT count(*) FROM records where json_contains(attributes, 'size');
  -- 6165586 in 17.2s

-- datafusion with next_skip
SELECT count(*) FROM records where json_contains(attributes, 'size');
  -- 6165586 in 14.3s

-- datafusion using serde-json, collecting keys into a Vec<String>, then checking for the key
SELECT count(*) FROM records where json_contains(attributes, 'size');
  -- 6165586 in 21.7s

-- datafusion UDF that just returns false, i.e. the fastest possible case for a UDF
SELECT count(*) FROM records where json_contains(attributes, 'size');
  -- 6165586 in 11.8s

-- datafusion like
SELECT count(*) FROM records where attributes like '%"size":%';
  -- 6165586 in 14.5s

-- duckdb
SELECT count(*) FROM read_parquet('file.parquet') where list_contains(json_keys(attributes), 'size')
  -- 6165586 in 14.2s

codspeed-hq bot commented Apr 21, 2024

CodSpeed Performance Report

Merging #84 will not alter performance

Comparing next_skip (bf9c6d1) with main (0ab4dd4)

Summary

✅ 59 untouched benchmarks

🆕 14 new benchmarks

Benchmarks breakdown

Benchmark main next_skip Change
🆕 big_jiter_skip N/A 107.8 ms N/A
🆕 bigints_array_jiter_skip N/A 500.4 µs N/A
🆕 floats_array_jiter_skip N/A 573.2 µs N/A
🆕 massive_ints_array_jiter_skip N/A 1.2 ms N/A
🆕 medium_response_jiter_skip N/A 73.4 µs N/A
🆕 pass1_jiter_skip N/A 53.4 µs N/A
🆕 pass2_jiter_skip N/A 5.4 µs N/A
🆕 sentence_jiter_skip N/A 6.5 µs N/A
🆕 short_numbers_jiter_skip N/A 332.7 µs N/A
🆕 string_array_jiter_skip N/A 37.1 µs N/A
🆕 true_array_jiter_skip N/A 22.8 µs N/A
🆕 true_object_jiter_skip N/A 56.2 µs N/A
🆕 unicode_jiter_skip N/A 6.7 µs N/A
🆕 x100_jiter_skip N/A 2.4 µs N/A

Resolved review threads: crates/jiter/src/number_decoder.rs (two), crates/jiter/src/parse.rs (one).
samuelcolvin enabled auto-merge (squash) April 23, 2024 17:48
samuelcolvin merged commit ad047bf into main Apr 23, 2024
12 checks passed
samuelcolvin deleted the next_skip branch April 23, 2024 17:52