Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infer array dimensions from index expressions #175

Merged
merged 5 commits into from
Sep 28, 2023

Conversation

var77
Copy link
Collaborator

@var77 var77 commented Sep 24, 2023

Description

When index is being created via expression and dim option is not passed we can not infer the array dimensions from the table row. So by this PR we will check if the expression list is not empty when inferring the dimensions, we will evaluate the expression over the first row of the table and try to get the array dimensions from the result.

Relates to #174
Issue #162

@var77 var77 requested review from Ngalstyan4 and dqii September 24, 2023 14:36
@github-actions
Copy link

github-actions bot commented Sep 24, 2023

Benchmarks

metric old new pct change
recall (after create) 0.740 0.740 -
recall (after insert) 0.742 0.764 +2.96%
select bulk tps 218.594 228.773 +4.66%
select bulk latency (ms) 34.397 33.716 -1.98%
select bulk latency (stddev ms) 11.556 9.489 -17.89%
create latency (ms) 1570.497 1456.375 -7.27%
insert bulk tps 13.726 14.685 +6.99%
insert bulk latency (ms) 72.843 68.082 -6.54%
insert bulk latency (stddev ms) 3.065 2.094 -31.68%
disk usage (bytes) 6348800.000 6348800.000 -

Copy link
Contributor

@dqii dqii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever :)

Copy link
Contributor

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and fix!

I am a little worried about extracting a row from the table and passing it to an expression in index creation but we should be fine with extensive regression testing. We just need to make sure we use the APIs correctly for all kinds of corner cases.

Remaining: add tests

  • expression is a non-const function
  • expression returns e.g. 5-lengthed array on the first row, and 6-lengthed array on the second (should be easy to implement if you make the expression be the identity function and use a table with a column that has real[] type)
  • add tests from pull Add create expression test #174
  • add pgvector tests (need not implement this functionality. just make sure a sane error is thrown)
  • test when the expression does not even return an array but returns something else
  • test when the expression returns a pgvector array

scripts/extern_defined.sh Outdated Show resolved Hide resolved
src/hnsw/build.c Show resolved Hide resolved
src/hnsw/build.c Outdated Show resolved Hide resolved
src/hnsw/build.c Outdated Show resolved Hide resolved
src/hnsw/build.c Show resolved Hide resolved
@var77 var77 force-pushed the varik/infer-dims-from-expression branch 3 times, most recently from b329be4 to 1a6834d Compare September 25, 2023 11:21
@var77
Copy link
Collaborator Author

var77 commented Sep 25, 2023

Currently after creating an index with expression, we are not able to query it using <-> operator as the operator usage check fails. I have described the issue in this comment . Should we fix the issue in this PR or work on it in a different PR?

cc: @Ngalstyan4 @dqii

@var77 var77 force-pushed the varik/infer-dims-from-expression branch 2 times, most recently from 346f3d0 to 3fd8e53 Compare September 25, 2023 11:27
@var77 var77 force-pushed the varik/infer-dims-from-expression branch from 3fd8e53 to a8127fa Compare September 26, 2023 18:19
@codecov
Copy link

codecov bot commented Sep 26, 2023

Codecov Report

Merging #175 (311b985) into main (e87563a) will decrease coverage by 0.14%.
Report is 1 commits behind head on main.
The diff coverage is 86.66%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #175      +/-   ##
==========================================
- Coverage   83.06%   82.93%   -0.14%     
==========================================
  Files          17       17              
  Lines        1175     1201      +26     
  Branches      251      256       +5     
==========================================
+ Hits          976      996      +20     
- Misses         85       87       +2     
- Partials      114      118       +4     
Files Coverage Δ
src/hnsw/insert.c 82.25% <100.00%> (ø)
src/hooks/executor_start.c 89.47% <100.00%> (ø)
src/hooks/post_parse.c 88.60% <100.00%> (-1.27%) ⬇️
src/hnsw.c 80.00% <88.88%> (+0.22%) ⬆️
src/hnsw/build.c 81.11% <84.84%> (-0.30%) ⬇️

Copy link
Contributor

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits. Feel free to merge after addressing those

src/hnsw.c Outdated Show resolved Hide resolved
src/hnsw/build.c Outdated Show resolved Hide resolved
test/sql/hnsw_create_expr.sql Show resolved Hide resolved
test/sql/hnsw_create_expr.sql Outdated Show resolved Hide resolved
@var77 var77 merged commit 4c80eca into main Sep 28, 2023
26 of 27 checks passed
@var77 var77 deleted the varik/infer-dims-from-expression branch September 28, 2023 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants