[Feature] Support select vector index #48521

Merged on Sep 9, 2024 (2 commits)

@yulongfufu yulongfufu commented Jul 17, 2024

Support select vector index

Fixes #46678

TODO:

  • Add BE unit tests.
  • Resolve tricky parts and hard-coded sections.
  • Implement a bypass for non-pipeline mode for temporary shutdown.
  • Support passing Tenann tuning parameters.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

@yulongfufu yulongfufu requested review from a team as code owners July 17, 2024 13:55
@wanpengfei-git wanpengfei-git requested a review from a team July 17, 2024 13:55
@yulongfufu yulongfufu changed the title from "[WIP]Support select vector" to "Support select vector index" on Aug 12, 2024
@decster decster changed the title from "Support select vector index" to "[Feature] Support select vector index" on Aug 13, 2024
be/src/column/schema.cpp (resolved)
be/src/exec/pipeline/scan/olap_chunk_source.cpp (outdated, resolved)

auto pattern = ColumnHelper::get_const_value<TYPE_VARCHAR>(base);

std::string pattern_str = pattern.to_string();

Consider replacing std::string with std::string_view: that avoids creating a new std::string, and the subsequent substr calls no longer copy.

Suggested change
- std::string pattern_str = pattern.to_string();
+ std::string_view pattern_str = pattern.to_string();
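
One caveat with the suggestion above: if to_string() returns a std::string by value (an assumption here; its return type is not shown in this thread), binding the result to a std::string_view leaves the view pointing at a destroyed temporary. A minimal sketch of a copy-free alternative follows. BytesRef, view(), and value_after_equals() are hypothetical names used only for illustration, not the real StarRocks types or helpers; the idea is to build the view from the underlying pointer and length instead of from a temporary string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a non-owning slice type; not the real StarRocks class.
struct BytesRef {
    const char* data;
    std::size_t size;

    // Copies the bytes into a new std::string, returned by value.
    std::string to_string() const { return std::string(data, size); }

    // Non-owning view over the same bytes; valid only while the buffer lives.
    std::string_view view() const {
        assert(data != nullptr || size == 0);
        return std::string_view(data, size);
    }
};

// Hypothetical helper: returns the text after the first '=' in `pattern`,
// or an empty view if there is none. string_view::substr only adjusts a
// pointer and a length, so no characters are copied.
inline std::string_view value_after_equals(std::string_view pattern) {
    const std::size_t pos = pattern.find('=');
    if (pos == std::string_view::npos) {
        return {};
    }
    return pattern.substr(pos + 1);
}
```

Under these assumptions, value_after_equals(ref.view()) on a BytesRef over the buffer "k=10" yields a view of "10" that points into the original buffer rather than into a copy.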

be/src/exprs/math_functions.cpp (outdated, resolved)
be/src/exprs/math_functions.cpp (outdated, resolved)
be/src/exprs/math_functions.cpp (outdated, resolved)
be/src/storage/rowset/rowset_options.h (outdated, resolved)
be/src/storage/tablet_schema.h (outdated, resolved)
be/src/storage/rowset/segment_iterator.cpp (outdated, resolved)
be/src/storage/rowset/segment_iterator.cpp (outdated, resolved)
be/src/storage/rowset/segment_iterator.cpp (outdated, resolved)
be/src/storage/rowset/segment_iterator.cpp (outdated, resolved)
be/src/storage/rowset/segment_iterator.cpp (resolved)
be/src/storage/rowset/segment_iterator.cpp (outdated, resolved)
@yulongfufu yulongfufu force-pushed the support_select_vector branch 6 times, most recently from 232597f to f1dc60b, on August 26, 2024 17:28
decster previously approved these changes Sep 6, 2024

sonarcloud bot commented Sep 9, 2024

github-actions bot commented Sep 9, 2024

[BE Incremental Coverage Report]

pass : 272 / 335 (81.19%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/exec/olap_scan_node.cpp 0 6 00.00% [586, 587, 588, 589, 590, 591]
🔵 be/src/storage/index/vector/empty_index_reader.h 0 8 00.00% [21, 23, 25, 26, 29, 31, 34, 37]
🔵 be/src/storage/index/vector/vector_index_reader_factory.cpp 8 16 50.00% [26, 32, 33, 34, 35, 36, 37, 39]
🔵 be/src/storage/index/vector/vector_index_reader.h 1 2 50.00% [47]
🔵 be/src/storage/index/vector/tenann_index_reader.cpp 19 34 55.88% [49, 51, 52, 55, 56, 70, 71, 72, 74, 80, 81, 82, 92, 93, 94]
🔵 be/src/exprs/math_functions.cpp 51 58 87.93% [917, 921, 934, 950, 954, 981, 984]
🔵 be/src/storage/rowset/segment_iterator.cpp 118 133 88.72% [408, 409, 543, 544, 573, 612, 615, 616, 661, 668, 669, 671, 672, 673, 1514]
🔵 be/src/exec/pipeline/scan/olap_chunk_source.cpp 43 46 93.48% [241, 342, 552]
🔵 be/src/storage/rowset/rowset.cpp 2 2 100.00% []
🔵 be/src/storage/tablet_reader.cpp 4 4 100.00% []
🔵 be/src/storage/index/vector/tenann_index_reader.h 2 2 100.00% []
🔵 be/src/column/schema.cpp 3 3 100.00% []
🔵 be/src/storage/index/vector/vector_search_option.h 1 1 100.00% []
🔵 be/src/storage/index/vector/tenann/del_id_filter.h 1 1 100.00% []
🔵 be/src/storage/tablet_schema.cpp 8 8 100.00% []
🔵 be/src/column/chunk.cpp 8 8 100.00% []
🔵 be/src/storage/index/vector/tenann/del_id_filter.cpp 3 3 100.00% []

github-actions bot commented Sep 9, 2024

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

github-actions bot commented Sep 9, 2024

[FE Incremental Coverage Report]

pass : 248 / 267 (92.88%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/common/VectorSearchOptions.java 28 32 87.50% [72, 73, 108, 118]
🔵 com/starrocks/sql/optimizer/rule/transformation/RewriteToVectorPlanRule.java 140 155 90.32% [101, 177, 203, 229, 243, 244, 245, 246, 247, 249, 250, 251, 253, 254, 256]
🔵 com/starrocks/sql/optimizer/Optimizer.java 2 2 100.00% []
🔵 com/starrocks/catalog/Table.java 3 3 100.00% []
🔵 com/starrocks/sql/optimizer/operator/physical/PhysicalOlapScanOperator.java 6 6 100.00% []
🔵 com/starrocks/qe/SessionVariable.java 6 6 100.00% []
🔵 com/starrocks/sql/optimizer/rule/RuleSet.java 1 1 100.00% []
🔵 com/starrocks/planner/OlapScanNode.java 17 17 100.00% []
🔵 com/starrocks/catalog/FunctionSet.java 5 5 100.00% []
🔵 com/starrocks/sql/plan/PlanFragmentBuilder.java 1 1 100.00% []
🔵 com/starrocks/sql/optimizer/OptimizerContext.java 4 4 100.00% []
🔵 com/starrocks/catalog/OlapTable.java 1 1 100.00% []
🔵 com/starrocks/sql/optimizer/rule/RuleSetType.java 2 2 100.00% []
🔵 com/starrocks/sql/optimizer/operator/logical/LogicalOlapScanOperator.java 5 5 100.00% []
🔵 com/starrocks/sql/optimizer/rule/tree/AddDecodeNodeForDictStringRule.java 2 2 100.00% []
🔵 com/starrocks/sql/optimizer/rule/RuleType.java 1 1 100.00% []
🔵 com/starrocks/sql/StatementPlanner.java 24 24 100.00% []

@kangkaisen kangkaisen merged commit 406787f into StarRocks:main Sep 9, 2024
45 of 47 checks passed

sharfy commented Sep 10, 2024

@ZiheLiu
approx_l2_distance fails with an execution error when the query has no LIMIT clause (for example, without "limit 1").

CREATE TABLE t_test_vector_table (
id bigint(20) NOT NULL COMMENT "",
vector1 ARRAY<FLOAT> NOT NULL COMMENT "",
INDEX index_vector1 (vector1) USING VECTOR ("metric_type" = "l2_distance", "is_vector_normed" = "false", "M" = "512", "index_type" = "hnsw", "dim"="5")
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"enable_persistent_index" = "false",
"replicated_storage" = "false",
"compression" = "LZ4"
);
insert into t_test_vector_table values(1, [1,2,3,4,5]);
insert into t_test_vector_table values(2, [4,5,6,7,8]);

select id, approx_l2_distance([1,1,1,1,1], vector1) score ,vector1 from t_test_vector_table order by score

org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [1064] [42000]: Internal error: [2024-09-10 17:56:24] /root/tenann/tenann/searcher/faiss_hnsw_ann_searcher.cc:304: Error: Error in virtual void faiss::IndexHNSW::search(faiss::Index::idx_t, const float*, faiss::Index::idx_t, float*, faiss::Index::idx_t*, const faiss::SearchParameters*) const at /var/local/thirdparty/src/faiss-1.7.3/faiss/IndexHNSW.cpp:291: Error: 'k > 0' failed

at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:133)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeStatement(SQLQueryJob.java:615)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.lambda$2(SQLQueryJob.java:506)
at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:192)

yulongfufu commented Sep 10, 2024

select id, approx_l2_distance([1,1,1,1,1], vector1) score ,vector1 from t_test_vector_table order by score

@sharfy A vector index search is meaningless without a LIMIT k. That said, when LIMIT k is not set, a fallback should be implemented so that the query does not throw an error.
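
The fallback described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the StarRocks implementation: Row, squared_l2, and nearest_rows are hypothetical names, and the sketch uses an exact brute-force scan where the real code path would consult the vector index. It also reports squared L2 distance, which may differ from what approx_l2_distance returns. When k is not positive (no LIMIT), it returns every row ordered by distance instead of handing a non-positive k to the ANN searcher, which is the condition behind the 'k > 0' failure reported above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical row type for illustration: an id plus its embedding.
struct Row {
    std::int64_t id;
    std::vector<float> vec;
};

// Exact squared L2 distance between two vectors of equal dimension.
inline float squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns (id, squared distance) pairs ordered by ascending distance.
// k <= 0 means "no LIMIT": fall back to returning all rows rather than
// passing a non-positive k to an ANN searcher.
inline std::vector<std::pair<std::int64_t, float>> nearest_rows(
        const std::vector<Row>& rows, const std::vector<float>& query, std::int64_t k) {
    std::vector<std::pair<std::int64_t, float>> scored;
    scored.reserve(rows.size());
    for (const Row& row : rows) {
        scored.emplace_back(row.id, squared_l2(query, row.vec));
    }
    std::sort(scored.begin(), scored.end(),
              [](const auto& x, const auto& y) { return x.second < y.second; });
    // Truncate to the top k only when a positive limit was given.
    if (k > 0 && static_cast<std::size_t>(k) < scored.size()) {
        scored.resize(static_cast<std::size_t>(k));
    }
    return scored;
}
```

With the two rows from the report and the query [1,1,1,1,1], calling nearest_rows with k = 0 returns both rows, id 1 first (squared distance 30) and id 2 second (squared distance 135); with k = 1 it returns only id 1.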

HangyuanLiu pushed a commit to HangyuanLiu/starrocks that referenced this pull request Sep 12, 2024

Successfully merging this pull request may close these issues.

[Feature] Support vector index and ANNS.
7 participants