tiflash: support inverted index #20266
base: master
Signed-off-by: Lloyd-Pottiger <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
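An inverted index is an indexing technique commonly used in information retrieval. It splits text into individual words and builds a word → document ID index, so a search can quickly determine which documents contain a given word.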
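As a rough illustration of the word → document ID mapping, here is a minimal in-memory Python sketch. The tokenizer (lowercase, split on non-word characters) and the names `build_inverted_index` and `search` are hypothetical; real full-text engines use far more elaborate tokenization and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified tokenizer (assumption): lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], word: str) -> set[int]:
    # A single dictionary lookup replaces a scan over every document.
    return index.get(word.lower(), set())


docs = {
    1: "TiFlash stores data in columns",
    2: "TiKV stores data in rows",
    3: "Columns speed up analytic scans",
}
index = build_inverted_index(docs)
print(sorted(search(index, "columns")))  # [1, 3]
print(sorted(search(index, "stores")))   # [1, 2]
```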
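For numeric columns (integer, date, and time types), the stored structure can be simplified to a mapping from each value to its positions in the column (value → rowid). With such an inverted index, rows containing a specific value can be located quickly, which speeds up WHERE clause evaluation.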
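For a numeric column, the same idea reduces to a value → rowid mapping. The sketch below is a hypothetical in-memory illustration, not TiFlash's implementation: the class name `NumericInvertedIndex` and its methods are made up for this example. It keeps the distinct values sorted so that both equality and range predicates become binary searches (`bisect`) rather than full column scans.

```python
import bisect
from collections import defaultdict


class NumericInvertedIndex:
    """Hypothetical sketch: sorted distinct values, each mapped to its row IDs."""

    def __init__(self, column: list[int]):
        postings = defaultdict(list)
        for rowid, value in enumerate(column):
            postings[value].append(rowid)  # row IDs arrive in increasing order
        self.values = sorted(postings)  # distinct values, ascending
        self.postings = [postings[v] for v in self.values]

    def equal(self, value: int) -> list[int]:
        """Row IDs where column == value."""
        i = bisect.bisect_left(self.values, value)
        if i < len(self.values) and self.values[i] == value:
            return list(self.postings[i])
        return []

    def between(self, low: int, high: int) -> list[int]:
        """Row IDs whose value lies in the closed interval [low, high]."""
        lo = bisect.bisect_left(self.values, low)
        hi = bisect.bisect_right(self.values, high)
        return sorted(r for posting in self.postings[lo:hi] for r in posting)


column = [30, 10, 20, 10, 40, 20]
idx = NumericInvertedIndex(column)
print(idx.equal(10))        # [1, 3]
print(idx.between(15, 30))  # [0, 2, 5]
```

Keeping values sorted is what lets one structure serve both `=` and range predicates; a plain hash map would answer equality only.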
Could you also describe the scenarios for the inverted index? As a user, I may want to know in which cases I should build an inverted index and in which cases a traditional row index may be preferred. The more examples, the better.
Add a section:
## Applicable scenarios
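The numeric-column inverted index is built in TiFlash and supports fast filtering with =, !=, >, >=, <, <=, and IN on numeric and date/time types. It offers clear advantages in the following scenarios:
- The filter condition removes a large fraction of rows, but many rows still remain after filtering. In this case, TiFlash batch reads may outperform TiKV index lookups that must then fetch each matching row from the table.
- The query contains an IndexMerge or IndexJoin operator, but the TiKV index matches so many rows that performance suffers. Converting the IndexJoin into a HashJoin and pushing it down to TiFlash nodes lets MPP parallelism reduce query latency.
- The WHERE clause combines simple equality or range conditions with complex function-based filters. The numeric-column inverted index filters out, early, the rows that fail the simple equality and range conditions, which reduces how much work the complex function filters must do.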
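The following sketch illustrates, under simplifying assumptions, how the listed operators could map onto index lookups and how the third scenario saves work: the cheap indexed predicate narrows the candidate rows first, so the costly function filter runs only on the survivors. Everything here is hypothetical (the `ValueIndex` class, `filter_rows`, and the `value % 7 == 0` stand-in for a complex function); it is in-memory, assumes the column has no NULLs, and does not reflect how TiFlash actually stores or evaluates the index.

```python
import bisect
from collections import defaultdict
from typing import Callable


class ValueIndex:
    """Hypothetical numeric inverted index: sorted distinct values -> row IDs (no NULLs)."""

    def __init__(self, column: list[int]):
        postings = defaultdict(list)
        for rowid, value in enumerate(column):
            postings[value].append(rowid)
        self.values = sorted(postings)
        self.postings = [postings[v] for v in self.values]
        self.num_rows = len(column)

    def _rows(self, start: int, stop: int) -> set[int]:
        # Union of the posting lists for values[start:stop].
        return {r for posting in self.postings[start:stop] for r in posting}

    def lookup(self, op: str, operand) -> set[int]:
        v, n = self.values, len(self.values)
        if op == "=":
            return self._rows(bisect.bisect_left(v, operand), bisect.bisect_right(v, operand))
        if op == "!=":
            return set(range(self.num_rows)) - self.lookup("=", operand)
        if op == ">":
            return self._rows(bisect.bisect_right(v, operand), n)
        if op == ">=":
            return self._rows(bisect.bisect_left(v, operand), n)
        if op == "<":
            return self._rows(0, bisect.bisect_left(v, operand))
        if op == "<=":
            return self._rows(0, bisect.bisect_right(v, operand))
        if op == "in":
            return set().union(*(self.lookup("=", x) for x in operand))
        raise ValueError(f"unsupported operator: {op}")


def filter_rows(index: ValueIndex, op: str, operand,
                expensive: Callable[[int], bool], column: list[int]) -> list[int]:
    # Step 1: the cheap index lookup narrows the candidate rows.
    candidates = index.lookup(op, operand)
    # Step 2: the costly predicate runs only on rows that survived step 1.
    return sorted(r for r in candidates if expensive(column[r]))


evaluated = []  # records every value the expensive predicate is asked about


def expensive_predicate(value: int) -> bool:
    evaluated.append(value)
    return value % 7 == 0  # stand-in for a complex function filter


column = list(range(1000))
index = ValueIndex(column)
result = filter_rows(index, "<", 50, expensive_predicate, column)
print(result)          # [0, 7, 14, 21, 28, 35, 42, 49]
print(len(evaluated))  # 50
```

Without the index step, the expensive predicate would be evaluated on all 1000 rows; with it, only the 50 rows satisfying `value < 50` reach the predicate.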
Signed-off-by: Lloyd-Pottiger <[email protected]>
First-time contributors' checklist
What is changed, added or deleted? (Required)
Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions (in Chinese).
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?