Commit 318e0f5

Rewrite query language refdoc
1 parent 42d9c01 commit 318e0f5

2 files changed

+187
-76
lines changed

docs/get-started/query-language-intro.md

Lines changed: 0 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -23,23 +23,6 @@ app_name:tantivy
2323

2424
In many cases the field name can be omitted; Quickwit then uses the `default_search_fields` configured for the index.
2525

26-
### Addressing structured data
27-
28-
Data stored deep inside nested data structures like `object` or `json` fields can be addressed using dots as separators in the field name.
29-
For instance, the document `{"product": {"attributes": {"color": "red"}}}` is matched by
30-
```
31-
product.attributes.color:red
32-
```
33-
34-
If the keys of your object contain dots, the above syntax has some ambiguity : by default `{"k8s.component.name": "quickwit"}` will be matched by
35-
```k8s.component.name:quickwit```
36-
37-
It is possible to remove the ambiguity by setting expand_dots in the json field configuration.
38-
In that case, it will be necessary to escape the `.` in the query to match this document like this :
39-
```
40-
k8s\.component\.name:quickwit
41-
```
42-
4326
### Clauses Cheat Sheet
4427

4528
Quickwit supports various types of clauses to express different kinds of conditions. Here's a quick overview of them:

docs/reference/query-language.md

Lines changed: 187 additions & 59 deletions
Original file line numberDiff line numberDiff line change
@@ -1,108 +1,236 @@
11
---
2-
title: Query language
2+
title: Query Language Reference
33
sidebar_position: 40
44
---
55

6-
Quickwit uses a query mini-language which is used by providing a `query` parameter to the search endpoints.
6+
## Pseudo-grammar
7+
8+
```
9+
query = '(' query ')'
10+
| query operator query
11+
| unary_operator query
12+
| clause
13+
14+
operator = 'AND' | 'OR'
715
8-
### Terms
16+
unary_operator = 'NOT' | '-'
17+
18+
clause = field_name ':' field_clause
19+
| defaultable_clause
20+
| '*'
21+
22+
field_clause = term | term_prefix | term_set | phrase | phrase_prefix | range | '*'
23+
defaultable_clause = term | term_prefix | term_set | phrase | phrase_prefix
24+
```
25+
---
26+
## Writing Queries
27+
### Escaping Special Characters
928

10-
The `query` is parsed into a series of terms and operators. There are two types of terms: single terms such as “tantivy” and phrases which is a group of words surrounded by double quotes such as “hello world”.
29+
Special reserved characters are: `+`, `^`, `` ` ``, `:`, `{`, `}`, `"`, `[`, `]`, `(`, `)`, `~`, `!`, `\\`, `*`, `SPACE`. Such characters can still appear in query terms, but they need to be escaped with a backslash `\`.
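
As a minimal sketch of the escaping rule above, the hypothetical Python helper below prefixes each reserved character with a backslash. The name `escape_term` is an assumption made for illustration, `SPACE` is taken to mean a literal space character, and the helper does not settle where escaping is actually required (unquoted terms, field names, and so on).

```python
# Reserved characters as listed above; "SPACE" is assumed to be a literal space.
RESERVED = set('+^`:{}"[]()~!\\* ')


def escape_term(term):
    """Return `term` with every reserved character prefixed by a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in term)


print(escape_term("a+b"))
print(escape_term("hello world"))
```

A client could apply such a helper to user-supplied values before placing them in a `field:term` clause.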
1130

12-
Multiple terms can be combined together with Boolean operators `AND, OR` to form a more complex query. By default, terms will be combined with the `AND` operator.
31+
<!-- NEED CLARIFICATION: where is escaping necessary ? non-quoted terms ? field names ?-->
1332

14-
IP addresses can be provided as IpV4 or IpV6. It is recommended to use the same format as in the indexed documents.
33+
### Allowed characters in field names
34+
<!-- NEED CLARIFICATION: this should refer to a section of the index documentation that explains allowed field names -->
1535

16-
### Fields
36+
### Addressing nested structures
1737

18-
You can specify fields to search in the query by following the syntax `field_name:term`.
38+
Data stored deep inside nested data structures like `object` or `json` fields can be addressed using dots as separators in the field name.
39+
For instance, the document `{"product": {"attributes": {"color": "red"}}}` is matched by
40+
```
41+
product.attributes.color:red
42+
```
1943

20-
For example, let's assume an index that contains two fields, `title`, and `body` with `body` the default field. To search for the phrase “Barack Obama” in the title AND “president” in the body, you can enter:
44+
If the keys of your object contain dots, the above syntax is ambiguous: by default, `{"k8s.component.name": "quickwit"}` will be matched by
45+
```k8s.component.name:quickwit```
2146

47+
It is possible to remove the ambiguity by setting `expand_dots` in the json field configuration.
48+
In that case, the `.` must be escaped in the query to match this document, like this:
2249
```
23-
title:"barack obama" AND president
50+
k8s\.component\.name:quickwit
2451
```
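
The path syntax above can be sketched in Python as follows. This is an illustration only: `field_path` and its `escape_dots` flag are hypothetical names, and the sketch simply joins keys with dots, escaping literal dots inside a key for the case where `expand_dots` is set.

```python
def field_path(keys, escape_dots=True):
    """Join nested keys into a dotted field name.

    With escape_dots=True, a literal dot inside a key is written as `\\.`,
    which is the form described above for fields configured with expand_dots.
    """
    if escape_dots:
        keys = [key.replace(".", "\\.") for key in keys]
    return ".".join(keys)


print(field_path(["product", "attributes", "color"]) + ":red")
print(field_path(["k8s.component.name"]) + ":quickwit")
```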
2552

26-
Note that a query like `title:barack obama` will find only `barack` in the title and `obama` in the default fields. If no default field has been set on the index, this will result in an error.
53+
---
54+
55+
## Structured data
56+
### Datetime
57+
Datetime values must be provided in RFC 3339 format, such as `1970-01-01T00:00:00Z`.
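
For illustration, a timezone-aware Python `datetime` can be rendered in this format with the standard library alone. The helper name `to_rfc3339` is an assumption, and normalizing to UTC with a `Z` suffix is a choice made for this sketch, not a documented requirement.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
print(to_rfc3339(epoch))  # 1970-01-01T00:00:00Z

# A local time with a +02:00 offset is converted to UTC first.
local = datetime(2024, 1, 1, 2, 0, tzinfo=timezone(timedelta(hours=2)))
print(to_rfc3339(local))  # 2024-01-01T00:00:00Z
```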
2758

28-
### Searching structures nested in documents.
59+
### IP addresses
60+
IP addresses can be provided as IPv4 or IPv6. It is recommended to search with the format used when indexing documents.
61+
CIDR notation is not supported for searching a range of IP addresses, but you can use normal range queries.
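
As a hedged sketch of that workaround, a CIDR block can be converted client-side into an inclusive range clause with Python's `ipaddress` module. The helper name `cidr_to_range` and the field name `ip` are assumptions. Only IPv4 is shown, since IPv6 addresses contain `:`, which is a reserved character.

```python
import ipaddress


def cidr_to_range(field, cidr):
    """Build an inclusive range clause covering every address in a CIDR block."""
    net = ipaddress.ip_network(cidr, strict=False)
    return f"{field}:[{net.network_address} TO {net.broadcast_address}]"


print(cidr_to_range("ip", "127.0.0.0/26"))  # ip:[127.0.0.0 TO 127.0.0.63]
```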
2962

30-
Quickwit is designed to index structured data.
31-
If you search into some object nested into your document, whether it is an `object`, a `json` object, or whether it was caught through the `dynamic` mode, the query language is the same. You simply need to chain the different steps to reach your value from the root of the document.
63+
---
3264

33-
For instance, the document `{"product": {"attributes": {"color": "red"}}}` is returned if you query `product.attributes.color:red`.
65+
## Types of clauses
3466

35-
If a dot `.` exists in one of the key of your object, the above syntax has some ambiguity.
36-
For instance, by default, `{"k8s.component.name": "quickwit"}` will be matched by `k8s.component.name:quickwit`.
67+
### Term `field:term`
68+
```
69+
term = term_char+
70+
```
3771

38-
It is possible to remove the ambiguity by setting `expand_dots` in the json field configuration.
39-
In that case, it will be necessary to escape the `.` in the query to match this document.
72+
Matches documents if the targeted field contains a token equal to the provided term.
4073

41-
For instance, the above document will match the query `k8s\.component\.name:quickwit`.
74+
`field:value` will match any document where the field 'field' has a token 'value'.
4275

43-
### Boolean Operators
76+
### Term Prefix `field:prefix*`
77+
```
78+
term_prefix = term '*'
79+
```
4480

45-
Quickwit supports `AND`, `+`, `OR`, `NOT` and `-` as Boolean operators (case sensitive). By default, the `AND` is chosen, this means that if you omit it in a query like `title:"barack obama" president` Quickwit will interpret the query as `title:"barack obama" AND president`.
81+
Matches documents if the targeted field contains a token which starts with the provided value.
4682

47-
### Grouping boolean operators
83+
`field:quick*` will match any document where the field 'field' has a token like `quickwit` or `quickstart`, but not `qui` or `abcd`.
4884

49-
Quickwit supports parenthesis to group multiple clauses:
5085

86+
### Term set `field: IN [a b c]`
5187
```
52-
(color:red OR color:green) AND size:large
88+
term_set = 'IN' '[' term_list ']'
89+
term_list = term_list term
90+
| term
5391
```
92+
Matches if the document contains any of the tokens provided.
93+
94+
###### Examples
95+
`field: IN [ab cd]` will match 'ab' or 'cd', but nothing else.
96+
97+
###### Performance Note
98+
This is a lot like writing `field:ab OR field:cd`. When there are only a handful of terms to search for, using ORs is usually faster.
99+
When there are many values to match, a term set query can become more efficient.
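
To make the two equivalent forms concrete, the Python sketch below builds both query strings from the same list of terms. The helper names are hypothetical, and the terms are assumed to need no escaping.

```python
def term_set(field, terms):
    """Build a term set clause such as `field: IN [ab cd]`."""
    return f"{field}: IN [{' '.join(terms)}]"


def or_expansion(field, terms):
    """Build the equivalent chain of OR-ed term clauses."""
    return " OR ".join(f"{field}:{term}" for term in terms)


print(term_set("field", ["ab", "cd"]))      # field: IN [ab cd]
print(or_expansion("field", ["ab", "cd"]))  # field:ab OR field:cd
```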
54100

55-
### Slop Operator
101+
<!-- previously a field was required. It looks like it may no longer be the case -->
56102

57-
Quickwit also supports phrase queries with a slop parameter using the slop operator `~` followed by the value of the slop.
58-
The query will match phrases if its terms are separated by slop terms at most.
103+
### Phrase `field:"sequence of words"`
104+
```
105+
phrase = phrase_string
106+
| phrase_string slop
107+
phrase_string = '"' phrase_char '"'
108+
slop = '~' [0-9]+
109+
110+
```
59111

60-
The slop can be considered a budget between all terms. E.g. `"A B C"~1` matches `"A X B C"`, `"A B X C"`, but not `"A X B X C"`.
112+
Matches if the field contains the sequence of tokens provided. `field:"looks good to me"` will match any document containing that sequence of tokens.
113+
The field must have been configured with `record: position` when indexing.
61114

62-
Transposition costs 2, e.g. `"A B"~1` will not match `"B A"` but it would with `"A B"~2`.
115+
###### Slop operator
116+
It is also possible to add a slop, which allows matching a sequence with some distance between its tokens. For instance `"looks to me"~1` will match "looks good to me", but not "looks very good to me".
117+
Transposition costs 2, e.g. `"A B"~1` will not match `"B A"` but it would with `"A B"~2`.
63118
Transposition is not a special case, in the example above A is moved 1 position and B is moved 1 position, so the slop is 2.
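
The slop examples above can be reproduced with a simplified model, sketched here in Python. This is not Quickwit's implementation: it assumes whitespace tokenization and measures, for each way of placing the query tokens in the document, how far the tokens shift relative to one another. The names `min_slop` and `matches` are hypothetical.

```python
from itertools import product


def min_slop(query_tokens, doc_tokens):
    """Smallest slop under which the phrase matches, or None if it cannot match."""
    candidates = []
    for token in query_tokens:
        found = [i for i, t in enumerate(doc_tokens) if t == token]
        if not found:
            return None
        candidates.append(found)
    best = None
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # one document position cannot serve two query tokens
        # Shift of each token relative to its position in the query phrase.
        shifts = [pos - i for i, pos in enumerate(combo)]
        width = max(shifts) - min(shifts)
        if best is None or width < best:
            best = width
    return best


def matches(phrase, document, slop):
    needed = min_slop(phrase.split(), document.split())
    return needed is not None and needed <= slop


print(matches("looks to me", "looks good to me", 1))       # True
print(matches("looks to me", "looks very good to me", 1))  # False
print(matches("A B", "B A", 1))                            # False
print(matches("A B", "B A", 2))                            # True
```

In this model a transposition costs 2 because one token shifts by +1 and the other by -1.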
64119

65-
:::caution
66-
Slop queries can only be used on field indexed with the [record option](./../configuration/index-config.md#text-type) set to `position` value.
67-
:::
120+
### Phrase Prefix `field:"finish this phr"*`
121+
```
122+
phrase_prefix = phrase '*'
123+
```
124+
125+
Matches if the field contains the sequence of tokens provided, where the last token in the query may be only a prefix of the token in the document.
126+
127+
The field must have been configured with `record: position` when indexing.
128+
129+
There is no slop for phrase prefix queries.
68130

69-
### Set Operator
131+
###### Examples
132+
`field:"thanks for your contrib"*` will match 'thanks for your contribution'.
70133

71-
Quickwit supports `IN [value1 value2 ...]` as a set membership operator. This is more cpu efficient than the equivalent `OR`ing of many terms, but may download more of the split than `OR`ing, especially when only a few terms are searched. You must specify a field being searched for Set queries.
134+
###### Limitation
72135

73-
### Range queries
136+
Quickwit may trim some results matched by this clause in some cases. If you search for `"thanks for your co"*`, it will enumerate the first 50 tokens which start with "co", and search for any documents where "thanks for your" is followed by any of these tokens.
74137

75-
Range queries can only be executed on fields with a fast field. Currently only fields of type `ip` are supported.
138+
If there are many tokens starting with "co", "contribution" might not be one of the 50 selected tokens, and the query won't match a document containing "thanks for your contribution". Normal prefix queries don't suffer from this issue.
76139

140+
141+
<!-- NEEDS CLARIFICATION : what does "first 50 tokens" mean ? in what order ? can the value be tuned ? -->
142+
143+
144+
### Range `field:[low_bound TO high_bound}`
145+
```
146+
range = explicit_range | comparison_half_range
147+
148+
explicit_range = left_bound_char bounds right_bound_char
149+
left_bound_char = '[' | '{'
150+
right_bound_char = '}' | ']'
151+
bounds = term 'TO' term
152+
| term 'TO' '*'
153+
| '*' 'TO' term
154+
155+
comparison_half_range = comparison_operator term
156+
comparison_operator = '<' | '>' | '<=' | '>='
157+
```
158+
159+
Matches if the document contains a token between the provided bounds for that field.
160+
For range queries, you must provide a field. Quickwit won't use `default_search_fields` automatically.
161+
162+
###### Order
163+
For text fields, ranges are defined by lexicographic order. This means that, for a text field, 100 is between 1 and 2.
164+
When using ranges on integers, it behaves naturally.
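
The difference between the two orderings can be checked quickly in Python, whose string comparison (by code point) coincides with lexicographic order for ASCII digits. This only illustrates the ordering itself, not how Quickwit evaluates ranges; `in_inclusive_range` is a hypothetical helper.

```python
def in_inclusive_range(value, low, high):
    """True if low <= value <= high under the natural order of the operands."""
    return low <= value <= high


# Text order: "100" sorts between "1" and "2".
print(in_inclusive_range("100", "1", "2"))  # True

# Integer order: 100 is not between 1 and 2.
print(in_inclusive_range(100, 1, 2))        # False
```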
165+
166+
###### Inclusive and exclusive bounds
167+
Inclusive bounds are represented by square brackets `[]`. They will match tokens equal to the bound term.
168+
Exclusive bounds are represented by curly brackets `{}`. They will not match tokens equal to the bound term.
169+
170+
###### Half-Open bounds
171+
You can make a half-open range by using `*` as one of the bounds. `field:[b TO *]` will match 'bb' and 'zz', but not 'ab'.
172+
You can also use a comparison-based syntax: `field:<b`, `field:>b`, `field:<=b` or `field:>=b`.
173+
174+
<!-- NEEDS CLARIFICATION : ordering of empty values ? -->
175+
176+
###### Examples
77177
- Inclusive Range: `ip:[127.0.0.1 TO 127.0.0.50]`
78178
- Exclusive Range: `ip:{127.0.0.1 TO 127.0.0.50}`
79179
- Unbounded Inclusive Range: `ip:[127.0.0.1 TO *]` or `ip:>=127.0.0.1`
80180
- Unbounded Exclusive Range: `ip:{127.0.0.1 TO *]` or `ip:>127.0.0.1`
81181

82182

83-
#### Examples:
183+
### Exists `field:*`
84184

85-
With the following corpus:
86-
```json
87-
[
88-
{"id": 1, "body": "a red bike"},
89-
{"id": 2, "body": "a small blue bike"},
90-
{"id": 3, "body": "a small, rusty, and yellow bike"},
91-
{"id": 4, "body": "fred's small bike"},
92-
{"id": 5, "body": "a tiny shelter"}
93-
]
94-
```
95-
The following queries will output:
185+
Matches documents where the field is set. You must specify a field for this query; Quickwit won't use `default_search_fields` automatically.
96186

97-
- `body:"small bird"~2`: no match []
98-
- `body:"red bike"~2`: matches [1]
99-
- `body:"small blue bike"~3`: matches [2]
100-
- `body:"small bike"`: matches [4]
101-
- `body:"small bike"~1`: matches [2, 4]
102-
- `body:"small bike"~2`: matches [2, 4]
103-
- `body:"small bike"~3`: matches [2, 3, 4]
104-
- `body: IN [small tiny]`: matches [2, 3, 4, 5]
187+
### Match All `*`
105188

106-
### Escaping Special Characters
189+
Matches every document. You can't put a field in front. It is simply written as `*`.
190+
191+
---
192+
193+
## Building Queries
194+
Most queries are composed of more than one clause. When combining clauses, you may add operators between them.
195+
196+
If no operator is provided, `AND` is implicitly assumed.
197+
198+
### Conjunction `AND`
199+
An `AND` query will match only if both sides match.
107200

108-
Special reserved characters are: `+` , `^`, `` ` ``, `:`, `{`, `}`, `"`, `[`, `]`, `(`, `)`, `~`, `!`, `\\`, `*`, `SPACE`. Such characters can still appear in query terms, but they need to be escaped by an antislash `\` .
201+
<!-- TODO: Formal example ?*-->
202+
203+
### Disjunction `OR`
204+
An `OR` query will match if either (or both) sides match.
205+
206+
<!-- TODO: Formal example ?*-->
207+
208+
### Negation `NOT` or `-`
209+
A `NOT` query will match if the clause it is applied to does not match.
210+
The `-` prefix is equivalent to the `NOT` operator.
211+
212+
### Grouping `()`
213+
Parentheses are used to force the order of evaluation of operators.
214+
For instance, if a query should match if 'field1' is 'one' or 'two', and 'field2' is 'three', you can use `(field1:one OR field1:two) AND field2:three`.
215+
216+
### Operator Precedence
217+
Without parentheses, `AND` takes precedence over `OR`. That is, `a AND b OR c` is interpreted as `(a AND b) OR c`.
218+
219+
`NOT` and `-` take precedence over everything, such that `-a AND b` means `(-a) AND b`, not `-(a AND b)`.
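
The precedence rules above can be illustrated with a small recursive-descent parser, sketched here in Python. It is not Quickwit's parser: it treats each clause as an opaque token, handles only explicit `AND`, `OR`, `NOT`, `-` and parentheses (no implicit `AND`), raises `IndexError` on malformed input, and prints the fully parenthesized reading of a query. The name `parenthesize` is hypothetical.

```python
import re

# Parentheses, a leading "-", or any run of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\(|\)|-|[^\s()]+")


def parenthesize(query):
    """Return the query with explicit parentheses showing operator precedence."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() == "OR":
            take()
            node = f"({node} OR {parse_and()})"
        return node

    def parse_and():
        node = parse_unary()
        while peek() == "AND":
            take()
            node = f"({node} AND {parse_unary()})"
        return node

    def parse_unary():  # highest precedence
        if peek() in ("NOT", "-"):
            take()
            return f"(NOT {parse_unary()})"
        if peek() == "(":
            take()
            node = parse_or()
            take()  # consume the closing parenthesis
            return node
        return take()

    return parse_or()


print(parenthesize("a AND b OR c"))  # ((a AND b) OR c)
print(parenthesize("a OR b AND c"))  # (a OR (b AND c))
print(parenthesize("-a AND b"))      # ((NOT a) AND b)
```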
220+
221+
222+
---
223+
224+
## Other considerations
225+
226+
### Default Search Fields
227+
In many cases, you can omit the field you search if it is listed in the `default_search_fields` array of the index configuration.
228+
229+
<!-- NEED CLARIFICATION : default fields clauses behavior on an array is combined using OR or AND ? -->
230+
231+
### Tokenization
232+
Note that the result of a query can depend on the tokenizer used for the field being searched. Hence this document always speaks of tokens, which may be the exact value the document contains (in the case of the raw tokenizer), or a subset of it (for instance with any tokenizer that splits on spaces).
233+
234+
<!-- NOTE : should dig deeper ? -->
235+
Quickwit uses a query mini-language, which is passed through the `query` parameter of the search endpoints.
236+
<!-- todo also used in some place in ES: where? -->
