true in-place partial updates, custom ranking and more.</p>
+
+<strong>Examples and starting sample applications</strong>
+<p>
+There are many examples and starting applications on
+<a href="https://github.com/vespa-engine/sample-apps/">GitHub</a> and <a href="https://pyvespa.readthedocs.io/en/latest/examples.html">PyVespa examples</a>.
en/tutorials/hybrid-search.md: 49 additions & 46 deletions
@@ -135,10 +135,10 @@ schema doc {
     }

     field embedding type tensor<bfloat16>(v[384]) {
-        indexing: input title." ".input text | embed | attribute
-        attribute {
-            distance-metric: angular
-        }
+        indexing: input title." ".input text | embed | attribute
+        attribute {
+            distance-metric: angular
+        }
     }

     rank-profile bm25 {
@@ -149,7 +149,7 @@ schema doc {

     rank-profile semantic {
         inputs {
-            query(e) tensor<bfloat16>(v[384])
+            query(e) tensor<bfloat16>(v[384])
         }
         first-phase {
             expression: closeness(field, embedding)
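The `semantic` rank profile only defines how matches are scored; the `query(e)` tensor must be supplied at query time. As a minimal sketch of how the pieces fit together (the query text is just an example, and the parameters assume a single configured embedder and a working `vespa` CLI setup, neither of which is shown in this diff), a nearest-neighbor query against the `embedding` field could look like:

```
# Sketch: retrieve the 10 nearest neighbors of the embedded query text and
# score them with the closeness() expression in the semantic rank profile.
vespa query \
  'yql=select * from doc where {targetHits: 10}nearestNeighbor(embedding, e)' \
  'query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'input.query(e)=embed(@query)' \
  'ranking=semantic'
```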
@@ -170,7 +170,7 @@ The [string](../reference/schema-reference.html#string) data type represents bot
 and there are significant differences between [index and attribute](../text-matching.html#index-and-attribute). The above
 schema includes default `match` modes for `attribute` and `index` property for visibility.

-Note that we are enabling [BM25](../reference/bm25.html) for `title` and `text`.
+Note that we are enabling [BM25](../reference/bm25.html) for `title` and `text`
 by including `index: enable-bm25`. The language field is the only field that is not part of the NFCorpus dataset.
 We hardcode its value to "en" since the dataset is English. Using `set_language` avoids automatic language detection and uses the value when processing the other
 text fields. Read more in [linguistics](../linguistics.html).
@@ -189,9 +189,9 @@ Our `embedding` vector field is of [tensor](../tensor-user-guide.html) type with
 field embedding type tensor<bfloat16>(v[384]) {
     indexing: input title." ".input text | embed arctic | attribute
     attribute {
-        distance-metric: angular
+        distance-metric: angular
     }
-}
+}
 ```
 The `indexing` expression creates the input to the `embed` inference call (in our example the concatenation of the title and the text field). Since
 the dataset is small, we do not specify `index` which would build [HNSW](../approximate-nn-hnsw.html) data structures for faster (but approximate) vector search. This guide uses [snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs) as the text embedding model. The model is
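Since the embedding is produced at feed time by the `embed arctic` step, a quick sanity check is to retrieve the stored tensor for a single document. This is only a sketch; it assumes documents have already been fed and that the `vespa` CLI points at the running container:

```
# Sketch: fetch one document and include its embedding summary field,
# verifying that feed-time embed inference populated the tensor.
vespa query \
  'yql=select documentid, embedding from doc where true' \
  'hits=1'
```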
@@ -250,7 +250,7 @@ Some notes about the elements above:
 - `<container>` defines the [container cluster](../jdisc/index.html) for document, query and result processing
 - `<search>` sets up the [query endpoint](../query-api.html). The default port is 8080.
 - `<document-api>` sets up the [document endpoint](../reference/document-v1-api-reference.html) for feeding.
-- `component` with type `hugging-face-embedder` configures the embedder in the application package. This include where to fetch the model files from, the prepend
+- `component` with type `hugging-face-embedder` configures the embedder in the application package. This includes where to fetch the model files from, the prepend
 instructions, and the pooling strategy. See [huggingface-embedder](../embedding.html#huggingface-embedder) for details and other embedders supported.
 - `<content>` defines how documents are stored and searched
 - `<min-redundancy>` denotes how many copies to keep of each document.
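With `services.xml` and the schema in place, the application package has to be deployed before feeding and querying. A rough sketch, assuming the package lives in a directory named `app` (a placeholder) and the CLI target is already configured:

```
# Sketch: deploy the application package and wait up to 300 seconds
# for the configuration to converge.
vespa deploy --wait 300 app
```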
 The output of the feed operation should look like this (rates may vary depending on your machine HW):

 <pre>{% highlight json%}
 {
     "feeder.operation.count": 3633,
-    "feeder.seconds": 39.723,
+    "feeder.seconds": 148.515,
     "feeder.ok.count": 3633,
-    "feeder.ok.rate": 91.459,
+    "feeder.ok.rate": 24.462,
     "feeder.error.count": 0,
     "feeder.inflight.count": 0,
-    "http.request.count": 13157,
-    "http.request.bytes": 21102792,
-    "http.request.MBps": 0.531,
+    "http.request.count": 3633,
+    "http.request.bytes": 2985517,
+    "http.request.MBps": 0.020,
     "http.exception.count": 0,
-    "http.response.count": 13157,
-    "http.response.bytes": 1532828,
-    "http.response.MBps": 0.039,
-    "http.response.error.count": 9524,
-    "http.response.latency.millis.min": 0,
-    "http.response.latency.millis.avg": 1220,
-    "http.response.latency.millis.max": 13703,
+    "http.response.count": 3633,
+    "http.response.bytes": 348320,
+    "http.response.MBps": 0.002,
+    "http.response.error.count": 0,
+    "http.response.latency.millis.min": 316,
+    "http.response.latency.millis.avg": 787,
+    "http.response.latency.millis.max": 1704,
     "http.response.code.counts": {
-        "200": 3633,
-        "429": 9524
+        "200": 3633
     }
 }{% endhighlight %}</pre>

 Notice:

 - `feeder.ok.rate` which is the throughput (Note that this step includes embedding inference). See [embedder-performance](../embedding.html#embedder-performance) for details on embedding inference performance. In this case, embedding inference is the bottleneck for overall indexing throughput.
-- `http.response.code.counts` matches with `feeder.ok.count` - The dataset has 3633 documents. The `429` are harmless. Vespa asks the client
-to slow down the feed speed because of resource contention.
+- `http.response.code.counts` matches with `feeder.ok.count` - The dataset has 3633 documents. Note that if you observe any `429` responses, these are
+harmless. Vespa asks the client to slow down the feed speed because of resource contention.
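For reference, the summary above is the end-of-run report printed by the feed client. A sketch of the kind of invocation that produces it, assuming the NFCorpus documents were exported to a JSONL feed file (the file name is a placeholder):

```
# Sketch: feed the exported documents; embedding inference runs as part of
# indexing, so feeder.ok.rate also reflects embed throughput.
vespa feed vespa-feed.jsonl
```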


 ## Sample queries
@@ -356,14 +355,16 @@ We can now run a few sample queries to demonstrate various ways to perform searc
-$ ir_datasets export beir/nfcorpus/test queries --fields query_id text |head -1
+$ ir_datasets export beir/nfcorpus/test queries --fields query_id text |head -1
 </pre>
 </div>

 <pre>
 PLAIN-2 Do Cholesterol Statin Drugs Cause Breast Cancer?
 </pre>

+If you see a pipe-related error from the above command, you can safely ignore it.
+
 Here, `PLAIN-2` is the query id of the first test query. We'll use this test query to demonstrate querying Vespa.

 ### Lexical search with BM25 scoring
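The lexical run in this section scores matches with the `bm25` rank profile defined earlier. The exact query is not shown in this hunk, but as a sketch of what it can look like with the Vespa CLI (the parameters are assumptions based on the schema and the test query above):

```
# Sketch: match with the default weakAnd grammar via userQuery() and
# rank the matches with the bm25 rank profile.
vespa query \
  'yql=select * from doc where userQuery()' \
  'query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'ranking=bm25'
```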
@@ -393,7 +394,7 @@ This query returns the following [JSON result response](../reference/default-res
     "id": "toplevel",
     "relevance": 1.0,
     "fields": {
-        "totalCount": 65
+        "totalCount": 46
     },
     "coverage": {
         "coverage": 100,
@@ -423,7 +424,7 @@ This query returns the following [JSON result response](../reference/default-res
 {% endhighlight %}</pre>

 The query retrieves and ranks `MED-10` as the most relevant document—notice the `totalCount` which is the number of documents that were retrieved for ranking
-phases. In this case, we exposed 65 documents to first-phase ranking, it is higher than our target, but also fewer than the total number of documents that match any query terms.
+phases. In this case, we exposed about 50 documents to first-phase ranking, which is higher than our target, but still fewer than the total number of documents that match any query terms.

 In the example below, we change the grammar from the default `weakAnd` to `any`, and the query matches 1780, or almost 50% of the indexed documents.

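That example is not part of this hunk either, but one supported way to express the change is to annotate the query-text operator with a `grammar` annotation, as in the sketch below (same assumptions as the earlier query sketches):

```
# Sketch: match documents containing any of the query terms instead of
# using the default weakAnd retrieval.
vespa query \
  'yql=select * from doc where {grammar: "any"}userInput(@query)' \
  'query=Do Cholesterol Statin Drugs Cause Breast Cancer?' \
  'ranking=bm25'
```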
@@ -542,7 +543,7 @@ This query returns the following [JSON result response](../reference/default-res
 }{% endhighlight %}</pre>

 The result of this vector-based search differed from the previous sparse keyword search, with a different relevant document at position 1. In this case,
-the relevance score is 0.606 and calculated by the `closeness` function in the `semantic` rank-profile.
+the relevance score is 0.606 and is calculated by the `closeness` function in the `semantic` rank-profile. Note that more documents were retrieved than the `targetHits`.

 ```
 rank-profile semantic {
@@ -562,7 +563,7 @@ Note that similarity scores of embedding vectors are often optimized via contras

 ## Evaluate ranking accuracy

-The previous section demonstrated how to combine the Vespa query language with rank profiles to
+The previous section demonstrated how to combine the Vespa query language with rank profiles
 to implement two different retrieval and ranking strategies.

 In the following section we evaluate all 323 test queries with both models to compare their overall effectiveness, measured using [nDCG@10](https://en.wikipedia.org/wiki/Discounted_cumulative_gain). `nDCG@10` is the official evaluation metric of the BEIR benchmark and is an appropriate metric for test sets with graded relevance judgments.
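The evaluation script itself is not part of this diff. As a rough sketch of one way to compute `nDCG@10` once the ranked results for each query have been written to a TREC-format run file (the run file name is a placeholder, and this assumes `trec_eval` is installed and that both files use the TREC format):

```
# Sketch: export the graded relevance judgments for NFCorpus, then score
# a run file with nDCG cut at rank 10.
ir_datasets export beir/nfcorpus/test qrels > nfcorpus-qrels.txt
trec_eval -m ndcg_cut.10 nfcorpus-qrels.txt vespa-run.txt
```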
-Ranking metric NDCG@10 for rank profile hybrid-linear-normalize: 0.3356
+Ranking metric NDCG@10 for rank profile hybrid-linear-normalize: 0.3387
 </pre>

 On this particular dataset, the `hybrid-normalize-bm25-with-atan` rank profile performs the best, but the difference is small. This also demonstrates that hybrid search
 and ranking is a complex problem and that the effectiveness of the hybrid model depends on the dataset and the retrieval strategies.

 These results (i.e., which rank profile performs best) might not
-transfer to your specific retrieval use case and dataset, so it is important to evaluate the effectiveness of a hybrid model on your specific dataset and having
-your own relevance judgments.
+transfer to your specific retrieval use case and dataset, so it is important to evaluate the effectiveness of a hybrid model on
+your specific dataset.

 See [Improving retrieval with LLM-as-a-judge](https://blog.vespa.ai/improving-retrieval-with-llm-as-a-judge/) for more information on how to collect relevance judgments for your dataset.

 ### Summary

-In this tutorial, we demonstrated combining two retrieval strategies using the Vespa query language and how to expression hybriding ranking using the Vespa ranking framework.
+In this tutorial, we demonstrated combining two retrieval strategies using the Vespa query language and how to express hybrid ranking using the Vespa ranking framework.

 We showed how to express hybrid queries using the Vespa query language and how to combine the two retrieval strategies using the Vespa ranking framework. We also showed how to evaluate the effectiveness of the hybrid ranking model using one of the datasets that are part of the BEIR benchmark. We hope this tutorial has given you a good understanding of how to combine different retrieval strategies using Vespa, and that there is not a single silver bullet for all retrieval problems.