@@ -97,7 +97,7 @@ <h3 id="high-data-availability">High Data Availability</h3>
<p>
Ideally, the data is available and searchable at all times, even during node failures.
High availability costs resources due to data replication.
- How many replicas of the data to configure,
+ How many replicas of the data to configure
depends on what kind of availability guarantees the deployment should provide.
Configure availability vs cost:
</p>
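<p>
As a minimal sketch of this trade-off (cluster id, node count and values are illustrative,
not recommendations), replicas are configured per content cluster in services.xml:
</p>
<pre>
&lt;content id="search" version="1.0"&gt;
    &lt;redundancy&gt;3&lt;/redundancy&gt;  &lt;!-- three stored copies of every document --&gt;
    &lt;engine&gt;
        &lt;proton&gt;
            &lt;!-- two of the copies are also kept indexed, ready to serve queries --&gt;
            &lt;searchable-copies&gt;2&lt;/searchable-copies&gt;
        &lt;/proton&gt;
    &lt;/engine&gt;
    &lt;nodes&gt;
        &lt;node hostalias="node0" distribution-key="0"/&gt;
        &lt;node hostalias="node1" distribution-key="1"/&gt;
        &lt;node hostalias="node2" distribution-key="2"/&gt;
    &lt;/nodes&gt;
&lt;/content&gt;
</pre>
<p>
Higher <em>redundancy</em> buys tolerance for more simultaneous node failures at the cost of disk,
while higher <em>searchable-copies</em> additionally buys faster failover at the cost of index memory.
</p>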
@@ -143,7 +143,7 @@ <h3 id="content-node-database">Content node database</h3>
In a flat distributed system there is only one active instance of the same document,
while with grouped distribution there is one active instance per group.</li>
<li>The documents in the <b>Not Ready</b> DB are stored but not indexed.</li>
- <li>The documents in the <b>Removed</b> are stored but blocklisted, hidden from search.
+ <li>The documents in the <b>Removed</b> DB are stored but blocklisted, hidden from search.
The documents are permanently deleted from storage by
<a href="../proton.html#proton-maintenance-jobs">Proton maintenance jobs</a>.</li>
</ul>
@@ -156,8 +156,8 @@ <h3 id="content-node-database">Content node database</h3>
</p><p>
With <em>searchable-copies</em>=2 and <em>redundancy</em>=2,
each replica is fully indexed on separate content nodes.
- Only the documents in <em>Active</em> state is searchable,
- the posting lists for a given term is (up to) doubled as compared to <em>searchable-copies</em>=1.
+ Only the documents in <em>Active</em> state are searchable,
+ the posting lists for a given term are (up to) doubled as compared to <em>searchable-copies</em>=1.
</p><p>
See <a href="sizing-examples.html">Content cluster Sizing example deployments</a>
for examples using grouped and flat data distribution.
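<p>
For reference, a grouped layout is declared with nested group elements. In the sketch below
(group and host names are hypothetical), each of the two groups holds a full copy of the corpus:
</p>
<pre>
&lt;content id="search" version="1.0"&gt;
    &lt;redundancy&gt;2&lt;/redundancy&gt;
    &lt;group&gt;
        &lt;distribution partitions="1|*"/&gt;  &lt;!-- one full corpus copy per leaf group --&gt;
        &lt;group name="group0" distribution-key="0"&gt;
            &lt;node hostalias="node0" distribution-key="0"/&gt;
            &lt;node hostalias="node1" distribution-key="1"/&gt;
        &lt;/group&gt;
        &lt;group name="group1" distribution-key="1"&gt;
            &lt;node hostalias="node2" distribution-key="2"/&gt;
            &lt;node hostalias="node3" distribution-key="3"/&gt;
        &lt;/group&gt;
    &lt;/group&gt;
&lt;/content&gt;
</pre>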
@@ -201,7 +201,7 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
<li>Invokes chains of custom <a href="../jdisc/container-components.html">container components/plugins</a>
which can work on the request and query input and also the results.</li>
<li>Dispatching of query to content nodes in the content cluster(s) for parallel execution.
- With flat distribution, queries are dispatched to all content nodes
+ With flat distribution queries are dispatched to all content nodes,
while with a grouped distribution the query is dispatched to all content nodes within a group
and the queries are load-balanced between the groups using a
<a href="../reference/services-content.html#dispatch-policy">dispatch-policy</a>.</li>
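<p>
A sketch of how the group load-balancing policy is selected in services.xml
(<em>adaptive</em> is the default, <em>round-robin</em> the main alternative):
</p>
<pre>
&lt;content id="search" version="1.0"&gt;
    &lt;tuning&gt;
        &lt;dispatch&gt;
            &lt;!-- adaptive favors groups with lower observed latency --&gt;
            &lt;dispatch-policy&gt;adaptive&lt;/dispatch-policy&gt;
        &lt;/dispatch&gt;
    &lt;/tuning&gt;
    ...
&lt;/content&gt;
</pre>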
@@ -238,9 +238,9 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
<li>Build up the query tree from the serialized network representation.</li>
<li>Lookup the query terms in the index and B-tree dictionaries
and estimate the number of hits each term and parts of the query tree will produce.
- Terms which searches attribute fields without <a href="../attributes.html#fast-search">fast-search</a>
+ Terms which search attribute fields without <a href="../attributes.html#fast-search">fast-search</a>
will be given a hit count estimate equal to the total number of documents.</li>
- <li>Optimize and re-arrange the query tree for most efficient performance trying to move terms or
+ <li>Optimize and re-arrange the query tree for most efficient performance, trying to move terms or
operators with the lowest hit ratio estimate first in the query tree.</li>
<li>Prepare for query execution by fetching posting lists from the index and B-tree structures.</li>
<li>Multithreaded execution per search starts using the above information.
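<p>
A schema sketch (field name hypothetical) of enabling <em>fast-search</em>, which builds
dictionary structures for the attribute and thereby gives the planner real hit count estimates:
</p>
<pre>
field category type string {
    indexing: attribute | summary
    # fast-search builds dictionary structures for this attribute
    attribute: fast-search
}
</pre>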
@@ -255,9 +255,9 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
<p>
<a href="../jdisc/">Container</a> clusters are stateless and easy to scale horizontally,
and don't require any data distribution during re-sizing.
- The set of stateful content clusters can be scaled independently
+ The set of stateful content nodes can be scaled independently
and <a href="../elasticity.html">re-sized</a>, which requires re-distribution of data.
- Re-distribution of data in Vespa, is supported and designed to be done without significant serving impact.
+ Re-distribution of data in Vespa is supported and designed to be done without significant serving impact.
Altering the number of nodes or groups in a Vespa content cluster does not require re-feeding of the corpus,
so it's easy to start out with a sample prototype and scale it to production scale workloads.
</p>
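<p>
As an illustrative sketch (host aliases hypothetical), growing a flat cluster is just adding
node elements and redeploying; Vespa then re-distributes documents in the background:
</p>
<pre>
&lt;!-- before --&gt;
&lt;nodes&gt;
    &lt;node hostalias="node0" distribution-key="0"/&gt;
    &lt;node hostalias="node1" distribution-key="1"/&gt;
&lt;/nodes&gt;

&lt;!-- after: one more node, same document corpus, no re-feed needed --&gt;
&lt;nodes&gt;
    &lt;node hostalias="node0" distribution-key="0"/&gt;
    &lt;node hostalias="node1" distribution-key="1"/&gt;
    &lt;node hostalias="node2" distribution-key="2"/&gt;
&lt;/nodes&gt;
</pre>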
@@ -316,12 +316,12 @@ <h2 id="content-cluster-scalability-model">Content cluster scalability model</h2
</tr>
</table>
<p>
- Adding content nodes to content cluster (keeping the total document volume fixed) configured with flat distribution,
- reduces the dynamic query work per node (<em>DQW</em>)
+ Adding content nodes to a content cluster (keeping the total document volume fixed) with flat distribution
+ reduces the dynamic query work per node (<em>DQW</em>),
but does not reduce the static query work (<em>SQW</em>).
The overall system cost also increases as you need to rent another node.
</p><p>
- Since <em>DQW</em> depends and scales almost linearly with the number of documents on the content nodes,
+ Since <em>DQW</em> depends on and scales almost linearly with the number of documents on the content nodes,
one can try to distribute the work over more nodes.
<em>Amdahl's law</em> specifies that the maximum speedup one can achieve by parallelizing the
dynamic work (<em>DQW</em>) is given by the formula:
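<p>
A purely illustrative instantiation of this bound: if <em>DQW</em> is 90% of the per-query
work (<em>DQW/(DQW+SQW)</em> = 0.9), then no matter how many nodes are added,
</p>
<pre>
max speedup = 1 / (1 - DQW/(DQW+SQW)) = 1 / (1 - 0.9) = 10
</pre>
<p>
because the remaining 10% of static work (<em>SQW</em>) is repeated in full on every node.
</p>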
@@ -413,9 +413,9 @@ <h2 id="scaling-latency-in-a-content-group">Scaling latency in a content group</
<ul>
<li>
<p>
- For the yellow use case,
- the measured latency is almost independent of the total document volume. This is called sublinear latency scaling
- which calls for scaling up using better flavor specification instead of scaling out.
+ For the yellow use case the measured latency is almost independent of the total document volume.
+ This is called sublinear latency scaling, which calls for scaling up using better flavor
+ specification instead of scaling out.
</p>
<p>
The observed latency at 10M documents per node is almost the same as with 1M documents per node.
@@ -430,8 +430,7 @@ <h2 id="scaling-latency-in-a-content-group">Scaling latency in a content group</
</li>
<li>
<p>
- For the blue use case,
- the measured latency shows a clear correlation with the document volume.
+ For the blue use case the measured latency shows a clear correlation with the document volume.
This is a case where the dynamic query work portion is high,
and adding nodes to the flat group will reduce the serving latency.
The sweet spot is found where the targeted latency SLA is achieved.
@@ -455,7 +454,7 @@ <h3 id="reduce-latency-with-multi-threaded-per-search-execution">
<p>
It is possible to reduce latency of queries
where the <a href="#dynamic-query-work">dynamic query work</a> portion is high.
- Using multiple threads per search for a use case where the static query work is high,
+ Using multiple threads per search for a use case where the static query work is high
will be as wasteful as adding nodes to a flat distribution.
</p>
<figure>
@@ -482,7 +481,7 @@ <h3 id="reduce-latency-with-multi-threaded-per-search-execution">
<li>Sublinear approximate nearest neighbor search latency does not benefit from using more threads per search</li>
</ul>
<p>
- By default, the number of threads per search is one,
+ By default the number of threads per search is one,
as that gives the best resource usage measured as CPU resources used per query.
The optimal threads per search depends on the query use case,
and should be evaluated by benchmarking.
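<p>
A sketch of raising the per-search thread count in services.xml; the value 4 is only an
example to benchmark against, not a recommendation:
</p>
<pre>
&lt;content id="search" version="1.0"&gt;
    &lt;engine&gt;
        &lt;proton&gt;
            &lt;tuning&gt;
                &lt;searchnode&gt;
                    &lt;requestthreads&gt;
                        &lt;persearch&gt;4&lt;/persearch&gt;
                    &lt;/requestthreads&gt;
                &lt;/searchnode&gt;
            &lt;/tuning&gt;
        &lt;/proton&gt;
    &lt;/engine&gt;
    ...
&lt;/content&gt;
</pre>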
@@ -537,7 +536,7 @@ <h3 id="when-documents-are-too-large">When documents are too large</h4>
increase the amount of temporary memory required for complex ranking expressions like multi-dimensional ColBERT maxsim.
As documents are processed, indexed, stored and ranked as individual units, working on a few very large documents
at a time may not offer the system enough opportunity to parallelize, and can result in poor, uneven utilization
- of resources, and even a small fraction of very- large documents may impact your mean (and especially higher percentile)
+ of resources, and even a small fraction of very large documents may impact your mean (and especially higher percentile)
latencies both for processing and query execution.

<h3 id="too-small-documents">When documents are too small</h3>
@@ -565,11 +564,11 @@ <h2 id="scaling-document-volume-per-node">Scaling document volume per node</h2>
<p>
With the latency SLA in mind, benchmark with increasing number of documents per node
and watch system level metrics and Vespa metrics.
- If latency is within the stated latency SLA and the system meets the targeted sustained feed rate,
+ If latency is within the stated latency SLA and the system meets the targeted sustained feed rate,
overall cost is reduced by fitting more documents into each node
(e.g. by increasing memory, cpu and disk constraints set by the node flavor).
</p><p>
- With larger fan-out by using more nodes to partition the data overcomes also higher tail latency
+ Larger fan-out, using more nodes to partition the data, also comes with higher tail latency,
as search waits for all results from all nodes. Therefore, the overall execution time depends on
the slowest node at the time of the query. In such cases with large fan-out, using
<a href="../reference/services-content.html#coverage">adaptive timeout</a> is recommended
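<p>
A sketch of an adaptive timeout setup (factors illustrative): once the minimum coverage is
reached, here 95% of the corpus, the dispatcher only waits a bounded extra fraction of the
estimated full-coverage time for straggler nodes:
</p>
<pre>
&lt;content id="search" version="1.0"&gt;
    &lt;search&gt;
        &lt;coverage&gt;
            &lt;minimum&gt;0.95&lt;/minimum&gt;
            &lt;min-wait-after-coverage-factor&gt;0.2&lt;/min-wait-after-coverage-factor&gt;
            &lt;max-wait-after-coverage-factor&gt;0.3&lt;/max-wait-after-coverage-factor&gt;
        &lt;/coverage&gt;
    &lt;/search&gt;
    ...
&lt;/content&gt;
</pre>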
@@ -603,7 +602,7 @@ <h3 id="memory-usage-sizing">Memory usage sizing</h3>
<p>
The memory usage on a content node increases as the document volume increases.
The memory usage increases almost linearly with the number of documents.
- The Vespa vespa-proton-bin process (content node) uses the full 64-bit virtual address space,
+ The vespa-proton-bin process (content node) uses the full 64-bit virtual address space,
so the virtual memory usage reported might be high,
as both index and summary files are mapped into memory using mmap
and pages are paged into memory as needed.
@@ -630,7 +629,7 @@ <h2 id="scaling-throughput">Scaling Throughput</h2>
Also, that it has capacity to absorb load increases over time,
as well as having sufficient capacity to sustain node outages during peak traffic.
</p><p>
- At some throughput level, some resource(s) in the system will be fully saturated,
+ At some throughput level some resource(s) in the system will be fully saturated,
and requests will be queued up, causing latency to spike,
as requests are spending more time waiting for the saturated resource.
</p><p>