
Commit cada6d7

Merge pull request #3622 from vespa-engine/hmusum/minor-changes
Minor changes to language
2 parents b5b460f + 03a51eb commit cada6d7

File tree

1 file changed: +23, -24 lines


en/performance/sizing-search.html

Lines changed: 23 additions & 24 deletions
@@ -97,7 +97,7 @@ <h3 id="high-data-availability">High Data Availability</h3>
 <p>
 Ideally, the data is available and searchable at all times, even during node failures.
 High availability costs resources due to data replication.
-How many replicas of the data to configure,
+How many replicas of the data to configure
 depends on what kind of availability guarantees the deployment should provide.
 Configure availability vs cost:
 </p>
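As a hedged illustration of the availability-vs-cost knob this hunk edits, replica counts for a Vespa content cluster are set in services.xml. This sketch is not part of the diff; the cluster id, host aliases, and values are assumptions:

```xml
<!-- Sketch of services.xml replica configuration (values are assumptions) -->
<content id="search" version="1.0">
  <!-- redundancy: total stored replicas of each document -->
  <redundancy>2</redundancy>
  <engine>
    <proton>
      <!-- searchable-copies: replicas kept fully indexed (Ready DB) -->
      <searchable-copies>2</searchable-copies>
    </proton>
  </engine>
  <nodes>
    <node hostalias="node1" distribution-key="0"/>
    <node hostalias="node2" distribution-key="1"/>
  </nodes>
</content>
```

Setting searchable-copies lower than redundancy trades index memory cost for slower failover, since a Not Ready replica must be indexed before it can serve queries.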
@@ -143,7 +143,7 @@ <h3 id="content-node-database">Content node database</h3>
 In a flat distributed system there is only one active instance of the same document,
 while with grouped distribution there is one active instance per group.</li>
 <li>The documents in the <b>Not Ready</b> DB are stored but not indexed.</li>
-<li>The documents in the <b>Removed</b> are stored but blocklisted, hidden from search.
+<li>The documents in the <b>Removed</b> DB are stored but blocklisted, hidden from search.
 The documents are permanently deleted from storage by
 <a href="../proton.html#proton-maintenance-jobs">Proton maintenance jobs</a>.</li>
 </ul>
@@ -156,8 +156,8 @@ <h3 id="content-node-database">Content node database</h3>
 </p><p>
 With <em>searchable-copies</em>=2 and <em>redundancy</em>=2,
 each replica is fully indexed on separate content nodes.
-Only the documents in <em>Active</em> state is searchable,
-the posting lists for a given term is (up to) doubled as compared to <em>searchable-copies</em>=1.
+Only the documents in <em>Active</em> state are searchable,
+the posting lists for a given term are (up to) doubled as compared to <em>searchable-copies</em>=1.
 </p><p>
 See <a href="sizing-examples.html">Content cluster Sizing example deployments</a>
 for examples using grouped and flat data distribution.
@@ -201,7 +201,7 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <li>Invokes chains of custom <a href="../jdisc/container-components.html">container components/plugins</a>
 which can work on the request and query input and also the results.</li>
 <li>Dispatching of query to content nodes in the content cluster(s) for parallel execution.
-With flat distribution, queries are dispatched to all content nodes
+With flat distribution queries are dispatched to all content nodes,
 while with a grouped distribution the query is dispatched to all content nodes within a group
 and the queries are load-balanced between the groups using a
 <a href="../reference/services-content.html#dispatch-policy">dispatch-policy</a>.</li>
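The grouped-distribution setup this hunk describes might look as follows in services.xml; a hedged sketch (group names, host aliases, and the two-group layout are assumptions, not from the diff):

```xml
<!-- Sketch: grouped distribution with a dispatch policy (values are assumptions) -->
<content id="search" version="1.0">
  <redundancy>2</redundancy>
  <tuning>
    <dispatch>
      <!-- adaptive load-balances queries between groups -->
      <dispatch-policy>adaptive</dispatch-policy>
    </dispatch>
  </tuning>
  <group>
    <distribution partitions="1|*"/>
    <group name="group0" distribution-key="0">
      <node hostalias="node0" distribution-key="0"/>
    </group>
    <group name="group1" distribution-key="1">
      <node hostalias="node1" distribution-key="1"/>
    </group>
  </group>
</content>
```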
@@ -238,9 +238,9 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <li>Build up the query tree from the serialized network representation.</li>
 <li>Lookup the query terms in the index and B-tree dictionaries
 and estimate the number of hits each term and parts of the query tree will produce.
-Terms which searches attribute fields without <a href="../attributes.html#fast-search">fast-search</a>
+Terms which search attribute fields without <a href="../attributes.html#fast-search">fast-search</a>
 will be given a hit count estimate to the total number of documents.</li>
-<li>Optimize and re-arrange the query tree for most efficient performance trying to move terms or
+<li>Optimize and re-arrange the query tree for most efficient performance, trying to move terms or
 operators with the lowest hit ratio estimate first in the query tree.</li>
 <li>Prepare for query execution, by fetching posting lists from the index and B-tree structures.</li>
 <li>Multithreaded execution per search starts using the above information.
@@ -255,9 +255,9 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <p>
 <a href="../jdisc/">Container</a> clusters are stateless and easy to scale horizontally,
 and don't require any data distribution during re-sizing.
-The set of stateful content clusters can be scaled independently
+The set of stateful content nodes can be scaled independently
 and <a href="../elasticity.html">re-sized</a> which requires re-distribution of data.
-Re-distribution of data in Vespa, is supported and designed to be done without significant serving impact.
+Re-distribution of data in Vespa is supported and designed to be done without significant serving impact.
 Altering the number of nodes or groups in a Vespa content cluster does not require re-feeding of the corpus,
 so it's easy to start out with a sample prototype and scale it to production scale workloads.
 </p>
@@ -316,12 +316,12 @@ <h2 id="content-cluster-scalability-model">Content cluster scalability model</h2
 </tr>
 </table>
 <p>
-Adding content nodes to content cluster (keeping the total document volume fixed) configured with flat distribution,
-reduces the dynamic query work per node (<em>DQW</em>)
+Adding content nodes to a content cluster (keeping the total document volume fixed) with flat distribution
+reduces the dynamic query work per node (<em>DQW</em>),
 but does not reduce the static query work (<em>SQW</em>).
 The overall system cost also increases as you need to rent another node.
 </p><p>
-Since <em>DQW</em> depends and scales almost linearly with the number of documents on the content nodes,
+Since <em>DQW</em> depends and scales almost linearly with the number of documents on the content nodes,
 one can try to distribute the work over more nodes.
 <em>Amdahl's law</em> specifies that the maximum speedup one achieve by parallelizing the
 dynamic work (<em>DQW</em>) is given by the formula:
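The Amdahl's law argument in this hunk can be checked numerically; a minimal sketch (the 0.9 dynamic-work fraction and node counts are assumed examples, not Vespa numbers):

```python
def amdahl_speedup(parallel_fraction, n):
    """Maximum speedup when only the parallelizable part (here: DQW) is
    split over n nodes; the serial part (SQW) is 1 - parallel_fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# With 90% dynamic query work, 4 nodes give roughly 3x,
# and the ceiling is 1 / 0.1 = 10x no matter how many nodes are added.
print(round(amdahl_speedup(0.9, 4), 2))  # 3.08
print(round(amdahl_speedup(0.9, 1_000_000), 2))
```

This is why adding flat-distribution nodes stops paying off once SQW dominates: the serial fraction bounds the achievable speedup.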
@@ -413,9 +413,9 @@ <h2 id="scaling-latency-in-a-content-group">Scaling latency in a content group</
 <ul>
 <li>
 <p>
-For the yellow use case,
-the measured latency is almost independent of the total document volume. This is called sublinear latency scaling
-which calls for scaling up using better flavor specification instead of scaling out.
+For the yellow use case the measured latency is almost independent of the total document volume.
+This is called sublinear latency scaling, which calls for scaling up using better flavor
+specification instead of scaling out.
 </p>
 <p>
 The observed latency at 10M documents per node is almost the same as with 1M documents per node.
@@ -430,8 +430,7 @@ <h2 id="scaling-latency-in-a-content-group">Scaling latency in a content group</
 </li>
 <li>
 <p>
-For the blue use case,
-the measured latency shows a clear correlation with the document volume.
+For the blue use case the measured latency shows a clear correlation with the document volume.
 This is a case where the dynamic query work portion is high,
 and adding nodes to the flat group will reduce the serving latency.
 The sweet spot is found where targeted latency SLA is achieved.
@@ -455,7 +454,7 @@ <h3 id="reduce-latency-with-multi-threaded-per-search-execution">
 <p>
 It is possible to reduce latency of queries
 where the <a href="#dynamic-query-work">dynamic query work</a> portion is high.
-Using multiple threads per search for a use case where the static query work is high,
+Using multiple threads per search for a use case where the static query work is high
 will be as wasteful as adding nodes to a flat distribution.
 </p>
 <figure>
@@ -482,7 +481,7 @@ <h3 id="reduce-latency-with-multi-threaded-per-search-execution">
 <li>Sublinear approximate nearest neighbor search latency does not benefit from using more threads per search</li>
 </ul>
 <p>
-By default, the number of threads per search is one,
+By default the number of threads per search is one,
 as that gives the best resource usage measured as CPU resources used per query.
 The optimal threads per search depends on the query use case,
 and should be evaluated by benchmarking.
@@ -537,7 +536,7 @@ <h3 id="when-documents-are-too-large">When documents are too large</h4>
 increase the amount of temporary memory required for complex ranking expressions like multi-dimensional ColBert maxsim.
 As document are processed, indexed, stored and ranked as individual units, working on a few very large documents
 at a time may not offer the system enough opportunity to parallelize and result in poor, uneven utilization
-of resources, and even a small fraction of very-large documents may impact your mean (and especially higher percentile)
+of resources, and even a small fraction of very large documents may impact your mean (and especially higher percentile)
 latencies both for processing and query execution.

 <h3 id="too-small-documents">When documents are too small</h4>
@@ -565,11 +564,11 @@ <h2 id="scaling-document-volume-per-node">Scaling document volume per node</h2>
 <p>
 With the latency SLA in mind, benchmark with increasing number of documents per node
 and watch system level metrics and Vespa metrics.
-If latency is within the stated latency SLA and the system meets the targeted sustained feed rate,
+If latency is within the stated latency SLA and the system meets the targeted sustained feed rate,
 overall cost is reduced by fitting more documents into each node
 (e.g. by increasing memory, cpu and disk constraints set by the node flavor).
 </p><p>
-With larger fan-out by using more nodes to partition the data overcomes also higher tail latency
+With larger fan-out using more nodes to partition the data also overcomes higher tail latency
 as search waits for all results from all nodes. Therefore, the overall execution time depends on
 the slowest node at the time of the query. In such cases with large fan-out, using
 <a href="../reference/services-content.html#coverage">adaptive timeout</a> is recommended
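A sketch of the coverage tuning referenced in this hunk, for large fan-out deployments (element names follow the services-content reference linked above; all values here are assumptions):

```xml
<!-- Sketch: adaptive timeout via coverage tuning (values are assumptions) -->
<content id="search" version="1.0">
  <redundancy>1</redundancy>
  <search>
    <coverage>
      <!-- accept results once 95% of the documents have been searched... -->
      <minimum>0.95</minimum>
      <!-- ...and bound the extra wait for the slowest nodes -->
      <min-wait-after-coverage-factor>0.2</min-wait-after-coverage-factor>
      <max-wait-after-coverage-factor>0.3</max-wait-after-coverage-factor>
    </coverage>
  </search>
</content>
```

The trade-off is explicit: lower tail latency in exchange for occasionally returning results from less than full corpus coverage.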
@@ -603,7 +602,7 @@ <h3 id="memory-usage-sizing">Memory usage sizing</h3>
 <p>
 The memory usage on a content node increases as the document volume increases.
 The memory usage increases almost linearly with the number of documents.
-The Vespa vespa-proton-bin process (content node) uses the full 64-bit virtual address space,
+The vespa-proton-bin process (content node) uses the full 64-bit virtual address space,
 so the virtual memory usage reported might be high,
 as both index and summary files are mapped into memory using mmap
 and pages are paged into memory as needed.
@@ -630,7 +629,7 @@ <h2 id="scaling-throughput">Scaling Throughput</h2>
 Also, that it has capacity to absorb load increases over time,
 as well as having sufficient capacity to sustain node outages during peak traffic.
 </p><p>
-At some throughput level, some resource(s) in the system will be fully saturated,
+At some throughput level some resource(s) in the system will be fully saturated,
 and requests will be queued up causing latency to spike up,
 as requests are spending more time waiting for the saturated resource.
 </p><p>
