Merge pull request #3622 from vespa-engine/hmusum/minor-changes
Minor changes to language
kkraune authored Feb 10, 2025
2 parents b5b460f + 03a51eb commit cada6d7
Showing 1 changed file with 23 additions and 24 deletions.
47 changes: 23 additions & 24 deletions en/performance/sizing-search.html
@@ -97,7 +97,7 @@ <h3 id="high-data-availability">High Data Availability</h3>
 <p>
 Ideally, the data is available and searchable at all times, even during node failures.
 High availability costs resources due to data replication.
-How many replicas of the data to configure,
+How many replicas of the data to configure
 depends on what kind of availability guarantees the deployment should provide.
 Configure availability vs cost:
 </p>
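
Both availability knobs are set on the content cluster in services.xml. A minimal sketch, assuming a cluster id of "music" and hostaliases that are purely illustrative:

  <content id="music" version="1.0">
    <!-- redundancy: number of stored copies of each document -->
    <redundancy>2</redundancy>
    <engine>
      <proton>
        <!-- searchable-copies: how many of those copies are kept indexed -->
        <searchable-copies>2</searchable-copies>
      </proton>
    </engine>
    <documents>
      <document type="music" mode="index"/>
    </documents>
    <nodes>
      <node hostalias="node0" distribution-key="0"/>
      <node hostalias="node1" distribution-key="1"/>
    </nodes>
  </content>

Setting searchable-copies lower than redundancy keeps the extra copies stored but unindexed (the Not Ready DB below), trading slower failover for less memory and disk.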
@@ -143,7 +143,7 @@ <h3 id="content-node-database">Content node database</h3>
 In a flat distributed system there is only one active instance of the same document,
 while with grouped distribution there is one active instance per group.</li>
 <li>The documents in the <b>Not Ready</b> DB are stored but not indexed.</li>
-<li>The documents in the <b>Removed</b> are stored but blocklisted, hidden from search.
+<li>The documents in the <b>Removed</b> DB are stored but blocklisted, hidden from search.
 The documents are permanently deleted from storage by
 <a href="../proton.html#proton-maintenance-jobs">Proton maintenance jobs</a>.</li>
 </ul>
@@ -156,8 +156,8 @@ <h3 id="content-node-database">Content node database</h3>
 </p><p>
 With <em>searchable-copies</em>=2 and <em>redundancy</em>=2,
 each replica is fully indexed on separate content nodes.
-Only the documents in <em>Active</em> state is searchable,
-the posting lists for a given term is (up to) doubled as compared to <em>searchable-copies</em>=1.
+Only the documents in <em>Active</em> state are searchable,
+the posting lists for a given term are (up to) doubled as compared to <em>searchable-copies</em>=1.
 </p><p>
 See <a href="sizing-examples.html">Content cluster Sizing example deployments</a>
 for examples using grouped and flat data distribution.
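
For the grouped case, groups are declared by nesting group elements on the content cluster; a rough sketch with two groups of two nodes each (names, hostaliases and the partitions expression are illustrative, following the services-content reference):

  <content id="music" version="1.0">
    <redundancy>2</redundancy>
    <group>
      <!-- one full, searchable copy per group -->
      <distribution partitions="1|*"/>
      <group name="group0" distribution-key="0">
        <node hostalias="node0" distribution-key="0"/>
        <node hostalias="node1" distribution-key="1"/>
      </group>
      <group name="group1" distribution-key="1">
        <node hostalias="node2" distribution-key="2"/>
        <node hostalias="node3" distribution-key="3"/>
      </group>
    </group>
  </content>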
@@ -201,7 +201,7 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <li>Invokes chains of custom <a href="../jdisc/container-components.html">container components/plugins</a>
 which can work on the request and query input and also the results.</li>
 <li>Dispatching of the query to content nodes in the content cluster(s) for parallel execution.
-With flat distribution, queries are dispatched to all content nodes
+With flat distribution queries are dispatched to all content nodes,
 while with a grouped distribution the query is dispatched to all content nodes within a group
 and the queries are load-balanced between the groups using a
 <a href="../reference/services-content.html#dispatch-policy">dispatch-policy</a>.</li>
@@ -238,9 +238,9 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <li>Build up the query tree from the serialized network representation.</li>
 <li>Lookup the query terms in the index and B-tree dictionaries
 and estimate the number of hits each term and parts of the query tree will produce.
-Terms which searches attribute fields without <a href="../attributes.html#fast-search">fast-search</a>
+Terms which search attribute fields without <a href="../attributes.html#fast-search">fast-search</a>
 will be given a hit count estimate equal to the total number of documents.</li>
-<li>Optimize and re-arrange the query tree for most efficient performance trying to move terms or
+<li>Optimize and re-arrange the query tree for most efficient performance, trying to move terms or
 operators with the lowest hit ratio estimate first in the query tree.</li>
 <li>Prepare for query execution, by fetching posting lists from the index and B-tree structures.</li>
 <li>Multithreaded execution per search starts using the above information.
@@ -255,9 +255,9 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <p>
 <a href="../jdisc/">Container</a> clusters are stateless and easy to scale horizontally,
 and don't require any data distribution during re-sizing.
-The set of stateful content clusters can be scaled independently
+The set of stateful content nodes can be scaled independently
 and <a href="../elasticity.html">re-sized</a> which requires re-distribution of data.
-Re-distribution of data in Vespa, is supported and designed to be done without significant serving impact.
+Re-distribution of data in Vespa is supported and designed to be done without significant serving impact.
 Altering the number of nodes or groups in a Vespa content cluster does not require re-feeding of the corpus,
 so it's easy to start out with a sample prototype and scale it to production scale workloads.
 </p>
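
Re-sizing thus amounts to editing the node set in services.xml and redeploying; a sketch growing a flat cluster from two to three nodes (hostaliases illustrative), after which data re-distribution proceeds in the background:

  <content id="music" version="1.0">
    <redundancy>2</redundancy>
    <nodes>
      <node hostalias="node0" distribution-key="0"/>
      <node hostalias="node1" distribution-key="1"/>
      <!-- added node: data migrates to it automatically, no re-feed needed -->
      <node hostalias="node2" distribution-key="2"/>
    </nodes>
  </content>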
@@ -316,12 +316,12 @@ <h2 id="content-cluster-scalability-model">Content cluster scalability model</h2>
 </tr>
 </table>
 <p>
-Adding content nodes to content cluster (keeping the total document volume fixed) configured with flat distribution,
-reduces the dynamic query work per node (<em>DQW</em>)
+Adding content nodes to a content cluster (keeping the total document volume fixed) with flat distribution
+reduces the dynamic query work per node (<em>DQW</em>),
 but does not reduce the static query work (<em>SQW</em>).
 The overall system cost also increases as you need to rent another node.
 </p><p>
-Since <em>DQW</em> depends and scales almost linearly with the number of documents on the content nodes,
+Since <em>DQW</em> depends and scales almost linearly with the number of documents on the content nodes,
 one can try to distribute the work over more nodes.
 <em>Amdahl's law</em> specifies that the maximum speedup one can achieve by parallelizing the
 dynamic work (<em>DQW</em>) is given by the formula:
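
The hunk is cut off before the formula itself; for reference, the standard statement of Amdahl's law, writing p for the parallelizable fraction of the per-query work, is:

  \mathrm{speedup}(N) = \frac{1}{(1 - p) + p/N},
  \qquad p = \frac{DQW}{DQW + SQW},
  \qquad \lim_{N \to \infty} \mathrm{speedup}(N) = \frac{1}{1 - p} = \frac{DQW + SQW}{SQW}

So when SQW dominates, adding nodes to a flat group buys little.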
@@ -413,9 +413,9 @@ <h2 id="scaling-latency-in-a-content-group">Scaling latency in a content group</h2>
 <ul>
 <li>
 <p>
-For the yellow use case,
-the measured latency is almost independent of the total document volume. This is called sublinear latency scaling
-which calls for scaling up using better flavor specification instead of scaling out.
+For the yellow use case the measured latency is almost independent of the total document volume.
+This is called sublinear latency scaling, which calls for scaling up using better flavor
+specification instead of scaling out.
 </p>
 <p>
 The observed latency at 10M documents per node is almost the same as with 1M documents per node.
@@ -430,8 +430,7 @@ <h2 id="scaling-latency-in-a-content-group">Scaling latency in a content group</h2>
 </li>
 <li>
 <p>
-For the blue use case,
-the measured latency shows a clear correlation with the document volume.
+For the blue use case the measured latency shows a clear correlation with the document volume.
 This is a case where the dynamic query work portion is high,
 and adding nodes to the flat group will reduce the serving latency.
 The sweet spot is found where targeted latency SLA is achieved.
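
To make the sweet spot concrete with illustrative numbers (not from the original): suppose a single node spends SQW = 20 ms and DQW = 80 ms per query. Under the scalability model above, partitioning the same documents over N nodes shrinks only the dynamic part:

  \mathrm{latency}(N) \approx SQW + \frac{DQW}{N} = 20\,\mathrm{ms} + \frac{80\,\mathrm{ms}}{N}
  \quad\Rightarrow\quad \mathrm{latency}(2) \approx 60\,\mathrm{ms},\;
  \mathrm{latency}(4) \approx 40\,\mathrm{ms},\;
  \mathrm{latency}(N \to \infty) \to 20\,\mathrm{ms}

With a 50 ms latency SLA, three nodes would already be the sweet spot in this toy calculation.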
@@ -455,7 +455,7 @@ <h3 id="reduce-latency-with-multi-threaded-per-search-execution">
 <p>
 It is possible to reduce latency of queries
 where the <a href="#dynamic-query-work">dynamic query work</a> portion is high.
-Using multiple threads per search for a use case where the static query work is high,
+Using multiple threads per search for a use case where the static query work is high
 will be as wasteful as adding nodes to a flat distribution.
 </p>
 <figure>
@@ -482,7 +482,7 @@ <h3 id="reduce-latency-with-multi-threaded-per-search-execution">
 <li>Sublinear approximate nearest neighbor search latency does not benefit from using more threads per search</li>
 </ul>
 <p>
-By default, the number of threads per search is one,
+By default the number of threads per search is one,
 as that gives the best resource usage measured as CPU resources used per query.
 The optimal threads per search depends on the query use case,
 and should be evaluated by benchmarking.
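
Threads per search is a content-cluster tuning value; a minimal sketch, assuming the requestthreads element from the services-content reference:

  <content id="music" version="1.0">
    <redundancy>2</redundancy>
    <engine>
      <proton>
        <tuning>
          <searchnode>
            <requestthreads>
              <!-- threads used to evaluate a single query; the default is 1 -->
              <persearch>4</persearch>
            </requestthreads>
          </searchnode>
        </tuning>
      </proton>
    </engine>
  </content>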
@@ -537,7 +537,7 @@ <h3 id="when-documents-are-too-large">When documents are too large</h3>
 increase the amount of temporary memory required for complex ranking expressions like multi-dimensional ColBert maxsim.
 As documents are processed, indexed, stored and ranked as individual units, working on a few very large documents
 at a time may not offer the system enough opportunity to parallelize and result in poor, uneven utilization
-of resources, and even a small fraction of very-large documents may impact your mean (and especially higher percentile)
+of resources, and even a small fraction of very large documents may impact your mean (and especially higher percentile)
 latencies both for processing and query execution.

 <h3 id="too-small-documents">When documents are too small</h3>
@@ -565,11 +565,11 @@ <h2 id="scaling-document-volume-per-node">Scaling document volume per node</h2>
 <p>
 With the latency SLA in mind, benchmark with increasing number of documents per node
 and watch system level metrics and Vespa metrics.
-If latency is within the stated latency SLA and the system meets the targeted sustained feed rate,
+If latency is within the stated latency SLA and the system meets the targeted sustained feed rate,
 overall cost is reduced by fitting more documents into each node
 (e.g. by increasing memory, cpu and disk constraints set by the node flavor).
 </p><p>
-With larger fan-out by using more nodes to partition the data overcomes also higher tail latency
+With larger fan-out using more nodes to partition the data also overcomes higher tail latency
 as search waits for all results from all nodes. Therefore, the overall execution time depends on
 the slowest node at the time of the query. In such cases with large fan-out, using
 <a href="../reference/services-content.html#coverage">adaptive timeout</a> is recommended
@@ -603,7 +603,7 @@ <h3 id="memory-usage-sizing">Memory usage sizing</h3>
 <p>
 The memory usage on a content node increases as the document volume increases.
 The memory usage increases almost linearly with the number of documents.
-The Vespa vespa-proton-bin process (content node) uses the full 64-bit virtual address space,
+The vespa-proton-bin process (content node) uses the full 64-bit virtual address space,
 so the virtual memory usage reported might be high,
 as both index and summary files are mapped into memory using mmap
 and pages are paged into memory as needed.
@@ -630,7 +630,7 @@ <h2 id="scaling-throughput">Scaling Throughput</h2>
 Also, that it has capacity to absorb load increases over time,
 as well as having sufficient capacity to sustain node outages during peak traffic.
 </p><p>
-At some throughput level, some resource(s) in the system will be fully saturated,
+At some throughput level some resource(s) in the system will be fully saturated,
 and requests will be queued up causing latency to spike up,
 as requests are spending more time waiting for the saturated resource.
 </p><p>
