Merge pull request #3622 from vespa-engine/hmusum/minor-changes
Minor changes to language
kkraune authored Feb 10, 2025
2 parents b5b460f + 03a51eb commit cada6d7
Showing 1 changed file with 23 additions and 24 deletions.
47 changes: 23 additions & 24 deletions en/performance/sizing-search.html
@@ -97,7 +97,7 @@ <h3 id="high-data-availability">High Data Availability</h3>
 <p>
 Ideally, the data is available and searchable at all times, even during node failures.
 High availability costs resources due to data replication.
-How many replicas of the data to configure,
+How many replicas of the data to configure
 depends on what kind of availability guarantees the deployment should provide.
 Configure availability vs cost:
 </p>
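
Both availability knobs are set on the content cluster in services.xml. A minimal sketch, assuming a cluster id of "music" and hostaliases that are purely illustrative:

  <content id="music" version="1.0">
    <!-- redundancy: number of stored copies of each document -->
    <redundancy>2</redundancy>
    <engine>
      <proton>
        <!-- searchable-copies: how many of those copies are kept indexed -->
        <searchable-copies>2</searchable-copies>
      </proton>
    </engine>
    <documents>
      <document type="music" mode="index"/>
    </documents>
    <nodes>
      <node hostalias="node0" distribution-key="0"/>
      <node hostalias="node1" distribution-key="1"/>
    </nodes>
  </content>

Setting searchable-copies lower than redundancy keeps the extra copies stored but unindexed (the Not Ready DB below), trading slower failover for less memory and disk.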
@@ -143,7 +143,7 @@ <h3 id="content-node-database">Content node database</h3>
 In a flat distributed system there is only one active instance of the same document,
 while with grouped distribution there is one active instance per group.</li>
 <li>The documents in the <b>Not Ready</b> DB are stored but not indexed.</li>
-<li>The documents in the <b>Removed</b> are stored but blocklisted, hidden from search.
+<li>The documents in the <b>Removed</b> DB are stored but blocklisted, hidden from search.
 The documents are permanently deleted from storage by
 <a href="../proton.html#proton-maintenance-jobs">Proton maintenance jobs</a>.</li>
 </ul>
@@ -156,8 +156,8 @@ <h3 id="content-node-database">Content node database</h3>
 </p><p>
 With <em>searchable-copies</em>=2 and <em>redundancy</em>=2,
 each replica is fully indexed on separate content nodes.
-Only the documents in <em>Active</em> state is searchable,
-the posting lists for a given term is (up to) doubled as compared to <em>searchable-copies</em>=1.
+Only the documents in <em>Active</em> state are searchable,
+the posting lists for a given term are (up to) doubled as compared to <em>searchable-copies</em>=1.
 </p><p>
 See <a href="sizing-examples.html">Content cluster Sizing example deployments</a>
 for examples using grouped and flat data distribution.
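
For the grouped case, groups are declared by nesting group elements on the content cluster; a rough sketch with two groups of two nodes each (names, hostaliases and the partitions expression are illustrative, following the services-content reference):

  <content id="music" version="1.0">
    <redundancy>2</redundancy>
    <group>
      <!-- one full, searchable copy per group -->
      <distribution partitions="1|*"/>
      <group name="group0" distribution-key="0">
        <node hostalias="node0" distribution-key="0"/>
        <node hostalias="node1" distribution-key="1"/>
      </group>
      <group name="group1" distribution-key="1">
        <node hostalias="node2" distribution-key="2"/>
        <node hostalias="node3" distribution-key="3"/>
      </group>
    </group>
  </content>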
@@ -201,7 +201,7 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <li>Invokes chains of custom <a href="../jdisc/container-components.html">container components/plugins</a>
 which can work on the request and query input and also the results.</li>
 <li>Dispatching of the query to content nodes in the content cluster(s) for parallel execution.
-With flat distribution, queries are dispatched to all content nodes
+With flat distribution queries are dispatched to all content nodes,
 while with a grouped distribution the query is dispatched to all content nodes within a group
 and the queries are load-balanced between the groups using a
 <a href="../reference/services-content.html#dispatch-policy">dispatch-policy</a>.</li>
@@ -238,9 +238,9 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <li>Build up the query tree from the serialized network representation.</li>
 <li>Lookup the query terms in the index and B-tree dictionaries
 and estimate the number of hits each term and parts of the query tree will produce.
-Terms which searches attribute fields without <a href="../attributes.html#fast-search">fast-search</a>
+Terms which search attribute fields without <a href="../attributes.html#fast-search">fast-search</a>
 will be given a hit count estimate equal to the total number of documents.</li>
-<li>Optimize and re-arrange the query tree for most efficient performance trying to move terms or
+<li>Optimize and re-arrange the query tree for most efficient performance, trying to move terms or
 operators with the lowest hit ratio estimate first in the query tree.</li>
 <li>Prepare for query execution, by fetching posting lists from the index and B-tree structures.</li>
 <li>Multithreaded execution per search starts using the above information.
@@ -255,9 +255,9 @@ <h2 id="life-of-a-query-in-vespa">Life of a query in Vespa</h2>
 <p>
 <a href="../jdisc/">Container</a> clusters are stateless and easy to scale horizontally,
 and don't require any data distribution during re-sizing.
-The set of stateful content clusters can be scaled independently
+The set of stateful content nodes can be scaled independently
 and <a href="../elasticity.html">re-sized</a> which requires re-distribution of data.
-Re-distribution of data in Vespa, is supported and designed to be done without significant serving impact.
+Re-distribution of data in Vespa is supported and designed to be done without significant serving impact.
 Altering the number of nodes or groups in a Vespa content cluster does not require re-feeding of the corpus,
 so it's easy to start out with a sample prototype and scale it to production scale workloads.
 </p>
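
Re-sizing thus amounts to editing the node set in services.xml and redeploying; a sketch growing a flat cluster from two to three nodes (hostaliases illustrative), after which data re-distribution proceeds in the background:

  <content id="music" version="1.0">
    <redundancy>2</redundancy>
    <nodes>
      <node hostalias="node0" distribution-key="0"/>
      <node hostalias="node1" distribution-key="1"/>
      <!-- added node: data migrates to it automatically, no re-feed needed -->
      <node hostalias="node2" distribution-key="2"/>
    </nodes>
  </content>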
@@ -316,12 +316,12 @@ <h2 id="content-cluster-scalability-model">Content cluster scalability model</h2>
 </tr>
 </table>
 <p>
-Adding content nodes to content cluster (keeping the total document volume fixed) configured with flat distribution,
-reduces the dynamic query work per node (<em>DQW</em>)
+Adding content nodes to a content cluster (keeping the total document volume fixed) with flat distribution
+reduces the dynamic query work per node (<em>DQW</em>),
 but does not reduce the static query work (<em>SQW</em>).
 The overall system cost also increases as you need to rent another node.
 </p><p>
-Since <em>DQW</em> depends and scales almost linearly with the number of documents on the content nodes,
+Since <em>DQW</em> depends and scales almost linearly with the number of documents on the content nodes,
 one can try to distribute the work over more nodes.
 <em>Amdahl's law</em> specifies that the maximum speedup one can achieve by parallelizing the
 dynamic work (<em>DQW</em>) is given by the formula:
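
The hunk is cut off before the formula itself; for reference, the standard statement of Amdahl's law, writing p for the parallelizable fraction of the per-query work, is:

  \mathrm{speedup}(N) = \frac{1}{(1 - p) + p/N},
  \qquad p = \frac{DQW}{DQW + SQW},
  \qquad \lim_{N \to \infty} \mathrm{speedup}(N) = \frac{1}{1 - p} = \frac{DQW + SQW}{SQW}

So when SQW dominates, adding nodes to a flat group buys little.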
@@ -413,9 +413,9 @@ <h2 id="scaling-latency-in-a-content-group">Scaling latency in a content group</h2>
 <ul>
 <li>
 <p>
-For the yellow use case,
-the measured latency is almost independent of the total document volume. This is called sublinear latency scaling
-which calls for scaling up using better flavor specification instead of scaling out.
+For the yellow use case the measured latency is almost independent of the total document volume.
+This is called sublinear latency scaling, which calls for scaling up using better flavor
+specification instead of scaling out.
 </p>
 <p>
 The observed latency at 10M documents per node is almost the same as with 1M documents per node.
@@ -430,8 +430,7 @@ <h2 id="scaling-latency-in-a-content-group">Scaling latency in a content group</h2>
 </li>
 <li>
 <p>
-For the blue use case,
-the measured latency shows a clear correlation with the document volume.
+For the blue use case the measured latency shows a clear correlation with the document volume.
 This is a case where the dynamic query work portion is high,
 and adding nodes to the flat group will reduce the serving latency.
 The sweet spot is found where targeted latency SLA is achieved.
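
To make the sweet spot concrete with illustrative numbers (not from the original): suppose a single node spends SQW = 20 ms and DQW = 80 ms per query. Under the scalability model above, partitioning the same documents over N nodes shrinks only the dynamic part:

  \mathrm{latency}(N) \approx SQW + \frac{DQW}{N} = 20\,\mathrm{ms} + \frac{80\,\mathrm{ms}}{N}
  \quad\Rightarrow\quad \mathrm{latency}(2) \approx 60\,\mathrm{ms},\;
  \mathrm{latency}(4) \approx 40\,\mathrm{ms},\;
  \mathrm{latency}(N \to \infty) \to 20\,\mathrm{ms}

With a 50 ms latency SLA, three nodes would already be the sweet spot in this toy calculation.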
@@ -455,7 +455,7 @@ <h3 id="reduce-latency-with-multi-threaded-per-search-execution">
 <p>
 It is possible to reduce latency of queries
 where the <a href="#dynamic-query-work">dynamic query work</a> portion is high.
-Using multiple threads per search for a use case where the static query work is high,
+Using multiple threads per search for a use case where the static query work is high
 will be as wasteful as adding nodes to a flat distribution.
 </p>
 <figure>
@@ -482,7 +482,7 @@ <h3 id="reduce-latency-with-multi-threaded-per-search-execution">
 <li>Sublinear approximate nearest neighbor search latency does not benefit from using more threads per search</li>
 </ul>
 <p>
-By default, the number of threads per search is one,
+By default the number of threads per search is one,
 as that gives the best resource usage measured as CPU resources used per query.
 The optimal threads per search depends on the query use case,
 and should be evaluated by benchmarking.
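
Threads per search is a content-cluster tuning value; a minimal sketch, assuming the requestthreads element from the services-content reference:

  <content id="music" version="1.0">
    <redundancy>2</redundancy>
    <engine>
      <proton>
        <tuning>
          <searchnode>
            <requestthreads>
              <!-- threads used to evaluate a single query; the default is 1 -->
              <persearch>4</persearch>
            </requestthreads>
          </searchnode>
        </tuning>
      </proton>
    </engine>
  </content>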
@@ -537,7 +537,7 @@ <h3 id="when-documents-are-too-large">When documents are too large</h3>
 increase the amount of temporary memory required for complex ranking expressions like multi-dimensional ColBert maxsim.
 As documents are processed, indexed, stored and ranked as individual units, working on a few very large documents
 at a time may not offer the system enough opportunity to parallelize and result in poor, uneven utilization
-of resources, and even a small fraction of very-large documents may impact your mean (and especially higher percentile)
+of resources, and even a small fraction of very large documents may impact your mean (and especially higher percentile)
 latencies both for processing and query execution.

 <h3 id="too-small-documents">When documents are too small</h3>
@@ -565,11 +565,11 @@ <h2 id="scaling-document-volume-per-node">Scaling document volume per node</h2>
 <p>
 With the latency SLA in mind, benchmark with increasing number of documents per node
 and watch system level metrics and Vespa metrics.
-If latency is within the stated latency SLA and the system meets the targeted sustained feed rate,
+If latency is within the stated latency SLA and the system meets the targeted sustained feed rate,
 overall cost is reduced by fitting more documents into each node
 (e.g. by increasing memory, cpu and disk constraints set by the node flavor).
 </p><p>
-With larger fan-out by using more nodes to partition the data overcomes also higher tail latency
+With larger fan-out using more nodes to partition the data also overcomes higher tail latency
 as search waits for all results from all nodes. Therefore, the overall execution time depends on
 the slowest node at the time of the query. In such cases with large fan-out, using
 <a href="../reference/services-content.html#coverage">adaptive timeout</a> is recommended
@@ -603,7 +603,7 @@ <h3 id="memory-usage-sizing">Memory usage sizing</h3>
 <p>
 The memory usage on a content node increases as the document volume increases.
 The memory usage increases almost linearly with the number of documents.
-The Vespa vespa-proton-bin process (content node) uses the full 64-bit virtual address space,
+The vespa-proton-bin process (content node) uses the full 64-bit virtual address space,
 so the virtual memory usage reported might be high,
 as both index and summary files are mapped into memory using mmap
 and pages are paged into memory as needed.
@@ -630,7 +630,7 @@ <h2 id="scaling-throughput">Scaling Throughput</h2>
 Also, that it has capacity to absorb load increases over time,
 as well as having sufficient capacity to sustain node outages during peak traffic.
 </p><p>
-At some throughput level, some resource(s) in the system will be fully saturated,
+At some throughput level some resource(s) in the system will be fully saturated,
 and requests will be queued up causing latency to spike up,
 as requests are spending more time waiting for the saturated resource.
 </p><p>
