Best Practices to Prevent OOM with IMap API Calls [CORE-153] (#1222)

Added new best practices for IMap bulk read operations and placed it under best-practices section. --------- Co-authored-by: Oliver Howell <[email protected]>
hazelcast · Aug 29, 2024 · ee6f62c · ee6f62c
1 parent 751da3a
commit ee6f62c
Show file tree

Hide file tree

Showing 4 changed files with 216 additions and 2 deletions.
diff --git a/docs/modules/ROOT/nav.adoc b/docs/modules/ROOT/nav.adoc
@@ -49,14 +49,16 @@
 ** xref:ingest:overview.adoc[]
 ** xref:computing:distributed-computing.adoc[]
 ** xref:query:overview.adoc[]
-* Best Practices
+* xref:cluster-performance:best-practices.adoc[]
 ** xref:capacity-planning.adoc[]
 ** xref:cluster-performance:performance-tips.adoc[]
 ** xref:cluster-performance:back-pressure.adoc[]
 ** xref:cluster-performance:pipelining.adoc[]
 ** xref:cluster-performance:aws-deployments.adoc[]
 ** xref:cluster-performance:threading.adoc[]
 ** xref:cluster-performance:near-cache.adoc[]
+** xref:cluster-performance:imap-bulk-read-operations.adoc[]
+** xref:cluster-performance:data-affinity.adoc[]
 include::architecture:partial$nav.adoc[]
 * Member/Client Discovery
 ** xref:clusters:discovery-mechanisms.adoc[]

diff --git a/docs/modules/cluster-performance/pages/best-practices.adoc b/docs/modules/cluster-performance/pages/best-practices.adoc
@@ -1,4 +1,4 @@
-= Best Practices
+= Best practices
 :page-aliases: performance:data-affinity.adoc, performance:near-cache.adoc, performance:back-pressure.adoc, performance:cpu-thread-affinity.adoc, performance:best-practices.adoc, performance:pipelining.adoc, performance:slowoperationdetector.adoc, performance:threading-model.adoc
 
 Learn more about best practices and Hazelcast recommendations:
@@ -10,3 +10,5 @@ Learn more about best practices and Hazelcast recommendations:
 * xref:cluster-performance:aws-deployments.adoc[]
 * xref:cluster-performance:threading.adoc[]
 * xref:cluster-performance:near-cache.adoc[]
+* xref:cluster-performance:imap-bulk-read-operations.adoc[]
+* xref:cluster-performance:data-affinity.adoc[]
diff --git a/docs/modules/cluster-performance/pages/bulk-read-operations.adoc b/docs/modules/cluster-performance/pages/bulk-read-operations.adoc
@@ -0,0 +1,105 @@
+= Bulk read operations
+:description: Learn about best practices for IMap bulk read operations.
+
+[[bulk-read-operations]]
+
+To safeguard your cluster and application from becoming Out of Memory
+(OOM), follow these best practices and consider using the described 
+alternatives to bulk read operations.
+
+It's critical to avoid an Out of Memory Error (OOME) as its impact
+can be severe. Hazelcast strives to protect your data but
+an OOME can lead to a loss of cluster availability. This can result
+in increased operation latencies due to triggered migrations. From
+your application's perspective, an OOME could also cause a system
+crash. 
+
+Some specific IMap API calls are particularly risky in this regard. 
+Methods like `IMap#entrySet()` and `IMap#values()` can trigger an OOME, depending
+on the size of your map and the available memory on each member.
+To mitigate this risk, you should follow these best practices.
+
+== Plan capacity
+Proper capacity planning is crucial for providing
+sufficient system resources to the Hazelcast cluster. This
+involves estimating and validating the cluster's capacity
+(memory, CPU, disk, etc.) to determine the best practices
+that help the cluster achieve optimal performance.
+
+For more information, see xref:ROOT:capacity-planning.adoc[].
+
+== Limit query result size
+If you limit query result sizes, this can help prevent the adverse effects of bulk data reads.
+
+[source,java]
+----
+Set<Map.Entry<K, V>> entrySet();
+Set<Map.Entry<K, V>> entrySet(Predicate<K, V> predicate);
+----
+For more information, see xref:data-structures:preventing-out-of-memory.adoc#configuring-query-result-size[Configuring query result size].
+
+== Use Iterator
+The Iterator fetches data in batches, ensuring consistent heap
+utilization. The relevant methods in the IMap API include:
+
+[source,java]
+----
+Iterator<Entry<K, V>> iterator();
+Iterator<Entry<K, V>> iterator(int fetchSize);
+----
+This example shows how to use the Iterator API:
+[source,java]
+----
+IMap<Integer, Integer> testMap = instance.getMap("test");
+for (int i = 0; i < 1_000; i++) {
+    testMap.set(i, i);
+}
+
+// default fetch size is 100 element
+Iterator<Map.Entry<Integer, Integer>> iterator = testMap.iterator();
+while (iterator.hasNext()) {
+    Map.Entry<Integer, Integer> next = iterator.next();
+    System.err.println(next);
+}
+----
+
+
+== Use PartitionPredicate
+You can reduce memory overhead during bulk operations by filtering with *PartitionPredicate*.
+
+For more info, see xref:query:predicate-overview.adoc#filtering-with-partition-predicate[PartitionPredicate].
+
+== Use Entry Processor
+In some scenarios, reversing the traditional approach can be
+more effective. Instead of fetching all data to the local
+application for processing, you can send operations directly to
+the data. This _in-place_ processing method saves both time and
+resources; *Entry Processor* is an excellent tool for this purpose.
+
+For more info, see xref:data-structures:entry-processor.adoc[].
+
+== Use SQL service
+SQL was designed specifically for distributed computing use cases: SQL query results
+are paged, which makes SQL a good tool to fetch data in bulk.
+
+The following example shows a replacement for `IMap#values()`:
+
+[source,java]
+----
+String MAP_NAME = "...";
+HazelcastInstance client = HazelcastClient.newHazelcastClient();
+// Create a SQL mapping for IMap
+client.getSql().execute("CREATE MAPPING " + MAP_NAME + " (__key INT, this VARCHAR)");
+// Run query to replace IMap#values()
+SqlResult result = client.getSql().execute("SELECT this FROM " + MAP_NAME);
+// Process the data in paged fashion
+for (SqlRow row: result) {
+    /* do your processing */
+}
+----
+
+IMPORTANT: You must have Jet enabled to use the SQL service.
+
+For more info, see xref:query:sql-overview.adoc[].
+
+
diff --git a/docs/modules/cluster-performance/pages/imap-bulk-read-operations.adoc b/docs/modules/cluster-performance/pages/imap-bulk-read-operations.adoc
@@ -0,0 +1,105 @@
+= IMap bulk read operations
+:description: Learn about best practices for IMap bulk read operations.
+
+[[bulk-read-operations]]
+
+To safeguard your cluster and application from becoming Out of Memory
+(OOM), follow these best practices and consider using the described 
+alternatives to IMap bulk read operations.
+
+It's critical to avoid an Out of Memory Error (OOME) as its impact
+can be severe. Hazelcast strives to protect your data but
+an OOME can lead to a loss of cluster availability. This can result
+in increased operation latencies due to triggered migrations. From
+your application's perspective, an OOME could also cause a system
+crash. 
+
+Some specific IMap API calls are particularly risky in this regard. 
+Methods like `IMap#entrySet()` and `IMap#values()` can trigger an OOME, depending
+on the size of your map and the available memory on each member.
+To mitigate this risk, you should follow these best practices.
+
+== Plan capacity
+Proper capacity planning is crucial for providing
+sufficient system resources to the Hazelcast cluster. This
+involves estimating and validating the cluster's capacity
+(memory, CPU, disk, etc.) to determine the best practices
+that help the cluster achieve optimal performance.
+
+For more information, see xref:ROOT:capacity-planning.adoc[].
+
+== Limit query result size
+If you limit query result sizes, this can help prevent the adverse effects of bulk data reads.
+
+[source,java]
+----
+Set<Map.Entry<K, V>> entrySet();
+Set<Map.Entry<K, V>> entrySet(Predicate<K, V> predicate);
+----
+For more information, see xref:data-structures:preventing-out-of-memory.adoc#configuring-query-result-size[Configuring query result size].
+
+== Use Iterator
+The Iterator fetches data in batches, ensuring consistent heap
+utilization. The relevant methods in the IMap API include:
+
+[source,java]
+----
+Iterator<Entry<K, V>> iterator();
+Iterator<Entry<K, V>> iterator(int fetchSize);
+----
+This example shows how to use the Iterator API:
+[source,java]
+----
+IMap<Integer, Integer> testMap = instance.getMap("test");
+for (int i = 0; i < 1_000; i++) {
+    testMap.set(i, i);
+}
+
+// default fetch size is 100 element
+Iterator<Map.Entry<Integer, Integer>> iterator = testMap.iterator();
+while (iterator.hasNext()) {
+    Map.Entry<Integer, Integer> next = iterator.next();
+    System.err.println(next);
+}
+----
+
+
+== Use PartitionPredicate
+You can reduce memory overhead during bulk operations by filtering with *PartitionPredicate*.
+
+For more info, see xref:query:predicate-overview.adoc#filtering-with-partition-predicate[PartitionPredicate].
+
+== Use Entry Processor
+In some scenarios, reversing the traditional approach can be
+more effective. Instead of fetching all data to the local
+application for processing, you can send operations directly to
+the data. This _in-place_ processing method saves both time and
+resources; *Entry Processor* is an excellent tool for this purpose.
+
+For more info, see xref:data-structures:entry-processor.adoc[].
+
+== Use SQL service
+SQL was designed specifically for distributed computing use cases: SQL query results
+are paged, which makes SQL a good tool to fetch data in bulk.
+
+The following example shows a replacement for `IMap#values()`:
+
+[source,java]
+----
+String MAP_NAME = "...";
+HazelcastInstance client = HazelcastClient.newHazelcastClient();
+// Create a SQL mapping for IMap
+client.getSql().execute("CREATE MAPPING " + MAP_NAME + " (__key INT, this VARCHAR)");
+// Run query to replace IMap#values()
+SqlResult result = client.getSql().execute("SELECT this FROM " + MAP_NAME);
+// Process the data in paged fashion
+for (SqlRow row: result) {
+    /* do your processing */
+}
+----
+
+IMPORTANT: You must have Jet enabled to use the SQL service.
+
+For more info, see xref:query:sql-overview.adoc[].
+
+