Best Practices to Prevent OOM with IMap API Calls [CORE-153] (#1222)
Added new best practices for IMap bulk read operations and placed it
under best-practices section.

---------

Co-authored-by: Oliver Howell <[email protected]>
ahmetmircik and oliverhowell authored Aug 29, 2024
1 parent 751da3a commit ee6f62c
Showing 4 changed files with 216 additions and 2 deletions.
4 changes: 3 additions & 1 deletion docs/modules/ROOT/nav.adoc
@@ -49,14 +49,16 @@
 ** xref:ingest:overview.adoc[]
 ** xref:computing:distributed-computing.adoc[]
 ** xref:query:overview.adoc[]
-* Best Practices
+* xref:cluster-performance:best-practices.adoc[]
 ** xref:capacity-planning.adoc[]
 ** xref:cluster-performance:performance-tips.adoc[]
 ** xref:cluster-performance:back-pressure.adoc[]
 ** xref:cluster-performance:pipelining.adoc[]
 ** xref:cluster-performance:aws-deployments.adoc[]
 ** xref:cluster-performance:threading.adoc[]
 ** xref:cluster-performance:near-cache.adoc[]
+** xref:cluster-performance:imap-bulk-read-operations.adoc[]
+** xref:cluster-performance:data-affinity.adoc[]
 include::architecture:partial$nav.adoc[]
 * Member/Client Discovery
 ** xref:clusters:discovery-mechanisms.adoc[]
4 changes: 3 additions & 1 deletion docs/modules/cluster-performance/pages/best-practices.adoc
@@ -1,4 +1,4 @@
-= Best Practices
+= Best practices
 :page-aliases: performance:data-affinity.adoc, performance:near-cache.adoc, performance:back-pressure.adoc, performance:cpu-thread-affinity.adoc, performance:best-practices.adoc, performance:pipelining.adoc, performance:slowoperationdetector.adoc, performance:threading-model.adoc
 
 Learn more about best practices and Hazelcast recommendations:
@@ -10,3 +10,5 @@ Learn more about best practices and Hazelcast recommendations:
 * xref:cluster-performance:aws-deployments.adoc[]
 * xref:cluster-performance:threading.adoc[]
 * xref:cluster-performance:near-cache.adoc[]
+* xref:cluster-performance:imap-bulk-read-operations.adoc[]
+* xref:cluster-performance:data-affinity.adoc[]
105 changes: 105 additions & 0 deletions docs/modules/cluster-performance/pages/bulk-read-operations.adoc
@@ -0,0 +1,105 @@
= Bulk read operations
:description: Learn about best practices for IMap bulk read operations.

[[bulk-read-operations]]

To protect your cluster and application from running out of memory
(OOM), follow these best practices and consider using the described
alternatives to bulk read operations.

It's critical to avoid an Out of Memory Error (OOME) because its impact
can be severe. Hazelcast strives to protect your data, but
an OOME can lead to a loss of cluster availability, and the
migrations that this triggers can increase operation latencies. From
your application's perspective, an OOME can also cause a system
crash.

Some specific IMap API calls are particularly risky in this regard.
Methods like `IMap#entrySet()` and `IMap#values()` can trigger an OOME, depending
on the size of your map and the available memory on each member.
To mitigate this risk, you should follow these best practices.

== Plan capacity
Proper capacity planning is crucial for providing
sufficient system resources to the Hazelcast cluster. This
involves estimating and validating the cluster's capacity
(memory, CPU, disk, etc.) to determine the best practices
that help the cluster achieve optimal performance.

For more information, see xref:ROOT:capacity-planning.adoc[].

== Limit query result size
Limiting the size of query results helps prevent the adverse effects of bulk data reads. The limit applies to bulk query methods such as:

[source,java]
----
Set<Map.Entry<K, V>> entrySet();
Set<Map.Entry<K, V>> entrySet(Predicate<K, V> predicate);
----
For more information, see xref:data-structures:preventing-out-of-memory.adoc#configuring-query-result-size[Configuring query result size].

== Use Iterator
The Iterator fetches data in batches, ensuring consistent heap
utilization. The relevant methods in the IMap API include:

[source,java]
----
Iterator<Entry<K, V>> iterator();
Iterator<Entry<K, V>> iterator(int fetchSize);
----
This example shows how to use the Iterator API:
[source,java]
----
IMap<Integer, Integer> testMap = instance.getMap("test");
for (int i = 0; i < 1_000; i++) {
    testMap.set(i, i);
}
// the default fetch size is 100 elements
Iterator<Map.Entry<Integer, Integer>> iterator = testMap.iterator();
while (iterator.hasNext()) {
    Map.Entry<Integer, Integer> next = iterator.next();
    System.err.println(next);
}
----


== Use PartitionPredicate
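You can reduce memory overhead during bulk operations by filtering with *PartitionPredicate*, which restricts a query to a single partition so that only the entries of that partition are read.

For more information, see xref:query:predicate-overview.adoc#filtering-with-partition-predicate[PartitionPredicate].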

== Use Entry Processor
In some scenarios, reversing the traditional approach can be
more effective. Instead of fetching all data to the local
application for processing, you can send operations directly to
the data. This _in-place_ processing method saves both time and
resources; *Entry Processor* is an excellent tool for this purpose.

For more information, see xref:data-structures:entry-processor.adoc[].

== Use SQL service
SQL was designed specifically for distributed computing use cases: SQL query results
are paged, which makes SQL a good tool to fetch data in bulk.

The following example shows a replacement for `IMap#values()`:

[source,java]
----
String MAP_NAME = "...";
HazelcastInstance client = HazelcastClient.newHazelcastClient();
// Create a SQL mapping for the IMap; the mapping must declare
// the connector type and the key and value formats
client.getSql().execute("CREATE MAPPING " + MAP_NAME + " (__key INT, this VARCHAR) "
        + "TYPE IMap OPTIONS ('keyFormat'='int', 'valueFormat'='varchar')");
// Run query to replace IMap#values(); closing the result releases its resources
try (SqlResult result = client.getSql().execute("SELECT this FROM " + MAP_NAME)) {
    // Process the data in paged fashion
    for (SqlRow row : result) {
        /* do your processing */
    }
----

IMPORTANT: You must have Jet enabled to use the SQL service.

For more information, see xref:query:sql-overview.adoc[].


105 changes: 105 additions & 0 deletions docs/modules/cluster-performance/pages/imap-bulk-read-operations.adoc
@@ -0,0 +1,105 @@
= IMap bulk read operations
:description: Learn about best practices for IMap bulk read operations.

[[bulk-read-operations]]

To protect your cluster and application from running out of memory
(OOM), follow these best practices and consider using the described
alternatives to IMap bulk read operations.

It's critical to avoid an Out of Memory Error (OOME) because its impact
can be severe. Hazelcast strives to protect your data, but
an OOME can lead to a loss of cluster availability, and the
migrations that this triggers can increase operation latencies. From
your application's perspective, an OOME can also cause a system
crash.

Some specific IMap API calls are particularly risky in this regard.
Methods like `IMap#entrySet()` and `IMap#values()` can trigger an OOME, depending
on the size of your map and the available memory on each member.
To mitigate this risk, you should follow these best practices.

== Plan capacity
Proper capacity planning is crucial for providing
sufficient system resources to the Hazelcast cluster. This
involves estimating and validating the cluster's capacity
(memory, CPU, disk, etc.) to determine the best practices
that help the cluster achieve optimal performance.

For more information, see xref:ROOT:capacity-planning.adoc[].

== Limit query result size
Limiting the size of query results helps prevent the adverse effects of bulk data reads. The limit applies to bulk query methods such as:

[source,java]
----
Set<Map.Entry<K, V>> entrySet();
Set<Map.Entry<K, V>> entrySet(Predicate<K, V> predicate);
----
For more information, see xref:data-structures:preventing-out-of-memory.adoc#configuring-query-result-size[Configuring query result size].
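
As one way to apply such a limit, the following sketch sets the `hazelcast.query.result.size.limit` property as a JVM system property before a member starts. The class name `QueryResultSizeLimit` and the value `200_000` are illustrative assumptions, not recommendations; see the linked page for the supported configuration options and valid values.

```java
/**
 * Sketch: set the query result size limit as a JVM system property.
 * The class name and the limit value are illustrative assumptions.
 */
final class QueryResultSizeLimit {
    static final String PROPERTY = "hazelcast.query.result.size.limit";

    /** Must run before the Hazelcast member is started in this JVM. */
    static void apply(int maxEntriesPerQuery) {
        if (maxEntriesPerQuery <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        System.setProperty(PROPERTY, Integer.toString(maxEntriesPerQuery));
    }

    public static void main(String[] args) {
        apply(200_000);
        System.out.println(PROPERTY + "=" + System.getProperty(PROPERTY));
        // Start the member here, for example with Hazelcast.newHazelcastInstance().
    }
}
```

When a query exceeds the configured limit, Hazelcast rejects it with a `QueryResultSizeExceededException` instead of building the full result, so the calling code should be prepared to handle that exception.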

== Use Iterator
The Iterator fetches data in batches, ensuring consistent heap
utilization. The relevant methods in the IMap API include:

[source,java]
----
Iterator<Entry<K, V>> iterator();
Iterator<Entry<K, V>> iterator(int fetchSize);
----
This example shows how to use the Iterator API:
[source,java]
----
IMap<Integer, Integer> testMap = instance.getMap("test");
for (int i = 0; i < 1_000; i++) {
    testMap.set(i, i);
}
// the default fetch size is 100 elements
Iterator<Map.Entry<Integer, Integer>> iterator = testMap.iterator();
while (iterator.hasNext()) {
    Map.Entry<Integer, Integer> next = iterator.next();
    System.err.println(next);
}
----
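
To illustrate why batched fetching keeps heap usage steady, here is a minimal plain-JDK model of the idea. It is not Hazelcast's implementation: `BatchedIterator`, its `fetcher` function, and the `maxBuffered` counter are hypothetical names used only for this sketch. The point is that the caller never holds more than `fetchSize` elements at once, however large the source is.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/** Toy model of batched iteration: at most fetchSize elements are buffered at a time. */
final class BatchedIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetcher; // (offset, limit) -> next batch
    private final int fetchSize;
    private List<T> batch = new ArrayList<>();
    private int indexInBatch = 0;
    private int offset = 0;
    private boolean exhausted = false;
    int maxBuffered = 0; // largest batch held so far, kept for demonstration

    BatchedIterator(BiFunction<Integer, Integer, List<T>> fetcher, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        this.fetcher = fetcher;
        this.fetchSize = fetchSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInBatch < batch.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        // The previous batch becomes garbage as soon as the next one replaces it.
        batch = fetcher.apply(offset, fetchSize);
        indexInBatch = 0;
        offset += batch.size();
        maxBuffered = Math.max(maxBuffered, batch.size());
        if (batch.size() < fetchSize) {
            exhausted = true; // a short batch means the source has no more data
        }
        return !batch.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return batch.get(indexInBatch++);
    }
}

final class BatchedIteratorDemo {
    /** Stand-in for a remote fetch: returns a copy of one page of the source. */
    static List<Integer> page(List<Integer> source, int offset, int limit) {
        int end = Math.min(source.size(), offset + limit);
        return new ArrayList<>(source.subList(offset, end));
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            source.add(i);
        }
        BatchedIterator<Integer> it = new BatchedIterator<>((o, l) -> page(source, o, l), 100);
        long sum = 0;
        while (it.hasNext()) {
            sum += it.next();
        }
        System.out.println("sum=" + sum + ", maxBuffered=" + it.maxBuffered);
    }
}
```

A smaller fetch size lowers the peak number of buffered entries at the cost of more round trips, which is the trade-off to weigh when choosing a value for `iterator(int fetchSize)`.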


== Use PartitionPredicate
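You can reduce memory overhead during bulk operations by filtering with *PartitionPredicate*, which restricts a query to a single partition so that only the entries of that partition are read.

For more information, see xref:query:predicate-overview.adoc#filtering-with-partition-predicate[PartitionPredicate].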

== Use Entry Processor
In some scenarios, reversing the traditional approach can be
more effective. Instead of fetching all data to the local
application for processing, you can send operations directly to
the data. This _in-place_ processing method saves both time and
resources; *Entry Processor* is an excellent tool for this purpose.

For more information, see xref:data-structures:entry-processor.adoc[].
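
The contrast between fetching data and processing it in place can be sketched with plain JDK collections. This is a conceptual toy, not the Hazelcast Entry Processor API: each `Map` stands in for the data held by one member, and `InPlaceProcessingDemo`, `fetchAll`, and `processInPlace` are hypothetical names chosen for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Toy model only: each Map stands in for the data held by one member.
 * This is not the Hazelcast EntryProcessor API.
 */
final class InPlaceProcessingDemo {

    /** Fetch-then-process: every entry is copied to the caller first. */
    static <K, V> Map<K, V> fetchAll(List<Map<K, V>> members) {
        Map<K, V> local = new HashMap<>();
        for (Map<K, V> member : members) {
            local.putAll(member); // caller memory grows with the total map size
        }
        return local;
    }

    /** In-place: the processor visits each entry where it lives; no entry is copied. */
    static <K, V> int processInPlace(List<Map<K, V>> members,
                                     Consumer<Map.Entry<K, V>> processor) {
        int processed = 0;
        for (Map<K, V> member : members) {
            for (Map.Entry<K, V> entry : member.entrySet()) {
                processor.accept(entry);
                processed++;
            }
        }
        return processed; // the caller receives only a count
    }

    public static void main(String[] args) {
        Map<Integer, Integer> memberA = new HashMap<>(Map.of(1, 10, 2, 20));
        Map<Integer, Integer> memberB = new HashMap<>(Map.of(3, 30));
        List<Map<Integer, Integer>> members = List.of(memberA, memberB);

        System.out.println("fetchAll would copy " + fetchAll(members).size()
                + " entries to the caller");
        int processed = processInPlace(members, e -> e.setValue(e.getValue() + 1));
        System.out.println(processed + " entries updated in place");
    }
}
```

In the fetch-then-process variant, the caller's memory use scales with the size of the whole map; in the in-place variant, it stays constant because only a small result travels back.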

== Use SQL service
SQL was designed specifically for distributed computing use cases: SQL query results
are paged, which makes SQL a good tool to fetch data in bulk.

The following example shows a replacement for `IMap#values()`:

[source,java]
----
String MAP_NAME = "...";
HazelcastInstance client = HazelcastClient.newHazelcastClient();
// Create a SQL mapping for the IMap; the mapping must declare
// the connector type and the key and value formats
client.getSql().execute("CREATE MAPPING " + MAP_NAME + " (__key INT, this VARCHAR) "
        + "TYPE IMap OPTIONS ('keyFormat'='int', 'valueFormat'='varchar')");
// Run query to replace IMap#values(); closing the result releases its resources
try (SqlResult result = client.getSql().execute("SELECT this FROM " + MAP_NAME)) {
    // Process the data in paged fashion
    for (SqlRow row : result) {
        /* do your processing */
    }
}
----

IMPORTANT: You must have Jet enabled to use the SQL service.

For more information, see xref:query:sql-overview.adoc[].

