
fix: Using BatchStatement instead of execute_concurrent_with_args #163

Open · EXPEbdodla wants to merge 3 commits into master from use_batch
Conversation

@EXPEbdodla (Collaborator) commented on Jan 6, 2025

What this PR does / why we need it:

fix: Using BatchStatement instead of execute_concurrent_with_args

Which issue(s) this PR fixes:

  1. execute_concurrent_with_args was taking longer to insert records. We now use BatchStatement to write all records for a given entity_key as one batch, which avoids per-statement network round trips (see the sketch after this list). Grouping different entity_keys into a single batch would require BatchType.LOGGED mode, which the docs say has a performance impact, so we use BatchType.UNLOGGED mode and batch on the partition key.
  2. Concurrency is managed using queues.
  3. The TTL is set at the row level (at insert time) instead of at the table level.
  4. Rate limiting is used to control writes. We noticed an impact on read performance while materializations run; with this we can control write throughput and reduce the impact on reads.
  5. Removed the Expedia-specific spark_kafka_processor.py code, because we are upgrading the materialization process to Spark 3.5 and also noticed increased write latency for streaming ingestion tasks using mapInPandas().
  6. Feature view tags with the prefix online_store_ can be used to override online store configurations.
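
A minimal sketch of the per-entity-key batching described in item 1, using the DataStax Python driver. The contact point, keyspace, table schema, and the rows_by_entity_key grouping are illustrative assumptions, not part of this PR:

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

cluster = Cluster(["127.0.0.1"])       # assumed contact point
session = cluster.connect("feast_ks")  # assumed keyspace

insert_cql = session.prepare(
    "INSERT INTO online_table (feature_name, value, entity_key, event_ts) "
    "VALUES (?, ?, ?, ?)"  # assumed schema
)

# One UNLOGGED batch per entity_key: every statement in the batch hits the
# same partition, so we skip the batch-log overhead of LOGGED batches while
# still collapsing many per-statement network round trips into one request.
for entity_key_bin, rows in rows_by_entity_key.items():  # hypothetical grouping
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    for feature_name, value_blob, event_ts in rows:
        batch.add(insert_cql, (feature_name, value_blob, entity_key_bin, event_ts))
    session.execute(batch)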

Misc

Comment on lines 505 to 534
futures = []
for batch in batches:
    futures.append(session.execute_async(batch))
    if len(futures) >= config.online_store.write_concurrency:
        # Raises exception if at least one of the batches fails
        try:
            for future in futures:
                future.result()
            futures = []
        except Exception as exc:
            logger.error(f"Error writing a batch: {exc}")
            print(f"Error writing a batch: {exc}")
            raise Exception("Error writing a batch") from exc

if len(futures) > 0:
    try:
        for future in futures:
            future.result()
        futures = []
    except Exception as exc:
        logger.error(f"Error writing a batch: {exc}")
        print(f"Error writing a batch: {exc}")
        raise Exception("Error writing a batch") from exc

# execute_concurrent_with_args(
#     session,
#     insert_cql,
#     rows,
#     concurrency=config.online_store.write_concurrency,
# )
This no longer allows for write_concurrency to be set in the feature_store.yaml then, right?

@EXPEbdodla (Collaborator, Author) replied:

Still using that; see line feast-dev#508.

ah, missed that. thanks

EXPEbdodla force-pushed the use_batch branch 3 times, most recently from 01e6130 to 075c4c0, on January 15, 2025.
EXPEbdodla force-pushed the use_batch branch 2 times, most recently from a0d66f2 to c535a17, on February 6, 2025.
EXPEbdodla changed the title from "fix: Trying BatchStatement instead of execute_concurrent_with_args" to "fix: Using BatchStatement instead of execute_concurrent_with_args" on Feb 6, 2025.
# this happens N-1 times, will be corrected outside:
if progress:
    progress(1)
ttl = online_store_config.key_ttl_seconds or 0
@zabarn commented on Feb 11, 2025:
How are we setting the feature-view-level TTL now? Or do we only want to support the feature_store.yaml TTL?

@EXPEbdodla (Collaborator, Author) replied:

We're moving that logic to the materialization process, so we can stay seamlessly in sync with open source Feast.
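
For illustration, a per-row TTL can be bound at insert time in CQL rather than set as the table's default_time_to_live. A sketch, assuming a session and schema like those in the earlier sketch; the actual statement in the PR may differ:

insert_cql = session.prepare(
    "INSERT INTO online_table (feature_name, value, entity_key, event_ts) "
    "VALUES (?, ?, ?, ?) USING TTL ?"  # assumed schema; TTL is a bind marker
)

ttl = online_store_config.key_ttl_seconds or 0  # in CQL, TTL 0 means no expiry
session.execute(
    insert_cql, (feature_name, value_blob, entity_key_bin, event_ts, ttl)
)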

Comment on lines +378 to +381
write_concurrency = online_store_config.write_concurrency
write_rate_limit = online_store_config.write_rate_limit
concurrent_queue: Queue = Queue(maxsize=write_concurrency)
rate_limiter = SlidingWindowRateLimiter(write_rate_limit, 1)
@OscarMiranda0 commented on Feb 11, 2025:
Ideally, I think we should allow the spark materialization engine to manage the rate limiter. This implementation is a local rate limiter, where the true rate limit is N*write_rate_limit, where N is the number of workers. The objects in the _mapByPartition method are local to the executor as well, so we can't just add the rate limiter there. We've seen concurrent writes be an issue for other online stores as well, so while this is okay for now, I think in the future we should move towards having a centrally managed rate manager (probably the driver).

@EXPEbdodla (Collaborator, Author) replied on Feb 11, 2025:

I agree with you; this should be generic. I'm trying to fix the problem for ScyllaDB at the moment. For now, I've noted in the config description that the rate limiter is per executor for the Spark materialization engine. Making it generic across all online stores I'll leave as a future enhancement.
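
For context, a minimal sketch of a sliding-window rate limiter with the acquire() interface the diff uses. This is an assumed implementation; the PR's actual SlidingWindowRateLimiter may differ:

import time
from collections import deque
from threading import Lock


class SlidingWindowRateLimiter:
    # Allow at most max_calls acquisitions per window_seconds (sketch).

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()
        self._lock = Lock()

    def acquire(self) -> bool:
        # Non-blocking: returns True if a slot is free in the current window.
        now = time.monotonic()
        with self._lock:
            # Evict timestamps that have slid out of the window.
            while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
                self._timestamps.popleft()
            if len(self._timestamps) < self.max_calls:
                self._timestamps.append(now)
                return True
            return False

As noted above, each executor holds its own instance, so the effective cluster-wide rate is roughly N * write_rate_limit for N executors.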

Comment on lines +395 to +413
for entity_key, values, timestamp, created_ts in data:
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    entity_key_bin = serialize_entity_key(
        entity_key,
        entity_key_serialization_version=config.entity_key_serialization_version,
    ).hex()
    for feature_name, val in values.items():
        params: Tuple[str, bytes, str, datetime] = (
            feature_name,
            val.SerializeToString(),
            entity_key_bin,
            timestamp,
        )
        batch.add(insert_cql, params)

    # Wait until the rate limiter allows
    if not rate_limiter.acquire():
        while not rate_limiter.acquire():
            time.sleep(0.001)


I see we're rate limiting the number of batches per second. If we have a specific entity key with a large number of feature values, we could end up with batches containing a very large number of insert statements, which cost more to process than those for a less popular entity key. My assumption would be that we should rate limit the number of insert_cql statements instead, but maybe I'm making an incorrect assumption somewhere?

@EXPEbdodla (Collaborator, Author) replied:

The limit on the number of inserts per batch is 65,536. For now, a batch contains inserts specific to one entity key, so it acts on only one partition, and ideally all entity keys have the same number of features. Given that high default per-batch limit, I don't see this as a risk.
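
If very large partitions ever did become a concern, one option (not part of this PR; max_statements_per_batch is a hypothetical knob) would be to chunk a partition's inserts into fixed-size UNLOGGED sub-batches:

from cassandra.query import BatchStatement, BatchType


def chunked_batches(insert_cql, params_list, max_statements_per_batch=100):
    # Yield UNLOGGED batches capped at max_statements_per_batch statements,
    # all still targeting the same partition (sketch).
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    count = 0
    for params in params_list:
        batch.add(insert_cql, params)
        count += 1
        if count >= max_statements_per_batch:
            yield batch
            batch = BatchStatement(batch_type=BatchType.UNLOGGED)
            count = 0
    if count:
        yield batch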
