Performance impact of spilling out to S3 #6667

ryzhyk · 2022-11-30T06:28:00Z


I have another performance-related question. I understand that Hummock serves several related purposes:

1. Checkpointing: in case of a crash, RisingWave can roll back to the most recent checkpoint in S3.
2. Scale-out: new executors can fetch all the state they need from Hummock.
3. Support workloads that don't fit in RAM by spilling the state out to S3.

I am curious about the performance implications of item 3. How much performance impact should I expect when the LSM trees maintained by the query engine no longer fit in memory? E.g., what happens when only 20% of the state is cached in memory, with 80% spilled out to S3? I realize that the answer is likely workload-dependent, but it would be interesting to see some concrete data points, e.g., "the performance of query X drops by Y% when its working set size exceeds Z". Basically, any pointers that would help me understand the performance of RisingWave running with most of the state stored in the persistent store would be very helpful.
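To make the question concrete, here is a back-of-the-envelope sketch of a two-tier read model: each state read is served from memory with some probability and from remote storage otherwise. The function name and both latency constants are hypothetical placeholders chosen for illustration, not measurements of RisingWave, Hummock, or S3. The model also ignores batching, prefetching, and any intermediate cache tiers.

```python
def expected_read_latency_us(hit_ratio, cache_latency_us, remote_latency_us):
    """Average latency of one state read under a simple two-tier model."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return hit_ratio * cache_latency_us + (1.0 - hit_ratio) * remote_latency_us


# Placeholder latencies (assumptions for illustration only, NOT measurements).
CACHE_US = 1.0         # hypothetical in-memory read
REMOTE_US = 10_000.0   # hypothetical remote object-store read

for hit_ratio in (1.0, 0.8, 0.2):
    latency = expected_read_latency_us(hit_ratio, CACHE_US, REMOTE_US)
    print(f"hit ratio {hit_ratio:.0%}: ~{latency:,.1f} us per read")
```

Note that having 20% of the state cached does not imply a 20% hit ratio: with a skewed access pattern, a small cache can serve most reads, which is one reason the answer is workload-dependent.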

hzxa21 · 2022-11-30T13:31:28Z

Maintainer

Thanks for your interest in RisingWave! This is a great question.

The performance impact is indeed workload-dependent. For example, when experimenting with a streaming join query without a window, using randomly generated data and no optimization, we observed that performance degrades over time as more rows accumulate in the join state for each join key. The degradation comes not only from increasing storage read latency but also from the growth in join cardinality.
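The join cardinality effect can be illustrated with a toy simulation. This is a minimal sketch and not RisingWave's join implementation: it keeps only a per-key row count for each side of an unwindowed equi-join and records how many stored rows each new event would match. The function name, the uniform key distribution, and the 50/50 split between the two inputs are assumptions made for illustration.

```python
import random
from collections import defaultdict


def simulate_unwindowed_join(num_events, num_keys, seed=0):
    """Return, for each incoming event, how many stored rows it joins with."""
    rng = random.Random(seed)
    left_state = defaultdict(int)   # join key -> rows retained from the left input
    right_state = defaultdict(int)  # join key -> rows retained from the right input
    matches_per_event = []
    for _ in range(num_events):
        key = rng.randrange(num_keys)
        if rng.random() < 0.5:
            # Left-side event: probes the right state, then is retained forever.
            matches_per_event.append(right_state[key])
            left_state[key] += 1
        else:
            # Right-side event: probes the left state, then is retained forever.
            matches_per_event.append(left_state[key])
            right_state[key] += 1
    return matches_per_event


matches = simulate_unwindowed_join(num_events=10_000, num_keys=100)
first = sum(matches[:1000]) / 1000
last = sum(matches[-1000:]) / 1000
print(f"avg matches per event, first 1000 events: {first:.1f}")
print(f"avg matches per event, last 1000 events:  {last:.1f}")
```

Because nothing is ever evicted without a window, the work per event keeps growing with the amount of retained state, independently of where that state is stored. Slower reads from remote storage then add to this cost rather than being its only cause.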

As for quantifying the impact of remote S3 I/O, we don't have performance numbers to disclose at the moment, but we are working on publishing a performance dashboard for the TPC-H and NEXmark workloads. We can provide results under different settings (e.g., cache size) for specific workloads to help people understand RisingWave's performance. Stay tuned.

1 reply

ryzhyk Nov 30, 2022
Author

Thank you, @hzxa21, and looking forward to the official benchmark results!
