Thanks for your interest in RisingWave! This is a great question. The performance impact is indeed workload-dependent. For example, when experimenting with a stream join query without a window, using randomly generated data and no optimization, we already observed that performance degrades over time as more rows accumulate in the join state per join key. The degradation comes not only from increasing storage read latency but also from the growing join cardinality. As for quantifying the impact of remote S3 I/O, we don't have performance numbers to disclose at the moment, but we are working on publishing a performance dashboard for TPC-H and NEXmark workloads. We can provide results under different settings (e.g. cache size) for specific workloads to help people understand RisingWave performance. Stay tuned.
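To make the join-cardinality point concrete, here is a toy sketch in plain Python. It is not RisingWave code and produces no benchmark numbers; the function name `simulate_unwindowed_join` and all parameters are made up for illustration. It assumes a symmetric hash join with no window, so state is never evicted and every incoming row matches all rows stored for its key on the other side.

```python
import random
from collections import defaultdict


def simulate_unwindowed_join(num_events, num_keys, seed=0):
    """Toy symmetric hash join with no window (hypothetical helper).

    Returns, for each input row, how many rows it matched on the other side.
    """
    rng = random.Random(seed)
    left_state = defaultdict(int)   # join key -> rows stored from the left input
    right_state = defaultdict(int)  # join key -> rows stored from the right input
    matches_per_event = []
    for _ in range(num_events):
        key = rng.randrange(num_keys)
        if rng.random() < 0.5:
            # A left row joins with every right row already stored for this key.
            matches_per_event.append(right_state[key])
            left_state[key] += 1
        else:
            # A right row joins with every left row already stored for this key.
            matches_per_event.append(left_state[key])
            right_state[key] += 1
    return matches_per_event


matches = simulate_unwindowed_join(num_events=20_000, num_keys=100)
early = sum(matches[:2_000]) / 2_000
late = sum(matches[-2_000:]) / 2_000
print(f"avg matches per input row: early={early:.1f}, late={late:.1f}")
```

The work per input row grows with the state accumulated for its key, independently of where that state is stored. Slower state reads would add to this cost rather than cause it.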
I have another performance-related question. I understand that Hummock serves several related purposes:
I am curious about the performance implication of 3. How much performance impact should I expect when the LSM trees maintained by the query engine no longer fit in memory? E.g., what happens when only 20% of the state is cached in memory and the other 80% is spilled out to S3? I realize that the answer is likely workload-dependent, but it would be interesting to see some concrete data points, e.g., "the performance of query X drops by Y% when its working set size exceeds Z". Basically, any pointers that would help me understand the performance of RisingWave when most of the state is stored in the persistent store would be very helpful.