feat(batch): batch group chunks for dml #13872
Conversation
Will test performance in the cloud before this gets merged.
Codecov Report — Attention:

@@            Coverage Diff             @@
##             main   #13872      +/-   ##
==========================================
- Coverage   68.04%   67.96%    -0.08%
==========================================
  Files        1536     1536
  Lines      265364   265375       +11
==========================================
- Hits       180557   180368      -189
- Misses      84807    85007      +200

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
src/stream/src/executor/dml.rs
Outdated
    // txn buffer isn't small, so yield.
    for chunk in txn_buffer.vec.drain(..) {
        yield Message::Chunk(chunk);
    }
Should we concatenate them before yielding?
Data from the txn_buffer should already have been concatenated as much as possible, so I think we don't need to do it here.
What about small chunks from different transactions? Oh, chunks in the same buffer always come from the same transaction.
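The yield-vs-concat tradeoff discussed above can be sketched with a simplified model. This is not the real executor code: `Vec<i64>` stands in for a `StreamChunk` of rows, and `concat` is a hypothetical helper mirroring `StreamChunk::concat`.

```rust
// Simplified stand-in: a "chunk" is a Vec<i64> of rows. The real StreamChunk
// and Message types live in risingwave's stream crate; this only illustrates
// the tradeoff of yielding chunks one by one vs. concatenating them first.
fn concat(chunks: Vec<Vec<i64>>) -> Vec<i64> {
    chunks.into_iter().flatten().collect()
}

fn main() {
    // All chunks in one txn buffer come from the same transaction and are
    // assumed to be pre-compacted by the writer side.
    let txn_buffer: Vec<Vec<i64>> = vec![vec![1, 2, 3], vec![4, 5]];

    // Alternative raised in review: concatenate into one chunk before yielding.
    let one_chunk = concat(txn_buffer.clone());
    assert_eq!(one_chunk, vec![1, 2, 3, 4, 5]);

    // Yielding chunks separately preserves their boundaries and avoids the
    // extra copy that concatenation incurs, at the cost of more messages.
    let total: usize = txn_buffer.iter().map(|c| c.len()).sum();
    assert_eq!(total, one_chunk.len());
}
```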
src/stream/src/executor/dml.rs
Outdated
    if txn_buffer_cardinality < batch_group_cardinality {
        mem::swap(&mut txn_buffer.vec, &mut batch_group);
    }
    let concat_chunk = StreamChunk::concat(txn_buffer.vec);
If there are remaining spaces in the "txn buffer", should we also yield some chunks from the "batch group" here? Can we utilize StreamChunkBuilder to do all of this?
If we use that space, I'm afraid it would interleave data from small transactions with each other.
That's true. Will this lead to problems?
It depends on the downstream table's conflict-handling logic. I think the default policy is first-writer-wins, so interleaving could cause nondeterministic behavior under concurrent writes. I'd prefer to keep this invariant.
I see. However if I understand correctly, records written by transactions from different sessions are already interleaved by the shuffle in the downstream. 🤔
Yes, the hash shuffle in the downstream is unavoidable, but I don't want to introduce another variable.
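The swap trick under discussion can be sketched with simplified buffers. Assumptions: `Vec<i64>` stands in for the executor's chunk vector, and the `txn_buffer` / `batch_group` names are taken from the PR; only `std::mem::swap` is real library API here.

```rust
use std::mem;

fn main() {
    // Simplified stand-in for the DML executor state: the current
    // transaction's buffered rows and the accumulated batch group.
    let mut txn_buffer: Vec<i64> = vec![1, 2];        // small transaction
    let mut batch_group: Vec<i64> = vec![10, 20, 30]; // larger pending batch

    // The batch group has no space left, so flush the larger of the two and
    // keep the smaller one buffered. Swapping whole vectors keeps each
    // transaction's rows together instead of interleaving them.
    if txn_buffer.len() < batch_group.len() {
        mem::swap(&mut txn_buffer, &mut batch_group);
    }

    // txn_buffer now holds the larger batch, ready to be sent downstream.
    assert_eq!(txn_buffer, vec![10, 20, 30]);
    // The small transaction stays intact, awaiting a later flush.
    assert_eq!(batch_group, vec![1, 2]);
}
```

Because the swap exchanges whole buffers rather than moving individual chunks, no transaction's rows are ever split between the flushed batch and the retained one.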
Co-authored-by: Bugen Zhao <[email protected]>
The rest LGTM
    // txn buffer is small and batch group has no space, so yield the large one.
    if txn_buffer_cardinality < batch_group_cardinality {
        mem::swap(&mut txn_buffer.vec, &mut batch_group);
    }
IIUC the inserted records might be reordered here, which seems to be inconsistent with this discussion? #13872 (comment)
The current implementation at least guarantees that all data from the same transaction is sent downstream at the same time. Otherwise, part of a transaction's data would be sent downstream first while the rest is kept in the batch_group.
Ok, sounds acceptable to me
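The invariant argued above (each transaction's data reaches the downstream as a unit) can be illustrated with a minimal grouping sketch. This is not the executor's actual code path: `(txn_id, row)` pairs and the `BTreeMap` grouping are illustrative assumptions.

```rust
use std::collections::BTreeMap;

fn main() {
    // Simplified stand-in: (txn_id, row) pairs arriving at the DML executor.
    let rows = vec![(1u32, 'a'), (1, 'b'), (2, 'c'), (2, 'd')];

    // Group rows by transaction so each transaction is flushed as one batch,
    // never split across two downstream flushes.
    let mut groups: BTreeMap<u32, Vec<char>> = BTreeMap::new();
    for (txn, row) in rows {
        groups.entry(txn).or_default().push(row);
    }
    let batches: Vec<Vec<char>> = groups.into_values().collect();

    // Every batch contains exactly one transaction's rows, in arrival order.
    assert_eq!(batches, vec![vec!['a', 'b'], vec!['c', 'd']]);
}
```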
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Checklist
./risedev check (or alias ./risedev c)
Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.