[Feature][Connector-Paimon] Support dynamic bucket splitting improves Paimon writing efficiency #7335

hawk9821 · 2024-08-07T08:28:21Z

Purpose of this pull request

Support dynamic bucket splitting to improve Paimon write efficiency.
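
A minimal sketch of the idea, assuming a scheme similar to Paimon's dynamic bucket mode as we understand it: each writer task owns the buckets whose id modulo the task count equals its task index, and it opens a new bucket once the current one already holds dynamic-bucket.target-row-num distinct keys. The class DynamicBucketAssigner and its methods are hypothetical illustrations. They are not the PaimonBucketAssigner added by this PR and not a Paimon API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of dynamic bucket assignment for ONE writer task.
 * Not SeaTunnel's PaimonBucketAssigner and not a Paimon API.
 * Assumption: task `assignId` (of `numAssigners` tasks) owns every bucket
 * with bucket % numAssigners == assignId, and opens a new bucket once the
 * current one already holds `targetRowNum` distinct keys.
 */
class DynamicBucketAssigner {
    private final int numAssigners;
    private final long targetRowNum;
    // Primary-key hash -> bucket chosen the first time the key was seen.
    private final Map<Integer, Integer> keyToBucket = new HashMap<>();
    // Bucket id -> number of distinct key hashes stored in that bucket.
    private final Map<Integer, Long> bucketKeyCount = new HashMap<>();
    private int currentBucket;

    DynamicBucketAssigner(int numAssigners, int assignId, long targetRowNum) {
        if (numAssigners <= 0 || assignId < 0 || assignId >= numAssigners || targetRowNum <= 0) {
            throw new IllegalArgumentException("invalid assigner configuration");
        }
        this.numAssigners = numAssigners;
        this.targetRowNum = targetRowNum;
        // The lowest bucket id owned by this task.
        this.currentBucket = assignId;
    }

    /** Returns the bucket for a key hash; a key that was seen before keeps its bucket. */
    int assign(int keyHash) {
        Integer existing = keyToBucket.get(keyHash);
        if (existing != null) {
            return existing;
        }
        long count = bucketKeyCount.getOrDefault(currentBucket, 0L);
        if (count >= targetRowNum) {
            // Current bucket is full: jump to the next bucket owned by this task.
            currentBucket += numAssigners;
            count = 0;
        }
        bucketKeyCount.put(currentBucket, count + 1);
        keyToBucket.put(keyHash, currentBucket);
        return currentBucket;
    }

    /** Number of buckets this task has opened so far. */
    int bucketCount() {
        return bucketKeyCount.size();
    }
}
```

For example, for task 1 of 2 with a target of 2 keys per bucket, key hashes 100 and 101 land in bucket 1, key hash 102 opens bucket 3, and a repeated 100 stays in bucket 1. Keeping a key in its first bucket is what lets updates to the same primary key go to the same file group.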

Does this PR introduce any user-facing change?

no

How was this patch tested?

e2e: PaimonSinkDynamicBucketIT
UT: PaimonBucketAssignerTest#bucketAssigner
e2e case: PaimonSinkDynamicBucketIT#testPaimonBucketCountOnSparkAndFlink. Because the Spark and Flink engines cannot automatically create the Paimon table on worker nodes when the table is on the local file system, this e2e case runs against a local HDFS environment.

Check list

If any new Jar binary package is added in your PR, please add the License Notice according to the New License Guide.
If necessary, please update the documentation to describe the new feature: https://github.com/apache/seatunnel/tree/dev/docs
If you are contributing connector code, please check that the following files are updated:
1. Update the change log in the connector document. For more details, refer to connector-v2.
2. Update plugin-mapping.properties and add the new connector information to it.
3. Update the pom file of seatunnel-dist.
Update the release-note.

Hisoka-X · 2024-08-07T08:51:46Z

cc @dailai and @TaoZex

docs/zh/connector-v2/sink/Paimon.md

...ink-13/src/main/java/org/apache/seatunnel/translation/flink/sink/FlinkSinkWriterContext.java

...common/src/main/java/org/apache/seatunnel/translation/flink/sink/FlinkSinkWriterContext.java

dailai · 2024-08-21T08:22:48Z

Please retrigger the CI.

.../java/org/apache/seatunnel/connectors/seatunnel/paimon/sink/bucket/PaimonBucketAssigner.java

dailai · 2024-08-26T00:56:30Z

Thanks @hawk9821. Good job. I think your e2e tests should also cover multi-parallelism cases; currently every case runs with a parallelism of 1. That way we can effectively verify whether dynamic bucketing changes with the job's parallelism. Also, I think you should check the bucket count in every case instead of in a separate case. In addition, each case should verify that the dynamic-bucket.target-row-num option works as expected.
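
Why parallelism matters can be illustrated with a rough estimate. The sketch below assumes, hypothetically, that distinct keys are spread evenly across writer tasks and that each task fills its own buckets independently, opening a new one every dynamic-bucket.target-row-num distinct keys. BucketCountEstimate is an illustrative name, not part of SeaTunnel or Paimon.

```java
/**
 * Hypothetical estimate of how many buckets dynamic bucketing opens.
 * Assumptions: distinct keys are spread evenly over `parallelism` writer
 * tasks, and each task fills its own buckets independently, opening a new
 * one every `targetRowNum` distinct keys.
 */
class BucketCountEstimate {
    /** Ceiling of a / b for a >= 0 and b > 0. */
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    static long expectedBuckets(long distinctKeys, int parallelism, long targetRowNum) {
        if (distinctKeys < 0 || parallelism <= 0 || targetRowNum <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long total = 0;
        for (int task = 0; task < parallelism; task++) {
            // Even split; the first (distinctKeys % parallelism) tasks get one extra key.
            long keys = distinctKeys / parallelism + (task < distinctKeys % parallelism ? 1 : 0);
            // A task that receives no keys opens no bucket.
            total += ceilDiv(keys, targetRowNum);
        }
        return total;
    }
}
```

Under these assumptions, 10 distinct keys with a target of 5 keys per bucket yield 2 buckets at parallelism 1 but 4 buckets at parallelism 4, because every task rounds up on its own share. This is the kind of difference that a bucket-count check in multi-parallelism e2e cases would expose.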

docs/en/connector-v2/sink/Paimon.md

...n-e2e/src/test/java/org/apache/seatunnel/e2e/connector/paimon/PaimonSinkDynamicBucketIT.java

seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/SinkWriter.java

docs/en/connector-v2/sink/Paimon.md

Hisoka-X

LGTM if ci passes. Thanks @hawk9821

github-actions bot added document, flink, Zeta, connectors-v2, e2e, paimon, and api labels Aug 7, 2024

Hisoka-X changed the title ~~[Feature][CONNECTORS-V2-Paimon] Support dynamic bucket splitting improves Paimon writing efficiency~~ [Feature][Connector-Paimon] Support dynamic bucket splitting improves Paimon writing efficiency Aug 7, 2024

Hisoka-X reviewed Aug 7, 2024

docs/zh/connector-v2/sink/Paimon.md (outdated)

dailai reviewed Aug 8, 2024

...ink-13/src/main/java/org/apache/seatunnel/translation/flink/sink/FlinkSinkWriterContext.java

...common/src/main/java/org/apache/seatunnel/translation/flink/sink/FlinkSinkWriterContext.java

github-actions bot removed the flink label Aug 8, 2024

hawk9821 force-pushed the paimon_dynamic_bucket branch 2 times, most recently from 1b445f0 to a5d18ee on August 21, 2024 00:44

github-actions bot added the dependencies label Aug 21, 2024

dailai reviewed Aug 21, 2024

.../java/org/apache/seatunnel/connectors/seatunnel/paimon/sink/bucket/PaimonBucketAssigner.java

hawk9821 force-pushed the paimon_dynamic_bucket branch 5 times, most recently from 50764df to c93f7b8 on August 23, 2024 01:11

github-actions bot removed the dependencies label Aug 23, 2024

hawk9821 requested review from dailai and Hisoka-X August 24, 2024 10:11

github-actions bot added the dependencies, CI&CD, and core labels and removed the paimon label Aug 29, 2024

github-actions bot added the reviewed label Sep 2, 2024

Hisoka-X reviewed Sep 2, 2024

...n-e2e/src/test/java/org/apache/seatunnel/e2e/connector/paimon/PaimonSinkDynamicBucketIT.java

hawk9821 force-pushed the paimon_dynamic_bucket branch 2 times, most recently from 974e481 to af318d5 on September 3, 2024 05:51

Hisoka-X reviewed Sep 4, 2024

seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/SinkWriter.java

Hisoka-X self-assigned this Sep 4, 2024

hawk9821 force-pushed the paimon_dynamic_bucket branch from af318d5 to 3874aae on September 12, 2024 15:53

github-actions bot added the core and flink labels and removed the paimon label Sep 12, 2024

hawk9821 force-pushed the paimon_dynamic_bucket branch 3 times, most recently from ad8281b to e0cd7d8 on September 12, 2024 17:11

Hisoka-X reviewed Sep 13, 2024

docs/en/connector-v2/sink/Paimon.md (outdated)

hawk9821 force-pushed the paimon_dynamic_bucket branch 4 times, most recently from dc92c7b to a1b351d on September 18, 2024 08:39

wuchunfu and others added 5 commits September 20, 2024 10:08

[Improve] Update snapshot version to 2.3.8 (apache#7435)

5982f70

* [Improve] Update snapshot version to 2.3.8 * [Improve] Update snapshot version to 2.3.8

[Feature][CONNECTORS-V2-Paimon] Dynamic bucket splitting improves Paimon writing efficiency

c5f342b

[Feature][CONNECTORS-V2-Paimon] spark task parallelism

07bcc4f

[Feature][CONNECTORS-V2-Paimon] flink task parallelism

c61ab39

[Feature][CONNECTORS-V2-Paimon] update doc [Feature][CONNECTORS-V2-Paimon] write to dynamic bucket table , spark flink e2e

[Feature][CONNECTORS-V2-Paimon] Dynamic bucket splitting improves Paimon writing efficiency

9a7dab1

hawk9821 force-pushed the paimon_dynamic_bucket branch from b34a78a to 9a7dab1 on September 20, 2024 02:18

Hisoka-X approved these changes Sep 20, 2024

github-actions bot added the approved label Sep 20, 2024

hawk9821 requested a review from dailai September 20, 2024 05:32

hailin0 approved these changes Sep 20, 2024

hailin0 merged commit bc0326c into apache:dev Sep 20, 2024
7 checks passed
