[HUDI-8920] Optimized SerDe costs of Flink write, simple bucket and non bucket cases #12796
…nk write, simple bucket index
33cc3b7
to
68028fc
…RecordTypeInfo` and `HoodieFlinkRecordSerializer` for `HoodieFlinkRecord`
68028fc
to
9377d36
    .booleanType()
    .defaultValue(false)
    .withDescription("Optimized Flink write into Hudi table, which uses customized serialization/deserialization. "
        + "Note, that only SIMPLE BUCKET index is supported for now.");
PR's title says "simple bucket and non bucket cases"
Missed it. Thanks! Fixed in 5a81536.
I've added integration tests in 6ac8c9f and also manually checked restoring from a Flink checkpoint for the non-bucket case. The restore was successful.
….mode` in non bucket case
466c013
to
5a81536
This draft PR contains exactly the same commits as here. But I've added one extra, which contains
Change Logs
Changes to the Flink stream write into Hudi tables with a simple bucket index, corresponding to #12697.
Benchmark description
The `Lineitem` table from the TPC-H benchmark was used: 60 million rows, of which 20 million are unique.
Performance estimation results
Flink operators
Current with Kryo:
![0 operators - 1 reference](https://private-user-images.githubusercontent.com/67073364/410488358-dfe5c2f7-f625-42b5-a350-5c5e7685549e.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMDAxMzQsIm5iZiI6MTczOTE5OTgzNCwicGF0aCI6Ii82NzA3MzM2NC80MTA0ODgzNTgtZGZlNWMyZjctZjYyNS00MmI1LWEzNTAtNWM1ZTc2ODU1NDllLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDE1MDM1NFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFhZDAyOGMwMjZhY2M1NDc1Y2FiZTk2MGI5ZTUxMGYzYmZmMDA4ZmZkZWMzYzhmYzI1ZDc5NmE3YjNkZDAxMmImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.mJZHnzZ44fyYA1W9a3aJLVlQidApPfZVm-atgM6DhpI)
After switch to `HoodieFlinkRecord`:
![0 operators - 2 HoodieFlinkRecord](https://private-user-images.githubusercontent.com/67073364/410488431-f449df03-a8c0-41ac-aee9-171dc78a19d5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyMDAxMzQsIm5iZiI6MTczOTE5OTgzNCwicGF0aCI6Ii82NzA3MzM2NC80MTA0ODg0MzEtZjQ0OWRmMDMtYThjMC00MWFjLWFlZTktMTcxZGM3OGExOWQ1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTAlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjEwVDE1MDM1NFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFmYWYwZTMyODEyODcxMWY3Y2VmNzU2MjY1M2VkNjdiYzA1MWNkOTQ0OGRhMzU4NGI0NzNhZTZjNDlhYjg5OTgmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.49LBwJpuOlQfm8gmhG4ZeoSJp5p5c24CZRx8TJsG6Dk)
Impact
Flink write performance improvement.
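As a rough illustration of where such a saving can come from, here is a minimal sketch, not the code from this PR. It assumes a hypothetical three-field record (`Rec`) and uses `java.io.ObjectOutputStream` only as a stand-in for a generic serializer; it is neither Kryo nor `HoodieFlinkRecordSerializer`. A serializer that knows the record layout writes only field values, while a generic one also has to describe the class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerDeSketch {
    // Hypothetical record with a fixed layout (an assumption, not HoodieFlinkRecord).
    static final class Rec implements Serializable {
        final String recordKey;
        final String partitionPath;
        final long payload;

        Rec(String recordKey, String partitionPath, long payload) {
            this.recordKey = recordKey;
            this.partitionPath = partitionPath;
            this.payload = payload;
        }
    }

    // Layout-aware path: write the fields in a fixed order, no class metadata.
    static byte[] writeCustom(Rec r) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(r.recordKey);
            out.writeUTF(r.partitionPath);
            out.writeLong(r.payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Read the fields back in the same order they were written.
    static Rec readCustom(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new Rec(in.readUTF(), in.readUTF(), in.readLong());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Generic path (stand-in only): the stream also carries a class descriptor.
    static byte[] writeGeneric(Rec r) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        Rec r = new Rec("key-1", "2025/02/10", 42L);
        byte[] custom = writeCustom(r);
        byte[] generic = writeGeneric(r);
        Rec back = readCustom(custom);
        System.out.println("custom bytes:  " + custom.length);
        System.out.println("generic bytes: " + generic.length);
        System.out.println("round trip ok: " + (back.recordKey.equals(r.recordKey)
            && back.partitionPath.equals(r.partitionPath)
            && back.payload == r.payload));
    }
}
```

The byte counts only show the metadata overhead of the generic path in this toy case; they say nothing about the actual gains measured in the benchmark above.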
Risk level (write none, low medium or high below)
Low
Documentation Update
After merge
Contributor's checklist