Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improvement] no need to wait after a successful retry #138

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

fornaix
Copy link

@fornaix fornaix commented Sep 8, 2023

Problem Summary:

When using Utils.retry, even if it succeeds, we still need to wait for doris.sink.batch.interval.ms milliseconds.

This pr is to fix it.

Checklist(Required)

  1. Does it affect the original behavior: (No)
  2. Has unit tests been added: (No)
  3. Has document been added or modified: (No Need)
  4. Does it need to update dependencies: (No)
  5. Are there any changes that cannot be rolled back: (No)

@gnehil
Copy link
Contributor

gnehil commented Sep 11, 2023

This configuration is to prevent exceptions caused by too frequent imports. What problems will the pause between batches cause to your job?

@fornaix
Copy link
Author

fornaix commented Sep 12, 2023

This configuration is to prevent exceptions caused by too frequent imports. What problems will the pause between batches cause to your job?

Got it, thx. @gnehil
If we increase the retry interval, we will wait a long time after each insertion. Maybe it would be better to split it into two parameters?

@gnehil
Copy link
Contributor

gnehil commented Sep 13, 2023

This configuration is to prevent exceptions caused by too frequent imports. What problems will the pause between batches cause to your job?

Got it, thx. @gnehil If we increase the retry interval, we will wait a long time after each insertion. Maybe it would be better to split it into two parameters?

You can reduce the batch loading interval by setting the doris.sink.batch.interval.ms option. The default value of this option is 50 (ms). Or you can set it to 0, so there will be no interval between batches.
And can you briefly describe the idea of "split into two parameters"?

@fornaix
Copy link
Author

fornaix commented Sep 13, 2023

@gnehil I means providing two parameters:

  1. doris.sink.batch.interval.ms: Control the batch flush interval
  2. doris.sink.retry.interval.ms: Control the retry interval

When the retry interval is increased, it will not affect the batch flush interval.

@gnehil
Copy link
Contributor

gnehil commented Sep 14, 2023

@gnehil I means providing two parameters:

  1. doris.sink.batch.interval.ms: Control the batch flush interval
  2. doris.sink.retry.interval.ms: Control the retry interval

When the retry interval is increased, it will not affect the batch flush interval.

Good idea, you can submit PR for this

@fornaix
Copy link
Author

fornaix commented Sep 15, 2023

cc @gnehil

@JNSimba
Copy link
Member

JNSimba commented Oct 10, 2023

Iterator retry will lose data, refer to pr #145

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants