-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[improve][broker] Reschedule reads with increasing backoff when no messages are dispatched #23226
[improve][broker] Reschedule reads with increasing backoff when no messages are dispatched #23226
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I think the backoff should be configurable
.../java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java
Outdated
Show resolved
Hide resolved
.../java/org/apache/pulsar/broker/service/persistent/PersistentDispatcherMultipleConsumers.java
Outdated
Show resolved
Hide resolved
8068b31
to
b99479a
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #23226 +/- ##
============================================
+ Coverage 73.57% 74.53% +0.96%
+ Complexity 32624 2755 -29869
============================================
Files 1877 1924 +47
Lines 139502 144915 +5413
Branches 15299 15845 +546
============================================
+ Hits 102638 108016 +5378
+ Misses 28908 28637 -271
- Partials 7956 8262 +306
Flags with carried forward coverage won't be shown. Click here to find out more.
|
…ssages are dispatched (apache#23226)
…ssages are dispatched (apache#23226) (cherry picked from commit d98e51f) (cherry picked from commit f990e28)
This introduces a regression that is fixed by #23284 |
…ssages are dispatched (apache#23226) (cherry picked from commit d98e51f) (cherry picked from commit f990e28)
…en no messages are dispatched (apache#23226)" This reverts commit f990e28. (cherry picked from commit 0fd9f17)
…en no messages are dispatched (apache#23226)" This reverts commit f990e28. (cherry picked from commit 0fd9f17)
Main Issue: #23200
Motivation
There's currently a clear problem with Key_Shared that in normal operations, it causes a lot of "ack holes" which result in several problems. One of the problems is the latency issues that are explained in #23200. Another problem is that the large number of "ack holes" exceed managedLedgerMaxUnackedRangesToPersist (10000) in usual cases such as in the demonstration in #23200.
There are multiple other issues where there has been a large number of "ack holes" when Pulsar users have experienced problems. One of the previous mitigations is PIP-299: Stop dispatch messages if the individual acks will be lost in the persistent storage.. The need for PIP-299 proves that the large number of "ack holes" is a fairly common problem.
Modifications
While experimenting on #23200, it was determined that #7105 changes were related to the cause of the issue.
I also noticed that #18315 contained some impactful changes (https://github.com/apache/pulsar/pull/18315/files#diff-c48d5c94842ac8c9a0c9314b207298069f38c8dcfeda4a9886fb3bb1f77843f2). For Key_Shared subscriptions, the change in #10762 makes the case of not dispatching any messages a common case.
Based on this information, I decided to implement a solution where there would be a backoff when no messages are dispatched.
This PR contains a change that reschedules a call to
readMoreEntries
where the delay is exponentially increasing as long as no entries are dispatched. The backoff delay starts at 100ms and is limited to 1000ms. These values are configurable in broker.conf.Additional context
While testing this change, I happened to notice that this change mitigates the problem in the reproducer of of #23200.
With the changes of this PR, these are the results:
Documentation
doc
doc-required
doc-not-needed
doc-complete