[fix] [broker] Fix system topic can not be loaded up if it contains data offloaded #23279

Merged

poorbarcode
Contributor

Motivation

After #22497, the broker no longer offloads data for system topics and no longer reads offloaded data from them. That PR did not consider compatibility with clusters that had already offloaded system topic data, so system topics that contain offloaded data can not be loaded.

2024-09-09T13:02:53,677+0000 [broker-client-shared-internal-executor-6-1] WARN  org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://{tenant}/{ns}/__change_events] Compaction failure.
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException: The subscription __compaction of the topic persistent://{tenant}/{ns}/__change_events gets the last message id was failed
{"errorMsg":"Failed to recover Transaction Buffer.","reqId":283067771769117143, "remote":"127.0.0.6:6650", "local":"/127.0.0.6:43676"}
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1141) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$hasMessageAvailableAsync$60(ConsumerImpl.java:2458) 
	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$internalGetLastMessageIdAsync$67(ConsumerImpl.java:2584) 
	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:990) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:974) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
	at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:797) 
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:192) 
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[io.netty-netty-codec-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[io.netty-netty-codec-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[io.netty-netty-handler-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918) ~[io.netty-netty-transport-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799) ~[io.netty-netty-transport-classes-epoll-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501) ~[io.netty-netty-transport-classes-epoll-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399) ~[io.netty-netty-transport-classes-epoll-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994) ~[io.netty-netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty-netty-common-4.1.111.Final.jar:4.1.111.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.111.Final.jar:4.1.111.Final]
	at java.lang.Thread.run(Thread.java:840) ~[?:?]
Caused by: org.apache.pulsar.client.api.PulsarClientException: The subscription __compaction of the topic persistent://{tenant}/{ns}/__change_events gets the last message id was failed
{"errorMsg":"Failed to recover Transaction Buffer.","reqId":283067771769117143, "remote":"127.0.0.6:6650", "local":"/127.0.0.6:43676"}
	at org.apache.pulsar.client.api.PulsarClientException.wrap(PulsarClientException.java:1052) 
	at org.apache.pulsar.client.impl.ConsumerImpl.lambda$internalGetLastMessageIdAsync$67(ConsumerImpl.java:2585) 
	... 29 more

Modifications

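Fix compatibility with clusters whose system topics already contain offloaded data, so that those topics can be loaded again.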
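The Codecov report on this page lists a new NonAppendableLedgerOffloader file next to changes in LedgerOffloader, BrokerService, and ManagedLedgerImpl. The following is only a minimal sketch of the idea those names suggest, assuming a wrapper that keeps already-offloaded data readable while refusing new offloads. Every type and method below (SimpleOffloader, InMemoryOffloader, ReadOnlyOffloader, readOffloaded, isAppendable as used here) is a hypothetical, simplified stand-in and not Pulsar's actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Entry point is declared first so single-file launch can find main.
class OffloadCompatSketch {
    public static void main(String[] args) {
        // Data that was offloaded before the upgrade.
        SimpleOffloader real = new InMemoryOffloader();
        real.offload(1L, "entries written before the upgrade");

        // Assumption: after the upgrade, a system topic gets a read-only view
        // of the same offloader instead of having no offloader at all.
        SimpleOffloader forSystemTopic = new ReadOnlyOffloader(real);

        System.out.println(forSystemTopic.isAppendable());
        System.out.println(forSystemTopic.readOffloaded(1L).isPresent());
        try {
            forSystemTopic.offload(2L, "new data");
        } catch (UnsupportedOperationException e) {
            System.out.println("offload rejected");
        }
    }
}

// Hypothetical, simplified offloader contract (not Pulsar's LedgerOffloader).
interface SimpleOffloader {
    void offload(long ledgerId, String data);

    Optional<String> readOffloaded(long ledgerId);

    // Callers check this before scheduling a new offload.
    default boolean isAppendable() {
        return true;
    }
}

// Toy backing store standing in for tiered storage.
final class InMemoryOffloader implements SimpleOffloader {
    private final Map<Long, String> store = new HashMap<>();

    @Override
    public void offload(long ledgerId, String data) {
        store.put(ledgerId, data);
    }

    @Override
    public Optional<String> readOffloaded(long ledgerId) {
        return Optional.ofNullable(store.get(ledgerId));
    }
}

// Wrapper that delegates reads but rejects any new offload.
final class ReadOnlyOffloader implements SimpleOffloader {
    private final SimpleOffloader delegate;

    ReadOnlyOffloader(SimpleOffloader delegate) {
        this.delegate = delegate;
    }

    @Override
    public void offload(long ledgerId, String data) {
        throw new UnsupportedOperationException("offload is disabled for this topic");
    }

    @Override
    public Optional<String> readOffloaded(long ledgerId) {
        return delegate.readOffloaded(ledgerId);
    }

    @Override
    public boolean isAppendable() {
        return false;
    }
}
```

The point of the sketch is the asymmetry: replacing the offloader with one that can do nothing makes old offloaded ledgers unreadable, while a read-only wrapper leaves them readable and still prevents further offloading.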

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode added the type/bug, category/reliability, release/3.0.7, and release/3.3.2 labels Sep 10, 2024
@poorbarcode added this to the 4.0.0 milestone Sep 10, 2024
@poorbarcode self-assigned this Sep 10, 2024
@github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) Sep 10, 2024
@poorbarcode
Contributor Author

/pulsarbot rerun-failure-checks

Member

@lhotari lhotari left a comment

LGTM, good work @poorbarcode

@codecov-commenter
codecov-commenter commented Sep 11, 2024

Codecov Report

Attention: Patch coverage is 82.05128% with 7 lines in your changes missing coverage. Please review.

Project coverage is 74.52%. Comparing base (bbc6224) to head (5448893).
Report is 576 commits behind head on master.

Files with missing lines | Patch % | Lines
...per/mledger/impl/NonAppendableLedgerOffloader.java | 63.63% | 4 Missing ⚠️
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java | 76.92% | 0 Missing and 3 partials ⚠️
Additional details and impacted files

@@             Coverage Diff              @@
##             master   #23279      +/-   ##
============================================
+ Coverage     73.57%   74.52%   +0.94%     
- Complexity    32624    33763    +1139     
============================================
  Files          1877     1927      +50     
  Lines        139502   145055    +5553     
  Branches      15299    15858     +559     
============================================
+ Hits         102638   108099    +5461     
+ Misses        28908    28694     -214     
- Partials       7956     8262     +306     
Flag | Coverage Δ
inttests | 27.64% <15.38%> (+3.05%) ⬆️
systests | 24.69% <46.15%> (+0.37%) ⬆️
unittests | 73.87% <82.05%> (+1.02%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...org/apache/bookkeeper/mledger/LedgerOffloader.java | 73.68% <100.00%> (+1.46%) ⬆️
...e/bookkeeper/mledger/impl/NullLedgerOffloader.java | 40.00% <100.00%> (+4.28%) ⬆️
...rg/apache/pulsar/broker/service/BrokerService.java | 81.86% <100.00%> (+1.08%) ⬆️
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java | 82.02% <76.92%> (+1.36%) ⬆️
...per/mledger/impl/NonAppendableLedgerOffloader.java | 63.63% <63.63%> (ø)

... and 558 files with indirect coverage changes

@poorbarcode poorbarcode merged commit fc0e4e3 into apache:master Sep 11, 2024
51 checks passed
poorbarcode added a commit that referenced this pull request Sep 11, 2024
poorbarcode added a commit that referenced this pull request Sep 11, 2024
michalcukierman pushed a commit to michalcukierman/pulsar that referenced this pull request Sep 11, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 12, 2024
…ata offloaded (apache#23279)

(cherry picked from commit fc0e4e3)
(cherry picked from commit bd95463)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 12, 2024
…ata offloaded (apache#23279)

(cherry picked from commit fc0e4e3)
(cherry picked from commit bd95463)
poorbarcode added a commit that referenced this pull request Sep 14, 2024
Labels
  • category/reliability: The function does not work properly in certain specific environments or failures. e.g. data lost
  • cherry-picked/branch-3.0
  • cherry-picked/branch-3.2
  • cherry-picked/branch-3.3
  • doc-not-needed: Your PR changes do not impact docs
  • ready-to-test
  • release/3.0.7
  • release/3.2.5
  • release/3.3.2
  • type/bug: The PR fixed a bug or issue reported a bug
5 participants