[client][fix] Bookie WatchTask may be stuck #4481

Merged · 3 commits · Aug 17, 2024

Conversation

@wenbingshen (Member) commented Aug 16, 2024

Motivation

Before digging into the problem this PR solves, you can run the following test method against the current master branch; the unit test will fail.

// Note: this test is meant to live in a test class extending BookKeeperClusterTestCase,
// which is assumed to provide the zkUtil helper and the restartBookie(...) method used below.
@Test
    public void testBookieWatcher() throws Exception {
        ClientConfiguration conf = new ClientConfiguration();
        conf.setMetadataServiceUri(zkUtil.getMetadataServiceUri());

        StaticDNSResolver tested = new StaticDNSResolver();
        try (BookKeeper bkc = BookKeeper
                .forConfig(conf)
                .dnsResolver(tested)
                .build()) {
            final Map<BookieId, BookieInfoReader.BookieInfo> bookieInfo = bkc.getBookieInfo();

            // 1. check all bookies in client cache successfully.
            bookieInfo.forEach((bookieId, info) -> {
                final CompletableFuture<Versioned<BookieServiceInfo>> bookieServiceInfo = bkc.getMetadataClientDriver()
                        .getRegistrationClient().getBookieServiceInfo(bookieId);
                assertTrue(bookieServiceInfo.isDone());
                assertFalse(bookieServiceInfo.isCompletedExceptionally());
            });

            // 2. add a task to scheduler, blocking zk watch for bookies cache
            bkc.getClientCtx().getScheduler().schedule(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }, 0, TimeUnit.MILLISECONDS);

            // 3. restart one bookie, so the client should update cache by WatchTask
            restartBookie(bookieInfo.keySet().iterator().next());

            // 4. after restart bookie, check again for the client cache
            final CompletableFuture<Versioned<BookieServiceInfo>> bookieServiceInfo =
                    bkc.getMetadataClientDriver().getRegistrationClient()
                            .getBookieServiceInfo(bookieInfo.keySet().iterator().next());
            assertTrue(bookieServiceInfo.isDone());
            // 5. Previously, we used scheduler, and here getting bookie from client cache would fail.
            // 6. After this PR, we introduced independent internal thread pool watchTaskScheduler,
            // and here it will succeed.
            assertFalse(bookieServiceInfo.isCompletedExceptionally());
        }
    }

Next, let me describe the problem we hit:

  1. We take bookies offline and then bring them back online in ReadOnly state, expecting them to continue serving client read requests once they are back.

  2. Then, on some of our brokers, we found that some bookies were not re-added to the BookKeeper client cache used by ManagedLedger after coming back online. The first screenshot shows that Update BookieInfoCache was executed only once, although it should have been executed twice: besides the ManagedLedger in Pulsar's broker, there is also a BookKeeper client in BookKeeperSchemaStorage, as shown in the second screenshot.

  3. According to the stack analysis, our scheduler thread was continuously performing the operation below.
    Because the thread was stuck on this operation, the watch event triggered by ZooKeeper stayed queued on the scheduler thread and was never executed, so no new watch listener was registered with ZooKeeper. As a result, the bookie that came back online was never re-added to the BookKeeper client's cache.

```
BookKeeperClientScheduler-OrderedScheduler-0-0
at org.apache.bookkeeper.util.collections.ConcurrentOpenHashMap$Section.removeIf(Ljava/util/function/BiPredicate;)I (ConcurrentOpenHashMap.java:406)
at org.apache.bookkeeper.util.collections.ConcurrentOpenHashMap.removeIf(Ljava/util/function/BiPredicate;)I (ConcurrentOpenHashMap.java:172)
at org.apache.bookkeeper.proto.PerChannelBookieClient.checkTimeoutOnPendingOperations()V (PerChannelBookieClient.java:1015)
at org.apache.bookkeeper.proto.DefaultPerChannelBookieClientPool.checkTimeoutOnPendingOperations()V (DefaultPerChannelBookieClientPool.java:132)
at org.apache.bookkeeper.proto.BookieClientImpl.monitorPendingOperations()V (BookieClientImpl.java:572)
at org.apache.bookkeeper.proto.BookieClientImpl.lambda$new$0()V (BookieClientImpl.java:131)
at org.apache.bookkeeper.proto.BookieClientImpl$$Lambda$77.run()V (Unknown Source)
at org.apache.bookkeeper.util.SafeRunnable$1.safeRun()V (SafeRunnable.java:43)
at org.apache.bookkeeper.common.util.SafeRunnable.run()V (SafeRunnable.java:36)
at com.google.common.util.concurrent.MoreExecutors$ScheduledListeningDecorator$NeverSuccessfulListenableFutureTask.run()V (MoreExecutors.java:705)
at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object; (Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset()Z (FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)Z (ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V (ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V (ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run()V (ThreadPoolExecutor.java:617)
at io.netty.util.concurrent.FastThreadLocalRunnable.run()V (FastThreadLocalRunnable.java:30)
at java.lang.Thread.run()V (Thread.java:748)
```

  4. The stack issue above has been fixed in the latest version through multiple PRs, but the scheduler thread is still used by multiple components and exposed publicly through ClientContext, which is very dangerous. The harm we suffered this time was that the bookie node could not be found in the cache, which broke consumer reads; by coincidence, all three replica bookies holding the data to be read had just gone through our offline/online operation. A minimal illustration of this starvation pattern follows.
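To make the failure mode concrete, here is a minimal, self-contained sketch (plain JDK code, not BookKeeper internals) of how a single long-running task on a single-threaded scheduler starves every task queued behind it, which is exactly what happened to the ZK watch callback:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerStarvationDemo {
    public static void main(String[] args) throws Exception {
        // One thread, like the client scheduler shared through ClientContext.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Task 1: a component monopolizes the only thread (standing in for the
        // checkTimeoutOnPendingOperations() loop in the stack trace above).
        scheduler.schedule(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, TimeUnit.MILLISECONDS);

        // Task 2: the "watch event" queued behind it. It never runs, just like
        // the ZK watch callback that should have refreshed the bookie cache.
        scheduler.schedule(() -> System.out.println("watch event processed"),
                10, TimeUnit.MILLISECONDS);

        Thread.sleep(1000);
        System.out.println("one second later: the watch event has still not run");
        scheduler.shutdownNow();
    }
}
```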

So I suggest giving the WatchTask its own independent internal thread pool.

Changes

Add an internal thread pool for the WatchTask.
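The fix itself is small. Below is an illustrative sketch of the approach, not the PR's actual code: the watch task gets its own single-thread scheduler that is never handed out through ClientContext. The class and method names here are hypothetical; the merged PR later named the executor highPriorityTaskExecutor, as noted further down.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Illustrative sketch only. A dedicated single-thread scheduler reserved for
// the bookie watch task, so other client components can never starve it.
class BookieWatcherSketch {
    // Internal use only: never exposed via ClientContext, and the thread name
    // warns against external use.
    private final ScheduledExecutorService watchTaskScheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "bookie-watch-task-internal-only");
                t.setDaemon(true);
                return t;
            });

    // Watch events run on the private executor instead of the shared scheduler.
    void processWatchEvent(Runnable updateBookieCache) {
        watchTaskScheduler.execute(updateBookieCache);
    }
}
```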

@wenbingshen (Member, Author) commented:

@merlimat @eolivelli @hangc0276 @dlg99 @zymap @shoothzj PTAL. This problem can do real harm.

@eolivelli changed the title from "[Improve] Add inner thread for WatchTask" to "[client][fix] Bookie WatchTask may be stuck" on Aug 16, 2024
@eolivelli (Contributor) left a comment:

LGTM

great catch

I have changed the title; this is not an "improvement" but actually a "bugfix".

@shoothzj (Member) commented:

Can we consider preventing outside calls to bkc.getClientCtx().getScheduler().schedule? By the way, is this widely used in the Pulsar project?

@wenbingshen (Member, Author) commented Aug 16, 2024

> Can we consider preventing outside calls to bkc.getClientCtx().getScheduler().schedule? By the way, is this widely used in the Pulsar project?

@shoothzj

  1. Simply blocking external calls to bkc.getClientCtx().getScheduler().schedule would reduce the risk, but it cannot fundamentally solve the problem, because the scheduler is still shared among internal components.
  2. ClientContext and the scheduler were introduced in bookkeeper-4.9.0 and are not widely used in Pulsar, but it is uncertain whether other projects rely on them, so blocking calls would introduce compatibility issues.
  3. Refactoring how internal components share the scheduler so that the WatchTask can never be blocked would require a much more complicated code change than this PR.
  4. Providing an independent thread pool for the WatchTask is the simplest approach; the pool is restricted to internal use, and its name and comments state that external use is prohibited.
  5. Bookie is a cloud-native, elastically scalable component, and the WatchTask is crucial: once it is blocked, it can cause data loss and make broker partitions or consumption unavailable. It deserves an exclusive thread pool.

@wenbingshen (Member, Author) commented:

Following @hangc0276's suggestion, the execution thread pool has been renamed to highPriorityTaskExecutor.

@wenbingshen merged commit a569a49 into apache:master on Aug 17, 2024, with 23 checks passed.
The wenbing/fix_wrong_bookie_cache branch was deleted on August 17, 2024, and the PR was added to the 4.18.0 milestone.