the failure of the "remove" method. A retry should be added to the remove method #138

Open
wjw465150 opened this issue Apr 7, 2023 · 3 comments

@wjw465150

Questions

In high-concurrency, weak-network environments, the "remove" method of SubsMapHelper may be called before the corresponding "put" has taken effect. The "remove" then fails because the node it is supposed to delete does not exist yet. A retry should be added to the remove method.
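To make the ordering problem concrete, here is a minimal in-memory sketch. It does not use Curator or Zookeeper: the `nodes` set, the `put` and `remove` helpers, and the path are all hypothetical stand-ins, and the only assumption is that a remove on an absent node changes nothing.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class RemoveBeforePutSketch {
  // Hypothetical in-memory stand-in for the shared subscription store.
  public static final Set<String> nodes = ConcurrentHashMap.newKeySet();

  public static void put(String path) {
    nodes.add(path);
  }

  // Returns false when the node was absent, i.e. the remove had no effect.
  public static boolean remove(String path) {
    return nodes.remove(path);
  }

  public static void main(String[] args) {
    String path = "/subs/news/node-1"; // made-up path for illustration

    boolean removed = remove(path); // the remove arrives first
    put(path);                      // the delayed put arrives afterwards

    // The remove did nothing, and the late put leaves a stale entry behind.
    System.out.println("removed=" + removed + ", stale=" + nodes.contains(path));
    // prints "removed=false, stale=true"
  }
}
```

In this model nothing ever cleans up the entry created by the late put, which is why retrying the remove until the node appears is attractive.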

Version

4.4.1

Solution to the problem

  public void remove(String address, RegistrationInfo registrationInfo, Promise<Void> promise) {
    try {
      if (registrationInfo.localOnly()) {
        localSubs.computeIfPresent(address, (add, curr) -> removeFromSet(registrationInfo, curr));
        fireRegistrationUpdateEvent(address);
        promise.complete();
      } else {
        //-> @wjw_add The delete request arrived too early and the node does not exist yet, so retry a few times.
        int retryCount=0;
        org.apache.zookeeper.data.Stat stat = curator.checkExists().forPath(fullPath.apply(address, registrationInfo));
        while(stat==null && retryCount<3) {
          log.warn(MessageFormat.format("Zookeeper node to delete does not exist: {0}, retry attempt {1}", fullPath.apply(address, registrationInfo), retryCount));
          java.util.concurrent.TimeUnit.SECONDS.sleep(1);
          retryCount++;
          stat = curator.checkExists().forPath(fullPath.apply(address, registrationInfo));
        }
        if(stat==null) {
          log.warn(MessageFormat.format("After several retries, the Zookeeper node to delete still does not exist: {0}", fullPath.apply(address, registrationInfo)));
        }
        //<- @wjw_add
        
        curator.delete().guaranteed().inBackground((c, e) -> {
          if (e.getType() == CuratorEventType.DELETE) {
            vertx.runOnContext(aVoid -> {
              ownSubs.computeIfPresent(address, (add, curr) -> removeFromSet(registrationInfo, curr));
              promise.complete();
            });
          }
        }).forPath(fullPath.apply(address, registrationInfo));
      }
    } catch (Exception e) {
      log.error(String.format("remove subs address %s failed.", address), e);
      promise.fail(e);
    }
  }
@wjw465150 wjw465150 added the bug label Apr 7, 2023
@wjw465150
Author

Why not use a Watcher? Because the node may already have been created by the time the Watcher is registered, in which case the creation event is never delivered to it.
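The missed-event concern can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `MissedWatchSketch`, `create`, and `watchCreation` are hypothetical names, not the Zookeeper or Curator API, and the model assumes a watcher is only told about creations that happen after it was registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

public class MissedWatchSketch {
  private final Set<String> nodes = new HashSet<>();
  private final Map<String, List<Runnable>> watchers = new HashMap<>();

  // Creates the node and fires the one-shot watchers registered so far.
  public void create(String path) {
    nodes.add(path);
    List<Runnable> toFire = watchers.remove(path);
    if (toFire != null) {
      toFire.forEach(Runnable::run);
    }
  }

  // Registers a watcher for a future creation only; it does not look at
  // nodes that already exist (the assumption this sketch is built on).
  public void watchCreation(String path, Runnable onCreated) {
    watchers.computeIfAbsent(path, p -> new ArrayList<>()).add(onCreated);
  }

  public static void main(String[] args) {
    MissedWatchSketch store = new MissedWatchSketch();
    AtomicBoolean notified = new AtomicBoolean(false);
    String path = "/subs/news/node-1"; // made-up path for illustration

    store.create(path);                                  // node appears first
    store.watchCreation(path, () -> notified.set(true)); // watcher comes too late

    System.out.println("notified=" + notified.get()); // prints "notified=false"
  }
}
```

Under that assumption a watcher alone cannot be relied on, so some form of existence check or polling is still needed.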

@wjw465150
Author

wjw465150 commented Apr 9, 2023

Switching to an asynchronous retry (using the vertx.setPeriodic method) is better, because it does not block the calling thread while waiting:

  public void remove(String address, RegistrationInfo registrationInfo, Promise<Void> promise) {
    try {
      if (registrationInfo.localOnly()) {
        localSubs.computeIfPresent(address, (add, curr) -> removeFromSet(registrationInfo, curr));
        fireRegistrationUpdateEvent(address);
        promise.complete();
      } else {
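        //-> @wjw_add The delete request arrived too early and the node does not exist yet, so retry a few times.
        //@wjw_comment: Why not use a Watcher? The node may already have been created by the time the Watcher is registered, so the event would be missed.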
        String nodeFullPath = fullPath.apply(address, registrationInfo);
        if (curator.checkExists().forPath(nodeFullPath) == null) {
          java.util.concurrent.atomic.AtomicInteger retryCount = new java.util.concurrent.atomic.AtomicInteger(0);
          vertx.setPeriodic(100, 100, timerID -> {
            try {
              log.warn(MessageFormat.format("Zookeeper node to delete does not exist: {0}, retry attempt {1}", nodeFullPath, retryCount.incrementAndGet()));
              if (curator.checkExists().forPath(nodeFullPath) != null) {
                vertx.cancelTimer(timerID);
                curator.delete().guaranteed().forPath(nodeFullPath);
                log.warn(MessageFormat.format("After retry attempt {0}, deleted Zookeeper node: {1}", retryCount.get(), nodeFullPath));
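                // Keep the local bookkeeping consistent with the non-retry path below.
                ownSubs.computeIfPresent(address, (add, curr) -> removeFromSet(registrationInfo, curr));
                promise.complete();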
                return;
              }
              
              if (retryCount.get() > 10) {
                vertx.cancelTimer(timerID);
                String errMessage = MessageFormat.format("After {0} retries, the Zookeeper node to delete still does not exist: {1}", retryCount.get(), nodeFullPath);
                log.warn(errMessage);
                throw new IllegalStateException(errMessage); 
              }
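            } catch (Exception e) {
              // Stop retrying on any error so the promise is failed at most once.
              vertx.cancelTimer(timerID);
              log.error(e.getMessage(), e);
              promise.tryFail(e);
            }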
          });

          return;
        }
        //<- @wjw_add
        
        curator.delete().guaranteed().inBackground((c, e) -> {
          if (e.getType() == CuratorEventType.DELETE) {
            vertx.runOnContext(aVoid -> {
              ownSubs.computeIfPresent(address, (add, curr) -> removeFromSet(registrationInfo, curr));
              promise.complete();
            });
          }
        }).forPath(fullPath.apply(address, registrationInfo));
      }
    } catch (Exception e) {
      log.error(String.format("remove subs address %s failed.", address), e);
      promise.fail(e);
    }
  }
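
The bounded, non-blocking retry idea can also be tried out without Vert.x or Zookeeper. The following is a rough sketch only: it swaps `vertx.setPeriodic` for a plain `ScheduledExecutorService`, uses an in-memory set in place of the Zookeeper nodes, and `removeWhenPresent` and all of its parameters are hypothetical names introduced for illustration.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedRetrySketch {

  // Polls every periodMs until the path exists, then removes it. The future
  // completes with the attempt number, or fails after maxAttempts polls.
  public static CompletableFuture<Integer> removeWhenPresent(
      Set<String> nodes, String path, int maxAttempts, long periodMs,
      ScheduledExecutorService scheduler) {
    CompletableFuture<Integer> result = new CompletableFuture<>();
    AtomicInteger attempts = new AtomicInteger();

    ScheduledFuture<?> task = scheduler.scheduleAtFixedRate(() -> {
      if (result.isDone()) {
        return; // already settled, nothing left to do
      }
      int attempt = attempts.incrementAndGet();
      if (nodes.remove(path)) {
        result.complete(attempt);
      } else if (attempt >= maxAttempts) {
        result.completeExceptionally(new IllegalStateException(
            "node still missing after " + attempt + " attempts: " + path));
      }
    }, periodMs, periodMs, TimeUnit.MILLISECONDS);

    // Cancel the periodic task as soon as the outcome is known.
    result.whenComplete((value, error) -> task.cancel(false));
    return result;
  }

  public static void main(String[] args) throws Exception {
    Set<String> nodes = ConcurrentHashMap.newKeySet();
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    try {
      String path = "/subs/news/node-1"; // made-up path for illustration
      CompletableFuture<Integer> removal =
          removeWhenPresent(nodes, path, 10, 100, scheduler);
      // Simulate the delayed put arriving after the remove was requested.
      scheduler.schedule(() -> nodes.add(path), 250, TimeUnit.MILLISECONDS);

      System.out.println("removed on attempt " + removal.get(5, TimeUnit.SECONDS));
      System.out.println("stale=" + nodes.contains(path));
    } finally {
      scheduler.shutdownNow();
    }
  }
}
```

Two details are worth carrying over to any real implementation: the periodic task is cancelled on every outcome, including errors, and the result is settled at most once even if a tick fires after completion.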

@wjw465150
Author

Has anyone else run into this? One of my projects uses the eventbus-bridge over WebSocket and hits this situation frequently.
