Name Resolution/Connect Failure After Adding a Server #356

Open
marcusolini opened this issue May 25, 2022 · 3 comments
Labels
todo-backlog An issue to be addressed in the future

Comments

@marcusolini

Hi,

We are encountering a name resolution issue when adding a server. The nuraft::add_srv() call succeeds, but the subsequent internal asynchronous connection from the leader to the follower server appears to fail with a name resolution error (see the log excerpts below). As a result, the server cannot join the cluster.

We are using NuRaft on Linux in blocking mode. From the leader server we can successfully resolve the follower server using name resolution tools. Manual entries in /etc/hosts do not resolve the issue.

Are there any settings that allow the nuraft::add_srv() call to be more synchronous and create the connection during the nuraft::add_srv() call?

Are there any settings to configure or use an alternate name resolution?

Any guidance or insight is appreciated,
Mark.

LOG EXCERPTS

05/25/2022 10:24:37.597 PID:409 TID:140421944866560 [process_req] Receive a add_server_request message from 0 with LastLogIndex=0, LastLogTerm=0, EntriesLength=1, CommitIndex=0 and Term=0(raft_server.cxx:628)
05/25/2022 10:24:37.605 PID:409 TID:140421944866560 [asio_rpc_client] asio client created: 0x7fb65004ce98(asio_service.cxx:860)
05/25/2022 10:24:37.614 PID:409 TID:140421944866560 [send_req] send req 1000 -> 1002, type join_cluster_request(peer.cxx:44)
05/25/2022 10:24:37.622 PID:409 TID:140421944866560 [send] socket 0x7fb65004ce98 to pemjm-2.policyjobmgr.nbux.svc.cluster.local:2634 is not opened yet(asio_service.cxx:946)
05/25/2022 10:24:37.630 PID:409 TID:140421944866560 [invite_srv_to_join_cluster] sent join request to peer 1002, pemjm-2.policyjobmgr.nbux.svc.cluster.local:2634(handle_join_leave.cxx:134)
05/25/2022 10:24:37.639 PID:409 TID:140421944866560 [process_req] Response back a add_server_response message to 1000 with Accepted=1, Term=1, NextIndex=3(raft_server.cxx:698)

05/25/2022 10:24:38.047 PID:409 TID:140421229422336 [handle_rpc_result] resp of req 1000 -> 1002, type join_cluster_request, failed to resolve host pemjm-2.policyjobmgr.nbux.svc.cluster.local due to error 1, Host not found (authoritative)(peer.cxx:107)
05/25/2022 10:24:38.648 PID:409 TID:140421229422336 [handle_ext_resp_err] receive an rpc error response from peer server, failed to resolve host pemjm-2.policyjobmgr.nbux.svc.cluster.local due to error 1, Host not found (authoritative) 12(raft_server.cxx:1408)
05/25/2022 10:24:38.714 PID:409 TID:140421229422336 [handle_ext_resp_err] retry the request(raft_server.cxx:1448)

05/25/2022 10:24:40.831 PID:409 TID:140421221029632 [on_retryable_req_err] retry the request join_cluster_request for 1002(raft_server.cxx:1460)
05/25/2022 10:24:40.865 PID:409 TID:140421221029632 [send_req] send req 1000 -> 1002, type join_cluster_request(peer.cxx:44)
05/25/2022 10:24:40.881 PID:409 TID:140421221029632 [send_req] rpc local is null(peer.cxx:53)

@greensky00
Contributor

Hi @marcusolini

There is no way to make "add server" synchronous. Instead, it is possible to provide an API for attaching your own custom resolver to NuRaft, so that you can return the IP address for a given host. Does that make sense to you?
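Since "add server" completes asynchronously, a caller that needs to know when the new server has actually joined has to check for it afterwards. The helper below is only a minimal sketch of that polling pattern, not a NuRaft API: `wait_until_joined` and its `joined` predicate are hypothetical names, and how the predicate queries cluster membership is left as an assumption.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of NuRaft): polls `joined` until it returns
// true or `timeout` elapses. `joined` stands in for whatever membership check
// the application uses after the add-server request has been accepted.
bool wait_until_joined(const std::function<bool()>& joined,
                       std::chrono::milliseconds timeout,
                       std::chrono::milliseconds interval =
                           std::chrono::milliseconds(100)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!joined()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // gave up: the server did not join in time
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

A timeout result here only means the join was not observed in time; the leader may still be retrying the join request in the background, as the log excerpts suggest.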

@marcusolini
Author

Hi @greensky00

Sure, using a custom resolver sounds like a good approach. How do you suggest we integrate a custom resolver?

@greensky00
Contributor

greensky00 commented May 31, 2022

You can assign your callback in asio_service_options, and that callback function will look like:

void custom_resolver(const std::string& host, 
                     const std::string& port, 
                     const resolver_resp& when_done) {
    // do IP address lookup..
    when_done( resolved_ip_address, port );
}

The callback will be asynchronous, so you can call when_done from a different thread. Let me work on it shortly.

@greensky00 greensky00 added the todo-backlog An issue to be addressed in the future label May 31, 2022