This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Only one pod join the cluster #377

Open
SamueleAlpino opened this issue Apr 12, 2022 · 3 comments

Comments

@SamueleAlpino

SamueleAlpino commented Apr 12, 2022

Hi, I recently tried to upgrade Hazelcast from:

- com.hazelcast:hazelcast-spring:4.0.6
- com.hazelcast:hazelcast:4.0.6
- com.hazelcast:hazelcast-kubernetes:2.2.3

to:

- com.hazelcast:hazelcast-spring:5.1.1
- com.hazelcast:hazelcast:5.1.1

I have a Kubernetes cluster with several applications that use those dependencies. The last time I upgraded them, the only thing I did was kill the pods running the old version, and no problems occurred.
Following the documentation, I have a bean that configures the service DNS:
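```java
import com.hazelcast.config.Config;
import com.hazelcast.kubernetes.KubernetesProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.core.env.Environment;

// Method of a Spring @Configuration class; serviceName is presumably a field of that class (not shown).
@Bean
public Config hazelcastConfig(Environment environment) {
    Config config = new Config();

    // Multicast discovery is usually unavailable between pods, so turn it off.
    config.getNetworkConfig()
            .getJoin()
            .getMulticastConfig()
            .setEnabled(false);

    // Kubernetes discovery in DNS lookup mode: members are found by resolving serviceName.
    config.getNetworkConfig()
            .getJoin()
            .getKubernetesConfig()
            .setEnabled(true)
            .setProperty(KubernetesProperties.SERVICE_DNS.key(), serviceName);

    return config;
}
```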
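With the `service-dns` property set, Hazelcast finds the other members by resolving that DNS name, so for a headless Service it should return one IP per pod. A quick way to see what a pod actually gets back is a plain-JDK lookup like the sketch below. It is not Hazelcast's own discovery code; the class and method names are made up for illustration, and 5701 is simply the default member port seen in the logs.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class ServiceDnsCheck {

    // Resolves a Service DNS name and returns "ip:port" for every address returned.
    // For a headless Service this should list one entry per (published) pod.
    static List<String> resolveMembers(String serviceDns, int port) throws UnknownHostException {
        List<String> members = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(serviceDns)) {
            members.add(address.getHostAddress() + ":" + port);
        }
        return members;
    }

    public static void main(String[] args) throws UnknownHostException {
        if (args.length == 0) {
            System.err.println("usage: ServiceDnsCheck <service-dns-name>");
            return;
        }
        for (String member : resolveMembers(args[0], 5701)) {
            System.out.println(member);
        }
    }
}
```

Run from inside the crashing pod with the same value passed as `serviceName`; if only one address comes back, discovery rather than joining is the first thing to look at.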

I also have:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: apps-role
  namespace: ${cluster_env}
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "services", "endpoints", "deployments"]
    verbs: ["get", "list", "watch"]
---
# rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22; newer clusters need .../v1
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: apps-role-binding
  namespace: ${cluster_env}
roleRef:
  kind: ClusterRole
  name: apps-role
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: default
    namespace: ${cluster_env}
```

The problem is that one pod reaches 2/2 without problems, while the other keeps crashing and logging:

```
java.util.concurrent.TimeoutException: JoinMastershipClaimOp failed to complete within 9999991351 NANOSECONDS.
My claim to be master is rejected!
Setting master address to null
NOT sending master question to blacklisted endpoints
Sending master question to [...]:5701
Connection to: [...]:5701 streamId:-1 is already in progress
```

Am I missing any configuration?
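The repeated `Connection to: [...]:5701 streamId:-1 is already in progress` lines suggest the joining member's connection to the existing member never completes. A plain TCP probe from inside the crashing pod can help separate a network-level problem from a Hazelcast-level one. The sketch below uses only the JDK; the class name and the 3-second timeout are arbitrary choices, not anything Hazelcast provides.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class MemberPortCheck {

    // Returns true if a plain TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: MemberPortCheck <peer-ip> [port]");
            return;
        }
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5701;
        System.out.println(args[0] + ":" + port + " reachable: " + canConnect(args[0], port, 3000));
    }
}
```

Treat a success with care: behind a service-mesh sidecar the local proxy may accept the TCP handshake even if the traffic to the remote member is later broken, so this only proves basic reachability, not that the Hazelcast member protocol gets through.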

@kkrol89

kkrol89 commented Sep 16, 2022

The same problem occurs in my embedded Hazelcast setup based on Hazelcast 5.1.2. The application is deployed on a k8s cluster with Istio sidecar injection enabled.

First node starts without any problem.

When the second node starts, a connection between them is established:

```
[5.1.2] Established socket connection between /10.2.12.114:40435 and /10.2.10.135:5701
```

But when the newly started node asks the master question, it remains unresolved:

```
Sending master question to [10.2.10.135]:5701
```

I can see a couple of these messages in between:

```
Connection to: [...]:5701 streamId:-1 is already in progress
```

Then they disagree about the master node:

```
[5.1.2] My claim to be master is rejected! Voting endpoints: [[10.2.10.135]:5701]
```

and I see the same exception as the one reported by @SamueleAlpino:

```
Caused by: java.util.concurrent.TimeoutException: JoinMastershipClaimOp failed to complete within 9999992250 NANOSECONDS.
```

@SamueleAlpino, have you found the reason? Any workaround?
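Both stack traces give the mastership-claim timeout in nanoseconds. Converting the two values (plain JDK; the class name is illustrative) shows that both claims gave up after just under 10 seconds. That is consistent with the existing member never answering the claim in time, although the logs alone do not prove why.

```java
import java.util.concurrent.TimeUnit;

public class ClaimTimeout {

    // Formats a nanosecond duration from the log as whole milliseconds (toMillis truncates).
    static String describe(long nanos) {
        return nanos + " ns = " + TimeUnit.NANOSECONDS.toMillis(nanos) + " ms";
    }

    public static void main(String[] args) {
        // Values copied from the two JoinMastershipClaimOp stack traces in this thread.
        System.out.println(describe(9_999_991_351L)); // 9999991351 ns = 9999 ms
        System.out.println(describe(9_999_992_250L)); // 9999992250 ns = 9999 ms
    }
}
```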

@calnighters

I've been seeing the exact same issue...

I am trying to run a Spring Boot service with embedded Hazelcast using Kubernetes API discovery mode. It runs in a Kubernetes cluster with an Istio service mesh.

The debug logs seem to show that it establishes a connection, but then it doesn't seem able to actually connect.

This works fine in a namespace without Istio enabled, so maybe it's something to do with sitting behind a proxy?

@kkrol89 @SamueleAlpino have you figured out any workarounds yet?

@EnricoDamini

EnricoDamini commented Jan 16, 2024

I resolved it by adding `appProtocol` to the Service definition, like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ printf "%s-%s" .Values.projectName "service-headless" }}
  labels:
    app: {{ printf "%s-%s" .Values.projectName "serviceapp" }}
spec:
  type: ClusterIP
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: {{ printf "%s-%s" .Values.projectName "app" }}
  ports:
    - name: "hazelcast"
      port: 5701
      protocol: TCP
      appProtocol: tcp
```

I found the solution in this thread
hazelcast/hazelcast#22256 (comment)
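One plausible reading of why `appProtocol: tcp` helps: Istio picks a port's protocol from `appProtocol` if it is set, otherwise from a port name of the form `<protocol>[-<suffix>]`, and otherwise falls back to automatic protocol detection. A port named just `hazelcast` with no `appProtocol` therefore lands in auto-detection, which may interfere with Hazelcast member traffic; declaring it explicitly as TCP avoids that. The sketch below is a simplified model of that selection rule, not Istio code; the protocol set is a deliberately incomplete subset and all names are illustrative.

```java
import java.util.Locale;
import java.util.Set;

public class IstioProtocolSelection {

    // Incomplete subset of protocol names Istio recognizes for explicit selection.
    private static final Set<String> KNOWN = Set.of("http", "http2", "https", "tcp", "tls", "grpc");

    // Simplified rule: appProtocol wins; otherwise a "<protocol>[-<suffix>]" port name
    // selects the protocol; otherwise the port falls back to auto-detection.
    static String selectProtocol(String portName, String appProtocol) {
        if (appProtocol != null && !appProtocol.isEmpty()) {
            return appProtocol.toLowerCase(Locale.ROOT);
        }
        if (portName != null) {
            String prefix = portName.split("-", 2)[0].toLowerCase(Locale.ROOT);
            if (KNOWN.contains(prefix)) {
                return prefix;
            }
        }
        return "auto-detect";
    }

    public static void main(String[] args) {
        System.out.println(selectProtocol("hazelcast", null));     // auto-detect (port without the fix)
        System.out.println(selectProtocol("hazelcast", "tcp"));    // tcp (appProtocol fix above)
        System.out.println(selectProtocol("tcp-hazelcast", null)); // tcp (port-name variant)
    }
}
```

By the same rule, renaming the port to `tcp-hazelcast` should have the same effect as `appProtocol: tcp`, although that variant was not tested in this thread.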
