Only one pod joins the cluster #377
The same problem occurs for my embedded Hazelcast setup, based on Hazelcast version 5.1.2. The application is deployed on a k8s cluster with Istio sidecar injection enabled. The first node starts without any problem. When the second node starts, a connection between them is established. But when the newly started node raises the question about the master, it remains unresolved. I can see a couple of these messages in between. Then they disagree about the master node, and I can see the same exception as the one raised by @SamueleAlpino. @SamueleAlpino, have you found the reason? Any workaround?
I've been seeing the exact same issue. I am trying to run a Spring Boot service with embedded Hazelcast using the Kubernetes API discovery mode. This is running in a Kubernetes cluster with an Istio service mesh. Debug logs appear to show that it has gained a connection, but it then doesn't seem to be able to join. This works fine in a namespace without Istio enabled, so maybe it's something to do with sitting behind a proxy? @kkrol89 @SamueleAlpino have you figured out any workarounds yet?
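(A hedged aside, not something confirmed in this thread: a commonly suggested workaround for Hazelcast behind Istio is to bypass the sidecar for member-to-member traffic by excluding the member port from interception. The annotations below go on the pod template; port 5701 is an assumption based on the Hazelcast default.)

```yaml
# Hedged sketch: exclude Hazelcast's member port from Istio traffic
# interception so members talk to each other directly. These annotations
# belong on the pod template of the Deployment/StatefulSet.
annotations:
  traffic.sidecar.istio.io/excludeInboundPorts: "5701"
  traffic.sidecar.istio.io/excludeOutboundPorts: "5701"
```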
I've resolved it by adding appProtocol to the service definition, like this:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ printf "%s-%s" .Values.projectName "service-headless" }}
  labels:
    app: {{ printf "%s-%s" .Values.projectName "serviceapp" }}
spec:
  type: ClusterIP
  clusterIP: None
  publishNotReadyAddresses: true
  selector:
    app: {{ printf "%s-%s" .Values.projectName "app" }}
  ports:
    - name: "hazelcast"
      port: 5701
      protocol: TCP
      appProtocol: tcp
```

I found the solution in this thread.
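(A hedged side note, not from the thread: Istio can also infer the protocol from the port name prefix, so naming the port with a `tcp-` prefix should have the same effect as setting `appProtocol`.)

```yaml
# Alternative sketch: Istio's protocol selection also honors the port-name
# convention <protocol>[-<suffix>], so this is equivalent to appProtocol: tcp.
ports:
  - name: tcp-hazelcast
    port: 5701
    protocol: TCP
```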
Hi, recently I tried to upgrade Hazelcast from:

```xml
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-spring</artifactId>
    <version>4.0.6</version>
</dependency>
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast</artifactId>
    <version>4.0.6</version>
</dependency>
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-kubernetes</artifactId>
    <version>2.2.3</version>
</dependency>
```

to:

```xml
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-spring</artifactId>
    <version>5.1.1</version>
</dependency>
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast</artifactId>
    <version>5.1.1</version>
</dependency>
```

I have a Kubernetes cluster with some applications using these dependencies. The last time I upgraded the application, the only thing I did was kill the pods running the old version, and no problems occurred.
According to the documentation, I have a bean to configure the service DNS:

```java
@Bean
public Config hazelcastConfig(Environment environment) {
    Config config = new Config();
    config.getNetworkConfig()
          .getJoin()
          .getMulticastConfig()
          .setEnabled(false);
    // ... (rest of the configuration elided in the original post)
    return config;
}
```
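(For reference, a hedged sketch of what a complete Hazelcast 5.x bean with DNS-lookup discovery typically looks like; the class name and the service FQDN are placeholders, not taken from this post.)

```java
import com.hazelcast.config.Config;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HazelcastConfiguration {

    @Bean
    public Config hazelcastConfig() {
        Config config = new Config();
        // Disable multicast; rely on Kubernetes discovery instead.
        config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
        // DNS-lookup mode against a headless service; the FQDN is a placeholder.
        config.getNetworkConfig()
              .getJoin()
              .getKubernetesConfig()
              .setEnabled(true)
              .setProperty("service-dns", "my-service-headless.my-namespace.svc.cluster.local");
        return config;
    }
}
```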
I also have:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: apps-role
  namespace: ${cluster_env}
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "services", "endpoints", "deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: apps-role-binding
  namespace: ${cluster_env}
roleRef:
  kind: ClusterRole
  name: apps-role
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: ServiceAccount
    name: default
    namespace: ${cluster_env}
```
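(An assumption on my part, not from the thread: you can verify that this binding actually grants the pod's service account the permissions the discovery plugin needs.)

```sh
# Hedged check: confirm the default service account can list endpoints;
# <namespace> is a placeholder for the namespace the pods run in.
kubectl auth can-i list endpoints \
  --as=system:serviceaccount:<namespace>:default \
  -n <namespace>
```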
The problem is that one pod goes to 2/2 without problems, while the other one continues to crash and prints:

```
java.util.concurrent.TimeoutException: JoinMastershipClaimOp failed to complete within 9999991351 NANOSECONDS.
My claim to be master is rejected!
Setting master address to null
NOT sending master question to blacklisted endpoints
Sending master question to [...]:5701
Connection to: [...]:5701 streamId:-1 is already in progress
```

Am I missing any configuration?