
GlusterD kubernetes: systemctl start glusterd silent failures. #1496

Open
jayunit100 opened this issue Feb 4, 2019 · 1 comment

Comments

@jayunit100

Note: I didn't set up an etcd URL. I'd assume that either way glusterd should fail fast and obviously if etcd isn't working; however, it fails silently.

Observed behavior

Running the kube cluster recipes, the Gluster pods come up and report healthy, but systemctl status glusterd2 tells another story: the service has completely failed.

Expected/desired behavior

Pods should exit if glusterd can't start up, or at least log the failure to stderr. Right now there are no logs, and the only way to know it's broken is to run glustercli peer status or similar inside the pod.

Details on how to reproduce (minimal and precise)

Create the following file:

---
apiVersion: v1
kind: Namespace
metadata:
  name: gluster-storage
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gluster
  namespace: gluster-storage
  labels:
    gluster-storage: glusterd2
spec:
  selector:
    matchLabels:
      name: glusterd2-daemon
  template:
    metadata:
      labels:
        name: glusterd2-daemon
    spec:
      containers:
        - name: glusterd2
          image: docker.io/gluster/glusterd2-nightly:20190204
# TODO: Enable the below once passing environment variables to the containers is fixed
#          env:
#            - name: GD2_RESTAUTH
#              value: "false"
# Enable if an external etcd cluster has been set up
#            - name: GD2_ETCDENDPOINTS
#              value: "http://gluster-etcd:2379"
# Generate and set a random uuid here
#            - name: GD2_CLUSTER_ID
#              value: "9610ec0b-17e7-405e-82f7-5f78d0b22463"
          securityContext:
            capabilities: {}
            privileged: true
          volumeMounts:
            - name: gluster-dev
              mountPath: "/dev"
            - name: gluster-cgroup
              mountPath: "/sys/fs/cgroup"
              readOnly: true
            - name: gluster-lvm
              mountPath: "/run/lvm"
            - name: gluster-kmods
              mountPath: "/usr/lib/modules"
              readOnly: true

      volumes:
        - name: gluster-dev
          hostPath:
            path: "/dev"
        - name: gluster-cgroup
          hostPath:
            path: "/sys/fs/cgroup"
        - name: gluster-lvm
          hostPath:
            path: "/run/lvm"
        - name: gluster-kmods
          hostPath:
            path: "/usr/lib/modules"

---
apiVersion: v1
kind: Service
metadata:
  name: glusterd2-service
  namespace: gluster-storage
spec:
  selector:
    name: glusterd2-daemon
  ports:
    - protocol: TCP
      port: 24007
      targetPort: 24007
# GD2 will be available on kube-host:31007 externally
      nodePort: 31007
  type: NodePort
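
Apply the manifest and wait for the DaemonSet pods to come up; a minimal sketch, assuming the file was saved as glusterd2.yml (the filename is illustrative):

kubectl apply -f glusterd2.yml
kubectl -n gluster-storage get pods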

Then exec -t -i into one of the pods: you'll see the pod is healthy, but running systemctl status glusterd2 will show error logs. Re-running the command manually, you will then see the following logs (a sketch of these inspection steps follows the logs):

WARNING: 2019/02/04 19:43:51 grpc: addrConn.createTransport failed to connect to {[fe80::345c:baff:fefe:edc6]:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp [fe80::345c:baff:fefe:edc6]:2379: connect: invalid argument". Reconnecting...
WARNING: 2019/02/04 19:43:51 grpc: addrConn.createTransport failed to connect to {[fe80::345c:baff:fefe:edc6]:2379 0  <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp [fe80::345c:baff:fefe:edc6]:2379: connect: invalid argument". Reconnecting...
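
For reference, a sketch of the inspection steps described above; the pod name placeholder is illustrative:

kubectl -n gluster-storage get pods
kubectl -n gluster-storage exec -t -i <gluster-pod> -- /bin/bash
# inside the pod:
systemctl status glusterd2
glustercli peer status

Port 2379 in the dial errors is etcd's client port, so these warnings presumably come from glusterd2 trying to reach etcd, consistent with no etcd URL having been set up.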

@Madhu-1
Member

Madhu-1 commented Feb 5, 2019

@jayunit100 I don't see the below part of the code in your template, which is responsible for the health check:

livenessProbe:
  httpGet:
    path: /ping
    port: 24007
  initialDelaySeconds: 10
  periodSeconds: 60

Please refer to https://github.com/gluster/gcs/blob/master/deploy/templates/gcs-manifests/gcs-gd2.yml.j2 for more info.
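
For context, a sketch of how that probe would slot into the glusterd2 container in the DaemonSet above, using only the values from this thread:

containers:
  - name: glusterd2
    image: docker.io/gluster/glusterd2-nightly:20190204
    livenessProbe:
      httpGet:
        path: /ping
        port: 24007
      initialDelaySeconds: 10
      periodSeconds: 60

With this probe, the kubelet polls /ping on port 24007 every 60 seconds and restarts the container when glusterd2 stops responding, so the failure is no longer silent.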
