volume info shows the volume state as started but volume status shows it as offline #1452

Open
PrasadDesala opened this issue Jan 2, 2019 · 3 comments
Labels: brick-multiplexing-issue, bug

Observed behavior

volume info shows the volume state as Started, but volume status shows all of its bricks as offline (ONLINE false, port 0).

[root@gluster-kube1-0 bricks]# glustercli volume status pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
Volume : pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+
| 4112110f-4443-4d85-9bd8-1914efcee897 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick | false | 0 | 0 |
| 6927f476-a7e8-40cd-8205-a7668d667ada | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick2/brick | false | 0 | 0 |
| 2584d0c9-c6d7-4d3d-94b6-6b909fbf4b67 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick | false | 0 | 0 |
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+
[root@gluster-kube1-0 bricks]# glustercli volume info pvc-46967f93-0e6e-11e9-af0b-525400f94cb8

Volume Name: pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
Type: Replicate
Volume ID: d5f6bb15-8d2d-4b27-a700-96e86ad972e4
State: Started
Capacity: 1.0 GiB
Transport-type: tcp
Options:
performance/io-cache.io-cache: off
performance/write-behind.write-behind: off
cluster/replicate.self-heal-daemon: on
debug/io-stats.count-fop-hits: on
performance/open-behind.open-behind: off
performance/quick-read.quick-read: off
performance/read-ahead.read-ahead: off
performance/readdir-ahead.readdir-ahead: off
debug/io-stats.latency-measurement: on
performance/md-cache.md-cache: off
Number of Bricks: 3
Brick1: gluster-kube3-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick
Brick2: gluster-kube2-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick2/brick
Brick3: gluster-kube1-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick
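
The mismatch can also be checked programmatically; a minimal sketch, relying on the plain-text glustercli output above (column layout may differ between versions):

```sh
#!/bin/bash
# Sketch: cross-check the "State" field of volume info against the ONLINE column
# of volume status. Relies on the plain-text glustercli output shown above.
vol=pvc-46967f93-0e6e-11e9-af0b-525400f94cb8

state=$(glustercli volume info "$vol" | awk -F': ' '/^State:/ {print $2}')
offline=$(glustercli volume status "$vol" | grep -c '| false ')

echo "volume info state: ${state}; bricks reported offline: ${offline}"
```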

Also, volume start on that volume fails with the response "volume already started":

[root@gluster-kube1-0 bricks]# glustercli volume start pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
volume start failed

Response headers:
X-Gluster-Cluster-Id: 98ef5bef-583c-41ee-b594-50f3d4784679
X-Gluster-Peer-Id: 4e752b45-aa0a-4784-83f7-6b487e886b4d
X-Request-Id: 16694478-f0d7-4488-a440-6f744e476bad

Response body:
volume already started

Expected/desired behavior

volume status should show the bricks of a started volume as online.

Details on how to reproduce (minimal and precise)

  1. Create a 3-node GCS system using vagrant.
  2. With brick-mux enabled, create 100 PVCs.
  3. Stop all the volumes and then start them again (a scripted sketch of this step follows the commands below).
    glustercli volume stop pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-38485e48-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-38853d16-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-38c06414-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-38f92094-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-39337ec5-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-396d7779-0e6e-11e9-af0b-525400f94cb8
    ....
    ..
    .
    Then start all the volumes in the same way:
    glustercli volume start pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8
    glustercli volume start pvc-38485e48-0e6e-11e9-af0b-525400f94cb8
    glustercli volume start pvc-38853d16-0e6e-11e9-af0b-525400f94cb8
    glustercli volume start pvc-38c06414-0e6e-11e9-af0b-525400f94cb8
    ..
    .
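
A scripted sketch of the stop/start sequence used in step 3 (it assumes `glustercli volume list` is available and that all volume names match pvc-*; otherwise substitute a pre-built list of names):

```sh
#!/bin/bash
# Sketch of step 3: stop every pvc-* volume, wait for all stop requests to
# return, then start them all again.
vols=$(glustercli volume list | grep -o 'pvc-[0-9a-f-]*' | sort -u)

for v in $vols; do
    glustercli volume stop "$v"
done

# Only after every stop request has returned do the starts begin.
for v in $vols; do
    glustercli volume start "$v"
done
```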

Information about the environment:

Glusterd2 version used (e.g. v4.1.0 or master): v6.0-dev.94.git601ba61
Operating system used: CentOS 7.6
Glusterd2 compiled from sources, as a package (rpm/deb), or container:
Using External ETCD: (yes/no, if yes ETCD version): yes; version 3.3.8
If container, which container image:
Using kubernetes, openshift, or direct install:
If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: Kubernetes

  • Output of statedump from any one of the nodes:
@vpandey-RH (Contributor) commented Jan 2, 2019

@PrasadDesala I will need more information than just the volume status, as there could be numerous reasons why the port shows as 0. One of them could be that the bricks need some time to sign in after glusterfsd has been spawned. Can you get me the output of `ps aux | grep glusterfsd` after you start the PVCs?

Also, can you tell me whether the volume stop and start requests are sequential? That is, are the start requests sent only after all stop requests have been sent, or do the requests have no particular ordering?

Also, after all start requests have been sent and returned successfully, can you give the bricks some time to sign in and then check whether you still see the same behaviour?
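
Something along these lines (a rough, untested sketch; substitute the affected volume name) would give the bricks time to sign in before re-checking:

```sh
#!/bin/bash
# Rough sketch: after `volume start`, poll the status for up to ~2 minutes and
# report whether any brick is still shown with ONLINE=false (port 0).
vol=pvc-46967f93-0e6e-11e9-af0b-525400f94cb8   # substitute the volume to check

for i in $(seq 1 24); do
    if ! glustercli volume status "$vol" | grep -q '| false '; then
        echo "all bricks of $vol are online"
        exit 0
    fi
    sleep 5
done

echo "bricks of $vol are still offline after waiting" >&2
glustercli volume status "$vol"
exit 1
```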

@PrasadDesala (Author) commented:

> @PrasadDesala I will need more information than just the volume status, as there could be numerous reasons why the port shows as 0. One of them could be that the bricks need some time to sign in after glusterfsd has been spawned. Can you get me the output of `ps aux | grep glusterfsd` after you start the PVCs?

[root@gluster-kube1-0 bricks]# ps aux | grep glusterfsd
root 8113 0.0 0.0 9088 672 pts/2 S+ 12:54 0:00 grep --color=auto glusterfsd
root 21733 8.5 2.6 13825056 870656 ? Ssl 11:34 6:52 /usr/sbin/glusterfsd --volfile-server gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8.4e752b45-aa0a-4784-83f7-6b487e886b4d.var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick -p /var/run/glusterd2/4e752b45-aa0a-4784-83f7-6b487e886b4d-var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.pid -S /var/run/glusterd2/3c5c17b3422e2a07.socket --brick-name /var/run/glusterd2/bricks/pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick -l /var/log/glusterd2/glusterfs/bricks/var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=4e752b45-aa0a-4784-83f7-6b487e886b4d
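
For completeness, a quick sketch-level check of what pid/socket files glusterd2 left behind for the affected volume (pid-file names follow the pattern visible in the ps output above):

```sh
# Sketch: list the per-brick pid and socket files under /var/run/glusterd2.
# Pid-file names contain the volume name; socket names are hashed, so all are listed.
vol=pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
ls -l /var/run/glusterd2/*"${vol}"*.pid 2>/dev/null
ls -l /var/run/glusterd2/*.socket
```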

> Also, can you tell me whether the volume stop and start requests are sequential? That is, are the start requests sent only after all stop requests have been sent, or do the requests have no particular ordering?

Volumes are started only after all the stop requests have completed:
Stop 100 volumes {1..100} --> wait till volume stop completes on all the volumes --> Start 100 volumes {1..100}

> Also, after all start requests have been sent and returned successfully, can you give the bricks some time to sign in and then check whether you still see the same behaviour?

It's been more than an hour since I hit this issue, and volume status still shows the bricks as not online.

I see the error below in the glusterd2 logs when I tried to start the volume, even though the path "/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick" is present:
[root@gluster-kube1-0 bricks]# ll /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick
total 0

time="2019-01-02 11:49:10.525084" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick error="SearchByBrickPath: port for brick /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"

@vpandey-RH Let me know if you need any other information; I still have the system in the same state.

@atinmu added the brick-multiplexing-issue and bug labels on Jan 16, 2019
@atinmu (Contributor) commented Jan 16, 2019

@vpandey-RH did we figure out what's the cause of this state?
