volume info shows the volume state as started but volume status shows it as offline #1452

Open
PrasadDesala opened this issue Jan 2, 2019 · 3 comments
Labels: brick-multiplexing-issue, bug

Observed behavior

volume info shows the volume state as Started, but volume status shows all of its bricks as offline (ONLINE false, port 0).

[root@gluster-kube1-0 bricks]# glustercli volume status pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
Volume : pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+
| BRICK ID | HOST | PATH | ONLINE | PORT | PID |
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+
| 4112110f-4443-4d85-9bd8-1914efcee897 | gluster-kube3-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick | false | 0 | 0 |
| 6927f476-a7e8-40cd-8205-a7668d667ada | gluster-kube2-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick2/brick | false | 0 | 0 |
| 2584d0c9-c6d7-4d3d-94b6-6b909fbf4b67 | gluster-kube1-0.glusterd2.gcs | /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick | false | 0 | 0 |
+--------------------------------------+-------------------------------+-----------------------------------------------------------------------------------------+--------+------+-----+
[root@gluster-kube1-0 bricks]# glustercli volume info pvc-46967f93-0e6e-11e9-af0b-525400f94cb8

Volume Name: pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
Type: Replicate
Volume ID: d5f6bb15-8d2d-4b27-a700-96e86ad972e4
State: Started
Capacity: 1.0 GiB
Transport-type: tcp
Options:
performance/io-cache.io-cache: off
performance/write-behind.write-behind: off
cluster/replicate.self-heal-daemon: on
debug/io-stats.count-fop-hits: on
performance/open-behind.open-behind: off
performance/quick-read.quick-read: off
performance/read-ahead.read-ahead: off
performance/readdir-ahead.readdir-ahead: off
debug/io-stats.latency-measurement: on
performance/md-cache.md-cache: off
Number of Bricks: 3
Brick1: gluster-kube3-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick
Brick2: gluster-kube2-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick2/brick
Brick3: gluster-kube1-0.glusterd2.gcs:/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick
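
The mismatch can also be checked programmatically; a minimal sketch, relying on the plain-text glustercli output above (column layout may differ between versions):

```sh
#!/bin/bash
# Sketch: cross-check the "State" field of volume info against the ONLINE column
# of volume status. Relies on the plain-text glustercli output shown above.
vol=pvc-46967f93-0e6e-11e9-af0b-525400f94cb8

state=$(glustercli volume info "$vol" | awk -F': ' '/^State:/ {print $2}')
offline=$(glustercli volume status "$vol" | grep -c '| false ')

echo "volume info state: ${state}; bricks reported offline: ${offline}"
```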

Also, volume start on that volume fails with the response "volume already started":

[root@gluster-kube1-0 bricks]# glustercli volume start pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
volume start failed

Response headers:
X-Gluster-Cluster-Id: 98ef5bef-583c-41ee-b594-50f3d4784679
X-Gluster-Peer-Id: 4e752b45-aa0a-4784-83f7-6b487e886b4d
X-Request-Id: 16694478-f0d7-4488-a440-6f744e476bad

Response body:
volume already started

Expected/desired behavior

volume status should show the bricks of a started volume as online.

Details on how to reproduce (minimal and precise)

  1. Create a 3-node GCS system using vagrant.
  2. With brick-mux enabled, create 100 PVCs.
  3. Stop all the volumes and then start them again (a scripted sketch of this step follows the commands below).
    glustercli volume stop pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-38485e48-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-38853d16-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-38c06414-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-38f92094-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-39337ec5-0e6e-11e9-af0b-525400f94cb8
    glustercli volume stop pvc-396d7779-0e6e-11e9-af0b-525400f94cb8
    ....
    ..
    .
    Then start all the volumes in the same way:
    glustercli volume start pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8
    glustercli volume start pvc-38485e48-0e6e-11e9-af0b-525400f94cb8
    glustercli volume start pvc-38853d16-0e6e-11e9-af0b-525400f94cb8
    glustercli volume start pvc-38c06414-0e6e-11e9-af0b-525400f94cb8
    ..
    .
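
A scripted sketch of the stop/start sequence used in step 3 (it assumes `glustercli volume list` is available and that all volume names match pvc-*; otherwise substitute a pre-built list of names):

```sh
#!/bin/bash
# Sketch of step 3: stop every pvc-* volume, wait for all stop requests to
# return, then start them all again.
vols=$(glustercli volume list | grep -o 'pvc-[0-9a-f-]*' | sort -u)

for v in $vols; do
    glustercli volume stop "$v"
done

# Only after every stop request has returned do the starts begin.
for v in $vols; do
    glustercli volume start "$v"
done
```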

Information about the environment:

Glusterd2 version used (e.g. v4.1.0 or master): v6.0-dev.94.git601ba61
Operating system used: CentOS 7.6
Glusterd2 compiled from sources, as a package (rpm/deb), or container:
Using External ETCD: (yes/no, if yes ETCD version): yes; version 3.3.8
If container, which container image:
Using kubernetes, openshift, or direct install:
If kubernetes/openshift, is gluster running inside kubernetes/openshift or outside: Kubernetes

  • Output of statedump from any one of the nodes:
@vpandey-RH (Contributor) commented Jan 2, 2019

@PrasadDesala I will need more information than just the volume status, as there could be numerous reasons why the port shows as 0. One of them could be that the bricks need some time to sign in after glusterfsd has been spawned. Can you get me the output of `ps aux | grep glusterfsd` after you start the PVCs?

Also, can you tell me whether the volume stop and start requests are sequential? That is, are the start requests sent only after all stop requests have been sent, or do the requests have no particular ordering?

Also, after all start requests have been sent and returned successfully, can you give the bricks some time to sign in and then check whether you still see the same behaviour?
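
Something along these lines (a rough, untested sketch; substitute the affected volume name) would give the bricks time to sign in before re-checking:

```sh
#!/bin/bash
# Rough sketch: after `volume start`, poll the status for up to ~2 minutes and
# report whether any brick is still shown with ONLINE=false (port 0).
vol=pvc-46967f93-0e6e-11e9-af0b-525400f94cb8   # substitute the volume to check

for i in $(seq 1 24); do
    if ! glustercli volume status "$vol" | grep -q '| false '; then
        echo "all bricks of $vol are online"
        exit 0
    fi
    sleep 5
done

echo "bricks of $vol are still offline after waiting" >&2
glustercli volume status "$vol"
exit 1
```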

@PrasadDesala (Author) commented:

> @PrasadDesala I will need more information than just the volume status, as there could be numerous reasons why the port shows as 0. One of them could be that the bricks need some time to sign in after glusterfsd has been spawned. Can you get me the output of `ps aux | grep glusterfsd` after you start the PVCs?

[root@gluster-kube1-0 bricks]# ps aux | grep glusterfsd
root 8113 0.0 0.0 9088 672 pts/2 S+ 12:54 0:00 grep --color=auto glusterfsd
root 21733 8.5 2.6 13825056 870656 ? Ssl 11:34 6:52 /usr/sbin/glusterfsd --volfile-server gluster-kube1-0.glusterd2.gcs --volfile-server-port 24007 --volfile-id pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8.4e752b45-aa0a-4784-83f7-6b487e886b4d.var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick -p /var/run/glusterd2/4e752b45-aa0a-4784-83f7-6b487e886b4d-var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.pid -S /var/run/glusterd2/3c5c17b3422e2a07.socket --brick-name /var/run/glusterd2/bricks/pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8/subvol1/brick1/brick -l /var/log/glusterd2/glusterfs/bricks/var-run-glusterd2-bricks-pvc-381a5faf-0e6e-11e9-af0b-525400f94cb8-subvol1-brick1-brick.log --xlator-option *-posix.glusterd-uuid=4e752b45-aa0a-4784-83f7-6b487e886b4d
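
For completeness, a quick sketch-level check of what pid/socket files glusterd2 left behind for the affected volume (pid-file names follow the pattern visible in the ps output above):

```sh
# Sketch: list the per-brick pid and socket files under /var/run/glusterd2.
# Pid-file names contain the volume name; socket names are hashed, so all are listed.
vol=pvc-46967f93-0e6e-11e9-af0b-525400f94cb8
ls -l /var/run/glusterd2/*"${vol}"*.pid 2>/dev/null
ls -l /var/run/glusterd2/*.socket
```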

> Also, can you tell me whether the volume stop and start requests are sequential? That is, are the start requests sent only after all stop requests have been sent, or do the requests have no particular ordering?

Volumes are started only after all the stop requests have completed:
Stop 100 volumes {1..100} --> wait till volume stop completes on all the volumes --> Start 100 volumes {1..100}

> Also, after all start requests have been sent and returned successfully, can you give the bricks some time to sign in and then check whether you still see the same behaviour?

It's been more than an hour since I hit this issue, and volume status still shows the bricks as not online.

I see the error below in the glusterd2 logs when I tried to start the volume, even though the path "/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick" is present:
[root@gluster-kube1-0 bricks]# ll /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick
total 0

time="2019-01-02 11:49:10.525084" level=error msg="registry.SearchByBrickPath() failed for brick" brick=/var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick error="SearchByBrickPath: port for brick /var/run/glusterd2/bricks/pvc-46967f93-0e6e-11e9-af0b-525400f94cb8/subvol1/brick3/brick not found" source="[rpc_prog.go:104:pmap.(*GfPortmap).PortByBrick]"

@vpandey-RH Let me know if you need any other information; I still have the system in the same state.

@atinmu added the brick-multiplexing-issue and bug labels on Jan 16, 2019
@atinmu (Contributor) commented Jan 16, 2019

@vpandey-RH did we figure out what's the cause of this state?
