Gluster expects bricks to be hosted on newly peer-probed machine... #4386

Open
babipanghang opened this issue Jul 1, 2024 · 6 comments

@babipanghang

I'm trying to move data from one brick to a new one. The new brick is to be hosted on a new machine. Peer probing the new machine succeeds:

company@gluster2:/var/lib/glusterd$ sudo gluster peer probe gluster20241.storage.company.nl
peer probe: success

However, after that, Gluster can no longer access any information about my bricks/volumes; it reports that it is looking for them on the freshly probed peer:

company@gluster2:~$ sudo gluster volume heal systemdisks statistics heal-count
Gathering count of entries to be healed on volume systemdisks has been unsuccessful:
Staging failed on gluster20241.storage.company.nl. Error: Volume systemdisks does not exist

The Gluster volumes otherwise appear to be humming along happily. The only way I found to get management commands working again was to detach the freshly probed peer, as shown below.
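For reference, the recovery step was simply a peer detach of the new node (a sketch using the hostname from above; gluster peer detach is the standard CLI command):

company@gluster2:~$ sudo gluster peer detach gluster20241.storage.company.nl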

The full output of the command that failed:
See above

Expected results:
A report of heal counts (or other volume information when requested).

Mandatory info:
- The output of the gluster volume info command:

Volume Name: backups
Type: Replicate
Volume ID: 7b38980c-3c9a-44be-acba-4db72277e2c6
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.storage.company.nl:/opt/glusterdata/backups/gluster
Brick2: gluster2.storage.company.nl:/opt/glusterdata/backups/gluster
Brick3: arbiter2.storage.company.nl:/opt/glusterdata/backups/gluster (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
storage.linux-io_uring: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.cache-samba-metadata: on
performance.readdir-ahead: on
performance.parallel-readdir: on
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.nl-cache-positive-entry: on
user.smb: disable
storage.batch-fsync-delay-usec: 0
performance.write-behind: off
client.event-threads: 4
server.event-threads: 4

Volume Name: data
Type: Replicate
Volume ID: e11631b8-5556-4a49-9019-b1462483fe5f
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.storage.company.nl:/opt/glusterdata/data/gluster
Brick2: gluster2.storage.company.nl:/opt/glusterdata/data/gluster
Brick3: arbiter2.storage.company.nl:/opt/glusterdata/data/gluster (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: on
storage.linux-io_uring: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.cache-samba-metadata: on
performance.readdir-ahead: on
performance.parallel-readdir: on
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.nl-cache-positive-entry: on
user.smb: disable
storage.batch-fsync-delay-usec: 0
performance.write-behind: off
client.event-threads: 4
server.event-threads: 6
cluster.self-heal-window-size: 32
performance.cache-refresh-timeout: 10

Volume Name: okl
Type: Replicate
Volume ID: b2065a03-9fab-4dcd-8c9e-81c7892dc3bb
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.storage.company.nl:/opt/glusterdata/okl/gluster
Brick2: gluster2.storage.company.nl:/opt/glusterdata/okl/gluster
Brick3: arbiter2.storage.company.nl:/opt/glusterdata/okl/gluster (arbiter)
Options Reconfigured:
cluster.self-heal-daemon: enable
cluster.shd-max-threads: 2
server.event-threads: 6
client.event-threads: 4
performance.write-behind: off
storage.batch-fsync-delay-usec: 0
user.smb: disable
performance.nl-cache-positive-entry: on
performance.nl-cache-timeout: 600
performance.nl-cache: on
performance.parallel-readdir: on
performance.readdir-ahead: on
performance.cache-samba-metadata: on
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
storage.linux-io_uring: on
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.self-heal-window-size: 32
performance.cache-refresh-timeout: 10
performance.cache-size: 256MB

Volume Name: oudeplotfile
Type: Replicate
Volume ID: fe524085-8eda-4d41-898d-fcacc8fe0162
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.storage.company.nl:/opt/glusterdata/oudeplotfile/gluster
Brick2: gluster2.storage.company.nl:/opt/glusterdata/oudeplotfile/gluster
Brick3: arbiter2.storage.company.nl:/opt/glusterdata/oudeplotfile/gluster (arbiter)
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
performance.write-behind: off
storage.batch-fsync-delay-usec: 0
user.smb: disable
performance.nl-cache-positive-entry: on
performance.nl-cache-timeout: 600
performance.nl-cache: on
performance.parallel-readdir: on
performance.readdir-ahead: on
performance.cache-samba-metadata: on
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
storage.linux-io_uring: on
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.self-heal-window-size: 32

Volume Name: plotfile
Type: Replicate
Volume ID: b55dfb4b-306f-4be9-9af8-486daafa7efa
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.storage.company.nl:/opt/glusterdata/plotfile/gluster
Brick2: gluster2.storage.company.nl:/opt/glusterdata/plotfile/gluster
Brick3: arbiter2.storage.company.nl:/opt/glusterdata/plotfile/gluster (arbiter)
Options Reconfigured:
cluster.shd-max-threads: 2
cluster.use-anonymous-inode: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
cluster.granular-entry-heal: disable
storage.linux-io_uring: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.cache-samba-metadata: on
performance.readdir-ahead: on
performance.parallel-readdir: on
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.nl-cache-positive-entry: on
user.smb: disable
storage.batch-fsync-delay-usec: 0
performance.write-behind: off
client.event-threads: 4
server.event-threads: 6
server.allow-insecure: on
cluster.self-heal-window-size: 32
performance.cache-refresh-timeout: 10
performance.cache-size: 256MB
performance.write-behind-window-size: 64MB

Volume Name: scans
Type: Replicate
Volume ID: d5f73be9-c488-454f-8c94-7db6226eccf4
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.storage.company.nl:/opt/glusterdata/scans/gluster
Brick2: gluster2.storage.company.nl:/opt/glusterdata/scans/gluster
Brick3: arbiter2.storage.company.nl:/opt/glusterdata/scans/gluster (arbiter)
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
performance.write-behind: off
storage.batch-fsync-delay-usec: 0
user.smb: disable
performance.nl-cache-positive-entry: on
performance.nl-cache-timeout: 600
performance.nl-cache: on
performance.parallel-readdir: on
performance.readdir-ahead: on
performance.cache-samba-metadata: on
network.inode-lru-limit: 200000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
storage.linux-io_uring: on
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: sd2023
Type: Replicate
Volume ID: 8f5845df-daa7-412f-bb32-abbb16cb1f99
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.storage.company.nl:/opt/glusterdata/sd2023/gluster
Brick2: gluster2.storage.company.nl:/opt/glusterdata/sd2023/gluster
Brick3: arbiter2.storage.company.nl:/opt/glusterdata/sd2023/gluster (arbiter)
Options Reconfigured:
cluster.lookup-optimize: off
server.keepalive-count: 5
server.keepalive-interval: 2
server.keepalive-time: 10
server.tcp-user-timeout: 20
network.ping-timeout: 20
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 2
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
performance.strict-o-direct: on
network.remote-dio: disable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on

Volume Name: systemdisks
Type: Replicate
Volume ID: 67f874b2-1766-49b9-b2a0-ae55ef54a5d1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster1.storage.company.nl:/opt/glusterdata/systemdisks/gluster
Brick2: gluster2.storage.company.nl:/opt/glusterdata/systemdisks/gluster
Brick3: arbiter2.storage.company.nl:/opt/glusterdata/systemdisks/gluster (arbiter)
Options Reconfigured:
cluster.shd-max-threads: 2
features.shard-block-size: 64MB
features.shard: on
features.scrub: Active
features.bitrot: on
performance.io-thread-count: 12
nfs.disable: on
transport.address-family: inet
storage.fips-mode-rchecksum: on
server.allow-insecure: on
cluster.self-heal-daemon: enable
performance.enable-least-priority: no
cluster.use-anonymous-inode: yes
cluster.data-self-heal-algorithm: full

- The output of the gluster volume status command:

Status of volume: backups
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.storage.company.nl:/opt/g
lusterdata/backups/gluster                  N/A       N/A        N       N/A
Brick gluster2.storage.company.nl:/opt/g
lusterdata/backups/gluster                  56560     0          Y       2881
Brick arbiter2.storage.company.nl:/opt/g
lusterdata/backups/gluster                  57516     0          Y       1865
Self-heal Daemon on localhost               N/A       N/A        Y       3320
Self-heal Daemon on arbiter2.storage.dagser
vice.nl                                     N/A       N/A        Y       2301
Self-heal Daemon on gluster1.storage.dagser
vice.nl                                     N/A       N/A        Y       2761

Task Status of Volume backups
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.storage.company.nl:/opt/g
lusterdata/data/gluster                     N/A       N/A        N       N/A
Brick gluster2.storage.company.nl:/opt/g
lusterdata/data/gluster                     54003     0          Y       3027
Brick arbiter2.storage.company.nl:/opt/g
lusterdata/data/gluster                     52542     0          Y       1913
Self-heal Daemon on localhost               N/A       N/A        Y       3320
Self-heal Daemon on arbiter2.storage.dagser
vice.nl                                     N/A       N/A        Y       2301
Self-heal Daemon on gluster1.storage.dagser
vice.nl                                     N/A       N/A        Y       2761

Task Status of Volume data
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: okl
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.storage.company.nl:/opt/g
lusterdata/okl/gluster                      53067     0          Y       2332
Brick gluster2.storage.company.nl:/opt/g
lusterdata/okl/gluster                      49961     0          Y       3075
Brick arbiter2.storage.company.nl:/opt/g
lusterdata/okl/gluster                      55594     0          Y       1961
Self-heal Daemon on localhost               N/A       N/A        Y       3320
Self-heal Daemon on arbiter2.storage.dagser
vice.nl                                     N/A       N/A        Y       2301
Self-heal Daemon on gluster1.storage.dagser
vice.nl                                     N/A       N/A        Y       2761

Task Status of Volume okl
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: oudeplotfile
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.storage.company.nl:/opt/g
lusterdata/oudeplotfile/gluster             N/A       N/A        N       N/A
Brick gluster2.storage.company.nl:/opt/g
lusterdata/oudeplotfile/gluster             57812     0          Y       3113
Brick arbiter2.storage.company.nl:/opt/g
lusterdata/oudeplotfile/gluster             53851     0          Y       2058
Self-heal Daemon on localhost               N/A       N/A        Y       3320
Self-heal Daemon on arbiter2.storage.dagser
vice.nl                                     N/A       N/A        Y       2301
Self-heal Daemon on gluster1.storage.dagser
vice.nl                                     N/A       N/A        Y       2761

Task Status of Volume oudeplotfile
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: plotfile
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.storage.company.nl:/opt/g
lusterdata/plotfile/gluster                 55348     0          Y       2553
Brick gluster2.storage.company.nl:/opt/g
lusterdata/plotfile/gluster                 58304     0          Y       51563
Brick arbiter2.storage.company.nl:/opt/g
lusterdata/plotfile/gluster                 59045     0          Y       591219
Self-heal Daemon on localhost               N/A       N/A        Y       3320
Self-heal Daemon on arbiter2.storage.dagser
vice.nl                                     N/A       N/A        Y       2301
Self-heal Daemon on gluster1.storage.dagser
vice.nl                                     N/A       N/A        Y       2761

Task Status of Volume plotfile
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: scans
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.storage.company.nl:/opt/g
lusterdata/scans/gluster                    51060     0          Y       2600
Brick gluster2.storage.company.nl:/opt/g
lusterdata/scans/gluster                    55019     0          Y       3194
Brick arbiter2.storage.company.nl:/opt/g
lusterdata/scans/gluster                    50918     0          Y       2135
Self-heal Daemon on localhost               N/A       N/A        Y       3320
Self-heal Daemon on arbiter2.storage.dagser
vice.nl                                     N/A       N/A        Y       2301
Self-heal Daemon on gluster1.storage.dagser
vice.nl                                     N/A       N/A        Y       2761

Task Status of Volume scans
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: sd2023
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.storage.company.nl:/opt/g
lusterdata/sd2023/gluster                   56151     0          Y       2141
Brick gluster2.storage.company.nl:/opt/g
lusterdata/sd2023/gluster                   58378     0          Y       2847
Brick arbiter2.storage.company.nl:/opt/g
lusterdata/sd2023/gluster                   54447     0          Y       1811
Self-heal Daemon on localhost               N/A       N/A        Y       3320
Self-heal Daemon on arbiter2.storage.dagser
vice.nl                                     N/A       N/A        Y       2301
Self-heal Daemon on gluster1.storage.dagser
vice.nl                                     N/A       N/A        Y       2761

Task Status of Volume sd2023
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: systemdisks
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster1.storage.company.nl:/opt/g
lusterdata/systemdisks/gluster              52284     0          Y       2657
Brick gluster2.storage.company.nl:/opt/g
lusterdata/systemdisks/gluster              60998     0          Y       73423
Brick arbiter2.storage.company.nl:/opt/g
lusterdata/systemdisks/gluster              55865     0          Y       1031565
Self-heal Daemon on localhost               N/A       N/A        Y       3320
Bitrot Daemon on localhost                  N/A       N/A        Y       314460
Scrubber Daemon on localhost                N/A       N/A        Y       314463
Self-heal Daemon on arbiter2.storage.dagser
vice.nl                                     N/A       N/A        Y       2301
Bitrot Daemon on arbiter2.storage.dagservic
e.nl                                        N/A       N/A        Y       3389755
Scrubber Daemon on arbiter2.storage.dagserv
ice.nl                                      N/A       N/A        Y       3389758
Self-heal Daemon on gluster1.storage.dagser
vice.nl                                     N/A       N/A        Y       2761
Bitrot Daemon on gluster1.storage.dagservic
e.nl                                        N/A       N/A        Y       9364
Scrubber Daemon on gluster1.storage.dagserv
ice.nl                                      N/A       N/A        Y       9367

Task Status of Volume systemdisks
------------------------------------------------------------------------------
There are no active volume tasks

- The output of the gluster volume heal command:
Not particularly relevant

- Provide logs present on following locations of client and server nodes:

/var/log/glusterfs/glusterd.log on gluster2:

[2024-07-01 12:49:37.284292 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 40b18764-c424-441e-b72a-b7310993029e
[2024-07-01 12:50:18.720539 +0000] I [MSGID: 106499] [glusterd-handler.c:4372:__glusterd_handle_status_volume] 0-management: Received status volume req for volume systemdisks
[2024-07-01 12:50:18.724385 +0000] E [MSGID: 106152] [glusterd-syncop.c:102:gd_collate_errors] 0-glusterd: Staging failed on gluster20241.storage.company.nl. Error: Volume systemdisks does not exist
[2024-07-01 12:50:42.294827 +0000] I [MSGID: 106487] [glusterd-handler.c:1256:__glusterd_handle_cli_deprobe] 0-glusterd: Received CLI deprobe req
[2024-07-01 12:50:42.296144 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:589:__glusterd_friend_remove_cbk] 0-glusterd: Received ACC from uuid: 11d72bab-514c-4910-ad3b-7f69d198dd31, host: gluster2.storage.company.nl, port: 0
[2024-07-01 12:50:42.298071 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: quotad already stopped
[2024-07-01 12:50:42.298104 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:265:glusterd_svc_stop] 0-management: quotad service is stopped
[2024-07-01 12:50:42.312556 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 40b18764-c424-441e-b72a-b7310993029e
[2024-07-01 12:50:42.312617 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:681:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 47dcff58-1ced-48d8-ac95-d02fa3178483
[2024-07-01 13:03:01.762614 +0000] I [MSGID: 106061] [glusterd-utils.c:10724:glusterd_volume_status_copy_to_op_ctx_dict] 0-management: Dict get failed [{Key=count}]

glusterd.log on gluster20241 (sorry, not sure which is the relevant part here):

[2024-07-01 12:49:36.609631 +0000] I [MSGID: 106502] [glusterd-handler.c:2943:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2024-07-01 12:49:36.815703 +0000] I [MSGID: 106163] [glusterd-handshake.c:1493:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000
[2024-07-01 12:49:36.878803 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:461:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 47dcff58-1ced-48d8-ac95-d02fa3178483, host: arbiter2.storage.dagservice.nl, port: 0
[2024-07-01 12:49:36.998858 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:675:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: afb619d7-18a2-4080-ae8f-34a22b7acf2d
[2024-07-01 12:49:37.259932 +0000] I [MSGID: 106490] [glusterd-handler.c:2691:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 47dcff58-1ced-48d8-ac95-d02fa3178483
[2024-07-01 12:49:37.261653 +0000] I [MSGID: 106493] [glusterd-handler.c:3982:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to arbiter2.storage.dagservice.nl (0), ret: 0, op_ret: 0
[2024-07-01 12:49:37.264032 +0000] I [MSGID: 106061] [glusterd-utils.c:4163:gd_import_friend_volume_rebal_dict] 0-management: Dict get failed [{Key=volume1.rebal-dict-count}, {errno=2}, {error=No such file or directory}]
[2024-07-01 12:49:37.264132 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1093:gd_import_volume_snap_details] 0-management: volume1.restored_from_snapname_id missing in payload for backups
[2024-07-01 12:49:37.264156 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1104:gd_import_volume_snap_details] 0-management: volume1.restored_from_snapname missing in payload for backups
[2024-07-01 12:49:37.264186 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1115:gd_import_volume_snap_details] 0-management: volume1.snap_plugin missing in payload for backups
[2024-07-01 12:49:37.264273 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:986:gd_import_new_brick_snap_details] 0-management: volume1.brick1.origin_path missing in payload
[2024-07-01 12:49:37.264302 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1004:gd_import_new_brick_snap_details] 0-management: volume1.brick1.snap_type missing in payload
[2024-07-01 12:49:37.264342 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: quotad already stopped
[2024-07-01 12:49:37.264362 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: quotad service is stopped
[2024-07-01 12:49:37.264378 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: bitd already stopped
[2024-07-01 12:49:37.264390 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: bitd service is stopped
[2024-07-01 12:49:37.264420 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: scrub already stopped
[2024-07-01 12:49:37.264434 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: scrub service is stopped
[2024-07-01 12:49:37.286284 +0000] I [MSGID: 106163] [glusterd-handshake.c:1493:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 100000
[2024-07-01 12:49:37.411063 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:675:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 47dcff58-1ced-48d8-ac95-d02fa3178483
[2024-07-01 12:49:37.411675 +0000] I [MSGID: 106492] [glusterd-handler.c:2896:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 47dcff58-1ced-48d8-ac95-d02fa3178483
[2024-07-01 12:49:37.413933 +0000] I [MSGID: 106502] [glusterd-handler.c:2943:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2024-07-01 12:49:37.504241 +0000] I [MSGID: 106490] [glusterd-handler.c:2691:__glusterd_handle_incoming_friend_req] 0-glusterd: Received probe from uuid: 40b18764-c424-441e-b72a-b7310993029e
[2024-07-01 12:49:37.505692 +0000] I [MSGID: 106493] [glusterd-handler.c:3982:glusterd_xfer_friend_add_resp] 0-glusterd: Responded to gluster1.storage.dagservice.nl (0), ret: 0, op_ret: 0
[2024-07-01 12:49:37.507798 +0000] I [MSGID: 106061] [glusterd-utils.c:4163:gd_import_friend_volume_rebal_dict] 0-management: Dict get failed [{Key=volume1.rebal-dict-count}, {errno=2}, {error=No such file or directory}]
[2024-07-01 12:49:37.507879 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1093:gd_import_volume_snap_details] 0-management: volume1.restored_from_snapname_id missing in payload for backups
[2024-07-01 12:49:37.507907 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1104:gd_import_volume_snap_details] 0-management: volume1.restored_from_snapname missing in payload for backups
[2024-07-01 12:49:37.507941 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1115:gd_import_volume_snap_details] 0-management: volume1.snap_plugin missing in payload for backups
[2024-07-01 12:49:37.508042 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:986:gd_import_new_brick_snap_details] 0-management: volume1.brick1.origin_path missing in payload
[2024-07-01 12:49:37.508103 +0000] E [MSGID: 106061] [glusterd-snapshot-utils.c:1004:gd_import_new_brick_snap_details] 0-management: volume1.brick1.snap_type missing in payload
[2024-07-01 12:49:37.508142 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: quotad already stopped
[2024-07-01 12:49:37.508161 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: quotad service is stopped
[2024-07-01 12:49:37.508176 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: bitd already stopped
[2024-07-01 12:49:37.508189 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: bitd service is stopped
[2024-07-01 12:49:37.508211 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: scrub already stopped
[2024-07-01 12:49:37.508225 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: scrub service is stopped
[2024-07-01 12:49:37.541630 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:675:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 47dcff58-1ced-48d8-ac95-d02fa3178483
[2024-07-01 12:49:37.722360 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:461:__glusterd_friend_add_cbk] 0-glusterd: Received ACC from uuid: 40b18764-c424-441e-b72a-b7310993029e, host: gluster1.storage.dagservice.nl, port: 0
[2024-07-01 12:49:37.761069 +0000] I [MSGID: 106492] [glusterd-handler.c:2896:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 40b18764-c424-441e-b72a-b7310993029e
[2024-07-01 12:49:37.763229 +0000] I [MSGID: 106502] [glusterd-handler.c:2943:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2024-07-01 12:49:38.796414 +0000] I [MSGID: 106492] [glusterd-handler.c:2896:__glusterd_handle_friend_update] 0-glusterd: Received friend update from uuid: 40b18764-c424-441e-b72a-b7310993029e
[2024-07-01 12:49:38.798785 +0000] I [MSGID: 106502] [glusterd-handler.c:2943:__glusterd_handle_friend_update] 0-management: Received my uuid as Friend
[2024-07-01 12:49:39.653497 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:675:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 40b18764-c424-441e-b72a-b7310993029e
[2024-07-01 12:49:40.447423 +0000] I [MSGID: 106493] [glusterd-rpc-ops.c:675:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 40b18764-c424-441e-b72a-b7310993029e
[2024-07-01 12:50:18.725188 +0000] E [MSGID: 106048] [glusterd-op-sm.c:1814:glusterd_op_stage_status_volume] 0-management: Failed to get volinfo [{Volume=systemdisks}]
[2024-07-01 12:50:18.725281 +0000] E [MSGID: 106301] [glusterd-op-sm.c:5870:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Status', Status : -1
[2024-07-01 12:50:42.296732 +0000] I [MSGID: 106491] [glusterd-handler.c:2743:__glusterd_handle_incoming_unfriend_req] 0-glusterd: Received unfriend from uuid: afb619d7-18a2-4080-ae8f-34a22b7acf2d
[2024-07-01 12:50:42.297031 +0000] I [MSGID: 106493] [glusterd-handler.c:3956:glusterd_xfer_friend_remove_resp] 0-glusterd: Responded to gluster2.storage.dagservice.nl (0), ret: 0
[2024-07-01 12:50:42.297109 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: quotad already stopped
[2024-07-01 12:50:42.297161 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: quotad service is stopped
[2024-07-01 12:50:42.297179 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: bitd already stopped
[2024-07-01 12:50:42.297192 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: bitd service is stopped
[2024-07-01 12:50:42.297206 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: scrub already stopped
[2024-07-01 12:50:42.297228 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: scrub service is stopped
[2024-07-01 12:50:42.298333 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: quotad already stopped
[2024-07-01 12:50:42.298365 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: quotad service is stopped
[2024-07-01 12:50:42.298428 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: bitd already stopped
[2024-07-01 12:50:42.298443 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: bitd service is stopped
[2024-07-01 12:50:42.298457 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: scrub already stopped
[2024-07-01 12:50:42.298468 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: scrub service is stopped
[2024-07-01 12:50:42.298521 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: quotad already stopped
[2024-07-01 12:50:42.298539 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: quotad service is stopped
[2024-07-01 12:50:42.298567 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: bitd already stopped
[2024-07-01 12:50:42.298579 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: bitd service is stopped
[2024-07-01 12:50:42.298602 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: scrub already stopped
[2024-07-01 12:50:42.298615 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: scrub service is stopped
[2024-07-01 12:50:42.298631 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: quotad already stopped
[2024-07-01 12:50:42.298643 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: quotad service is stopped
[2024-07-01 12:50:42.298664 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: bitd already stopped
[2024-07-01 12:50:42.298677 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: bitd service is stopped
[2024-07-01 12:50:42.298689 +0000] I [MSGID: 106131] [glusterd-proc-mgmt.c:81:glusterd_proc_stop] 0-management: scrub already stopped
[2024-07-01 12:50:42.298709 +0000] I [MSGID: 106568] [glusterd-svc-mgmt.c:262:glusterd_svc_stop] 0-management: scrub service is stopped

- Is there any crash? Provide the backtrace and coredump:
None

Additional info:

- The operating system / glusterfs version:
Existing machines:
Gluster 10.5 on Ubuntu jammy

New machine (peer probed):
Gluster 11.1 on Ubuntu jammy
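Given the version mix (10.5 on the existing peers, 11.1 on the new one), one quick sanity check is comparing op-versions on an old node and on the new node (a sketch; cluster.op-version and cluster.max-op-version are standard glusterd options, though whether a mismatch explains this particular failure is not confirmed here):

sudo gluster volume get all cluster.op-version      # op-version the cluster is currently running at
sudo gluster volume get all cluster.max-op-version  # highest op-version this node supports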

@apunkt

apunkt commented Oct 1, 2024

I can confirm this. Same error, same effects, when trying to connect
Gluster 9.6 on Ubuntu focal to
Gluster 11.1 on Ubuntu noble.

The volumes are replicate and distributed-replicate.

@apunkt

apunkt commented Oct 2, 2024

DON'T upgrade your cluster to 11.1 in an attempt to join a new peer!

I upgraded 9.6 -> 10.5 flawlessly, but after upgrading the first node from 10.5 -> 11.1, it was no longer able to rejoin the cluster:

[2024-10-02 06:37:40.317161 +0000] E [MSGID: 106061] [glusterd.c:597:glusterd_crt_georep_folders] 0-glusterd: Dict get failed [{Key=log-group}, {errno=2}, {error=No such file or directory}] 
[2024-10-02 06:37:44.930153 +0000] E [MSGID: 106010] [glusterd-utils.c:3542:glusterd_compare_friend_volume] 0-management: Version of Cksums gv1 differ. local cksum = 3551822466, remote cksum = 735098502 on peer 192.168.0.40 
[2024-10-02 06:38:02.515216 +0000] E [MSGID: 106048] [glusterd-shd-svc.c:507:glusterd_shdsvc_start] 0-glusterd: Failed to attach shd svc(volume=gv1) to pid=-1 
[2024-10-02 06:38:02.515375 +0000] E [MSGID: 106615] [glusterd-shd-svc.c:678:glusterd_shdsvc_restart] 0-management: Couldn't start shd for vol: gv1 on restart

Downgrading back to 10.5 unfortunately doesn't work either:

[2024-10-02 06:50:11.293733 +0000] E [rpc-transport.c:282:rpc_transport_load] 0-rpc-transport: /usr/lib/x86_64-linux-gnu/glusterfs/10.5/rpc-transport/socket.so: undefined symbol: xdr_gfs3_read_rsp
[2024-10-02 06:50:11.293777 +0000] E [MSGID: 106244] [glusterd.c:1853:init] 0-management: creation of listener failed 
[2024-10-02 06:50:11.293797 +0000] E [MSGID: 101019] [xlator.c:640:xlator_init] 0-management: Initialization of volume failed. review your volfile again. [{name=management}] 
[2024-10-02 06:50:11.293812 +0000] E [MSGID: 101066] [graph.c:424:glusterfs_graph_init] 0-management: initializing translator failed 
[2024-10-02 06:50:11.293832 +0000] E [MSGID: 101176] [graph.c:765:glusterfs_graph_activate] 0-graph: init failed

leaving you in an unhealthy state.

@apunkt

apunkt commented Oct 3, 2024

> DON'T upgrade your cluster to 11.1 in an attempt to join a new peer!
>
> I upgraded 9.6 -> 10.5 flawlessly, but after upgrading the first node from 10.5 -> 11.1, it was no longer able to rejoin the cluster:
>
> leaving you in an unhealthy state.

Solved this by bravely/desperately upgrading all the other nodes to 11.1 as well.
The cause was the nfs.disable option set on the volumes: it was required before Gluster 11 and became optional in 11.
When you upgrade your first node, that option is forcibly removed during the upgrade, causing a checksum mismatch with the rest of the cluster, which still has the option set. The release notes mention this in a rather cryptic sentence and promise that it will disappear after simply restarting glusterd, which did not work on my cluster (one that has been running all the way through since v3.x).
An online update was therefore not possible on my end. I had to take down the cluster, upgrade, spin it up again, and finally heal.

Hope that, after I spent a full day on this, sharing this information here will keep you from ending up in the same situation.
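For anyone hitting the same checksum mismatch, a quick way to inspect the state is to compare the option and the per-volume checksum across peers (a sketch; the option query is the standard gluster CLI, and /var/lib/glusterd/vols/<volname>/cksum is the usual glusterd location — <volname> is a placeholder):

sudo gluster volume get <volname> nfs.disable          # is the option still set on this node?
sudo cat /var/lib/glusterd/vols/<volname>/cksum        # run on each peer; the values should match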

@apunkt

apunkt commented Oct 3, 2024

After that, you are finally able to expand the cluster with a new 11.1 node.
The peer probe now succeeds.

@beat

beat commented Nov 2, 2024

Cross-linking a related issue I found while searching for a way to upgrade a GlusterFS cluster from 10.5 to 11.1 without downtime: #4409

I'm still trying to understand whether the new NFS-Ganesha could be enabled temporarily to remove that deprecated nfs.disable setting, which apparently can't be removed with a command?
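For reference, the generic CLI for clearing a volume option is gluster volume reset (a sketch, assuming that is what "removed with a command" refers to; whether it actually clears nfs.disable here, or resolves the checksum mismatch described above, is not confirmed):

sudo gluster volume reset <volname> nfs.disable        # <volname> is a placeholder; resets the option to its default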

@BarryLuijten
Copy link

BarryLuijten commented Dec 12, 2024

> Hope that, after I spent a full day on this, sharing this information here will keep you from ending up in the same situation.

Thank you @apunkt! It didn't prevent me from spending a few hours on this issue, but I was happy to finally find someone who had tracked down the culprit!
I hope this can be resolved in a later version, because bringing down the entire cluster, and everything that depends on it, is not something I would enjoy doing on a production cluster.
