rbd: VolumeGroupReplicationContent controller to regenerate the OMAP data #4750

Merged (5 commits) on Jan 28, 2025


@iPraveenParihar iPraveenParihar commented Aug 6, 2024

Describe what this PR does

This commit adds a new controller that watches for
VolumeGroupReplicationContent objects and regenerates the OMAP data if
it doesn't exist.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull
    Request
  • Pending release notes updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@iPraveenParihar iPraveenParihar self-assigned this Aug 6, 2024
@mergify mergify bot added the component/rbd Issues related to RBD label Aug 6, 2024
type volumeGroup struct {
// id is a unique value for this volume group in the Ceph cluster, it
// VolumeGroup handles all requests for 'rbd group' operations.
type VolumeGroup struct {
Member

Don't export these; they are intentionally private. They implement the interfaces in the internal/rbd/types/*.go files. You can use rbd.NewManager() to get a manager struct and use the Manager API to get the VolumeGroup.

Contributor Author

If you are referring to GetVolumeGroupByID, it isn't well suited for use here, because GetVolumeGroup calls GetPoolName with the LocationID (which is the PoolID), and the PoolID may be different in the secondary cluster.

pool, err := util.GetPoolName(mons, creds, csiID.LocationID)
if err != nil {
	return nil, fmt.Errorf("failed to get pool for volume group id %q: %w", id, err)
}

Alternatively, we could pass an optional PoolName parameter to GetVolumeGroup:
if it is empty, call GetPoolName; otherwise, use the PoolName that was passed?

Member

You may want to add more functions to the manager and other types. Ideally the API that the manager provides stays simple to use.

@iPraveenParihar iPraveenParihar force-pushed the vg/replication-omap-regenerate branch 4 times, most recently from 2a75eec to 6d757b8 Compare August 6, 2024 11:26
@iPraveenParihar iPraveenParihar marked this pull request as ready for review August 6, 2024 11:31
@iPraveenParihar
Contributor Author

Initial test results:

E0806 10:45:48.804306       1 omap.go:80] omap not found (pool="replicapool", namespace="", name="csi.groups.default"): rados: ret=-2, No such file or directory
I0806 10:45:48.804338       1 rbd_journal.go:764] the journal does not contain a reservation for a volume group with name "vgrcontent-88b1d055-4c66-4218-a81c-487ba0b58ccf" yet
I0806 10:45:48.811316       1 omap.go:159] set omap keys (pool="replicapool", namespace="", name="csi.groups.default"): map[csi.volume.group.vgrcontent-88b1d055-4c66-4218-a81c-487ba0b58ccf:88b1d055-4c66-4218-a81c-487ba0b58ccf])
I0806 10:45:48.814878       1 omap.go:159] set omap keys (pool="replicapool", namespace="", name="csi.volume.group.88b1d055-4c66-4218-a81c-487ba0b58ccf"): map[csi.groupname:csi-vol-group-88b1d055-4c66-4218-a81c-487ba0b58ccf csi.volname:vgrcontent-88b1d055-4c66-4218-a81c-487ba0b58ccf])
I0806 10:45:48.816837       1 omap.go:89] got omap values: (pool="replicapool", namespace="", name="csi.volume.5bfefce1-6427-45a3-b088-6d43d19b1e9c"): map[csi.imageid:288758945f2b8 csi.imagename:csi-vol-5bfefce1-6427-45a3-b088-6d43d19b1e9c csi.volname:pvc-34dc4369-2eb3-492a-88aa-5cd37b557154 csi.volume.owner:rook-ceph]
I0806 10:45:48.838187       1 omap.go:89] got omap values: (pool="replicapool", namespace="", name="csi.volume.896c6912-604b-43db-916c-1e386a71973f"): map[csi.imageid:216b4d44baf60 csi.imagename:csi-vol-896c6912-604b-43db-916c-1e386a71973f csi.volname:pvc-2506eb3c-69e9-4054-8b8d-405de060c0f4 csi.volume.owner:rook-ceph]
I0806 10:45:48.856998       1 rbd_journal.go:799] adding volume mapping for volume "0001-0009-rook-ceph-0000000000000001-5bfefce1-6427-45a3-b088-6d43d19b1e9c" to volume group "csi-vol-group-88b1d055-4c66-4218-a81c-487ba0b58ccf"
I0806 10:45:48.860180       1 omap.go:159] set omap keys (pool="replicapool", namespace="", name="csi.volume.group.88b1d055-4c66-4218-a81c-487ba0b58ccf"): map[0001-0009-rook-ceph-0000000000000001-5bfefce1-6427-45a3-b088-6d43d19b1e9c:])
I0806 10:45:48.860287       1 rbd_journal.go:799] adding volume mapping for volume "0001-0009-rook-ceph-0000000000000001-896c6912-604b-43db-916c-1e386a71973f" to volume group "csi-vol-group-88b1d055-4c66-4218-a81c-487ba0b58ccf"
I0806 10:45:48.863734       1 omap.go:159] set omap keys (pool="replicapool", namespace="", name="csi.volume.group.88b1d055-4c66-4218-a81c-487ba0b58ccf"): map[0001-0009-rook-ceph-0000000000000001-896c6912-604b-43db-916c-1e386a71973f:])
I0806 10:45:48.863972       1 rbd_journal.go:816] re-generated Group ID (0001-0009-rook-ceph-0000000000000001-88b1d055-4c66-4218-a81c-487ba0b58ccf) and Group Name (csi-vol-group-88b1d055-4c66-4218-a81c-487ba0b58ccf) for request name (vgrcontent-88b1d055-4c66-4218-a81c-487ba0b58ccf)
bash-5.1$ rados listomapvals csi.groups.default -p replicapool
csi.volume.group.vgrcontent-88b1d055-4c66-4218-a81c-487ba0b58ccf
value (36 bytes) :
00000000  38 38 62 31 64 30 35 35  2d 34 63 36 36 2d 34 32  |88b1d055-4c66-42|
00000010  31 38 2d 61 38 31 63 2d  34 38 37 62 61 30 62 35  |18-a81c-487ba0b5|
00000020  38 63 63 66                                       |8ccf|
00000024

bash-5.1$ rados listomapvals csi.volume.group.88b1d055-4c66-4218-a81c-487ba0b58ccf -p replicapool
0001-0009-rook-ceph-0000000000000001-5bfefce1-6427-45a3-b088-6d43d19b1e9c
value (0 bytes) :

0001-0009-rook-ceph-0000000000000001-896c6912-604b-43db-916c-1e386a71973f
value (0 bytes) :

csi.groupname
value (50 bytes) :
00000000  63 73 69 2d 76 6f 6c 2d  67 72 6f 75 70 2d 38 38  |csi-vol-group-88|
00000010  62 31 64 30 35 35 2d 34  63 36 36 2d 34 32 31 38  |b1d055-4c66-4218|
00000020  2d 61 38 31 63 2d 34 38  37 62 61 30 62 35 38 63  |-a81c-487ba0b58c|
00000030  63 66                                             |cf|
00000032

csi.volname
value (47 bytes) :
00000000  76 67 72 63 6f 6e 74 65  6e 74 2d 38 38 62 31 64  |vgrcontent-88b1d|
00000010  30 35 35 2d 34 63 36 36  2d 34 32 31 38 2d 61 38  |055-4c66-4218-a8|
00000020  31 63 2d 34 38 37 62 61  30 62 35 38 63 63 66     |1c-487ba0b58ccf|
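
The volume and group IDs in the log above appear to follow a fixed layout: a 4-digit hex version, a 4-digit hex cluster ID length, the cluster ID, a 16-digit hex location (pool) ID, and a 36-character UUID. The sketch below decodes that layout. The layout is inferred from the IDs shown here, and `csiID` and `parseCSIID` are hypothetical names, not the ceph-csi API.

```go
package main

import (
	"fmt"
	"strconv"
)

// csiID holds the fields that appear to be encoded in the IDs from the log.
// Hypothetical type for illustration only.
type csiID struct {
	Version    uint16
	ClusterID  string
	LocationID int64
	ObjectUUID string
}

const (
	locationLen = 16 // hex digits in the location (pool) ID field
	uuidLen     = 36 // length of a canonical UUID string
)

// parseCSIID decodes "<version>-<clusterIDLen>-<clusterID>-<locationID>-<uuid>".
func parseCSIID(s string) (csiID, error) {
	var id csiID

	if len(s) < 10 || s[4] != '-' || s[9] != '-' {
		return id, fmt.Errorf("malformed id %q", s)
	}

	version, err := strconv.ParseUint(s[0:4], 16, 16)
	if err != nil {
		return id, fmt.Errorf("bad version in %q: %w", s, err)
	}

	n, err := strconv.ParseUint(s[5:9], 16, 16)
	if err != nil {
		return id, fmt.Errorf("bad cluster id length in %q: %w", s, err)
	}
	clusterLen := int(n)

	// The remainder must be exactly clusterID + "-" + location + "-" + uuid.
	rest := s[10:]
	if len(rest) != clusterLen+1+locationLen+1+uuidLen ||
		rest[clusterLen] != '-' || rest[clusterLen+1+locationLen] != '-' {
		return id, fmt.Errorf("unexpected length or separators in %q", s)
	}

	location, err := strconv.ParseInt(rest[clusterLen+1:clusterLen+1+locationLen], 16, 64)
	if err != nil {
		return id, fmt.Errorf("bad location id in %q: %w", s, err)
	}

	id.Version = uint16(version)
	id.ClusterID = rest[:clusterLen]
	id.LocationID = location
	id.ObjectUUID = rest[clusterLen+1+locationLen+1:]

	return id, nil
}

func main() {
	id, err := parseCSIID("0001-0009-rook-ceph-0000000000000001-5bfefce1-6427-45a3-b088-6d43d19b1e9c")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", id)
}
```

Because the location ID is embedded in the handle, a handle generated on the primary cluster can carry a pool ID that does not exist on the secondary, which is the concern raised in the review thread.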

@iPraveenParihar iPraveenParihar force-pushed the vg/replication-omap-regenerate branch 4 times, most recently from 0290729 to 628b00b Compare August 6, 2024 13:35
Contributor

mergify bot commented Aug 6, 2024

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

Contributor

mergify bot commented Aug 9, 2024

The PR description contains the unsupported [skip ci] command, please update the description and mark the PR ready for review again.

@mergify mergify bot marked this pull request as draft August 9, 2024 10:07
@iPraveenParihar iPraveenParihar marked this pull request as ready for review August 12, 2024 04:15
@iPraveenParihar iPraveenParihar force-pushed the vg/replication-omap-regenerate branch 5 times, most recently from 17e3418 to 62e2f4a Compare August 12, 2024 13:42
@iPraveenParihar iPraveenParihar force-pushed the vg/replication-omap-regenerate branch 2 times, most recently from 87e64d7 to 01ca0f3 Compare August 14, 2024 11:40
@Madhu-1
Collaborator

Madhu-1 commented Jan 28, 2025

@Mergifyio queue

Contributor

mergify bot commented Jan 28, 2025

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

This commit adds a groupUUID parameter to `ReserveName`. When set, it is
used to reserve the OMAP name instead of an auto-generated UUID.
This is useful for mirroring and metro-DR, ensuring that mirrored
resources have consistent OMAP names across mirrored clusters.

Signed-off-by: Praveen M <[email protected]>
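
A minimal sketch of the "use the supplied UUID, otherwise generate one" behaviour described in the commit message above. The `reservation` type and `reserveName` function are hypothetical and do not reflect the real `ReserveName` signature. The `csi.volume.group.` and `csi-vol-group-` prefixes are taken from the test log earlier in this conversation.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// reservation is a hypothetical result type for this sketch.
type reservation struct {
	UUID      string
	OMAPName  string // name of the per-group OMAP object
	GroupName string // name of the RBD group
}

// newUUID returns a random (version 4) UUID string.
func newUUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant

	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// reserveName uses groupUUID when it is non-empty, so that both mirrored
// clusters end up with the same OMAP names; otherwise it generates a UUID.
func reserveName(groupUUID string) (reservation, error) {
	if groupUUID == "" {
		generated, err := newUUID()
		if err != nil {
			return reservation{}, fmt.Errorf("failed to generate uuid: %w", err)
		}
		groupUUID = generated
	}

	return reservation{
		UUID:      groupUUID,
		OMAPName:  "csi.volume.group." + groupUUID,
		GroupName: "csi-vol-group-" + groupUUID,
	}, nil
}

func main() {
	// Secondary cluster: reuse the UUID known from the primary.
	r, err := reserveName("88b1d055-4c66-4218-a81c-487ba0b58ccf")
	fmt.Println(r.OMAPName, r.GroupName, err)

	// Primary cluster: no UUID yet, so one is generated.
	r, err = reserveName("")
	fmt.Println(r.OMAPName, r.GroupName, err)
}
```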
This commit adds `RegenerateVolumeGroupJournal` to the Manager
interface. RegenerateVolumeGroupJournal regenerates the omap
data for the volume group.

This performs the following operations:
  - Extracts clusterID and Mons from the cluster mapping
  - Retrieves pool and journalPool parameters from the VolumeGroupReplicationClass
  - Reserves omap data
  - Adds the volumeIDs mapping to the reserved volume group omap object
  - Generates a new volume group handle

Returns the generated volume group handle.

Signed-off-by: Praveen M <[email protected]>
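
The last three steps listed in the commit message above (reserve, map volumes, generate the handle) can be sketched against an in-memory stand-in for the OMAP objects. This is an assumption-laden illustration, not the actual `RegenerateVolumeGroupJournal` implementation: `omapStore`, `regenerateRequest`, and `regenerate` are hypothetical, the cluster ID and pool ID are taken as already-resolved inputs, and the object names, keys, and handle format are copied from the test log earlier in this conversation.

```go
package main

import "fmt"

// omapStore is an in-memory stand-in for RADOS OMAP objects:
// object name -> key -> value. Hypothetical, for illustration only.
type omapStore map[string]map[string]string

// set writes one key/value pair, creating the object on first use.
func (s omapStore) set(object, key, value string) {
	if s[object] == nil {
		s[object] = map[string]string{}
	}
	s[object][key] = value
}

// regenerateRequest carries the inputs that the earlier steps (cluster
// mapping and replication class lookup) are assumed to have resolved.
type regenerateRequest struct {
	ClusterID   string
	PoolID      int64
	RequestName string
	GroupUUID   string
	VolumeIDs   []string
}

// regenerate writes the journal entries and returns the group handle.
func regenerate(store omapStore, req regenerateRequest) string {
	groupObject := "csi.volume.group." + req.GroupUUID

	// Reserve: request name -> UUID in the directory object, plus the
	// group's own object describing it.
	store.set("csi.groups.default", "csi.volume.group."+req.RequestName, req.GroupUUID)
	store.set(groupObject, "csi.groupname", "csi-vol-group-"+req.GroupUUID)
	store.set(groupObject, "csi.volname", req.RequestName)

	// Map each member volume into the group object (values are empty).
	for _, volumeID := range req.VolumeIDs {
		store.set(groupObject, volumeID, "")
	}

	// Handle layout as seen in the log: version, cluster ID length,
	// cluster ID, pool ID, UUID.
	return fmt.Sprintf("%04x-%04x-%s-%016x-%s",
		1, len(req.ClusterID), req.ClusterID, req.PoolID, req.GroupUUID)
}

func main() {
	store := omapStore{}
	handle := regenerate(store, regenerateRequest{
		ClusterID:   "rook-ceph",
		PoolID:      1,
		RequestName: "vgrcontent-88b1d055-4c66-4218-a81c-487ba0b58ccf",
		GroupUUID:   "88b1d055-4c66-4218-a81c-487ba0b58ccf",
		VolumeIDs: []string{
			"0001-0009-rook-ceph-0000000000000001-5bfefce1-6427-45a3-b088-6d43d19b1e9c",
		},
	})
	fmt.Println(handle)
	fmt.Println(len(store["csi.volume.group.88b1d055-4c66-4218-a81c-487ba0b58ccf"]), "keys in group object")
}
```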
This commit adds a new controller that watches for
VolumeGroupReplicationContent objects and regenerates the OMAP data if
it doesn't exist.

Signed-off-by: Praveen M <[email protected]>
The VolumeGroupReplicationContent controller needs `get`, `list` and `watch`
access to the `VolumeGroupReplicationContents` resource, and `get`
access to the `VolumeGroupReplicationClasses` resource.

Signed-off-by: Praveen M <[email protected]>
@iPraveenParihar iPraveenParihar force-pushed the vg/replication-omap-regenerate branch from 79d6faa to fae2a39 Compare January 28, 2025 14:38
@mergify mergify bot added the ok-to-test Label to trigger E2E tests label Jan 28, 2025
@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-cephfs

@ceph-csi-bot
Collaborator

/test ci/centos/upgrade-tests-rbd

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.30

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.31

@ceph-csi-bot
Collaborator

/test ci/centos/k8s-e2e-external-storage/1.32

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.30

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.31

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e-helm/k8s-1.32

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.30

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.31

@ceph-csi-bot
Collaborator

/test ci/centos/mini-e2e/k8s-1.32

@ceph-csi-bot ceph-csi-bot removed the ok-to-test Label to trigger E2E tests label Jan 28, 2025
Contributor

mergify bot commented Jan 28, 2025

This pull request has been removed from the queue for the following reason: checks failed.

The merge conditions cannot be satisfied due to failing checks:

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.

If you want to requeue this pull request, you need to post a comment with the text: @mergifyio requeue

@nixpanic
Member

/retest ci/centos/mini-e2e-helm/k8s-1.30

@nixpanic
Member

/retest ci/centos/mini-e2e-helm/k8s-1.30

Retrying: the job failed to deploy Rook and its components.

@nixpanic
Member

@Mergifyio requeue

Contributor

mergify bot commented Jan 28, 2025

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

@mergify mergify bot merged commit 15ffa48 into ceph:devel Jan 28, 2025
37 checks passed
Labels
component/rbd Issues related to RBD keepalive This label can be used to disable stale bot activity in the repo
6 participants