rbd: VolumeGroupSnapshot support #4502

Open
wants to merge 14 commits into
base: devel

nixpanic
Member

@nixpanic nixpanic commented Mar 15, 2024

Add support for VolumeGroupSnapshots in RBD. The last two commits enable the feature in the Group Controller Server and expose the capability. All other commits provide the functionality through the rbd.Manager interface.

Currently there is no Ceph container-image release that provides the required librbd features. Building from this PR will not provide support for VolumeGroupSnapshot yet; for that, the base container-image needs to be set to the Ceph CI main branch (quay.ceph.io/ceph-ci/ceph:main). A test-image based on Ceph main can be found at quay.io/nixpanic/cephcsi:pr_4739.

Notable changes:

  • internal/rbd_types package with interfaces, so that objects can be passed around more cleanly
  • internal/rbd/volume.go implementing the new Volume interface for rbdImage
  • internal/rbd_group package for all RBD-group functionality
  • internal/rbd/group_controller.go for all CSI VolumeGroup service procedures

Depends-on: #4794
Depends-on: #4870
Depends-on: #4871
Depends-on: #4884
Depends-on: #4885
Depends-on: #4898
Depends-on: #4902
Depends-on: #4904


@mergify mergify bot added the component/rbd Issues related to RBD label Mar 15, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 17, 2024
@nixpanic nixpanic force-pushed the rbd/group-snapshot branch 3 times, most recently from 57e136d to 6d85b34 Compare June 25, 2024 10:57
Contributor

mergify bot commented Jul 8, 2024

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

@nixpanic
Member Author

/test ci/centos/mini-e2e-helm/k8s-1.30

@nixpanic
Member Author

nixpanic commented Oct 3, 2024

@Mergifyio refresh

Contributor

mergify bot commented Oct 3, 2024

refresh

✅ Pull request refreshed

@nixpanic nixpanic force-pushed the rbd/group-snapshot branch 2 times, most recently from 878d937 to 2286e5a Compare October 18, 2024 07:30
@nixpanic nixpanic marked this pull request as ready for review October 18, 2024 07:34
@nixpanic
Member Author

/test ci/centos/mini-e2e-helm/k8s-1.30

Each object is responsible for maintaining a connection to the journal.

By sharing a single journal, cleanup of objects becomes more complex as
the journal is used in deferred functions and only the last should
destroy the journal connection resources.

Signed-off-by: Niels de Vos <[email protected]>
The NewSnapshotByID() function makes it possible to clone a new Snapshot
from an existing RBD-image and the ID of an RBD-snapshot on that image.

This will be used by the VolumeGroupSnapshot feature, where the ID of the
RBD-snapshot on each RBD-image in the group is obtained.

Signed-off-by: Niels de Vos <[email protected]>
When the rbd.Manager creates a VolumeGroupSnapshot, each RBD-snapshot
that is created as part of the RBD-group needs to be cloned into its own
RBD-image that will be used as a CSI Snapshot.

The VolumeGroup.CreateSnapshots() creates the RBD-group snapshot and
returns a list of the Snapshot structs.

Signed-off-by: Niels de Vos <[email protected]>
The VolumeGroupSnapshot type will be used by the rbd.Manager to create,
inspect and delete VolumeGroupSnapshots.

Signed-off-by: Niels de Vos <[email protected]>
A (CSI) VolumeGroupSnapshot object contains references to Snapshot IDs
(or CSI Snapshot handles). In order to work with a VolumeGroupSnapshot
struct, the Snapshot IDs need to be resolved into rbdSnapshot structs.

Signed-off-by: Niels de Vos <[email protected]>
Implement CreateVolumeGroupSnapshot for the rbd.Manager. A Group
Controller Server can use the rbd.Manager to create VolumeGroupSnapshots
in an easy and idempotent way.

Signed-off-by: Niels de Vos <[email protected]>
The GetVolumeGroupSnapshotByID function makes it possible to get a
VolumeGroupSnapshot object from the Manager by passing a request-id.
This makes it simple for the Group Controller Server to check if a
VolumeGroupSnapshot already exists, so there is no need to try and
re-create an existing one.

Signed-off-by: Niels de Vos <[email protected]>
The Group Controller Server may need to fetch a VolumeGroupSnapshot that
was statically provisioned. In that case, only the name of the
VolumeGroupSnapshot is known and should be resolved to an object.

Signed-off-by: Niels de Vos <[email protected]>
When creating a Snapshot with the new NewSnapshotByID() function, the
name of the RBD-image that is created is the same as the name of the
Snapshot. The `RbdImageName` points to the name of the parent image, which
causes deleting the Snapshot to delete the parent image instead.

Correcting the `RbdImageName` and setting it to the `RbdSnapName` makes
sure that upon deletion, the Snapshot RBD-image is removed, and not the
parent image.

Signed-off-by: Niels de Vos <[email protected]>
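
A minimal sketch of the corrected behaviour described in the commit message above. Only the field names `RbdImageName` and `RbdSnapName` come from the commit message; the `snapshot` struct, the `pool` type and both functions are hypothetical simplifications, not the PR's code.

```go
package main

import "fmt"

// snapshot is a hypothetical, reduced stand-in; only the two field
// names are taken from the commit message.
type snapshot struct {
	RbdImageName string
	RbdSnapName  string
}

// pool is a hypothetical in-memory set of image names.
type pool map[string]bool

// deleteSnapshot removes the image that RbdImageName points to.
func (p pool) deleteSnapshot(s snapshot) {
	delete(p, s.RbdImageName)
}

// newSnapshotByID sets RbdImageName to the snapshot's own name. Using
// the parent name here would make deleteSnapshot remove the parent.
func newSnapshotByID(parent, snapName string) snapshot {
	_ = parent // deliberately not used as RbdImageName
	return snapshot{RbdImageName: snapName, RbdSnapName: snapName}
}

func main() {
	p := pool{"parent-image": true, "snap-image": true}
	p.deleteSnapshot(newSnapshotByID("parent-image", "snap-image"))
	fmt.Println(p["parent-image"], p["snap-image"])
}
```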
When the GroupSnapGetInfo go-ceph function is supported by librbd, the
Group Controller Service and VolumeGroupSnapshot capabilities can be
exposed to the Container Orchestrator.

Signed-off-by: Niels de Vos <[email protected]>
Collaborator

@Madhu-1 Madhu-1 left a comment

@nixpanic we can rebase this and run the E2E tests to ensure that we are good. I also have a couple of questions about the existing VolumeGroup, and how both are going to work when the group was created by volumeGroup.

Comment on lines +36 to +37
// snapshots is a list of rbd-images that are part of the group. The ID
// of each snapshot is stored in the journal.

Move the comment above the member (to line 34)

return nil, fmt.Errorf("failed to get volume attributes for id %q: %w", vgs, err)
}

volumeMap := make(map[string]string)

Suggested change
volumeMap := make(map[string]string)
volumeMap := make(map[string]string, len(snapshots))
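
The capacity hint in the suggestion above lets the map be sized once instead of growing while it is filled. A minimal sketch, assuming a hypothetical `snapshot` type and assuming the map goes from volume ID to snapshot ID (the quoted diff does not show what the map actually holds):

```go
package main

import "fmt"

// snapshot is a hypothetical stand-in for the PR's snapshot type.
type snapshot struct {
	id       string
	volumeID string
}

// buildVolumeMap maps each source volume ID to its snapshot ID. The
// second argument to make() is only a capacity hint; the map still
// starts out empty and can grow beyond the hint.
func buildVolumeMap(snapshots []snapshot) map[string]string {
	volumeMap := make(map[string]string, len(snapshots))
	for _, s := range snapshots {
		volumeMap[s.volumeID] = s.id
	}
	return volumeMap
}

func main() {
	m := buildVolumeMap([]snapshot{
		{id: "snap-1", volumeID: "vol-1"},
		{id: "snap-2", volumeID: "vol-2"},
	})
	fmt.Println(len(m), m["vol-1"])
}
```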

creds *util.Credentials,
snapshotResolver types.SnapshotResolver,
) (types.VolumeGroupSnapshot, error) {
cleanVGS := true

As we are following this pattern in this code, does it make sense to clean up in case of an error instead of having one more bool for it?
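
One way to do what the question above asks is a named error return that the deferred function inspects. The sketch below is an illustration only; the `resource` type and `newResource` function are hypothetical and do not mirror the PR's constructor.

```go
package main

import (
	"errors"
	"fmt"
)

// resource is a hypothetical stand-in for an object that needs
// explicit cleanup, such as a group snapshot.
type resource struct {
	destroyed bool
}

func (r *resource) Destroy() { r.destroyed = true }

// newResource uses a named error return, so the deferred function can
// decide on cleanup by inspecting err and no separate bool is needed.
// The resource is returned in the error case only so that this sketch
// can show that cleanup ran.
func newResource(fail bool) (r *resource, err error) {
	r = &resource{}
	defer func() {
		if err != nil {
			r.Destroy()
		}
	}()

	if fail {
		return r, errors.New("initialization failed")
	}
	return r, nil
}

func main() {
	r, err := newResource(true)
	fmt.Println(err != nil, r.destroyed)

	r, err = newResource(false)
	fmt.Println(err != nil, r.destroyed)
}
```

The trade-off is that every return path must go through the named `err`; an error that is handled and then shadowed by a local variable would skip the cleanup.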

return vgs, nil
}

// ToCSI creates a CSI-Addons type for the VolumeGroupSnapshot.

CSI-Addons to CSI

for _, volume := range vgs.snapshotsToFree {
volume.Destroy(ctx)
}
vgs.snapshotsToFree = make([]types.Snapshot, 0)

Would `vgs.snapshotsToFree = nil` work here?
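
In Go, a nil slice has length zero and can be ranged over and appended to, so resetting to nil behaves the same as allocating a new empty slice for these operations. A small sketch with a hypothetical `group` type (not the PR's struct):

```go
package main

import "fmt"

// group is a hypothetical stand-in holding items that need freeing.
type group struct {
	toFree []string
	freed  int
}

// free releases all pending items and resets the slice to nil, which
// avoids allocating a new empty slice.
func (g *group) free() {
	for range g.toFree {
		g.freed++
	}
	g.toFree = nil
}

func main() {
	g := &group{toFree: []string{"snap-1", "snap-2"}}
	g.free()
	fmt.Println(g.freed, len(g.toFree), g.toFree == nil)

	// append works on a nil slice, and a second free() is harmless.
	g.toFree = append(g.toFree, "snap-3")
	g.free()
	fmt.Println(g.freed)
}
```

The two forms differ only where code distinguishes nil from empty, for example a `== nil` check or some encoders.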

Comment on lines +350 to +353
journalPool, ok := mgr.parameters["journalPool"]
if !ok || journalPool == "" {
journalPool = pool
}

As far as I can see, we are not using journalPool for rados; should we drop it?

groupSnapshot types.VolumeGroupSnapshot

// the VG and VGS should not have the same name
vgName = req.GetName() + "-vg" // stable temporary name

What happens if there is already a volume group that was created for a VolumeGroup used for replication?

for _, volume := range volumes {
if vg != nil {
// 'normal' cleanup, remove all images from the group
vgErr := vg.RemoveVolume(ctx, volume)

This looks good for the group we are creating as part of this function. What about the case where the group was already created for volumeGroupReplication?

Comment on lines +161 to +163
// FIXME: more checking of the request is needed
// 1. verify that all snapshots in the request are all snapshots in the group
// 2. delete the group

Is it going to be done in a followup PR?

Comment on lines +177 to +183
err = groupSnapshot.Delete(ctx)
if err != nil {
return nil, status.Errorf(
codes.Internal,
"failed to delete volume group snapshot %q: %v",
groupSnapshot, err)
}

This is going to take care of deleting all the clones, isn't it? Just checking to make sure we are good here.

Labels
component/rbd Issues related to RBD