Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: do not execute rbd sparsify when volume is in use #3985

Merged
merged 1 commit into from
Jul 11, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions internal/csi-addons/rbd/reclaimspace.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,11 +18,13 @@ package rbd

import (
"context"
"errors"
"fmt"

csicommon "github.com/ceph/ceph-csi/internal/csi-common"
rbdutil "github.com/ceph/ceph-csi/internal/rbd"
"github.com/ceph/ceph-csi/internal/util"
"github.com/ceph/ceph-csi/internal/util/log"

"github.com/container-storage-interface/spec/lib/go/csi"
rs "github.com/csi-addons/spec/lib/go/reclaimspace"
Expand Down Expand Up @@ -69,6 +71,13 @@ func (rscs *ReclaimSpaceControllerServer) ControllerReclaimSpace(
defer rbdVol.Destroy()

err = rbdVol.Sparsify()
if errors.Is(err, rbdutil.ErrImageInUse) {
// FIXME: https://github.com/csi-addons/kubernetes-csi-addons/issues/406.
// treat sparsify call as no-op if volume is in use.
log.DebugLog(ctx, fmt.Sprintf("volume with ID %q is in use, skipping sparsify operation", volumeID))

return &rs.ControllerReclaimSpaceResponse{}, nil
nixpanic marked this conversation as resolved.
Show resolved Hide resolved
nixpanic marked this conversation as resolved.
Show resolved Hide resolved
}
if err != nil {
// TODO: check for different error codes?
return nil, status.Errorf(codes.Internal, "failed to sparsify volume %q: %s", rbdVol, err.Error())
Expand Down
11 changes: 11 additions & 0 deletions internal/rbd/diskusage.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,18 @@ import (
// Sparsify checks the size of the objects in the RBD image and calls
// rbd_sparify() to free zero-filled blocks and reduce the storage consumption
// of the image.
// This function will return ErrImageInUse if the image is in use, since
// sparsifying an image on which i/o is in progress is not optimal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to sound like a broken record, but I have to ask again: do you have any data that would support automating rbd sparsify runs at all? In particular:

  • Have you done any research into how likely they are to be encountered?
  • Do you have ceph df outputs captured before and after rbd sparsify is run for typical customer scenarios?

If the answer is no, I don't understand why rbd sparsify is being conditioned on isInUse() instead of just being dropped entirely (to be resurrected as an optional step when the relevant CSI addon spec allows taking options from the user). Sparsifying the image the way Ceph CSI does it is likely not optimal in general, not just when there is competing I/O.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on one heavily used cluster, I suspect I've seen 33% to 100% extra usage without sparsify. Hadn't had the chance to run it offline and haven't had the guts to run it online. This makes it sound like it was good for me to be hesitent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- I would suggest starting with fstrim (assuming you aren't using raw block PVs).

func (ri *rbdImage) Sparsify() error {
inUse, err := ri.isInUse()
nixpanic marked this conversation as resolved.
Show resolved Hide resolved
if err != nil {
return fmt.Errorf("failed to check if image is in use: %w", err)
}
if inUse {
// if the image is in use, we should not sparsify it, return ErrImageInUse.
return ErrImageInUse
nixpanic marked this conversation as resolved.
Show resolved Hide resolved
}

image, err := ri.open()
if err != nil {
return err
Expand Down
2 changes: 2 additions & 0 deletions internal/rbd/errors.go
Original file line number Diff line number Diff line change
Expand Up @@ -63,4 +63,6 @@ var (
ErrFetchingMirroringInfo = errors.New("failed to get mirroring info of image")
// ErrResyncImageFailed is returned when the operation to resync the image fails.
ErrResyncImageFailed = errors.New("failed to resync image")
// ErrImageInUse is returned when the image is in use.
ErrImageInUse = errors.New("image is in use")
)