Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: support QoS based on capacity for rbd volume #5016

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

YiteGu
Copy link
Member

@YiteGu YiteGu commented Dec 12, 2024

  1. QoS provides settings for rbd volume read/write iops and read/write bandwidth.
  2. All QoS parameters are placed in the SC, send QoS parameters from SC to Cephcsi through PVC create request.
  3. We need provide QoS parameters in the SC as below:
    • BaseReadIops
    • BaseWriteIops
    • BaseReadBytesPerSecond
    • BaseWriteBytesPerSecond
    • ReadIopsPerGB
    • WriteIopsPerGB
    • ReadBpsPerGB
    • WriteBpsPerGB
    • BaseVolSizeBytes
      There are 4 base qos parameters among them, when users apply for a volume capacity equal to or less than BaseVolSizebytes, use base qos limit. For the portion of capacity exceeding BaseVolSizebytes, QoS will be increased in steps set per GB. If the step size parameter per GB is not provided, only base QoS limit will be used and not associated with capacity size.
  4. If PVC has resize request, adjust the QoS limit according to the QoS parameters after resizing.

Describe what this PR does

Provide some context for the reviewer

Is there anything that requires special attention

Do you have any questions?

Is the change backward compatible?

Are there concerns around backward compatibility?

Provide any external context for the change, if any.

For example:

  • Kubernetes links that explain why the change is required
  • CSI spec related changes/catch-up that necessitates this patch
  • golang related practices that necessitates this change

Related issues

Mention any github issues relevant to this PR. Adding below line
will help to auto close the issue once the PR is merged.

Fixes: #issue_number

Future concerns

List items that are not part of the PR and do not impact it's
functionality, but are work items that can be taken up subsequently.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer
    guide
    .
  • Reviewed the developer guide on Submitting a Pull
    Request
  • Pending release
    notes

    updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@YiteGu YiteGu force-pushed the rbd-qos-support branch 2 times, most recently from 8a47010 to c07a41a Compare December 12, 2024 10:20
@YiteGu YiteGu changed the title cephcsi: support QoS based on capacity for rbd volume rbd: support QoS based on capacity for rbd volume Dec 12, 2024
@mergify mergify bot added the component/rbd Issues related to RBD label Dec 12, 2024
@YiteGu YiteGu force-pushed the rbd-qos-support branch 3 times, most recently from 63ea87d to f60f7c3 Compare December 12, 2024 12:06
Copy link
Member

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is really a nice feature to have. A couple of comments that need to be addressed to improve the maintainability and usability.

  • move all qos related functions, structs, constants to a new qos.go file
  • all new functions that receive a *rbdVolume as argument should become a function of the rbdVolume struct (see example for the SetQOS function)
  • new parameters for the StorageClass need to be added to the documentation
  • documentation should explain that the QoS settings are currently only used in combination with the NBD mounter
  • ideally add unit tests for functions that do not interact with a Ceph cluster
  • ideally add some form or e2e testing, maybe set QoS parameters in the StorageClass, and verify that those are set on an RBD-image?

Thanks again, this is a great start!

internal/rbd/controllerserver.go Show resolved Hide resolved
internal/rbd/controllerserver.go Outdated Show resolved Hide resolved
internal/rbd/controllerserver.go Outdated Show resolved Hide resolved
internal/rbd/controllerserver.go Outdated Show resolved Hide resolved
internal/rbd/rbd_util.go Outdated Show resolved Hide resolved
internal/rbd/rbd_util.go Outdated Show resolved Hide resolved
internal/rbd/rbd_util.go Outdated Show resolved Hide resolved
internal/rbd/controllerserver.go Outdated Show resolved Hide resolved
internal/rbd/rbd_util.go Outdated Show resolved Hide resolved
@nixpanic nixpanic added the enhancement New feature or request label Dec 13, 2024
@YiteGu YiteGu force-pushed the rbd-qos-support branch 10 times, most recently from 2b895ad to 31ce867 Compare December 18, 2024 08:29
@YiteGu
Copy link
Member Author

YiteGu commented Dec 18, 2024

/test ci/centos/mini-e2e/k8s-1.31

internal/rbd/qos_test.go Outdated Show resolved Hide resolved
@nixpanic
Copy link
Member

Looks quite complete, many thanks for adding the unit and e2e tests!

@nixpanic
Copy link
Member

/test ci/centos/mini-e2e/k8s-1.31

@nixpanic
Copy link
Member

/test ci/centos/mini-e2e/k8s-1.31

not sure why that did not work, but the CI is running now (takes ~2 hours to complete): https://jenkins-ceph-csi.apps.ocp.cloud.ci.centos.org/job/mini-e2e_k8s-1.31/283/display/redirect

@YiteGu
Copy link
Member Author

YiteGu commented Dec 18, 2024

Looks quite complete, many thanks for adding the unit and e2e tests!

Write it briefly to verify that e2e testing is available. More scenes need to be added. :-)

@YiteGu YiteGu force-pushed the rbd-qos-support branch 2 times, most recently from 4d90640 to 0c09729 Compare December 19, 2024 11:05
@YiteGu YiteGu force-pushed the rbd-qos-support branch 2 times, most recently from 47bd9ab to 795ea78 Compare December 19, 2024 11:30
@nixpanic
Copy link
Member

/test ci/centos/mini-e2e/k8s-1.31

@YiteGu YiteGu force-pushed the rbd-qos-support branch 3 times, most recently from bd4d17e to 4774a82 Compare December 20, 2024 05:05
@YiteGu
Copy link
Member Author

YiteGu commented Dec 20, 2024

/test ci/centos/mini-e2e/k8s-1.31

1 similar comment
@YiteGu
Copy link
Member Author

YiteGu commented Dec 20, 2024

/test ci/centos/mini-e2e/k8s-1.31

@YiteGu YiteGu requested a review from nixpanic December 20, 2024 09:39
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 19, 2025
@nixpanic nixpanic removed the stale label Jan 21, 2025
@nixpanic nixpanic requested a review from Madhu-1 January 21, 2025 10:15
nixpanic
nixpanic previously approved these changes Jan 21, 2025
@nixpanic nixpanic requested a review from a team January 21, 2025 13:08
| `BaseWriteIops` | no | the base limit of write operations per second |
| `BaseReadBytesPerSecond` | no | the base limit of read bytes per second |
| `BaseWriteBytesPerSecond` | no | the base limit of write bytes per second |
| `ReadIopsPerGB` | no | the limit of read operations per GiB |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the key GB is used and in the comments its mentioned GiB is that correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, there is confuse, we should use unit of bytes(IEC).

| `WriteIopsPerGB` | no | the limit of write operations per GiB |
| `ReadBpsPerGB` | no | the limit of read bytes per GiB |
| `WriteBpsPerGB` | no | the limit of write bytes per GiB |
| `BaseVolSizeBytes` | no | the min size of volume what use to calc qos beased on capacity |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calc to calculate

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -73,6 +73,15 @@ make image-cephcsi
| `stripeUnit` | no | stripe unit in bytes |
| `stripeCount` | no | objects to stripe over before looping |
| `objectSize` | no | object size in bytes |
| `BaseReadIops` | no | the base limit of read operations per second |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can all these operations can be supported for krbd?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these operations can be supported for krbd, but we need to set QoS by cgroup after get QoS limit result by calcQosBasedOnCapacity. if we support burst QoS, krbd can not be support.

e2e/rbd.go Outdated
Comment on lines 4648 to 4652
qosParameters := map[string]string{
"BaseReadIops": "2000",
"BaseWriteIops": "1000",
"BaseReadBytesPerSecond": "209715200",
"BaseWriteBytesPerSecond": "104857600",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please define a var for numeric values as its used in two places to have single source

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated


pvc.Namespace = f.UniqueName
pvc.Spec.Resources.Requests[v1.ResourceStorage] = resource.MustParse("100Gi")
wants = map[string]string{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen for cloned PVC or PVC created from snapshot, are they going to get the same values or different values?

Can you please add a test case of that as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloned PVC or PVC created from snapshot get the same values after my testing on the ceph.

Comment on lines 236 to 238
if errors.Is(err, librbd.ErrNotFound) {
// rbdQosPerGBType does not exist, set it empty.
perGBLimit = ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need explict setting it to empty string here , AFAIK it returns empty string in case of an error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines 239 to 234
} else if err != nil {
log.ErrorLog(ctx, "failed to get metadata: %s. %v", param.rbdQosPerGBType, err)

return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we dont need if and else if check here. we can have single check err !=nil && !errors.Is(err, librbd.ErrNotFound)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment on lines 247 to 244
if errors.Is(err, librbd.ErrNotFound) {
// rbdBaseQosVolSize does not exist, set it empty.
baseVolSize = ""
} else if err != nil {
log.ErrorLog(ctx, "failed to get metadata: %s. %v", baseQosVolSize, err)

return nil, err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here as well we dont need two checks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

) error {
rbdQosParameters, err := getRbdImageQOS(ctx, rv)
if err != nil {
log.ErrorLog(ctx, "get rbd image qos failed")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log the error and also why we are only logging here not in other places in this function?

Copy link
Member Author

@YiteGu YiteGu Jan 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this log deleted

return rbdQosParameters
}

func (rv *rbdVolume) SetQOS(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can use rbImage instead of rbdVolume for most of the methods getting added in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any particular opinion on this point, I respect your suggestion. :)

@@ -162,6 +162,38 @@ parameters:
# stripeCount: <>
# (optional) The object size in bytes.
# objectSize: <>

# rbd volume QoS.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use https://kubernetes.io/docs/concepts/storage/volume-attributes-classes/#the-volumeattributesclass-api as these values are not immutable as anything mentioned in the SC is mutable?

@mergify mergify bot dismissed nixpanic’s stale review January 26, 2025 09:48

Pull request has been modified.

@YiteGu
Copy link
Member Author

YiteGu commented Jan 26, 2025

/test ci/centos/mini-e2e/k8s-1.31

@YiteGu
Copy link
Member Author

YiteGu commented Jan 26, 2025

/test ci/centos/mini-e2e/k8s-1.31

@YiteGu
Copy link
Member Author

YiteGu commented Jan 27, 2025

/test ci/centos/mini-e2e/k8s-1.31

1. QoS provides settings for rbd volume read/write iops
   and read/write bandwidth.
2. All QoS parameters are placed in the SC,
   send QoS parameters from SC to Cephcsi through PVC create request.
3. We need provide QoS parameters in the SC as below:
   - BaseReadIops
   - BaseWriteIops
   - BaseReadBytesPerSecond
   - BaseWriteBytesPerSecond
   - ReadIopsPerGB
   - WriteIopsPerGB
   - ReadBpsPerGB
   - WriteBpsPerGB
   - BaseVolSizeBytes
   There are 4 base qos parameters among them, when users apply for
   a volume capacity equal to or less than BaseVolSizebytes, use base
   qos limit. For the portion of capacity exceeding BaseVolSizebytes,
   QoS will be increased in steps set per GB. If the step size parameter
   per GB is not provided, only base QoS limit will be used and not associated
   with capacity size.
4. If PVC has resize request, adjust the QoS limit
   according to the QoS parameters after resizing.

Signed-off-by: Yite Gu <[email protected]>
@YiteGu
Copy link
Member Author

YiteGu commented Jan 27, 2025

/test ci/centos/mini-e2e/k8s-1.31

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/rbd Issues related to RBD enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants