
csi-cephfsplugin E1224 rados: Invalid argument: "invalid value specified for ceph.dir.subvolume" #5041

Open
Pivert opened this issue Dec 22, 2024 · 4 comments

Pivert commented Dec 22, 2024

Describe the bug

New PVCs are stuck in the "Pending" state, and we can see many lines like this in the provisioner logs:

invalid value specified for ceph.dir.subvolume

Example PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: testpvc
  namespace: test
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: csi-cephfs-sc
  volumeMode: Filesystem
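
For reference, the stuck state can be observed as follows (a minimal sketch; testpvc.yaml is assumed to hold the manifest above):

kubectl apply -f testpvc.yaml
kubectl -n test get pvc testpvc        # STATUS stays "Pending"
kubectl -n test describe pvc testpvc   # Events show repeated ProvisioningFailed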

New PVs and PVCs are no longer created.

Environment details

  • Helm chart version: just upgraded to the latest version, but this did not fix the problem. The problem originally appeared while running 3.9.0, which had worked for more than 6 months without any problem.
$ helm list
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
ceph-csi-cephfs ceph-csi-cephfs 2               2024-12-22 22:10:08.088066185 +0000 UTC deployed        ceph-csi-cephfs-3.13.0  3.13.0     
$ kgoyaml sc csi-cephfs-sc   # kgoyaml: presumably an alias for 'kubectl get -o yaml'
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    meta.helm.sh/release-name: ceph-csi-cephfs
    meta.helm.sh/release-namespace: ceph-csi-cephfs
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2023-10-09T02:15:09Z"
  labels:
    app: ceph-csi-cephfs
    app.kubernetes.io/managed-by: Helm
    chart: ceph-csi-cephfs-3.13.0
    heritage: Helm
    release: ceph-csi-cephfs
  name: csi-cephfs-sc
  resourceVersion: "338433307"
  uid: 9026a977-bc57-49bc-9d12-a69b98fc254c
parameters:
  clusterID: e7628d51-32b5-4f5c-8eec-1cafb41ead74
  csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-cephfs
  fsName: cephfs
  volumeNamePrefix: rke
provisioner: cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
  • Kernel version:
    • Server: Linux pve1 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 GNU/Linux
    • Client (worker): Linux worker1 6.8.0-51-generic #52-Ubuntu SMP PREEMPT_DYNAMIC
  • Mounter used for mounting PVC: kernel
  • Kubernetes cluster version: 1.31 (but the problem was already present with 1.30)
  • Ceph cluster version: 18.2.4 (it was working before with this same Ceph version)

Steps to reproduce

  • Configure Ceph + ceph-csi on cephfs with kernel 5.x
  • Upgrade your cluster to kernel 6.x (especially your workers/clients)
  • PVs will no longer be created, and the error "invalid value specified for ceph.dir.subvolume" will appear in the logs (see the sketch below).
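
For context, ceph.dir.subvolume is a CephFS virtual extended attribute that ceph-csi sets on the directories it provisions. A minimal sketch of setting it by hand, assuming a CephFS mount at /mnt/cephfs and a hypothetical target directory; on an affected cluster this may fail with the same EINVAL (ret=-22) the provisioner reports:

# Mark a directory as a subvolume root (what ceph-csi does internally):
setfattr -n ceph.dir.subvolume -v 1 /mnt/cephfs/volumes/csi/testdir
# Possible failure on an affected setup:
# setfattr: /mnt/cephfs/volumes/csi/testdir: Invalid argument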

Actual results

"invalid value specified for ceph.dir.subvolume"

The most probable change that triggered the problem is the upgrade of the workers from Ubuntu LTS 22.04 to 24.04, which also moved the kernel from 5.x to 6.x.

Expected behavior

The PV to be created and the PVC to bind.

Logs

ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:06.727521       1 event.go:389] "Event occurred" object="test/testpvc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"test/testpvc\""
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:06.795100       1 controller.go:951] "Retrying syncing claim" key="1c28ee51-2abd-4453-a3aa-0966aab528fe" failures=0
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner E1222 22:25:06.795126       1 controller.go:974] error syncing claim "1c28ee51-2abd-4453-a3aa-0966aab528fe": failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:06.795217       1 event.go:389] "Event occurred" object="test/testpvc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"csi-cephfs-sc\": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: \"invalid value specified for ceph.dir.subvolume\""
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:07.295695       1 event.go:389] "Event occurred" object="test/testpvc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"test/testpvc\""
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:07.359538       1 controller.go:951] "Retrying syncing claim" key="1c28ee51-2abd-4453-a3aa-0966aab528fe" failures=1
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner E1222 22:25:07.359591       1 controller.go:974] error syncing claim "1c28ee51-2abd-4453-a3aa-0966aab528fe": failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:07.359620       1 event.go:389] "Event occurred" object="test/testpvc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"csi-cephfs-sc\": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: \"invalid value specified for ceph.dir.subvolume\""
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:08.359928       1 event.go:389] "Event occurred" object="test/testpvc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"test/testpvc\""
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:08.422300       1 controller.go:951] "Retrying syncing claim" key="1c28ee51-2abd-4453-a3aa-0966aab528fe" failures=2
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner E1222 22:25:08.422328       1 controller.go:974] error syncing claim "1c28ee51-2abd-4453-a3aa-0966aab528fe": failed to provision volume with StorageClass "csi-cephfs-sc": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: "invalid value specified for ceph.dir.subvolume"
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:08.422428       1 event.go:389] "Event occurred" object="test/testpvc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="ProvisioningFailed" message="failed to provision volume with StorageClass \"csi-cephfs-sc\": rpc error: code = Internal desc = rados: ret=-22, Invalid argument: \"invalid value specified for ceph.dir.subvolume\""
ceph-csi-cephfs-provisioner-d555f5858-6dkkr csi-provisioner I1222 22:25:10.423574       1 event.go:389] "Event occurred" object="test/testpvc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Provisioning" message="External provisioner is provisioning volume for claim \"test/testpvc\""

I didn't find anything relevant in dmesg.
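
To pull just this PVC's provisioning events (a sketch, using the names from the manifest above):

kubectl -n test get events --field-selector involvedObject.name=testpvc --sort-by=.lastTimestamp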

Rakshith-R (Contributor) commented:

Hey @Pivert,
Can you provide the complete logs as specified in the issue template? https://github.com/ceph/ceph-csi/issues/new?template=bug_report.md

# Logs #

If the issue is in PVC creation, deletion, cloning please attach complete logs
of below containers.

- csi-provisioner and csi-rbdplugin/csi-cephfsplugin container logs from the
  provisioner pod.

Pivert (Author) commented Dec 23, 2024

ceph-csi-cephfs_2024-12-23.log.xz.gz

Hi, please find a full log gathered with:
k stern -l app=ceph-csi-cephfs --no-follow | xz > ceph-csi-cephfs_$(date -I).log.xz
The log starts from the 3.9 -> 3.13 upgrade, but the problem is the same.

I just had to gzip it, because it seems xz attachments are not supported here.

Regards,

Rakshith-R (Contributor) commented:

These are the nodeplugin logs; I need the logs from the provisioner pod.

Pivert (Author) commented Dec 23, 2024

That's a stern capture; just grep for "provisioner" and you'll have all the logs (there are 3 provisioner pods). After restoring the xz with gunzip:

xz -dc ceph-csi-cephfs_2024-12-23.log.xz | rg provisioner
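
Alternatively, the provisioner containers can be queried directly (a sketch; the deployment name is inferred from the pod name visible in the logs above):

kubectl -n ceph-csi-cephfs logs deploy/ceph-csi-cephfs-provisioner -c csi-provisioner
kubectl -n ceph-csi-cephfs logs deploy/ceph-csi-cephfs-provisioner -c csi-cephfsplugin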

Adding the output of ceph fs status:

root@pve1:~# ceph fs status
cephfs - 18 clients
======
RANK  STATE   MDS      ACTIVITY     DNS    INOS   DIRS   CAPS  
 0    active  pve2  Reqs:   16 /s   901k   773k  31.9k  38.9k  
      POOL         TYPE     USED  AVAIL  
cephfs_metadata  metadata  2334M  1253G  
  cephfs_data      data    1524G  1253G  
STANDBY MDS  
    pve1     
    pve3     
MDS version: ceph version 18.2.4 (2064df84afc61c7e63928121bfdd74c59453c893) reef (stable)
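
Since the error concerns the ceph.dir.subvolume attribute, the server-side subvolume state may also be worth checking (a sketch, assuming ceph-csi's default "csi" subvolume group):

ceph fs subvolumegroup ls cephfs
ceph fs subvolume ls cephfs csi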

Pivert changed the title from cephfs csi : Invalid argument: "invalid value specified for ceph.dir.subvolume" to csi-cephfsplugin : Invalid argument: "invalid value specified for ceph.dir.subvolume" on Dec 24, 2024
Pivert changed the title from csi-cephfsplugin : Invalid argument: "invalid value specified for ceph.dir.subvolume" to csi-cephfsplugin E1224 rados: Invalid argument: "invalid value specified for ceph.dir.subvolume" on Dec 24, 2024