Implement (un)stage/(un)publish volume according to the CSI spec for mount mode #2195
I see a potential problem with kubelet restarts during the move to the new scheme: when kubelet restarts, it rebuilds its view of the world relying on request idempotency, i.e. it simply resends NodeStage and NodePublish for every volume that it believes should be mounted.
If a disk was attached by the old driver, Stage will try to start an endpoint at the new path, and I expect a conflict to occur (unless NBS automatically stops the old endpoint at that point).
In the VM case, we use the instanceId attribute as the marker that the endpoint is started at Stage; for infrakuber I don't see how to distinguish an old disk from a new one.
We could try it this way:
cloud/blockstore/tools/csi_driver/stage-publish-unpublish-unstage-flow.md
Note: This is an automated comment that will be appended to during the run.
🔴 linux-x86_64-relwithdebinfo: some tests FAILED for commit 520bc78.
🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 520bc78.
Problem:
The CSI driver implementation violates the CSI specification with respect to staging, publishing, unpublishing and unstaging volumes.
Currently, the NodeStageVolume step is completely ignored, and both starting the endpoint and mounting the volume happen in the NodePublishVolume step. As a result, the CSI driver does not support the ReadWriteOnce access mode correctly: only one pod on a node can mount the volume, although ReadWriteOnce should allow the same volume to be mounted into multiple pods on the same node.
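The stage/publish split described here can be sketched as a small in-memory state machine in Go. Everything in it is hypothetical (the node type, its stage, publish, unpublish and unstage methods, and the example paths); comments mark where a real driver would start the endpoint and perform the mounts. What it illustrates is that the endpoint belongs to the per-node stage step, so several pods on the same node can publish one ReadWriteOnce volume.

```go
package main

import "fmt"

// node is a hypothetical in-memory model of the volume state on one node.
type node struct {
	staged    map[string]string          // volumeID -> staging path
	published map[string]map[string]bool // volumeID -> set of target paths
}

func newNode() *node {
	return &node{staged: map[string]string{}, published: map[string]map[string]bool{}}
}

// stage models NodeStageVolume: called once per volume per node. In a real
// driver this is where the endpoint would be started and the device mounted
// at stagingPath. Repeating the call is a no-op.
func (n *node) stage(volumeID, stagingPath string) {
	if _, ok := n.staged[volumeID]; ok {
		return
	}
	n.staged[volumeID] = stagingPath
	n.published[volumeID] = map[string]bool{}
}

// publish models NodePublishVolume: called once per pod. In a real driver
// this would bind-mount the staging path at targetPath.
func (n *node) publish(volumeID, targetPath string) error {
	if _, ok := n.staged[volumeID]; !ok {
		return fmt.Errorf("volume %s is not staged", volumeID)
	}
	n.published[volumeID][targetPath] = true
	return nil
}

// unpublish models NodeUnpublishVolume for a single pod.
func (n *node) unpublish(volumeID, targetPath string) {
	delete(n.published[volumeID], targetPath)
}

// unstage models NodeUnstageVolume: only valid once no pod uses the volume.
func (n *node) unstage(volumeID string) error {
	if len(n.published[volumeID]) > 0 {
		return fmt.Errorf("volume %s is still published", volumeID)
	}
	delete(n.staged, volumeID)
	delete(n.published, volumeID)
	return nil
}

func main() {
	n := newNode()
	n.stage("vol-1", "/staging/vol-1")
	// Two pods on the same node share the single staged volume.
	_ = n.publish("vol-1", "/pods/a/vol-1")
	_ = n.publish("vol-1", "/pods/b/vol-1")
	fmt.Println(len(n.published["vol-1"])) // 2
}
```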
According to the CSI specification:
Since the current CSI driver implementation is already running in production clusters, we need to handle migration from the existing way of mounting volumes (where only NodePublishVolume/NodeUnpublishVolume are implemented) to the new implementation.
The tricky part is that already bound volumes and "new" volumes use different UnixSocketPath/InstanceId/ClientId values.
Current format of UnixSocketPath: socketsDir/podId/volumeId
New format of UnixSocketPath: socketsDir/nodeId/volumeId
Current format of InstanceId: podId
New format of InstanceId: nodeId
Current format of ClientId: clientID-podId
New format of ClientId: clientID-nodeId
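The two naming schemes can be captured by a pair of Go helpers. This is a minimal sketch: endpointIDs, legacyIDs and stagedIDs are hypothetical names, not functions from the driver, and socketsDir and clientID stand for the configured sockets directory and client ID prefix. The only difference between the schemes is whether podId or nodeId is used as the key.

```go
package main

import (
	"fmt"
	"path"
)

// endpointIDs groups the three values whose format changes (hypothetical type).
type endpointIDs struct {
	UnixSocketPath string
	InstanceID     string
	ClientID       string
}

// legacyIDs builds the current, pod-based format used when the endpoint is
// started in NodePublishVolume.
func legacyIDs(socketsDir, clientID, podID, volumeID string) endpointIDs {
	return endpointIDs{
		UnixSocketPath: path.Join(socketsDir, podID, volumeID),
		InstanceID:     podID,
		ClientID:       clientID + "-" + podID,
	}
}

// stagedIDs builds the new, node-based format used when the endpoint is
// started in NodeStageVolume.
func stagedIDs(socketsDir, clientID, nodeID, volumeID string) endpointIDs {
	return endpointIDs{
		UnixSocketPath: path.Join(socketsDir, nodeID, volumeID),
		InstanceID:     nodeID,
		ClientID:       clientID + "-" + nodeID,
	}
}

func main() {
	fmt.Printf("%+v\n", legacyIDs("/sockets", "client", "pod-1", "vol-1"))
	fmt.Printf("%+v\n", stagedIDs("/sockets", "client", "node-1", "vol-1"))
}
```

Keeping both helpers side by side mirrors the backward-compatibility period this PR introduces: until every volume has moved to the node-based format, the driver may need to recognize endpoints created under either scheme.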
Possible scenarios:
The migration is split by mode:
VM mode: #1982
Mount mode: #2195
Block mode: #2269
After all volumes have been migrated to the new endpoints, we can remove backward compatibility
with the old endpoint format.
External links/documentation
https://github.com/container-storage-interface/spec/blob/master/spec.md#node-service-rpc