Restrict addition of namespaces with RBD images which are already part of other NVMeOF GW groups as namespaces to avoid data corruption #833

Open
rahullepakshi opened this issue Aug 28, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@rahullepakshi
Contributor

This might be tough to implement, as RBD image metadata is part of different OMAP state files, but we need to find a way to restrict creating namespaces from RBD images that are already used as namespaces in other GW groups. Such volumes can be accessed by multiple initiators, causing data inconsistency. At scale, say 1K to 4K RBD images, it will be difficult for a user to keep track of used and unused images when creating namespaces. Please let me know your thoughts.
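
To make the requested check concrete, here is a minimal sketch, assuming each group's state can be read into a plain dictionary. The layout (group name mapped to a list of `(pool, image)` pairs) and the function name `find_conflicting_groups` are hypothetical and do not reflect the gateway's actual OMAP schema.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory view of per-group state. The real gateway keeps
# this information in separate OMAP state objects, one per group.
GroupState = Dict[str, List[Tuple[str, str]]]


def find_conflicting_groups(state: GroupState, group: str,
                            pool: str, image: str) -> List[str]:
    """Return the other groups that already use pool/image as a namespace."""
    return sorted(
        other for other, images in state.items()
        if other != group and (pool, image) in images
    )


state = {
    "group1": [("rbd", "vol1"), ("rbd", "vol2")],
    "group2": [("rbd", "vol3")],
}
conflicts = find_conflicting_groups(state, "group2", "rbd", "vol1")
if conflicts:
    print(f"refusing to add rbd/vol1: already used by {conflicts}")
```

The hard part in practice is gathering `state` consistently from every group's OMAP object, which this sketch does not attempt.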

@rahullepakshi rahullepakshi added the bug Something isn't working label Aug 28, 2024
@github-project-automation github-project-automation bot moved this to 🆕 New in NVMe-oF Aug 28, 2024
@idryomov
Contributor

At the RBD level, this can be done with the rbd_lock_acquire() API, passing in RBD_LOCK_MODE_EXCLUSIVE for lock_mode. Internally this is implemented by disabling automatic exclusive lock transitions, so it is an option only for images with the exclusive-lock feature enabled.
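
A rough sketch of the managed-lock approach from Python, assuming the Ceph Python bindings expose the C call as `Image.lock_acquire(lock_mode)`. The helper name `claim_image` is hypothetical, and the image and lock mode are passed in so the snippet does not depend on a running cluster.

```python
def claim_image(image, exclusive_mode):
    """Try to take the managed exclusive lock on an open RBD image.

    Assumptions: `image` behaves like rbd.Image from the Ceph Python
    bindings, and `exclusive_mode` is rbd.RBD_LOCK_MODE_EXCLUSIVE.
    Returns (True, "") on success or (False, reason) on failure.
    """
    try:
        image.lock_acquire(exclusive_mode)
    except Exception as exc:  # the bindings raise their own error types
        return False, str(exc)
    return True, ""
```

A gateway would call this right after opening the image and refuse to create the namespace when the first element of the result is `False`.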

Another option is to employ advisory locking at the RADOS level, placing a lock on the image's rbd_header.XYZ object using the rados_lock_exclusive() API.
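
The advisory-lock variant could look roughly like this, assuming the Python bindings expose the C call as `Ioctx.lock_exclusive(key, name, cookie, desc=...)`. The lock name `nvmeof_gw_group` and the helper `lock_image_header` are made up for illustration.

```python
def lock_image_header(ioctx, image_id, owner_cookie,
                      lock_name="nvmeof_gw_group"):
    """Place an advisory exclusive lock on an image's header object.

    Assumptions: `ioctx` behaves like rados.Ioctx, `image_id` is the
    internal RBD image id (the XYZ in rbd_header.XYZ), and `lock_name`
    is a hypothetical name chosen for this sketch.
    Returns True if the lock was taken, False otherwise.
    """
    header_oid = f"rbd_header.{image_id}"
    try:
        ioctx.lock_exclusive(header_oid, lock_name, owner_cookie,
                             desc="claimed by an NVMe-oF gateway group")
    except Exception:  # e.g. the object is already locked by someone else
        return False
    return True
```

Using the group name as `owner_cookie` is one possible choice, since the same cookie would be needed later to release the lock.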

Yet another option might be to use RBD per-image metadata, but it wouldn't be atomic, unlike an approach that involves locks.
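
To illustrate why the metadata approach is not atomic, here is a sketch assuming the Python bindings provide `Image.metadata_get(key)` (raising `KeyError` when the key is absent) and `Image.metadata_set(key, value)`. The key `nvmeof_gw_group` and the helper name are hypothetical.

```python
def claim_via_metadata(image, group, key="nvmeof_gw_group"):
    """Record `group` as the owner of an image in per-image metadata.

    Assumptions: `image` behaves like rbd.Image, and `key` is a made-up
    metadata key. Returns False if another group already owns the image.
    """
    try:
        owner = image.metadata_get(key)
    except KeyError:
        owner = None
    if owner is not None and owner != group:
        return False
    # Not atomic: another gateway could write the key between the read
    # above and the write below, and both would believe they own the image.
    image.metadata_set(key, group)
    return True
```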

How any of these approaches would interact with HA and images being potentially moved between groups would need to be investigated.

@caroav
Collaborator

caroav commented Aug 29, 2024

@idryomov I think that the request is more about the configuration, not about which GW is using the image to do IO. Even if all GWs in the group are in maintenance, for example, the request is to not allow the user to use the same images for other nvmeof namespaces in another group. Given that, do you still think that the rbd_lock_acquire() API addresses it?

Does it make sense in this case to use RBD namespaces? I.e. a namespace per GW group?

@idryomov
Contributor

> Even if all GWs in the group are in maintenance, for example, the request is to not allow the user to use the same images for other nvmeof namespaces in another group. Given that, do you still think that the rbd_lock_acquire() API addresses it?

Likely not. If so, I think it should be enforced through OMAP state files^Wobjects, even if that is not entirely trivial.

@idryomov
Contributor

(Rahul reached out to me asking specifically for input from the RBD perspective on "setting a flag/lock on a image", so I may have misinterpreted this.)

> Does it make sense in this case to use RBD namespaces? I.e. a namespace per GW group?

An RBD namespace can be thought of as a directory within a pool. If the concern is an operator exporting an image that they shouldn't be exporting at that moment (because it's already exported), but in general they should be able to (meaning that it's not a matter of access control), I don't see how placing images in namespaces would make a difference.

@caroav
Collaborator

caroav commented Aug 29, 2024

I'm not familiar with RBD namespaces. But my thinking was that if each group's state file (OMAP) accesses a different RBD namespace (i.e. we look only in that namespace), then maybe we could avoid the mix. But then it probably means that nvmeof users will need to create the images in the right RBD namespace.
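
As a sketch of what a namespace-per-group rule might look like, assuming the RBD namespace is simply named after the gateway group (a convention invented here for illustration) and that images are given as `pool/namespace/image` or `pool/image` specs:

```python
def parse_image_spec(spec):
    """Split 'pool/image' or 'pool/namespace/image' into its three parts.

    An image outside any RBD namespace gets an empty namespace string.
    """
    parts = spec.split("/")
    if len(parts) == 2:
        pool, image = parts
        return pool, "", image
    if len(parts) == 3:
        pool, namespace, image = parts
        return pool, namespace, image
    raise ValueError(f"unexpected image spec: {spec!r}")


def image_allowed_for_group(spec, group):
    """Accept an image only if it lives in the group's own RBD namespace.

    Assumption: the namespace name equals the group name.
    """
    _, namespace, _ = parse_image_spec(spec)
    return namespace == group


print(image_allowed_for_group("rbd/group1/vol1", "group1"))
print(image_allowed_for_group("rbd/group2/vol1", "group1"))
```

With a rule like this, moving an image to another group would mean moving it to another RBD namespace first.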

@idryomov
Contributor

idryomov commented Aug 29, 2024

This might be too restrictive. Would a scenario of images being "moved" between groups be common? An operator wanting to e.g. export an image through group1 today and through group2 tomorrow seems reasonable to me.

@gbregman
Contributor

gbregman commented Nov 4, 2024

A partial solution could be to add some unique prefix to the image name in case we create it in the gateway. Similar to what we do when we create a subsystem.
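
A minimal sketch of the prefix idea, assuming a made-up naming scheme of `nvmeof.<group>.<image>`. Neither the prefix nor the helper names come from the gateway code.

```python
PREFIX = "nvmeof"  # hypothetical marker for gateway-created images


def gateway_image_name(group, image):
    """Build the name used when the gateway itself creates the image."""
    return f"{PREFIX}.{group}.{image}"


def owning_group(image_name):
    """Return the group encoded in a prefixed name, or None if unprefixed.

    Assumes group names contain no dots; otherwise the split is ambiguous.
    """
    parts = image_name.split(".", 2)
    if len(parts) == 3 and parts[0] == PREFIX:
        return parts[1]
    return None


name = gateway_image_name("group1", "vol1")
print(name, owning_group(name), owning_group("vol1"))
```

This stays partial because images created outside the gateway carry no prefix, so `owning_group` returns `None` for them and nothing stops them from being added to several groups.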
