Restrict addition of namespaces with RBD images which are already part of other NVMeOF GW groups as namespaces to avoid data corruption #833

rahullepakshi · 2024-08-28T15:38:11Z

This might be tough to implement because RBD image metadata is spread across different OMAP state files, but we need a way to restrict creating namespaces from RBD images that are already exposed as namespaces in other GW groups. Otherwise these volumes can be accessed by multiple initiators, causing data inconsistency. At large scale, say 1K to 4K RBD images, it will be difficult for a user to keep track of which images are already in use when creating namespaces. Please let me know your thoughts.

idryomov · 2024-08-29T11:43:24Z

At the RBD level, this can be done with the rbd_lock_acquire() API, passing in RBD_LOCK_MODE_EXCLUSIVE for lock_mode. Internally this is implemented by disabling automatic exclusive lock transitions, so it is an option only for images with the exclusive-lock feature enabled.

Another option is to employ advisory locking at the RADOS level, placing a lock on the image's rbd_header.XYZ object using the rados_lock_exclusive() API.
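A minimal sketch of the RADOS-level option, written against an `ioctx` object that is assumed to behave like the Python rados binding's `Ioctx` (its `lock_exclusive(key, name, cookie, desc=...)` and `unlock(key, name, cookie)` methods). The lock name, the cookie format, and the helper names are hypothetical, and the image ID is assumed to be already known to the caller.

```python
import uuid

# Hypothetical lock name; not something the gateway defines today.
LOCK_NAME = "nvmeof_gw_group"


def header_object_name(image_id: str) -> str:
    """Name of the header object of a format 2 image: rbd_header.<image id>."""
    return f"rbd_header.{image_id}"


def claim_image(ioctx, image_id: str, group_name: str) -> str:
    """Take an exclusive advisory lock on the image's header object.

    Returns the cookie needed to release the lock later. If the lock is
    already held, the underlying call is expected to raise and the
    exception is left to propagate to the caller.
    """
    cookie = f"{group_name}.{uuid.uuid4()}"
    ioctx.lock_exclusive(
        header_object_name(image_id),
        LOCK_NAME,
        cookie,
        desc=f"claimed by NVMe-oF GW group {group_name}",
    )
    return cookie


def release_image(ioctx, image_id: str, cookie: str) -> None:
    """Drop the advisory lock taken by claim_image()."""
    ioctx.unlock(header_object_name(image_id), LOCK_NAME, cookie)
```

The lock is advisory: it only stops other callers that also try to take it, so every code path that adds a namespace would have to go through the same claim step.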

Yet another option might be to use RBD per-image metadata, but it wouldn't be atomic, unlike an approach that involves locks.
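To illustrate why a metadata-based marker is not atomic, here is a small in-memory sketch. `ImageMetadata` is a stand-in for per-image key/value metadata, not the real RBD API, and the key name and the `before_write` hook are hypothetical; the hook only exists to force the interleaving deterministically.

```python
# Hypothetical metadata key used to record the owning GW group.
OWNER_KEY = "nvmeof.gw_group"


class ImageMetadata:
    """In-memory stand-in for per-image metadata offering only get and set."""

    def __init__(self):
        self._kv = {}

    def get(self, key):
        return self._kv.get(key)

    def set(self, key, value):
        self._kv[key] = value


def claim_via_metadata(meta, group, before_write=None):
    """Check-then-set claim: two separate steps with nothing tying them together."""
    if meta.get(OWNER_KEY) is not None:
        return False  # somebody already owns the image
    if before_write is not None:
        before_write()  # the window in which another claimant can slip in
    meta.set(OWNER_KEY, group)
    return True


meta = ImageMetadata()
results = {}


def group2_slips_in():
    # A second group runs its whole claim between group1's check and write.
    results["group2"] = claim_via_metadata(meta, "group2")


results["group1"] = claim_via_metadata(meta, "group1", before_write=group2_slips_in)

# Both claims report success, and group1's write silently replaced group2's.
print(results, meta.get(OWNER_KEY))
```

With a lock, the second claimant would fail at the acquire step instead of both succeeding.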

How any of these approaches would interact with HA and images being potentially moved between groups would need to be investigated.

caroav · 2024-08-29T11:54:41Z

@idryomov I think the request is more about configuration, not about which GW is using the image to do I/O. Even if all GWs in the group are in maintenance, for example, the request is to not allow the user to use the same images for other NVMe-oF namespaces in another group. Given that, do you still think that the rbd_lock_acquire() API addresses it?

Does it make sense in this case to use RBD namespaces? I.e. a namespace per GW group?

idryomov · 2024-08-29T12:04:36Z

> Even if all GWs in the group are in maintenance, for example, the request is to not allow the user to use the same images for other NVMe-oF namespaces in another group. Given that, do you still think that the rbd_lock_acquire() API addresses it?

Likely not. If so, I think it should be enforced through OMAP state files^Wobjects, even if that is not entirely trivial.
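As a rough sketch of what enforcement through the OMAP state could look like, the following models each group's state object as a plain dict and scans all groups before adding a namespace. The key layout (`namespace_<nqn>_<nsid>`), the JSON field names, and both function names are assumptions made for illustration, not the gateway's actual schema.

```python
import json


def find_image_owner(group_states, pool, image):
    """Return (group, key) of the namespace entry that uses pool/image, or None."""
    for group, omap in group_states.items():
        for key, raw in omap.items():
            if not key.startswith("namespace_"):
                continue  # skip non-namespace entries
            entry = json.loads(raw)
            if entry.get("rbd_pool_name") == pool and entry.get("rbd_image_name") == image:
                return group, key
    return None


def add_namespace(group_states, group, nqn, nsid, pool, image):
    """Add a namespace entry unless another group already exports the image."""
    owner = find_image_owner(group_states, pool, image)
    if owner is not None and owner[0] != group:
        raise ValueError(
            f"{pool}/{image} is already a namespace in group {owner[0]!r} ({owner[1]})"
        )
    key = f"namespace_{nqn}_{nsid}"
    group_states.setdefault(group, {})[key] = json.dumps(
        {"rbd_pool_name": pool, "rbd_image_name": image}
    )
    return key


states = {"group1": {}, "group2": {}}
add_namespace(states, "group1", "nqn.2016-06.io.spdk:cnode1", 1, "rbd", "vol01")
try:
    add_namespace(states, "group2", "nqn.2016-06.io.spdk:cnode2", 1, "rbd", "vol01")
except ValueError as err:
    print("rejected:", err)
```

The scan and the write are still separate steps here, so two groups adding the same image at the same moment could both pass the check. A real implementation would need a lock or a compare-and-set on top of this, which is part of why it is not entirely trivial.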

idryomov · 2024-08-29T12:14:08Z

(Rahul reached out to me asking specifically for input from the RBD perspective on "setting a flag/lock on a image", so I may have misinterpreted this.)

> Does it make sense in this case to use RBD namespaces? I.e. a namespace per GW group?

An RBD namespace can be thought of as a directory within a pool. If the concern is an operator exporting an image that they shouldn't be exporting at that moment (because it's already exported), but in general they should be able to (meaning that it's not a matter of access control), I don't see how placing images in namespaces would make a difference.

caroav · 2024-08-29T12:34:29Z

I'm not familiar with RBD namespaces. My thinking was that if each group's state file (OMAP) pointed at a different RBD namespace (i.e. the group only looks in its own namespace), then maybe we could avoid the mix. But that probably means NVMe-oF users would need to create the images in the right RBD namespace.

idryomov · 2024-08-29T13:58:37Z

This might be too restrictive. Would a scenario of images being "moved" between groups be common? An operator wanting to e.g. export an image through group1 today and through group2 tomorrow seems reasonable to me.

gbregman · 2024-11-04T09:25:32Z

A partial solution could be to add a unique prefix to the image name when the image is created by the gateway, similar to what we do when we create a subsystem.
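A minimal sketch of the prefix idea, assuming a naming scheme of `nvmeof.<group>.<name>` and group names that contain no dots. The prefix string and both helper names are hypothetical.

```python
# Hypothetical marker for images created by the gateway.
PREFIX = "nvmeof"


def gateway_image_name(group: str, base_name: str) -> str:
    """Build the name the gateway would give an image it creates for a group."""
    return f"{PREFIX}.{group}.{base_name}"


def owning_group(image_name: str):
    """Return the group encoded in a gateway-created image name, else None."""
    parts = image_name.split(".", 2)
    if len(parts) == 3 and parts[0] == PREFIX:
        return parts[1]
    return None  # not created by the gateway, so ownership is unknown


name = gateway_image_name("group1", "vol01")
print(name, owning_group(name), owning_group("vol01"))
```

This only covers images the gateway creates itself; images created elsewhere carry no prefix, so another mechanism would still be needed for them.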

rahullepakshi added the bug label Aug 28, 2024
