Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Apply deletion permission checks when syncing with replace (#14161) #20720

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

andrii-korotkov-verkada
Copy link
Contributor

@andrii-korotkov-verkada andrii-korotkov-verkada commented Nov 8, 2024

Closes #14161

Sync with replace can be quite dangerous, but we currently allow it if the sync is allowed. Since sync with replace deletes resources and recreates them, it's reasonable to check deletion permissions for application and/or specific resources.

Let's cherry-pick this to 2.13.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@andrii-korotkov-verkada andrii-korotkov-verkada requested a review from a team as a code owner November 8, 2024 15:02
Copy link

bunnyshell bot commented Nov 8, 2024

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 14161-apply-deletion-permission-check-for-sync-with-replace branch 2 times, most recently from 34f1ce4 to 2c6b63a Compare November 8, 2024 15:08
Copy link
Member

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on the contributors' call I think I'm opposed to this change.

First, I think that people with "sync" access are generally going to be people with git access. They have the ability to delete resources directly through git. So this would be primarily a safety mechanism, not a security mechanism. Using RBAC (a security mechanism) to apply a safety check seems like poor design.

Second, the behavior for syncing specific resources is strange and confusing. If I specify resources in my sync request, then RBAC is checked for each of those resources. But if I do not specify the resources, then I can sync the whole app (including all contained resources) without resource-specific delete permissions. I also must have both app-level and resource-specific delete permissions if I specify resources, which is different from how other fine-grained access works.

Third, I think requiring delete permissions will encourage admins to grant app delete access too liberally. Today, sync access grants the ability to delete resources (via prune and replace) but not the app itself. This change would force people who need sync+replace to grant applications, delete access which permits deleting the app itself.

Fourth, it's unclear how the RBAC will be enforced when the sync is requested without replace but the replace sync policy is set via annotation at the resource level.

Finally, if I'm correct that the primary need is for a safety check and not for a security mechanism, I think we can solve basically the same problem with improved UI/CLI warnings and prompts.

Even if we decide that this change should be made, we can't make it without putting it behind a flag, since it's a breaking API change.

If we decide that some form of this change makes sense, I think it should be via some RBAC mechanism other than delete (maybe sync+replace or something like that), and it needs to be more robust (handle resource annotations, etc.).

@andrii-korotkov-verkada
Copy link
Contributor Author

@crenshaw-dev, here are some answers

First, I think that people with "sync" access are generally going to be people with git access. They have the ability to delete resources directly through git. So this would be primarily a safety mechanism, not a security mechanism. Using RBAC (a security mechanism) to apply a safety check seems like poor design.

Git updates are usually peer reviewed or can be made so, adding an extra security.

Second, the behavior for syncing specific resources is strange and confusing. If I specify resources in my sync request, then RBAC is checked for each of those resources. But if I do not specify the resources, then I can sync the whole app (including all contained resources) without resource-specific delete permissions. I also must have both app-level and resource-specific delete permissions if I specify resources, which is different from how other fine-grained access works.

With how I implemented the check, if resources are specified and you have a deletion permissions for each of them, you don't need a whole app deletion permission. I'm not sure if UI/CLI always pass a list of resources, but assumed that 0 resources mean whole app and treated accordingly.

Third, I think requiring delete permissions will encourage admins to grant app delete access too liberally. Today, sync access grants the ability to delete resources (via prune and replace) but not the app itself. This change would force people who need sync+replace to grant applications, delete access which permits deleting the app itself.

You can configure resources deletion permission via action delete/*/*/* instead of delete. From what I understand, that won't allow to delete apps. Also, I don't expect admins to allow replace access by default but rather make it via some requestable role through Opal or similar.

Fourth, it's unclear how the RBAC will be enforced when the sync is requested without replace but the replace sync policy is set via annotation at the resource level.

That's something I've missed and there might be no easy way to enforce this on the server level. But I think my change is an incremental improvement anyways, and if people specify annotations which is peer reviewed, so be it.

Finally, if I'm correct that the primary need is for a safety check and not for a security mechanism, I think we can solve basically the same problem with improved UI/CLI warnings and prompts.

From the experience in my company, double-warning we do with replace (show a warning when checkbox is checked and show a confirmation dialog) didn't prevent people from using replace sometimes. I think permissions is better than just option to hide the button. Please, let me know if the issue is company-specific (which doesn't seem so to me, but idk).

Even if we decide that this change should be made, we can't make it without putting it behind a flag, since it's a breaking API change.

I don't quite follow - what breaks in API? I didn't update any schemas.

Overall, I think the change can be good as is, maybe with some documentation updates and more comments, which I'll add.

@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 14161-apply-deletion-permission-check-for-sync-with-replace branch from 2c6b63a to 086ed9f Compare November 8, 2024 16:04
@andrii-korotkov-verkada
Copy link
Contributor Author

andrii-korotkov-verkada commented Nov 8, 2024

Actually it's quite important if UI/CLI pass a list of resources. When selecting sync manually in UI, I see a list of resources with checkboxes, so kinda assume it does pass it. But if it doesn't, we may indeed have a problem and need to fetch all app resources, which server may not support without a network call.

@crenshaw-dev
Copy link
Member

Git updates are usually peer reviewed or can be made so, adding an extra security.

Sure. I think that just lends to the idea that this is more of a safety mechanism than a security mechanism and should probably be implemented in that way.

With how I implemented the check, if resources are specified and you have a deletion permissions for each of them, you don't need a whole app deletion permission.

Ah, I misread the code. I follow now.

I'm not sure if UI/CLI always pass a list of resources, but assumed that 0 resources mean whole app and treated accordingly.

The UI/CLI don't always pass a list of resources. 0 resources does mean "sync the whole app." Since the API doesn't know the full list of resources at sync time (and can't know that list, because the request and the sync are processed asynchronously), it can't reliably apply resource-level RBAC rules.

You can configure resources deletion permission via action delete///* instead of delete.

If I understand the code correctly, you only hit the fine-grained checks after you verify that the whole-app delete permission check passes.

Also, I don't expect admins to allow replace access by default

I think that's too broad an assumption. Many admins give users full GitOps control over their cluster/namespace and want to include replace replace access by default.

That's something I've missed and there might be no easy way to enforce this on the server level. But I think my change is an incremental improvement anyways, and if people specify annotations which is peer reviewed, so be it.

I think it's confusing to use a security mechanism (RBAC) to limit "replace" functionality but allow bypassing that security mechanism via either GitOps or update access (the use can just add the annotation to the resource via the API).

From the experience in my company, double-warning we do with replace (show a warning when checkbox is checked and show a confirmation dialog) didn't prevent people from using replace sometimes.

I think that's fair, people don't heed warnings.

I think permissions is better than just option to hide the button.

I'm not sure those are the only two options. A third option would be to add an Application field which completely blocks Replace actions. A fourth would be to add that configuration at the AppProject level. I think there are other ways that we should consider before locking in on an RBAC-based solution.

let me know if the issue is company-specific (which doesn't seem so to me, but idk).

I don't think the problem is company-specific, but I think the solution too narrowly fits the needs of a subset of companies. I think it could work great for that subset but would be a step backwards for others.

I don't quite follow - what breaks in API? I didn't update any schemas.

A change in behavior is still a breaking change, even if no schemas are altered.

@andrii-korotkov-verkada
Copy link
Contributor Author

andrii-korotkov-verkada commented Nov 8, 2024

If I understand the code correctly, you only hit the fine-grained checks after you verify that the whole-app delete permission check passes.

No, it's the opposite. If the whole app check fails, I check for individual resources (if any).
If app check passes, I don't do other checks and just continue.

@andrii-korotkov-verkada
Copy link
Contributor Author

I think RBAC should be applied still, because if people don't have permissions to delete resources, they shouldn't have permissions to replace resources.

@andrii-korotkov-verkada
Copy link
Contributor Author

andrii-korotkov-verkada commented Nov 8, 2024

The behavior of my code is consistent with https://argo-cd.readthedocs.io/en/stable/operator-manual/rbac/#fine-grained-permissions-for-updatedelete-action, in particular with the note.

@crenshaw-dev
Copy link
Member

No, it's the opposite. If the whole app check fails, I check for individual resources (if any).
If app check passes, I don't do other checks and just continue.

Ah! Understood. It only took me reading the code three times to get it. :-)

if people don't have permissions to delete resources, they shouldn't have permissions to replace resources.

If this truly needs to be a security-level check, then I think it should be handled at the application controller level rather than the API level. There should be a new sync mode which prevents deletion (either by prune or by replace) and then fails the sync operation if any resource would be deleted. There should be a new RBAC option something like sync-no-delete so that the API can verify that the user's intent is allowed and delegate the responsibility of actually enforcing that intent to the app controller.

But I'm still not convinced this is a security rather than a safety check.

@andrii-korotkov-verkada
Copy link
Contributor Author

andrii-korotkov-verkada commented Nov 8, 2024

I don't claim to have a full solution for sync with replace permissions, but I think these permissions checks need to be done to be consistent with delete permissions checks, and it so happens to also solve the issue for my company. For companies where everyone has full delete access, but admins want to prohibit sync with replace, this won't work. But I think it's still better than nothing.

@andrii-korotkov-verkada
Copy link
Contributor Author

But I'm still not convinced this is a security rather than a safety check.

If users don't have delete permissions I think we should prohibit sync with replace for security reasons too.

@crenshaw-dev
Copy link
Member

crenshaw-dev commented Nov 8, 2024

But I think it's still better than nothing.

A security mechanism that's unreliable or confusing can be worse than no security mechanism at all.

I think the mechanism, as implemented, is confusing.

If a user syncs with replace enabled without specifying resources, they must also have the ability to delete the whole application. That's weird. If they specify resources, we verify that they're allowed to delete those specific resources whether those resources would actually be replaced for this specific sync operation or not. But if the user doesn't specify resources, and replace is enabled for some specific resource via annotation, even if delete access is explicitly denied for that resource, the resource will be replaced. And if the annotation isn't currently set, and I have resource edit access, I can just add the annotation and bypass the restriction.

To be a good security mechanism, all those weird quirks need to be ironed out.

If users don't have delete permissions I think we should prohibit sync with replace for security reasons too.

As far as I know, no one on the original issue has described this as a security problem. They're mainly concerned that someone will accidentally (not maliciously) delete a resource by syncing with replace enabled.

@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 14161-apply-deletion-permission-check-for-sync-with-replace branch from 086ed9f to 964abf4 Compare November 8, 2024 17:02
@andrii-korotkov-verkada
Copy link
Contributor Author

Hm, I see your point. I'll think more about this.
Meanwhile, I've made a small improvement to also check for all resources delete permission if app deletion permission fails. This way, admins can deny app deletion, but allow all resources deletion.

@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 14161-apply-deletion-permission-check-for-sync-with-replace branch from 964abf4 to f3de372 Compare November 8, 2024 17:04
@andrii-korotkov-verkada
Copy link
Contributor Author

As far as I know, no one on the original issue has described this as a security problem. They're mainly concerned that someone will accidentally (not maliciously) delete a resource by syncing with replace enabled.

Security risk is more theoretical here, but from the point of consistency on how we do checks I think it still makes sense to check for deletion permissions. Although, sync with replace is theoretically less dangerous that just delete. cc @agaudreault.

if err := s.enf.EnforceErr(ctx.Value("claims"), rbacpolicy.ResourceApplications, rbacpolicy.ActionDelete, a.RBACName(s.ns)); err != nil {
if len(resources) == 0 {
// If no resources are specified, check for ability to delete any resource
action := fmt.Sprintf("%s/*/*/*/*", rbacpolicy.ActionDelete)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could encourage over-permissioning. I don't necessarily want to grant access to delete "anything in the app," I might just want to grant access to delete "these X few things that are actually in the app."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might, though it's hard to specify a generic rule allowing sync with replace for any application. I guess that's the best admins would be able to do.

I might just want to grant access to delete "these X few things that are actually in the app."

Then people can use the annotation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

people can use the annotation

I still think it's weird that the sync permission is "prevent delete unless there's this annotation explicitly allowing delete." Letting an annotation overrule RBAC just isn't intuitive to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, though with this PR there'd be less discrepancy. Also, annotation bypass could be a feature, not a bug, if people really want to selectively replace some stuff but not the other. I've updated the documentation to talk about this.

@andrii-korotkov-verkada
Copy link
Contributor Author

Are there any scenarios where we want to enable sync with replace while disabling some delete permissions?

@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 14161-apply-deletion-permission-check-for-sync-with-replace branch from f3de372 to 349e6a1 Compare November 8, 2024 17:16
@andrii-korotkov-verkada andrii-korotkov-verkada requested a review from a team as a code owner November 8, 2024 17:16
@andrii-korotkov-verkada
Copy link
Contributor Author

I've made a small documentation update talking about annotations bypassing the RBAC check for deletion.

@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 14161-apply-deletion-permission-check-for-sync-with-replace branch 2 times, most recently from 35898b0 to 6aed0a2 Compare November 8, 2024 17:23
@andrii-korotkov-verkada
Copy link
Contributor Author

Updated the tests to cover more cases.

@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 14161-apply-deletion-permission-check-for-sync-with-replace branch from 6aed0a2 to df24ade Compare November 8, 2024 17:25
…proj#14161)

Closes argoproj#14161

Sync with replace can be quite dangerous, but we currently allow it if the sync is allowed. Since sync with replace deletes resources and recreates them, it's reasonable to check deletion permissions for application and/or specific resources.

Let's cherry-pick this to 2.13.

Signed-off-by: Andrii Korotkov <[email protected]>
@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 14161-apply-deletion-permission-check-for-sync-with-replace branch from df24ade to 0a8b1c2 Compare November 8, 2024 17:35
@andrii-korotkov-verkada
Copy link
Contributor Author

Added a documentation in RBAC doc as well.

@agaudreault
Copy link
Member

agaudreault commented Nov 8, 2024

@andrii-korotkov-verkada Perhaps this should be discussed in a proposal/issue instead of a PR directly. I think it will be easier and more efficient to understand the different use cases, and how to implement with one paragraph rather than code at this point.

I think this is more complicated than what it looks like. We need to consider the following, and probably more:

  • if a resource has an annotation to allow replace, should it require delete permission? Probably not because it has been set in git as the expected behavior.
  • if a user does a partial sync replace of resources, should it check for delete? I don't think the fine-grained delete rbac is a 100% match here. Deleting a resource manually, and letting "kubernetes manage the lifecycle" with replace is slightly different. Using the same rbac, lets say allowing delete on statefulSet for the sake of a replace is not a good design choice IMO.
  • What about sync with prune? Although this is closer to the fine-grained delete, it is still not the same as deleting the actual resources. These are in git, so i assume the validation was done before the commit. So pruning resources removed from the desired state is a different intent than deleting resources that are still in the desired state.
  • What about syncing the whole app vs syncing one resource? Should we have a fine-grained sync implemented?

In brief, I think we are still at the drawing board and brainstorming phase rather than at the implementation. Before having any code written, i would personally need to know what would happen to the use cases above (and the ones i haven't thought of).

In general, i think a user action should be validated by the rbac, but I am really not sure the fine-grained delete can be used to supplement the sync rbac. Perhaps when thinking about the use cases above, new rbac action like "sync-replace" or "sync-prune" will be necessary, perhaps not!

Copy link

codecov bot commented Nov 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.15%. Comparing base (e2bc96b) to head (0a8b1c2).
Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20720      +/-   ##
==========================================
- Coverage   55.17%   55.15%   -0.03%     
==========================================
  Files         324      324              
  Lines       55257    55275      +18     
==========================================
- Hits        30489    30486       -3     
- Misses      22142    22171      +29     
+ Partials     2626     2618       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prevent unintended usage of "replace"
3 participants