Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent creation of no-expiry silences #4159

Closed
agoddard opened this issue Jan 8, 2021 · 14 comments
Closed

Prevent creation of no-expiry silences #4159

agoddard opened this issue Jan 8, 2021 · 14 comments
Labels
Milestone

Comments

@agoddard
Copy link
Contributor

agoddard commented Jan 8, 2021

Feature Suggestion

As a workflow practice, oftentimes creating silences without expiration (or "unlimited expiration" silences) is problematic for large teams where it's easy to forget a silence is in place. These cases may call for policies for teams to "please never create silences without expiration" but there is presently no way to enforce this in the API. It would be great if there was an opt-in configuration which prevented the creation of silences without expirations.

Possible Implementation

Due to the possibility of a breaking change if this behavior were changed by default, a mechanism to opt-in to this change could provide a way to implement it without having to wait until a major version change.

@agoddard
Copy link
Contributor Author

agoddard commented Jan 8, 2021

A workaround is possible here by creating a sensu check which hits the API and reports on the presence of any silences without expirations, however that creates a reactive workflow and isn't ideal.

@calebhailey calebhailey added the component:api Sensu API improvements label Jan 20, 2021
@agoddard
Copy link
Contributor Author

If we do make an opt-in change to the behavior, would it be possible to include the option of setting a default silence period, so that for example when no expiry is set, then all silences fall-back to the default config?

@calebhailey
Copy link

Perhaps we could make this a non-breaking change by adding a new sensu-backend --minimum-silence-duration flag that defaults to 0 (no expiration), but could be increased to a non-zero value.

Is it possible to make Silenced API resource validation compare resources against this setting?

@agoddard
Copy link
Contributor Author

@calebhailey I like that approach

@calebhailey
Copy link

@portertech has pointed out that this would need to be a cluster-level configuration setting (similar to web/v1.GlobalConfig), and that this is getting into "policy enforcement" territory, which would require broader consideration (i.e. how would we handle policy across the entire product). TBD

@calebhailey
Copy link

Noting for posterity that this could be implemented as a Sensu secret (API Key) + check + filter + handler template (one YAML file) that does all of the following with no changes to Sensu:

  • check the silence API for silences w/ expiry:0
  • handler set to alert on first occurrence (e.g. Slack notification)
  • "remediation" action to delete the Silence on Nth occurrence (which occurrence wouldn't be reached if the silence was updated with an expiration)
  • this "delete after Nth occurrence" logic would be roughly equivalent to a "policy"
  • instead of "delete after Nth occurrence" we could "update after Nth occurrence" where the update sets the silence expiration to the default value.

@calebhailey
Copy link

calebhailey commented Mar 15, 2021

Another customer has requested a similar enhancement to require the use of expire_on_resolve: true.

See: https://secure.helpscout.net/conversation/1423316647/24517?folderId=1931916

@jspaleta jspaleta self-assigned this Jul 6, 2022
@calebhailey calebhailey added this to the 6.8.0 milestone Jul 20, 2022
@calebhailey calebhailey assigned calebhailey and unassigned jspaleta Aug 19, 2022
@echlebek
Copy link
Contributor

Moving to 6.9.0 in favour of shipping 6.8.0 earlier.

@echlebek echlebek modified the milestones: 6.8.0, 6.9.0 Aug 22, 2022
@c-kruse c-kruse assigned c-kruse and unassigned calebhailey Sep 2, 2022
@c-kruse
Copy link
Contributor

c-kruse commented Sep 7, 2022

I do not think that a config flag or a global config resource like web/v1.GlobalConfig is the right answer here. We should probably rework silences in 7.0 to require an expiration method, but within 6.x this feels like a rather large time commitment for a fairly small change that I suspect will be questionably acceptable.

I suggest we begin perusing one of these options instead:

  • Policy management
    Either built in to sensu or a part of sensu-flow Policy Enforcement sensu-flow#24 to allow operators to configure policies more flexibly than our RBAC implementation. In this example a policy could prevent non-admins from creating no-expiry silences.
  • A generic sensu resource monitoring check that can be configured to find resources using a pattern (i.e. api/v2.Silenced.Spec.Expire > 0) that can be configured to work in conjunction with the sensu-remediation-handler or similar to remove the offending resources.
  • A single use sensu no-expiry resource check that can be set up to alert operators and potentially work with the sensu-remediation-handler or similar to remove the silences.

@fguimond
Copy link
Contributor

@portertech @agoddard - before he left Caleb mentioned he was thinking about implementing this as a restriction in the UI. He was supposed to add the details before he left but it slipped through the cracks. Do you know more details about this?

@echlebek
Copy link
Contributor

@fguimond I would imagine Caleb means doing validation client-side that prevents such silences from being created, but not restricting them altogether API-wise.

@portertech portertech modified the milestones: 6.9.0, 6.10.0 Sep 28, 2022
@c-kruse c-kruse removed their assignment Sep 28, 2022
@portertech portertech added component:web-ui Sensu dashboard improvements and removed component:api Sensu API improvements labels Oct 21, 2022
@portertech
Copy link
Contributor

We believe requiring a Web UI user to explicitly set an expiration for a silence is sufficient.

@gd8 @dabria making expire a required form field is step 1, however, what can be done to improve the experience of setting the field?

@gd8
Copy link

gd8 commented Jan 23, 2023

If we make the field required, a better UX would definitely be to have a default value (surprise, surprise) – we'd then have to decide where this default value comes from.

An option is to have it as a value in the GlobalConfig defaulted to 0 as Caleb suggested above. However, based on comments from @portertech @c-kruse, it seems that we'd want silence expiry to be more granular (e.g. by cluster) – is this still the case if the GlobalConfig field is controlling a default value in the Web-UI form?

@gd8
Copy link

gd8 commented Feb 1, 2023

Ok the engineering team discussed this briefly today here in a slack thread

My understanding of the general consensus is that adding global configuration for silence expiration unnecessarily complicates the product and isn't worth it because it doesn't affect every user.

@agoddard my proposed alternative is to make it easier to find/surface silences with no expiration by adding a filter or sorting mechanism to the silences list in the Web UI. This way, teams concerned with these silences could bring them up easily to delete any problematic ones. Do you think this would solve or alleviate the original issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants