From 3edfd7429d745cc09568ed1a26fefb393e784fca Mon Sep 17 00:00:00 2001
From: Khaled Emara
Date: Tue, 10 Dec 2024 17:24:32 +0200
Subject: [PATCH] doc(KDP): add policy status KDP

Signed-off-by: Khaled Emara
---
 proposals/policy_status.md | 170 +++++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)
 create mode 100644 proposals/policy_status.md

diff --git a/proposals/policy_status.md b/proposals/policy_status.md
new file mode 100644
index 0000000..f76cbab
--- /dev/null
+++ b/proposals/policy_status.md
@@ -0,0 +1,170 @@
+# Meta
+
+[meta]: #meta
+
+- Name: Kyverno Policy Status Readiness Evaluation
+- Start Date: 2024-12-10
+- Author(s): @KhaledEmaraDev
+
+# Table of Contents
+
+[table-of-contents]: #table-of-contents
+
+- [Meta](#meta)
+- [Table of Contents](#table-of-contents)
+- [Overview](#overview)
+- [Definitions](#definitions)
+- [Motivation](#motivation)
+- [Proposal](#proposal)
+- [Implementation](#implementation)
+- [Migration](#migration)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Unresolved Questions](#unresolved-questions)
+- [CRD Changes](#crd-changes)
+
+# Overview
+
+[overview]: #overview
+
+This KDP proposes a comprehensive status reporting mechanism for Kyverno Policies. The status will reflect the operational readiness of a policy, considering factors like webhook configuration, caching, RBAC permissions, and schema validation. This allows users to quickly identify and diagnose issues preventing their policies from functioning correctly.
+
+# Definitions
+
+[definitions]: #definitions
+
+- **Policy**: Custom resource that represents a Kyverno policy configuration
+- **Webhook**: Kubernetes dynamic admission mechanism for intercepting API requests and validating or modifying the resources they carry
+- **Validating Webhook**: Admission webhook that accepts or rejects resource configurations
+- **Mutating Webhook**: Admission webhook that can modify resource configurations before admission
+- **Policy Cache**: Internal Kyverno store of policies that the admission controller consults when processing requests
+- **RBAC**: Role-Based Access Control, the Kubernetes mechanism that determines which actions a subject may perform on which resources
+- **Policy Status**: Current operational condition of a Kyverno policy: Ready, Partially Ready, or Not Ready
+- **Schema Validation**: The process of verifying that a policy definition conforms to the expected schema
+
+# Motivation
+
+[motivation]: #motivation
+
+- When a policy is not applied as expected, its current status does little to explain why, which leads to confusion and frustration
+- Provide clear, granular visibility into the operational status of each policy
+- Enable administrators to quickly understand why a policy deployment is not working
+- Create a robust mechanism for tracking policy readiness across complex Kubernetes environments
+- Support effective troubleshooting of policy configuration issues
+
+# Proposal
+
+Policy status will be determined by four evaluation criteria, each reported as its own status condition:
+
+1. **Webhook Configuration Validation**
+
+   - A policy's rules are registered in validating webhooks, mutating webhooks, or both.
+   - If webhook configuration fails, a policy that relies solely on the failed webhook type is Not Ready.
+   - If the policy also uses the other webhook type and that type is configured successfully, the policy is Partially Ready.
+   - A policy is Ready only when every webhook it requires is configured without error.
+
+2. **Policy Caching Verification**
+
+   - A policy must be present in the policy cache of at least the leader replica to be considered Ready.
+   - A policy missing from the cache is Not Ready.
+
+3. **RBAC Permission Verification**
+
+   - A policy that requires permissions not granted to the admission controller is Not Ready.
+   - The status should report which permissions are missing.
+
+4. **Schema Validation**
+
+   - A policy that fails schema validation is Not Ready.
+   - The specific validation errors must be logged to guide resolution.
+
+
+# Implementation
+
+## Detailed Readiness Evaluation
+
+### 1. Webhook Configuration Validation
+
+- Track validating and mutating webhook configuration separately
+- Rules for status:
+  - All required webhooks configured successfully: **Ready**
+  - One webhook fails configuration:
+    - If the policy requires only that webhook type: **Not Ready**
+    - If the policy also has rules served by the other, successfully configured webhook: **Partially Ready**
+
+### 2. Policy Caching
+
+- Check cache presence on:
+  - Leader replica (mandatory)
+  - Optionally, all other replicas
+- Failure to cache on the leader results in **Not Ready** status
+
+### 3. RBAC Permission Verification
+
+- Dynamically compare the permissions a policy requires with those granted to the admission controller
+- Insufficient permissions result in **Not Ready** status
+- Requires a complete mapping from policy rules to the permissions they need
+
+### 4. Schema Validation
+
+- Perform exhaustive schema validation during policy admission
+- Any schema validation error results in **Not Ready** status
+- Provide detailed error messages for troubleshooting
+
+## Proposed Status Transition Matrix
+
+| Condition                 | Validating WH | Mutating WH | Cache | RBAC | Schema | Status                    |
+| ------------------------- | ------------- | ----------- | ----- | ---- | ------ | ------------------------- |
+| All Conditions Successful | ✓             | ✓           | ✓     | ✓    | ✓      | Ready                     |
+| Validating WH Fails       | ✗             | ✓           | ✓     | ✓    | ✓      | Not Ready/Partially Ready |
+| Mutating WH Fails         | ✓             | ✗           | ✓     | ✓    | ✓      | Not Ready/Partially Ready |
+| Cache Missing             | ✓             | ✓           | ✗     | ✓    | ✓      | Not Ready                 |
+| RBAC Insufficient         | ✓             | ✓           | ✓     | ✗    | ✓      | Not Ready                 |
+| Schema Invalid            | ✓             | ✓           | ✓     | ✓    | ✗      | Not Ready                 |
+
+# Migration
+
+Automation that tracks policy readiness will need to inspect four separate conditions to determine whether a policy is ready.
+
+# Drawbacks
+
+- Increased complexity in policy status tracking
+- Potential performance overhead from comprehensive validation
+
+# Alternatives
+
+- Simplified status tracking with fewer criteria
+- A binary (Ready/Not Ready) status instead of a three-state status
+- A single consolidated condition covering all validation dimensions
+
+# Unresolved Questions
+
+- Should additional conditions (e.g., external dependency checks) be included in readiness evaluation?
+- What is the acceptable delay for updating readiness status under high load?
+
+# CRD Changes
+
+The Policy CRD will be updated to include:
+
+- New `status.conditions` entries for tracking webhook, cache, RBAC, and schema validation status
+
+```yaml
+status:
+  conditions:
+    - type: WebhookConfigured
+      status: True|False
+      reason:
+      message:
+    - type: CachePresence
+      status: True|False
+      reason:
+      message:
+    - type: RBACPermission
+      status: True|False
+      reason:
+      message:
+    - type: SchemaValid
+      status: True|False
+      reason:
+      message:
+```
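The readiness rules in this proposal can be summarized in a short Go sketch. It is illustrative only and rests on a few assumptions the KDP leaves open. `Checks`, `Evaluate`, `webhookReadiness`, and `Conditions` are hypothetical names, not Kyverno APIs. The boolean fields stand in for the real webhook, cache, RBAC, and schema checks. A single True/False `WebhookConfigured` condition cannot express Partially Ready, so the sketch derives that state from separate validating and mutating inputs, following the webhook rules in the Proposal section.

```go
package main

import "fmt"

// Readiness is the overall policy state proposed in the KDP.
type Readiness string

const (
	Ready          Readiness = "Ready"
	PartiallyReady Readiness = "PartiallyReady"
	NotReady       Readiness = "NotReady"
)

// Checks holds the raw results of the four readiness checks.
// Hypothetical type for illustration; not a Kyverno API.
type Checks struct {
	NeedsValidating      bool // policy has rules served by the validating webhook
	NeedsMutating        bool // policy has rules served by the mutating webhook
	ValidatingConfigured bool // validating webhook configured without error
	MutatingConfigured   bool // mutating webhook configured without error
	InLeaderCache        bool // policy present in the leader replica's policy cache
	RBACSufficient       bool // admission controller holds every permission the policy needs
	SchemaValid          bool // policy passed schema validation
}

// Condition is a simplified stand-in for one status.conditions entry.
type Condition struct {
	Type   string
	Status string // "True" or "False", following the Kubernetes condition convention
}

// webhookReadiness applies the webhook rules: every required webhook
// configured -> Ready; some but not all -> PartiallyReady; none -> NotReady.
func webhookReadiness(c Checks) Readiness {
	required, configured := 0, 0
	if c.NeedsValidating {
		required++
		if c.ValidatingConfigured {
			configured++
		}
	}
	if c.NeedsMutating {
		required++
		if c.MutatingConfigured {
			configured++
		}
	}
	switch {
	case configured == required:
		return Ready
	case configured > 0:
		return PartiallyReady
	default:
		return NotReady
	}
}

// Evaluate returns the overall state: a failed cache, RBAC, or schema
// check is always NotReady; otherwise the webhook result decides.
func Evaluate(c Checks) Readiness {
	if !c.InLeaderCache || !c.RBACSufficient || !c.SchemaValid {
		return NotReady
	}
	return webhookReadiness(c)
}

func boolStatus(ok bool) string {
	if ok {
		return "True"
	}
	return "False"
}

// Conditions maps the checks onto the four proposed condition types.
// WebhookConfigured is True only when every required webhook is configured.
func Conditions(c Checks) []Condition {
	return []Condition{
		{Type: "WebhookConfigured", Status: boolStatus(webhookReadiness(c) == Ready)},
		{Type: "CachePresence", Status: boolStatus(c.InLeaderCache)},
		{Type: "RBACPermission", Status: boolStatus(c.RBACSufficient)},
		{Type: "SchemaValid", Status: boolStatus(c.SchemaValid)},
	}
}

func main() {
	healthy := Checks{NeedsValidating: true, NeedsMutating: true,
		ValidatingConfigured: true, MutatingConfigured: true,
		InLeaderCache: true, RBACSufficient: true, SchemaValid: true}
	mutatingDown := healthy
	mutatingDown.MutatingConfigured = false
	uncached := healthy
	uncached.InLeaderCache = false
	fmt.Println(Evaluate(healthy))      // Ready
	fmt.Println(Evaluate(mutatingDown)) // PartiallyReady
	fmt.Println(Evaluate(uncached))     // NotReady
	fmt.Println(Conditions(mutatingDown))
}
```

With both webhook types required and only the mutating webhook failing, `Evaluate` returns `PartiallyReady` while `WebhookConfigured` is `False`. A failed cache, RBAC, or schema check overrides the webhook result, as in the transition matrix.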