---
title: workload-identity-auth-for-azure-monitor-logs
authors:
- "@calee"
reviewers:
- "@jcantrill"
- "@alanconway"
approvers:
- "@jcantrill"
- "@alanconway"
api-approvers:
- "@jcantrill"
- "@alanconway"
creation-date: 2025-04-30
last-updated: 2025-05-13
status: implementable
tracking-link:
- https://issues.redhat.com/browse/LOG-4782
---

# Workload Identity Auth for Azure Monitor Logs

## Release Sign-off Checklist

- [X] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [X] Test plan is defined
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

[Azure Monitor Logs](https://learn.microsoft.com/en-us/azure/azure-monitor/logs/data-platform-logs) is a Microsoft Azure service for collecting, analyzing, and acting on telemetry data from Azure and on-premises resources.

This proposal enhances the `ClusterLogForwarder` API by introducing a new output type for Azure Monitor Logs, transitioning from the deprecated [Data Collector API](https://learn.microsoft.com/en-us/previous-versions/azure/azure-monitor/logs/data-collector-api) to the new [Log Ingestion API](https://learn.microsoft.com/en-us/azure/azure-monitor/logs/logs-ingestion-api-overview). The implementation will support secure, short-lived authentication using federated tokens with [Microsoft Entra Workload Identity (WID)](https://learn.microsoft.com/en-us/entra/workload-id/workload-identities-overview), as well as long-lived credentials using client secrets. It will leverage the pending upstream Vector PR that adds an [azure_logs_ingestion](https://github.com/vectordotdev/vector/pull/22912) sink.

## Motivation

The current Azure Monitor Logs integration relies on long-lived credentials via a [shared_key](https://learn.microsoft.com/en-us/previous-versions/azure/azure-monitor/logs/data-collector-api?tabs=powershell#authorization) and a deprecated API, posing potential security risks. Adopting WID will improve security by providing short-lived credential access to Azure Monitor Logs and eliminate the dependency on this deprecated, soon-to-be-retired API.

### User Stories

- As an administrator, I want to forward logs from my OpenShift cluster to Azure Monitor Logs using federated tokens, removing the need for long-lived, static credentials.
- As a developer, I want to continue forwarding logs from my OpenShift cluster to Azure Monitor Logs after the Data Collector API is retired.
- As an administrator, I want to continue forwarding logs to Azure Monitor Logs with long-lived credentials using the Log Ingestion API.

### Goals

- Extend the ClusterLogging Operator's API with a new Azure output type utilizing the Log Ingestion API.
- Allow authentication using short-lived federated token credentials with the Log Ingestion API.
- Allow authentication using static long-lived credentials with the Log Ingestion API.

### Non-Goals

## Proposal

To realize the goals of this enhancement:

- Switch to the new Azure Log Ingestion sink once it is implemented in upstream Vector.
  - See #1 in the [implementation details](#implementation-detailsnotesconstraints) section.
- Update upstream Vector's Rust [Azure Identity](https://github.com/Azure/azure-sdk-for-rust/tree/main/sdk/identity/azure_identity) client library to `v0.23.0`.
  - See #2 in the [implementation details](#implementation-detailsnotesconstraints) section.
- Extend upstream Vector's Azure Log Ingestion sink to accept additional configuration for workload identity authentication and client secrets.
- Add a new output type to `ClusterLogForwarder` that supports Azure's Log Ingestion API.
  - Include workload identity authentication.
  - Include long-lived credentials through client secrets.

### Workflow Description

The Vector collector will do the following for workload identity authentication:

1. Determine the authentication type to be workload identity when a token path is provided.
2. Retrieve the OpenShift service account token from the local volume.
3. Exchange the OpenShift token with the Microsoft identity platform for a short-lived access token.
4. Use the access token in the log forwarding request to Azure Monitor Logs.

The Vector collector will do the following for client secret authentication:

1. Determine the authentication type to be client secret when a client secret is provided.
2. Request an access token from the Microsoft identity platform using the client secret.
3. Use the access token in the log forwarding request to Azure Monitor Logs.

The ClusterLogForwarder will:

1. Conditionally project the service account token if workload identity authentication is configured (see the sketch after this list).
2. Create the collector configuration with the required fields for the Log Ingestion API, including either the path to the projected service account token or the client secret.
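
For illustration, the following is a minimal sketch of the kind of projected service account token volume the operator could add to the collector pod when workload identity authentication is configured. The audience value and exact volume layout are assumptions for illustration only; the custom CLO mount path is the one referenced under [CCO Utility Secret](#cco-utility-secret) below.

```yaml
# Hypothetical pod spec fragment, not finalized API.
# The audience is an assumption; Azure workload identity federation commonly
# expects "api://AzureADTokenExchange".
volumes:
  - name: azure-federated-token
    projected:
      sources:
        - serviceAccountToken:
            audience: api://AzureADTokenExchange
            expirationSeconds: 3600
            path: token
volumeMounts:
  - name: azure-federated-token
    mountPath: /var/run/ocp-collector/serviceaccount
    readOnly: true
```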

### Proposed API

#### New `AzureLogIngestion` output type

Output

```Go
// New output type of AzureLogIngestion which can live side by side with the AzureMonitor output
type AzureLogIngestion struct {
    // The Data Collection Endpoint (DCE) or Data Collection Rule (DCR) logs ingestion endpoint URL.
    //
    // https://learn.microsoft.com/en-us/azure/azure-monitor/logs/logs-ingestion-api-overview#endpoint
    //
    // +kubebuilder:validation:Required
    // +kubebuilder:validation:XValidation:rule="isURL(self)", message="invalid URL"
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Url",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
    URL string `json:"url"`

    // The Data Collection Rule's (DCR) immutable ID.
    //
    // A unique identifier for the data collection rule. This property and its value are automatically created when the DCR is created.
    //
    // https://learn.microsoft.com/en-us/azure/azure-monitor/data-collection/data-collection-rule-structure#properties
    //
    // +kubebuilder:validation:Required
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="DCR Immutable ID",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
    DcrImmutableId string `json:"dcrImmutableId"`

    // The stream in the Data Collection Rule (DCR) that should handle the custom data.
    //
    // https://learn.microsoft.com/en-us/azure/azure-monitor/data-collection/data-collection-rule-structure#input-streams
    //
    // +kubebuilder:validation:Required
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Stream Name",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
    StreamName string `json:"streamName"`

    // Authentication sets credentials for authenticating the requests.
    //
    // +kubebuilder:validation:Required
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Authentication Options",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
    Authentication *AzureLogIngestionAuth `json:"authentication"`

    // Tuning specifies tuning parameters for the output.
    //
    // +kubebuilder:validation:Optional
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Tuning Options"
    Tuning *BaseOutputTuningSpec `json:"tuning,omitempty"`
}
```

Authentication

```Go
type AzureLogIngestionAuth struct {
    // ClientId points to the secret containing the client ID used for authentication.
    //
    // This is the application ID that's assigned to your app. You can find this information in the portal where you registered your app.
    //
    // +kubebuilder:validation:Required
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Secret with Client ID",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
    ClientId *SecretReference `json:"clientId"`

    // TenantId points to the secret containing the tenant ID used for authentication.
    //
    // The directory tenant the application plans to operate against, in GUID or domain-name format.
    //
    // https://learn.microsoft.com/en-us/azure/azure-monitor/data-collection/data-collection-rule-structure#properties
    //
    // +kubebuilder:validation:Required
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Secret with Tenant ID",xDescriptors={"urn:alm:descriptor:com.tectonic.ui:text"}
    TenantId *SecretReference `json:"tenantId"`

    // Token specifies a bearer token to be used for authenticating requests.
    //
    // https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-client-creds-grant-flow#third-case-access-token-request-with-a-federated-credential
    //
    // +kubebuilder:validation:Optional
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Bearer Token"
    Token *BearerToken `json:"token,omitempty"`

    // ClientSecret points to the secret containing the client secret.
    // A client secret is a secret string that the application uses to prove its identity when requesting a token. It can also be referred to as an application password.
    //
    // https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-client-creds-grant-flow#first-case-access-token-request-with-a-shared-secret
    //
    // +kubebuilder:validation:Optional
    // +operator-sdk:csv:customresourcedefinitions:type=spec,displayName="Secret with client secret"
    ClientSecret *SecretReference `json:"clientSecret,omitempty"`
}
```

Review discussion on the authentication fields:

- A reviewer noted that the pattern elsewhere is to include an auth type enum so only one mechanism can be defined, and asked whether it is possible to omit both the secret and the token. The author initially had an auth type enum but, following feedback from @cahartma about unifying the cloud output auths, proposes letting the collector pick the correct credential flow based on the fields provided rather than requiring an explicit type. Reviewers agreed that if no type enum is added, an order of precedence should be specified and enforced, for example token first, then client secret.
- A reviewer asked whether the `ClientId` doc comment was a stray comment; the author clarified that it is the field's definition.
- For `TenantId`, a reviewer suggested adding validation. Since the value is obtained from Azure (GUID or domain name), the discussion concluded that pattern validation adds little, because a value that matches the pattern can still be the wrong tenant; at most a non-whitespace check could be added, and even that may be unnecessary for a required field.

- The `TenantId` and `ClientId` can be retrieved from the [CCO utility secret](#cco-utility-secret) generated when OpenShift is configured for [workload identity for Azure](https://github.com/openshift/cloud-credential-operator/blob/9c3346aea5a7f9a38713c09d11605b8ee825446c/docs/azure_workload_identity.md).
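
To make the proposed shape concrete, below is a hedged sketch of how an `azureLogIngestion` output might look in a `ClusterLogForwarder`, assuming the existing `SecretReference` and `BearerToken` shapes from the observability API. The output name, secret names, keys, and all Azure values are illustrative only and not a finalized API.

```yaml
# Illustrative only: field names follow the Go types proposed above.
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logging-collector
  outputs:
    - name: azure-logs
      type: azureLogIngestion
      azureLogIngestion:
        url: https://my-dce-abcd.westus-1.ingest.monitor.azure.com
        dcrImmutableId: dcr-00000000000000000000000000000000
        streamName: Custom-OpenShiftLogs
        authentication:
          # Workload identity: federated service account token plus client/tenant IDs,
          # here read from the CCO utility secret described later in this proposal.
          token:
            from: serviceAccount
          clientId:
            secretName: azure-test-secret
            key: azure_client_id
          tenantId:
            secretName: azure-test-secret
            key: azure_tenant_id
          # Long-lived alternative: provide clientSecret instead of token.
          # clientSecret:
          #   secretName: azure-client-secret
          #   key: azure_client_secret
  pipelines:
    - name: to-azure
      inputRefs:
        - application
      outputRefs:
        - azure-logs
```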

#### Additional configuration fields for upstream `Vector`'s Log Ingestion Sink API

```Rust
pub struct AzureLogsIngestionConfig {
    /// The [Client Secret][client_secret] for authentication.
    /// A secret string that the application uses to prove its identity when requesting a token. Also can be referred to as application password.
    ///
    /// [client_secret]: https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-client-creds-grant-flow#first-case-access-token-request-with-a-shared-secret
    #[configurable(metadata(docs::examples = "qWgdYAmab0YSkuL1qKv5bPX"))]
    pub client_secret: Option<String>,

    /// The [Federated Token File Path][federated_token_file_path] pointing to a federated token for authentication.
    ///
    /// [federated_token_file_path]: https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-client-creds-grant-flow#third-case-access-token-request-with-a-federated-credential
    #[configurable(metadata(docs::examples = "/path/to/my/token"))]
    pub federated_token_file_path: Option<String>,

    /// The [Tenant ID][tenant_id] for authentication.
    ///
    /// The directory tenant the application plans to operate against, in GUID or domain-name format.
    ///
    /// [tenant_id]: https://learn.microsoft.com/en-us/azure/azure-monitor/data-collection/data-collection-rule-structure#properties
    #[configurable(metadata(docs::examples = "11111111-2222-3333-4444-555555555555"))]
    pub tenant_id: String,

    /// The [Client ID][client_id] for authentication.
    ///
    /// The client ID is the application ID that's assigned to your app. You can find this information in the portal where you registered your app.
    ///
    /// [client_id]: https://learn.microsoft.com/en-us/entra/identity-platform/v2-oauth2-client-creds-grant-flow#third-case-access-token-request-with-a-federated-credential
    #[configurable(metadata(docs::examples = "11111111-2222-3333-4444-555555555555"))]
    pub client_id: String,
}
```
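
For context, the following is a hypothetical sketch of the sink configuration the operator might render for the collector, written as Vector YAML config. The sink type name and the endpoint/DCR/stream option names are assumptions based on the pending upstream PR and the proposed Go API above; they are not a published Vector option set.

```yaml
# Hypothetical rendered collector config; option names are assumptions and may
# change as the upstream azure_logs_ingestion sink evolves.
sinks:
  output_azure_logs:
    type: azure_logs_ingestion
    inputs:
      - pipeline_to_azure
    endpoint: https://my-dce-abcd.westus-1.ingest.monitor.azure.com
    dcr_immutable_id: dcr-00000000000000000000000000000000
    stream_name: Custom-OpenShiftLogs
    tenant_id: 11111111-2222-3333-4444-555555555555
    client_id: 11111111-2222-3333-4444-555555555555
    # Workload identity: read the projected federated token from disk...
    federated_token_file_path: /var/run/ocp-collector/serviceaccount/token
    # ...or, for long-lived credentials, set client_secret instead.
    # client_secret: "${AZURE_CLIENT_SECRET}"
```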

### Implementation Details/Notes/Constraints

1. Relies on [this upstream Vector PR](https://github.com/vectordotdev/vector/pull/22912) to implement the Azure Log Ingestion sink utilizing the Log Ingestion API. This is a separate sink from the Data Collector API sink, so both APIs can be supported while transitioning.
2. The current `master` branch of [upstream Vector](https://github.com/vectordotdev/vector), `>=v0.46.1` as of 04/29/2025, uses an older `azure_identity` crate that relies solely on environment variables for workload identity credentials, which is not sufficient when forwarding logs to multiple different Azure sinks. The aforementioned PR relies on `azure_identity@0.23.0`.
   - `azure_identity@0.23.0` allows setting `client_id`, `tenant_id`, etc. for authentication.
     - [Workload Identity Credentials azure_identity@0.23.0 SDK reference](https://github.com/Azure/azure-sdk-for-rust/blob/azure_identity%400.23.0/sdk/identity/azure_identity/src/credentials/workload_identity_credentials.rs)
   - `v0.23.0` can also utilize the [ChainedTokenCredential](https://github.com/Azure/azure-sdk-for-rust/blob/azure_identity%400.23.0/sdk/identity/azure_identity/src/chained_token_credential.rs) struct, which provides a user-configurable `TokenCredential` authentication flow for applications.
3. Additional fields will be required in the sink configuration in upstream Vector's API. See the [proposed API](#proposed-api) above.
   - A reviewer suggested raising the dependency update and additional configuration with upstream to see whether they can be added as part of the initial implementation; the author will ask upstream.
4. A separate output type, `AzureLogIngestion`, will be added to work alongside the existing `AzureMonitor` output. When Microsoft retires the Data Collector API, the `AzureMonitor` sink can also be deprecated and retired.
5. Long-lived credentials will be implemented via `client secret` in the `AzureLogIngestion` output in the CLO API.

#### Constraints

- As of `v6.2`, CLO relies on `v0.37.0` of OpenShift Vector (a reviewer believes it is `v0.37.1`, though the difference may not matter). OpenShift's Vector will have to be upgraded; however, the upgrade is currently blocked by the Rust version available for RHEL.

#### CCO Utility Secret

After the creation of a managed identity using the CCO utility, the following secret is created and can be used for authentication:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: azure-test-secret
  namespace: openshift-logging
type: Opaque
stringData:
  azure_client_id: 11111111-2222-3333-4444-555555555555
  azure_tenant_id: 11111111-2222-3333-4444-555555555555
  azure_region: westus
  azure_subscription_id: 11111111-2222-3333-4444-555555555555
  azure_federated_token_file: /var/run/secrets/openshift/serviceaccount/token
```

- The `azure_federated_token_file` value cannot be used as-is because the CLO projects the service account token to a custom path (e.g. `/var/run/ocp-collector/serviceaccount/token`). A reviewer wondered whether the paths could be made the same so the value can be used; the author noted that CLO would have to project the token to the path specified in the secret, since this key/value is automatically generated by the CCO utility.
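
For the long-lived credential path, no CCO-generated secret applies; an administrator would create a secret carrying the client secret alongside the client and tenant IDs. The secret name and key names below are illustrative assumptions, mirroring the key style of the CCO utility secret above:

```yaml
# Illustrative secret for client secret (long-lived) authentication;
# name and keys are assumptions, not a required convention.
apiVersion: v1
kind: Secret
metadata:
  name: azure-client-secret
  namespace: openshift-logging
type: Opaque
stringData:
  azure_client_id: 11111111-2222-3333-4444-555555555555
  azure_tenant_id: 11111111-2222-3333-4444-555555555555
  azure_client_secret: qWgdYAmab0YSkuL1qKv5bPX
```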

### Open Questions

1. Do we also want to implement long-lived credential support using the Log Ingestion API?
   - We will most likely implement long-lived tokens at the same time, in the form of client secrets.
2. Do we want to start deprecating the fields for the HTTP Data Collector API?
   - The addition of a distinct output type will allow for easier deprecation of the existing API.
   - A reviewer suggested adding a deprecation note to the existing output type with a link to the feature request; the conclusion was to open a PR against CLO adding the deprecation hints and a link to the epic or feature JIRA issue.

### Test Plan

- Manual E2E tests will require access to Azure accounts and an OpenShift cluster configured to use Azure's workload identity.

## Alternatives (Not Implemented)

### Vector's Azure Identity Rust SDK

- Consider adding a patch to the `azure_identity` crate to allow setting `client_id`, `tenant_id`, etc. instead of relying on environment variables for workload identity credentials, until Vector updates its crate version. See [implementation details](#implementation-detailsnotesconstraints).

### Risks and Mitigations

- As of `v6.2`, CLO relies on `v0.37.0` of OpenShift Vector. OpenShift's Vector will have to be upgraded; however, the upgrade is currently blocked by the Rust version for RHEL.

### Drawbacks

## Design Details

### Graduation Criteria

#### Dev Preview -> Tech Preview

#### Tech Preview -> GA

#### Removing a deprecated feature

### Upgrade / Downgrade Strategy

### Version Skew Strategy

### Operational Aspects of API Extensions

#### Failure Modes

#### Support Procedures

## Implementation History

### API Extensions

**Review discussion on `streamName` templating** (from the PR conversation):

- A reviewer asked whether `streamName` should allow templating, since it sounds like a "tenant" that could be a dynamic value, while noting that the way batches of records are grouped and submitted might make templating infeasible.
- The author responded that it could be templated but would not be ideal: the stream is not created on the fly and can be thought of as an existing table within Azure that must be defined before logs can flow into it. If templated, users must ensure every possible rendition exists in Azure; for example, templating `{.log_type}-{.namespace_name}-my-azure-stream` requires `application-ns1-my-azure-stream`, `application-ns2-my-azure-stream`, `infrastructure-default-my-azure-stream`, and so on, depending on what is collected.
- Reviewers agreed this could be a future enhancement and asked whether validation should restrict templating; the conclusion was that validation is not needed and the documentation can state that templating is not supported. It also depends on whether upstream Vector's sink fields allow templating; if they do not, an error would simply be thrown.
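
To illustrate why the stream must already exist on the Azure side, below is an abbreviated, hypothetical fragment of a Data Collection Rule, expressed here in YAML for readability (DCRs are normally authored as JSON/ARM templates). The stream name, column set, destination name, and workspace path are assumptions for illustration only.

```yaml
# Abbreviated, hypothetical DCR fragment: the custom stream referenced by
# streamName must be declared here before logs can be ingested into it.
properties:
  streamDeclarations:
    Custom-OpenShiftLogs:
      columns:
        - name: TimeGenerated
          type: datetime
        - name: Message
          type: string
        - name: LogType
          type: string
  destinations:
    logAnalytics:
      - workspaceResourceId: /subscriptions/.../workspaces/my-workspace
        name: myWorkspaceDest
  dataFlows:
    - streams:
        - Custom-OpenShiftLogs
      destinations:
        - myWorkspaceDest
      outputStream: Custom-OpenShiftLogs_CL
```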