

Documentation improvements, mostly just reorganize content for easier reading
jvoravong committed Feb 27, 2025
1 parent c4f6cd7 commit 83ce345
Showing 2 changed files with 35 additions and 80 deletions.
6 changes: 3 additions & 3 deletions .chloggen/migration-operator-helm-certs.yaml
@@ -10,6 +10,6 @@ issues: [1648]
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
- For users enabling both the operator and certmanager (.Values.operator.enabled=true, .Values.certmanager.enabled=true), please review the [upgrade guidelines](https://github.com/signalfx/splunk-otel-collector-chart/blob/main/UPGRADING.md#0119-to-0120).
- Previously, certificates were generated by certmanager by default; now they are generated by Helm unless specified otherwise.
- This change simplifies setup for new users while still supporting those who prefer certmanager.
- Previously, certificates were generated by cert-manager by default; now they are generated by Helm templates unless configured otherwise.
- This change simplifies the setup for new users while still supporting those who prefer using cert-manager or other solutions. For more details, see the [related documentation](https://github.com/signalfx/splunk-otel-collector-chart/tree/main/docs/auto-instrumentation-install.md#tls-certificate-requirement-for-kubernetes-operator-webhooks).
- If you use `.Values.operator.enabled=true` and `.Values.certmanager.enabled=true`, please review the [upgrade guidelines](https://github.com/signalfx/splunk-otel-collector-chart/blob/main/UPGRADING.md#0119-to-0120).
109 changes: 32 additions & 77 deletions docs/auto-instrumentation-install.md
@@ -458,84 +458,13 @@ helm template splunk-otel-collector-chart/splunk-otel-collector --include-crds \
| kubectl delete --dry-run=client -f -
```

### Documentation Resources

- https://developers.redhat.com/devnation/tech-talks/using-opentelemetry-on-kubernetes
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#instrumentation
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md#opentelemetry-auto-instrumentation-injection
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md#use-customized-or-vendor-instrumentation

### Troubleshooting the Operator and cert-manager

#### Check the logs for failures

**Operator Logs:**

```bash
kubectl logs -l app.kubernetes.io/name=operator
```

**Cert-Manager Logs:**

```bash
kubectl logs -l app=certmanager
kubectl logs -l app=cainjector
kubectl logs -l app=webhook
```

#### Operator Issues

##### Networking and Firewall Requirements

Ensure that network policies and firewall rules do not block the mutating admission webhook that the operator uses for pod auto-instrumentation. Key points to check:

- **Webhook Accessibility**: The Kubernetes API server calls the webhook through its Service cluster IP, and the operator in turn communicates with the API server. Ensure network policies and firewall rules allow traffic between the operator services and these endpoints.
- **Required Ports**: Policies must explicitly allow traffic to the ports that the operator and webhook Services expose.

Use the following command to identify the IP addresses and ports that need to be accessible:

```bash
kubectl get svc -n {operator_namespace}
# Example output indicating necessary IP and port configurations:
# NAME                                            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                            AGE
# kubernetes                                      ClusterIP   10.0.0.1       <none>        443/TCP                            10d
# splunk-splunk-otel-collector-agent              ClusterIP   10.0.176.113   <none>        8006/TCP,14250/TCP,14268/TCP,...   3d17h
# splunk-splunk-otel-collector-operator           ClusterIP   10.0.254.125   <none>        8443/TCP,8080/TCP                  3d17h
# splunk-splunk-otel-collector-operator-webhook   ClusterIP   10.0.222.223   <none>        443/TCP                            3d17h
```

- **Configuration Action**: Adjust your network policies and firewall settings based on the service endpoints and ports listed by the command. This ensures the webhook and operator services can properly communicate within the cluster.
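One quick way to confirm that the webhook is reachable at all is to check that its Service has ready endpoints: if it has none, the API server has nothing to call, regardless of firewall rules. The sketch below assumes `kubectl` access to the operator namespace; the helper name `check_webhook_endpoints` is hypothetical, and the Service name in the usage line comes from the example output above, so adjust it to your release.

```bash
# Hypothetical helper: report whether a Service has ready endpoint IPs.
# Usage: check_webhook_endpoints <namespace> <service>
check_webhook_endpoints() {
  local ns="$1" svc="$2" ips
  # .subsets[*].addresses[*].ip lists only ready endpoints
  # (pods that are not ready appear under notReadyAddresses instead).
  ips=$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "no ready endpoints for ${ns}/${svc}" >&2
    return 1
  fi
  echo "${ns}/${svc} -> ${ips}"
}
```

For example: `check_webhook_endpoints {operator_namespace} splunk-splunk-otel-collector-operator-webhook`. If the endpoints exist but the webhook still times out, the cause is more likely a network policy or firewall rule than the operator pod itself.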

#### Cert-Manager Issues

If the operator appears to hang, cert-manager may not have created the required certificate. To troubleshoot:

- Check the health and logs of the cert-manager pods for potential issues.
- Consider restarting the cert-manager pods.
- Ensure that your cluster has only one instance of cert-manager, which should include `certmanager`, `certmanager-cainjector`, and `certmanager-webhook`.

For additional guidance, refer to the official cert-manager documentation:
- [Troubleshooting Guide](https://cert-manager.io/docs/troubleshooting/)
- [Uninstallation Guide](https://cert-manager.io/v1.2-docs/installation/uninstall/kubernetes/)
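To check for duplicate cert-manager installations, a sketch like the following lists Deployments in every namespace whose container images look like cert-manager components. The helper names are hypothetical, and matching on the upstream image names (`cert-manager-controller`, `cert-manager-cainjector`, `cert-manager-webhook`) is an assumption; mirrored or renamed images will not match.

```bash
# Hypothetical helper: print "<namespace>/<deployment><TAB><images>" for every Deployment
# whose images match the upstream cert-manager component names (an assumption).
list_certmanager_deployments() {
  kubectl get deployments --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}' \
    | grep -E 'cert-manager-(controller|cainjector|webhook)'
}

# Count controller Deployments; a value above 1 usually means more than one cert-manager instance.
count_certmanager_controllers() {
  list_certmanager_deployments | grep -c 'cert-manager-controller'
}
```

A `count_certmanager_controllers` result greater than `1` suggests that more than one cert-manager instance is installed. To restart a cert-manager component, `kubectl rollout restart deployment <name> -n <namespace>` recreates its pods.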

##### Validate Certificates

Ensure that the certificate cert-manager creates for the operator exists and reports `READY` as `True`:

```bash
kubectl get certificates
# NAME READY SECRET AGE
# splunk-otel-collector-operator-serving-cert True splunk-otel-collector-operator-controller-manager-service-cert 5m
```
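The same check can be scripted, for example as a post-install smoke test. This sketch reads the `Ready` condition with a JSONPath filter and then confirms that the target Secret exists; the helper name is hypothetical, and the resource names in the usage line are the ones from the example output above.

```bash
# Hypothetical helper: succeed only if a cert-manager Certificate is Ready
# and the Secret it writes to exists.
# Usage: check_certificate_ready <namespace> <certificate> <secret>
check_certificate_ready() {
  local ns="$1" cert="$2" secret="$3" ready
  ready=$(kubectl get certificate "$cert" -n "$ns" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  if [ "$ready" != "True" ]; then
    echo "certificate ${ns}/${cert} is not Ready (status: ${ready:-unknown})" >&2
    return 1
  fi
  # The serving certificate and key are stored in this Secret.
  kubectl get secret "$secret" -n "$ns" >/dev/null || return 1
  echo "certificate ${ns}/${cert} is Ready; secret ${secret} exists"
}
```

For example: `check_certificate_ready {operator_namespace} splunk-otel-collector-operator-serving-cert splunk-otel-collector-operator-controller-manager-service-cert`. Alternatively, `kubectl wait --for=condition=Ready certificate/<name> -n <namespace> --timeout=120s` blocks until the certificate becomes ready, and `kubectl describe certificate <name>` shows the events explaining why it has not.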

#### TLS Certificate Requirement for Kubernetes Operator Webhooks
### TLS Certificate Requirement for Kubernetes Operator Webhooks

In Kubernetes, the API server communicates with operator webhook components over HTTPS, which requires a valid TLS certificate that the API server trusts. The operator supports several methods for configuring the required certificate, each with different levels of complexity and security.
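To see which CA the API server actually trusts for the webhook, you can decode the `caBundle` field of the operator's MutatingWebhookConfiguration. This is a sketch under assumptions: the helper name is hypothetical, it reads only the first webhook entry (`webhooks[0]`), and you need to look up the configuration name yourself with `kubectl get mutatingwebhookconfigurations`.

```bash
# Hypothetical helper: print the PEM CA bundle the API server uses to verify the
# webhook's serving certificate. MutatingWebhookConfigurations are cluster-scoped,
# so no namespace is needed. Only the first webhook entry is inspected.
# Usage: webhook_ca_bundle <mutatingwebhookconfiguration-name>
webhook_ca_bundle() {
  local name="$1" bundle
  bundle=$(kubectl get mutatingwebhookconfiguration "$name" \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}')
  if [ -z "$bundle" ]; then
    echo "no caBundle set on ${name}" >&2
    return 1
  fi
  # caBundle holds base64-encoded PEM data.
  printf '%s' "$bundle" | base64 -d
}
```

Piping the result to `openssl x509 -noout -subject -enddate` shows the CA's subject and expiry date. If that CA did not issue the certificate the webhook serves, the API server rejects the webhook call with a TLS error.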

---

##### 1. **Using a Self-Signed Certificate Generated by the Chart**
#### 1. **Using a Self-Signed Certificate Generated by the Chart**

This is the default and simplest method: the chart automatically generates a self-signed certificate for the webhook. It is suitable for internal environments and testing, but the certificate may not be trusted by clients outside your cluster.

@@ -550,11 +479,11 @@ This is the easiest setup for users and does not require additional configuratio

---

##### 2. **Using a cert-manager Certificate**
#### 2. **Using a cert-manager Certificate**

Using `cert-manager` offers more control over certificate management and is better suited to production environments. However, because of the order in which Helm installs and upgrades resources, the cert-manager CRDs and the certificates that depend on them cannot be installed in the same Helm operation. To work around this limitation, choose one of the following options:
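Whichever option you choose, the cert-manager CRDs must be installed and established before Helm creates the certificate resources. The sketch below checks for that; the helper name is hypothetical, and the assumption that the operator needs exactly the `Certificate` and `Issuer` kinds (CRDs `certificates.cert-manager.io` and `issuers.cert-manager.io`) should be confirmed against your configuration.

```bash
# Hypothetical helper: succeed only if the cert-manager CRDs assumed to be needed
# by the operator's certificate resources exist and are Established.
certmanager_crds_ready() {
  local crd
  for crd in certificates.cert-manager.io issuers.cert-manager.io; do
    # kubectl wait exits non-zero if the CRD is missing or not Established in time.
    kubectl wait --for=condition=Established "crd/${crd}" --timeout=60s >/dev/null || return 1
  done
}
```

For example, run `certmanager_crds_ready && helm upgrade --install ...` with your usual release name, chart, and values. `kubectl wait` reports an error if a CRD does not exist and otherwise waits up to the timeout for the `Established` condition.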

###### Option 1: **Pre-deploy cert-manager**
##### Option 1: **Pre-deploy cert-manager**

If `cert-manager` is already deployed in your cluster, you can configure the operator to use it without enabling certificate generation by Helm.

@@ -568,7 +497,7 @@ operator:
enabled: false
```

###### Option 2: **Deploy cert-manager and the operator together**
##### Option 2: **Deploy cert-manager and the operator together**

If you need to install `cert-manager` along with the operator, use a Helm post-install or post-upgrade hook to ensure that the certificate is created after cert-manager CRDs are installed.

@@ -593,7 +522,7 @@ This method is useful when installing `cert-manager` as a subchart or as part of

---

##### 3. **Using a Custom Externally Generated Certificate**
#### 3. **Using a Custom Externally Generated Certificate**

For full control, you can use an externally generated certificate. This is suitable if you already have a certificate issued by a trusted CA or have specific security requirements.
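Whatever certificate you bring, it must be valid for the DNS name the API server uses to call the webhook Service, `<service>.<namespace>.svc`. To illustrate only that requirement, the sketch below uses OpenSSL to generate a self-signed certificate with matching subject alternative names; the helper name, the output file names, and the default `cluster.local` cluster domain are assumptions. In production, you would typically have your own CA issue the certificate instead.

```bash
# Hypothetical helper: generate a self-signed certificate whose SANs cover the
# in-cluster DNS names of the webhook Service.
# Usage: gen_webhook_cert <service> <namespace> <output-dir>
gen_webhook_cert() {
  local svc="$1" ns="$2" out="$3"
  local dns="${svc}.${ns}.svc"
  mkdir -p "$out"
  # Minimal OpenSSL config so the result does not depend on the system openssl.cnf.
  # The .cluster.local SAN assumes the default cluster domain.
  cat > "${out}/openssl.cnf" <<EOF
[req]
prompt = no
distinguished_name = dn
x509_extensions = v3
[dn]
CN = ${dns}
[v3]
basicConstraints = critical,CA:TRUE
keyUsage = critical,digitalSignature,keyEncipherment,keyCertSign
extendedKeyUsage = serverAuth
subjectAltName = DNS:${dns},DNS:${dns}.cluster.local
EOF
  # Writes tls.key (unencrypted private key) and tls.crt (self-signed, valid 365 days).
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -config "${out}/openssl.cnf" \
    -keyout "${out}/tls.key" -out "${out}/tls.crt" 2>/dev/null
}
```

Because the certificate is self-signed, `tls.crt` also serves as its own CA bundle. Supply the files to the chart as described for this option.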

@@ -619,3 +548,29 @@ This method allows you to use a certificate that is trusted by external systems,
---

For more advanced use cases, refer to the upstream OpenTelemetry Operator Helm chart's [`values.yaml`](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-operator/values.yaml) for detailed configuration options and scenarios.

### Troubleshooting the Operator and cert-manager

#### Check the logs for failures

**Operator Logs:**

```bash
kubectl logs -l app.kubernetes.io/name=operator
```

**Cert-Manager Logs:**

```bash
kubectl logs -l app=certmanager
kubectl logs -l app=cainjector
kubectl logs -l app=webhook
```
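Two details of `kubectl logs` are easy to miss here: without `-n`, it only searches the current context's namespace, and with a label selector it returns only the last 10 lines per container unless `--tail` is set. The sketch below accounts for both and filters for common TLS and webhook failure keywords; the helper name and the keyword list are assumptions, not an exhaustive error catalog.

```bash
# Hypothetical helper: fetch recent logs for a label selector in a given namespace
# and keep only lines matching common TLS/webhook failure keywords.
# Returns non-zero (grep's status) when no line matches.
# Usage: grep_component_logs <namespace> <label-selector>
grep_component_logs() {
  local ns="$1" selector="$2"
  kubectl logs -n "$ns" -l "$selector" --all-containers=true --prefix=true --tail=500 \
    | grep -iE 'error|fail|x509|tls|certificate'
}
```

For example: `grep_component_logs {operator_namespace} app.kubernetes.io/name=operator`, or `grep_component_logs <cert-manager namespace> app=certmanager` for cert-manager.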

### Documentation Resources

- https://developers.redhat.com/devnation/tech-talks/using-opentelemetry-on-kubernetes
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md#instrumentation
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md#opentelemetry-auto-instrumentation-injection
- https://github.com/open-telemetry/opentelemetry-operator/blob/main/README.md#use-customized-or-vendor-instrumentation

