Commit
2.0 Docs Compendium (#870)
Signed-off-by: Michael Dresser <[email protected]>
Co-authored-by: Steven Weber <[email protected]>
Co-authored-by: Mike Murphy <[email protected]>
Co-authored-by: Mike Murphy <mike@kubecost>
Co-authored-by: Jason Charcalla <[email protected]>
Co-authored-by: Michael Dresser <[email protected]>
Co-authored-by: Thomas Nguyen <[email protected]>
Co-authored-by: thomasvn <[email protected]>
Co-authored-by: jesse goodier <[email protected]>
9 people authored Jan 30, 2024
1 parent c91c945 commit 75bf70f
Showing 46 changed files with 821 additions and 305 deletions.
8 changes: 7 additions & 1 deletion SUMMARY.md
@@ -25,6 +25,7 @@
* [Multi-Cluster](install-and-configure/install/multi-cluster/multi-cluster.md)
* [ETL Federation (preferred)](install-and-configure/install/multi-cluster/federated-etl/federated-etl.md)
* [Kubecost Aggregator](install-and-configure/install/multi-cluster/federated-etl/aggregator.md)
* [Migration Guide from Thanos to Kubecost 2.0 (Aggregator)](install-and-configure/install/multi-cluster/federated-etl/thanos-migration-guide.md)
* [Backups and Alerting](install-and-configure/install/multi-cluster/federated-etl/federated-etl-backups-alerting.md)
* [Thanos Federation](install-and-configure/install/multi-cluster/thanos-setup/thanos-setup.md)
* [Configuring Thanos](install-and-configure/install/multi-cluster/thanos-setup/configuring-thanos.md)
@@ -34,8 +35,9 @@
* [AWS Thanos IAM Policy](install-and-configure/install/multi-cluster/long-term-storage-configuration/aws-service-account-thanos.md)
* [Azure Long-Term Storage](install-and-configure/install/multi-cluster/long-term-storage-configuration/long-term-storage-azure.md)
* [GCP Long-Term Storage](install-and-configure/install/multi-cluster/long-term-storage-configuration/long-term-storage-gcp.md)
* [Mulit-Cluster Diagnostics](install-and-configure/install/multi-cluster/multi-cluster-diagnostics.md)
* [Multi-Cluster Diagnostics](install-and-configure/install/multi-cluster/multi-cluster-diagnostics.md)
* [Secondary Clusters Guide](install-and-configure/install/multi-cluster/secondary-clusters.md)
* [Kubecost 2.0 Install/Upgrade](install-and-configure/install/kubecostv2.md)
* [ETL Backup](install-and-configure/install/etl-backup/etl-backup.md)
* [Sharing ETL Backups](install-and-configure/install/etl-backup/sharing-etl-backups.md)
* [Query Service Replicas](install-and-configure/install/etl-backup/query-service-replicas.md)
@@ -89,6 +91,8 @@
* [Assets Dashboard](using-kubecost/navigating-the-kubecost-ui/assets.md)
* [Clusters Dashboard](using-kubecost/navigating-the-kubecost-ui/clusters-dashboard.md)
* [Cloud Cost Explorer](using-kubecost/navigating-the-kubecost-ui/cloud-costs-explorer.md)
* [Network Monitoring](using-kubecost/navigating-the-kubecost-ui/network-monitoring.md)
* [Collections](using-kubecost/navigating-the-kubecost-ui/collections.md)
* [Reports](using-kubecost/navigating-the-kubecost-ui/saved-reports/reports.md)
* [Advanced Reporting](using-kubecost/navigating-the-kubecost-ui/saved-reports/advanced-reports.md)
* [Cost Center Report](using-kubecost/navigating-the-kubecost-ui/saved-reports/cost-center-report.md)
@@ -108,6 +112,8 @@
* [Cluster Health Score](using-kubecost/navigating-the-kubecost-ui/cluster-health-score.md)
* [Budgets](using-kubecost/navigating-the-kubecost-ui/budgets.md)
* [Audits](using-kubecost/navigating-the-kubecost-ui/audits.md)
* [Anomaly Detection](using-kubecost/navigating-the-kubecost-ui/anomaly-detection.md)
* [Teams](using-kubecost/navigating-the-kubecost-ui/teams.md)
* [Contexts](using-kubecost/context-switcher.md)
* [Kubecost Data Audit](using-kubecost/kubecost-data-audit/README.md)
* [AWS/Kubecost Data Audit](using-kubecost/kubecost-data-audit/aws-kubecost-data-audit.md)
Binary file added images/aggregator/aggregator-diagram.png
Binary file added images/anomalydetection.png
Binary file added images/collections.png
Binary file added images/crss.png
Binary file added images/forecasting.png
Binary file removed images/leader-follower.png
Binary file added images/networkmonitoring.png
Binary file added images/networkmonitoring2.png
Binary file added images/newcollection.png
120 changes: 61 additions & 59 deletions install-and-configure/advanced-configuration/high-availability.md
@@ -1,59 +1,61 @@
# High Availability Kubecost

{% hint style="info" %}
High availability mode is only officially supported on Kubecost Enterprise plans.
{% endhint %}

Running Kubecost in high availability (HA) mode is a feature that relies on multiple Kubecost replica pods implementing the [ETL Bucket Backup](/install-and-configure/install/etl-backup/etl-backup.md) feature combined with a Leader/Follower implementation which ensures that there always exists exactly one leader across all replicas.

## Leader + Follower

The Leader/Follower implementation leverages a `coordination.k8s.io/v1` `Lease` resource to manage the election of a leader when necessary. To control access of the backup from the ETL pipelines, a `RWStorageController` is implemented to ensure the following:

* Followers block on all backup reads, and poll bucket storage for any backup reads every 30 seconds.
* Followers no-op on any backup writes.
* Followers who receive Queries in a backup store will not stack on pending reads, preventing external queries from blocking.
* Followers promoted to Leader will drop all locks and receive write privileges.
* Leaders behave identically to a single Kubecost install.

![Leader/Follower](/images/leader-follower.png)

## Configuring high availability

In order to enable the leader/follower and HA features, the following must also be configured:

* Replicas are set to a value greater than 1
* ETL FileStore is Enabled (enabled by default)
* [ETL Bucket Backup](/install-and-configure/install/etl-backup/etl-backup.md) is configured

For example, using our Helm chart, the following is an acceptable configuration:

```bash
helm install kubecost kubecost/cost-analyzer --namespace kubecost \
--set kubecostDeployment.leaderFollower.enabled=true \
--set kubecostDeployment.replicas=5 \
--set kubecostModel.etlBucketConfigSecret=kubecost-bucket-secret
```

This can also be done in the `values.yaml` file within the chart:

```yaml
kubecostModel:
  image: "gcr.io/kubecost1/cost-model"
  imagePullPolicy: Always
  # ...
  # ETL should be enabled with etlFileStoreEnabled: true
  etl: true
  etlFileStoreEnabled: true
  # ...
  # ETL Bucket Backup should be configured by passing the configuration secret name
  etlBucketConfigSecret: kubecost-bucket-secret

# Used for HA mode in Enterprise tier
kubecostDeployment:
  # Select a number of replicas of Kubecost pods to run
  replicas: 5
  # Enable Leader/Follower Election
  leaderFollower:
    enabled: true
```
# High Availability Kubecost

{% hint style="warning" %}
High availability mode is no longer supported as of Kubecost 2.0.
{% endhint %}

{% hint style="info" %}
High availability mode is only officially supported on Kubecost Enterprise plans.
{% endhint %}

Running Kubecost in high availability (HA) mode is a feature that relies on multiple Kubecost replica pods implementing the [ETL Bucket Backup](/install-and-configure/install/etl-backup/etl-backup.md) feature combined with a Leader/Follower implementation which ensures that there always exists exactly one leader across all replicas.

## Leader + Follower

The Leader/Follower implementation leverages a `coordination.k8s.io/v1` `Lease` resource to manage the election of a leader when necessary. To control access of the backup from the ETL pipelines, a `RWStorageController` is implemented to ensure the following:

* Followers block on all backup reads, and poll bucket storage for any backup reads every 30 seconds.
* Followers no-op on any backup writes.
* Followers who receive Queries in a backup store will not stack on pending reads, preventing external queries from blocking.
* Followers promoted to Leader will drop all locks and receive write privileges.
* Leaders behave identically to a single Kubecost install.
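
Because the election is implemented with a standard Kubernetes `Lease`, you can inspect the current leadership state directly with `kubectl`. A minimal sketch; the exact lease name Kubecost creates may vary by install:

```sh
# List Lease objects in the kubecost namespace and check who holds them.
kubectl get lease -n kubecost
kubectl describe lease -n kubecost | grep -i 'holder'
```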

## Configuring high availability

In order to enable the leader/follower and HA features, the following must also be configured:

* Replicas are set to a value greater than 1
* ETL FileStore is Enabled (enabled by default)
* [ETL Bucket Backup](/install-and-configure/install/etl-backup/etl-backup.md) is configured

For example, using our Helm chart, the following is an acceptable configuration:

```bash
helm install kubecost kubecost/cost-analyzer --namespace kubecost \
--set kubecostDeployment.leaderFollower.enabled=true \
--set kubecostDeployment.replicas=5 \
--set kubecostModel.etlBucketConfigSecret=kubecost-bucket-secret
```

This can also be done in the `values.yaml` file within the chart:

```yaml
kubecostModel:
  image: "gcr.io/kubecost1/cost-model"
  imagePullPolicy: Always
  # ...
  # ETL should be enabled with etlFileStoreEnabled: true
  etl: true
  etlFileStoreEnabled: true
  # ...
  # ETL Bucket Backup should be configured by passing the configuration secret name
  etlBucketConfigSecret: kubecost-bucket-secret

# Used for HA mode in Enterprise tier
kubecostDeployment:
  # Select a number of replicas of Kubecost pods to run
  replicas: 5
  # Enable Leader/Follower Election
  leaderFollower:
    enabled: true
```
@@ -40,12 +40,12 @@ Lowering query resolution will reduce memory consumption but will cause short ru

Fewer data points scraped from Prometheus means less data to collect and store, at the cost of Kubecost making estimations that possibly miss spikes of usage or short running pods. The default value is: `60s`. This can be tuned in our [Helm values](https://github.com/kubecost/cost-analyzer-helm-chart/blob/v1.93.2/cost-analyzer/values.yaml#L389) for the Prometheus scrape job.
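
For example, to double the interval to `120s` via Helm (a sketch; the value path follows the bundled Prometheus subchart, so confirm it against the values.yaml of the chart version you run):

```sh
# Scrape every 120s instead of the 60s default.
helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost \
  --reuse-values \
  --set prometheus.server.global.scrape_interval=120s
```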

## Disable or stop scraping node exporter
## Keep node exporter disabled

Node-exporter is optional. Some health alerts will be disabled if node-exporter is disabled, but savings recommendations and core cost allocation will function normally. This can be disabled with the following [Helm values](https://github.com/kubecost/cost-analyzer-helm-chart/blob/v1.93.2/cost-analyzer/values.yaml#L442):
Node-exporter is disabled by default and is optional. Some health alerts will be disabled if node-exporter is disabled, but savings recommendations and core cost allocation will function normally. You can enable node-exporter with the following [Helm values](https://github.com/kubecost/cost-analyzer-helm-chart/blob/v1.93.2/cost-analyzer/values.yaml#L442):

* `--set prometheus.server.nodeExporter.enabled=false`
* `--set prometheus.serviceAccounts.nodeExporter.create=false`
* `--set prometheus.server.nodeExporter.enabled=true`
* `--set prometheus.serviceAccounts.nodeExporter.create=true`
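
Applied with `helm upgrade`, that looks like the following (a sketch assuming a release named `kubecost` in the `kubecost` namespace, as in the install examples above):

```sh
# Enable node-exporter and its service account on an existing release.
helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost \
  --reuse-values \
  --set prometheus.server.nodeExporter.enabled=true \
  --set prometheus.serviceAccounts.nodeExporter.create=true
```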

## Soft memory limit field

@@ -116,7 +116,15 @@ Use [your browser's devtools](https://developer.chrome.com/docs/devtools/network

### Option 2: Review logs, and decode your JWT tokens

If `kubecostAggregator.enabled` is `true` or unspecified in `values.yaml`:
```sh
kubectl logs statefulsets/kubecost-aggregator
kubectl logs deploy/kubecost-cost-analyzer
```

If `kubecostAggregator.enabled` is `false` in `values.yaml`:
```sh
kubectl logs services/kubecost-aggregator
kubectl logs deploy/kubecost-cost-analyzer
```
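
To inspect the claims inside a JWT pulled from the logs or from your browser, you can decode its payload segment. A rough sketch, assuming `jq` is installed locally:

```sh
# Decode the payload (second dot-separated segment) of a JWT.
TOKEN='<paste your JWT here>'
# JWTs are base64url-encoded; translate to standard base64 characters.
PAYLOAD=$(echo "$TOKEN" | cut -d '.' -f 2 | tr '_-' '/+')
# Pad to a multiple of 4 characters so base64 accepts it.
echo "${PAYLOAD}==" | base64 -d 2>/dev/null | jq .
```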

@@ -133,6 +141,10 @@
```yaml
kubecostModel:
  extraEnv:
    - name: LOG_LEVEL
      value: debug
kubecostAggregator:
  extraEnv:
    - name: LOG_LEVEL
      value: debug
```
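
The same environment variables can also be set from the command line rather than editing `values.yaml`. A sketch using Helm's array-index `--set` syntax:

```sh
helm upgrade kubecost kubecost/cost-analyzer --namespace kubecost \
  --reuse-values \
  --set 'kubecostModel.extraEnv[0].name=LOG_LEVEL' \
  --set 'kubecostModel.extraEnv[0].value=debug' \
  --set 'kubecostAggregator.extraEnv[0].name=LOG_LEVEL' \
  --set 'kubecostAggregator.extraEnv[0].value=debug'
```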

For further assistance, reach out to [email protected] and provide both logs and a [HAR file](https://support.google.com/admanager/answer/10358597?hl=en).
@@ -57,7 +57,9 @@ All SAML 2.0 providers also work. The above guides can be used as templates for
## Using the Kubecost API
When SAML SSO is enabled in Kubecost, ports 9090 and 9003 of `service/kubecost-cost-analyzer` will require authentication. Therefore user API requests will need to be authenticated with a token. The token can be obtained by logging into the Kubecost UI and copying the token from the browser’s local storage. Alternatively, a long-term token can be issued to users from your identity provider.
When SAML SSO is enabled in Kubecost, the following ports will require authentication:
- `service/kubecost-cost-analyzer`: ports 9003 and 9090
- `statefulset/kubecost-aggregator` or `service/kubecost-aggregator`: port 9004

@@ -66,11 +68,18 @@
{% code overflow="wrap" %}
```sh
curl -L 'http://kubecost.mycompany.com/model/allocation?window=1d' \
  -H "Authorization: Bearer <token>"
```
{% endcode %}

For admins, Kubecost additionally exposes an unauthenticated API on port 9004 of `service/kubecost-cost-analyzer`.
For admins, Kubecost additionally exposes unauthenticated APIs:

`service/kubecost-cost-analyzer`: port 9007
```sh
kubectl port-forward service/kubecost-cost-analyzer 9004:9004
curl -L 'localhost:9004/allocation?window=1d'
kubectl port-forward service/kubecost-cost-analyzer 9007:9007
curl -L 'localhost:9007/allocation?window=1d'
```

`service/kubecost-aggregator`: port 9008
```sh
kubectl port-forward service/kubecost-aggregator 9008:9008
curl -L 'localhost:9008/allocation?window=1d'
```

## View your SAML Group
@@ -79,12 +88,12 @@ You will be able to view your current SAML Group in the Kubecost UI by selecting

## SAML troubleshooting guide

1. Disable SAML and confirm that the `cost-analyzer` pod starts.
2. If step 1 is successful, but the pod is crashing or never enters the ready state when SAML is added, it is likely that there is panic loading or parsing SAML data.

`kubectl logs deployment/kubecost-cost-analyzer -c cost-model -n kubecost`
1. Disable SAML and confirm the `cost-analyzer` pod starts. If `kubecostAggregator.enabled` is unspecified or `true` in the _values.yaml_ file, confirm that the `aggregator` pod starts.
2. If Step 1 is successful, but the pod is crashing or never enters the ready state when SAML is added, it is likely there is panic when loading or parsing SAML data.
- If `kubecostAggregator.enabled` is `true` or unspecified in _values.yaml_, run `kubectl logs statefulsets/kubecost-aggregator` and `kubectl logs deploy/kubecost-cost-analyzer`
- If `kubecostAggregator.enabled` is `false` in _values.yaml_, run `kubectl logs services/kubecost-aggregator` and `kubectl logs deploy/kubecost-cost-analyzer`

If you’re supplying the SAML from the address of an Identity Provider Server, `curl` the SAML metadata endpoint from within the Kubecost pod and ensure that a valid XML EntityDescriptor is being returned and downloaded. The response should be in this format:
If you’re supplying the SAML metadata from the address of an Identity Provider server, `curl` the SAML metadata endpoint from within the `kubecost` pod and ensure that a valid XML EntityDescriptor is being returned and downloaded.
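
A minimal way to run that check from inside the pod (a sketch assuming `curl` is available in the container; the metadata URL is a placeholder for your IdP's actual endpoint):

```sh
# Fetch the IdP metadata from within the cost-analyzer pod.
# Replace the URL with your identity provider's metadata endpoint.
kubectl exec -n kubecost deploy/kubecost-cost-analyzer -c cost-model -- \
  curl -sL 'https://idp.example.com/app/metadata'
```

The response should be in this format: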

{% code overflow="wrap" %}
```bash
@@ -154,14 +154,24 @@ kubectl delete configmap -n kubecost group-filters && kubectl create configmap -
## Troubleshooting
You can look at the logs on the cost-model container. This script is currently a work in progress.
You can look at the logs on the aggregator and cost-model containers. This script is currently a work in progress.
If `kubecostAggregator.enabled` is `true` or unspecified in _values.yaml_:
{% code overflow="wrap" %}
```
kubectl logs deployment/kubecost-cost-analyzer -c cost-model --follow |grep -v -E 'resourceGroup|prometheus-server'|grep -i -E 'group|xmlname|saml|login|audience'
```
{% endcode %}
If `kubecostAggregator.enabled` is `false` in _values.yaml_:
{% code overflow="wrap" %}
```
kubectl logs services/kubecost-aggregator --follow |grep -v -E 'resourceGroup|prometheus-server'|grep -i -E 'group|xmlname|saml|login|audience'
```
{% endcode %}
When the group has been matched, you will see:
```
@@ -181,11 +181,32 @@ saml:

## Troubleshooting

You can view the logs on the cost-model container. In this example, the assumption is that the prefix for Kubecost groups is `kubecost_`. This command is currently a work in progress.

You can look at the logs on the aggregator and cost-model containers. In this example, the assumption is that the prefix for Kubecost groups is `kubecost_`. This script is currently a work in progress.

`kubectl logs deployment/kubecost-cost-analyzer -c cost-model --follow |grep -v -E 'resourceGroup|prometheus-server'|grep -i -E 'group|xmlname|saml|login|audience|kubecost_'`

{% code overflow="wrap" %}
```
kubectl logs deployment/kubecost-cost-analyzer -c cost-model --follow |grep -v -E 'resourceGroup|prometheus-server'|grep -i -E 'group|xmlname|saml|login|audience|kubecost_'
```
{% endcode %}
If `kubecostAggregator.enabled` is `true` or unspecified in _values.yaml_:
{% code overflow="wrap" %}
```
kubectl logs statefulsets/kubecost-aggregator --follow |grep -v -E 'resourceGroup|prometheus-server'|grep -i -E 'group|xmlname|saml|login|audience|kubecost_'
```
{% endcode %}
If `kubecostAggregator.enabled` is `false` in _values.yaml_:
{% code overflow="wrap" %}
```
kubectl logs services/kubecost-aggregator --follow |grep -v -E 'resourceGroup|prometheus-server'|grep -i -E 'group|xmlname|saml|login|audience|kubecost_'
```
{% endcode %}
When the group has been matched, you will see:
```
@@ -216,4 +237,4 @@ I0330 14:48:20.702125 1 log.go:47] [Info] Attempting to authenticate saml.
I0330 14:48:20.702229 1 costmodel.go:813] Authenticated saml
...
I0330 14:48:21.011787 1 auth.go:167] AUDIENCE: [admin group:admin@kubecost.com]
```
@@ -462,7 +462,9 @@ eksctl utils associate-iam-oidc-provider \
**Step 4: Create required IAM service accounts**
**Note:** Remember to replace `1234567890` with your AWS account ID number.
{% hint style="info" %}
Remember to replace `1234567890` with your AWS account ID number.
{% endhint %}
{% code overflow="wrap" %}
```
