Feature/self managed to aws oss #153
base: main
Changes from 5 commits
bc3b46c
587882a
099c451
1e05863
ad527cf
f3ea6cf
e4a9cce
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,204 @@ | ||
# EKS Observability : Managed Observability Services to AWS Managed Open Source Observability Services | ||
In the ever-evolving world of modern software development, observability has become critical to ensuring the reliability, performance, and scalability of applications. Self-managed tools like Prometheus and third-party solutions like Datadog provide valuable insight into the health and behavior of systems, but the increasing complexity and scale of modern architectures often demand solutions that are easier to operate and scale. Enter managed observability services, such as Amazon Managed Service for Prometheus and Amazon Managed Grafana. These fully managed offerings from AWS aim to simplify the management and operation of observability tools, freeing up time and resources for organizations to focus on their core business objectives. | ||
Datadog is not a self-managed tool. Change language appropriately.
Pushed an update
Call it Amazon Managed Service for Prometheus
Pushed an update |
||
|
||
In this guide, we explore migrating from self-managed and third-party observability tools to AWS managed open source observability services. | ||
We are talking about self-managed solutions and 3rd party solutions in this article.
Pushed an update |
||
|
||
# Why Managed Observability Services? | ||
By leveraging managed services, organizations can benefit from: | ||
|
||
1. Scalability and Elasticity: Amazon's managed services are designed to scale seamlessly, allowing you to accommodate fluctuating workloads and handle spikes in data ingestion without compromising performance. | ||
|
||
2. Reduced Operational Overhead: With managed services, AWS takes care of the heavy lifting – patching, upgrading, and maintaining the underlying infrastructure, freeing up your team to concentrate on higher-value tasks. | ||
|
||
3. Enhanced Security and Compliance: AWS's managed services adhere to stringent security and compliance standards, providing peace of mind and helping organizations meet regulatory requirements. | ||
|
||
4. Seamless Integration: Amazon's managed observability services seamlessly integrate with other AWS services and third-party tools, enabling a cohesive and unified monitoring experience across your entire infrastructure. | ||
|
||
# Uninstall Self-Managed Prometheus | ||
What about people who use Prometheus Service Monitors and other advanced Prometheus deployments?
Added a line to address this. Please review and let me know if we need to add more details. |
||
If you have deployed Prometheus using a deployment manifest, you can delete all the resources by running the following command: | ||
``` | ||
kubectl delete -f <path/to/prometheus-deployment-manifest.yaml> | ||
``` | ||
If you have used Helm to install Prometheus, you can uninstall it with the following command: | ||
``` | ||
helm uninstall <prometheus-release-name> | ||
``` | ||
If you have created any ConfigMaps or Secrets for Prometheus configuration, you can delete them as well: | ||
``` | ||
kubectl delete configmap <prometheus-config-map-name> | ||
kubectl delete secret <prometheus-secret-name> | ||
``` | ||
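Before deleting anything, it may help to check whether the cluster also carries Prometheus Operator resources such as ServiceMonitor or PodMonitor, which live in the `monitoring.coreos.com` API group and are not covered by the commands above. The sketch below is an illustration only: `prometheus_operator_crds` is a hypothetical helper name, and it simply filters the output of `kubectl get crd -o name`.

```bash
# Hypothetical helper (not part of the original guide): read the output of
# `kubectl get crd -o name` on stdin and print only the CRDs that belong to
# the Prometheus Operator API group (monitoring.coreos.com).
prometheus_operator_crds() {
  # `|| true` keeps the exit status at 0 when nothing matches.
  grep -E '\.monitoring\.coreos\.com$' || true
}

# Example (requires access to the cluster):
# kubectl get crd -o name | prometheus_operator_crds
```

If the helper prints anything, workloads may depend on those custom resources, and their scrape configuration would need to be carried over separately.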
|
||
# Uninstall Datadog | ||
Can we start with saying something on the lines of This procedure could apply for all 3rd party vendors with some caveats. In this case, we are taking the example of Datadog to showcase the workflow.
Pushed an update |
||
|
||
If you have Datadog installed, use the following commands to uninstall it: | ||
|
||
``` | ||
helm uninstall datadog-agent | ||
helm uninstall datadog-operator | ||
``` | ||
If you created a secret for Datadog, use the following command to delete it: | ||
``` | ||
kubectl delete secret <datadog-agent-secret-name> -n <namespace> | ||
``` | ||
|
||
# Setup Amazon Managed Service for Prometheus | ||
What about the Accelerators?
I have added separate section explaining how accelerators can be used. |
||
A workspace in [Amazon Managed Service for Prometheus](https://aws.amazon.com/prometheus/) (AMP) is a logical and isolated Prometheus server dedicated to Prometheus resources such as metrics. A workspace supports fine-grained access control for authorizing its management such as update, list, describe, delete, and the ingestion and querying of metrics. | ||
|
||
Open a new terminal window, set the required environment variables, and then use the commands below to create an Amazon Managed Service for Prometheus workspace. | ||
|
||
``` | ||
export EKS_CLUSTER_NAME=<name-of-eks-cluster> | ||
export EKS_CLUSTER_REGION=<aws-region-of-eks-cluster> | ||
export AMP_WORKSPACE_NAME=eks-amp-workspace | ||
``` | ||
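Every later command depends on these variables, so a quick guard can catch a value that was never set or that still holds the `<...>` placeholder. This is a minimal sketch; `require_env` is a hypothetical helper name, not something provided by the AWS CLI or eksctl.

```bash
# Hypothetical helper (an assumption for illustration): fail early when a
# required variable is unset, empty, or still holds a <placeholder> value.
require_env() {
  local name value missing=0
  for name in "$@"; do
    value="${!name:-}"            # indirect lookup; empty if unset
    case "$value" in
      "" | "<"*">")
        echo "error: $name is not set to a real value" >&2
        missing=1
        ;;
    esac
  done
  return "$missing"
}

# Example: check the three variables used throughout this guide.
# require_env EKS_CLUSTER_NAME EKS_CLUSTER_REGION AMP_WORKSPACE_NAME || exit 1
```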
|
||
``` | ||
aws amp create-workspace \ | ||
--alias $AMP_WORKSPACE_NAME \ | ||
--region $EKS_CLUSTER_REGION | ||
``` | ||
|
||
The Amazon Managed Service for Prometheus workspace should be created in just a few seconds. | ||
|
||
As a best practice, create a [VPC endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html) for Amazon Managed Service for Prometheus in the VPC running your Amazon EKS cluster. See [Using Amazon Managed Service for Prometheus with interface VPC endpoints](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-and-interface-VPC.html) for more information. | ||
|
||
# Setting up the AWS Distro for OpenTelemetry (ADOT) Collector to Ingest Metrics | ||
|
||
One of the easiest ways to collect Prometheus metrics from Amazon EKS workloads is by using the [AWS Distro for OpenTelemetry (ADOT) collector](https://aws-otel.github.io/docs/getting-started/collector). Customers can deploy the ADOT Collector in a variety of deployment models and easily manage configuration using the ADOT Operator. The [ADOT Operator is also available as an EKS Add-On](https://docs.aws.amazon.com/eks/latest/userguide/opentelemetry.html) for easier deployment and management. Read our [launch blog](https://aws.amazon.com/blogs/containers/metrics-and-traces-collection-using-amazon-eks-add-ons-for-aws-distro-for-opentelemetry/) to learn about this feature. | ||
|
||
If you don't have `eksctl` installed already, install it by following the instructions found [here](https://eksctl.io/installation/). | ||
|
||
Let's create an IAM service account using `eksctl`, which will be used to remote write Prometheus metrics to the AMP workspace. | ||
|
||
``` | ||
export EKS_CLUSTER_NAME=<name-of-eks-cluster> | ||
eksctl create iamserviceaccount \ | ||
--name amp-iamproxy-ingest-role \ | ||
--region $EKS_CLUSTER_REGION \ | ||
--namespace prometheus \ | ||
--cluster $EKS_CLUSTER_NAME \ | ||
--attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \ | ||
--approve \ | ||
--override-existing-serviceaccounts | ||
``` | ||
|
||
ADOT requires cert-manager. If your EKS cluster does not already have cert-manager installed, install it using the following command: | ||
|
||
``` | ||
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.3/cert-manager.yaml | ||
``` | ||
|
||
Next, grant Amazon EKS add-ons the permissions to install ADOT, and then install the ADOT add-on: | ||
|
||
``` | ||
kubectl apply -f https://amazon-eks.s3.amazonaws.com/docs/addons-otel-permissions.yaml | ||
aws eks create-addon \ | ||
--addon-name adot \ | ||
--region $EKS_CLUSTER_REGION \ | ||
--cluster-name $EKS_CLUSTER_NAME | ||
``` | ||
|
||
Now, wait about 30 seconds and run the following command. The output should be "ACTIVE", indicating that the add-on was installed successfully. | ||
|
||
``` | ||
aws eks describe-addon \ | ||
--addon-name adot \ | ||
--region $EKS_CLUSTER_REGION \ | ||
--cluster-name $EKS_CLUSTER_NAME | jq .addon.status | ||
``` | ||
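A fixed 30-second wait can be too short on a busy cluster, so one alternative is to poll until the status matches. The sketch below is illustrative: `poll_until` and `addon_status` are hypothetical helper names, and `addon_status` wraps the same `describe-addon` call using the AWS CLI's global `--query` and `--output text` options instead of `jq`.

```bash
# Hypothetical helper: run a command repeatedly until its output equals the
# expected value, or give up after a number of attempts.
# Usage: poll_until <expected> <max-attempts> <delay-seconds> <command...>
poll_until() {
  local expected="$1" attempts="$2" delay="$3" i output=""
  shift 3
  for (( i = 1; i <= attempts; i++ )); do
    output="$("$@" 2>/dev/null)" || true   # treat a failed call as "not yet"
    if [ "$output" = "$expected" ]; then
      return 0
    fi
    sleep "$delay"
  done
  echo "timed out waiting for '$expected' (last value: '$output')" >&2
  return 1
}

# Assumed wrapper around the describe-addon call from this guide; prints the
# add-on status (for example ACTIVE) as plain text.
addon_status() {
  aws eks describe-addon \
    --addon-name adot \
    --region "$EKS_CLUSTER_REGION" \
    --cluster-name "$EKS_CLUSTER_NAME" \
    --query 'addon.status' \
    --output text
}

# Example: check every 10 seconds, for up to 5 minutes.
# poll_until ACTIVE 30 10 addon_status
```

The AWS CLI also provides an `aws eks wait addon-active` waiter, which may be simpler if it is available in your CLI version.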
|
||
Next, we will install the OTel Collector Custom Resource Definition (CRD) and then configure the ADOT collector to push metrics to the Amazon Managed Service for Prometheus endpoint. | ||
|
||
``` | ||
export AMP_WORKSPACE_ID=$(aws amp list-workspaces \ | ||
--alias $AMP_WORKSPACE_NAME \ | ||
--region $EKS_CLUSTER_REGION \ | ||
--query 'workspaces[0].[workspaceId]' \ | ||
--output text) | ||
export AMP_ENDPOINT_URL=$(aws amp describe-workspace \ | ||
--region $EKS_CLUSTER_REGION --workspace-id $AMP_WORKSPACE_ID | jq .workspace.prometheusEndpoint -r) | ||
export AMP_REMOTE_WRITE_URL=${AMP_ENDPOINT_URL}api/v1/remote_write | ||
curl -O https://raw.githubusercontent.com/aws-samples/one-observability-demo/main/PetAdoptions/cdk/pet_stack/resources/otel-collector-prometheus.yaml | ||
sed -i -e s/AWS_REGION/$EKS_CLUSTER_REGION/g otel-collector-prometheus.yaml | ||
sed -i -e s^AMP_WORKSPACE_URL^$AMP_REMOTE_WRITE_URL^g otel-collector-prometheus.yaml | ||
kubectl apply -f ./otel-collector-prometheus.yaml | ||
``` | ||
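The remote write URL above is built by plain concatenation, which relies on the workspace endpoint ending in a slash, and the `sed` commands rely on the placeholder tokens being present in the downloaded file. The sketch below shows two small checks under those assumptions. Both function names are hypothetical, and `ws-example` in the comment is a made-up workspace ID.

```bash
# Hypothetical helper: append the remote write path to a workspace endpoint,
# tolerating an endpoint with or without a trailing slash.
amp_remote_write_url() {
  local endpoint="${1%/}"   # drop one trailing slash, if present
  printf '%s/api/v1/remote_write\n' "$endpoint"
}

# Hypothetical helper: succeed if the file still contains either of the
# placeholder tokens that the sed commands in this guide are meant to replace.
has_unreplaced_placeholders() {
  grep -q -e 'AWS_REGION' -e 'AMP_WORKSPACE_URL' "$1"
}

# Example:
# amp_remote_write_url "https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-example/"
# has_unreplaced_placeholders otel-collector-prometheus.yaml && echo "placeholders remain" >&2
```

Running the placeholder check before `kubectl apply` helps avoid deploying a collector that points at a literal `AMP_WORKSPACE_URL` string.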
|
||
Now, let's verify that the ADOT collector is running. You should see output like the following, showing that the collector has been installed and is ready. | ||
|
||
``` | ||
kubectl get all -n prometheus | ||
``` | ||
``` | ||
NAME READY STATUS RESTARTS AGE | ||
pod/observability-collector-5774bbc68d-7nj54 1/1 Running 0 59s | ||
|
||
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE | ||
service/observability-collector-monitoring ClusterIP 10.100.114.1 <none> 8888/TCP 59s | ||
|
||
NAME READY UP-TO-DATE AVAILABLE AGE | ||
deployment.apps/observability-collector 1/1 1 1 59s | ||
|
||
NAME DESIRED CURRENT READY AGE | ||
replicaset.apps/observability-collector-5774bbc68d 1 1 1 59s | ||
``` | ||
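Instead of reading the table by eye, the readiness check can be scripted. The sketch below is an illustration under the assumption that input has the column layout of `kubectl get pods --no-headers` (name, ready count, status, restarts, age). `all_pods_ready` is a hypothetical helper name.

```bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# succeed only if at least one pod is listed and every pod is Running with all
# of its containers ready (READY column of the form n/n).
all_pods_ready() {
  awk '
    {
      seen = 1
      split($2, ready, "/")
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Example (requires access to the cluster):
# kubectl get pods -n prometheus --no-headers | all_pods_ready && echo "collector is ready"
```

Note that this simple check treats pods in any other state, including completed Jobs, as not ready.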
|
||
If you don't have the Prometheus Node Exporter available in the EKS cluster already, use the following commands to install it. It is required to verify that the ADOT collector can scrape metrics and push them to AMP. | ||
|
||
``` | ||
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts | ||
helm repo update | ||
helm install prometheus-node-exporter prometheus-community/prometheus-node-exporter --version 4.37.0 | ||
``` | ||
|
||
Now you have successfully deployed the ADOT Collector to collect metrics from the EKS cluster and send them to the AMP workspace you created. To test whether AMP received the metrics, use awscurl. This tool lets you send HTTP requests from the command line with AWS SigV4 authentication, so you must have AWS credentials set up locally with the correct permissions to query Amazon Managed Service for Prometheus. For instructions on installing awscurl, see [awscurl](https://github.com/okigan/awscurl). | ||
|
||
``` | ||
awscurl --service="aps" \ | ||
--region="$EKS_CLUSTER_REGION" "https://aps-workspaces.$EKS_CLUSTER_REGION.amazonaws.com/workspaces/$AMP_WORKSPACE_ID/api/v1/query?query=node_cpu_seconds_total" | ||
``` | ||
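The query above is a bare metric name, which is safe to place in a URL as-is. A PromQL expression with label selectors contains characters such as `{`, `"`, and `=` that generally need percent-encoding first. The sketch below shows one way to do that in plain bash. `urlencode` and `amp_query_url` are hypothetical helper names, the encoder is only intended for ASCII input, and `ws-example` is a made-up workspace ID.

```bash
# Hypothetical helper: percent-encode a string for use as a URL query value.
# Intended for ASCII input; unreserved characters are passed through.
urlencode() {
  local LC_ALL=C input="$1" output="" char hex i
  for (( i = 0; i < ${#input}; i++ )); do
    char="${input:i:1}"
    case "$char" in
      [a-zA-Z0-9._~-]) output+="$char" ;;
      *)
        printf -v hex '%%%02X' "'$char"   # "'c" makes printf use the char code
        output+="$hex"
        ;;
    esac
  done
  printf '%s\n' "$output"
}

# Hypothetical helper: build the instant-query URL used in this guide.
# Usage: amp_query_url <region> <workspace-id> <promql>
amp_query_url() {
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/query?query=%s\n' \
    "$1" "$2" "$(urlencode "$3")"
}

# Example:
# awscurl --service="aps" --region="$EKS_CLUSTER_REGION" \
#   "$(amp_query_url "$EKS_CLUSTER_REGION" "$AMP_WORKSPACE_ID" 'node_cpu_seconds_total{mode="idle"}')"
```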
|
||
Your results should look similar to the following: | ||
|
||
``` | ||
{ | ||
"status": "success", | ||
"data": { | ||
"resultType": "vector", | ||
"result": [ | ||
{ | ||
"metric": { | ||
"__name__": "node_cpu_seconds_total", | ||
"app_kubernetes_io_component":"metrics", | ||
"app_kubernetes_io_instance":"prometheus-node-exporter", | ||
.................................... | ||
.................................... | ||
"version": "v1" | ||
}, | ||
"value": [ | ||
1725391168, | ||
"20.37" | ||
] | ||
} | ||
] | ||
} | ||
} | ||
``` | ||
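A response can report `"status": "success"` and still carry an empty `result` array, which would mean no samples have reached the workspace yet. The sketch below is a rough text-based check for both conditions. `amp_query_has_data` is a hypothetical helper name, and a JSON-aware tool such as `jq` would be more robust if it is available.

```bash
# Hypothetical helper: read a Prometheus query response on stdin and succeed
# only if it reports success and the result array is not empty. This is a
# simple text match after stripping whitespace, not a real JSON parser.
amp_query_has_data() {
  local compact
  compact="$(tr -d ' \t\r\n')"
  case "$compact" in
    *'"status":"success"'*) ;;
    *) return 1 ;;
  esac
  case "$compact" in
    *'"result":[]'*) return 1 ;;
  esac
  return 0
}

# Example (pipe the awscurl output from the previous step into the check):
# awscurl --service="aps" --region="$EKS_CLUSTER_REGION" "<query-url>" | amp_query_has_data
```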
|
||
# Setup Amazon Managed Grafana | ||
Amazon Managed Grafana (AMG) is a fully managed service that simplifies the deployment and operation of Grafana, an open-source data visualization and monitoring solution. With AMG, you can quickly set up and scale your Grafana environment, enabling you to monitor and analyze your application and infrastructure metrics from various data sources. Follow the instructions found [here](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/managed-grafana/) to create an AMG workspace. After you create the AMG workspace, set up authentication and authorization by following the instructions in the [AMG User Guide](https://docs.aws.amazon.com/grafana/latest/userguide/AMG-manage-users-and-groups-AMG.html) for enabling AWS IAM Identity Center. | ||
|
||
After completing authentication and authorization setup, connect to the AMG workspace using the workspace URL found in the AMG console. From the left menu, select Apps->AWS Data Sources and click the Data sources tab. From the service dropdown, select `Amazon Managed Service for Prometheus`, then select the region in which you created the AMP workspace. The AMP workspace is listed once the region is selected; select it and click the `Add data source` button. | ||
|
||
![AMP Datasource](../../../../images/Containers/aws-native/eks/grafana-amp-datasource.png) | ||
|
||
To verify that the newly created data source is working, let's explore the metrics available in it. From the left menu, select `Explore` and choose the newly created data source. From the metrics dropdown, select the `node_memory_MemAvailable_bytes` metric and click the `Run Query` button. You should see something similar to the image below. | ||
|
||
![AMP Metric](../../../../images/Containers/aws-native/eks/grafana-amp-metric.png) | ||
|
||
By selecting Dashboards from the left menu, you can also import a dashboard from grafana.com using a URL or the dashboard ID. For example, you can use the dashboard ID `10182` to import a dashboard that helps you monitor Kubernetes nodes. When you import the dashboard and use the AMP data source, the dashboard should look similar to the following. | ||
|
||
![AMP Dashboard](../../../../images/Containers/aws-native/eks/grafana-amp-dashboard.png) | ||
|
||
# Conclusion | ||
In this guide, we covered how to migrate from self-managed observability tools to managed services like Amazon Managed Service for Prometheus and Amazon Managed Grafana. Doing so can significantly reduce operational overhead and complexity. With Amazon's managed services, you benefit from a secure, highly available, and scalable monitoring solution without the burden of provisioning, operating, and maintaining the underlying infrastructure. This lets you focus your efforts on core business objectives, accelerate innovation, and deliver high-quality applications and services to your customers with greater confidence and efficiency. |
Simply call it Moving to Managed Open Source Observability Services