Aerospike agent and dashboards #19188

Open
wants to merge 136 commits into
base: master
from

Conversation

mphanias

@mphanias mphanias commented Dec 4, 2024

What does this PR do?

Modified the Aerospike integration to include all the metrics exposed by aerospike-prometheus-exporter, and added multiple dashboards that consume the exported metrics.

Motivation

Expanding metric collection and dashboards: we want all Aerospike metrics to be available in Datadog so that customers have better visibility and actionable insights into key metrics, including performance, latency, and resource utilization.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

mphanias and others added 3 commits November 26, 2024 10:57
modified check and metrics Python scripts -
-- to include all Aerospike metrics, most as gauges
-- latency metrics are added as histograms

added new dashboard covering key aerospike metrics required to be monitored
includes,
- cluster, node, namespace, sindex, sets, users, xdr, basic system-info and latencies
updated Aerospike plugin version in about script
reverted the aerospike version to 4.0.0
reverted CHANGELOG.md to earlier and actual version
generated changelog using ddev package from pypi with command "ddev release changelog new"
corrected the changelog files location
updated the correct PR number

github-actions bot commented Dec 4, 2024

The changelog type changed or removed was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the fixed or added type instead.

incorporated lint feedback and removed unnecessary spaces

removed unnecessary spaces at the end of the line as suggested in lint check

removed unnecessary blank lines reported by lint at the end of file


codecov bot commented Dec 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.98%. Comparing base (755c683) to head (a70f5e7).

Additional details and impacted files
Flag Coverage Δ
activemq ?
aerospike 87.80% <100.00%> (?)
cassandra ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
presto ?
solr ?

Flags with carried forward coverage won't be shown. Click here to find out more.

renamed label cluster_name to aerospike_cluster because cluster_name is forbidden by the GitHub checks

added mapping for cluster_name to aerospike_cluster - updated check.py
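
The label mapping described in this commit could look roughly like the sketch below. It is standalone Python and does not use the Agent's own check API; only the `cluster_name` to `aerospike_cluster` pair comes from this PR, while the function name, the constant, and the sample labels are hypothetical.

```python
# Only this one pair comes from the PR; the constant name is hypothetical.
LABEL_RENAMES = {"cluster_name": "aerospike_cluster"}


def rename_labels(labels, renames=LABEL_RENAMES):
    """Return a copy of `labels` with reserved label names replaced."""
    return {renames.get(name, name): value for name, value in labels.items()}


# Hypothetical labels as they might arrive on one exported sample.
sample = {"cluster_name": "prod-east", "ns": "test"}
print(rename_labels(sample))
```

Labels that are not in the mapping pass through unchanged, so the rename only touches the one name the checks reject.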

Contributor

@iliakur iliakur left a comment


@mphanias thanks for taking the bull by the horns here!

I have a couple concerns with the PR as it is right now:

  1. We should not drop support for metrics we already collect. This will cause significant pain to folks who use older versions.
  2. I don't understand why tests were removed but no new ones were added.
  3. For the dashboards, have you had a look at these guidelines?

removed unnecessary dashboard links in node-view and sindex-view
1. fixed all dashboards as per the guidelines in the document mentioned in review comments
2. fixed grammar issues
removed unique-data and datacenter-comparison dashboards
@mphanias
Author

Hi iliakur,

Thank you for your feedback. Please find my responses inline:

  1. We should not drop support for metrics we already collect. This will cause significant pain to folks who use older versions.
    Response:
    With the new changes, all metrics exposed by Aerospike Prometheus Exporter are sent without any filtering, irrespective of the Aerospike server version.
    Aerospike Prometheus Exporter has been rewritten to be agnostic of the Aerospike DB server version and sends all metrics/stats returned by the server to the caller in OpenMetrics format.
    What we did change in the existing Datadog Agent integration is the naming: earlier, the DD Agent read the metrics and sent them under different metric names, and we changed those names.
    We took this approach because Aerospike 7.0 was released more than a year ago and most customers are already on 7.0; Aerospike 4.9 and 5.x are end-of-life, and Aerospike 6.0 will soon be end-of-life as well.

  2. I don't understand why tests were removed but no new ones were added.
    Response:
    I agree that we removed test cases. We did this because, with the proposed changes, the DD Agent gets all metrics from Aerospike Prometheus Exporter and sends them with their names and values without any transformation within the agent (except prefixing each metric with 'aerospike'). Since we are not doing any transformation, we felt test cases for 5.x and 6.x were not required.

  3. For the dashboards, have you had a look at these guidelines?
    Response:
    Thank you for pointing us to the guidelines. We reviewed all the dashboards and made sure they adhere to all the guidelines mentioned.
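
For the second response above, the "no transformation except a prefix" behaviour can be sketched as follows. This is a minimal illustration, not the check's actual code: the parser handles only simple exposition lines without timestamps, the `.` separator is an assumption, and the metric names in the sample payload are hypothetical.

```python
def prefix_metric_names(payload, prefix="aerospike"):
    """Return (name, value) pairs with each metric name prefixed; values are untouched."""
    samples = []
    for line in payload.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumes no trailing timestamp, so the value is the last field.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        samples.append((f"{prefix}.{name}", float(value)))
    return samples


# Hypothetical payload; real exporter metric names will differ.
payload = """\
# TYPE example_objects gauge
example_objects{ns="test"} 42
example_uptime 3600
"""
print(prefix_metric_names(payload))
```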

Regards

@mphanias
Author

Hi iliakur,

We modified the existing overview dashboard to use the actual metric names so that existing customers are not impacted. Custom dashboards that customers built using the old naming convention would be impacted, but the proposed changes ensure all customers get the full benefit of all the metrics exposed by Aerospike Prometheus Exporter.

Regards

@mphanias
Author

Hi iliakur,

Could you please share your input or feedback on the comments I added? I really appreciate your support.

Regards
Phani.

Contributor

@iliakur iliakur left a comment


Hi Phani 👋

I'm afraid dropping metrics is out of the question for us. We often support technology that's EOL to avoid breaking the customers (even if they are few) that use it.

Dropping unit tests is also a no-go even if there's no significant transformation done to the prometheus payload. We would like our tests to guarantee the end to end customer experience and that involves ingesting and processing the prometheus payload. How trivial the processing is doesn't matter.
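
A regression check in the spirit of the two points above might look like the pytest-style sketch below. It assumes the test can obtain the set of metric names the check submitted; the helper, the constant, and every metric name are hypothetical stand-ins, not names from the integration.

```python
# Hypothetical names standing in for metrics an earlier release already shipped.
LEGACY_METRICS = {"aerospike.example.objects", "aerospike.example.uptime"}


def missing_metrics(expected, submitted):
    """Return the expected metric names that were not submitted, sorted."""
    return sorted(set(expected) - set(submitted))


def test_no_legacy_metric_is_dropped():
    # In a real test this set would come from running the check against a
    # recorded exporter payload; here it is hard-coded for illustration.
    submitted = {
        "aerospike.example.objects",
        "aerospike.example.uptime",
        "aerospike.example.new_stat",
    }
    assert missing_metrics(LEGACY_METRICS, submitted) == []
```

New metrics can be added freely under this check; it only fails when a previously collected name disappears.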

@mphanias
Author

Hi iliakur,

I understand your point about supporting EOL customers. I will add back all the test cases and then re-add my changes.

While testing these changes on my local machine, I am facing an issue. Could you please point me to a guide?

I appreciate your review feedback and support.

Regards
Phani.

Error I am getting: datadog_checks.base.checks.base.aerospike:aerospike.py:91 The aerospike client is not installed: No module named 'aerospike'
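
The message above suggests that the `aerospike` Python module cannot be imported in the interpreter running the tests. One way to confirm that from the same environment is sketched below; the helper name is hypothetical, and the sketch only detects the missing module, it does not install it.

```python
import importlib.util


def client_available(module_name="aerospike"):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


if client_available():
    print("aerospike module found")
else:
    print("aerospike module not found in this interpreter")
```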


4 participants