Feature Request: add metrics from OpenSearch file_cache (:9200/_nodes/stats/file_cache) #934

Open
drewmiranda-gl opened this issue Sep 24, 2024 · 3 comments

Comments

@drewmiranda-gl

As far as I can tell, metrics from file_cache are not exported for OpenSearch. These metrics are very helpful for monitoring the health of Searchable Snapshots and of nodes that have the search role.

@Shivani3351

Hi team,
Could you please include this as well?

Extend the nodes search metrics to add scroll_current (integer): the number of scroll operations currently running.

It is crucial data for our application and would be helpful if it were integrated.

@sysadmind
Contributor

It would be helpful if someone could provide an easy example of how to run OpenSearch, plus a curl command to pull these metrics. A docker run command would be perfectly fine for this. It would save whoever picks up this work a lot of setup time.

Notes / questions to developer:

  • How do we make sure this doesn't log errors or warnings for users of Elasticsearch?
  • Is there an obvious way to detect OpenSearch vs. Elasticsearch?
  • Should a user have to pass a flag to enable OpenSearch collectors?
  • How do we note in the documentation that some collectors apply to only one or the other?
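
On the detection question, one possible approach is to look at the root endpoint (`GET /`). The sketch below assumes that OpenSearch reports `version.distribution` as `"opensearch"` in that response and that Elasticsearch omits the field; that should be verified against the versions the exporter supports. The function names and the default URL are hypothetical, for illustration only.

```python
import json
from urllib.request import urlopen


def detect_distribution(root_info: dict) -> str:
    """Classify a parsed `GET /` response as 'opensearch' or 'elasticsearch'."""
    version = root_info.get("version", {})
    # Assumption: OpenSearch sets version.distribution to "opensearch";
    # Elasticsearch responses are assumed not to carry this field.
    if version.get("distribution") == "opensearch":
        return "opensearch"
    return "elasticsearch"


def fetch_root_info(base_url: str = "http://localhost:9200") -> dict:
    """Fetch the root endpoint; the URL is a placeholder for a local node."""
    with urlopen(base_url, timeout=5) as resp:
        return json.load(resp)
```

With a check like this, OpenSearch-only collectors could be skipped quietly on Elasticsearch instead of logging errors, though an explicit flag would still be the more predictable option.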

@drewmiranda-gl
Author

drewmiranda-gl commented Nov 12, 2024

Howdy from MD :)

Bare minimum opensearch install: https://github.com/Graylog2/se-poc-docs/blob/main/src/On%20Prem%20POC/installing%20opensearch.md

Unfortunately, setting up a snapshot repository is a bit cumbersome, but see https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore/#register-repository

The same page should also be useful for taking a snapshot. OpenSearch Dashboards may help as well.

Assuming an OpenSearch node is up and has a snapshot repository with some snapshots, you can run a curl command against it:

curl localhost:9200/_nodes/stats/file_cache

Which returns something like

{
  "_nodes": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "cluster_name": "graylog",
  "nodes": {
    "KccZVrL8SISo4JsNwGCglQ": {
      "timestamp": 1731423305965,
      "name": "pve-opsrch1",
      "transport_address": "192.168.0.161:9300",
      "host": "192.168.0.161",
      "ip": "192.168.0.161:9300",
      "roles": [
        "cluster_manager",
        "data"
      ],
      "attributes": {
        "zone": "opsrch1",
        "shard_indexing_pressure_enabled": "true"
      }
    },
    "Ky6yop-QRR6EWt7pRUO_Xg": {
      "timestamp": 1731423305965,
      "name": "pve-opsrch3",
      "transport_address": "192.168.0.181:9300",
      "host": "192.168.0.181",
      "ip": "192.168.0.181:9300",
      "roles": [
        "search"
      ],
      "attributes": {
        "zone": "opsrch1",
        "shard_indexing_pressure_enabled": "true"
      },
      "file_cache": {
        "timestamp": 1731423305965,
        "active_in_bytes": 6745596006,
        "total_in_bytes": 52249116672,
        "used_in_bytes": 32546903271,
        "evictions_in_bytes": 0,
        "active_percent": 21,
        "used_percent": 62,
        "hit_count": 48742,
        "miss_count": 882
      }
    },
    "YlxLlLS5Qmm5F3m2HpaY5w": {
      "timestamp": 1731423305966,
      "name": "pve-opsrch2",
      "transport_address": "192.168.0.191:9300",
      "host": "192.168.0.191",
      "ip": "192.168.0.191:9300",
      "roles": [
        "data"
      ],
      "attributes": {
        "zone": "opsrch2",
        "shard_indexing_pressure_enabled": "true"
      }
    }
  }
}

Note that this API endpoint returns ALL nodes, even if the node does not have the search role. Only nodes that do have a local file cache will contain the file_cache section:

$.nodes[*].file_cache

[
  {
    "timestamp": 1731423305965,
    "active_in_bytes": 6745596006,
    "total_in_bytes": 52249116672,
    "used_in_bytes": 32546903271,
    "evictions_in_bytes": 0,
    "active_percent": 21,
    "used_percent": 62,
    "hit_count": 48742,
    "miss_count": 882
  }
]
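
A minimal sketch of the same filtering in Python, in case it helps whoever picks this up. It uses a trimmed copy of the response above (one search node, one data node); the helper names `file_cache_by_node` and `hit_ratio` are made up for illustration, and the hit ratio is a derived value, not something the API returns.

```python
import json

# Trimmed copy of the response above: only fields used here are kept.
SAMPLE = json.loads("""
{
  "nodes": {
    "Ky6yop-QRR6EWt7pRUO_Xg": {
      "name": "pve-opsrch3",
      "host": "192.168.0.181",
      "roles": ["search"],
      "file_cache": {
        "timestamp": 1731423305965,
        "active_in_bytes": 6745596006,
        "total_in_bytes": 52249116672,
        "used_in_bytes": 32546903271,
        "evictions_in_bytes": 0,
        "active_percent": 21,
        "used_percent": 62,
        "hit_count": 48742,
        "miss_count": 882
      }
    },
    "YlxLlLS5Qmm5F3m2HpaY5w": {
      "name": "pve-opsrch2",
      "host": "192.168.0.191",
      "roles": ["data"]
    }
  }
}
""")


def file_cache_by_node(stats: dict) -> dict:
    """Map node name -> file_cache stats, skipping nodes without the section."""
    return {
        node["name"]: node["file_cache"]
        for node in stats.get("nodes", {}).values()
        if "file_cache" in node
    }


def hit_ratio(cache: dict):
    """Derived hit ratio; returns None when there have been no lookups yet."""
    total = cache["hit_count"] + cache["miss_count"]
    return cache["hit_count"] / total if total else None


if __name__ == "__main__":
    for name, cache in file_cache_by_node(SAMPLE).items():
        print(name, cache["used_percent"], hit_ratio(cache))
```

Checking for the presence of the `file_cache` key, rather than for the search role, follows the observation above that only nodes with a local file cache carry the section.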

For now I'm using https://github.com/prometheus-community/json_exporter with the following config:

---
modules:
  default:
    headers:
      X-Dummy: my-test-header
    metrics:
    - name: elasticsearch_file_cache
      type: object
      help: Example of sub-level value scrapes from a json
      path: '{ $.nodes.* }'
      labels:
        # environment: beta # static label
        name: '{.name}'
        host: '{.host}'
      values:
        # active: 1         # static value
        active_in_bytes: '{ .file_cache.active_in_bytes }'
        total_in_bytes: '{ .file_cache.total_in_bytes }'
        used_in_bytes: '{ .file_cache.used_in_bytes }'
        evictions_in_bytes: '{ .file_cache.evictions_in_bytes }'
        active_percent: '{ .file_cache.active_percent }'
        used_percent: '{ .file_cache.used_percent }'
        hit_count: '{ .file_cache.hit_count }'
        miss_count: '{ .file_cache.miss_count }'

which gets me

image
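
For anyone without the screenshot, here is a rough sketch of the series this kind of config should yield. It assumes json_exporter names object-type metrics as `<name>_<value key>` and attaches the `name` and `host` labels from the config; the function name is hypothetical, and the real exporter output may order labels differently.

```python
def render_file_cache_metrics(stats, prefix="elasticsearch_file_cache"):
    """Render file_cache stats as Prometheus text-format sample lines."""
    lines = []
    for node in stats.get("nodes", {}).values():
        cache = node.get("file_cache")
        if cache is None:
            # Nodes without a local file cache have no section; skip them.
            continue
        labels = f'host="{node["host"]}",name="{node["name"]}"'
        for key, value in cache.items():
            if key == "timestamp":
                # The config above does not export the timestamp field.
                continue
            lines.append(f"{prefix}_{key}{{{labels}}} {value}")
    return lines


if __name__ == "__main__":
    stats = {
        "nodes": {
            "Ky6yop-QRR6EWt7pRUO_Xg": {
                "name": "pve-opsrch3",
                "host": "192.168.0.181",
                "file_cache": {"timestamp": 1731423305965,
                               "hit_count": 48742, "miss_count": 882},
            },
            "YlxLlLS5Qmm5F3m2HpaY5w": {"name": "pve-opsrch2",
                                       "host": "192.168.0.191"},
        }
    }
    print("\n".join(render_file_cache_metrics(stats)))
```

A native collector in the exporter would presumably expose the same fields, with counters such as hit_count and miss_count typed as counters rather than gauges.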
