Allow missing shard stats for restarted nodes for _snapshot/_status #4410

Open: JeremyDahlgren wants to merge 1 commit into main from jdahlgren/get-snapshot-status-missing-stats-for-restarted-nodes

JeremyDahlgren

Adds a note explaining the change made in elasticsearch PR #128399 to reduce latency when getting stats for currently running snapshots.

Leaving as draft until elasticsearch PR #128399 has been merged.

Relates ES-10982

Contributor

Following you can find the validation results for the API you have changed.

| API | Status | Request | Response |
| --- | --- | --- | --- |
| snapshot.status | 🟢 | 2/2 | 2/2 |

You can validate this API yourself by using the make validate target.

JeremyDahlgren force-pushed the jdahlgren/get-snapshot-status-missing-stats-for-restarted-nodes branch from d2d9990 to cc13ef7 on June 10, 2025 17:24
Adds a note explaining the change made in elasticsearch PR #128399
to reduce latency when getting stats for currently running snapshots.

Relates ES-10982
JeremyDahlgren force-pushed the jdahlgren/get-snapshot-status-missing-stats-for-restarted-nodes branch from cc13ef7 to a85ad64 on June 10, 2025 17:27
JeremyDahlgren added the skip-backport label ("This pull request should not be backported") on Jun 10, 2025
JeremyDahlgren marked this pull request as ready for review on June 10, 2025 17:32
DiannaHohensee left a comment

IIUC, the text was, and still is, identical across files?

Is there a way to preview what this looks like?

Is there someplace we need to change/add an API response example?

@@ -43923,7 +43923,7 @@
"snapshot"
],
"summary": "Get the snapshot status",
-"description": "Get a detailed description of the current state for each shard participating in the snapshot.\n\nNote that this API should be used only to obtain detailed shard-level information for ongoing snapshots.\nIf this detail is not needed or you want to obtain information about one or more existing snapshots, use the get snapshot API.\n\nIf you omit the `<snapshot>` request path parameter, the request retrieves information only for currently running snapshots.\nThis usage is preferred.\nIf needed, you can specify `<repository>` and `<snapshot>` to retrieve information for specific snapshots, even if they're not currently running.\n\nWARNING: Using the API to return the status of any snapshots other than currently running snapshots can be expensive.\nThe API requires a read from the repository for each shard in each snapshot.\nFor example, if you have 100 snapshots with 1,000 shards each, an API request that includes all snapshots will require 100,000 reads (100 snapshots x 1,000 shards).\n\nDepending on the latency of your storage, such requests can take an extremely long time to return results.\nThese requests can also tax machine resources and, when using cloud storage, incur high processing costs.\n\n ## Required authorization\n* Cluster privileges: `monitor_snapshot`",
+"description": "Get a detailed description of the current state for each shard participating in the snapshot.\n\nNote that this API should be used only to obtain detailed shard-level information for ongoing snapshots.\nIf this detail is not needed or you want to obtain information about one or more existing snapshots, use the get snapshot API.\n\nIf you omit the `<snapshot>` request path parameter, the request retrieves information only for currently running snapshots.\nThis usage is preferred.\nNote that if a node has been restarted or has left the cluster since completing a shard snapshot the stats for that shard will be unavailable.\nLoading the stats from the repository is an expensive operation (see the WARNING below), so to minimize latency for returning stats for currently\nrunning snapshots the stats values will be -1 for these shards even though the \"stage\" value will be \"DONE\". A \"description\" field will be set\non these shard stats instances indicating why they are empty. Note that the total stats for the index will be less than expected due to the\nmissing values from these shards.\nIf needed, you can specify `<repository>` and `<snapshot>` to retrieve information for specific snapshots, even if they're not currently running.\n\nWARNING: Using the API to return the status of any snapshots other than currently running snapshots can be expensive.\nThe API requires a read from the repository for each shard in each snapshot.\nFor example, if you have 100 snapshots with 1,000 shards each, an API request that includes all snapshots will require 100,000 reads (100 snapshots x 1,000 shards).\n\nDepending on the latency of your storage, such requests can take an extremely long time to return results.\nThese requests can also tax machine resources and, when using cloud storage, incur high processing costs.\n\n ## Required authorization\n* Cluster privileges: `monitor_snapshot`",
DiannaHohensee commented Jun 12, 2025

Is there a way to get a preview to see how the formatting looks?

I don't know what the expectations are for formatting. Currently, there's a new line after every sentence, not in the middle of any sentence, regardless of the sentence length. I have a rearrangement suggestion for your new text, which also avoids having to go outside the existing newline pattern (at least the new text doesn't have any sentences longer than the pre-existing text). I added spacing just to see what I was doing.

"Get a detailed description of the current state for each shard participating in the snapshot.
\n\nNote that this API should be used only to obtain detailed shard-level information for ongoing snapshots.
\nIf this detail is not needed or you want to obtain information about one or more existing snapshots, use the get snapshot API.

\n\nIf you omit the `<snapshot>` request path parameter, the request retrieves information only for currently running snapshots.
\nThis usage is preferred.

>>> I moved this line up <<<
\nIf needed, you can specify `<repository>` and `<snapshot>` to retrieve information for specific snapshots, even if they're not currently running.

>>> New text (new paragraph, too) <<<
\n\nNote that the stats will not be available for any shard snapshots in an ongoing snapshot completed by a node that (even momentarily) left the cluster.
\nLoading the stats from the repository is an expensive operation (see the WARNING below).
\nTherefore the stats values for such shards will be -1 even though the \"stage\" value will be \"DONE\", in order to minimize latency.
\nA \"description\" field will be present for a shard snapshot completed by a departed node, explaining why the shard snapshot's stats results are invalid.
\nConsequently, the total stats for the index will be less than expected due to the missing values from these shards.
>>> <<<

\n\nWARNING: Using the API to return the status of any snapshots other than currently running snapshots can be expensive.
\nThe API requires a read from the repository for each shard in each snapshot.
\nFor example, if you have 100 snapshots with 1,000 shards each, an API request that includes all snapshots will require 100,000 reads (100 snapshots x 1,000 shards).
\n\nDepending on the latency of your storage, such requests can take an extremely long time to return results.
\nThese requests can also tax machine resources and, when using cloud storage, incur high processing costs.

\n\n ## Required authorization\n* Cluster privileges: `monitor_snapshot`",
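To make the arithmetic in the WARNING concrete, here is a minimal Python sketch of the read count and a rough duration estimate. The function names, the 50 ms per-read latency, and the assumption that reads happen one after another are all illustrative and not taken from Elasticsearch. Real behavior depends on the repository type and on how reads are parallelized.

```python
def estimate_repository_reads(snapshots: int, shards_per_snapshot: int) -> int:
    """One repository read per shard per snapshot, as the WARNING describes."""
    return snapshots * shards_per_snapshot


def estimate_duration_seconds(reads: int, latency_ms_per_read: float) -> float:
    """Rough duration, ASSUMING purely sequential reads (hypothetical)."""
    return reads * latency_ms_per_read / 1000.0


reads = estimate_repository_reads(100, 1000)
print(reads)  # 100000, matching the example in the WARNING

# 50 ms per read is an assumed figure for illustration only.
print(estimate_duration_seconds(reads, 50.0))  # 5000.0 seconds under these assumptions
```

Under these assumptions the request would spend well over an hour on repository reads alone, which is the motivation for not loading stats from the repository for currently running snapshots.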

DiannaHohensee commented Jun 13, 2025

> Is there someplace we need to change/add an API response example?

It looks like the allocation explain docs have two response examples (I remember seeing this once): https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-allocation-explain, fwiw. Maybe we could add another example for the new description field in the same way allocation explain does it.
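As a starting point for such an example, here is a hedged Python sketch of how a client might spot the affected shards in a `_snapshot/_status` response. The sample response is hypothetical. Its nesting follows the usual snapshot status shape (`snapshots` → `indices` → `shards` → `stats`), but the placement of `description` at the shard level, the choice of which stats fields are -1, and the description wording are assumptions, not confirmed output from elasticsearch PR #128399.

```python
from typing import Any, Dict, List, Tuple


def shards_missing_stats(status: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Return (snapshot, index, shard_id, description) for every shard that is
    DONE but reports -1 for its total size, i.e. its stats are unavailable."""
    missing = []
    for snap in status.get("snapshots", []):
        for index_name, index in snap.get("indices", {}).items():
            for shard_id, shard in index.get("shards", {}).items():
                total = shard.get("stats", {}).get("total", {})
                if shard.get("stage") == "DONE" and total.get("size_in_bytes") == -1:
                    missing.append(
                        (
                            snap.get("snapshot", ""),
                            index_name,
                            shard_id,
                            # ASSUMPTION: "description" sits next to "stage".
                            shard.get("description", ""),
                        )
                    )
    return missing


# Hypothetical response fragment; all values are made up for illustration.
sample_status = {
    "snapshots": [
        {
            "snapshot": "snapshot_1",
            "repository": "my_repository",
            "state": "STARTED",
            "indices": {
                "index_1": {
                    "shards": {
                        "0": {
                            "stage": "DONE",
                            "stats": {"total": {"file_count": 3, "size_in_bytes": 5969}},
                        },
                        "1": {
                            "stage": "DONE",
                            "stats": {"total": {"file_count": -1, "size_in_bytes": -1}},
                            "description": "stats unavailable (placeholder text)",
                        },
                    }
                }
            },
        }
    ]
}

for entry in shards_missing_stats(sample_status):
    print(entry)
```

A client that sums per-shard stats could use a check like this to skip or flag the -1 entries, since the index totals will be lower than expected when such shards are present.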

Labels: skip-backport (This pull request should not be backported), specification