
Implemented computation of segment replication stats at shard level #17055

Open
vinaykpud wants to merge 12 commits into main

Conversation

@vinaykpud (Contributor) commented Jan 19, 2025

Description

The method implemented here computes the segment replication stats at the shard level, instead of relying on the primary shard to compute stats based on reports from its replicas.

The method implemented in this PR serves segment replication stats for the following core APIs (an illustrative sketch follows the list):

  1. Nodes Stats API (/_nodes/stats)
  2. Cluster Stats API (/_cluster/stats)
  3. Indices Stats API (/_stats or /{index}/_stats)
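
As a rough illustration of that shift (the class and method names below are hypothetical, not the PR's actual API), each shard keeps its own view of how far behind it is, and the node serving any of these APIs reads that local state directly instead of asking the primary to aggregate reports from its replicas:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.opensearch.core.index.shard.ShardId;

// Hypothetical shard-local holder for segment replication stats.
final class ShardLocalReplicationStats {
    record Stats(long bytesBehind, long replicationLagMillis) {}

    private final Map<ShardId, Stats> statsByShard = new ConcurrentHashMap<>();

    // Updated on the shard's own node whenever a newer primary checkpoint is observed.
    void update(ShardId shardId, long bytesBehind, long replicationLagMillis) {
        statsByShard.put(shardId, new Stats(bytesBehind, replicationLagMillis));
    }

    // Read directly when _nodes/stats, _cluster/stats, or _stats is served.
    Stats get(ShardId shardId) {
        return statsByShard.getOrDefault(shardId, new Stats(0L, 0L));
    }
}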

Related Issues

Resolves #16801
Related to #15306

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

The method implemented here computes the segment replication stats at the shard level,
instead of relying on the primary shard to compute stats based on reports from its replicas.

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Contributor

❌ Gradle check result for dd0406d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Contributor

❌ Gradle check result for 59e2617: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@vinaykpud vinaykpud closed this Jan 22, 2025
@vinaykpud vinaykpud reopened this Jan 22, 2025

Contributor

✅ Gradle check result for 90b96a8: SUCCESS


codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 72.41379% with 16 lines in your changes missing coverage. Please review.

Project coverage is 72.37%. Comparing base (2794655) to head (90b96a8).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
...nsearch/indices/replication/SegmentReplicator.java 72.72% 6 Missing and 3 partials ⚠️
.../replication/checkpoint/ReplicationCheckpoint.java 66.66% 1 Missing and 2 partials ⚠️
...in/java/org/opensearch/index/shard/IndexShard.java 60.00% 1 Missing and 1 partial ⚠️
...rc/main/java/org/opensearch/index/IndexModule.java 0.00% 1 Missing ⚠️
...c/main/java/org/opensearch/index/IndexService.java 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17055      +/-   ##
============================================
+ Coverage     72.19%   72.37%   +0.17%     
- Complexity    65304    65385      +81     
============================================
  Files          5301     5301              
  Lines        303774   303817      +43     
  Branches      44034    44040       +6     
============================================
+ Hits         219323   219891     +568     
+ Misses        66458    65882     -576     
- Partials      17993    18044      +51     


@@ -149,7 +150,8 @@ private static ReplicationCheckpoint readCheckpointFromIndexInput(
             in.readLong(),
             in.readLong(),
             in.readString(),
-            toStoreFileMetadata(uploadedSegmentMetadataMap)
+            toStoreFileMetadata(uploadedSegmentMetadataMap),
+            in.readLong()
Member

How does bwc work here with this added field? Do we need to bump CURRENT_VERSION on RemoteSegmentMetadata ? @sachinpkale can you pls assist?

Collaborator

If data is being read without the writer serialising it, we will end up with an end-of-stream exception.

Member

Right, so we'd need to check the version before attempting to read. I think we may want to add a mixed-cluster test here in any case. I don't think there's an existing qa package that runs with remote-enabled clusters.
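
To make the backward-compatibility concern concrete, here is a minimal sketch of a version-gated read; the class, method, and version constant are assumptions for illustration, not the actual RemoteSegmentMetadata code:

import java.io.IOException;
import org.apache.lucene.store.IndexInput;

final class CheckpointMetadataCompat {
    // Assumed: the metadata version that first serialized the new trailing long.
    private static final int VERSION_WITH_CREATED_TIMESTAMP = 2;

    // Read the trailing long only if the writer's version serialized it, so metadata
    // written by older nodes does not trigger an end-of-stream failure on read.
    static long readCreatedTimestampOrDefault(IndexInput in, int writtenVersion) throws IOException {
        if (writtenVersion >= VERSION_WITH_CREATED_TIMESTAMP) {
            return in.readLong();
        }
        return 0L; // default for metadata written before the field existed
    }
}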

@@ -294,6 +294,7 @@ public synchronized void onNewCheckpoint(final ReplicationCheckpoint receivedChe
return;
}
updateLatestReceivedCheckpoint(receivedCheckpoint, replicaShard);
Member

We should be able to remove this and the latestReceivedCheckpoint map from this class as the replicator is the source of truth now.

;
final Map<String, StoreFileMetadata> indexStoreFileMetadata = indexReplicationCheckPoint.getMetadataMap();
// If primaryLastRefreshedCheckpoint is null, we will default to indexReplicationCheckPoint
// so that we can avoid any failures
Member

When does this happen? In that case, can we not assume the shard is up to date and return an empty ReplicationStats object instead?

);
final Map<String, StoreFileMetadata> storeFileMetadata = primaryLastRefreshedCheckpoint.getMetadataMap();

final Store.RecoveryDiff diff = Store.segmentReplicationDiff(storeFileMetadata, indexStoreFileMetadata);
Member

Rather than storing two ReplicationCheckpoint maps, let's store a single ShardId-to-ReplicationStats map and compute the stats as information about new primary checkpoints is received.

I'm also not following the lastOnGoingReplicationCheckpoint computation, because the checkpoint used to compute bytes behind is not the same as the one used for lag. To simplify, I think we only care about primaryLastRefreshed and should compute both stats against that, where lag is the time between primaryLastRefreshed and the currently searchable checkpoint on the shard.

This would still fall in line with our definition of 'Replication lag' in the docs:

segments.segment_replication.max_replication_lag (long): The maximum amount of time, in milliseconds, taken by a replica to catch up to its primary.
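
For illustration, a hedged sketch of that simplification, assuming the shard records when it received the primary's last refreshed checkpoint (the class, method, and parameter names are made up for this example):

import java.util.List;
import org.opensearch.index.store.Store;
import org.opensearch.index.store.StoreFileMetadata;

final class StatsAgainstPrimaryLastRefreshed {
    // Bytes behind: total size of segment files this shard is missing, or holds stale
    // copies of, relative to the primary's last refreshed checkpoint.
    static long bytesBehind(Store.RecoveryDiff diff) {
        return totalLength(diff.missing) + totalLength(diff.different);
    }

    // Lag: how long the primary's last refreshed checkpoint has been ahead of what the
    // shard can currently search; zero once the shard has caught up.
    static long replicationLagMillis(long primaryLastRefreshedReceivedAtMillis, long nowMillis, long bytesBehind) {
        return bytesBehind == 0L ? 0L : Math.max(0L, nowMillis - primaryLastRefreshedReceivedAtMillis);
    }

    private static long totalLength(List<StoreFileMetadata> files) {
        return files.stream().mapToLong(StoreFileMetadata::length).sum();
    }
}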

Contributor Author

As we discussed:

  1. Let's compute the diff (bytesBehind) on the fly, every time we receive the primary's refreshed checkpoint.
  2. Along with that, instead of maintaining two different maps, we can combine them into a single map, e.g. Map<ShardId, Map<SegmentInfoVersion, ComputedDiff>>, where ComputedDiff holds the computation time and the bytes behind:

class ComputedDiff {
    long timeStamp;
    long bytesBehind;
}

  3. Once replication for a checkpoint completes, we can remove all entries whose segmentInfoVersion is lower than that of the completed checkpoint.
  4. When node stats are requested, we can use the diff from the map and calculate the lag from the current time and the timeStamp in the ComputedDiff (sketched below).
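
A hedged sketch of that bookkeeping: the ComputedDiff shape, the map layout, and the pruning rule follow the list above, while the class and method names, the navigable map, and the inclusive prune bound are assumptions:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import org.opensearch.core.index.shard.ShardId;

final class SegmentReplicationDiffTracker {
    // Computed once when the primary's refreshed checkpoint arrives; lag is derived later.
    record ComputedDiff(long timeStampMillis, long bytesBehind) {}

    // ShardId -> (segmentInfosVersion -> diff computed when that checkpoint was received)
    private final Map<ShardId, ConcurrentNavigableMap<Long, ComputedDiff>> diffs = new ConcurrentHashMap<>();

    void onPrimaryRefreshedCheckpoint(ShardId shardId, long segmentInfosVersion, long bytesBehind, long nowMillis) {
        diffs.computeIfAbsent(shardId, id -> new ConcurrentSkipListMap<>())
            .put(segmentInfosVersion, new ComputedDiff(nowMillis, bytesBehind));
    }

    // Once replication catches up to a version, drop the entries the shard has caught up to.
    void onReplicationCompleted(ShardId shardId, long completedSegmentInfosVersion) {
        ConcurrentNavigableMap<Long, ComputedDiff> byVersion = diffs.get(shardId);
        if (byVersion != null) {
            byVersion.headMap(completedSegmentInfosVersion, true).clear();
        }
    }

    // At stats time, lag = now minus the time the oldest outstanding checkpoint was received.
    long currentLagMillis(ShardId shardId, long nowMillis) {
        ConcurrentNavigableMap<Long, ComputedDiff> byVersion = diffs.get(shardId);
        if (byVersion == null || byVersion.isEmpty()) {
            return 0L;
        }
        return Math.max(0L, nowMillis - byVersion.firstEntry().getValue().timeStampMillis());
    }
}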

@mch2 (Member) commented Jan 24, 2025

This works; we are doing similar tracking now with ReplicationTracker on the primary. You will need to wire this up with index events (IndexEventListener) through the segrep service so that shard entries are cleared on removal.
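
An illustrative sketch of that wiring, assuming current package locations; the listener name and the map it clears are assumptions, and the real change would clear whatever structure the replicator ends up holding:

import java.util.Map;
import org.opensearch.common.settings.Settings;
import org.opensearch.core.index.shard.ShardId;
import org.opensearch.index.shard.IndexEventListener;
import org.opensearch.index.shard.IndexShard;

// Illustrative listener: drops a shard's tracked replication-stats entry when the
// shard is closed, so per-shard state does not outlive the shard itself.
final class ReplicationStatsCleanupListener implements IndexEventListener {
    private final Map<ShardId, ?> perShardReplicationState;

    ReplicationStatsCleanupListener(Map<ShardId, ?> perShardReplicationState) {
        this.perShardReplicationState = perShardReplicationState;
    }

    @Override
    public void afterIndexShardClosed(ShardId shardId, IndexShard indexShard, Settings indexSettings) {
        perShardReplicationState.remove(shardId);
    }
}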

Labels: enhancement, Search:Performance
Development

Successfully merging this pull request may close these issues.

[Feature Request] Redefine the computation of segment replication metrics in Node Stats
3 participants