br: add metrics description for snapshot restore and PITR #18516

Merged: 6 commits merged into pingcap:master, Aug 20, 2024


@Tristan1900 (Contributor) commented Aug 7, 2024

First-time contributors' checklist

What is changed, added or deleted? (Required)

Add descriptions of the restore-related metrics, based on the metrics defined in TiKV's `tikv_details.dashboard.py`.

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v8.3 (TiDB 8.3 versions)
  • v8.2 (TiDB 8.2 versions)
  • v8.1 (TiDB 8.1 versions)
  • v8.0 (TiDB 8.0 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)
  • v5.3 (TiDB 5.3 versions)
  • v5.2 (TiDB 5.2 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

Signed-off-by: Wenqi Mou <[email protected]>
@ti-chi-bot ti-chi-bot bot added first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. missing-translation-status This PR does not have translation status info. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 7, 2024
Review thread on `grafana-tikv-dashboard.md` (outdated, resolved)
@YuJuncen (Contributor) left a comment

Rest lgtm, thank you!

- Import Wait Duration: time that download tasks spend waiting in the queue before execution.
- Import Read SST Duration: time spent reading a file from external storage and downloading it to TiKV.
- Import Rewrite SST Duration: time spent rewriting SST files according to the rewrite rules.
- Import Ingest RPC Duration: time spent on sending ingest response in RPC call.
Contributor:

Suggested change, from:

- Import Ingest RPC Duration: time spent on sending ingest response in RPC call.

to:

- Import Ingest RPC Duration: time spent on handling ingest response in RPC call.

It is server-side.

@Tristan1900 (Contributor Author), Aug 7, 2024:

I see. I just read the code again, and you are right: the timer starts before the handling logic. I thought it measured the time of sending the response from the server back to the client.
The name "RPC duration" is actually confusing to me. Normally it would measure the entire gRPC call end to end:

```rust
// client-side pseudocode: time the whole gRPC call end to end
let timer = Instant::now();
server.do_something().await; // gRPC call to the server
record(timer.elapsed());
```

In our case, it only measures part of the call, on the server side. We should audit all of these metrics and rename them properly.
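The gap between the two kinds of timer can be shown with a minimal Python sketch. It simulates network latency and handling work with `time.sleep`. The function names and delays are hypothetical and are not TiKV code. The client timer wraps both network legs. The server timer starts only when handling begins, which is the behavior the reviewer describes for Import Ingest RPC Duration.

```python
import time

# Hypothetical values for illustration only; not TiKV code or settings.
NETWORK_DELAY_S = 0.02   # simulated one-way network latency
HANDLER_WORK_S = 0.05    # simulated server-side handling time

def server_handle_ingest(server_timings):
    """Server side: the timer starts only when handling begins."""
    start = time.perf_counter()
    time.sleep(HANDLER_WORK_S)  # stands in for the actual handling logic
    server_timings.append(time.perf_counter() - start)
    return "ok"

def client_call(server_timings):
    """Client side: the timer wraps the whole call, including both network legs."""
    start = time.perf_counter()
    time.sleep(NETWORK_DELAY_S)                # request travels to the server
    resp = server_handle_ingest(server_timings)
    time.sleep(NETWORK_DELAY_S)                # response travels back
    return resp, time.perf_counter() - start

server_timings = []
resp, end_to_end = client_call(server_timings)
server_side = server_timings[0]
print(f"end-to-end: {end_to_end * 1000:.0f} ms, server-side handling: {server_side * 1000:.0f} ms")
```

The end-to-end figure always exceeds the server-side figure by at least the simulated network time. A metric named "RPC duration" that is recorded only around the handler reports the smaller number.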

- Blocked by Concurrency Time: time spent waiting to be executed due to concurrency constraints.
- Apply Request Speed: speed of applying request to raft engine.
- Cached File in Memory: files cached in memory by the importer's apply requests.
- Engine Requests Unfinished: number of pending requests to raft engine.
Contributor:

The "raft engine" should be "raft store". This "Engine" actually refers to the `Engine` trait in TiKV (implemented by `RaftKv`), whereas "Raft Engine" specifically refers to the storage of the Raft state in TiKV.
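Two of the metrics discussed above are Blocked by Concurrency Time and the count of unfinished requests. A minimal Python sketch can illustrate how a wait time and an in-flight gauge are commonly measured. It assumes a simple semaphore-based concurrency limit. The class name, limit, and workload are hypothetical and do not reflect how TiKV actually implements these metrics.

```python
import threading
import time

# Hypothetical sketch: names and limits are illustrative, not TiKV's implementation.
MAX_CONCURRENCY = 2

class ApplyLimiter:
    def __init__(self, limit):
        self._sem = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self.unfinished = 0       # gauge: requests submitted but not yet finished
        self.blocked_times = []   # seconds each request waited for a free slot

    def run(self, work):
        with self._lock:
            self.unfinished += 1
        start = time.perf_counter()
        with self._sem:           # blocks while MAX_CONCURRENCY requests are running
            waited = time.perf_counter() - start
            with self._lock:
                self.blocked_times.append(waited)
            work()
        with self._lock:
            self.unfinished -= 1

limiter = ApplyLimiter(MAX_CONCURRENCY)
threads = [
    threading.Thread(target=limiter.run, args=(lambda: time.sleep(0.05),))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(round(w, 3) for w in limiter.blocked_times))
```

Four tasks share two slots, so the later tasks record a noticeable wait. The gauge returns to zero once every task has finished.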

- Import Ingest SST Bytes: number of bytes ingested.
- Import Download SST Throughput: SST download throughput in bytes per second.
- Import Local Write keys: ???
- Import Local Write bytes: ???
Contributor:

and this?

@Tristan1900 (Contributor Author), Aug 7, 2024:

Yeah, I guess that raises the problem of mixing unrelated metrics. We shouldn't expect to see Lightning-related metrics in the Backup & Restore dashboard.
Opened issue tikv/tikv#17369.

Signed-off-by: Wenqi Mou <[email protected]>
ti-chi-bot commented Aug 8, 2024

@BornChanger: adding LGTM is restricted to approvers and reviewers in OWNERS files.


- Import Ingest SST Duration: time spent on ingesting SST into RocksDB.
- Import Ingest SST Bytes: number of bytes ingested.
- Import Download SST Throughput: SST download throughput in bytes per second.
- TTL Expired: number of expired items after TTL in backup files.
@BornChanger (Contributor), Aug 8, 2024:

We can also skip this item since it is related to raw KV (`raw_kv`), and it also needs to be filtered out of the snapshot restore panel.

@@ -484,6 +484,45 @@ This section provides a detailed description of these key metrics on the **TiKV-
- Initial Scanning Trigger Reason: The reason for triggering incremental scanning
- Region Checkpoint Key Putting: The number of checkpoint operations logged to the PD

### Snapshot restore
Contributor:

Most of these are for import. Why call it Snapshot restore?

Contributor Author:

The purpose of this PR is to add metric descriptions for restore. As a new member of the team, I found it confusing to look at the SST importer metrics when they are actually used by restore. I feel that customers might have the same confusion when looking at the metrics, so I changed the name to Snapshot restore.

I have brought up with the team that we should probably have a restore abstraction on the TiKV side that still calls the SST importer internally. This would be more explicit and meaningful to users, and within it we could rename metrics or create new ones that are decoupled from the SST importer.

let me know if that makes sense!
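The layering proposed in this comment can be pictured with a minimal Python sketch. In this design, a restore-facing service delegates to the importer but owns its own metrics. `SstImporter`, `RestoreService`, and the metric names are hypothetical stand-ins chosen for illustration. They are not existing TiKV types or metrics.

```python
import time
from collections import defaultdict

# Hypothetical sketch of the proposed layering; all names are illustrative.
class SstImporter:
    """Stand-in for the existing low-level importer shared by several tools."""
    def download(self, file_name):
        return f"downloaded {file_name}"

    def ingest(self, file_name):
        return f"ingested {file_name}"

class RestoreService:
    """Restore-facing layer: delegates to the importer, records restore-specific metrics."""
    def __init__(self, importer):
        self._importer = importer
        self.metrics = defaultdict(list)  # metric name -> observed durations in seconds

    def _timed(self, metric, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        self.metrics[metric].append(time.perf_counter() - start)
        return result

    def restore_file(self, file_name):
        self._timed("restore_download_seconds", self._importer.download, file_name)
        return self._timed("restore_ingest_seconds", self._importer.ingest, file_name)

svc = RestoreService(SstImporter())
print(svc.restore_file("example.sst"))  # prints: ingested example.sst
```

Because the metrics live in the restore layer, they can be named for restore without renaming anything inside the shared importer.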

Contributor:

lgtm

Two review threads on `grafana-tikv-dashboard.md` (outdated, resolved)
Signed-off-by: Wenqi Mou <[email protected]>
@BornChanger (Contributor):

/cc @Oreoxmt please help merge the pr

@ti-chi-bot ti-chi-bot bot requested a review from Oreoxmt August 13, 2024 03:08
ti-chi-bot commented Aug 13, 2024

@BornChanger: GitHub didn't allow me to request PR reviews from the following users: merge, the, pr, please.

Note that only pingcap members and repo collaborators can review this PR, and authors cannot review their own PRs.


@Oreoxmt Oreoxmt added type/enhancement The issue or PR belongs to an enhancement. translation/doing This PR's assignee is translating this PR. labels Aug 13, 2024
@ti-chi-bot ti-chi-bot bot removed the missing-translation-status This PR does not have translation status info. label Aug 13, 2024
@Oreoxmt Oreoxmt added missing-translation-status This PR does not have translation status info. needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.0 needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. area/br Indicates that the Issue or PR belongs to the area of BR (Backup & Restore). labels Aug 13, 2024
@Oreoxmt Oreoxmt self-assigned this Aug 13, 2024
@Oreoxmt (Collaborator) commented Aug 20, 2024

/approve

ti-chi-bot commented Aug 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Oreoxmt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Aug 20, 2024
@ti-chi-bot ti-chi-bot bot merged commit ac5582d into pingcap:master Aug 20, 2024
9 checks passed
@ti-chi-bot (Member), in response to the cherry-pick labels, created new pull requests for these branches:

- release-6.5: #18617
- release-7.1: #18618
- release-7.5: #18619
- release-8.0: #18620
- release-8.1: #18621
- release-8.2: #18622
- release-8.3: #18623

ti-chi-bot pushed a commit to ti-chi-bot/docs that referenced this pull request Aug 20, 2024

@Oreoxmt Oreoxmt added translation/done This PR has been translated from English into Chinese and updated to pingcap/docs-cn in a PR. and removed translation/doing This PR's assignee is translating this PR. labels Nov 6, 2024
Labels
approved area/br Indicates that the Issue or PR belongs to the area of BR (Backup & Restore). first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. lgtm needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.3 Should cherry pick this PR to release-8.3 branch. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. translation/done This PR has been translated from English into Chinese and updated to pingcap/docs-cn in a PR. type/enhancement The issue or PR belongs to an enhancement.

7 participants