Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KVStore: Record table-wise memory usage and warn when it goes beyond the threshold #9835

Open
wants to merge 38 commits into
base: master
Choose a base branch
from

Conversation

CalvinNeo
Copy link
Member

@CalvinNeo CalvinNeo commented Jan 27, 2025

What problem does this PR solve?

Issue Number: ref #9722

Problem Summary:

  • Introduce a struct RegionTableSize which is shared by a Table and all its related Regions. Once registered, a region will update its memory change to this struct, so the table would learn.
  • Introduce maybeWarnMemoryLimitByTable and resetWarnMemoryLimitByTable, to log when the memory of a table goes beyond threshold.

Both of the systems works under the path that:

  • When a region is created, either for prehandling or inserting, it would register it to RegionTable, and get its shared RegionTableSize.
  • Ever since, the insert/delete from the Region would also reflect on this shared RegionTableSize.
  • After an insertion, if the memory consumed by a table associated with the given region is reported to go beyond the threshold propotion, and the total memory used is also beyond the given memory limit(which is a very corner case), then we will issue a log.
  • Afterwards, if the memory usage somehow goes upper, we don't print the log again for the same table.
  • If the memory goes down by committing txns, or applying snapshots, we will reset the warning. So if the table still consumes much memory, a new log will be printed.
  • If the region is splitted or merged, the table's memory usage is not affected. If the region is destroyed, the table's memory would be decresed.

Also:

  • Remove some useless structures in RegionTable.
  • Make safe ts related codes into SafeTsMgr.
  • Rename some methods in Region to reflect what they actually do.

What is changed and how it works?

Record table-wise memory usage and warn when it goes beyond threshold

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Signed-off-by: Calvin Neo <[email protected]>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. labels Jan 27, 2025
Copy link
Contributor

ti-chi-bot bot commented Jan 27, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from calvinneo, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 27, 2025
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 5, 2025
@CalvinNeo CalvinNeo changed the title dnm Record table-wise memory usage Record table-wise memory usage Feb 7, 2025
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 7, 2025
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
Signed-off-by: Calvin Neo <[email protected]>
@CalvinNeo CalvinNeo changed the title Record table-wise memory usage and warn when it goes beyond threshold KVStore: Record table-wise memory usage and warn when it goes beyond the threshold Feb 8, 2025
Signed-off-by: Calvin Neo <[email protected]>
Comment on lines 63 to 67
RegionTable::InternalRegion & RegionTable::insertRegion(Table & table, const Region & region)
{
const auto range = region.getRange();
return insertRegion(table, *range, region.id());
return insertRegion(table, *range, region);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make line 108 in RegionTable::getOrInsertRegion as return insertRegion(table, *region.getRange(), region); and remove this redundant function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Comment on lines +55 to +58

Type total() const { return payload + decoded; }

size_t totalSize() const { return payload + decoded; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need two method for the same result? Maybe just adding explicit cast when want to assign the result to differnet type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total() needs no casting, but is not used very much,
totalSize() needs casting, but is widely used in test, so it is we just write it here to eliminate some static_casting

std::atomic_int64_t table_size;
std::atomic_bool warned;
};
using RegionTableCtx = std::shared_ptr<RegionTableCtxInner>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename to RegionTableCtxPtr? Most alias of shared_ptr are xxxPtr in TiFlash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keep those function name to make them shorter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keep those function name to make them shorter?

I think its OK.

Copy link
Contributor

ti-chi-bot bot commented Feb 13, 2025

@CalvinNeo: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-integration-test 6ac0804 link true /test pull-integration-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants