Resource Isolation Feature Checkpoint 4: Extend cache warmup to allow multiple resource isolation groups and multiple replicas #12
Conversation
…ch should be quieted. Missing unit tests. Has some directional TODOs.
looks good
a few small things
List<Long> cnIdsOrderedByPreference = mapper.computeNodesForTablet(
        tabletId, props.numReplicasDesired, resourceIsolationGroupId);
if (cnIdsOrderedByPreference.size() < props.numReplicasDesired) {
    throw new DdlException(String.format("Requesting more replicas than we have available CN" +
Shall we add a TODO here? If requesting more replicas than are available, should we trigger a replica load task from S3 to the compute nodes?
Not totally sure what you mean by "replica load task"; that basically describes what is already being executed here. Are you saying we should schedule it for later, if/when we have more compute nodes available? I don't think that's necessarily a good approach.
My thinking is this: if we're requesting more replicas than we have compute nodes, we can't fulfill the intention of the statement, so I'm throwing an exception. The other option is to make a best-effort attempt and select all of the available compute nodes.
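To make the trade-off concrete, here is a hedged sketch contrasting the two options discussed above. All names are illustrative: only numReplicasDesired and DdlException echo the snippet, and the helper method itself is hypothetical, not part of this PR.

import java.util.List;

// Illustrative only: contrasts the fail-fast behavior in this PR with the
// "best effort" alternative of selecting every available compute node.
static List<Long> pickReplicaTargets(List<Long> cnIdsOrderedByPreference,
                                     int numReplicasDesired,
                                     boolean bestEffort) throws DdlException {
    if (cnIdsOrderedByPreference.size() >= numReplicasDesired) {
        return cnIdsOrderedByPreference.subList(0, numReplicasDesired);
    }
    if (bestEffort) {
        // Cache onto all available compute nodes instead of failing the statement.
        return cnIdsOrderedByPreference;
    }
    throw new DdlException(String.format(
            "Requesting %d replicas but only %d compute nodes are available",
            numReplicasDesired, cnIdsOrderedByPreference.size()));
}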
fe/fe-core/src/main/java/com/starrocks/qe/CacheSelectBackendSelector.java (resolved review thread, outdated)
fe/fe-core/src/main/java/com/starrocks/sql/analyzer/DataCacheStmtAnalyzer.java (resolved review thread)
looks good
one minor suggestion
// really shouldn't be getting ComputeNode references for un-matching resource isolation groups or unhealthy
// ComputeNodes. Instead of changing a bunch of code which uses the WorkerProvider in a specific way, this way
// limits scope to only change behavior when the user of the WorkerProvider sets this very specific option.
public void setAllowGetAnyWorker(boolean allowGetAnyWorker) {
Shall we use a more intuitive name? e.g. getWorkerFromUnmatchingIsolationGroup?
This specific function isn't getting any worker; it's signaling the intent to allow getting any worker.
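A minimal sketch of that opt-in pattern follows. Everything here except the setAllowGetAnyWorker name is hypothetical and simplified, not StarRocks' actual WorkerProvider API: callers such as the cache-warmup path would opt in explicitly, while every other user of the provider keeps the strict behavior.

import java.util.List;

// Hypothetical, simplified provider: every caller gets the strict behavior
// (workers only from the matching resource isolation group) unless it
// explicitly opts in via setAllowGetAnyWorker(true).
public class WorkerProviderSketch {
    private boolean allowGetAnyWorker = false;

    public void setAllowGetAnyWorker(boolean allowGetAnyWorker) {
        this.allowGetAnyWorker = allowGetAnyWorker;
    }

    public long selectWorker(List<Long> matchingGroupWorkers, List<Long> allWorkers) {
        // Fall back to any worker only when the caller has opted in.
        List<Long> candidates = (matchingGroupWorkers.isEmpty() && allowGetAnyWorker)
                ? allWorkers
                : matchingGroupWorkers;
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no compute nodes available");
        }
        return candidates.get(0);
    }
}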
…tion-3.3_cacheselect: need to cherry-pick from upstream the BE build fix for the error seen when loading: /opt/starrocks/be/lib/starrocks_be: error while loading shared libraries: libbfd-2.38-system.so: cannot open shared object file: No such file or directory
Why I'm doing:
This is part of the implementation of the Resource Isolation for Shared Data Mode feature.
What I'm doing:
Makes syntax and functionality changes to CACHE SELECT statements to allow caching data for specified resource isolation groups and more than one replica. They can now be of the form sketched below.
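The exact statement form was not captured in this extract; the following is a hedged sketch based only on the property names this PR mentions. The PROPERTIES spelling, the table, and the values are assumptions, not the actual syntax.

CACHE SELECT col1, col2 FROM sales_db.orders
PROPERTIES (
    "resource_isolation_groups" = "group_a,group_b",  -- assumed format: warm the cache for both groups
    "num_replicas" = "2"                              -- assumed: two cached replicas per group
);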
If no resource_isolation_groups are specified, uses the resource isolation group of the current frontend. If no num_replicas is specified, we assume 1.
Fixes #issue https://jira.pinadmin.com/browse/RTA-6269
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: