
VReplication: Optimize replication on target tablets #17166

Merged
20 commits merged into vitessio:main from vrepl_target_perf on Dec 3, 2024

Conversation


@mattlord mattlord commented Nov 7, 2024

Description

This PR makes two performance optimizations that reduce execution time and resource usage:

  1. For the copy phase: optimize the generation of row values that are used to build bulk INSERT statements.
  2. For the running/replicating phase: enable the VPlayerBatching feature by default (a brief sketch of the batching idea follows this list). This feature was added in v18 in #14502 (VReplication VPlayer: support statement and transaction batching). It has been enabled in a number of relevant end-to-end tests ever since, and it has been used at PlanetScale for a number of important workflows. Enabling it early in the v22 lifecycle also gives us a good 3-6 months for it to bake further on main.
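
For readers unfamiliar with VPlayerBatching, here is a minimal Go sketch of the batching idea. It is illustrative only: the type and method names are hypothetical, not the actual vplayer code. Statements applied within one replicated transaction are accumulated and sent to MySQL as a single multi-statement round trip at commit time, rather than one round trip per statement:

// Illustrative sketch only, not the vplayer implementation: statements queued
// for the current transaction are flushed as one semicolon-joined batch,
// modeling a single multi-statement round trip to MySQL.
package main

import (
	"fmt"
	"strings"
)

type statementBatcher struct {
	pending []string
}

// Add queues a statement for the current transaction's batch.
func (b *statementBatcher) Add(query string) {
	b.pending = append(b.pending, query)
}

// Commit sends all queued statements in one round trip and resets the batch.
func (b *statementBatcher) Commit(exec func(string) error) error {
	if len(b.pending) == 0 {
		return nil
	}
	batch := strings.Join(b.pending, ";")
	b.pending = b.pending[:0]
	return exec(batch)
}

func main() {
	b := &statementBatcher{}
	b.Add("insert into customer(customer_id, email) values (1, 'a@example.com')")
	b.Add("insert into customer(customer_id, email) values (2, 'b@example.com')")
	_ = b.Commit(func(q string) error {
		fmt.Println("one round trip:", q) // both inserts travel together
		return nil
	})
}

Cutting the number of round trips per transaction is where the running-phase catch-up improvement summarized below comes from.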

Summary:

  • Copy phase
    • The time it took to finish the copy phase went from 548 seconds to 489 seconds: approximately an 11% improvement
    • The tablet memory usage/allocations went from 19,842.93MiB to 2,667.88MiB: approximately a 7.43x improvement
  • Running phase
    • The time it took to catch up when subsequently inserting another 10,000,000 rows after the copy phase ended went from 270 seconds to 40 seconds: approximately a 6.75x improvement

The detailed results of the testing can be seen here: https://gist.github.com/mattlord/d6161a1c27f203a497b7bd6a69e7c467

The test setup was created this way:

cd examples/local
alias vtctldclient='command vtctldclient --server=localhost:15999'

# Enable the pprof endpoints
diff --git a/examples/common/scripts/vttablet-up.sh b/examples/common/scripts/vttablet-up.sh
index daa40aee89..282cd0553e 100755
--- a/examples/common/scripts/vttablet-up.sh
+++ b/examples/common/scripts/vttablet-up.sh
@@ -54,6 +54,7 @@ vttablet \
  --service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' \
  --pid_file $VTDATAROOT/$tablet_dir/vttablet.pid \
  --heartbeat_on_demand_duration=5s \
+ --pprof-http \
  > $VTDATAROOT/$tablet_dir/vttablet.out 2>&1 &



# Setup function to time the copy phase completion
function wait_for_workflow_running() {
    local keyspace=customer
    local workflow=commerce2customer
    local wait_secs=900
    local result=""

    echo "Waiting for the ${workflow} workflow in the ${keyspace} keyspace to finish the copy phase..."

    for _ in $(seq 1 ${wait_secs}); do
        result=$(vtctldclient Workflow --keyspace="${keyspace}" show --workflow="${workflow}" 2>/dev/null | grep "Copy phase completed")
        if [[ ${result} != "" ]]; then
            break
        fi
        sleep 1
    done

    if [[ ${result} == "" ]]; then
        echo "Timed out after ${wait_secs} seconds waiting for the ${workflow} workflow in the ${keyspace} keyspace to reach the running state"
    else
        echo "The ${workflow} workflow in the ${keyspace} keyspace is now running. $(sed -rn 's/.*"(Copy phase.*)".*/\1/p' <<< "${result}")."
    fi
}


# Setup the commerce keyspace
./101_initial_cluster.sh


# Load data in the customer table
table_file="${VTDATAROOT}/vt_0000000100/data/vt_commerce/customer.ibd"
commerce_primary_uid=$(vtctldclient GetTablets --keyspace commerce --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)

# Generate 5MiB of initial data
size=$((5*1024*1024))
while [[ $(stat -f "%z" "${table_file}") -lt ${size} ]]; do
    command mysql -u root --socket "${VTDATAROOT}/vt_0000000${commerce_primary_uid}/mysql.sock" vt_commerce -e "insert into customer (customer_id, email) values (${RANDOM}*${RANDOM}, '${RANDOM}[email protected]')" 2> /dev/null
done

say "Initial data load completed"

# Grow that to at least 10GiB
size=$((10*1024*1024*1024))
i=1
while [[ $(stat -f "%z" "${table_file}") -lt ${size} ]]; do
    command mysql -u root --socket "${VTDATAROOT}/vt_0000000${commerce_primary_uid}/mysql.sock" vt_commerce -e "insert into customer (email) select concat(${i}, email) from customer limit 5000000"
    let i=i+1
done

say "Full data load completed"


# Setup the customer keyspace
./201_customer_tablets.sh


# Move the customer table from the commerce keyspace to the customer keyspace
customer_primary_uid=$(vtctldclient GetTablets --keyspace customer --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)

# Profile target primary in a different shell
go tool pprof -seconds 130 "http://localhost:15${customer_primary_uid}/debug/pprof/profile"
(pprof) top 20 -ignore_regex runtime -cum

# Immediately start the workflow
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer create --source-keyspace commerce --tablet-types primary --tables "customer"
date
wait_for_workflow_running
date

say "Workflow is running"

go tool pprof "http://localhost:15${customer_primary_uid}/debug/pprof/allocs"
(pprof) top 30


# Now see how long it takes to replicate 10,000,000 new rows
command mysql -u root --socket "${VTDATAROOT}/vt_0000000${commerce_primary_uid}/mysql.sock" vt_commerce -e "insert into customer (email) select concat(${i}, email) from customer limit 10000000"
sleep 1
date
while [[ $(vtctldclient Workflow --keyspace="customer" show --workflow="commerce2customer" 2>/dev/null | jq -r '.[] | .[0].max_v_replication_transaction_lag') -gt 1 ]]; do
    sleep 1
done
date


# Cancel the workflow
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer cancel

Related Issue(s)

Checklist


vitess-bot bot commented Nov 7, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes); new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test; enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator.
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsBackportReason (if backport labels have been applied to a PR, a justification is required), NeedsDescriptionUpdate (the description is not clear or comprehensive enough, and needs work), NeedsIssue (a linked issue is missing for this Pull Request), and NeedsWebsiteDocsUpdate (what it says) labels Nov 7, 2024
@github-actions github-actions bot added this to the v22.0.0 milestone Nov 7, 2024

codecov bot commented Nov 9, 2024

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 67.41%. Comparing base (8c51043) to head (ae1928b).
Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
...blet/tabletmanager/vreplication/replicator_plan.go | 96.42% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17166      +/-   ##
==========================================
- Coverage   67.42%   67.41%   -0.01%     
==========================================
  Files        1574     1574              
  Lines      253299   253294       -5     
==========================================
- Hits       170781   170759      -22     
- Misses      82518    82535      +17     


Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord removed the NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels Nov 10, 2024
@@ -171,7 +171,7 @@ func (vc *vdbClient) Execute(query string) (*sqltypes.Result, error) {
 func (vc *vdbClient) ExecuteWithRetry(ctx context.Context, query string) (*sqltypes.Result, error) {
 	qr, err := vc.Execute(query)
 	for err != nil {
-		if sqlErr, ok := err.(*sqlerror.SQLError); ok && sqlErr.Number() == sqlerror.ERLockDeadlock || sqlErr.Number() == sqlerror.ERLockWaitTimeout {
+		if sqlErr, ok := err.(*sqlerror.SQLError); ok && (sqlErr.Number() == sqlerror.ERLockDeadlock || sqlErr.Number() == sqlerror.ERLockWaitTimeout) {
@mattlord mattlord commented Nov 10, 2024

This is unrelated, but I noticed that it was not correct as I was doing testing.

I'll open a separate PR for this since it's worth a quick backport to v19 to prevent a panic if the error we got back from the query was NOT an SQL error.
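
To make the failure mode concrete, here is a minimal, self-contained Go sketch of the precedence bug. The types here are hypothetical stand-ins; 1213 and 1205 are the MySQL error numbers behind sqlerror.ERLockDeadlock and sqlerror.ERLockWaitTimeout:

// Minimal illustration (hypothetical types, not Vitess code) of why the
// unparenthesized condition was wrong. In Go, && binds tighter than ||, so
// "ok && deadlock || timeout" parses as "(ok && deadlock) || timeout": when
// the type assertion fails, sqlErr is nil and the right-hand sqlErr.Number()
// call still runs and dereferences a nil receiver.
package main

import "fmt"

type sqlError struct{ num int }

func (e *sqlError) Error() string { return "sql error" }

// Number mirrors the accessor used in the real code; 1213 and 1205 are the
// MySQL codes for ER_LOCK_DEADLOCK and ER_LOCK_WAIT_TIMEOUT.
func (e *sqlError) Number() int { return e.num }

func buggyCheck(err error) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	sqlErr, ok := err.(*sqlError)
	// Parsed as (ok && ...) || ..., so the second Number() call still runs
	// when ok is false and sqlErr is nil.
	_ = ok && sqlErr.Number() == 1213 || sqlErr.Number() == 1205
	return false
}

func main() {
	fmt.Println(buggyCheck(fmt.Errorf("not a SQL error"))) // true: nil dereference panic
}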

@mattlord mattlord marked this pull request as ready for review November 10, 2024 06:03
@mattlord mattlord removed the request for review from harshit-gangal November 11, 2024 04:36
Signed-off-by: Matt Lord <[email protected]>
@@ -353,6 +353,10 @@ message FieldEvent {
   repeated query.Field fields = 2;
   string keyspace = 3;
   string shard = 4;
 
+  // Field numbers in the gap between shard (4) and enum_set_string_values
@mattlord mattlord commented Nov 11, 2024

The comments in this file are unrelated to this PR but I wanted to get them into main after the accidental field number gap came up in a discussion.

@mattlord mattlord commented Nov 11, 2024

Putting back in Draft to investigate test failures that seemingly came out of nowhere....

@mattlord mattlord marked this pull request as draft November 11, 2024 08:21
@mattlord mattlord commented Nov 11, 2024

> Putting back in Draft to investigate test failures that seemingly came out of nowhere....

Addressed via f0f61db and 21564fd

This impacted e2e tests that were not previously using VPlayerBatching.

The load generator constantly generates INSERTs, which are then
efficiently batched in vplayer, so we get ~7x more throughput than
before and thus need more time for filtered replication to catch
up after we've stopped it for the vdiff. ESPECIALLY since we're
using the --update-table-stats flag: the locking done by the
ANALYZE TABLE it triggers causes a pause in updates to the table
the load generator is inserting into -- in particular for the test
clusters that only have PRIMARY tablets, as everything interacts
with the primary directly.

Signed-off-by: Matt Lord <[email protected]>
@@ -35,7 +35,7 @@ import (
 )
 
 const (
-	vdiffTimeout = 120 * time.Second // We can leverage auto retry on error with this longer-than-usual timeout
+	vdiffTimeout = 180 * time.Second // We can leverage auto retry on error with this longer-than-usual timeout
@mattlord mattlord commented Nov 11, 2024

You can see the reason for the vdiff test changes in the commit message: f0f61db

It failed for some materializations

Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord marked this pull request as ready for review November 11, 2024 23:06
Signed-off-by: Matt Lord <[email protected]>
@shlomi-noach shlomi-noach left a comment

Looks good, and as long as all the tests pass it's great.

@@ -413,7 +413,7 @@ Flags:
 	--vreplication_copy_phase_duration duration Duration for each copy phase loop (before running the next catchup: default 1h) (default 1h0m0s)
 	--vreplication_copy_phase_max_innodb_history_list_length int The maximum InnoDB transaction history that can exist on a vstreamer (source) before starting another round of copying rows. This helps to limit the impact on the source tablet. (default 1000000)
 	--vreplication_copy_phase_max_mysql_replication_lag int The maximum MySQL replication lag (in seconds) that can exist on a vstreamer (source) before starting another round of copying rows. This helps to limit the impact on the source tablet. (default 43200)
-	--vreplication_experimental_flags int (Bitmask) of experimental features in vreplication to enable (default 3)
+	--vreplication_experimental_flags int (Bitmask) of experimental features in vreplication to enable (default 7)

👍

}
fieldsIndex++
field = tp.Fields[fieldsIndex]
length = row.Lengths[fieldsIndex]

This looks fine. I don't have the context to understand why this change is necessarily faster. Can you please explain?


It cuts out the overhead of the intermediate data structure colInfo: mainly in memory allocation, and also a bit of CPU, because of the large number of rows inserted during the copy phase.
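
To illustrate the point, here is a hedged sketch, under the assumption that rows arrive in the wire format of a packed value buffer plus per-column lengths (with a negative length meaning NULL); the names are illustrative, not the actual replicator_plan.go code. A column value can be sliced straight out of the packed buffer, so no intermediate per-column structure needs to be allocated for each of the millions of copied rows:

// Hypothetical sketch of slicing a column value directly out of a packed
// row buffer, avoiding any per-row intermediate allocation.
package main

import "fmt"

type row struct {
	Lengths []int64 // one entry per column; -1 means SQL NULL
	Values  []byte  // all non-NULL column values packed back to back
}

// columnValue walks the Lengths prefix to find the column's offset, then
// returns a sub-slice of the shared buffer; the bool reports non-NULL.
func columnValue(r row, col int) ([]byte, bool) {
	var offset int64
	for i := 0; i < col; i++ {
		if l := r.Lengths[i]; l > 0 {
			offset += l
		}
	}
	l := r.Lengths[col]
	if l < 0 {
		return nil, false // NULL
	}
	return r.Values[offset : offset+l], true
}

func main() {
	r := row{Lengths: []int64{1, -1, 5}, Values: []byte("1alice")}
	v, ok := columnValue(r, 2)
	fmt.Println(string(v), ok) // alice true
}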

@shlomi-noach (Contributor) commented:

Just FYI, for some related purposes I'm running a local benchmark comparing non-batched, batched, and "other" optimized vreplication speed. The benchmark uses a case similar to https://github.com/vitessio/vitess/blob/main/go/test/endtoend/onlineddl/vrepl_stress/onlineddl_vrepl_mini_stress_test.go: a table, limited to 4096 rows, and some 12 concurrent writers continuously INSERTing, DELETEing, and UPDATEing the same range of rows, with high conflict probability.

The benchmark is to throttle an Online DDL migration, run the workload for a fixed time (10s in this case), stop, unthrottle, and measure the time by which vplayer catches up. It's not perfectly accurate because it includes Online DDL overhead, but the numbers I'm seeing are:

  • Some 17s-19s catchup time for the non-batched vplayer
  • Some 6s-8s catchup time for the batched vplayer

In reality, we should subtract a couple of seconds from each. In any case, in that benchmark, the batched vplayer is some 3-4 times faster than the non-batched one.

@rohit-nayak-ps rohit-nayak-ps left a comment

nice optimisation!

for i, loc := range bindLocations {
	field = tp.Fields[fieldsIndex]
	length := row.Lengths[fieldsIndex]
	for tp.FieldsToSkip[strings.ToLower(field.Name)] {

Not sure why we are using the for loop to skip. This feels more intuitive.

if tp.FieldsToSkip[strings.ToLower(field.Name)] {
	if length > 0 {
		offset += length
	}
	fieldsIndex++
	continue
}

@mattlord mattlord Dec 3, 2024

Because we don't want to move to the next bind location. We have a bind location X, and we want to get the corresponding next non-skipped field to bind to that location.
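
A small runnable sketch of that control flow (illustrative names only, not the actual code): the inner for holds the current bind location fixed while the field cursor advances past skipped fields; an if ... continue on the outer loop would advance the bind location too and misalign values:

// Hypothetical demonstration of pairing each bind location with the next
// non-skipped field, rather than skipping bind locations themselves.
package main

import "fmt"

func main() {
	fields := []string{"id", "skipped_col", "email"} // source fields in order
	skip := map[string]bool{"skipped_col": true}     // fields excluded from the INSERT
	bindLocations := 2                               // the INSERT has two placeholders

	fieldsIndex := 0
	for loc := 0; loc < bindLocations; loc++ {
		// Hold this bind location fixed and advance the field cursor past
		// any skipped fields.
		for skip[fields[fieldsIndex]] {
			fieldsIndex++
		}
		fmt.Printf("bind location %d <- field %q\n", loc, fields[fieldsIndex])
		fieldsIndex++
	}
}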


@mattlord mattlord merged commit 551a5f7 into vitessio:main Dec 3, 2024
100 checks passed
@mattlord mattlord deleted the vrepl_target_perf branch December 3, 2024 14:48