
VReplication: Optimize replication on target tablets #17166

Merged
20 commits merged into vitessio:main from vrepl_target_perf on Dec 3, 2024

Conversation


@mattlord mattlord commented Nov 7, 2024

Description

This PR makes two performance optimizations that reduce execution time and resource usage:

  1. For the copy phase: optimize the generation of row values that are used to build bulk INSERT statements.
  2. For the running/replicating phase: enable the VPlayerBatching feature by default (a brief sketch of the batching idea follows this list). This feature was added in v18 in #14502 (VReplication VPlayer: support statement and transaction batching). It has been enabled in a number of relevant end-to-end tests ever since, and it has been used at PlanetScale for a number of important workflows. Enabling it early in the v22 lifecycle also gives us a good 3-6 months for it to bake further on main.
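
For readers unfamiliar with VPlayerBatching, here is a minimal Go sketch of the batching idea. It is illustrative only: the type and method names are hypothetical, not the actual vplayer code. Statements applied within one replicated transaction are accumulated and sent to MySQL as a single multi-statement round trip at commit time, rather than one round trip per statement:

// Illustrative sketch only, not the vplayer implementation: statements queued
// for the current transaction are flushed as one semicolon-joined batch,
// modeling a single multi-statement round trip to MySQL.
package main

import (
	"fmt"
	"strings"
)

type statementBatcher struct {
	pending []string
}

// Add queues a statement for the current transaction's batch.
func (b *statementBatcher) Add(query string) {
	b.pending = append(b.pending, query)
}

// Commit sends all queued statements in one round trip and resets the batch.
func (b *statementBatcher) Commit(exec func(string) error) error {
	if len(b.pending) == 0 {
		return nil
	}
	batch := strings.Join(b.pending, ";")
	b.pending = b.pending[:0]
	return exec(batch)
}

func main() {
	b := &statementBatcher{}
	b.Add("insert into customer(customer_id, email) values (1, 'a@example.com')")
	b.Add("insert into customer(customer_id, email) values (2, 'b@example.com')")
	_ = b.Commit(func(q string) error {
		fmt.Println("one round trip:", q) // both inserts travel together
		return nil
	})
}

Cutting the number of round trips per transaction is where the running-phase catch-up improvement summarized below comes from.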

Summary:

  • Copy phase
    • The time it took to finish the copy phase went from 548 seconds to 489 seconds: approximately an 11% improvement
    • The tablet memory usage/allocations went from 19,842.93MiB to 2,667.88MiB: approximately a 7.43x improvement
  • Running phase
    • The time it took to catch up when subsequently inserting another 10,000,000 rows after the copy phase ended went from 270 seconds to 40 seconds: approximately a 6.75x improvement

The detailed results of the testing can be seen here: https://gist.github.com/mattlord/d6161a1c27f203a497b7bd6a69e7c467

The test setup was created this way:

cd examples/local
alias vtctldclient='command vtctldclient --server=localhost:15999'

# Enable the pprof endpoints
diff --git a/examples/common/scripts/vttablet-up.sh b/examples/common/scripts/vttablet-up.sh
index daa40aee89..282cd0553e 100755
--- a/examples/common/scripts/vttablet-up.sh
+++ b/examples/common/scripts/vttablet-up.sh
@@ -54,6 +54,7 @@ vttablet \
  --service_map 'grpc-queryservice,grpc-tabletmanager,grpc-updatestream' \
  --pid_file $VTDATAROOT/$tablet_dir/vttablet.pid \
  --heartbeat_on_demand_duration=5s \
+ --pprof-http \
  > $VTDATAROOT/$tablet_dir/vttablet.out 2>&1 &



# Setup function to time the copy phase completion
function wait_for_workflow_running() {
    local keyspace=customer
    local workflow=commerce2customer
    local wait_secs=900
    local result=""

    echo "Waiting for the ${workflow} workflow in the ${keyspace} keyspace to finish the copy phase..."

    for _ in $(seq 1 ${wait_secs}); do
        result=$(vtctldclient Workflow --keyspace="${keyspace}" show --workflow="${workflow}" 2>/dev/null | grep "Copy phase completed")
        if [[ ${result} != "" ]]; then
            break
        fi
        sleep 1
    done

    if [[ ${result} == "" ]]; then
        echo "Timed out after ${wait_secs} seconds waiting for the ${workflow} workflow in the ${keyspace} keyspace to reach the running state"
    else
        echo "The ${workflow} workflow in the ${keyspace} keyspace is now running. $(sed -rn 's/.*"(Copy phase.*)".*/\1/p' <<< "${result}")."
    fi
}


# Setup the commerce keyspace
./101_initial_cluster.sh


# Load data in the customer table
table_file="${VTDATAROOT}/vt_0000000100/data/vt_commerce/customer.ibd"
commerce_primary_uid=$(vtctldclient GetTablets --keyspace commerce --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)

# Generate 5MiB of initial data
size=$((5*1024*1024))
while [[ $(stat -f "%z" "${table_file}") -lt ${size} ]]; do
    command mysql -u root --socket "${VTDATAROOT}/vt_0000000${commerce_primary_uid}/mysql.sock" vt_commerce -e "insert into customer (customer_id, email) values (${RANDOM}*${RANDOM}, '${RANDOM}[email protected]')" 2> /dev/null
done

say "Initial data load completed"

# Grow that to at least 10GiB
size=$((10*1024*1024*1024))
i=1
while [[ $(stat -f "%z" "${table_file}") -lt ${size} ]]; do
    command mysql -u root --socket "${VTDATAROOT}/vt_0000000${commerce_primary_uid}/mysql.sock" vt_commerce -e "insert into customer (email) select concat(${i}, email) from customer limit 5000000"
    let i=i+1
done

say "Full data load completed"


# Setup the customer keyspace
./201_customer_tablets.sh


# Move the customer table from the commerce keyspace to the customer keyspace
customer_primary_uid=$(vtctldclient GetTablets --keyspace customer --tablet-type primary --shard "0" | awk '{print $1}' | cut -d- -f2 | bc)

# Profile target primary in a different shell
go tool pprof -seconds 130 "http://localhost:15${customer_primary_uid}/debug/pprof/profile"
(pprof) top 20 -ignore_regex runtime -cum

# Immediately start the workflow
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer create --source-keyspace commerce --tablet-types primary --tables "customer"
date
wait_for_workflow_running
date

say "Workflow is running"

go tool pprof "http://localhost:15${customer_primary_uid}/debug/pprof/allocs"
(pprof) top 30


# Now see how long it takes to replicate 10,000,000 new rows
command mysql -u root --socket "${VTDATAROOT}/vt_0000000${commerce_primary_uid}/mysql.sock" vt_commerce -e "insert into customer (email) select concat(${i}, email) from customer limit 10000000"
sleep 1
date
while [[ $(vtctldclient Workflow --keyspace="customer" show --workflow="commerce2customer" 2>/dev/null | jq -r '.[] | .[0].max_v_replication_transaction_lag') -gt 1 ]]; do
    sleep 1
done
date


# Cancel the workflow
vtctldclient MoveTables --workflow commerce2customer --target-keyspace customer cancel

Related Issue(s)

Checklist


vitess-bot bot commented Nov 7, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes); new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test; enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator.
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsBackportReason (if backport labels have been applied to a PR, a justification is required), NeedsDescriptionUpdate (the description is not clear or comprehensive enough, and needs work), NeedsIssue (a linked issue is missing for this Pull Request), and NeedsWebsiteDocsUpdate (what it says) labels Nov 7, 2024
@github-actions github-actions bot added this to the v22.0.0 milestone Nov 7, 2024

codecov bot commented Nov 9, 2024

Codecov Report

Attention: Patch coverage is 96.55172% with 1 line in your changes missing coverage. Please review.

Project coverage is 67.41%. Comparing base (8c51043) to head (ae1928b).
Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
...blet/tabletmanager/vreplication/replicator_plan.go | 96.42% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17166      +/-   ##
==========================================
- Coverage   67.42%   67.41%   -0.01%     
==========================================
  Files        1574     1574              
  Lines      253299   253294       -5     
==========================================
- Hits       170781   170759      -22     
- Misses      82518    82535      +17     


Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord removed the NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels Nov 10, 2024
@@ -171,7 +171,7 @@ func (vc *vdbClient) Execute(query string) (*sqltypes.Result, error) {
 func (vc *vdbClient) ExecuteWithRetry(ctx context.Context, query string) (*sqltypes.Result, error) {
 	qr, err := vc.Execute(query)
 	for err != nil {
-		if sqlErr, ok := err.(*sqlerror.SQLError); ok && sqlErr.Number() == sqlerror.ERLockDeadlock || sqlErr.Number() == sqlerror.ERLockWaitTimeout {
+		if sqlErr, ok := err.(*sqlerror.SQLError); ok && (sqlErr.Number() == sqlerror.ERLockDeadlock || sqlErr.Number() == sqlerror.ERLockWaitTimeout) {
@mattlord mattlord commented Nov 10, 2024

This is unrelated, but I noticed that it was not correct as I was doing testing.

I'll open a separate PR for this since it's worth a quick backport to v19 to prevent a panic if the error we got back from the query was NOT an SQL error.
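
To make the failure mode concrete, here is a minimal, self-contained Go sketch of the precedence bug. The types here are hypothetical stand-ins; 1213 and 1205 are the MySQL error numbers behind sqlerror.ERLockDeadlock and sqlerror.ERLockWaitTimeout:

// Minimal illustration (hypothetical types, not Vitess code) of why the
// unparenthesized condition was wrong. In Go, && binds tighter than ||, so
// "ok && deadlock || timeout" parses as "(ok && deadlock) || timeout": when
// the type assertion fails, sqlErr is nil and the right-hand sqlErr.Number()
// call still runs and dereferences a nil receiver.
package main

import "fmt"

type sqlError struct{ num int }

func (e *sqlError) Error() string { return "sql error" }

// Number mirrors the accessor used in the real code; 1213 and 1205 are the
// MySQL codes for ER_LOCK_DEADLOCK and ER_LOCK_WAIT_TIMEOUT.
func (e *sqlError) Number() int { return e.num }

func buggyCheck(err error) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	sqlErr, ok := err.(*sqlError)
	// Parsed as (ok && ...) || ..., so the second Number() call still runs
	// when ok is false and sqlErr is nil.
	_ = ok && sqlErr.Number() == 1213 || sqlErr.Number() == 1205
	return false
}

func main() {
	fmt.Println(buggyCheck(fmt.Errorf("not a SQL error"))) // true: nil dereference panic
}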

@mattlord mattlord marked this pull request as ready for review November 10, 2024 06:03
@mattlord mattlord removed the request for review from harshit-gangal November 11, 2024 04:36
Signed-off-by: Matt Lord <[email protected]>
@@ -353,6 +353,10 @@ message FieldEvent {
   repeated query.Field fields = 2;
   string keyspace = 3;
   string shard = 4;
 
+  // Field numbers in the gap between shard (4) and enum_set_string_values
@mattlord mattlord commented Nov 11, 2024

The comments in this file are unrelated to this PR but I wanted to get them into main after the accidental field number gap came up in a discussion.

@mattlord mattlord commented Nov 11, 2024

Putting back in Draft to investigate test failures that seemingly came out of nowhere....

@mattlord mattlord marked this pull request as draft November 11, 2024 08:21
@mattlord mattlord commented Nov 11, 2024

> Putting back in Draft to investigate test failures that seemingly came out of nowhere....

Addressed via f0f61db and 21564fd

This impacted e2e tests that were not previously using VPlayerBatching.

The load generator constantly generates INSERTs, which are then
efficiently batched in vplayer, so we get ~7x more throughput than
before and thus need more time for filtered replication to catch
up after we've stopped it for the vdiff. ESPECIALLY since we're
using the --update-table-stats flag: the locking done by the
ANALYZE TABLE it triggers causes a pause in updates to the table
the load generator is inserting into -- in particular for the test
clusters that only have PRIMARY tablets, as everything interacts
with the primary directly.

Signed-off-by: Matt Lord <[email protected]>
@@ -35,7 +35,7 @@ import (
 )
 
 const (
-	vdiffTimeout = 120 * time.Second // We can leverage auto retry on error with this longer-than-usual timeout
+	vdiffTimeout = 180 * time.Second // We can leverage auto retry on error with this longer-than-usual timeout
@mattlord mattlord commented Nov 11, 2024

You can see the reason for the vdiff test changes in the commit message: f0f61db

It failed for some materializations

Signed-off-by: Matt Lord <[email protected]>
@mattlord mattlord marked this pull request as ready for review November 11, 2024 23:06
Signed-off-by: Matt Lord <[email protected]>
@shlomi-noach shlomi-noach left a comment

Looks good, and as long as all the tests pass it's great.

@@ -413,7 +413,7 @@ Flags:
 	--vreplication_copy_phase_duration duration Duration for each copy phase loop (before running the next catchup: default 1h) (default 1h0m0s)
 	--vreplication_copy_phase_max_innodb_history_list_length int The maximum InnoDB transaction history that can exist on a vstreamer (source) before starting another round of copying rows. This helps to limit the impact on the source tablet. (default 1000000)
 	--vreplication_copy_phase_max_mysql_replication_lag int The maximum MySQL replication lag (in seconds) that can exist on a vstreamer (source) before starting another round of copying rows. This helps to limit the impact on the source tablet. (default 43200)
-	--vreplication_experimental_flags int (Bitmask) of experimental features in vreplication to enable (default 3)
+	--vreplication_experimental_flags int (Bitmask) of experimental features in vreplication to enable (default 7)

👍

}
fieldsIndex++
field = tp.Fields[fieldsIndex]
length = row.Lengths[fieldsIndex]

This looks fine. I don't have the context to understand why this change is necessarily faster. Can you please explain?


It cuts out the overhead of the intermediate data structure colInfo: mainly in memory allocation, and also a bit of CPU, because of the large number of rows inserted during the copy phase.
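
To illustrate the point, here is a hedged sketch, under the assumption that rows arrive in the wire format of a packed value buffer plus per-column lengths (with a negative length meaning NULL); the names are illustrative, not the actual replicator_plan.go code. A column value can be sliced straight out of the packed buffer, so no intermediate per-column structure needs to be allocated for each of the millions of copied rows:

// Hypothetical sketch of slicing a column value directly out of a packed
// row buffer, avoiding any per-row intermediate allocation.
package main

import "fmt"

type row struct {
	Lengths []int64 // one entry per column; -1 means SQL NULL
	Values  []byte  // all non-NULL column values packed back to back
}

// columnValue walks the Lengths prefix to find the column's offset, then
// returns a sub-slice of the shared buffer; the bool reports non-NULL.
func columnValue(r row, col int) ([]byte, bool) {
	var offset int64
	for i := 0; i < col; i++ {
		if l := r.Lengths[i]; l > 0 {
			offset += l
		}
	}
	l := r.Lengths[col]
	if l < 0 {
		return nil, false // NULL
	}
	return r.Values[offset : offset+l], true
}

func main() {
	r := row{Lengths: []int64{1, -1, 5}, Values: []byte("1alice")}
	v, ok := columnValue(r, 2)
	fmt.Println(string(v), ok) // alice true
}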

@shlomi-noach (Contributor) commented:

Just FYI, for some related purposes I'm running a local benchmark comparing non-batched, batched, and "other" optimized vreplication speed. The benchmark uses a case similar to https://github.com/vitessio/vitess/blob/main/go/test/endtoend/onlineddl/vrepl_stress/onlineddl_vrepl_mini_stress_test.go: a table, limited to 4096 rows, and some 12 concurrent writers continuously INSERTing, DELETEing, and UPDATEing the same range of rows, with high conflict probability.

The benchmark is to throttle an Online DDL migration, run the workload for a fixed time (10s in this case), stop, unthrottle, and measure the time by which vplayer catches up. It's not perfectly accurate because it includes Online DDL overhead, but the numbers I'm seeing are:

  • Some 17s-19s catchup time for the non-batched vplayer
  • Some 6s-8s catchup time for the batched vplayer

In reality, we should subtract a couple of seconds from each. In any case, in that benchmark, the batched vplayer is some 3-4 times faster than the non-batched one.

@rohit-nayak-ps rohit-nayak-ps left a comment

nice optimisation!

for i, loc := range bindLocations {
	field = tp.Fields[fieldsIndex]
	length := row.Lengths[fieldsIndex]
	for tp.FieldsToSkip[strings.ToLower(field.Name)] {

Not sure why we are using the for loop to skip. This feels more intuitive.

if tp.FieldsToSkip[strings.ToLower(field.Name)] {
	if length > 0 {
		offset += length
	}
	fieldsIndex++
	continue
}

@mattlord mattlord Dec 3, 2024

Because we don't want to move to the next bind location. We have a bind location X, and we want to get the corresponding next non-skipped field to bind to that location.
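
A small runnable sketch of that control flow (illustrative names only, not the actual code): the inner for holds the current bind location fixed while the field cursor advances past skipped fields; an if ... continue on the outer loop would advance the bind location too and misalign values:

// Hypothetical demonstration of pairing each bind location with the next
// non-skipped field, rather than skipping bind locations themselves.
package main

import "fmt"

func main() {
	fields := []string{"id", "skipped_col", "email"} // source fields in order
	skip := map[string]bool{"skipped_col": true}     // fields excluded from the INSERT
	bindLocations := 2                               // the INSERT has two placeholders

	fieldsIndex := 0
	for loc := 0; loc < bindLocations; loc++ {
		// Hold this bind location fixed and advance the field cursor past
		// any skipped fields.
		for skip[fields[fieldsIndex]] {
			fieldsIndex++
		}
		fmt.Printf("bind location %d <- field %q\n", loc, fields[fieldsIndex])
		fieldsIndex++
	}
}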


@mattlord mattlord merged commit 551a5f7 into vitessio:main Dec 3, 2024
100 checks passed
@mattlord mattlord deleted the vrepl_target_perf branch December 3, 2024 14:48