Skip to content
This repository has been archived by the owner on Aug 21, 2023. It is now read-only.

replace go-sql-connector/mysql with siddontang/go-mysql/client to improve dumpling's performance #126

Open
lichunzhu opened this issue Aug 4, 2020 · 25 comments
Assignees
Labels
help wanted priority/P2 Medium priority issue

Comments

@lichunzhu
Copy link
Contributor

lichunzhu commented Aug 4, 2020

Description

Background

Currently dumpling's performance is only 1/2 to 2/3 of mydumper. There are two parts that cost a lot of time for dumpling.

  1. After analyzing the torch graph and doing some simple tests we find that dumpling costs a lot of time in fetch one row.

That's because driver.Value in database/sql package is an interface{} type variable.

When we convert []byte type varible to interface{} type, it will use runtime.mallocgc in runtime.convTslice to do that which will cost a lot of time, but we can't change driver.Value.

One solution is to abandon the usage of database/sql and directly use the []byte value readed from mysql server. But this is a huge change for dumpling.

  1. Now dumpling will do these things in serial:

read a row -> escape string -> write to buffer -> read next row ...

Actually, when we escape the value, we can start to read another row to improve the performance. But it seems hard for database/sql package to implement this function, which means we may have to implement the MySQL client by ourselves.

Reference

  1. convertion code in go-mysql: https://github.com/go-sql-driver/mysql/blob/73dc904a9ece5c074295a77abb7a135797a351bf/packets.go#L770
  2. dumpling torch graph:
    image
  3. mydumper torch graph:
    image
  4. code to test the effiency when assign []byte to interface{}
    assign []byte to interface{} will cost much more time than assign to []byte in this test.
    https://gist.github.com/lichunzhu/2433d332b4bfc57fb7c1aa3f404b4c58
  5. Test: If we use dumpling with only scan (disable escape and write), it will cost the same time as mydumper both write and read.
    Revelant torch: image

Tasks

  • improve dumpling's performance, make it better than mydumper (for both single-threaded and multi-threaded running)
    • one possible approach is to replace go-sql-connector/mysql with siddontang/go-mysql/client to improve dumpling's performance. What's more, we need to refactor this package to parallel reading from database, escaping chapters and writing to disks.

Score

  • 6600

Mentor

Recommended Skills

  • performance improvement for golang
@kennytm kennytm added difficulty/3-hard priority/P2 Medium priority issue labels Aug 4, 2020
@lance6716
Copy link
Collaborator

append some information which maybe helpful:

runtime.mallocgc in runtime.convTslice will use per-goroutine's mcache to allocate the space for storing internal interface structure. mcache has preserved some fixed-size slot for allocation, when ran out, mcache will be dynamically resized from other memory.

@Rustin170506
Copy link
Member

/ping

@ti-challenge-bot
Copy link

pong! I am challenge bot.

@ti-challenge-bot

This comment has been minimized.

1 similar comment
@ti-challenge-bot

This comment has been minimized.

@kennytm

This comment has been minimized.

@Rustin170506

This comment has been minimized.

@kennytm

This comment has been minimized.

@ti-challenge-bot ti-challenge-bot bot added picked and removed picked labels Sep 17, 2020
@pingcap pingcap deleted a comment from ti-challenge-bot bot Sep 17, 2020
@pingcap pingcap deleted a comment from Rustin170506 Sep 17, 2020
@pingcap pingcap deleted a comment from ti-challenge-bot bot Sep 17, 2020
@pingcap pingcap deleted a comment from Rustin170506 Sep 17, 2020
@AndreMouche
Copy link
Collaborator

/assign @kennytm

@sylzd
Copy link
Contributor

sylzd commented Sep 27, 2020

/pick-up

@ti-challenge-bot
Copy link

The challenge program issue is already in the assign flow, so you cannot pick up this issue. But the current issue needs help, you can contact @kennytm to try to solve this issue together.

@kennytm kennytm removed their assignment Sep 27, 2020
@sylzd
Copy link
Contributor

sylzd commented Sep 27, 2020

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

@ti-challenge-bot
Copy link

@sylzd You did not submit PR within 7 days, so give up automatically.

@ti-challenge-bot ti-challenge-bot bot removed the picked label Oct 4, 2020
@sylzd
Copy link
Contributor

sylzd commented Oct 4, 2020

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

@ti-challenge-bot
Copy link

@sylzd You did not submit PR within 7 days, so give up automatically.

@ti-challenge-bot ti-challenge-bot bot removed the picked label Oct 11, 2020
@sylzd
Copy link
Contributor

sylzd commented Oct 13, 2020

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

@ti-challenge-bot
Copy link

@sylzd You did not submit PR within 7 days, so give up automatically.

@ti-challenge-bot ti-challenge-bot bot removed the picked label Oct 20, 2020
@kennytm
Copy link
Collaborator

kennytm commented Oct 20, 2020

@sylzd hello, there hasn't been much updates since the pick up, do you need some help?

@sylzd
Copy link
Contributor

sylzd commented Oct 20, 2020

@kennytm yes, replace driver may not help,it cost double. we are trying to replace database/sql or something else.

  1. our debug flame graph result shows that only 7.44% convTslice cost.
    Command:
time ./dumpling -h 10.162.1.1 -u dump_test2 -p* -P 4000 --filetype sql -r 10000 --threads 16 -B sbtest -o dumpling_output

FlameGraph:
image

  1. when we replace go-sql-driver with go-mysql, convTslice only cost 1.22% and real time is double than before.(it cost too much to query)
    image
    image

@lichunzhu
Copy link
Contributor Author

lichunzhu commented Oct 20, 2020

Do you replace go-sql-driver with go-mysql/driver? Or do you go-sql-driver and database/sql with go-mysql/client?

@sylzd
Copy link
Contributor

sylzd commented Oct 20, 2020

/pick-up

@ti-challenge-bot
Copy link

Pick up success.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
help wanted priority/P2 Medium priority issue
Projects
None yet
Development

No branches or pull requests

7 participants