TiKV initial scan task not canceled/terminated when changefeed paused or failed #11638
Comments
/severity major
Pausing or removing a changefeed closes the kv client, which should also close and cancel the gRPC connections to TiKV-CDC.
From the logs above, we know that both the changefeed and the processor were closed. However, some logs still indicate that the kv client is fetching data: a region reports an error, and the client then reconnects to that region.
These logs indicate that the processor was blocked for a long time, which may be the root cause of the issue.
One log indicates that closing the puller took more than 21 minutes. Another processor has no such log and was blocked for more than 2.5 hours.
The root cause is that the kv client cannot unregister its event handler from the worker pool immediately. Closing a puller also closes its kv client, which must first unregister the event handler from the worker pool. This takes too long, so pullers cannot be closed promptly. Each table has its own puller, so when there are many tables, some pullers are not yet closed and keep working normally. This issue should not exist after v7.5, since the kv client was refactored and the worker pool is no longer used.
I still find workerpool in v7.5: Lines 350 to 352 in d2c35e4
What did you do?
What did you expect to see?
What did you see instead?
After step 1, initialization of one changefeed finished (all tables were added to the changefeed).
After step 3, the table count of the initialized changefeed stayed at 0 for more than 1.4 hours before it was paused.
After step 4, the number of pending initial scan tasks decreased, but very slowly.
After step 6, the pending tasks disappeared.
Versions of the cluster
v6.5.9