[Bug]: autovacuum worker took too long to start; canceled #7485
Comments
This is quite an old version of TimescaleDB. I suggest updating to a newer version to see if the problem persists.
TimescaleDB releases are relatively frequent: version 2.11.1 was released in June of last year, and the latest version is already 2.17.2. Rapid releases indicate that the project is active, which is a good thing. However, upgrading involves some cost for our application, unless we can confirm that this issue is caused by TimescaleDB. Our data volume is fairly large, roughly 5 million records per hour. Issue #6730 mentions that a table with 1 million records can trigger a failure to start autovacuum. From the logs above, it appears that at that time our database not only failed to start autovacuum but also experienced many slow SQL queries, to the point where new connections to the database could not be established.
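To confirm whether autovacuum is actually reaching the chunk tables, something like the following might help (a sketch using standard PostgreSQL settings and catalogs; `_timescaledb_internal` is the schema where TimescaleDB stores chunk tables, and the `LIMIT` is arbitrary):

```sql
-- Log every autovacuum run (0 logs all runs regardless of duration).
-- This is plain PostgreSQL configuration, not TimescaleDB-specific.
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();

-- See when autovacuum last ran on the chunk tables with the most dead tuples.
SELECT relname, last_autovacuum, autovacuum_count, n_dead_tup
FROM pg_stat_all_tables
WHERE schemaname = '_timescaledb_internal'
ORDER BY n_dead_tup DESC
LIMIT 10;
```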
Having looked at this a bit, I found an interesting post on the pgsql-hackers mailing list that seems very similar to what you are experiencing, also on the Windows platform. Are you able to make new connections to the server once this starts happening? I assume that is what is going on, so the autovacuum workers cannot start. Have you perhaps reached max_connections? At this point, it seems like this has nothing to do with TimescaleDB but is rather a PG/Windows issue. Hope this helps.
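One way to check this the next time it happens (a sketch using the standard `pg_stat_activity` view and settings; nothing here is TimescaleDB-specific):

```sql
-- Compare the number of active backends against the configured limit.
SELECT count(*) AS current_connections,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity;

-- Some slots are reserved for superusers, so ordinary connections can
-- start failing before the count reaches max_connections.
SHOW superuser_reserved_connections;
```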
Thank you for your reply and findings. The case you found is very similar to ours. At that time, new connections could not be established either, and we had not reached the max_connections limit (max_connections was set to 200, but there were fewer than 50 connections at the time of the failure). Another similarity is that both cases used tablespaces set up on other disks. That case is from 2017; has Postgres still not resolved this issue after all this time?
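For reference, the tablespaces in play and their on-disk locations can be listed with a standard catalog query (nothing TimescaleDB-specific):

```sql
-- List all tablespaces and where they live on disk.
SELECT spcname, pg_tablespace_location(oid) AS location
FROM pg_tablespace;
```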
What type of bug is this?
Locking issue, Performance issue, Unexpected error
What subsystems and features are affected?
Background worker
What happened?
TimescaleDB version affected
2.11.1
PostgreSQL version used
15.2
What operating system did you use?
Windows 10
What installation method did you use?
Source
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
How can we reproduce the bug?