Skip to content
This repository has been archived by the owner on Dec 13, 2023. It is now read-only.

Pool hangs when trying to get connections wich closed on server timeout (they remain idle in pool) #82

Closed
MisteryX opened this issue Oct 24, 2019 · 7 comments
Assignees
Labels
bug Something isn't working
Milestone

Comments

@MisteryX
Copy link

MisteryX commented Oct 24, 2019

Case:

  1. Executing queries in pool (releasing connections)
  2. Waiting sometime until server timeout is reached on pooled connections
  3. Trying to get connection from pool and... it hangs entire application forever (pool exhausted)
    When it hangs:
    ERROR [lettuce-epollEventLoop-4-2] dev.miku.r2dbc.mysql.client.ReactorNettyClient: Exchange cancelled while exchange is active. This is likely a bug leading to unpredictable outcome.

Mysql server 5.7.27, timeout = 180s

spring.r2dbc.connection-timeout=3s
spring.r2dbc.pool.initial-size=10
spring.r2dbc.pool.max-size=200
spring.r2dbc.pool.max-idle-time=10s
spring.r2dbc.pool.validation-query=SELECT 1
spring.r2dbc.pool.max-create-connection-time=3s
spring.r2dbc.pool.max-acquire-time=5s
spring.r2dbc.pool.max-life-time=180s

spring-data-r2dbc
1.0.0.RC1

r2dbc-pool
0.8.0.RC1

r2dbc-mysql
0.8.0.RC2

@bean
public ConnectionFactory connectionFactory()
{
ConnectionFactoryOptions baseOptions = ConnectionFactoryOptions.parse(getUrl());
ConnectionFactoryOptions.Builder options = ConnectionFactoryOptions.builder().from(baseOptions);

	options
		.option(ConnectionFactoryOptions.USER, getUserName())
		.option(ConnectionFactoryOptions.PASSWORD, getPassword())
		.option(MAX_SIZE, pool.maxSize)
		.option(ConnectionFactoryOptions.CONNECT_TIMEOUT, connectionTimeout);

	ConnectionFactory connectionFactory = ConnectionFactories.get(options.build());

	if (baseOptions.hasOption(ConnectionFactoryOptions.DRIVER) && baseOptions.getValue(ConnectionFactoryOptions.DRIVER).equalsIgnoreCase("pool"))
	{
		ConnectionPoolConfiguration configuration = ConnectionPoolConfiguration.builder(connectionFactory)
			.maxIdleTime(pool.maxIdleTime)
			.initialSize(pool.initialSize)
			.maxSize(pool.maxSize)
			.maxCreateConnectionTime(pool.maxCreateConnectionTime)
			.maxAcquireTime(pool.maxAcquireTime)
			.maxLifeTime(pool.maxLifeTime)
			.validationQuery(pool.validationQuery)
			.registerJmx(true)
			.name("supro.r2dbc.connection-pool")
			.build();
		return new ConnectionPool(configuration);
	}
	return connectionFactory;
}

@Bean
public DatabaseClient databaseClient(ConnectionFactory connectionFactory)
{
	return DatabaseClient.create(connectionFactory);
}
@mp911de
Copy link
Contributor

mp911de commented Oct 25, 2019

Cross-post from r2dbc/r2dbc-pool#42

@mirromutth
Copy link
Owner

Hi there,

Looks like the connections has been killed by server (because of timeout), and the driver does not support reconnect yet. So, response processor would be emit an error and would not set complete flag.

Maybe driver need reconnect when connection has been killed by outside. About this problem, I will take a look that how to do this on JDBC.

@mp911de
Copy link
Contributor

mp911de commented Oct 25, 2019

We had a similar issue with Postgres: pgjdbc/r2dbc-postgresql#177.

It makes sense to complete all active/pending streams with an error signal.

@mirromutth mirromutth self-assigned this Oct 31, 2019
@mirromutth mirromutth added the bug Something isn't working label Oct 31, 2019
@mirromutth mirromutth added this to the 0.8.0.RELEASE milestone Oct 31, 2019
@mirromutth
Copy link
Owner

mirromutth commented Nov 8, 2019

It was solved by #83 :

  • Closed signal and errors will be propagated to all queries (activating and queuing), so they will not be canceled by outside and will be not hang connections
  • Driver still not support auto-reconnect, maybe should the pool is used to recover from closed state

Those change have pushed to 0.8.0.BUILD-SNAPSHOT, target version is 0.8.0.RELEASE.

This issue will be close after Nov 10, if problem still exists, comment to this issue please.

@zhitom
Copy link

zhitom commented Aug 27, 2020

Hi, Have those change pushed to 0.8.2.RELEASE?
I use 0.8.2.RELEASE , and it seems occured again,it's found no idle-connect to create new connect.Actually mysql-server found no connections from r2dbc-mysql,and r2dbc catch exception:
[2020-08-27 19:48:59 WARN reactor-tcp-epoll-2 AbstractChannelHandlerContext.java:311] i.n.c.AbstractChannelHandlerContext ==> An e
xception 'reactor.core.Exceptions$BubblingException: dev.miku.r2dbc.mysql.client.MySqlConnectionClosedException: Connection closed'
[enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following except
ion:
io.netty.channel.unix.Errors$NativeIoException: readAddress(..) failed: Connection reset by peer
[2020-08-27 19:48:59 ERROR reactor-http-epoll-4 Loggers.java:319] reactor.core.publisher.Operators ==> Operator called default onEr
rorDropped
reactor.core.Exceptions$BubblingException: dev.miku.r2dbc.mysql.client.MySqlConnectionClosedException: Connection closed
at reactor.core.Exceptions.bubble(Exceptions.java:173)
at reactor.core.publisher.Operators.onErrorDropped(Operators.java:635)
at reactor.core.publisher.EmitterProcessor.onError(EmitterProcessor.java:275)
at dev.miku.r2dbc.mysql.client.ReactorNettyClient.drainError(ReactorNettyClient.java:254)
at dev.miku.r2dbc.mysql.client.ReactorNettyClient.handleClose(ReactorNettyClient.java:262)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:257)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onComplete(Operators.java:2016)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:359)
at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onComplete(FluxConcatMap.java:268)
at reactor.core.publisher.EmitterProcessor.checkTerminated(EmitterProcessor.java:489)
at reactor.core.publisher.EmitterProcessor.drain(EmitterProcessor.java:411)
at reactor.core.publisher.EmitterProcessor.onNext(EmitterProcessor.java:268)
at dev.miku.r2dbc.mysql.client.ReactorNettyClient.lambda$null$12(ReactorNettyClient.java:205)

@jyeoniii
Copy link

jyeoniii commented Oct 24, 2020

@mirromutth Hello, I'm using r2dbc-pool 0.8.3 and still facing the similar issue.
When server closed the connection, there are some connections that are not closed normally, which leads to cp hang. My team is having hard time debugging it since It occurs occasionally.

Below is the configuration properties.

spring:
  r2dbc:
    pool:
      initial-size: 10
      max-size: 30
      max-idle-time: 15s
      validation-query: SELECT 1
    properties:
      connect-timeout: 10
      ssl: false
      max-create-connection-time: 3
      max-acquire-time: 5
      max-life-time: 15

Here's the log for it. After MySQL server close the connection after wait_timeout, as you can see, 7 of 10 connections are closed normally, but exception occured while closing the others. After this, any request through MySQL hangs and the number of connections is maintained to be 7, which should be 10 in normal case.

2020-10-24 05:40:00.837  WARN 1 --- [tor-tcp-epoll-2] d.m.r.mysql.client.MessageDuplexCodec    : Connection has been closed by peer
2020-10-24 05:40:00.838  WARN 1 --- [tor-tcp-epoll-2] d.m.r.mysql.client.MessageDuplexCodec    : Connection has been closed by peer
2020-10-24 05:40:00.839  WARN 1 --- [tor-tcp-epoll-2] d.m.r.mysql.client.MessageDuplexCodec    : Connection has been closed by peer
2020-10-24 05:40:00.840  WARN 1 --- [tor-tcp-epoll-2] d.m.r.mysql.client.MessageDuplexCodec    : Connection has been closed by peer
2020-10-24 05:40:00.841  WARN 1 --- [tor-tcp-epoll-2] d.m.r.mysql.client.MessageDuplexCodec    : Connection has been closed by peer
2020-10-24 05:40:00.841  WARN 1 --- [tor-tcp-epoll-2] d.m.r.mysql.client.MessageDuplexCodec    : Connection has been closed by peer
2020-10-24 05:40:00.841  WARN 1 --- [tor-tcp-epoll-2] d.m.r.mysql.client.MessageDuplexCodec    : Connection has been closed by peer
2020-10-24 05:40:00.844 ERROR 1 --- [or-http-epoll-3] reactor.core.publisher.Operators         : Operator called default onErrorDropped
dev.miku.r2dbc.mysql.client.MySqlConnectionException: null
        at dev.miku.r2dbc.mysql.client.ClientExceptions.wrap(ClientExceptions.java:47)
        at dev.miku.r2dbc.mysql.client.ReactorNettyClient.resumeError(ReactorNettyClient.java:214)
        at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:88)
        at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.innerError(FluxConcatMap.java:301)
        at reactor.core.publisher.FluxConcatMap$ConcatMapInner.onError(FluxConcatMap.java:863)
        at reactor.core.publisher.Operators.error(Operators.java:196)
        at reactor.core.publisher.MonoError.subscribe(MonoError.java:52)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4213)
        at reactor.netty.NettyOutbound.subscribe(NettyOutbound.java:331)
        at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.drain(FluxConcatMap.java:441)
        at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onNext(FluxConcatMap.java:243)
        at reactor.core.publisher.EmitterProcessor.drain(EmitterProcessor.java:426)
        at reactor.core.publisher.EmitterProcessor.onNext(EmitterProcessor.java:268)
        at dev.miku.r2dbc.mysql.client.ReactorNettyClient.lambda$null$12(ReactorNettyClient.java:205)
        at reactor.core.publisher.MonoRunnable.call(MonoRunnable.java:73)
        at reactor.core.publisher.MonoRunnable.call(MonoRunnable.java:32)
        at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132)
        at reactor.core.publisher.MonoCreate$DefaultMonoSink.success(MonoCreate.java:156)
        at dev.miku.r2dbc.mysql.client.RequestTask.run(RequestTask.java:46)
        at dev.miku.r2dbc.mysql.client.RequestQueue.submit(RequestQueue.java:93)
        at dev.miku.r2dbc.mysql.client.ReactorNettyClient.lambda$close$13(ReactorNettyClient.java:205)
        at reactor.core.publisher.MonoCreate.subscribe(MonoCreate.java:57)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4213)
        at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.drain(MonoIgnoreThen.java:172)
        at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:56)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4213)
        at reactor.core.publisher.Mono.subscribeWith(Mono.java:4324)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4184)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4120)
        at reactor.pool.SimplePool.drainLoop(SimplePool.java:283)
        at reactor.pool.SimplePool.drain(SimplePool.java:187)
        at reactor.pool.SimplePool.doAcquire(SimplePool.java:136)
        at reactor.pool.AbstractPool$Borrower.request(AbstractPool.java:363)
        at reactor.core.publisher.FluxPeek$PeekSubscriber.request(FluxPeek.java:130)
        at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:103)
        at reactor.core.publisher.FluxPeek$PeekSubscriber.onSubscribe(FluxPeek.java:163)
        at reactor.pool.SimplePool$QueueBorrowerMono.subscribe(SimplePool.java:381)
        at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)
        at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
        at reactor.core.publisher.FluxRetry$RetrySubscriber.resubscribe(FluxRetry.java:110)
        at reactor.core.publisher.MonoRetry.subscribeOrReturn(MonoRetry.java:49)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4198)
        at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:97)
        at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:165)
        at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onError(FluxMapFuseable.java:134)
        at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:185)
        at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2344)
        at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.request(FluxHandleFuseable.java:243)
        at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:162)
        at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:103)
        at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onSubscribe(FluxMapFuseable.java:90)
        at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onSubscribe(FluxHandleFuseable.java:148)
        at reactor.core.publisher.MonoCurrentContext.subscribe(MonoCurrentContext.java:35)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4213)
        at reactor.core.publisher.FluxUsingWhen.subscribe(FluxUsingWhen.java:103)
        at reactor.core.publisher.Flux.subscribe(Flux.java:8325)
        at kotlinx.coroutines.reactive.PublisherAsFlow.collectImpl(ReactiveFlow.kt:83)
        at kotlinx.coroutines.reactive.PublisherAsFlow.collect(ReactiveFlow.kt:68)
        at kotlinx.coroutines.flow.FlowKt__CollectionKt.toCollection(Collection.kt:32)
        at kotlinx.coroutines.flow.FlowKt.toCollection(Unknown Source)
        at kotlinx.coroutines.flow.FlowKt__CollectionKt.toList(Collection.kt:15)
        at kotlinx.coroutines.flow.FlowKt.toList(Unknown Source)
        at kotlinx.coroutines.flow.FlowKt__CollectionKt.toList$default(Collection.kt:15)
        at kotlinx.coroutines.flow.FlowKt.toList$default(Unknown Source)
        at 
......

@btakeya
Copy link
Contributor

btakeya commented Apr 28, 2021

Hi @mirromutth, There is a question about Connection has been closed by peer warning message (posted here because looks related with this thread)

Environment

  • Spring boot 2.3.4
  • Kotlin (java compat w/ 1.8)
  • R2DBC-MySQL 0.8.2 (dev.miku:r2dbc-mysql:0.8.2.RELEASE)
  • if any other is needed, just let me know :)

Observation

  1. After specific time has passed from launching app (or initializing db pool), I got these messages (I found the number of this message is pool.initial-size)
    In my case, specific time above is 300sec -- it is wait_timeout of MySQL.
2021-04-28 13:22:16,744 WARN  [reactor-tcp-nio-1] dev.miku.r2dbc.mysql.client.MessageDuplexCodec:136: Connection has been closed by peer
...
  1. Expect setting validationQuery=SELECT 1 and tcpKeepAlive = true would prevent Observation 1, but nothing is changed. (After wait_timeout, MySQL closed connections)
  2. When pool.maxIdleTime > wait_timeout (e.g. maxIdleTime: 10min, wait_timeout: 30sec) after Observation 1, I found query execution is go fail with this message:
dev.miku.r2dbc.mysql.client.MySqlConnectionClosedException: Cannot exchange messages because the connection is closed
	at dev.miku.r2dbc.mysql.client.ClientExceptions.exchangeClosed(ClientExceptions.java:28)
  1. When pool.maxIdleTime <= wait_timeout (e.g. maxIdleTime: 25sec, wait_timeout: 30sec - not sure for equal value) after Observation 1, query execution go success

Question

  • When will try to reconnect closed connections after wait_timeout in Observation 1? after maxIdleTime? or at the time when no available connection? or others... (If I get it correctly, would be latter by Propagate connection closed and errors for message exchange #83)
  • Is there any method to prevent connection closing by MySQL? As Observation 2, I can't find any...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants