Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apoc.cypher.runTimeboxed throws "DatabaseException: The transaction has been closed" #82

Closed
neo-technology-build-agent opened this issue Sep 1, 2022 · 7 comments

Comments

@neo-technology-build-agent
Copy link
Collaborator

Issue by tolomaus
Saturday Sep 05, 2020 at 16:21 GMT
Originally opened as neo4j-contrib/neo4j-apoc-procedures#1652


Hi, I'm migrating my graph db from neo4j 3.5 to 4.1 but it looks like the apoc.cypher.runTimeboxed procedure is broken:

neo4j@neo4j> CREATE (n:Person { name: 'Andy', title: 'Developer' });
neo4j@neo4j> CREATE (n:Person { name: 'Andy', title: 'Developer' });
neo4j@neo4j> CREATE (n:Person { name: 'Andy', title: 'Developer' });
neo4j@neo4j> CREATE (n:Person { name: 'Andy', title: 'Developer' });
neo4j@neo4j> CREATE (n:Person { name: 'Andy', title: 'Developer' });
neo4j@neo4j> MATCH (n) RETURN n;
n
(:Person {name: "Andy", title: "Developer"})
(:Person {name: "Andy", title: "Developer"})
(:Person {name: "Andy", title: "Developer"})
(:Person {name: "Andy", title: "Developer"})
(:Person {name: "Andy", title: "Developer"})
neo4j@neo4j> call apoc.cypher.runTimeboxed("MATCH (n) RETURN n", null, 1000);
value
{n: (:Person {name: "Andy", title: "Developer"})}
org.neo4j.driver.exceptions.DatabaseException: The transaction has been closed.
	at org.neo4j.driver.internal.util.Futures.blockingGet(Futures.java:143)
	at org.neo4j.driver.internal.InternalResult.blockingGet(InternalResult.java:128)
	at org.neo4j.driver.internal.InternalResult.hasNext(InternalResult.java:64)
	at org.neo4j.shell.prettyprint.SimpleOutputFormatter.formatAndCount(SimpleOutputFormatter.java:28)
	at org.neo4j.shell.prettyprint.PrettyPrinter.format(PrettyPrinter.java:28)
	at org.neo4j.shell.CypherShell.lambda$executeCypher$0(CypherShell.java:99)
	at java.base/java.util.Optional.ifPresent(Optional.java:176)
	at org.neo4j.shell.CypherShell.executeCypher(CypherShell.java:98)
	at org.neo4j.shell.CypherShell.execute(CypherShell.java:81)
	at org.neo4j.shell.cli.InteractiveShellRunner.runUntilEnd(InteractiveShellRunner.java:107)
	at org.neo4j.shell.Main.runShell(Main.java:121)
	at org.neo4j.shell.Main.startShell(Main.java:87)
	at org.neo4j.shell.Main.main(Main.java:45)
	Suppressed: org.neo4j.driver.internal.util.ErrorUtil$InternalExceptionCause
		at org.neo4j.driver.internal.util.ErrorUtil.newNeo4jError(ErrorUtil.java:85)
		at org.neo4j.driver.internal.async.inbound.InboundMessageDispatcher.handleFailureMessage(InboundMessageDispatcher.java:105)
		at org.neo4j.driver.internal.messaging.v1.MessageReaderV1.unpackFailureMessage(MessageReaderV1.java:83)
		at org.neo4j.driver.internal.messaging.v1.MessageReaderV1.read(MessageReaderV1.java:59)
		at org.neo4j.driver.internal.async.inbound.InboundMessageHandler.channelRead0(InboundMessageHandler.java:83)
		at org.neo4j.driver.internal.async.inbound.InboundMessageHandler.channelRead0(InboundMessageHandler.java:35)
		at org.neo4j.driver.internal.shaded.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
		at org.neo4j.driver.internal.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
		at org.neo4j.driver.internal.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
		at org.neo4j.driver.internal.async.inbound.MessageDecoder.channelRead(MessageDecoder.java:47)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
		at org.neo4j.driver.internal.shaded.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
		at org.neo4j.driver.internal.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
		at org.neo4j.driver.internal.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at org.neo4j.driver.internal.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at org.neo4j.driver.internal.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
		at org.neo4j.driver.internal.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
		at org.neo4j.driver.internal.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
		at org.neo4j.driver.internal.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
		at org.neo4j.driver.internal.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
		at org.neo4j.driver.internal.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
		at org.neo4j.driver.internal.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
		at org.neo4j.driver.internal.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
		at org.neo4j.driver.internal.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		at java.base/java.lang.Thread.run(Thread.java:832)

I used a clean install of neo4j 4.1.1 + apoc 4.1.0.2 on both centos and mac and even spun up a neo4j Aura instance, all giving the same result.

The test only checks if the result count is higher than zero: https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/16581f3253f82e75506428dfd2a498346c6bfb1d/core/src/test/java/apoc/cypher/CypherTest.java#L180 maybe it doesnt need to hit the database again for this.

See also the following result:

neo4j@neo4j> call apoc.cypher.runTimeboxed("MATCH (n) RETURN n", null, 1000) YIELD value RETURN value.x;
value.x
NULL
NULL
NULL
NULL
NULL

so it looks like all five of the results are correctly found. Since I only want an non-existing property x it just returns NULLs and doesnt have to hit the database neither so avoiding the above error.

@neo-technology-build-agent
Copy link
Collaborator Author

Comment by tolomaus
Sunday Sep 06, 2020 at 10:13 GMT


Some more details I discovered along the way:

It turns out that my actual cypher query still contained some logic that is not supported anymore in neo4j 4.x, like extract(). Unfortunately these errors don't surface when the query is wrapped with a runTimeboxed procedure, they just hit the timeout and return an empty result.

Now that these issues are fixed the runTimeboxed seems to work fine again. The reason why this query works and not the trivial one above is that in my actual query I collect the results in a map.

So the following query works fine:

call apoc.cypher.runTimeboxed("MATCH (n:Person) RETURN {firstName: n.firstName, lastName: n.lastName }", null, 1000)

So the issue may already exist in neo4j 3.x.

It would be good to include an optional flag to the runTimeboxed procedure to not swallow any errors. This would cover errors due to bad queries but also the timeout error. Sending an empty (or partial) result when the timeout is hit is not an option in my use case.

@neo-technology-build-agent
Copy link
Collaborator Author

Comment by tolomaus
Sunday Sep 06, 2020 at 10:17 GMT


see also #1233

@neo-technology-build-agent
Copy link
Collaborator Author

Comment by sarmbruster
Monday Sep 07, 2020 at 13:27 GMT


Hi @tolomaus ,
what you're seeing is a side effect of some internal change along the way from 3.x -> 4.x: a node, relationship or path instance is now bound to the transaction it originates to. You can no longer reuse such a instance in a different transaction except using id(x) to get it's internal id.

apoc.cypher.runTimeboxed executes the statement in a separate transaction. This is necessary otherwise the timer could not kill it off. So you cannot easily reuse nodes or relationships returned.

call apoc.cypher.runTimeboxed("MATCH (n) RETURN n", null, 1000); has an implicit return at the end that shows all properties of the nodes n. However n originates in the inner statement and cannot be used outside, that's why you're seeing an error.

To solve this, you need to rebind the node:

call apoc.cypher.runTimeboxed("MATCH (n) RETURN n", {}, 1000) yield value match (x) where id(x) = id(value.n) return x;

Using the match (x) where id(x)=<old node's internal id> you effectively rebind that node to the current transaction and you're seeing the expected result.

Does that explanation solve the issue for you?

@neo-technology-build-agent
Copy link
Collaborator Author

Comment by tolomaus
Monday Sep 07, 2020 at 16:47 GMT


Hi @sarmbruster thanks for the clear explanation!

The proposed solution feels a bit complicated though, especially if several collections of nodes and/or entities are returned from the query. I guess it may probably also generate a performance hit since all items need to be retrieved again by id.

Luckily for me, as explained above, I only return maps from the timeboxed queries so far and these don't seem to be impacted by the issue.

This timeboxing feature is essential for the availability and stability of my platform since it is very hard to estimate upfront how much time a query will take. I'm guaranteed that no query will abuse the whole database for longer than a couple of seconds. I'm perfectly happy with a simple all or nothing approach: if the timeout is hit then I dont show any results. Partial results are not meaningful in my case.

With that in mind, I wonder if it would make sense for me to stick with the original runTimeboxed implementation of killing the main transaction when the timeout is hit? see https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/800dc4283e324ca119513e7ce97ddd0fad3941bb/src/main/java/apoc/cypher/Timeboxed.java

@neo-technology-build-agent
Copy link
Collaborator Author

Comment by sarmbruster
Tuesday Sep 08, 2020 at 07:12 GMT


Let me double check if it's possible to prevent using a separate transaction.
Are you willing to test drive a snapshot build?

@neo-technology-build-agent
Copy link
Collaborator Author

Comment by tolomaus
Tuesday Sep 08, 2020 at 07:14 GMT


Sure happy to help

@gem-neo4j
Copy link
Contributor

There have been a lot changes here in the last 4 years, and like mentioned there is another timebox related issue that is open about improving the erroring, so I will close this one :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants