Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add batch query support for drop step [tp-tests] #4456

Merged
merged 1 commit into from
Oct 30, 2024

Conversation

porunov
Copy link
Member

@porunov porunov commented May 14, 2024

  • Reuse multi-query optimization for TinkerPop DropStep
  • Change restriction on eligible multi-query traversals and allow multi-query optimizations to be used for queries with steps which contain drop() step
  • Add release template for JanusGraph 1.1.0

Benchmarks:

CQLMultiQueryDropBenchmark.dropVertices                                                                                       N/A           true                          N/A              5000  avgt    5     98.069 ±   24.496  ms/op
CQLMultiQueryDropBenchmark.dropVertices                                                                                       N/A          false                          N/A              5000  avgt    5   1451.917 ±   90.617  ms/op
CQLMultiQueryDropBenchmark.dropVerticesGremlinQuery                                                                           N/A           true                          N/A              5000  avgt    5    106.444 ±   38.290  ms/op
CQLMultiQueryDropBenchmark.dropVerticesGremlinQuery                                                                           N/A          false                          N/A              5000  avgt    5   1452.070 ±  130.414  ms/op

Conclusion: same as with direct usage of multi-query the usage of Gremlin drop() step benefits from re-using multi-query capabilities. Performance of drop query processing improved more than 13 times.

Related apache/tinkerpop#2609
Related apache/tinkerpop#2612


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

@porunov porunov added this to the Release v1.1.0 milestone May 14, 2024
@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label May 15, 2024
@porunov porunov force-pushed the feature/drop-step-batch-support branch 4 times, most recently from 476f92e to 1c7d543 Compare May 17, 2024 22:09
@porunov
Copy link
Member Author

porunov commented May 17, 2024

Currently I'm stuck on the following two TinkerPop tests:

Error:  Failures: 
Error:  org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropTest$Traversals.g_E_propertiesXweightX_drop
Error:    Run 1: DropTest$Traversals>DropTest.g_E_propertiesXweightX_drop:98->DropTest.lambda$g_E_propertiesXweightX_drop$1:98 expected:<0> but was:<1>
Error:    Run 2: DropTest$Traversals>DropTest.g_E_propertiesXweightX_drop:98->DropTest.lambda$g_E_propertiesXweightX_drop$1:98 expected:<0> but was:<1>
[INFO] 
Error:  org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropTest$Traversals.g_V_properties_propertiesXstartTimeX_drop
Error:    Run 1: DropTest$Traversals>DropTest.g_V_properties_propertiesXstartTimeX_drop:109->DropTest.lambda$g_V_properties_propertiesXstartTimeX_drop$3:109 expected:<0> but was:<1>
Error:    Run 2: DropTest$Traversals>DropTest.g_V_properties_propertiesXstartTimeX_drop:109->DropTest.lambda$g_V_properties_propertiesXstartTimeX_drop$3:109 expected:<0> but was:<2>

The first test checks that drop() works for Edge properties. The second one checks that drop() works for meta properties.
For some reason those two TinkerPop tests are failing with this PR. However, I verified that the removal operation works in this PR by adding testMetaPropertiesDrop and testEdgePropertiesDrop tests (similarly as in those TinkerPop tests).
It can be seen that these added tests are passing. For now the reason for these two TinkerPop tests to fail is unknown to me.
Additional pair of eyes would me much appreciated.

TinkerPop tests for inmemory database can be triggered with:

mvn clean install --projects janusgraph-inmemory -DskipTests=true --batch-mode --also-make

mvn verify --projects janusgraph-inmemory -Dtest.skip.tp=false -Dtest=org.janusgraph.blueprints.inmemory.process.InMemoryMultiQueryJanusGraphProcessTest -pl janusgraph-inmemory

@porunov
Copy link
Member Author

porunov commented May 17, 2024

One of the observations is that if I change the default barrier step size from the default 2500 to 1 (query.batch.limited-size=1) then all tests passes.
I.e. it means that the fact that we store those properties in the barrier before we actually remove them changes the result of those two tests.

@ntisseyre
Copy link
Contributor

ntisseyre commented May 17, 2024

Currently I'm stuck on the following two TinkerPop tests:

Error:  Failures: 
Error:  org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropTest$Traversals.g_E_propertiesXweightX_drop
Error:    Run 1: DropTest$Traversals>DropTest.g_E_propertiesXweightX_drop:98->DropTest.lambda$g_E_propertiesXweightX_drop$1:98 expected:<0> but was:<1>
Error:    Run 2: DropTest$Traversals>DropTest.g_E_propertiesXweightX_drop:98->DropTest.lambda$g_E_propertiesXweightX_drop$1:98 expected:<0> but was:<1>
[INFO] 
Error:  org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropTest$Traversals.g_V_properties_propertiesXstartTimeX_drop
Error:    Run 1: DropTest$Traversals>DropTest.g_V_properties_propertiesXstartTimeX_drop:109->DropTest.lambda$g_V_properties_propertiesXstartTimeX_drop$3:109 expected:<0> but was:<1>
Error:    Run 2: DropTest$Traversals>DropTest.g_V_properties_propertiesXstartTimeX_drop:109->DropTest.lambda$g_V_properties_propertiesXstartTimeX_drop$3:109 expected:<0> but was:<2>

The first test checks that drop() works for Edge properties. The second one checks that drop() works for meta properties. For some reason those two TinkerPop tests are failing with this PR. However, I verified that the removal operation works in this PR by adding testMetaPropertiesDrop and testEdgePropertiesDrop tests (similarly as in those TinkerPop tests). It can be seen that these added tests are passing. For now the reason for these two TinkerPop tests to fail is unknown to me. Additional pair of eyes would me much appreciated.

TinkerPop tests for inmemory database can be triggered with:

mvn clean install --projects janusgraph-inmemory -DskipTests=true --batch-mode --also-make

mvn verify --projects janusgraph-inmemory -Dtest.skip.tp=false -Dtest=org.janusgraph.blueprints.inmemory.process.InMemoryMultiQueryJanusGraphProcessTest -pl janusgraph-inmemory

I noticed, I am not sure if it is related, but you might have missed adding JanusGraphDropStep here:

.addStrategies(AdjacentVertexFilterOptimizerStrategy.instance(),

@porunov
Copy link
Member Author

porunov commented May 17, 2024

I noticed, I am not sure if it is related, but you might have missed adding JanusGraphDropStep here:

JanusGraphDropStep is replaced as part of JanusGraphLocalQueryOptimizerStrategy. So I believe it should be fine.
I feels that those two TinkerPop tests don't like that I'm storing properties for removal in the barrier step for some reason.
However, I can't figure out "why" yet.

@porunov
Copy link
Member Author

porunov commented May 18, 2024

After a bit of debugging I found the root cause of the problem.
When removing Edge properties I see that each edge property is wrapped into B_O_Traverser which is passed to NoOpBarrier step. That traverser is added to the TraverserSet here.
After I loaded the same graph as in TinkerPop tests locally I noticed that 4 out of 6 edge properties were removed, but 2 didn't because they were considered the "same" properties as were already added into the TraverserSet (thus their Traverser's merged).
The root cause is that relation is not checked on equality for SimpleJanusGraphProperty. This bug was introduced 9 years ago here: 795d7b0

The fix should be a simple one and I will add a separate test case with barrier() + drop() of edge properties with the same key and value.

@porunov
Copy link
Member Author

porunov commented May 18, 2024

After a bit of debugging I found the root cause of the problem. When removing Edge properties I see that each edge property is wrapped into B_O_Traverser which is passed to NoOpBarrier step. That traverser is added to the TraverserSet here. After I loaded the same graph as in TinkerPop tests locally I noticed that 4 out of 6 edge properties were removed, but 2 didn't because they were considered the "same" properties as were already added into the TraverserSet (thus their Traverser's merged). The root cause is that relation is not checked on equality for SimpleJanusGraphProperty. This bug was introduced 9 years ago here: 795d7b0

The fix should be a simple one and I will add a separate test case with barrier() + drop() of edge properties with the same key and value.

Huh. Seems I was a bit wrong. If I fix that drop tests then dedup tests start to fail. Hmm. Now I'm confused where the fix should happen.

@porunov
Copy link
Member Author

porunov commented May 18, 2024

Seems I'm missing how barrier step suppose to work. It could be that we actually don't have any bug which I described above and it's actually intended to be like that (at least I verified that the behavior is consistent between TinkerGraph and JanusGraph, so I guess we are fine there).
However, I think it makes sense to double check with the TinkerPop community regrading barrier and dedup behavior. Thus, I opened a thread here: https://discord.com/channels/838910279550238720/1241219665376706603

If that is confirmed to be the intended behavior then I will try to think about a bit different solution.

@porunov
Copy link
Member Author

porunov commented May 18, 2024

One of the potential solutions is to make out own NoOpBarrierStep which skips anything except Vertex and insert that barrier step as part of multi-query optimization instead of the standard NoOpBarrierStep.
However, there are many places in code logic where we need to check and sometimes cast a step to NoOpBarrierStep. It would make sense to extend NoOpBarrierStep and overwrite part of the logic, but NoOpBarrierStep is currently final and the necessary properties are private which makes things a bit challenging.
Potentially, when the following PR is merged and the next TinkerPop version is released it should be easier to extend logic: apache/tinkerpop#2612

@porunov
Copy link
Member Author

porunov commented May 24, 2024

Both TinkerPop PRs are now merged (apache/tinkerpop#2612 and apache/tinkerpop#2609). This will allow us to extend NoOpBarrierStep to skip everything except vertices. Thus should be also an improvement by itself because there is no reason for multi-query strategy to inject barrier steps which trigger batching for anything except edges because we multi-query step registers Vertices only as for now, and thus anything else is just a potential waist of operations.
When we replace it with our own barrier step it will allow us:

  • Safely open multi-query strategy for drop() step.
  • Slightly improve processing of edges / properties / meta properties because they won't be affected by the barrier anymore (i.e. a bit less of unnecessary operations).

Now we need to wait for TinkerPop 3.7.3 to proceed without doing major hacks.

@porunov porunov force-pushed the feature/drop-step-batch-support branch 3 times, most recently from 72db163 to 7429b5f Compare June 1, 2024 00:12
@porunov porunov marked this pull request as ready for review June 1, 2024 08:51
@porunov porunov requested a review from ntisseyre June 1, 2024 08:52
@porunov
Copy link
Member Author

porunov commented Jun 1, 2024

The PR is now ready for the review. All tests passed as can be seen from my branch (here).

Notes:

  • This PR is using TinkerPop 3.7.3-SNAPSHOT. Thus, we can't merge it until TinkerPop 3.7.3 is released.
  • The attached link to my branch shows that all tests (including tp-tests) passed. However, if berkeleydb tp-tests fail for this PR - it's not related to this PR. We currently have flaky berkeleydb configuration in master branch which makes tp-tests for berkeleydb fail often. This problem appeared after merging the following PR fix: handle BerkeleyJE DB interruption [tp-tests] #4425 .

- Reuse multi-query optimization for TinkerPop DropStep
- Change restriction on eligible multi-query traversals and allow multi-query optimizations to be used for queries with steps which contain drop() step

Signed-off-by: Oleksandr Porunov <[email protected]>
@porunov porunov force-pushed the feature/drop-step-batch-support branch from 7429b5f to 72e2427 Compare October 30, 2024 07:50
@porunov
Copy link
Member Author

porunov commented Oct 30, 2024

@ntisseyre let me know if you want to review it or I will merge this PR by following lazy consensus

@ntisseyre
Copy link
Contributor

@ntisseyre let me know if you want to review it or I will merge this PR by following lazy consensus

Let's merge, thanks @porunov !

@porunov porunov merged commit 0cb87d3 into JanusGraph:master Oct 30, 2024
184 of 185 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

3 participants