Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated AQL queries aren't optimal #43

Open
fdominik opened this issue Feb 6, 2019 · 3 comments
Open

Generated AQL queries aren't optimal #43

fdominik opened this issue Feb 6, 2019 · 3 comments

Comments

@fdominik
Copy link

fdominik commented Feb 6, 2019

The driver doesn't generate optimal AQL queries.

For example the easiest Gremlin query:
g.V().hasLabel("person").next()
In a graph, which has 3 vertex collections

Generates following AQL Query:

(FOR v in UNION( 
  (FOR x1 IN @@col1 RETURN x1),
  (FOR x2 IN @@col2 RETURN x2),
  (FOR x3 IN @@col3 RETURN x3  )
)
RETURN v
)

against db, with bind vars: {@col1=documents, @col2=person, @col3=prison}
This will return ALL vertices in ArangoDB (which might be a significantly unperformant for huge collections), and then the hasLabel("person") is parsed in Java.

However it could be easily transformed to:

FOR v IN person
return v

Where only vertices from the requested collection are returned.

However it all depends if we are able to modify how gremlin executes the Steps (e.g. some Custom Traversal?) or if we are able to get the query into Arango driver...
EDIT: it should e possible using:
Traversal Strategies: A TraversalStrategy can be used to alter a traversal prior to its execution. A typical example is converting a pattern of g.V().has('name','marko') into a global index lookup for all vertices with name "marko". In this way, a O(|V|) lookup becomes an O(log(|V|)). Please review TinkerGraphStepStrategy for ideas.

@fdominik
Copy link
Author

fdominik commented Feb 6, 2019

So I did a first attempt to implement ProviderTraversalStrategy, being inspired by
https://github.com/apache/tinkerpop/blob/master/tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/process/traversal/strategy/optimization/TinkerGraphStepStrategy.java
and by
https://github.com/JanusGraph/janusgraph/blame/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphStepStrategy.java

However I haven't fount, where the code to build the Arango query should be placed. Need to postpone my work on this issue on later, so someone can continue in the mean time...

it is at: https://github.com/fdominik/arangodb-tinkerpop-provider/commits/feature/43
contributor access on request.

@arcanefoam
Copy link
Collaborator

Hi, I think that for this type of optimization we would need to implement the OLAP api of Tinkerpop... or be smarter about how Graph::vertices retrieves all vertices from the db. On the later, my current line of thought is moving towards something similar to what you need to do when paginating results for a web app and some clever use of caches...

@arcanefoam
Copy link
Collaborator

I just understood the TraversalStrategy idea. Will look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants