Feature Request: support more and 2048 node/edge labels #64
Comments
@arcanefoam IIRC, the adapter for tinkerpop 2.x worked around this problem by putting all nodes in a collection |
Hi, as stated on #54, the provider architecture and the use of ArangoDB's named graph API make this change a difficult one. This particular request adds the extra burden of not using collections for labels. That would probably mean we could no longer use the graph syntax of AQL, and so we would have to completely rewrite the AQL queries used by the current implementation. Implementation-wise, we would need to change the architecture to add something like an "AQL translator", and then we could provide different implementations: one with graph AQL, one with plain AQL (plus the integrity checks for #54). |
Sorry, but does this mean that, in the end, the real problem is that Arango graphs and AQL itself have a physical limit on the number of node/edge labels? |
In the current implementation each label is mapped to a collection. As @dothebart mentioned, there is a limit of 2048 collections, and as a result there is a limit on the number of labels supported by the arango tinkerpop provider. Modelling labels as collections was an architectural choice, driven by the decision to rely on ArangoDB's AQL support for graphs (see #58). If labels were modelled as vertex/edge properties, then we could have just two collections (Vertex, Edges), but we would need to use "normal" AQL and implement all the graph logic in the provider. It is possible, but hard. Does this answer your question? |
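To make the trade-off above concrete, here is a minimal sketch of how the number of required collections grows under each mapping. The 2048 figure is the limit quoted in this thread; the function and strategy names are made up for illustration and are not part of the provider.

```python
# Limit quoted in this thread; treated here as a given, not verified.
COLLECTION_LIMIT = 2048


def collections_needed(vertex_labels, edge_labels, strategy):
    """Return how many collections a graph would need under a mapping strategy.

    The strategy names are hypothetical labels for the two options discussed.
    """
    if strategy == "collection_per_label":
        # One collection per distinct vertex label and per distinct edge label.
        return len(set(vertex_labels)) + len(set(edge_labels))
    if strategy == "label_as_property":
        # One vertex collection and one edge collection, whatever the labels are.
        return 2
    raise ValueError(f"unknown strategy: {strategy}")


def fits(vertex_labels, edge_labels, strategy):
    """True if the graph stays within the quoted collection limit."""
    return collections_needed(vertex_labels, edge_labels, strategy) <= COLLECTION_LIMIT
```

With a few thousand edge types, `fits(...)` fails for `"collection_per_label"` and succeeds for `"label_as_property"`, which is the situation this issue describes.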
Not completely. Does ArangoDB natively have a concept of node/edge labels? If yes, I understand that in AQL a node label is the same as a node collection, ditto for edges. |
No, ArangoDB does not have a concept of label, label is a Tinkerpop
concept.
So, for implementation I had two choices: 1. Map labels to *collections*,
i.e. a collection contains all nodes/edges with the same label; 2. Map
labels to document *properties*, i.e. each document will have a property
called "label" with a value of the corresponding node/edge label.
1. **Good:** because it matches the behaviour of ArangoDB graphs, meaning
we can translate most tinkerpop expressions to AQL's graph syntax and we
get graph integrity for free. Users can use tinkerpop to analyse their
existing graphs. **Bad:** it forces users to define the graph "schema"
beforehand, i.e. node and edge labels must be known a priori. Limited to
2048 labels.
2. **Good:** No need for schema. No 2048 label limit. **Bad:** We can't use
AQL's graph syntax. We don't get graph integrity for free. Adding the
label as a property means we would need to "corrupt" existing graphs by
adding additional label property. Possible performance overhead for the
additional filter by property vs select from collection (until we benchmark
it is impossible to know the actual hit).
I went with 1. To support more than 2048 labels we would need to use 2. So
to go beyond 2048 we would need to overcome a limitation of the tinkerpop
provider implementation, i.e. change how it works.
Does this explanation make it easier to understand?
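As an illustration of the two choices above, here is a toy in-memory sketch. Plain dictionaries stand in for ArangoDB collections, and all function names and the default `"vertices"` collection name are hypothetical; nothing here reflects the provider's actual code.

```python
from collections import defaultdict


def store_by_collection(labelled_docs):
    """Choice 1: one collection per label; the document itself is left untouched."""
    collections = defaultdict(list)
    for label, doc in labelled_docs:
        collections[label].append(dict(doc))
    return dict(collections)


def store_by_property(labelled_docs, collection="vertices"):
    """Choice 2: a single collection; the label is added as a document property."""
    return {collection: [{**doc, "label": label} for label, doc in labelled_docs]}


def find_in_collections(store, label):
    """Choice 1 lookup: selecting by label means selecting a collection."""
    return store.get(label, [])


def find_by_property(store, label, collection="vertices"):
    """Choice 2 lookup: selecting by label means filtering on the property."""
    return [doc for doc in store[collection] if doc.get("label") == label]
```

The second layout keeps the collection count constant, at the cost of an extra property on every document and a filter on every label lookup, which matches the "Bad" points listed for choice 2.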
…On Mon, 24 Aug 2020, 18:43 M. Lissandrini, ***@***.***> wrote:
Not completely. Does ArangoDB natively have a concept of node/edge
labels? If yes, I understand that in AQL a node label is the same as a node
collection, ditto for edges.
If this is also correct, then trying to have this driver support more
node/edge labels is the equivalent of trying to have the driver overcome a
limitation of the underlying system.
Here is then my question: are we trying to achieve this?
|
Thanks, this is all clear. But I still believe we are trying to overcome Arango's limitation: not having labels (which is worse than only allowing 2048 labels). Anyway, for the reference of others coming to this issue, I found the relevant bit here: https://www.arangodb.com/docs/stable/graphs.html#multiple-edge-collections-vs-filters-on-edge-document-attributes
|
ArangoDB doesn't have as strict a data model for how edges and vertices have to look as other available solutions. For ArangoDB, all that sets edges apart from regular documents is that they live in special edge collections, which demand (and index) the availability of the `_from` and `_to` attributes. https://www.arangodb.com/docs/stable/aql/graphs-traversals.html#filtering-edges-on-the-path demonstrates how to classify edges by an additional property. |
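For readers comparing the two ways of classifying edges, the following sketch builds the corresponding AQL traversal strings with bind parameters. The edge collection name `edges`, the attribute name `label`, and the helper names are assumptions for illustration; the queries are not taken from the provider.

```python
def traversal_with_label_filter(start, label, edge_collection="edges", label_attr="label"):
    """One shared edge collection; edges are classified by an attribute filter."""
    query = (
        f"FOR v, e IN 1..1 OUTBOUND @start {edge_collection} "
        f"FILTER e.{label_attr} == @label "
        "RETURN v"
    )
    return query, {"start": start, "label": label}


def traversal_with_edge_collection(start, label):
    """One edge collection per label; the label picks the collection to traverse."""
    query = f"FOR v IN 1..1 OUTBOUND @start {label} RETURN v"
    return query, {"start": start}
```

For example, `traversal_with_label_filter("persons/alice", "knows")` yields a query over a single `edges` collection, while `traversal_with_edge_collection("persons/alice", "knows")` traverses a dedicated `knows` collection.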
It is not a "strict" data model; it is the property graph model. |
Hi, so it's probably a question of how many labels you have: whether the cost of maintaining the additional edge collections or the cost of the additional indices is higher. |
@dothebart edge collections are not an option, because I have some 4 thousand edge types |
Yes, for your use case vertex-centric indices are definitely the way to go. So maybe @arcanefoam (or you?) can create the option to choose between vertex-centric indices and collections. |
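As a rough sketch of what a vertex-centric index looks like when labels are stored as an edge property: it is an ordinary index on the edge collection that combines `_from` (or `_to`) with the classifying attribute. The attribute name `label` and the helper below are assumptions; the dictionaries only describe index definitions and would still have to be passed to whichever driver or API creates the index.

```python
def vertex_centric_index_definitions(label_attr="label"):
    """Build index definitions covering outbound and inbound label lookups.

    `label_attr` is a hypothetical attribute name holding the edge label.
    """
    return [
        # Serves "outgoing edges of vertex X with label L".
        {"type": "persistent", "fields": ["_from", label_attr]},
        # Serves "incoming edges of vertex X with label L".
        {"type": "persistent", "fields": ["_to", label_attr]},
    ]
```

Whether two such indices on one edge collection are cheaper than thousands of edge collections is the cost question raised earlier in the thread, and would need benchmarking.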
Currently ArangoDB cannot support more than 2048 node/edge labels (in total?)
since all labels are mapped to collections.
See: #63
Some comments:
1 - A new feature would be required to decide whether to store 1-label-per-collection or labels as properties.
2 - This will need to take into account also how indexing/traversal is handled for queries like:
3 - In the meantime, the documentation should mention this limitation up front