Poor performance of Graph.delete_vertex/2 in large graphs #80
I'm getting ~18 ms on vertex deletes in large graphs, is this expected? And if so, is it theoretically possible to do anything about it?

Comments
Large graphs in terms of number of vertices, number of edges, or both? Looking at the relevant code, I would think the number of nodes alone should not cause an issue, but the number of edges would be more impactful.
Style-wise and performance-wise (though only by a tiny bit), I think pattern matching on the vertices and vertex_labels fields the way all the other fields are matched in the function head would be better. For each vertex deleted it traverses 3 collections of edges (out_edges, in_edges, and edges). At some point that gets pricey. It could probably be done in a single pass over the edges collection instead.
Alternatively, change the data structure to just … EDIT
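A minimal sketch of the single-pass idea described above, assuming libgraph's internal %Graph{} fields (:vertices, :vertex_labels, :edges keyed by {from_id, to_id}, and :out_edges/:in_edges as maps of vertex id to MapSet); this is illustrative, not the library's actual code:

```elixir
defmodule SinglePassDelete do
  # Illustrative only: the struct fields and their shapes are assumptions about
  # libgraph's internals, and v_id is the already-resolved internal vertex id.
  def delete_vertex(%Graph{} = g, v_id) do
    # One reduce over the edges map: drop every edge touching v_id, and use the
    # same pass to find which adjacency sets need v_id removed from them.
    {edges, out_edges, in_edges} =
      Enum.reduce(g.edges, {%{}, g.out_edges, g.in_edges}, fn
        {{^v_id, to}, _meta}, {es, oe, ie} ->
          {es, oe, Map.update!(ie, to, &MapSet.delete(&1, v_id))}

        {{from, ^v_id}, _meta}, {es, oe, ie} ->
          {es, Map.update!(oe, from, &MapSet.delete(&1, v_id)), ie}

        {edge_key, meta}, {es, oe, ie} ->
          {Map.put(es, edge_key, meta), oe, ie}
      end)

    %Graph{
      g
      | vertices: Map.delete(g.vertices, v_id),
        vertex_labels: Map.delete(g.vertex_labels, v_id),
        edges: edges,
        out_edges: Map.delete(out_edges, v_id),
        in_edges: Map.delete(in_edges, v_id)
    }
  end
end
```

This is one traversal of the edges map instead of three full traversals, but it still scales with the total number of edges in the graph, which the later comments improve on further.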
Weirdly, it returns almost exactly the same time for the current version and for my change, so I think my test is flawed: 121-128 ms for the current version, 118-125 ms for my version, over ten runs of the delete call. Not sure if my example is too simplistic, there isn't enough data, or I've just misdiagnosed the problem.
So the issue was the test I was using. With the following I get a 25-30x speedup using my version versus the previous version:
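The benchmark snippet referenced here wasn't preserved in this copy of the thread; a minimal harness of the same general shape (hypothetical module name and graph sizes, using :timer.tc/1) might look like:

```elixir
defmodule DeleteVertexBench do
  # Hypothetical harness, not the original snippet from the comment.
  # Builds a graph with many vertices and a modest fan-out per vertex,
  # then times Graph.delete_vertex/2 over ten runs.
  def run(num_vertices \\ 10_000, fan_out \\ 10) do
    edges =
      for v <- 1..num_vertices, w <- 1..fan_out, do: {v, rem(v + w, num_vertices) + 1}

    g = Graph.new() |> Graph.add_edges(edges)

    times =
      for _ <- 1..10 do
        {micros, _g} = :timer.tc(fn -> Graph.delete_vertex(g, div(num_vertices, 2)) end)
        micros / 1_000
      end

    IO.inspect(Enum.min_max(times), label: "delete_vertex ms (min, max)")
  end
end
```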
For graphs of 1000 nodes it shows an ~8-10x speedup over the current implementation. PR submitted: #84
I think there might be a similar performance issue with …
@stevensonmt The performance issue (~18 ms for vertex deletes) is occurring on Hologram's call graph when installed in an application - so it's dealing with a large number of nodes (MFAs) and edges (function calls) in total.
@bitwalker Fixing this issue would speed up Hologram's incremental compilation and live reloading a lot.
@bartblast I'll try and get to this today, sorry for the delay - unfortunately I have minimal time to maintain some of my open source projects for the time being.
Thinking about this some more, I think there might be an even more efficient way to do this. Let's say there are 1000 vertices, and each vertex has 100 incident edges and 100 emanating edges. That's 200k edge entries for the whole graph. The old version was visiting each vertex twice (for the incident and emanating edge collections) and then all the edges in the graph once. The proposed version only has to visit the collection of all edges. I think there is a way to visit only the subset of edges involving the vertex of interest. I'm not somewhere I can test this out until this evening, but something like this:
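The snippet that followed this comment isn't preserved here; a sketch of the targeted approach it describes (look up the deleted vertex's own adjacency sets and touch only those edges), again assuming libgraph's internal %Graph{} fields, could look like:

```elixir
defmodule TargetedDelete do
  # Illustrative sketch, not the actual code from the thread or the PR.
  # Assumes :out_edges/:in_edges map a vertex id to a MapSet of neighbour ids
  # and :edges is keyed by {from_id, to_id}; v_id is the resolved internal id.
  def delete_vertex(%Graph{} = g, v_id) do
    out_neighbors = Map.get(g.out_edges, v_id, MapSet.new())
    in_neighbors = Map.get(g.in_edges, v_id, MapSet.new())

    # Only edges incident to v_id are touched, not every edge in the graph.
    edges = Enum.reduce(out_neighbors, g.edges, fn w, es -> Map.delete(es, {v_id, w}) end)
    edges = Enum.reduce(in_neighbors, edges, fn u, es -> Map.delete(es, {u, v_id}) end)

    # Likewise, only the neighbours' adjacency sets are updated.
    out_edges =
      in_neighbors
      |> MapSet.delete(v_id)
      |> Enum.reduce(Map.delete(g.out_edges, v_id), fn u, oe ->
        Map.update!(oe, u, &MapSet.delete(&1, v_id))
      end)

    in_edges =
      out_neighbors
      |> MapSet.delete(v_id)
      |> Enum.reduce(Map.delete(g.in_edges, v_id), fn w, ie ->
        Map.update!(ie, w, &MapSet.delete(&1, v_id))
      end)

    %Graph{
      g
      | vertices: Map.delete(g.vertices, v_id),
        vertex_labels: Map.delete(g.vertex_labels, v_id),
        edges: edges,
        out_edges: out_edges,
        in_edges: in_edges
    }
  end
end
```

With this shape the work is proportional to the degree of the deleted vertex rather than to the total number of edges in the graph.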
Now instead of visiting 200k edges it visits the 200 that are identified as involving v_id. Does this make sense? I need to test it. Intuitively I think this might be faster for sparsely connected graphs (each vertex has relatively few edges), whereas the current proposal could be faster for densely connected graphs (each vertex is connected to nearly every other vertex).
So my intuition was correct: for a sparsely connected graph the version above is vastly superior. This is for a sparse graph with many vertices, each with one in edge and one out edge.
Subsequent runs were similar. Changing the graph to ~2k vertices but ~1.9M edges yields:
So still significantly better. You also see the old version and the current PR version converging on a graph with fewer nodes but more edges. This is because the old version is more heavily impacted by the number of nodes that have any edges, while both are relatively equally affected by the total number of edges in the graph. The new version is affected only by the number of edges connected to the vertex being deleted. A more extreme graph with 1_000_000 vertices and 999_999 edges would likely show an even bigger improvement, but my poor 14-year-old i5 with 16 GB of slow RAM just can't manage it. EDIT
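For reference, test graphs of the two shapes described in these comments can be built with libgraph's public API; the sizes below are illustrative stand-ins, not the original benchmark code:

```elixir
# Illustrative only - these mirror the shapes described above.

# Sparse: a long chain, so each vertex has one in edge and one out edge.
sparse =
  Graph.new()
  |> Graph.add_edges(for v <- 1..100_000, do: {v, v + 1})

# Denser: ~2k vertices with on the order of a million edges.
dense =
  Graph.new()
  |> Graph.add_edges(for v <- 1..2_000, w <- 1..2_000, v != w, rem(v + w, 4) == 0, do: {v, w})

{micros, _} = :timer.tc(fn -> Graph.delete_vertex(sparse, 50_000) end)
IO.puts("sparse delete: #{micros / 1_000} ms")

{micros, _} = :timer.tc(fn -> Graph.delete_vertex(dense, 1_000) end)
IO.puts("dense delete: #{micros / 1_000} ms")
```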