
Large datasets #284

Open
s2tephen opened this issue Nov 2, 2016 · 1 comment

Comments

@s2tephen
Contributor

s2tephen commented Nov 2, 2016

See the big-data branch for upper bounds on dataset size and betweenness centrality estimation. These bounds were determined by running a series of tests on a variety of random graphs to roughly locate the sweet spot in the accuracy/runtime tradeoff. Ultimately I was able to get the runtime of the betweenness_centrality function under 90 seconds with error below two decimal places (< .01).
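The accuracy/runtime testing described above can be sketched with networkx's built-in pivot sampling, where `betweenness_centrality(G, k=...)` estimates centrality from `k` sampled source nodes instead of all of them. This is a minimal sketch of such a comparison harness, not the actual code from the big-data branch; the function name and thresholds here are illustrative.

```python
import time

import networkx as nx


def compare_betweenness(G, k, seed=42):
    """Compare exact betweenness centrality against a k-pivot estimate.

    Returns (max absolute error across nodes, exact/approx speedup).
    Illustrative helper, not from the big-data branch.
    """
    t0 = time.perf_counter()
    exact = nx.betweenness_centrality(G)  # all n sources: O(nm) on unweighted graphs
    t_exact = time.perf_counter() - t0

    t0 = time.perf_counter()
    # k pivot nodes sampled without replacement; k == n reproduces the exact values
    approx = nx.betweenness_centrality(G, k=k, seed=seed)
    t_approx = time.perf_counter() - t0

    max_err = max(abs(exact[v] - approx[v]) for v in G)
    return max_err, t_exact / t_approx


# Example run on a random graph of the kind used for the tuning tests
G = nx.gnp_random_graph(300, 0.05, seed=1)
err, speedup = compare_betweenness(G, k=60)
```

Sweeping `k` over graphs of varying size is one way to locate the error/runtime sweet spot the comment refers to.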

This estimation works in the backend logic/testing suite, but unfortunately still times out in production. I was able to get it working by further lowering the upper bounds: MAX_NODES = 2000 and MAX_V_X_E (nodes * edges) = 10000000. At that point, however, the betweenness centrality estimates are noticeably less accurate. I can't invest any more time in fine-tuning this, but it may be worth revisiting in the future. One possible solution would be to run the betweenness centrality algorithm in parallel, as in this example.
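The production guard described above amounts to a size gate in front of the centrality call. A minimal sketch, using the constants quoted in the comment; the function name and fallback behavior are hypothetical, not taken from the production codebase.

```python
import networkx as nx

# Constants quoted in the comment above
MAX_NODES = 2000
MAX_V_X_E = 10_000_000  # nodes * edges


def safe_betweenness(G, k=500, seed=0):
    """Run sampled betweenness centrality only if the graph fits under
    the production limits; return None otherwise (hypothetical policy).
    """
    n, e = G.number_of_nodes(), G.number_of_edges()
    if n > MAX_NODES or n * e > MAX_V_X_E:
        return None  # too large: skip the metric rather than time out
    # Cap the pivot count at n so small graphs still get exact values
    return nx.betweenness_centrality(G, k=min(k, n), seed=seed)


G = nx.gnp_random_graph(100, 0.1, seed=2)
bc = safe_betweenness(G)
```

With tighter bounds the sampled estimate covers a smaller fraction of sources, which is why accuracy drops as the comment notes.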

@rahulbot
Collaborator

rahulbot commented Nov 2, 2016

Maybe this is the final straw in the argument to host prod on the civic servers. @kanarinka what do you think?

@kanarinka kanarinka modified the milestone: v2.0 - Someday Jul 25, 2017
@rahulbot rahulbot removed this from the Someday milestone May 25, 2020