Expand swap capability and reorganize package #6
Conversation
As an example, using Zachary's karate club network:

Using slow hash table: …
Using fast hash table: …

Or using a protein-protein interaction graph due to Vidal and Ma'ayan:

Using slow hash table: …
Using fast hash table: …
In the previous comment, does "fast hash table" refer to the Cantor pair hash table and "slow hash table" refer to the `std::unordered_set` implementation?
That is basically correct, though I have found that … Here's a quick rundown of the differences in implementation between the two hash tables: …
* Add roaring bitmaps
* Add tests for roaring bitset, rename hash table to correct, bitset
In trying to apply the XSwap method to other networks, I have discovered that this implementation fails for several publicly-available networks. A few improvements address most of the underlying issues.
First, I added a preprocessing file to process files containing graphs. The underlying C++ functionality deals with edges represented as `int*`s. Since permutable graphs may be stored in files as integers or strings (e.g. `node_a,node_b\nnode_c,node_d\n...`), it is helpful to be able to process such graphs into a form that is usable by the fast XSwap implementation. Moreover, assigning new IDs to the nodes allows us to ensure that nodes are indexed 1, 2, ..., `num_nodes`. This kind of indexing makes larger graphs memory-feasible with the fast Cantor pair hash table.

Second, I added an additional hash table implementation using the C++ standard library's `std::unordered_set`. While this is significantly slower than the Cantor pair hash table, it will work for networks with too many potential edges for the original hash table to fit in memory. (I have set the cutoff at 4 GB, though this was a totally arbitrary choice.) To ensure a uniform API, I added a final hash table wrapper, the `HashTable` class, which can call both the original `EdgeHashTable` and the new `BigHashTable`.