Expand swap capability and reorganize package #6

Merged
merged 10 commits into from
Mar 9, 2019

@zietzm zietzm commented Dec 28, 2018

In trying to apply the XSwap method to other networks, I have discovered that this implementation fails for several publicly-available networks. A few improvements address most of the underlying issues.

First, I added a preprocessing file to process files containing graphs. The underlying C++ functionality deals with edges represented as int* pointers. Since permutable graphs may be stored in files with integer or string node labels (e.g., node_a,node_b\nnode_c,node_d\n...), it is helpful to convert such graphs to a form that is usable by the fast XSwap implementation. Moreover, assigning new ids to the nodes ensures that nodes are indexed 1, 2, ..., num_nodes. This compact indexing keeps the fast Cantor pair hash table memory-feasible for larger graphs.
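
The relabeling step described above could look roughly like the following. This is only a minimal sketch, not the code added in this PR: the function name relabel_edges, the comma-separated input format, and the 1-based ids are assumptions made for illustration.

```python
import csv
import io


def relabel_edges(lines):
    """Map arbitrary node labels to consecutive integer ids (hypothetical helper).

    Ids are assigned in order of first appearance, starting at 1.
    Returns the relabeled edge list and the label -> id mapping.
    """
    node_to_id = {}
    edges = []
    for source, target in csv.reader(lines):
        for label in (source, target):
            if label not in node_to_id:
                # Next unused id; 1-based is an assumption of this sketch.
                node_to_id[label] = len(node_to_id) + 1
        edges.append((node_to_id[source], node_to_id[target]))
    return edges, node_to_id


# Example input in the node_a,node_b format mentioned above.
example = io.StringIO("node_a,node_b\nnode_c,node_d\nnode_a,node_c\n")
edges, mapping = relabel_edges(example)
print(edges)    # [(1, 2), (3, 4), (1, 3)]
print(mapping)  # {'node_a': 1, 'node_b': 2, 'node_c': 3, 'node_d': 4}
```

Keeping the mapping around makes it possible to translate the permuted edges back to the original labels after swapping.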

Second, I added an additional hash table implementation using the C++ standard library std::unordered_set. While this is significantly slower than the Cantor pair hash table, it will work for networks with too many potential edges for the original hash table to fit in memory. (I have set the cutoff at 4 GB, though this was a totally arbitrary choice.) To ensure a uniform API, I added a final hash table wrapper, the HashTable class, which dispatches to either the original EdgeHashTable or the new BigHashTable.
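
To illustrate why the node ids matter for memory, here is a rough sketch of the size-based dispatch. It is written in Python rather than C++ and rests on several assumptions that are not confirmed by this PR: that the fast table stores one bit per possible Cantor key, that the key is the standard Cantor pairing function, and that the 4 GB cutoff means 4 * 1024**3 bytes. The function names are hypothetical.

```python
def cantor_pair(a, b):
    """Standard Cantor pairing function: a unique integer for each (a, b)."""
    return (a + b) * (a + b + 1) // 2 + b


def bitset_bytes(max_node_id):
    """Bytes needed for one bit per Cantor key (assumed table layout)."""
    # The largest key for ids in [0, max_node_id] is the pair (max, max).
    largest_key = cantor_pair(max_node_id, max_node_id)
    return (largest_key + 1 + 7) // 8  # round up to whole bytes


def choose_table(max_node_id, cutoff_bytes=4 * 1024**3):
    """Pick a table name by estimated memory (cutoff value is an assumption)."""
    if bitset_bytes(max_node_id) <= cutoff_bytes:
        return "EdgeHashTable"
    return "BigHashTable"


print(choose_table(34))         # EdgeHashTable
print(choose_table(1_000_000))  # BigHashTable
```

Because the largest key grows roughly as 2 * max_node_id ** 2, the estimated size is quadratic in the largest node id. Under these assumptions the 4 GB cutoff is reached at around 1.3 × 10^5 nodes, which is why compact ids from preprocessing help.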

zietzm commented Dec 28, 2018

As an example, using Zachary's karate club network with multiplier=10000, the choice of hash table does not affect the swap outcomes but significantly affects computation time.

Using slow hash table
43.5897 % unchanged
5.205 seconds

Using fast hash table
43.5897 % unchanged
0.134 seconds

The same holds for a protein-protein interaction graph due to Vidal and Ma'ayan with multiplier=10:

Using slow hash table
1.1894 % not swapped
98.145 seconds

Using fast hash table
1.1894 % not swapped
0.025 seconds

dhimmel commented Dec 29, 2018

In the previous comment, does "fast hash table" refer to the "Cantor pair hash table" and "slow hash table" refer to the std::unordered_set hash table?

zietzm commented Dec 29, 2018

That is basically correct, though I have found that std::set actually gives better performance than std::unordered_set. I am not sure why, but std::unordered_set has been giving insertion performance much closer to its worst case (O(n)) than to its average case (constant time). Since std::set uses a self-balancing binary search tree with O(log(n)) insertion, it is proving faster than a hash table operating near its worst case. I am looking into whether a better, custom-specified hash function would improve the hash table's performance. Alternatively, frequent table resizing may be slowing down the unordered_set implementation. This could be remedied by pre-allocating a larger amount of memory.

Here's a quick rundown of the differences in implementations between set and unordered_set:
https://www.geeksforgeeks.org/set-vs-unordered_set-c-stl/

* Add roaring bitmaps
* Add tests for roaring bitset, rename hash table to correct, bitset

zietzm commented Jan 18, 2019

@zietzm zietzm changed the title from "[WIP]: Expand swap capability and reorganize package" to "Expand swap capability and reorganize package" Feb 22, 2019
@zietzm zietzm merged commit 59e6c0c into hetio:master Mar 9, 2019
@zietzm zietzm deleted the hashtable-size branch March 9, 2019 16:54