Expand swap capability and reorganize package #6

zietzm · 2018-12-28T04:10:04Z

While trying to apply the XSwap method to other networks, I discovered that this implementation fails for several publicly available networks. A few improvements address most of the underlying issues.

First, I added a preprocessing file to process files containing graphs. The underlying C++ functionality represents edges as arrays of integers (int*). Since graphs to be permuted may be stored in files with integer or string node labels (e.g. node_a,node_b\nnode_c,node_d\n...), it is helpful to convert such graphs to a form that is usable by the fast XSwap implementation. Moreover, assigning new ids to the nodes ensures that nodes are indexed 1, 2, ..., num_nodes. Contiguous indexing keeps the largest Cantor pair value as small as possible, which makes larger graphs memory-feasible with the fast Cantor pair hash table.
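The re-indexing step described above can be sketched as follows. This is a minimal illustration and not the contents of xswap/preprocessing.py: the function name `map_edges_to_ids`, the in-memory example file, and the choice to start ids at 0 are all assumptions made for the example.

```python
import csv
import io


def map_edges_to_ids(edge_rows):
    """Assign consecutive integer ids to node labels, in order of first appearance.

    Returns the re-indexed edge list and the label -> id mapping.
    (Hypothetical helper, not the package's actual API.)
    """
    mapping = {}
    mapped_edges = []
    for source, target in edge_rows:
        for label in (source, target):
            if label not in mapping:
                mapping[label] = len(mapping)  # ids start at 0 in this sketch
        mapped_edges.append((mapping[source], mapping[target]))
    return mapped_edges, mapping


# Hypothetical two-column edge file in the node_a,node_b format.
example_file = io.StringIO("node_a,node_b\nnode_c,node_d\nnode_a,node_c\n")
edges, mapping = map_edges_to_ids(csv.reader(example_file))
print(edges)
print(mapping)
```

Keeping the mapping around makes it possible to translate permuted edges back to the original labels afterwards.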

Second, I added an additional hash table implementation using the C++ standard library std::unordered_set. While this is significantly slower than the Cantor pair hash table, it works for networks with too many potential edges for the original hash table to fit in memory. (I have set the cutoff at 4 GB, though this was a totally arbitrary choice.) To ensure a uniform API, I added a final hash table wrapper, the HashTable class, which dispatches to either the original EdgeHashTable or the new BigHashTable.
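A rough Python sketch of the dispatch idea follows. It is not the C++ code in this PR: `PairBitset` and `make_edge_table` are hypothetical names, the fast table is assumed to store one bit per possible Cantor pair value, and a built-in `set` stands in for the slower fallback.

```python
def cantor_pair(a, b):
    """Cantor pairing function: maps a pair of non-negative ints to a unique int."""
    return (a + b) * (a + b + 1) // 2 + b


class PairBitset:
    """Fast membership table with one bit per possible Cantor pair value."""

    def __init__(self, max_id):
        # The largest pair value for ids in [0, max_id] is cantor_pair(max_id, max_id).
        self.bits = bytearray(cantor_pair(max_id, max_id) // 8 + 1)

    def add(self, edge):
        index = cantor_pair(*edge)
        self.bits[index // 8] |= 1 << (index % 8)

    def __contains__(self, edge):
        index = cantor_pair(*edge)
        return bool(self.bits[index // 8] & (1 << (index % 8)))


MAX_BYTES = 4 * 1024 ** 3  # the 4 GB cutoff mentioned above


def make_edge_table(max_id, max_bytes=MAX_BYTES):
    """Return the fast table if it fits in max_bytes, else a general-purpose set."""
    needed_bytes = cantor_pair(max_id, max_id) // 8 + 1
    if needed_bytes <= max_bytes:
        return PairBitset(max_id)
    return set()  # slower, but memory grows with stored edges, not possible edges


table = make_edge_table(max_id=10)
table.add((3, 7))
print((3, 7) in table, (7, 3) in table)
```

Both return types support `add` and `in`, so calling code does not need to know which one it received. Because the largest Cantor pair value grows roughly with the square of the largest node id, the fast table's memory footprint depends on the id range rather than on the number of edges.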

zietzm · 2018-12-28T04:14:59Z

As an example, using Zachary's karate club network with multiplier=10000, we find that the choice of hash table does not affect the swap outcomes but significantly affects computation time.

Using slow hash table
43.5897 % unchanged
5.205 seconds

Using fast hash table
43.5897 % unchanged
0.134 seconds

Or, using a protein-protein interaction graph due to Vidal and Ma'ayan with multiplier=10:

Using slow hash table
1.1894 % not swapped
98.145 seconds

Using fast hash table
1.1894 % not swapped
0.025 seconds
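For readers reproducing these numbers, one plausible reading of the "unchanged" / "not swapped" percentage is the share of original edges that are still present after permutation. The sketch below computes that quantity under this assumption; the function name and the undirected treatment of edges are choices made for the example, not something confirmed by the output above.

```python
def percent_unchanged(original_edges, permuted_edges):
    """Percentage of original edges that also appear in the permuted edge list.

    Edges are treated as undirected here, so (a, b) and (b, a) count as the same edge.
    """
    original = {frozenset(edge) for edge in original_edges}
    permuted = {frozenset(edge) for edge in permuted_edges}
    return 100 * len(original & permuted) / len(original)


# Toy example: two of the four original edges survive the permutation.
original = [(0, 1), (2, 3), (4, 5), (6, 7)]
permuted = [(1, 0), (2, 5), (4, 3), (6, 7)]
result = percent_unchanged(original, permuted)
print(result)
```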

dhimmel · 2018-12-29T17:03:52Z

In the previous comment, does "fast hash table" refer to the "Cantor pair hash table" and "slow hash table" refer to the std::unordered_set hash table?

zietzm · 2018-12-29T17:10:09Z

That is basically correct, though I have found that std::set actually gives better performance than std::unordered_set. I am not sure why, but std::unordered_set had been giving insertion performance much closer to the worst case (O(n)) than to the best case (constant). Since std::set uses a self-balancing binary search tree, its guaranteed O(log(n)) insertion is proving to be faster than a hash table operating near its worst case. I am looking into the possibility that a better, custom-specified hash function would improve performance on the hash table. Alternatively, it may be that frequent table resizing is slowing down the hash table (unordered_set) implementation. This could be remedied by pre-allocating a larger amount of memory.

Here's a quick rundown of the differences in implementations between set and unordered_set:
https://www.geeksforgeeks.org/set-vs-unordered_set-c-stl/
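To illustrate why the hash function is a plausible suspect, here is a small bucket model in Python. It is not a measurement of std::unordered_set, and the hash function actually used in the C++ code is not shown in this thread; the "weak" hash below (the sum of the two node ids) is purely an assumed example of a function that sends many edges to the same bucket.

```python
from collections import Counter


def cantor_pair(a, b):
    """Cantor pairing function: unique integer for each pair of non-negative ints."""
    return (a + b) * (a + b + 1) // 2 + b


def longest_chain(edges, hash_fn, n_buckets):
    """Size of the fullest bucket when each edge goes to hash_fn(a, b) % n_buckets."""
    buckets = Counter(hash_fn(a, b) % n_buckets for a, b in edges)
    return max(buckets.values())


# All 435 edges (a < b) of a complete graph on 30 nodes, placed into 512 buckets.
edges = [(a, b) for a in range(30) for b in range(a + 1, 30)]
n_buckets = 512

weak = longest_chain(edges, lambda a, b: a + b, n_buckets)
strong = longest_chain(edges, cantor_pair, n_buckets)
print(weak, strong)
```

In a chained hash table, lookups and insertions have to walk the chain in the target bucket, so a hash that produces long chains pushes performance toward the O(n) worst case even when the table has plenty of free buckets.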


zietzm · 2019-01-18T14:57:19Z

cd5fea4e0958f32812f1919109f798fea6c1b82f closes #7

Move files, wrap HashTable

a9c32d1

zietzm added 3 commits December 27, 2018 22:16

Fix travis src path

ba505eb

Fix test include path

ad0d560

Move cantor pairing function

7caf3c7

dhimmel reviewed Dec 29, 2018

xswap/preprocessing.py: five review threads (four outdated), all resolved

zietzm added 2 commits December 30, 2018 20:03

Update preprocessing file, hash table

5914724

Use roaring bitset for larger set

cd5fea4

* Add roaring bitmaps
* Add tests for roaring bitset, rename hash table to correct, bitset

zietzm added 4 commits January 18, 2019 16:35

Fix pointer error

91f3e77

version 0.0.2 for testing

00ebc2b

Attempt to fix memory leak

c7cbec4

Rename hash table to bitset

e2ba512

zietzm changed the title ~~[WIP]: Expand swap capability and reorganize package~~ Expand swap capability and reorganize package Feb 22, 2019

zietzm merged commit 59e6c0c into hetio:master Mar 9, 2019

zietzm deleted the hashtable-size branch March 9, 2019 16:54
