doc/modules/ROOT/pages/machine-learning/node-embeddings/hashgnn.adoc
@@ -35,14 +35,19 @@ For more information on this algorithm, see:

=== The algorithm

The HashGNN algorithm can only run on binary features.
There is an optional first step to transform input features into binary features.

For a number of iterations, a new binary embedding is computed for each node using the embeddings of the previous iteration.
In the first iteration, the previous embeddings are the binary feature vectors.

During one iteration, each node embedding vector is constructed by taking `K` random samples.
The random sampling is carried out by successively selecting features with lowest min-hash values.
Features of each node itself and of its neighbors are both considered.

There are three types of hash functions involved: 1) a function applied to a node's own features, 2) a function applied to a subset of a neighbor's features, and 3) a function applied to all of a neighbor's features to select the subset for hash function 2).
For each iteration and sampling round `k < K`, new hash functions are used, and the third function also varies by the relationship type connecting to the neighbor.

The sampling is consistent in the sense that if nodes `a` and `b` are the same or similar in terms of their features, the features of their neighbors, and the relationship types connecting the neighbors, then the samples for `a` and `b` are also the same or similar.
The number `K` is called `embeddingDensity` in the configuration of the algorithm.
The algorithm ends with another optional step that maps the binary embeddings to dense vectors.
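The sampling scheme can be sketched in a few lines of Python. This is a simplified, illustrative sketch, not the GDS implementation: the function `hashgnn_iteration`, the salted SHA-256 hashing, and the toy graph are all invented here, and it merges hash functions 2) and 3) by hashing every neighbor feature directly and assumes a single relationship type.

```python
# Illustrative sketch of one HashGNN-style iteration (assumptions: salted
# SHA-256 stands in for the min-hash family; hash functions 2) and 3) are
# merged by hashing all neighbor features; single relationship type).
import hashlib

def h(salt: int, feature: str) -> int:
    # Deterministic "hash function" for a given salt.
    return int(hashlib.sha256(f"{salt}:{feature}".encode()).hexdigest(), 16)

def hashgnn_iteration(features, neighbors, K, seed=0):
    # features: node -> set of binary feature names (previous embeddings)
    # neighbors: node -> list of neighbor nodes
    new_features = {node: set() for node in features}
    for k in range(K):  # K == embeddingDensity sampling rounds
        # Fresh salts per sampling round stand in for new hash functions.
        own_salt, nbr_salt = 2 * (seed + k), 2 * (seed + k) + 1
        for node in features:
            # Candidates: the node's own features (hash function "one")
            # and its neighbors' features (hash function "two").
            candidates = [(h(own_salt, f), f) for f in features[node]]
            for nbr in neighbors.get(node, []):
                candidates += [(h(nbr_salt, f), f) for f in features[nbr]]
            if candidates:
                # Min-hash sampling: keep the feature with the lowest hash.
                new_features[node].add(min(candidates)[1])
    return new_features

# Toy graph a--b--c with made-up binary features.
features = {"a": {"f1"}, "b": {"f2"}, "c": {"f1", "f3"}}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(hashgnn_iteration(features, neighbors, K=2))
```

Because each hash depends only on the salt and the feature, two nodes with identical features and neighborhoods receive identical samples, which mirrors the consistency property described above.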
@@ -57,7 +62,8 @@ The graph structure is `a--b--c`.

We imagine running HashGNN for one iteration with `embeddingDensity=2`.

During the first iteration and `k=0`, we compute an embedding for `(a)`.
A hash value for `f1` turns out to be `7`.
Since `(b)` is a neighbor, we generate a value for its feature `f2` and it becomes `11`.
The value `7` is sampled from a hash function which we call "one" and `11` from a hash function "two".
Thus `f1` is added to the new features for `(a)`, since it has a smaller hash value.
We repeat for `k=1` and this time the hash values are `4` and `2`, so now `f2` is added as a feature to `(a)`.
@@ -74,7 +80,10 @@ We proceed similarly with `k=1`, and `f1` is selected again.
Since the embeddings consist of binary features, this second addition has no effect.

We omit the details of computing the embedding of `(c)`.

After the two sampling rounds, the iteration is complete, and since there is only one iteration, we are done.
Each node has a binary embedding that contains some subset of the original binary features.
In particular, `(a)` has features `f1` and `f2`, and `(b)` has only the feature `f1`.
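The trace for `(a)` can be written out as a short Python sketch. The hash values are hard-coded to those quoted in the text (`7`, `11`, `4`, `2`) rather than computed, and the variable names are purely illustrative.

```python
# Trace of the worked example for node (a): own feature f1, neighbor (b)
# with feature f2, and embeddingDensity (K) = 2.
hash_values = {
    # (sampling round k, hash function, feature) -> value from the text
    (0, "one", "f1"): 7,   # function "one" applies to (a)'s own features
    (0, "two", "f2"): 11,  # function "two" applies to neighbor features
    (1, "one", "f1"): 4,
    (1, "two", "f2"): 2,
}

embedding_a = set()
for k in range(2):
    candidates = {
        "f1": hash_values[(k, "one", "f1")],
        "f2": hash_values[(k, "two", "f2")],
    }
    # Min-hash sampling: the feature with the lowest hash value is added.
    embedding_a.add(min(candidates, key=candidates.get))

print(sorted(embedding_a))  # ['f1', 'f2']: f1 wins at k=0, f2 wins at k=1
```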