
Commit 75d5667

breakanalysis and brs96 committed
Touch up docs more
Co-authored-by: Brian Shi <[email protected]>
1 parent e85bd0f commit 75d5667

File tree: 1 file changed, +17 −8 lines changed
  • doc/modules/ROOT/pages/machine-learning/node-embeddings


doc/modules/ROOT/pages/machine-learning/node-embeddings/hashgnn.adoc

Lines changed: 17 additions & 8 deletions
@@ -35,14 +35,19 @@ For more information on this algorithm, see:

=== The algorithm

-The first step of the algorithm is optional and transforms input features into binary features.
-The HashGNN can only run on binary features, so this step is necessary.
-Then for a number of iterations, a new binary embedding is computed for each node using the embeddings of the previous iteration.
+The HashGNN algorithm can only run on binary features.
+There is an optional first step to transform input features into binary features.
+
+For a number of iterations, a new binary embedding is computed for each node using the embeddings of the previous iteration.
In the first iteration, the previous embeddings are the binary feature vectors.
-Each node vector is constructed by taking `K` random samples.
+
+During one iteration, each node embedding vector is constructed by taking `K` random samples.
The random sampling is carried out by successively selecting features with lowest min-hash values.
-In this selection, both features of the same node and of the neighbors of the node are considered.
-Hence, for each node, iteration and each `0 <= k < K` we sample a feature to add to the new embedding of the node, and we select either one of the node's own features or a feature from a neighbor.
+Features of the node itself and of its neighbors are both considered.
+
+There are three types of hash functions involved: 1) a function applied to a node's own features, 2) a function applied to a subset of a neighbor's features, and 3) a function applied to all of a neighbor's features to select the subset used by hash function 2).
+For each iteration and sampling round `k<K`, new hash functions are used, and the third function also varies by the type of the relationship connecting to the neighbor.
+
The sampling is consistent in the sense that if nodes `a` and `b` are same or similar in terms of their features, the features of their neighbors and the relationship types connecting the neighbors, the samples for `a` and `b` are also same or similar.
The number `K` is called `embeddingDensity` in the configuration of the algorithm.
The algorithm ends with another optional step that maps the binary embeddings to dense vectors.
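To make the sampling procedure described above concrete, here is a minimal Python sketch of one HashGNN iteration. It is not the GDS implementation: the salted MD5 hashing, the dict-of-sets graph representation, and the function names are illustrative assumptions, and the per-relationship-type variation of the third hash function is omitted for brevity.

[source, python]
----
import hashlib


def min_hash(salt, features):
    """Return the (hash value, feature) pair with the lowest salted hash."""
    def h(feature):
        return int(hashlib.md5(f"{salt}:{feature}".encode()).hexdigest()[:8], 16)
    return min((h(f), f) for f in features)


def hashgnn_iteration(features, neighbors, K, iteration=0):
    """One HashGNN iteration over all nodes (illustrative sketch only).

    features  -- dict: node -> set of active binary features (previous embedding)
    neighbors -- dict: node -> list of neighboring nodes
    K         -- embeddingDensity: the number of sampling rounds per node
    """
    new_features = {node: set() for node in features}
    for k in range(K):
        # Fresh salts per iteration and sampling round stand in for the three
        # hash functions: "one" for own features, "two" for the forwarded
        # neighbor feature, "three" for choosing which neighbor feature to forward.
        salt_one = ("one", iteration, k)
        salt_two = ("two", iteration, k)
        salt_three = ("three", iteration, k)
        for node, own in features.items():
            candidates = []
            if own:
                candidates.append(min_hash(salt_one, own))
            for nbr in neighbors.get(node, []):
                if features[nbr]:
                    # Hash "three" selects one feature of the neighbor ...
                    _, forwarded = min_hash(salt_three, features[nbr])
                    # ... and hash "two" scores it against the node's own candidate.
                    candidates.append(min_hash(salt_two, [forwarded]))
            if candidates:
                # The feature with the lowest min-hash value is sampled in this round.
                new_features[node].add(min(candidates)[1])
    return new_features


# Toy graph shaped like the a--b--c example below; (c)'s feature is an arbitrary choice.
features = {"a": {"f1"}, "b": {"f2"}, "c": {"f1"}}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(hashgnn_iteration(features, neighbors, K=2))
----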
@@ -57,7 +62,8 @@ The graph structure is `a--b--c`.
We imagine running HashGNN for one iteration with `embeddingDensity=2`.

During the first iteration and `k=0`, we compute an embedding for `(a)`.
-A hash value for `f1` turns out to be `7`. Since `(b)` is a neighbor, we generate a value for its feature `f2` and it becomes `11`.
+A hash value for `f1` turns out to be `7`.
+Since `(b)` is a neighbor, we generate a value for its feature `f2` and it becomes `11`.
The value `7` is sampled from a hash function which we call "one" and `11` from a hash function "two".
Thus `f1` is added to the new features for `(a)` since it has a smaller hash value.
We repeat for `k=1` and this time the hash values are `4` and `2`, so now `f2` is added as a feature to `(a)`.
@@ -74,7 +80,10 @@ We proceed similarly with `k=1` and `f1` is selected again.
Since the embeddings consist of binary features, this second addition has no effect.

We omit the details of computing the embedding of `(c)`.
-Our result is that `(a)` has features `f1` and `f2` and `(b)` has only the feature `f1`.
+
+After the two sampling rounds, the iteration is complete, and since there is only one iteration, we are done.
+Each node has a binary embedding that contains some subset of the original binary features.
+In particular, `(a)` has features `f1` and `f2`, while `(b)` has only the feature `f1`.

=== Features
