-
Notifications
You must be signed in to change notification settings - Fork 118
Graph Types
If the highest performance is not required, it is easy to use NGT. However, to obtain the highest performance, it is necessary to know the difference between the two graph types of NGT.
-
ANNG (Approximate k-Nearest Neighbor Graph): An undirected and connected graph. An undirected graph consists of undirected (bidirectional) edges. A connected graph is one in which there is a path between two arbitrary nodes. The numbers of outgoing and incoming edges are the same for each node. An ANNG is a basic NGT graph type.
-
ONNG (Optimized Nearest Neighbors Graph): A directed graph. A directed graph consists of directed edges. Since the edges in the graph are optimized to obtain high search performance, its search performance is higher than that of an ANNG. However, objects cannot be removed from an ONNG.
In addition to the two types of graphs, a tree-based index is also used to search.
Below is an example of constructing an ANNG.
ngt create -d 300 -D c -E 10 anng vecs.tsv
Here, -E specifies the number of edges for each appended node. Since an ANNG is constructed by incrementally appending a node with the same number of edges to the graph, the numbers of edges for existing nodes gradually increase, as more nodes are appended. Below is the flow of ANNG construction.
As the number of edges increases, construction time increases and search accuracy improves. Since search time increases at the same time, an appropriate number of edges should be set to obtain better performance. However, not all edges are always used to explore the graph. By default, 40 edges are used to balance search accuracy and time relatively for most datasets. -S specifies the number of edges to explore instead of the default value of 40 as follows.
ngt create -d 300 -D c -E 20 -S 50 anng
The number of edges specified with -S is a fixed value. However, it can be automatically and adequately calculated from search_range_coefficient of search when -2 is specified with -S as follows or -E is specified in the search command. The coefficients of the formula to calculate the number of edges are set by default.
ngt create -d 300 -D c -E 50 -S -2 anng
Since it is difficult to adjust these parameters and coefficients above to fit your dataset, there are optimization functions (Command/Python/Cpp) to automatically optimize these parameters.
Next is ONNG construction. As described above, an ONNG can be converted from an ANNG by using the following command.
ngt reconstruct-graph -m S -o 10 -i 100 anng onng
This command performs the following five steps.
- Graph construction
First, an initial ONNG without edges is a graph in which the only nodes are extracted from the specified ANNG. In the example above, 10 outgoing edges that are extracted for each node in the ANNG are added to the ONNG, and 100 outgoing edges extracted in the same way are reversed in direction and added to the ONNG. The following figure shows the graph construction.
-
Shortcut reduction (pass adjustment)
Unnecessary shortcut paths are removed to decrease edges (see this paper for details). -
Search coefficient optimization
The search coefficients to calculate the number of edges to explore are optimized . -
Memory prefetch optimization
Th size and offset for the memory prefetch during the search processing are optimized. -
Accuracy table generation
The accuracy table is a lookup table from accuracies to epsilons. When an expected accuracy is specified for search, the indispensable epsilon value for the actual search is converted from the specified accuracy with the table.
ONNG construction and optimization are completed with the above steps.
Command line tool
Python
C++