\section{Syntax Comparison} \label{syntax}
We provide a usage syntax comparison of several graph algorithms
in Tier 1 of P3128 against the \textbf{boost::graph} equivalent.
- We refer to the refrence implementation associated with this proposal
- as \textbf{stdgraph}.
+ We refer to the reference implementation associated with this proposal
+ as \textbf{std::graph}.
These algorithms are breadth-first search (BFS, Figure~\ref{fig:bfssyntax}),
connected components (CC, Figure~\ref{fig:ccsyntax}),
single-source shortest paths (SSSP, Figure~\ref{fig:ssspsyntax}),
and triangle counting (TC, Figure~\ref{fig:tcsyntax}).
We take these algorithms from the GAP Benchmark Suite~\cite{gapbs_2023},
which we discuss more in Section~\ref{performance}.
+ We also defer to Section~\ref{performance} any discussion of
+ underlying implementation details.

- Unlike \textbf{boost::graph}, \textbf{stdgraph} does not specify edge directedness
- as a graph property.
- If a graph in \textbf{stdgraph} implemented by \textbf{container::compressed\_graph}
+ Unlike \textbf{boost::graph}, \textbf{std::graph} does not
+ specify edge direction as a graph property.
+ If a graph in \textbf{std::graph} implemented by \textbf{container::compressed\_graph}
is undirected, then it will contain edges in both directions.
\textbf{boost::graph} has a \textbf{boost::graph::undirectedS} property
which can be used in the \textbf{boost::graph::adjacency\_matrix} class
- to specify an unidrected graph, but
+ to specify an undirected graph, but
not in the \textbf{boost::graph::compressed\_sparse\_row\_graph} class.
Thus in Figures~\ref{fig:bfssyntax}--\ref{fig:tcsyntax}, the graph type always includes \textbf{boost::graph::directedS}.
- Similarly to \textbf{stdgraph}, undirected graphs must contain the edges in both directions.
+ Similarly to \textbf{std::graph}, undirected graphs must contain the edges in both directions.

Intermediate data structures (i.e. edgelists) will be needed
to construct the compressed graph structures.
- In order to focus on the differenes in algorithm syntax, we omit
+ In order to focus on the differences in algorithm syntax, we omit
code which populates the graph data structures.
In the following sections we address the syntax changes for each of
these algorithms.
@@ -43,7 +45,7 @@ \section{Syntax Comparison} \label{syntax}
\lstinputlisting{D3337_Comparison/src/stdgraph_bfs.hpp}
}
\end{minipage}
- \caption{Breadth\-First Search Syntax Comparison}
+ \caption{Breadth-First Search Syntax Comparison}
\label{fig:bfssyntax}
\end{figure}
\begin{figure}[ht]
@@ -72,7 +74,7 @@ \section{Syntax Comparison} \label{syntax}
\lstinputlisting{D3337_Comparison/src/stdgraph_sssp.hpp}
}
\end{minipage}
- \caption{Single Source Shortest Paths (Dijkstra) Syntax Comparison}
+ \caption{Single Source Shortest Paths Syntax Comparison}
\label{fig:ssspsyntax}
\end{figure}

@@ -87,11 +89,11 @@ \section{Syntax Comparison} \label{syntax}
\lstinputlisting{D3337_Comparison/src/stdgraph_tc.hpp}
}
\end{minipage}
- \caption{TC Syntax Comparison}
+ \caption{Triangle Counting Syntax Comparison}
\label{fig:tcsyntax}
\end{figure}

- \subsection{BFS}
+ \subsection{Breadth-First Search}
BFS is often described as a graph algorithm, though a BFS traversal
by itself does not actually perform any task.
In reality, it is a data access pattern which specifies an order
@@ -105,44 +107,42 @@ \subsection{BFS}

This capability is very powerful but often cumbersome if the BFS traversal
simply requires vertex and edge access upon visiting.
- For this reason stdgraph provides a simple, range-based-for loop BFS traversal
+ For this reason \textbf{std::graph} provides a simple, range-based for loop BFS traversal
called a view.
Figure~\ref{fig:bfssyntax} compares the simplest \textbf{boost::graph}
BFS visitor against the range-based for loop implementation.
The authors of this proposal acknowledge that some power users still want
the full customization provided by visitors,
and we plan to add them to this proposal.

- \subsection{CC}
+ \subsection{Connected Components}
There is very little difference in the connected component interfaces.

- \subsection{SSSP}
- Of the four algorithms discussed here, only SSSP makes use of some edge property, in this case distance.
+ \subsection{Single Source Shortest Paths}
+ Of the four algorithms discussed here, only SSSP makes use of some
+ edge property, in this case distance.
Along with the input edge property, the algorithm also associates with
every vertex a distance from the start vertex, and a predecessor
vertex to store the shortest path.
In Figure~\ref{fig:ssspsyntax} we see that \textbf{boost::graph} requires
property maps to look up edge and vertex properties.
- These property maps are tightly coupled with the graph data strucutres.
+ These property maps are tightly coupled with the graph data structures.
We propose properties be stored external to the graph.
For edge properties we provide a weight lambda function to the algorithm
to look up distance from the \textbf{edge\_reference\_t}.
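
The idea of external properties plus a weight callable can be illustrated with a minimal, self-contained sketch. This is not the P3128 interface: the `Edge` and `Graph` types and the name `dijkstra_sketch` are hypothetical, chosen only to show distances and predecessors living in caller-owned vectors while a lambda maps an edge to its distance.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical minimal graph: per-vertex list of (target vertex, edge id).
struct Edge {
  std::size_t target;
  std::size_t id;
};
using Graph = std::vector<std::vector<Edge>>;

// Dijkstra with distance and predecessor storage owned by the caller.
// `weight` is any callable mapping an Edge to a non-negative distance.
template <class WeightFn>
void dijkstra_sketch(const Graph& g, std::size_t source,
                     std::vector<double>& distance,
                     std::vector<std::size_t>& predecessor,
                     WeightFn weight) {
  distance.assign(g.size(), std::numeric_limits<double>::infinity());
  predecessor.resize(g.size());
  std::iota(predecessor.begin(), predecessor.end(), std::size_t{0});

  using Item = std::pair<double, std::size_t>;  // (distance, vertex)
  std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
  distance[source] = 0.0;
  pq.push({0.0, source});

  while (!pq.empty()) {
    auto [d, u] = pq.top();
    pq.pop();
    if (d > distance[u]) continue;  // stale entry, a shorter path was found
    for (const Edge& e : g[u]) {
      const double candidate = d + weight(e);
      if (candidate < distance[e.target]) {
        distance[e.target] = candidate;
        predecessor[e.target] = u;
        pq.push({candidate, e.target});
      }
    }
  }
}
```

A caller would keep the edge distances in its own container, for example a `std::vector<double>` indexed by edge id, and pass a lambda such as `[&](const Edge& e) { return w[e.id]; }`. Because stale entries are skipped rather than removed, the priority queue in this sketch can hold up to one entry per edge.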

- \subsection{TC}
+ \subsection{Triangle Counting}
\textbf{boost::graph} does not contain a global triangle counting algorithm
- similar to the one proposed by stdgraph.
+ similar to the one proposed by \textbf{std::graph}.
Instead we must iterate through the vertices counting the number of triangles
on every vertex, and adjust for overcounting at the end.
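
The per-vertex approach with a final overcounting adjustment can be sketched generically as follows. This is not the \textbf{boost::graph} code from the figure: the `Adjacency` type and the function names are hypothetical, and the sketch assumes an undirected graph stored with edges in both directions and without self-loops or duplicate edges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Adjacency = std::vector<std::vector<std::size_t>>;

// True if the neighbor list of u contains v (linear scan, lists may be unsorted).
bool has_edge(const Adjacency& adj, std::size_t u, std::size_t v) {
  return std::find(adj[u].begin(), adj[u].end(), v) != adj[u].end();
}

// For every vertex u, count ordered pairs (v, w) of distinct neighbors that
// are themselves adjacent. Each triangle is seen twice at each of its three
// vertices, so the total is divided by 6 at the end.
std::size_t naive_triangle_count_sketch(const Adjacency& adj) {
  std::size_t ordered_hits = 0;
  for (std::size_t u = 0; u < adj.size(); ++u) {
    for (std::size_t v : adj[u]) {
      for (std::size_t w : adj[u]) {
        if (v != w && has_edge(adj, v, w)) ++ordered_hits;
      }
    }
  }
  return ordered_hits / 6;
}
```

The division by 6 is the overcounting adjustment: every triangle contributes two ordered neighbor pairs at each of its three corners.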

-
-
-
-
-
+ \clearpage

\section{Performance Comparison} \label{performance}
+ \subsection{Experimental Setup}
To evaluate the performance of this proposed library, we compare its reference implementation
- (stdgraph) against BGL and NWGraph on a subset of the GAP Benchmark Suite\cite{gapbs_2023}.
+ (\textbf{std::graph}) against \textbf{boost::graph} and NWGraph on a subset of the GAP Benchmark Suite~\cite{gapbs_2023}.
This comparison includes four of the five GAP algorithms that are in the tier 1 algorithm list of this proposal:
triangle counting (TC), weakly connected components (CC), breadth-first search (BFS),
and single-source shortest paths (SSSP).
@@ -157,11 +157,11 @@ \section{Performance Comparison} \label{performance}
\begin{tabular}{c c c c c c c}
Name & Description & \# Vertices & \# Edges & Degree & (Un)directed & References \\
& & (M) & (M) & Distribution & & \\\hline
- road & USA road network & 23.9 & 57.7 & bounded & undirected & \\\hline
- Twitter & Twitter follower links & 61.6 & 1,468.4 & power & directed & \\\hline
- web & Web crawl of .sk domain & 50.6 & 1,930.3 & power & directed &\\\hline
- kron & Synthetic graph & 134.2 & 2,111.6 & power & undirected & \\\hline
- urand & Uniform random graph & 134.2 & 2,147.5 & normal & undirected & \\\hline
+ road & USA road network & 23.9 & 57.7 & bounded & undirected & \cite{dimacs9th} \\\hline
+ Twitter & Twitter follower links & 61.6 & 1,468.4 & power & directed & \cite{Twitter} \\\hline
+ web & Web crawl of .sk domain & 50.6 & 1,930.3 & power & directed & \cite{LAW1} \\\hline
+ kron & Synthetic graph & 134.2 & 2,111.6 & power & undirected & \cite{Graph500} \\\hline
+ urand & Uniform random graph & 134.2 & 2,147.5 & normal & undirected & \cite{Erdos} \\\hline
\end{tabular}
\caption{Summary of GAP Benchmark Graphs}
\label{tab:gap_graphs}
@@ -172,37 +172,108 @@ \section{Performance Comparison} \label{performance}
To simplify experimental setup, we rerun these new experiments using the same machine used in~\cite{REF_nwgraph_library}
(compute nodes consisting of two Intel® Xeon® Gold 6230 processors, each with 20 physical cores running at 2.1 GHz,
and 188 GB of memory per processor).
- NWGraph and stdgraph were compiled with gcc 13.2 using -Ofast -march=native compilation flags.
+ NWGraph and \textbf{std::graph} were compiled with gcc 13.2 using -Ofast -march=native compilation flags.

- Even though NWGraph contains an implmentation of Dijkstra, the SSSP results in \cite{REF_nwgraph_library}
- were based on delta-stepping. For this comparison, stdgraph and NWgraph both use Dijkstra.
- The NWGraph and stdgraph implementation of CC is based on the Afforest \cite{sutton2018optimizing} algorithm.
- While BFS and SSSP implementations are very similar for NWGraph and stdgraph, the latter contains
- support for event-based visitors, and it is immportant to make sure this does not incur a performance penalty.
- Table~\ref{tab:performance_numbers} summarizes our GAP benchmark results for stdgraph compared to BGL and NWGraph.
+ Even though NWGraph contains an implementation of Dijkstra, the SSSP results in \cite{REF_nwgraph_library}
+ were based on delta-stepping. For this comparison, \textbf{std::graph} and NWGraph both use Dijkstra.
+ The NWGraph and \textbf{std::graph} implementations of CC are based on the Afforest \cite{sutton2018optimizing} algorithm.
+ While BFS and SSSP implementations are very similar for NWGraph and \textbf{std::graph}, the latter contains
+ support for event-based visitors.
+ If this functionality is not required, it should be optimized out and not
+ incur a performance penalty,
+ but we seek to verify this experimentally.
+ NWGraph and \textbf{std::graph} contain similar implementations of triangle
+ counting which perform a set intersection of the neighbor list of vertices
+ $u$ and $v$, only if $v$ is a neighbor of $u$.
+ By first performing a lexicographic sort of the vertex ids of the adjacency
+ structure, the set intersection is limited to neighbors with vertex ids greater
+ than $u$ and $v$, or equivalently the upper triangular portion of the adjacency
+ matrix.
+ Table~\ref{tab:performance_numbers} summarizes our GAP benchmark results for \textbf{std::graph} compared to \textbf{boost::graph} and NWGraph.
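
The set-intersection scheme described above can be sketched in a few lines. This is an illustrative reconstruction from the prose, not the NWGraph or \textbf{std::graph} source: the `Adjacency` type and the name `triangle_count_sketch` are hypothetical, and the sketch assumes an undirected graph stored with edges in both directions and without self-loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Adjacency = std::vector<std::vector<std::size_t>>;

// Counts each triangle exactly once by intersecting only the "upper
// triangular" part of the sorted neighbor lists (ids greater than u and v).
std::size_t triangle_count_sketch(Adjacency adj) {
  // Sort every neighbor list by vertex id (adj is taken by value on purpose).
  for (auto& nbrs : adj) std::sort(nbrs.begin(), nbrs.end());

  std::size_t triangles = 0;
  for (std::size_t u = 0; u < adj.size(); ++u) {
    const auto& nu = adj[u];
    // Only neighbors v of u with v > u.
    for (auto it = std::upper_bound(nu.begin(), nu.end(), u); it != nu.end(); ++it) {
      const std::size_t v = *it;
      const auto& nv = adj[v];
      // Intersect neighbors of u and of v, restricted to ids greater than v.
      auto a = std::upper_bound(it, nu.end(), v);
      auto b = std::upper_bound(nv.begin(), nv.end(), v);
      while (a != nu.end() && b != nv.end()) {
        if (*a < *b) {
          ++a;
        } else if (*b < *a) {
          ++b;
        } else {  // common neighbor w with u < v < w closes a triangle
          ++triangles;
          ++a;
          ++b;
        }
      }
    }
  }
  return triangles;
}
```

Restricting both lists to ids greater than $v$ means a triangle $\{u, v, w\}$ is found only for the ordering $u < v < w$, so no adjustment for overcounting is needed.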

\begin{table}[h!]
\centering
\begin{tabular}{ c c c c c c c }
Algorithm & Library & road & twitter & kron & web & urand \\
\hline
- \multirow{3}{*}{BFS} & BGL & 1.09s & 12.11s & 54.80s & 5.52s & 73.26s \\
+ \multirow{3}{*}{BFS} & \textbf{boost::graph} & 1.09s & 12.11s & 54.80s & 5.52s & 73.26s \\
& NWGraph & 0.91s & 11.25s & 38.86s & 2.37s & 64.63s \\
- & stdgraph & 1.39s & 8.54s & 16.34s & 3.52s & 62.75s \\
+ & \textbf{std::graph} & 1.39s & 8.54s & 16.34s & 3.52s & 62.75s \\
\hline
\multirow{3}{*}{CC} & BGL & 1.36s & 21.96s & 81.18s & 6.64s & 134.23s \\
& NWGraph & 1.05s & 3.77s & 10.16s & 3.04s & 36.59s \\
- & stdgraph & 0.78s & 2.81s & 8.37s & 2.23s & 33.75s \\
+ & \textbf{std::graph} & 0.78s & 2.81s & 8.37s & 2.23s & 33.75s \\
\hline
\multirow{3}{*}{SSSP} & BGL & 4.03s & 47.89s & 167.20s & 28.29s & OOM \\
& NWGraph & 3.63s & 109.37s & 344.12s & 35.58s & 400.23s \\
- & stdgraph & 4.22s & 79.75s & 211.37s & 33.87s & 493.15s \\
+ & \textbf{std::graph} & 4.22s & 79.75s & 211.37s & 33.87s & 493.15s \\
\hline
\multirow{3}{*}{TC} & BGL & 1.34s & >24H & >24H & >24H & 4425.54s \\
& NWGraph & 0.41s & 1327.63s & 6840.38s & 131.47s & 387.53s \\
- & stdgraph & 0.17s & 459.08s & 2357.95s & 50.04s & 191.36s \\
+ & \textbf{std::graph} & 0.17s & 459.08s & 2357.95s & 50.04s & 191.36s \\
\hline
\end{tabular}
- \caption{GAP Benchmark Performance: Time for GAP benchmark algorithms is shown for Boost Graph Library, NWGraph, and this proposal's reference implementation (stdgraph)}
+ \caption{GAP Benchmark Performance: Time for GAP benchmark algorithms is shown for \textbf{boost::graph}, NWGraph, and \textbf{std::graph}}
\label{tab:performance_numbers}
\end{table}
+
+ \subsection{Experimental Analysis}
+ BFS results are consistent between the three implementations,
+ except for the kron graph where \textbf{std::graph} is 2.4x faster
+ than NWGraph and 3.4x faster than \textbf{boost::graph}.
+
+ CC results are consistent between NWGraph and \textbf{std::graph}, which
+ are both much faster than \textbf{boost::graph} on twitter, kron, and urand.
+ This is reasonable as \textbf{boost::graph} is using a simple breadth-first
+ search based CC algorithm while the other two implementations use the
+ Afforest algorithm.
+ Of the four algorithms, CC shows the closest agreement between NWGraph
+ and \textbf{std::graph}.
+
+ SSSP results are more mixed, with differing performance on twitter and kron.
+ Interestingly, of the algorithms we profile, this is the only one where
+ \textbf{boost::graph} is often faster than the other implementations,
+ faster than \textbf{std::graph} by 1.7x on twitter and 1.3x on kron, though
+ failing by running out of memory on urand.
+
+ TC performance from the na\"ive \textbf{boost::graph} implementation
+ is far slower than the adjacency matrix set intersection used by NWGraph
+ and \textbf{std::graph}.
+ Since the same triangle is counted 6 times in \textbf{boost::graph},
+ we expect at least that much of a slowdown, but in fact the slowdown
+ is often much worse.
+ However, the TC results are concerning because \textbf{std::graph}
+ is roughly 2x to 2.9x faster than NWGraph despite the similar implementations.
+ We plan to review the implementation details to discover the cause of
+ this discrepancy.
+
+ \section{Memory Allocation}
+ Unlike existing STL algorithms, the graph algorithms we propose here
+ will often require their own memory allocations.
+ Table~\ref{tab:internalmem} records the internal memory allocations
+ required for our implementations of the GAP Benchmark algorithms
+ where relevant.
+ It is important to note that the memory usage is not prescribed
+ by the algorithm interface in P3128, and is ultimately up to the
+ library implementer.
+ Some memory use, such as the queues in BFS and SSSP, will
+ probably be common to most implementations.
+ However, the color map in BFS and the reindex map in CC
+ (used to ensure the resulting component indices are contiguous)
+ could potentially be avoided.
+
+ \begin{table}[h!]
+ \centering
+ \begin{tabular}{| c | c | c |}
+ \hline
+ Algorithm & Required Member Data & Max Size \\\hline
+ BFS & queue & $O(|V|)$ \\
+ & color map & $O(|V|)$ \\\hline
+ CC & reindex map & $O(|components|)$ \\\hline
+ SSSP & priority queue & $O(|E|)$ \\\hline
+ TC & None & N/A \\
+ \hline
+ \end{tabular}
+ \caption{Memory Allocations of GAP Benchmark Algorithm Implementations}
+ \label{tab:internalmem}
+ \end{table}
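
To make the BFS rows of the table concrete, the following minimal sketch shows where the two allocations arise in a textbook breadth-first search. It is not the reference implementation: the `Adjacency` type and the name `bfs_order_sketch` are hypothetical, and only two colors are used for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

using Adjacency = std::vector<std::vector<std::size_t>>;

enum class Color : unsigned char { white, gray };  // undiscovered, discovered

// Returns the vertices reachable from `source` in BFS visitation order.
std::vector<std::size_t> bfs_order_sketch(const Adjacency& adj, std::size_t source) {
  std::vector<Color> color(adj.size(), Color::white);  // color map: |V| entries
  std::queue<std::size_t> frontier;                    // queue: at most |V| entries
  std::vector<std::size_t> order;

  color[source] = Color::gray;
  frontier.push(source);
  while (!frontier.empty()) {
    const std::size_t u = frontier.front();
    frontier.pop();
    order.push_back(u);
    for (std::size_t v : adj[u]) {
      if (color[v] == Color::white) {  // enqueue each vertex only once
        color[v] = Color::gray;
        frontier.push(v);
      }
    }
  }
  return order;
}
```

Because a vertex is colored before it is enqueued, it enters the queue at most once, which is what bounds the queue at $O(|V|)$ in this sketch.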