Commit 976fcfb

authored Mar 9, 2025
Merge pull request #111 from stdgraph/comparison
update comparison
2 parents feb9dd2 + e730643 commit 976fcfb

3 files changed

+158
-44
lines changed


‎D3126_Overview/tex/refs.bib

+44-1
@@ -131,4 +131,47 @@ @inproceedings{sutton2018optimizing
131131
pages={12--21},
132132
year={2018},
133133
organization={IEEE}
134-
}
134+
}
135+
136+
@misc{dimacs9th,
137+
key = {DIMACS},
138+
title = {9th {DIMACS} Implementation Challenge - {S}hortest Paths.},
139+
howpublished = {{\tt http://www.dis.uniroma1.it/challenge9/}},
140+
year = {2006}
141+
}
142+
143+
@article{Twitter,
144+
author = {Haewoon Kwak and Changhyun Lee and Hosung Park and Sue Moon},
145+
xxjournal = {International World Wide Web Conference (WWW)},
146+
journal = {{WWW}},
147+
title = {What is {Twitter}, a Social Network or a News Media?},
148+
year = {2010}
149+
}
150+
151+
@article{LAW1,
152+
author ="Paolo Boldi and Sebastiano Vigna",
153+
title = "The {W}eb{G}raph Framework {I}: Compression Techniques",
154+
year = 2004,
155+
xxjournal="International World Wide Web Conference (WWW)",
156+
journal = {WWW},
157+
xxaddress="Manhattan, USA",
158+
pages="595--601",
159+
xxpublisher="ACM Press"
160+
}
161+
162+
@inproceedings{Graph500,
163+
title = {Introducing the {G}raph 500},
164+
author={Murphy, Richard C. and Wheeler, Kyle B. and Barrett, Brian W and Ang, James A.},
165+
booktitle={Cray User's Group},
166+
year={2010},
167+
organization={CUG}
168+
}
169+
170+
@article{Erdos,
171+
author = {Paul Erd\H{o}s and Alfr\'{e}d R\'{e}nyi},
172+
journal = {Publicationes Mathematicae},
173+
title = {On Random Graphs. {I}},
174+
year = {1959},
175+
volume = {6},
176+
pages = {290--297}
177+
}

‎D3337_Comparison.pdf

39.5 KB
Binary file not shown.

‎D3337_Comparison/tex/comparison.tex

+114-43
@@ -5,29 +5,31 @@
55
\section{Syntax Comparison} \label{syntax}
66
We provide a usage syntax comparison of several graph algorithms
77
in Tier 1 of P3128 against the \textbf{boost::graph} equivalent.
8-
We refer to the refrence implementation associated with this proposal
9-
as \textbf{stdgraph}.
8+
We refer to the reference implementation associated with this proposal
9+
as \textbf{std::graph}.
1010
These algorithms are breadth-first search (BFS, Figure~\ref{fig:bfssyntax}),
1111
connected components (CC, Figure~\ref{fig:ccsyntax}),
1212
single sourced shortest paths (SSSP, Figure~\ref{fig:ssspsyntax}),
1313
and triangle counting (TC, Figure~\ref{fig:tcsyntax}).
1414
We take these algorithms from the GAP Benchmark Suite~\cite{gapbs_2023}
1515
which we discuss more in Section~\ref{performance}.
16+
We also defer to Section~\ref{performance} any discussion of
17+
underlying implementation details.
1618

17-
Unlike \textbf{boost::graph}, \textbf{stdgraph} does not specify edge directedness
18-
as a graph property.
19-
If a graph in \textbf{stdgraph} implemented by \textbf{container::compressed\_graph}
19+
Unlike \textbf{boost::graph}, \textbf{std::graph} does not
20+
specify edge direction as a graph property.
21+
If a graph in \textbf{std::graph} implemented by \textbf{container::compressed\_graph}
2022
is undirected, then it will contain edges in both directions.
2123
\textbf{boost::graph} has a \textbf{boost::graph::undirectedS} property
2224
which can be used in the \textbf{boost::graph::adjacency\_matrix} class
23-
to specify an unidrected graph, but
25+
to specify an undirected graph, but
2426
not in the \textbf{boost::graph::compressed\_sparse\_row\_graph} class.
2527
Thus in Figures~\ref{fig:bfssyntax}-\ref{fig:tcsyntax}, the graph type always includes \textbf{boost::graph::directedS}.
26-
Similarly to \textbf{stdgraph}, undirected graphs must contain the edges in both directions.
28+
Similarly to \textbf{std::graph}, undirected graphs must contain the edges in both directions.
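
The requirement that an undirected graph store each edge in both directions can be illustrated with a minimal sketch of an edge list preprocessing step. The names \texttt{edge}, \texttt{vertex\_id}, and \texttt{symmetrize} are hypothetical and belong to neither library; the result is the kind of intermediate edge list that would then be loaded into a compressed graph structure.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not part of std::graph or boost::graph.
using vertex_id = std::uint32_t;
using edge = std::pair<vertex_id, vertex_id>;

// Return a directed edge list that represents the undirected input:
// every {u, v} appears as both (u, v) and (v, u), sorted and de-duplicated.
std::vector<edge> symmetrize(const std::vector<edge>& undirected) {
    std::vector<edge> out;
    out.reserve(undirected.size() * 2);
    for (auto [u, v] : undirected) {
        out.emplace_back(u, v);
        if (u != v) out.emplace_back(v, u);  // a self-loop is stored once
    }
    std::sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}
```

Sorting by source vertex also leaves the list in the order a compressed sparse row layout typically expects.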
2729

2830
Intermediate data structures (i.e. edgelists) will be needed
2931
to construct the compressed graph structures.
30-
In order to focus on the differenes in algorithm syntax, we omit
32+
In order to focus on the differences in algorithm syntax, we omit
3133
code which populates the graph data structures.
3234
In the following sections we address the syntax changes for each of
3335
these algorithms.
@@ -43,7 +45,7 @@ \section{Syntax Comparison} \label{syntax}
4345
\lstinputlisting{D3337_Comparison/src/stdgraph_bfs.hpp}
4446
}
4547
\end{minipage}
46-
\caption{Breadth\-First Search Syntax Comparison}
48+
\caption{Breadth-First Search Syntax Comparison}
4749
\label{fig:bfssyntax}
4850
\end{figure}
4951
\begin{figure}[ht]
@@ -72,7 +74,7 @@ \section{Syntax Comparison} \label{syntax}
7274
\lstinputlisting{D3337_Comparison/src/stdgraph_sssp.hpp}
7375
}
7476
\end{minipage}
75-
\caption{Single Source Shortest Paths (Dijkstra) Syntax Comparison}
77+
\caption{Single Source Shortest Paths Syntax Comparison}
7678
\label{fig:ssspsyntax}
7779
\end{figure}
7880

@@ -87,11 +89,11 @@ \section{Syntax Comparison} \label{syntax}
8789
\lstinputlisting{D3337_Comparison/src/stdgraph_tc.hpp}
8890
}
8991
\end{minipage}
90-
\caption{TC Syntax Comparison}
92+
\caption{Triangle Counting Syntax Comparison}
9193
\label{fig:tcsyntax}
9294
\end{figure}
9395

94-
\subsection{BFS}
96+
\subsection{Breadth-First Search}
9597
BFS is often described as a graph algorithm, though a BFS traversal
9698
by itself does not actually perform any task.
9799
In reality, it is a data access pattern which specifies an order
@@ -105,44 +107,42 @@ \subsection{BFS}
105107

106108
This capability is very powerful but often cumbersome if the BFS traversal
107109
simply requires vertex and edge access upon visiting.
108-
For this reason stdgraph provides a simple, range-based-for loop BFS traversal
110+
For this reason \textbf{std::graph} provides a simple, range-based-for loop BFS traversal
109111
called a view.
110112
Figure~\ref{fig:bfssyntax} compares the most simple \textbf{boost::graph}
111113
BFS visitor against the range-based-for loop implementation.
112114
The authors of this proposal acknowledge that some power users still want
113115
the full customization provided by visitors,
114116
and we plan to add them to this proposal.
115117

116-
\subsection{CC}
118+
\subsection{Connected Components}
117119
There is very little difference in the connected component interfaces.
118120

119-
\subsection{SSSP}
120-
Of the four algorithms discussed here, only SSSP makes use of some edge property, in this case distance.
121+
\subsection{Single Source Shortest Paths}
122+
Of the four algorithms discussed here, only SSSP makes use of some
123+
edge property, in this case distance.
121124
Along with the input edge property, the algorithm also associates with
122125
every vertex a distance from the start vertex, and a predecessor
123126
vertex to store the shortest path.
124127
In Figure~\ref{fig:ssspsyntax} we see that \textbf{boost::graph} requires
125128
property maps to lookup edge and vertex properties.
126-
These property maps are tightly coupled with the graph data strucutres.
129+
These property maps are tightly coupled with the graph data structures.
127130
We propose properties be stored external to the graph.
128131
For edge properties we provide a weight lambda function to the algorithm
129132
to lookup distance from the \textbf{edge\_reference\_t}.
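
The idea of external properties plus a weight function can be sketched in plain C++ without either library. This is an illustration under stated assumptions, not the proposed interface: \texttt{out\_edge}, \texttt{adjacency}, and \texttt{dijkstra\_sketch} are hypothetical names, and the \texttt{out\_edge} record merely stands in for an edge reference.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical edge record: target vertex plus an edge id used to find the weight.
struct out_edge { std::size_t target; std::size_t id; };
using adjacency = std::vector<std::vector<out_edge>>;

// Dijkstra with distances and predecessors stored outside the graph.
// `weight` is any callable that maps an edge to a non-negative distance.
template <class WeightFn>
void dijkstra_sketch(const adjacency& g, std::size_t source, WeightFn weight,
                     std::vector<double>& distance,
                     std::vector<std::size_t>& predecessor) {
    distance.assign(g.size(), std::numeric_limits<double>::infinity());
    predecessor.resize(g.size());
    for (std::size_t v = 0; v < g.size(); ++v) predecessor[v] = v;

    using item = std::pair<double, std::size_t>;  // (tentative distance, vertex)
    std::priority_queue<item, std::vector<item>, std::greater<item>> pq;
    distance[source] = 0.0;
    pq.emplace(0.0, source);
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > distance[u]) continue;  // stale queue entry, already improved
        for (const out_edge& e : g[u]) {
            double nd = d + weight(e);
            if (nd < distance[e.target]) {
                distance[e.target] = nd;
                predecessor[e.target] = u;
                pq.emplace(nd, e.target);
            }
        }
    }
}
```

A caller would pass a lambda such as \texttt{[\&w](const out\_edge\& e) \{ return w[e.id]; \}}, where \texttt{w} is a weight vector owned by the caller, so the graph itself carries no property storage.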
130133

131-
\subsection{TC}
134+
\subsection{Triangle Counting}
132135
\textbf{boost::graph} does not contain a global triangle counting
133-
similar to the one proposed by stdgraph.
136+
similar to the one proposed by \textbf{std::graph}.
134137
Instead we must iterate through the vertices counting the number of triangles
135138
on every vertex, and adjust for overcounting at the end.
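
The per-vertex approach with a final overcount adjustment can be sketched as follows. This is not the \textbf{boost::graph} code; \texttt{naive\_triangle\_count} is a hypothetical function, and the divisor of 6 follows from the assumption that ordered pairs of neighbors are examined at every vertex of a graph that stores each edge in both directions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// adj[u] holds the sorted neighbor ids of u. Assumed: every undirected edge is
// stored in both directions, with no self-loops and no duplicate edges.
using adjacency = std::vector<std::vector<std::size_t>>;

std::size_t naive_triangle_count(const adjacency& adj) {
    std::size_t ordered_hits = 0;
    for (std::size_t u = 0; u < adj.size(); ++u) {
        // Examine every ordered pair (v, w) of distinct neighbors of u.
        for (std::size_t v : adj[u]) {
            for (std::size_t w : adj[u]) {
                if (v != w && std::binary_search(adj[v].begin(), adj[v].end(), w))
                    ++ordered_hits;
            }
        }
    }
    // Each triangle is seen at 3 vertices, twice per vertex: 6 hits in total.
    return ordered_hits / 6;
}
```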
136139

137-
138-
139-
140-
141-
140+
\clearpage
142141

143142
\section{Performance Comparison} \label{performance}
143+
\subsection{Experimental Setup}
144144
To evaluate the performance of this proposed library, we compare its reference implementation
145-
(stdgraph) against BGL and NWGraph on a subset of the GAP Benchmark Suite\cite{gapbs_2023}.
145+
(\textbf{std::graph}) against \textbf{boost::graph} and NWGraph on a subset of the GAP Benchmark Suite\cite{gapbs_2023}.
146146
This comparison includes four of the five GAP algorithms that are in the tier 1 algorithm list of this proposal:
147147
triangle counting (TC), weak connected components (CC), breadth-first search (BFS),
148148
and single-source shortest paths (SSSP).
@@ -157,11 +157,11 @@ \section{Performance Comparison} \label{performance}
157157
\begin{tabular}{c c c c c c c}
158158
Name & Description & \#Vertices & \#Edges & Degree & (Un)directed & References \\
159159
& & (M) & (M) & Distribution & & \\\hline
160-
road & USA road network & 23.9 & 57.7 & bounded & undirected & \\\hline
161-
Twitter & Twitter follower links & 61.6 & 1,468.4 & power & directed & \\\hline
162-
web & Web crawl of .sk domain & 50.6 & 1,930.3 & power & directed &\\\hline
163-
kron & Synthetic graph & 134.2 & 2,111.6 & power & undirected & \\\hline
164-
urand & Uniform random graph & 134.2 & 2,147.5 & normal & undirected & \\\hline
160+
road & USA road network & 23.9 & 57.7 & bounded & undirected & \cite{dimacs9th}\\\hline
161+
Twitter & Twitter follower links & 61.6 & 1,468.4 & power & directed & \cite{Twitter}\\\hline
162+
web & Web crawl of .sk domain & 50.6 & 1,930.3 & power & directed & \cite{LAW1}\\\hline
163+
kron & Synthetic graph & 134.2 & 2,111.6 & power & undirected & \cite{Graph500} \\\hline
164+
urand & Uniform random graph & 134.2 & 2,147.5 & normal & undirected & \cite{Erdos}\\\hline
165165
\end{tabular}
166166
\caption{Summary of GAP Benchmark Graphs}
167167
\label{tab:gap_graphs}
@@ -172,37 +172,108 @@ \section{Performance Comparison} \label{performance}
172172
To simplify experimental setup, we rerun these new experiments using the same machine used in\cite{REF_nwgraph_library},
173173
(compute nodes consisting of two Intel® Xeon® Gold 6230 processors, each with 20 physical cores running at 2.1 GHz,
174174
and 188GB of memory per processor).
175-
NWGraph and stdgraph were compiled with gcc 13.2 using -Ofast -march=native compilation flags.
175+
NWGraph and \textbf{std::graph} were compiled with gcc 13.2 using -Ofast -march=native compilation flags.
176176

177-
Even though NWGraph contains an implmentation of Dijkstra, the SSSP results in \cite{REF_nwgraph_library}
178-
were based on delta-stepping. For this comparison, stdgraph and NWgraph both use Dijkstra.
179-
The NWGraph and stdgraph implementation of CC is based on the Afforest \cite{sutton2018optimizing} algorithm.
180-
While BFS and SSSP implementations are very similar for NWGraph and stdgraph, the latter contains
181-
support for event-based visitors, and it is immportant to make sure this does not incur a performance penalty.
182-
Table~\ref{tab:performance_numbers} summarizes our GAP benchmark results for stdgraph compared to BGL and NWGraph.
177+
Even though NWGraph contains an implementation of Dijkstra, the SSSP results in \cite{REF_nwgraph_library}
178+
were based on delta-stepping. For this comparison, \textbf{std::graph} and NWGraph both use Dijkstra.
179+
The NWGraph and \textbf{std::graph} implementation of CC is based on the Afforest \cite{sutton2018optimizing} algorithm.
180+
While BFS and SSSP implementations are very similar for NWGraph and \textbf{std::graph}, the latter contains
181+
support for event-based visitors.
182+
If this functionality is not required it should be optimized out and not
183+
incur a performance penalty,
184+
but we seek to verify this experimentally.
185+
NWGraph and \textbf{std::graph} contain similar implementations of triangle
186+
counting which perform a set intersection of the neighbor list of vertices
187+
$u$ and $v$, only if $v$ is a neighbor of $u$.
188+
By first performing a lexicographic sort of the vertex ids of the adjacency
189+
structure, the set intersection is limited to neighbors with vertex ids greater
190+
than $u$ and $v$, or equivalently the upper triangular portion of the adjacency
191+
matrix.
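
A minimal sketch of the set intersection scheme just described, assuming sorted neighbor lists with each undirected edge stored in both directions. The function \texttt{triangle\_count} is hypothetical and is not taken from NWGraph or the reference implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// adj[u] holds the sorted neighbor ids of u (symmetric, no self-loops).
using adjacency = std::vector<std::vector<std::size_t>>;

// Count each triangle u < v < w exactly once by intersecting only the parts
// of adj[u] and adj[v] that lie above v (the upper triangular portion).
std::size_t triangle_count(const adjacency& adj) {
    std::size_t triangles = 0;
    for (std::size_t u = 0; u < adj.size(); ++u) {
        auto u_first = std::upper_bound(adj[u].begin(), adj[u].end(), u);
        for (auto it = u_first; it != adj[u].end(); ++it) {
            std::size_t v = *it;      // v is a neighbor of u with v > u
            auto a = std::next(it);   // neighbors of u greater than v
            auto b = std::upper_bound(adj[v].begin(), adj[v].end(), v);
            while (a != adj[u].end() && b != adj[v].end()) {
                if (*a < *b) ++a;
                else if (*b < *a) ++b;
                else { ++triangles; ++a; ++b; }  // common neighbor w > v
            }
        }
    }
    return triangles;
}
```

Because each triangle is found only from its smallest vertex, no correction for overcounting is needed at the end.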
192+
Table~\ref{tab:performance_numbers} summarizes our GAP benchmark results for \textbf{std::graph} compared to \textbf{boost::graph} and NWGraph.
183193

184194
\begin{table}[h!]
185195
\centering
186196
\begin{tabular}{ c c c c c c c }
187197
Algorithm & Library & road & twitter & kron & web & urand \\
188198
\hline
189-
\multirow{3}{*}{BFS} & BGL & 1.09s & 12.11s & 54.80s & 5.52s & 73.26s \\
199+
\multirow{3}{*}{BFS} & \textbf{boost::graph} & 1.09s & 12.11s & 54.80s & 5.52s & 73.26s \\
190200
& NWGraph & 0.91s & 11.25s & 38.86s & 2.37s & 64.63s \\
191-
& stdgraph & 1.39s & 8.54s & 16.34s & 3.52s & 62.75s \\
201+
& \textbf{std::graph} & 1.39s & 8.54s & 16.34s & 3.52s & 62.75s \\
192202
\hline
193203
\multirow{3}{*}{CC} & BGL & 1.36s & 21.96s & 81.18s & 6.64s & 134.23s \\
194204
& NWGraph & 1.05s & 3.77s & 10.16s & 3.04s & 36.59s \\
195-
& stdgraph & 0.78s & 2.81s & 8.37s & 2.23s & 33.75s \\
205+
& \textbf{std::graph} & 0.78s & 2.81s & 8.37s & 2.23s & 33.75s \\
196206
\hline
197207
\multirow{3}{*}{SSSP} & BGL & 4.03s & 47.89s & 167.20s & 28.29s & OOM \\
198208
& NWGraph & 3.63s & 109.37s & 344.12s & 35.58s & 400.23s \\
199-
& stdgraph & 4.22s & 79.75s & 211.37s & 33.87s & 493.15s \\
209+
& \textbf{std::graph} & 4.22s & 79.75s & 211.37s & 33.87s & 493.15s \\
200210
\hline
201211
\multirow{3}{*}{TC} & BGL & 1.34s & >24H & >24H & >24H & 4425.54s \\
202212
& NWGraph & 0.41s & 1327.63s & 6840.38s & 131.47s & 387.53s \\
203-
& stdgraph & 0.17s & 459.08s & 2357.95s & 50.04s & 191.36s \\
213+
& \textbf{std::graph} & 0.17s & 459.08s & 2357.95s & 50.04s & 191.36s \\
204214
\hline
205215
\end{tabular}
206-
\caption{GAP Benchmark Performance: Time for GAP benchmark algorithms is shown for Boost Graph Library, NWGraph, and this proposal's reference implementation (stdgraph)}
216+
\caption{GAP Benchmark Performance: Time for GAP benchmark algorithms is shown for \textbf{boost::graph}, NWGraph, and \textbf{std::graph}}
207217
\label{tab:performance_numbers}
208218
\end{table}
219+
220+
\subsection{Experimental Analysis}
221+
BFS results are consistent between the three implementations,
222+
except for the kron graph where \textbf{std::graph} is 2.4x faster
223+
than NWGraph and 3.4x faster than \textbf{boost::graph}.
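
These kron ratios follow from the BFS row of Table~\ref{tab:performance_numbers}:
\[
\frac{38.86\,\mathrm{s}}{16.34\,\mathrm{s}} \approx 2.4,
\qquad
\frac{54.80\,\mathrm{s}}{16.34\,\mathrm{s}} \approx 3.4 .
\]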
224+
225+
CC results are consistent between NWGraph and \textbf{std::graph}, which
226+
are both much faster than \textbf{boost::graph} on twitter, kron, and urand.
227+
This is reasonable as \textbf{boost::graph} is using a simple breadth-first
228+
search based CC algorithm while the other two implementations use the
229+
Afforest algorithm.
230+
Of the four algorithms, CC shows the closest agreement between NWGraph
231+
and \textbf{std::graph}.
232+
233+
SSSP results are more mixed, with differing performance on twitter and kron.
234+
Interestingly of the algorithms we profile, this is the only one where
235+
\textbf{boost::graph} is often faster than the other implementations,
236+
faster than \textbf{std::graph} by 1.7x on twitter and 1.3x on kron, though
237+
failing by running out of memory on urand.
238+
239+
TC performance from the na\"ive \textbf{boost::graph} implementation
240+
is far slower than the adjacency matrix set intersection used by NWGraph
241+
and \textbf{std::graph}.
242+
Since the same triangle is counted 6 times in \textbf{boost::graph},
243+
we expect at least that much of a slowdown, but in fact the slowdown
244+
is often much worse.
245+
However, the TC results are concerning because the \textbf{std::graph}
246+
implementation runs roughly 2x to 2.9x faster than NWGraph, even though the two implementations are similar.
247+
We plan to review the implementation details to discover the cause of
248+
this discrepancy.
249+
250+
\section{Memory Allocation}
251+
Unlike most existing STL algorithms, the graph algorithms we propose here
252+
will often require their own memory allocations.
253+
Table~\ref{tab:internalmem} records the internal memory allocations
254+
required for our implementations of the GAP Benchmark algorithms
255+
where relevant.
256+
It is important to note that the memory usage is not prescribed
257+
by the algorithm interface in P3128, and is ultimately up to the
258+
library implementer.
259+
Some memory use, such as the queues in BFS and SSSP, will
260+
probably be common to most implementations.
261+
However, the color map in BFS and the reindex map in CC
262+
(used to ensure the resulting component indices are contiguous)
263+
could potentially be avoided.
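
A minimal sketch of what a reindex map does, assuming component labels arrive as arbitrary vertex ids. The function \texttt{make\_contiguous} is a hypothetical name and is not taken from the reference implementation.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Relabel arbitrary component labels to 0..k-1 in order of first appearance.
// Returns the number of components k.
std::size_t make_contiguous(std::vector<std::size_t>& component) {
    // One entry per distinct component, so the map grows with the component count.
    std::unordered_map<std::size_t, std::size_t> reindex;
    for (std::size_t& label : component) {
        auto [it, inserted] = reindex.try_emplace(label, reindex.size());
        label = it->second;
    }
    return reindex.size();
}
```

An implementation that returns the raw labels and leaves relabeling to the caller would not need this allocation.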
264+
265+
\begin{table}[h!]
266+
\centering
267+
\begin{tabular}{| c | c | c |}
268+
\hline
269+
Algorithm & Required Member Data & Max Size \\\hline
270+
BFS & queue & $O(|V|)$ \\
271+
 & color map & $O(|V|)$ \\\hline
272+
CC & reindex map & $O(|\mathit{components}|)$ \\\hline
273+
SSSP & priority queue & $O(|E|)$\\\hline
274+
TC & None & N/A\\
275+
\hline
276+
\end{tabular}
277+
\caption{Memory Allocations of GAP Benchmark Algorithm Implementations}
278+
\label{tab:internalmem}
279+
\end{table}
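
The BFS row of Table~\ref{tab:internalmem} can be made concrete with a minimal sketch of a queue-based traversal. It assumes a plain adjacency list; \texttt{bfs\_order} is a hypothetical function and is not the proposed view or visitor interface.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

using adjacency = std::vector<std::vector<std::size_t>>;

// Return the vertices reachable from `source` in BFS visitation order.
std::vector<std::size_t> bfs_order(const adjacency& adj, std::size_t source) {
    std::vector<bool> visited(adj.size(), false);  // "color map": one entry per vertex
    std::queue<std::size_t> frontier;              // each vertex is enqueued at most once
    std::vector<std::size_t> order;
    visited[source] = true;
    frontier.push(source);
    while (!frontier.empty()) {
        std::size_t u = frontier.front();
        frontier.pop();
        order.push_back(u);
        for (std::size_t v : adj[u]) {
            if (!visited[v]) {
                visited[v] = true;  // mark on enqueue so v is never queued twice
                frontier.push(v);
            }
        }
    }
    return order;
}
```

Marking a vertex when it is enqueued, rather than when it is dequeued, is what keeps the queue within $O(|V|)$ entries.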
