New implementations of kernel 2 and 3 for graph500 #388

Open
wants to merge 5 commits into main

Conversation

@moazzammoriani (Contributor) commented on Aug 26, 2022

Introduction

This PR is a follow-up to #371. There are four major additions to graph500 in this PR:

  1. Search Keys Sampler
  2. Kernel 1 update
  3. Kernel 2
  4. Kernel 3

Search Keys Sampler

The search key sampler produces 64 sample search keys, the vertices at which kernel 2 and kernel 3 will start. These sample search keys are chosen such that they have no self-loops. Given that sparse.data has been produced by kernel 1, the sample search keys are produced as follows.

$ sampleSearchKeys.exe sparse.data -o samples.data
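For reference, here is a minimal sketch of the sampling step described above, namely rejection sampling until 64 valid keys have been collected. The predicate is_valid_key is a hypothetical stand-in for the actual no-self-loop check, not the real API.

let sample_search_keys ?(n = 64) ~num_vertices ~is_valid_key () =
  let rec loop acc remaining =
    if remaining = 0 then Array.of_list acc
    else
      (* Draw a random vertex and keep it only if it passes the predicate
         and has not been picked already. *)
      let v = Random.int num_vertices in
      if is_valid_key v && not (List.mem v acc) then
        loop (v :: acc) (remaining - 1)
      else loop acc remaining
  in
  loop [] n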

Kernel 1 Update

With this PR, we can now output the sparse graph produced by kernel 1 to a file. To do so, we use the -o flag followed by the name of the output file. For example, the following command converts the edge representation in edges.data to an adjacency-list sparse graph representation in sparse.data.

$ kernel1_run*.exe edges.data -o sparse.data 
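Conceptually, the conversion described above goes from an array of (weighted) edges to per-vertex adjacency lists. A minimal sketch under that assumption, using plain arrays of lists rather than the PR's actual SparseGraph representation:

let to_adjacency_list num_vertices (edges : (int * int * float) array) =
  (* adj.(u) holds the (neighbour, weight) pairs of vertex u. *)
  let adj = Array.make num_vertices [] in
  Array.iter
    (fun (u, v, w) ->
      adj.(u) <- (v, w) :: adj.(u);
      (* The graph is undirected, so record the edge in both directions. *)
      adj.(v) <- (u, w) :: adj.(v))
    edges;
  adj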

Note about running kernel[2,3]_run.exe and kernel[2,3]_run_multicore.exe

In order to run the executable wrappers for kernel 2 and 3, we have to provide sparse.data and samples.data as inputs in a specific order. The following example with sequential kernel 2 makes this clear.

$ kernel2_run.exe sparse.data samples.data

For multicore kernel 2 we do the same thing, but we can also specify the number of domains using the -ndomains flag, as follows.

$ kernel2_run_multicore.exe sparse.data samples.data -ndomains 4

Kernel 3 is used in the same way.
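Presumably the -ndomains value is used to size a Domainslib pool inside the multicore wrappers; this is an assumption about the wrapper, not its actual code. A typical setup looks something like:

let with_pool ndomains f =
  (* num_domains counts the domains spawned in addition to the current one. *)
  let pool = Domainslib.Task.setup_pool ~num_domains:(ndomains - 1) () in
  Fun.protect
    ~finally:(fun () -> Domainslib.Task.teardown_pool pool)
    (fun () -> Domainslib.Task.run pool (fun () -> f pool))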

Note: The output flag is important for kernel 1 because the sparse graph representation in sparse.data has to be fed into kernel 2 and kernel 3. The outputs of kernel 2 and kernel 3 don't need to be consumed for now, so their executables do not have an optional output flag.

Performance

Note: I have a four-core machine.

Kernel 2

Kernel 2 has both a sequential and a parallel version. The only difference between them is whether the samples array is processed sequentially or in parallel, roughly as in the sketch below. The results on my machine, for a graph of scale=20, follow.
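As a rough illustration of that distinction (not the PR's actual code; bfs and the result handling are stand-ins):

(* Sequential: one BFS per sample key, in order. *)
let run_sequential bfs g samples =
  Array.map (fun key -> bfs g key) samples

(* Parallel: the same per-key work, but the iterations over samples are
   spread across the domains of a Domainslib pool. *)
let run_parallel pool bfs g samples =
  let results = Array.make (Array.length samples) None in
  Domainslib.Task.run pool (fun () ->
      Domainslib.Task.parallel_for pool ~start:0
        ~finish:(Array.length samples - 1)
        ~body:(fun i -> results.(i) <- Some (bfs g samples.(i))));
  results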

My machine

$ time ./kernel2.sh
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 115.834836

real	1m58.724s
user	1m56.979s
sys	0m1.530s

The results for the parallel version of kernel 2 on the same graph were the following.

$ time ./kernel2Par.sh 4
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 66.009398

real	1m10.435s
user	4m14.144s
sys	0m2.684s

Godel

I omitted the sequential version on Godel because I didn't think it would be particularly useful here.

$ ./run_all.sh ./kernel2Par.sh
++ exe=./kernel2Par.sh
++ arg=
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 1
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 113.039523

real	1m59.303s
user	1m55.958s
sys	0m3.344s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 2
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 62.282622

real	1m8.525s
user	2m5.894s
sys	0m3.483s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 4
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 35.300615

real	0m41.548s
user	2m19.354s
sys	0m3.508s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 8
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 20.421836

real	0m26.662s
user	2m32.384s
sys	0m3.625s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 12
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 14.864902

real	0m21.116s
user	2m43.890s
sys	0m3.905s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel2Par.sh 16
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 11.312145

real	0m17.609s
user	2m53.568s
sys	0m4.416s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel2Par.sh 20
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 10.870812

real	0m17.232s
user	3m6.546s
sys	0m4.833s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel2Par.sh 24
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 9.209034

real	0m15.588s
user	3m16.617s
sys	0m5.467s

Kernel 3

Currently kernel 3 runs in quadratic time, but I think it can be improved by using a priority queue in place of the current data structure, from which the minimum can only be extracted in linear time. Because of this, I have chosen scale=15 for the tests below.

The difference between the sequential and parallel versions of kernel 3 is essentially the same as in kernel 2.
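As a hedged sketch of the priority-queue idea mentioned above, one option is to use a Set of (distance, vertex) pairs as the queue; num_vertices and neighbours below are stand-ins for the PR's actual SparseGraph accessors, not its real API.

module PQ = Set.Make (struct
  type t = float * int              (* (tentative distance, vertex) *)
  let compare = compare
end)

let dijkstra_pq num_vertices neighbours start =
  let distance = Array.make num_vertices Float.infinity in
  let parents = Array.make num_vertices (-1) in
  distance.(start) <- 0.0;
  let pq = ref (PQ.singleton (0.0, start)) in
  while not (PQ.is_empty !pq) do
    (* Extract the vertex with the smallest tentative distance in
       O(log n) instead of scanning an array in linear time. *)
    let ((d, u) as min_elt) = PQ.min_elt !pq in
    pq := PQ.remove min_elt !pq;
    if d <= distance.(u) then
      (* Relax every outgoing edge (u, v) of weight w. *)
      List.iter
        (fun (v, w) ->
          let d' = d +. w in
          if d' < distance.(v) then begin
            distance.(v) <- d';
            parents.(v) <- u;
            pq := PQ.add (d', v) !pq
          end)
        (neighbours u)
  done;
  (parents, distance)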

My machine

$ time ./kernel3.sh
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 319.742707

real	5m19.883s
user	5m18.790s
sys	0m0.094s
$ time ./kernel3Par.sh 4
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 95.310967

real	1m35.426s
user	6m19.609s
sys	0m0.071s

Godel

I omitted the sequential version here too.

$ ./run_all.sh ./kernel3Par.sh
++ exe=./kernel3Par.sh
++ arg=
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 1
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 453.135992

real	7m33.319s
user	7m33.195s
sys	0m0.118s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 2
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 226.645206

real	3m46.820s
user	7m33.107s
sys	0m0.100s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 4
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 113.246789

real	1m53.422s
user	7m32.934s
sys	0m0.104s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 8
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 56.706133

real	0m56.884s
user	7m32.906s
sys	0m0.128s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 12
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 42.408656

real	0m42.586s
user	7m39.685s
sys	0m0.128s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel3Par.sh 16
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 28.381341

real	0m28.562s
user	7m33.359s
sys	0m0.156s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel3Par.sh 20
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 28.284908

real	0m28.469s
user	7m39.960s
sys	0m0.168s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel3Par.sh 24
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 21.231765

real	0m21.421s
user	7m39.853s
sys	0m0.317s

@moazzammoriani changed the title from "New implementation of kernel 2 and 3 for graph500" to "New implementations of kernel 2 and 3 for graph500" on Aug 26, 2022
@shakthimaan requested a review from Sudha247 on September 2, 2022
let rec run_dijkstra g v intree parents (distance : float array) =
  match (is_tree_complete intree v) with
  | true -> (parents, distance)
  | false -> begin


I think an if-then-else would be cleaner here than match ... with true -> ... | false -> ....

if is_tree_complete intree v then
  (parents, distance)
else begin
  ...
end

@@ -0,0 +1,3 @@
val to_file : filename:string -> 'a -> unit

val from_file : string -> 'a


I believe the Marshal usage came from previous work, but it is rather unsafe as it stands, because it doesn't (and cannot) check that the value returned is of the correct type. You could make it safer in a number of ways. I'd perhaps recommend finding a library that offers a type safe version of marshaling/serialization/pickling.
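As a small, partial mitigation (not a fix for the underlying type-safety issue, and not what the PR currently does), the file could at least carry a magic tag so that mismatched files are rejected early; the tag value below is purely illustrative.

let magic = "graph500-sparse-v1"

let to_file ~filename v =
  let oc = open_out_bin filename in
  output_string oc magic;
  Marshal.to_channel oc v [];
  close_out oc

let from_file filename =
  let ic = open_in_bin filename in
  (* Reject files that do not start with the expected tag before
     attempting to unmarshal anything. *)
  let tag = really_input_string ic (String.length magic) in
  if tag <> magic then failwith ("unexpected file format: " ^ filename);
  let v = Marshal.from_channel ic in
  close_in ic;
  v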

let nvertices = SparseGraphSeq.num_vertices g in
let parent_arr = Array.init nvertices (fun _ -> -1) in
let discovery_arr = Array.init nvertices (fun _ -> false) in
let q = ref (Queue.create ()) in


This use of ref seems superfluous. This could be just let q = Queue.create () in ... and then references of !q can be replaced with q in the code that follows.


let () =
  Random.self_init ();
  Arg.parse [] (fun filename -> files := filename::!files) usage_msg;


For a very simple program like this, one could parse the command line arguments using a match expression:

match Sys.argv with
| [| _; sparse_graph_filename; samples_input_filename |] ->
  ...
| _ -> 
  Printf.eprintf "Usage: %s <...> <...>" (Filename.basename Sys.executable_name)

But there are also many libraries for parsing command line arguments that can e.g. provide completion.

if sparse_graph_input_filename = "" then begin
  Printf.eprintf "Must provide sparse graph input file argument.\n"; exit 1
end;
if sparse_graph_input_filename = "" then begin


This should probably refer to samples_input_filename?


let dijkstra g start =
  let nvertices = SparseGraphSeq.num_vertices g in
  let distance = Array.init nvertices (fun _ -> Float.infinity) in


Array.make nvertices Float.infinity would be a bit simpler here.

(** Takes a sparse graph g and a vertex v and returns the next edgenode
    res in the edgenode list of v and changes the state of g with res removed
    from the edgenode list of v. *)
val get_next_edgenode : t -> vertex -> (vertex * weight)


It could make sense to introduce a type like

type edgenode = {vertex: vertex; weight: weight}

rather than use a tuple.
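For illustration, the signature above would then read something like:

val get_next_edgenode : t -> vertex -> edgenode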

@tmcgilchrist (Contributor) commented:

Is this worthwhile to pick up again and get merged @polytypic @Sudha247 @punchagan?
