New implementations of kernel 2 and 3 for graph500 #388

Open
wants to merge 5 commits into main

Conversation

@moazzammoriani (Contributor) commented on Aug 26, 2022

Introduction

This PR is a follow-up to #371. There are four major additions to graph500 in this PR:

  1. Search Keys Sampler
  2. Kernel 1 update
  3. Kernel 2
  4. Kernel 3

Search Keys Sampler

The search key sampler produces 64 sample search keys, the vertices at which kernel 2 and kernel 3 will start. These sample search keys are chosen such that they have no self-loops. Given that sparse.data has been produced by kernel 1, the sample search keys are produced as follows.

$ sampleSearchKeys.exe sparse.data -o samples.data
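For reference, here is a minimal sketch of the sampling step described above, namely rejection sampling until 64 valid keys have been collected. The predicate is_valid_key is a hypothetical stand-in for the actual no-self-loop check, not the real API.

let sample_search_keys ?(n = 64) ~num_vertices ~is_valid_key () =
  let rec loop acc remaining =
    if remaining = 0 then Array.of_list acc
    else
      (* Draw a random vertex and keep it only if it passes the predicate
         and has not been picked already. *)
      let v = Random.int num_vertices in
      if is_valid_key v && not (List.mem v acc) then
        loop (v :: acc) (remaining - 1)
      else loop acc remaining
  in
  loop [] n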

Kernel 1 Update

With this PR, we can now output the sparse graph produced by kernel 1 to a file. To do so, we use the -o flag followed by the name of the output file. For example, the following command converts the edge representation in edges.data to an adjacency-list sparse graph representation in sparse.data.

$ kernel1_run*.exe edges.data -o sparse.data 
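Conceptually, the conversion described above goes from an array of (weighted) edges to per-vertex adjacency lists. A minimal sketch under that assumption, using plain arrays of lists rather than the PR's actual SparseGraph representation:

let to_adjacency_list num_vertices (edges : (int * int * float) array) =
  (* adj.(u) holds the (neighbour, weight) pairs of vertex u. *)
  let adj = Array.make num_vertices [] in
  Array.iter
    (fun (u, v, w) ->
      adj.(u) <- (v, w) :: adj.(u);
      (* The graph is undirected, so record the edge in both directions. *)
      adj.(v) <- (u, w) :: adj.(v))
    edges;
  adj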

Note about running kernel[2,3]_run.exe and kernel[2,3]_run_multicore.exe

In order to run the executable wrappers for kernel 2 and 3, we have to provide sparse.data and samples.data as inputs in a specific order. The following example with sequential kernel 2 makes this clear.

$ kernel2_run.exe sparse.data samples.data

For multicore kernel 2 we do the same thing, but we can also specify the number of domains using the -ndomains flag, as follows.

$ kernel2_run_multicore.exe sparse.data samples.data -ndomains 4

Kernel 3 is used in the same way.
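Presumably the -ndomains value is used to size a Domainslib pool inside the multicore wrappers; this is an assumption about the wrapper, not its actual code. A typical setup looks something like:

let with_pool ndomains f =
  (* num_domains counts the domains spawned in addition to the current one. *)
  let pool = Domainslib.Task.setup_pool ~num_domains:(ndomains - 1) () in
  Fun.protect
    ~finally:(fun () -> Domainslib.Task.teardown_pool pool)
    (fun () -> Domainslib.Task.run pool (fun () -> f pool))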

Note: The output flag is important for kernel 1 because the sparse graph representation in sparse.data has to be fed into kernel 2 and kernel 3. The outputs of kernel 2 and kernel 3 don't need to be consumed for now, so their executables do not have an optional output flag.

Performance

Note: I have a four-core machine.

Kernel 2

Kernel 2 has both a sequential and a parallel version. The only difference between them is whether the samples array is processed sequentially or in parallel, roughly as in the sketch below. The results on my machine, for a graph of scale=20, follow.
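As a rough illustration of that distinction (not the PR's actual code; bfs and the result handling are stand-ins):

(* Sequential: one BFS per sample key, in order. *)
let run_sequential bfs g samples =
  Array.map (fun key -> bfs g key) samples

(* Parallel: the same per-key work, but the iterations over samples are
   spread across the domains of a Domainslib pool. *)
let run_parallel pool bfs g samples =
  let results = Array.make (Array.length samples) None in
  Domainslib.Task.run pool (fun () ->
      Domainslib.Task.parallel_for pool ~start:0
        ~finish:(Array.length samples - 1)
        ~body:(fun i -> results.(i) <- Some (bfs g samples.(i))));
  results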

My machine

$ time ./kernel2.sh
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 115.834836

real	1m58.724s
user	1m56.979s
sys	0m1.530s

The results for the parallel version of kernel 2 on the same graph were the following.

$ time ./kernel2Par.sh 4
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 66.009398

real	1m10.435s
user	4m14.144s
sys	0m2.684s

Godel

I omitted the sequential version on Godel because I didn't think it would be particularly useful here.

$ ./run_all.sh ./kernel2Par.sh
++ exe=./kernel2Par.sh
++ arg=
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 1
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 113.039523

real	1m59.303s
user	1m55.958s
sys	0m3.344s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 2
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 62.282622

real	1m8.525s
user	2m5.894s
sys	0m3.483s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 4
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 35.300615

real	0m41.548s
user	2m19.354s
sys	0m3.508s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 8
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 20.421836

real	0m26.662s
user	2m32.384s
sys	0m3.625s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel2Par.sh 12
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 14.864902

real	0m21.116s
user	2m43.890s
sys	0m3.905s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel2Par.sh 16
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 11.312145

real	0m17.609s
user	2m53.568s
sys	0m4.416s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel2Par.sh 20
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 10.870812

real	0m17.232s
user	3m6.546s
sys	0m4.833s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel2Par.sh 24
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing breadth-first searches...
Done. Time: 9.209034

real	0m15.588s
user	3m16.617s
sys	0m5.467s

Kernel 3

Currently kernel 3 runs in quadratic time, but I think it can be improved by using a priority queue in place of the current data structure, from which the minimum can only be extracted in linear time. Because of this, I have chosen scale=15 for the tests below.

The difference between the sequential and parallel versions of kernel 3 is essentially the same as in kernel 2.
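As a hedged sketch of the priority-queue idea mentioned above, one option is to use a Set of (distance, vertex) pairs as the queue; num_vertices and neighbours below are stand-ins for the PR's actual SparseGraph accessors, not its real API.

module PQ = Set.Make (struct
  type t = float * int              (* (tentative distance, vertex) *)
  let compare = compare
end)

let dijkstra_pq num_vertices neighbours start =
  let distance = Array.make num_vertices Float.infinity in
  let parents = Array.make num_vertices (-1) in
  distance.(start) <- 0.0;
  let pq = ref (PQ.singleton (0.0, start)) in
  while not (PQ.is_empty !pq) do
    (* Extract the vertex with the smallest tentative distance in
       O(log n) instead of scanning an array in linear time. *)
    let ((d, u) as min_elt) = PQ.min_elt !pq in
    pq := PQ.remove min_elt !pq;
    if d <= distance.(u) then
      (* Relax every outgoing edge (u, v) of weight w. *)
      List.iter
        (fun (v, w) ->
          let d' = d +. w in
          if d' < distance.(v) then begin
            distance.(v) <- d';
            parents.(v) <- u;
            pq := PQ.add (d', v) !pq
          end)
        (neighbours u)
  done;
  (parents, distance)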

My machine

$ time ./kernel3.sh
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 319.742707

real	5m19.883s
user	5m18.790s
sys	0m0.094s
$ time ./kernel3Par.sh 4
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 95.310967

real	1m35.426s
user	6m19.609s
sys	0m0.071s

Godel

I omitted the sequential version here too.

$ ./run_all.sh ./kernel3Par.sh
++ exe=./kernel3Par.sh
++ arg=
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 1
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 453.135992

real	7m33.319s
user	7m33.195s
sys	0m0.118s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 2
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 226.645206

real	3m46.820s
user	7m33.107s
sys	0m0.100s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 4
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 113.246789

real	1m53.422s
user	7m32.934s
sys	0m0.104s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 8
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 56.706133

real	0m56.884s
user	7m32.906s
sys	0m0.128s
++ for i in 1 2 4 8 12
++ taskset --cpu-list 2-13 chrt -r 1 ./kernel3Par.sh 12
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 42.408656

real	0m42.586s
user	7m39.685s
sys	0m0.128s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel3Par.sh 16
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 28.381341

real	0m28.562s
user	7m33.359s
sys	0m0.156s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel3Par.sh 20
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 28.284908

real	0m28.469s
user	7m39.960s
sys	0m0.168s
++ for i in 16 20 24
++ taskset --cpu-list 2-13,16-27 chrt -r 1 ./kernel3Par.sh 24
Reading sparse graph from sparse.data...
Reading search keys from from samples.data...
Performing single-source shortest path searches...
Done. Time: 21.231765

real	0m21.421s
user	7m39.853s
sys	0m0.317s

@moazzammoriani changed the title from "New implementation of kernel 2 and 3 for graph500" to "New implementations of kernel 2 and 3 for graph500" on Aug 26, 2022
@shakthimaan requested a review from Sudha247 on September 2, 2022
let rec run_dijkstra g v intree parents (distance : float array) =
  match (is_tree_complete intree v) with
  | true -> (parents, distance)
  | false -> begin


I think an if-then-else would be cleaner here than match ... with true -> ... | false -> ....

if is_tree_complete intree v then
  (parents, distance)
else begin
  ...
end

@@ -0,0 +1,3 @@
val to_file : filename:string -> 'a -> unit

val from_file : string -> 'a


I believe the Marshal usage came from previous work, but it is rather unsafe as it stands, because it doesn't (and cannot) check that the value returned is of the correct type. You could make it safer in a number of ways. I'd perhaps recommend finding a library that offers a type safe version of marshaling/serialization/pickling.
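As a small, partial mitigation (not a fix for the underlying type-safety issue, and not what the PR currently does), the file could at least carry a magic tag so that mismatched files are rejected early; the tag value below is purely illustrative.

let magic = "graph500-sparse-v1"

let to_file ~filename v =
  let oc = open_out_bin filename in
  output_string oc magic;
  Marshal.to_channel oc v [];
  close_out oc

let from_file filename =
  let ic = open_in_bin filename in
  (* Reject files that do not start with the expected tag before
     attempting to unmarshal anything. *)
  let tag = really_input_string ic (String.length magic) in
  if tag <> magic then failwith ("unexpected file format: " ^ filename);
  let v = Marshal.from_channel ic in
  close_in ic;
  v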

let nvertices = SparseGraphSeq.num_vertices g in
let parent_arr = Array.init nvertices (fun _ -> -1) in
let discovery_arr = Array.init nvertices (fun _ -> false) in
let q = ref (Queue.create ()) in


This use of ref seems superfluous. This could be just let q = Queue.create () in ... and then references of !q can be replaced with q in the code that follows.


let () =
  Random.self_init ();
  Arg.parse [] (fun filename -> files := filename::!files) usage_msg;


For a very simple program like this, one could parse the command line arguments using a match expression:

match Sys.argv with
| [| _; sparse_graph_filename; samples_input_filename |] ->
  ...
| _ -> 
  Printf.eprintf "Usage: %s <...> <...>" (Filename.basename Sys.executable_name)

But there are also many libraries for parsing command line arguments that can e.g. provide completion.

if sparse_graph_input_filename = "" then begin
  Printf.eprintf "Must provide sparse graph input file argument.\n"; exit 1
end;
if sparse_graph_input_filename = "" then begin


This should probably refer to samples_input_filename?


let dijkstra g start =
  let nvertices = SparseGraphSeq.num_vertices g in
  let distance = Array.init nvertices (fun _ -> Float.infinity) in


Array.make nvertices Float.infinity would be a bit simpler here.

(** Takes a sparse graph g and a vertex v and returns the next edgenode
    res in the edgenode list of v and changes the state of g with res removed
    from the edgenode list of v. *)
val get_next_edgenode : t -> vertex -> (vertex * weight)


It could make sense to introduce a type like

type edgenode = {vertex: vertex; weight: weight}

rather than use a tuple.
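For illustration, the signature above would then read something like:

val get_next_edgenode : t -> vertex -> edgenode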

@tmcgilchrist (Contributor) commented:

Is this worthwhile to pick up again and get merged @polytypic @Sudha247 @punchagan?
