Manual: Implementing Driver
The Graphalytics benchmark suite can be extended by developing a platform driver for a (new) graph processing platform. A platform driver can be rapidly prototyped using graphalytics-platforms-default-archetype; this manual explains the process.
Download ldbc_graphalytics from our core GitHub repository and install it into your local environment with the Maven command mvn clean install. After this step, graphalytics-platforms-default-archetype becomes available for use.
Imagine a new graph processing platform becomes available: a platform named "XGraph 1.0" developed by "John Smith". The following key variables can be defined for our template:
platform_name="Xgraph" #a letter sequence with first letter capitalized and other letters in lower case.
platform_acronym="xgraph" #a letter sequence in lower case.
platform_version="1.0" #the version number.
developer_name="John Smith" #full name of the platform driver developer.
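Before running the archetype, the naming rules in the comments above can be checked programmatically. The Java sketch below is a hypothetical helper (DriverNames is not part of Graphalytics); it assumes the rules exactly as stated above and derives the artifactId used in the Maven command as well as class names such as XgraphPlatform that appear in the generated project.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Graphalytics) that checks the naming rules
 * for the archetype variables and derives names used in the generated project.
 */
final class DriverNames {
    // platform_name: first letter upper case, remaining letters lower case, e.g. "Xgraph".
    private static final Pattern NAME = Pattern.compile("[A-Z][a-z]*");
    // platform_acronym: lower-case letters only, e.g. "xgraph".
    private static final Pattern ACRONYM = Pattern.compile("[a-z]+");

    private DriverNames() {}

    static boolean isValidName(String platformName) {
        return NAME.matcher(platformName).matches();
    }

    static boolean isValidAcronym(String platformAcronym) {
        return ACRONYM.matcher(platformAcronym).matches();
    }

    /** Mirrors -DartifactId="graphalytics-platforms-${platform_acronym}". */
    static String artifactId(String platformAcronym) {
        return "graphalytics-platforms-" + platformAcronym;
    }

    /** Class names such as XgraphPlatform or XgraphLoader in the generated sources. */
    static String className(String platformName, String suffix) {
        return platformName + suffix;
    }

    public static void main(String[] args) {
        String name = "Xgraph";
        String acronym = "xgraph";
        if (!isValidName(name) || !isValidAcronym(acronym)) {
            throw new IllegalArgumentException("invalid platform_name or platform_acronym");
        }
        System.out.println(artifactId(acronym));         // graphalytics-platforms-xgraph
        System.out.println(className(name, "Platform")); // XgraphPlatform
    }
}
```

Note that "XGraph" fails the platform_name rule (the second capital letter is not allowed), which is why the example platform "XGraph 1.0" is written as Xgraph.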
To use graphalytics-platforms-default-archetype to prototype a platform-specific driver, run the following command:
mvn archetype:generate -B \
-DarchetypeGroupId=science.atlarge.graphalytics \
-DarchetypeArtifactId=graphalytics-platforms-default-archetype \
-DarchetypeVersion=1.0.0 \
-DgroupId=science.atlarge.graphalytics \
-Dpackage=science.atlarge \
-DartifactId="graphalytics-platforms-${platform_acronym}" -Dversion=0.1-SNAPSHOT \
-Dplatform-name="${platform_name}" \
-Dplatform-acronym="${platform_acronym}" \
-Dplatform-version="${platform_version}" \
-Ddeveloper-name="${developer_name}"
As a result, a directory named graphalytics-platforms-xgraph will be generated with the following content.
.
├── bin
│   └── sh
│       ├── execute-job.sh
│       ├── load-graph.sh
│       ├── prepare-benchmark.sh
│       ├── terminate-job.sh
│       └── unload-graph.sh
├── config-template
│   └── platform.properties
├── LICENSE
├── pom.xml
├── README.md
└── src
    ├── main
    │   ├── assembly
    │   │   └── bin.xml
    │   ├── c
    │   ├── java
    │   │   └── science
    │   │       └── atlarge
    │   │           └── graphalytics
    │   │               └── xgraph
    │   │                   ├── algorithms
    │   │                   │   ├── bfs
    │   │                   │   │   └── BreadthFirstSearchJob.java
    │   │                   │   ├── cdlp
    │   │                   │   │   └── CommunityDetectionLPJob.java
    │   │                   │   ├── lcc
    │   │                   │   │   └── LocalClusteringCoefficientJob.java
    │   │                   │   ├── pr
    │   │                   │   │   └── PageRankJob.java
    │   │                   │   ├── sssp
    │   │                   │   │   └── SingleSourceShortestPathsJob.java
    │   │                   │   └── wcc
    │   │                   │       └── WeaklyConnectedComponentsJob.java
    │   │                   ├── XgraphCollector.java
    │   │                   ├── XgraphConfiguration.java
    │   │                   ├── XgraphJob.java
    │   │                   ├── XgraphLoader.java
    │   │                   ├── XgraphPlatform.java
    │   │                   └── ProcTimeLog.java
    │   └── resources
    │       └── project
    │           └── build
    │               └── platform.properties
    └── test
        └── java
            └── science
                └── atlarge
Now that there is a working prototype of the platform driver, adapt it with the platform-specific implementation. Presumably, every platform can be interacted with through a command-line API. The platform driver prototype provides extensive support for command-line platform APIs, which is discussed in more detail below. However, keep in mind that for Java-compatible platforms (e.g., Java or Scala), it may be even easier to call the platform API directly from Java code.
The most important part of the benchmark execution is implemented in ${platform-name}Platform, which implements eight key platform APIs: verify-setup, load-graph, prepare, setup, run, finalize, terminate, and delete-graph (see Section 3.3 Benchmark execution of the technical specification for details).
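The order in which these APIs are invoked can be illustrated with a small Java sketch. PlatformLifecycle, RecordingPlatform and LifecycleDemo are hypothetical names, not the real Graphalytics interfaces, and strings stand in for the real graph and job descriptors. The sketch assumes a simplified order: verify the setup, load the graph once, run prepare, setup, run and finalize for each job (with the caller, playing the benchmark suite, calling terminate only when a job hits the time-out), and finally delete the graph.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical interface, NOT the real Graphalytics API: one method per platform
 * API listed above, in the role that ${platform-name}Platform plays in the driver.
 */
interface PlatformLifecycle {
    void verifySetup();
    void loadGraph(String graph);
    void prepare(String job);
    void setup(String job);
    boolean run(String job);   // false if the job did not finish before the time-out
    void finalize(String job); // overload, unrelated to Object.finalize()
    void terminate(String job);
    void deleteGraph(String graph);
}

/** Test double that only records the order of the calls. */
class RecordingPlatform implements PlatformLifecycle {
    final List<String> calls = new ArrayList<>();
    private final String slowJob; // this job simulates hitting the time-out

    RecordingPlatform(String slowJob) { this.slowJob = slowJob; }

    public void verifySetup() { calls.add("verify-setup"); }
    public void loadGraph(String g) { calls.add("load-graph " + g); }
    public void prepare(String j) { calls.add("prepare " + j); }
    public void setup(String j) { calls.add("setup " + j); }
    public boolean run(String j) { calls.add("run " + j); return !j.equals(slowJob); }
    public void finalize(String j) { calls.add("finalize " + j); }
    public void terminate(String j) { calls.add("terminate " + j); }
    public void deleteGraph(String g) { calls.add("delete-graph " + g); }
}

final class LifecycleDemo {
    private LifecycleDemo() {}

    /** Loads one graph once, runs several algorithms (jobs) on it, then deletes it. */
    static void benchmark(PlatformLifecycle p, String graph, List<String> jobs) {
        p.verifySetup();
        p.loadGraph(graph);
        for (String job : jobs) {
            p.prepare(job);
            p.setup(job);
            if (!p.run(job)) {
                p.terminate(job); // only when the time-out is reached
            }
            p.finalize(job);
        }
        p.deleteGraph(graph);
    }

    public static void main(String[] args) {
        RecordingPlatform platform = new RecordingPlatform("pr");
        benchmark(platform, "example-graph", List.of("bfs", "pr"));
        platform.calls.forEach(System.out::println);
    }
}
```

In this sketch the graph is loaded and deleted exactly once while every job reuses it, which is what makes a costly one-time pre-processing step worthwhile.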
Typically, a graph processing platform pre-processes the graph data into an optimized platform-specific data format and then runs multiple algorithms on the pre-processed graph data (see Section 2.5 Formal Definition: Job of the technical specification for details).
The loadGraph and deleteGraph functions of ${platform-name}Platform are responsible for the graph loading and unloading operations. These operations are implemented in ${platform-name}Loader, which provides information about the graph dataset, such as graph-name, input-path, output-path, directed, and weighted, to the corresponding command-line scripts bin/sh/load-graph.sh and bin/sh/unload-graph.sh.
- Java-compatible platform APIs can be called by adapting ${platform-name}Loader directly. Furthermore, platform.properties can be omitted by removing the related properties in XgraphJob.java and XgraphConfiguration.java and by omitting the checks in bin/sh/prepare-benchmark.sh.
- Command-line platform APIs can be called by adapting bin/sh/load-graph.sh and bin/sh/unload-graph.sh.
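For the command-line route, ${platform-name}Loader essentially has to turn the dataset properties listed above into arguments for bin/sh/load-graph.sh. The following Java sketch shows one way to do that; LoadGraphCommand and the option names (--graph-name, --input-path, and so on) are assumptions, so match them to whatever your script actually parses.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of what ${platform-name}Loader could pass to
 * bin/sh/load-graph.sh. The option names are assumptions, not a fixed interface.
 */
final class LoadGraphCommand {
    private LoadGraphCommand() {}

    static List<String> build(Path script, String graphName, Path inputPath,
                              Path outputPath, boolean directed, boolean weighted) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script.toString());
        cmd.add("--graph-name");  cmd.add(graphName);
        cmd.add("--input-path");  cmd.add(inputPath.toString());
        cmd.add("--output-path"); cmd.add(outputPath.toString());
        cmd.add("--directed");    cmd.add(Boolean.toString(directed));
        cmd.add("--weighted");    cmd.add(Boolean.toString(weighted));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(Paths.get("bin/sh/load-graph.sh"), "example-graph",
                Paths.get("/data/example-graph"), Paths.get("/tmp/xgraph/example-graph"),
                true, false);
        System.out.println(String.join(" ", cmd));
        // To actually run the script (standard JDK API):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor();
    }
}
```

Building the argument list separately from starting the process keeps it easy to log and test; the commented lines show how ProcessBuilder could then run the script.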
If the platform does not support pre-processing the graph data, simply use the provided data format during job execution.
The run and terminate functions of ${platform-name}Platform are responsible for job execution and termination. Job execution is implemented in ${platform-name}Job, which holds the benchmark, algorithm, dataset, and platform parameters. ${platform-name}Job passes the job execution information to the corresponding command-line scripts bin/sh/execute-job.sh and bin/sh/terminate-job.sh.
- Java-compatible platform APIs can be called by adapting ${platform-name}Platform and ${platform-name}Job directly.
- Command-line platform APIs can be called by adapting bin/sh/execute-job.sh and bin/sh/terminate-job.sh.
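In the command-line route, ${platform-name}Job mainly collects job parameters and hands them to bin/sh/execute-job.sh. The Java sketch below assumes one job class per algorithm, mirroring the algorithms/ directories of the generated project; SketchJob, SketchBfsJob, ExecuteJobDemo, and all option names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a ${platform-name}Job: it gathers the job parameters and
 * turns them into arguments for bin/sh/execute-job.sh. Option names are assumptions.
 */
abstract class SketchJob {
    final String jobId;
    final String graphPath;  // path of the (pre-processed) graph
    final String outputPath; // where the algorithm output is written

    SketchJob(String jobId, String graphPath, String outputPath) {
        this.jobId = jobId;
        this.graphPath = graphPath;
        this.outputPath = outputPath;
    }

    abstract String algorithm();

    /** Algorithm-specific parameters, e.g. a BFS source vertex. */
    abstract Map<String, String> algorithmParameters();

    List<String> command(String script) {
        List<String> cmd = new ArrayList<>(List.of(script,
                "--job-id", jobId,
                "--algorithm", algorithm(),
                "--graph-path", graphPath,
                "--output-path", outputPath));
        algorithmParameters().forEach((key, value) -> {
            cmd.add("--" + key);
            cmd.add(value);
        });
        return cmd;
    }
}

/** Hypothetical BFS job, analogous to algorithms/bfs/BreadthFirstSearchJob.java. */
class SketchBfsJob extends SketchJob {
    private final long sourceVertex;

    SketchBfsJob(String jobId, String graphPath, String outputPath, long sourceVertex) {
        super(jobId, graphPath, outputPath);
        this.sourceVertex = sourceVertex;
    }

    @Override String algorithm() { return "bfs"; }

    @Override Map<String, String> algorithmParameters() {
        return Map.of("source-vertex", Long.toString(sourceVertex));
    }
}

final class ExecuteJobDemo {
    public static void main(String[] args) {
        SketchJob job = new SketchBfsJob("job-1", "/tmp/xgraph/example-graph", "/tmp/out/bfs", 1L);
        System.out.println(String.join(" ", job.command("bin/sh/execute-job.sh")));
    }
}
```

Keeping the shared arguments in the base class and only the algorithm-specific ones in each subclass means adding an algorithm touches a single small class.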
Be aware that job termination is only necessary when the benchmark time-out is reached, and that it is performed by the benchmark suite, not by the runner itself. Therefore, ${platform-name}Job does not have a terminate function.
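Because the time-out is enforced outside the job, the benchmark-suite side can be sketched with the standard Process.waitFor(long, TimeUnit) and Process.destroyForcibly() methods. TimeoutEnforcer is a hypothetical name, the terminate callback stands in for a call to bin/sh/terminate-job.sh (an assumption), and the example in main assumes a Unix-like system that provides the sleep command.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of the benchmark-suite side: wait for the job and terminate
 * it only when the time-out is reached. The job itself never terminates itself.
 */
final class TimeoutEnforcer {
    private TimeoutEnforcer() {}

    /**
     * Starts the job and waits at most {@code timeout}. Returns true if it finished
     * in time; otherwise runs {@code terminate}, kills the local process, and
     * returns false.
     */
    static boolean runWithTimeout(ProcessBuilder job, long timeout, TimeUnit unit,
                                  Runnable terminate) {
        try {
            Process process = job.start();
            if (process.waitFor(timeout, unit)) {
                return true; // finished before the time-out
            }
            terminate.run();            // e.g. invoke bin/sh/terminate-job.sh (assumption)
            process.destroyForcibly();  // make sure the local process is gone as well
            process.waitFor();
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the job", e);
        }
    }

    public static void main(String[] args) {
        // "sleep 5" stands in for a long-running job.
        boolean completed = runWithTimeout(new ProcessBuilder("sleep", "5"),
                1, TimeUnit.SECONDS, () -> System.out.println("time-out reached, terminating job"));
        System.out.println("completed: " + completed);
    }
}
```

On a distributed platform, killing the local launcher process is usually not enough, which is why a separate terminate step (such as a terminate-job.sh script) is kept in the sketch.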
To trace performance information during benchmark execution, the logs of each benchmark run must be kept for further analysis. The startup function (before each benchmark run) and the finalize function (after each benchmark run) of ${platform-name}Platform redirect the platform log to the directory containing the benchmark report. This is implemented in ${platform-name}Collector.
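A minimal sketch of the collection step, assuming the platform writes its log to a single file that should be copied into a per-run log directory of the report. LogCollector and the report/log/<run-id> layout are hypothetical; the real ${platform-name}Collector may organize the report differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical sketch of a collector step: after a run, copy the platform log
 * into the benchmark report directory so it can be analysed later.
 * File and directory names are assumptions.
 */
final class LogCollector {
    private LogCollector() {}

    /** Copies platformLog to reportDir/log/runId/ and returns the copied file. */
    static Path collect(Path platformLog, Path reportDir, String runId) {
        try {
            Path logDir = reportDir.resolve("log").resolve(runId);
            Files.createDirectories(logDir);
            Path target = logDir.resolve(platformLog.getFileName());
            return Files.copy(platformLog, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("xgraph");
        Path log = Files.writeString(tmp.resolve("platform.log"), "example log line\n");
        System.out.println(collect(log, tmp.resolve("report"), "run-1"));
    }
}
```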
Each benchmark run must report three key performance metrics: loading-time, makespan, and processing-time. While loading-time and makespan are measured directly by the benchmark suite, processing-time is measured by the platform driver itself. The processing-time needs to be extracted from the benchmark run log or obtained via another platform API. It can be obtained by calling ProcTimeLog.start() and ProcTimeLog.end() immediately before and after each algorithm execution, respectively.
The benchmark suite aims to provide traceable benchmark results. For each benchmark execution, the versioning information of the benchmark tools must be reported for both graphalytics-core and the platform driver. This requires the code base to be under version control.
Adapt the scm information in pom.xml: change the URL to point to the actual address of the software repository. For git-based repositories, the git commit hash and git branch are labeled automatically when the platform driver distribution is packaged. For other types of repositories, modify project/build/platform.properties after packaging.
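To check what actually ended up in the distribution, the packaged properties can be read back with java.util.Properties. The key names below (platform.version, platform.git.commit, platform.git.branch) are assumptions; inspect the generated project/build/platform.properties for the real keys. BuildInfo is a hypothetical class.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical sketch: read the versioning information written into
 * project/build/platform.properties. Key names are assumptions.
 */
final class BuildInfo {
    final String version;
    final String commit;
    final String branch;

    private BuildInfo(String version, String commit, String branch) {
        this.version = version;
        this.commit = commit;
        this.branch = branch;
    }

    static BuildInfo read(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Missing keys are reported as "unknown" rather than silently dropped.
        return new BuildInfo(
                props.getProperty("platform.version", "unknown"),
                props.getProperty("platform.git.commit", "unknown"),
                props.getProperty("platform.git.branch", "unknown"));
    }

    public static void main(String[] args) {
        String example = "platform.version=0.1-SNAPSHOT\n"
                + "platform.git.commit=0123abc\n"
                + "platform.git.branch=master\n";
        BuildInfo info = read(new StringReader(example));
        System.out.println(info.version + " " + info.commit + " " + info.branch);
    }
}
```

Inside the driver, the file would typically be read from the classpath (for example via getResourceAsStream("/project/build/platform.properties") wrapped in an InputStreamReader), since Maven places src/main/resources on the classpath.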