
Manual: Implementing Driver

Gabor Szarnyas edited this page Apr 5, 2019 · 8 revisions

The Graphalytics benchmark suite can be extended to a new graph processing platform by developing a platform driver for it. This manual explains how to rapidly prototype such a driver with the Maven archetype graphalytics-platforms-default-archetype.

Prototyping the platform driver

Download ldbc_graphalytics from our core GitHub repository and install it into your local environment with the Maven command mvn clean install. After this step, graphalytics-platforms-default-archetype becomes available for use.
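To confirm that the install step succeeded, you can look for the archetype in the local Maven repository. The sketch below assumes Maven's default local repository root (~/.m2/repository) unless a second argument overrides it; pass the version you built (1.0.0 matches -DarchetypeVersion in the generate command below). The helper name archetype_installed is hypothetical.

```shell
# Hypothetical helper: succeeds if graphalytics-platforms-default-archetype
# of the given version is present in the local Maven repository.
# Assumes the default repository root ~/.m2/repository unless a second
# argument overrides it (e.g. when settings.xml sets <localRepository>).
archetype_installed() {
  local version="$1"
  local repo="${2:-$HOME/.m2/repository}"
  # Maven stores artifacts under the groupId (dots become directories),
  # then the artifactId, then the version.
  local dir="$repo/science/atlarge/graphalytics/graphalytics-platforms-default-archetype/$version"
  [ -d "$dir" ]
}
```

For example, `archetype_installed 1.0.0 && echo "archetype available"` prints the message only if the install step placed version 1.0.0 in the default local repository.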

Suppose a new graph processing platform becomes available: "XGraph 1.0", developed by "John Smith". The following key variables are defined for the template (platform_name follows the capitalization rule in its comment, so "XGraph" becomes "Xgraph"):

platform_name="Xgraph"       # letters only: first letter upper case, the rest lower case
platform_acronym="xgraph"    # letters only, all lower case
platform_version="1.0"       # the platform version number
developer_name="John Smith"  # full name of the platform driver developer
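The naming rules for these variables can be checked before running the archetype. The sketch below encodes our reading of the comments above as regular expressions; these are not checks performed by the archetype itself, and check_template_vars is a hypothetical helper.

```shell
# Hypothetical pre-flight check for the template variables.
# platform_name: first letter upper case, remaining letters lower case.
# platform_acronym: lower-case letters only.
check_template_vars() {
  local name="$1" acronym="$2"
  if [[ ! "$name" =~ ^[[:upper:]][[:lower:]]*$ ]]; then
    echo "invalid platform_name: '$name'" >&2
    return 1
  fi
  if [[ ! "$acronym" =~ ^[[:lower:]]+$ ]]; then
    echo "invalid platform_acronym: '$acronym'" >&2
    return 1
  fi
}
```

Call it as `check_template_vars "$platform_name" "$platform_acronym" || exit 1` before mvn archetype:generate. In bash 4 or later, `platform_acronym="${platform_name,,}"` derives the acronym from the name, which keeps the two consistent.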

To use graphalytics-platforms-default-archetype to prototype a platform-specific driver, run the following command:

mvn archetype:generate -B \
 -DarchetypeGroupId=science.atlarge.graphalytics \
 -DarchetypeArtifactId=graphalytics-platforms-default-archetype \
 -DarchetypeVersion=1.0.0 \
 -DgroupId=science.atlarge.graphalytics \
 -Dpackage=science.atlarge \
 -DartifactId="graphalytics-platforms-${platform_acronym}" -Dversion=0.1-SNAPSHOT \
 -Dplatform-name="${platform_name}" \
 -Dplatform-acronym="${platform_acronym}" \
 -Dplatform-version="${platform_version}" \
 -Ddeveloper-name="${developer_name}"

As a result, a directory named graphalytics-platforms-xgraph is generated with the following content:

.
├── bin
│   └── sh
│       ├── execute-job.sh
│       ├── load-graph.sh
│       ├── prepare-benchmark.sh
│       ├── terminate-job.sh
│       └── unload-graph.sh
├── config-template
│   └── platform.properties
├── LICENSE
├── pom.xml
├── README.md
└── src
    ├── main
    │   ├── assembly
    │   │   └── bin.xml
    │   ├── c
    │   ├── java
    │   │   └── science
    │   │       └── atlarge
    │   │           └── graphalytics
    │   │               └── xgraph
    │   │                   ├── algorithms
    │   │                   │   ├── bfs
    │   │                   │   │   └── BreadthFirstSearchJob.java
    │   │                   │   ├── cdlp
    │   │                   │   │   └── CommunityDetectionLPJob.java
    │   │                   │   ├── lcc
    │   │                   │   │   └── LocalClusteringCoefficientJob.java
    │   │                   │   ├── pr
    │   │                   │   │   └── PageRankJob.java
    │   │                   │   ├── sssp
    │   │                   │   │   └── SingleSourceShortestPathsJob.java
    │   │                   │   └── wcc
    │   │                   │       └── WeaklyConnectedComponentsJob.java
    │   │                   ├── XgraphCollector.java
    │   │                   ├── XgraphConfiguration.java
    │   │                   ├── XgraphJob.java
    │   │                   ├── XgraphLoader.java
    │   │                   ├── XgraphPlatform.java
    │   │                   └── ProcTimeLog.java
    │   └── resources
    │       └── project
    │           └── build
    │               └── platform.properties
    └── test
        └── java
            └── science
                └── atlarge

Implementing the platform driver

Now that there is a working prototype of the platform driver, adapt it with the platform-specific implementation. Presumably, every platform can be driven through a command-line API, and the prototype provides extensive support for command-line platform APIs, which is discussed in more detail below. However, keep in mind that for Java-compatible platforms (Java, Scala), it may be even easier to call the platform API directly from Java code.

The core of the benchmark execution logic lives in ${platform-name}Platform, which implements eight key platform APIs: verify-setup, load-graph, prepare, startup, run, finalize, terminate, and delete-graph (see Section 3.3 Benchmark execution of the technical specification for details).

Graph Loading and Unloading

Typically, a graph processing platform pre-processes graph data into an optimized, platform-specific data format and then runs multiple algorithms on the pre-processed data (see Section 2.5 Formal Definition: Job of the technical specification for details).

The loadGraph and deleteGraph functions of ${platform-name}Platform are responsible for the graph loading and unloading operations. These operations are implemented in ${platform-name}Loader, which passes information about the graph dataset, such as graph-name, input-path, output-path, directed, and weighted, to the corresponding command-line scripts bin/sh/load-graph.sh and bin/sh/unload-graph.sh.

  • Java-compatible platform APIs can be called by adapting ${platform-name}Loader directly. Furthermore, platform.properties can be dropped by removing the related properties from XgraphJob.java and XgraphConfiguration.java and by omitting the checks in bin/sh/prepare-benchmark.sh.
  • Command-line platform APIs can be called by adapting bin/sh/load-graph.sh and bin/sh/unload-graph.sh.
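For the command-line route, load-graph.sh first has to read the dataset information that ${platform-name}Loader passes to it. The sketch below assumes the values arrive as long options named after the fields listed above (--graph-name, --input-path, --output-path, --directed, --weighted); these option names and the helper parse_load_args are hypothetical, so check how the generated XgraphLoader actually builds the command line before reusing this.

```shell
# Hypothetical argument parser for bin/sh/load-graph.sh.
# Sets GRAPH_NAME, INPUT_PATH, OUTPUT_PATH, DIRECTED and WEIGHTED from
# long options and returns non-zero on an unknown option or missing value.
parse_load_args() {
  GRAPH_NAME= INPUT_PATH= OUTPUT_PATH= DIRECTED= WEIGHTED=
  while [ $# -gt 0 ]; do
    # Each option is expected to be followed by exactly one value.
    if [ $# -lt 2 ]; then
      echo "missing value for $1" >&2
      return 1
    fi
    case "$1" in
      --graph-name)  GRAPH_NAME="$2" ;;
      --input-path)  INPUT_PATH="$2" ;;
      --output-path) OUTPUT_PATH="$2" ;;
      --directed)    DIRECTED="$2" ;;
      --weighted)    WEIGHTED="$2" ;;
      *) echo "unknown option: $1" >&2; return 1 ;;
    esac
    shift 2
  done
  # All five values are required before the platform-specific import runs.
  local var opt
  for var in GRAPH_NAME INPUT_PATH OUTPUT_PATH DIRECTED WEIGHTED; do
    if [ -z "${!var}" ]; then
      opt="${var,,}"                      # GRAPH_NAME -> graph_name
      echo "missing --${opt//_/-}" >&2    # graph_name -> graph-name
      return 1
    fi
  done
}
```

In the script itself, `parse_load_args "$@" || exit 1` would come first, followed by the platform's own import step reading $INPUT_PATH and writing $OUTPUT_PATH. If the platform does not pre-process graphs, that step can stay empty.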

If the platform does not pre-process graph data, simply use the provided data format during job execution.

Job Execution and Termination

The run and terminate functions of ${platform-name}Platform are responsible for job execution and termination. Job execution is implemented in ${platform-name}Job, which holds information about the benchmark, algorithm, dataset, and platform parameters and passes the job execution details to the corresponding command-line scripts bin/sh/execute-job.sh and bin/sh/terminate-job.sh.

  • Java-compatible platform APIs can be called by adapting ${platform-name}Platform and ${platform-name}Job directly.
  • Command-line platform APIs can be called by adapting bin/sh/execute-job.sh and bin/sh/terminate-job.sh.

Be aware that job termination is only needed when the benchmark time-out is reached, and that it is performed by the benchmark suite, not by the runner itself. Therefore, ${platform-name}Job does not have a terminate function.
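Because the suite only calls terminate-job.sh when the time-out is reached, that script needs a way to find the job started by execute-job.sh. The template does not prescribe one; the sketch below assumes a PID file shared by both scripts, and the helpers start_job and terminate_job are hypothetical.

```shell
# Hypothetical PID-file convention shared by execute-job.sh and terminate-job.sh.

# start_job PIDFILE CMD [ARGS...]: run the platform command in the background,
# record its PID, wait for it, and return the command's exit status.
start_job() {
  local pidfile="$1"; shift
  "$@" &
  local pid=$!
  echo "$pid" > "$pidfile"
  local status=0
  wait "$pid" || status=$?
  rm -f "$pidfile"
  return "$status"
}

# terminate_job PIDFILE: send SIGTERM to the recorded job, if any, and
# remove the PID file.
terminate_job() {
  local pidfile="$1"
  [ -f "$pidfile" ] || return 0   # no job recorded, nothing to stop
  kill "$(cat "$pidfile")" 2>/dev/null || true
  rm -f "$pidfile"
}
```

execute-job.sh would call start_job with the platform's launch command, and terminate-job.sh would call terminate_job with the same PID-file path. If the platform spawns its own worker processes, killing the recorded PID may not stop them; a process-group kill or the platform's own stop command may be needed instead.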

Logging and Metric Collection

To trace performance information during the benchmark execution, the logs of each benchmark run must be kept for further analysis. The startup and finalize functions of ${platform-name}Platform, which run before and after each benchmark run, redirect the platform log to the directory containing the benchmark report. This is implemented in ${platform-name}Collector.

Each benchmark run must report three key performance metrics: loading-time, makespan, and processing-time. While loading-time and makespan are measured directly by the benchmark suite, processing-time is measured by the platform driver itself, either by extracting it from the benchmark run log or through another platform API. To record it, call ProcTimeLog.start() before and ProcTimeLog.end() after each algorithm execution.
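When processing-time is recovered from the run log, a small parser can compute it from the start and end timestamps. The marker lines used below ("Processing starts at: <epoch-ms>" and "Processing ends at: <epoch-ms>") are an assumed format for illustration; match them to what ProcTimeLog actually writes in the generated driver. The helper proc_time_ms is hypothetical.

```shell
# Hypothetical: print processing-time in milliseconds from a run log containing
# "Processing starts at: <epoch-ms>" and "Processing ends at: <epoch-ms>".
# If a marker appears several times, the last occurrence wins.
# Returns non-zero if either marker is missing.
proc_time_ms() {
  local log="$1"
  awk '
    /Processing starts at:/ { start = $NF }
    /Processing ends at:/   { stop = $NF }
    END {
      if (start == "" || stop == "") exit 1
      print stop - start
    }
  ' "$log"
}
```

For a log whose markers carry 1000 and 3500, `proc_time_ms run.log` prints 2500.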

Software version control

The benchmark suite aims to provide traceable benchmark results. For each benchmark execution, the version information of the benchmark tools, both graphalytics-core and the platform driver, must be reported. This requires the code base to be under software version control.

Adapt the scm information in pom.xml: change the URL to point to the actual address of the software repository. For git-based repositories, the git commit hash and git branch are labeled automatically when the platform driver distribution is packaged. For other types of repositories, modify project/build/platform.properties after packaging.
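For a git-based repository, the commit hash and branch can be cross-checked against the packaged build information with plain git commands. The helper vcs_info below is hypothetical, and the keys it prints (commit, branch) are not necessarily the property names used in project/build/platform.properties.

```shell
# Hypothetical helper: print the commit hash and branch of the repository
# containing the platform driver (defaults to the current directory).
vcs_info() {
  local dir="${1:-.}"
  local hash branch
  hash=$(git -C "$dir" rev-parse HEAD) || return 1
  branch=$(git -C "$dir" rev-parse --abbrev-ref HEAD) || return 1
  printf 'commit=%s\nbranch=%s\n' "$hash" "$branch"
}
```

On a detached HEAD, `git rev-parse --abbrev-ref HEAD` prints HEAD instead of a branch name, so package from a checked-out branch if the branch should appear in the report.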