Adding a new graph processing platform
This page contains instructions for developers adding support for a new graph processing platform to Graphalytics.
The code for each platform is organized as a Maven project. Each project should include a script to set up the environment for running the benchmark, and may include configuration files. The platform integration code should be written in Java, but may use external scripts or programs to execute algorithms on the platform. The remainder of this page describes the process of adding a platform to Graphalytics step-by-step. The example platform used in this guide is called myplatform and the project is located in a directory called graphalytics-platforms-myplatform. When following this guide you should replace all occurrences of myplatform with the name of your platform.
The first step is to create a Maven project for your platform integration code. The Maven POM file should include at least dependencies on graphalytics-core and graphalytics-core:resources. In addition, the artifactId should be graphalytics-platforms-myplatform (replacing myplatform with your platform name). The groupId may be chosen freely by the authors/maintainers of the platform integration code. The directory structure should follow the Maven standard: src/main/java and src/main/resources for Java code and resources, respectively.
For example, the pom.xml file for myplatform:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>nl.tudelft.graphalytics</groupId>
    <artifactId>graphalytics-platforms-myplatform</artifactId>
    <version>0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <!-- Define all dependency versions in one place for easy management -->
    <properties>
        <graphalytics.version>0.3-SNAPSHOT</graphalytics.version>
        <log4j.version>2.5</log4j.version>
    </properties>

    <dependencies>
        <!-- graphalytics-core includes everything required to integrate with Graphalytics -->
        <dependency>
            <groupId>nl.tudelft.graphalytics</groupId>
            <artifactId>graphalytics-core</artifactId>
            <version>${graphalytics.version}</version>
        </dependency>

        <!-- graphalytics-core:resources bundles all scripts and configuration files required to run Graphalytics -->
        <dependency>
            <groupId>nl.tudelft.graphalytics</groupId>
            <artifactId>graphalytics-core</artifactId>
            <version>${graphalytics.version}</version>
            <type>tar.gz</type>
            <classifier>resources</classifier>
            <scope>runtime</scope>
        </dependency>

        <!-- Use the Log4j2 backend for the Log4j2 and SLF4j API's used by graphalytics-core and its dependencies -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>${log4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-slf4j-impl</artifactId>
            <version>${log4j.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- Java compiler settings -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>

            <!-- Maven Shade plugin used by platform modules to create fat JARs -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.3</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                    <minimizeJar>false</minimizeJar>
                    <artifactSet>
                        <excludes>
                            <exclude>*:*:*:resources</exclude>
                        </excludes>
                    </artifactSet>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
The integration with Graphalytics will be done in a single class implementing the nl.tudelft.graphalytics.Platform interface. To do so, we create a MyPlatformPlatform class (note: this is in line with the naming convention used by other platforms, such as GiraphPlatform). The uploadGraph, executeAlgorithmOnGraph and deleteGraph methods are specific to each platform, and can be implemented later according to the comments in the example code below and the Javadoc of the Platform interface.
The MyPlatformPlatform class is included below.
package nl.tudelft.graphalytics.myplatform;

// Imports of the Graphalytics core types used below (AbstractPlatform, Graph, Benchmark,
// PlatformBenchmarkResult, PlatformExecutionException, NestedConfiguration) are omitted here;
// see the graphalytics-core Javadoc for their packages.

public class MyPlatformPlatform extends AbstractPlatform {

    @Override
    public void uploadGraph(Graph graph) throws Exception {
        // This method can be used to pre-process the input graph. This can include uploading to a distributed
        // filesystem or database, converting the graph to a different file format, etc.
        // Graphalytics ensures that at any given time no more than one graph is uploaded. Before uploading the next
        // graph, any previously uploaded graph is deleted using the deleteGraph method. Algorithms are executed only
        // on a graph that has been uploaded but not deleted.
    }

    @Override
    public PlatformBenchmarkResult executeAlgorithmOnGraph(Benchmark benchmark) throws PlatformExecutionException {
        // This method is called once for every combination of algorithm and graph (encapsulated in a Benchmark object)
        // that should be executed for the Graphalytics benchmark suite.
        // If an algorithm is not supported, or a failure occurs during execution of an algorithm,
        // a PlatformExecutionException should be thrown.
        throw new PlatformExecutionException("Executing algorithm " + benchmark.getAlgorithm() + " is not yet supported");

        // Once an algorithm is implemented, return a result instead of throwing. The return statement is
        // commented out here because Java rejects statements that follow an unconditional throw.
        // The PlatformBenchmarkResult class will be extended in the future to allow platforms to self-report
        // additional information on e.g. runtime breakdown, settings used by the platform.
        // return new PlatformBenchmarkResult(NestedConfiguration.empty());
    }

    @Override
    public void deleteGraph(String graphName) {
        // This method is called when a graph is no longer needed.
    }

    @Override
    public String getName() {
        return "myplatform";
    }
}
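The call order described in the comments above can be illustrated with a small, self-contained simulation. The sketch below does not use the Graphalytics API: SimplePlatform, SkeletonPlatform, LifecycleSketch and runAll are hypothetical stand-ins invented for this example, the graph and algorithm names are placeholders, and the real driver in graphalytics-core may differ in detail. It assumes that a failed algorithm is recorded and that the run then continues with the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LifecycleSketch {

    // Hypothetical, simplified stand-in for the Platform interface (not the real API).
    interface SimplePlatform {
        void uploadGraph(String graph);
        void executeAlgorithm(String algorithm, String graph) throws Exception;
        void deleteGraph(String graph);
    }

    // Runs every algorithm on every graph, keeping at most one graph uploaded at a time.
    // Returns a log of the calls made, so the order can be inspected.
    static List<String> runAll(SimplePlatform platform, List<String> graphs, List<String> algorithms) {
        List<String> log = new ArrayList<>();
        for (String graph : graphs) {
            platform.uploadGraph(graph);
            log.add("upload " + graph);
            try {
                for (String algorithm : algorithms) {
                    try {
                        platform.executeAlgorithm(algorithm, graph);
                        log.add("ok " + algorithm + " on " + graph);
                    } catch (Exception e) {
                        // Assumption: a failed algorithm is recorded and the run continues.
                        log.add("failed " + algorithm + " on " + graph);
                    }
                }
            } finally {
                // The graph is deleted before the next one is uploaded.
                platform.deleteGraph(graph);
                log.add("delete " + graph);
            }
        }
        return log;
    }

    // A skeleton platform, like the one above, that does not support any algorithm yet.
    static class SkeletonPlatform implements SimplePlatform {
        public void uploadGraph(String graph) { }

        public void executeAlgorithm(String algorithm, String graph) throws Exception {
            throw new Exception("Executing algorithm " + algorithm + " is not yet supported");
        }

        public void deleteGraph(String graph) { }
    }

    public static void main(String[] args) {
        List<String> log = runAll(new SkeletonPlatform(),
                Arrays.asList("graph-a", "graph-b"), Arrays.asList("BFS", "PR"));
        for (String entry : log) {
            System.out.println(entry);
        }
    }
}
```

With the skeleton platform every algorithm is logged as failed, while each graph is still uploaded and deleted exactly once.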
The Graphalytics core includes a main function that reads the benchmark configuration, orchestrates the benchmarking process, and processes the results. To find the platform-specific integration code, Graphalytics needs two additional files. The first is a prepare-benchmark.sh script in the root of the project. This script should export the platform environment variable with the name of the platform. In addition, it may set platform_classpath to include additional files on the classpath when executing Graphalytics, or java_opts to pass arguments to the JVM in which Graphalytics runs. An example for myplatform is included below:
#!/bin/sh
export platform=myplatform
Graphalytics uses the platform variable to find the name of the class implementing the Platform interface. A second file is created for this; it should be saved as src/main/resources/myplatform.platform (where myplatform must be identical to the value of the platform environment variable), and it should contain exactly one line with the fully-qualified name of the class implementing the Platform interface. For example:
nl.tudelft.graphalytics.myplatform.MyPlatformPlatform
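To illustrate the mechanism, the sketch below shows one way a class name could be read from a .platform classpath resource and instantiated by reflection. It is not the Graphalytics loader: PlatformLookupSketch, parseClassName, lookUp and instantiate are hypothetical names, and the real core may locate the file and handle errors differently. Because MyPlatformPlatform is not available in a standalone example, main uses java.util.ArrayList as a stand-in class.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;

class PlatformLookupSketch {

    // Returns the first non-blank line, which is expected to hold a fully-qualified class name.
    static String parseClassName(String fileContents) {
        for (String line : fileContents.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                return trimmed;
            }
        }
        throw new IllegalArgumentException("Platform file does not contain a class name");
    }

    // Looks up "<platform>.platform" on the classpath and returns the class name it contains.
    static String lookUp(String platform) throws IOException {
        String resource = "/" + platform + ".platform";
        try (InputStream in = PlatformLookupSketch.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resource);
            }
            // The "\\A" delimiter makes the Scanner return the whole stream as one token.
            Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A");
            return parseClassName(scanner.hasNext() ? scanner.next() : "");
        }
    }

    // Instantiates the named class through its no-argument constructor.
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a .platform file; a real file would name the Platform class.
        String className = parseClassName("\njava.util.ArrayList\n");
        Object instance = instantiate(className);
        System.out.println("Loaded " + instance.getClass().getName());
    }
}
```

A lookup of this kind fails if the file name does not match the value of the platform variable, which is why the two must be identical.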
The final step to complete the skeleton for a platform integration is to add a maven-assembly descriptor for building a distribution of the benchmark. The majority of this descriptor is identical for all platforms, but minor changes may be needed to include, e.g., platform-specific configuration files. The descriptor for myplatform, saved as src/main/assembly/bin.xml, is included below:
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
    <id>bin</id>

    <formats>
        <!-- The distribution is packaged in a .tar.gz file -->
        <format>tar.gz</format>
    </formats>

    <fileSets>
        <!-- Copy the prepare-benchmark script and any configuration files to the distribution -->
        <fileSet>
            <directory>${project.basedir}</directory>
            <outputDirectory>.</outputDirectory>
            <includes>
                <include>prepare-benchmark.sh</include>
                <!-- Configuration files should be stored in config-template/ and can be included in the distribution by uncommenting the following line -->
                <!--<include>config-template/**</include>-->
            </includes>
        </fileSet>

        <!-- Copy the fat JAR produced by maven-shade-plugin to lib/ in the distribution -->
        <fileSet>
            <directory>${project.build.directory}</directory>
            <outputDirectory>lib</outputDirectory>
            <includes>
                <include>*.jar</include>
            </includes>
            <excludes>
                <!-- Exclude backup artifacts produced by maven-shade-plugin -->
                <exclude>original*</exclude>
            </excludes>
        </fileSet>
    </fileSets>

    <files>
        <!-- Copy the project README to the distribution with a different name to avoid conflicts -->
        <file>
            <source>${project.basedir}/README.md</source>
            <outputDirectory>.</outputDirectory>
            <destName>README-graphalytics-myplatform.md</destName>
        </file>
    </files>

    <dependencySets>
        <!-- Extract the scripts and other resources required by the core into the root of the distribution -->
        <dependencySet>
            <outputDirectory>.</outputDirectory>
            <!-- Matches the graphalytics-core:resources dependency and any future dependencies containing resources -->
            <includes>
                <include>*:resources</include>
            </includes>
            <!-- Unpack the .tar.gz artifacts attached to the *:resources dependencies to make scripts, etc. available in the root -->
            <unpack>true</unpack>
            <unpackOptions>
                <excludes>
                    <!-- Ignore metadata files produced by Java when extracting the dependencies -->
                    <exclude>META-INF/**</exclude>
                </excludes>
            </unpackOptions>
        </dependencySet>
    </dependencySets>
</assembly>
You should now be able to follow the instructions in the Graphalytics README for existing platform integrations to build and run your own. If you configure Graphalytics correctly and add a graph, you should be able to run the benchmark and see failures for all algorithms. From here you can add implementations of the different algorithms included in Graphalytics. To aid in testing your implementations we provide the graphalytics-validation library with small sample graphs and code to verify that the output of your implementation matches the expected output. For a complete example of implementing and validating the Graphalytics benchmark, see the Graphalytics reference implementation at https://github.com/tudelft-atlarge/graphalytics-platforms-reference.
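The idea behind output validation can be sketched without the library. The example below does not use the graphalytics-validation API; OutputComparisonSketch and findMismatches are hypothetical names, and it assumes, purely for illustration, that an algorithm's output can be represented as a map from vertex ID to a value (for example, a BFS depth).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class OutputComparisonSketch {

    // Returns a description of every vertex whose value is missing, unexpected, or different.
    static List<String> findMismatches(Map<Long, Long> expected, Map<Long, Long> actual) {
        List<String> mismatches = new ArrayList<>();
        // Visit the union of vertex IDs in ascending order for a stable report.
        TreeSet<Long> vertices = new TreeSet<>(expected.keySet());
        vertices.addAll(actual.keySet());
        for (Long vertex : vertices) {
            Long want = expected.get(vertex);
            Long got = actual.get(vertex);
            if (want == null) {
                mismatches.add("vertex " + vertex + ": unexpected value " + got);
            } else if (got == null) {
                mismatches.add("vertex " + vertex + ": missing, expected " + want);
            } else if (!want.equals(got)) {
                mismatches.add("vertex " + vertex + ": expected " + want + " but got " + got);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Made-up values for a three-vertex graph; vertex 3 deliberately differs.
        Map<Long, Long> expected = new HashMap<>();
        expected.put(1L, 0L);
        expected.put(2L, 1L);
        expected.put(3L, 2L);

        Map<Long, Long> actual = new HashMap<>();
        actual.put(1L, 0L);
        actual.put(2L, 1L);
        actual.put(3L, 1L);

        for (String mismatch : findMismatches(expected, actual)) {
            System.out.println(mismatch);
        }
    }
}
```

An empty result means the two outputs agree on every vertex; the library itself should be preferred for real validation, since it ships the sample graphs and expected outputs.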