Adding a new graph processing platform

Tim Hegeman edited this page Feb 16, 2016 · 7 revisions

This page contains instructions for developers adding support for a new graph processing platform to Graphalytics.

Overview

The code for each platform is organized as a Maven project. Each project should include a script to set up the environment for running the benchmark, and may include configuration files. The platform integration code should be written in Java, but may use external scripts or programs to execute algorithms on the platform. The remainder of this page describes the process of adding a platform to Graphalytics step-by-step. The example platform used in this guide is called myplatform and the project is located in a directory called graphalytics-platforms-myplatform. When following this guide you should replace all occurrences of myplatform with the name of your platform.

Step 1: Setting up the Maven project

The first step is to create a Maven project for your platform integration code. The Maven POM file should declare dependencies on at least graphalytics-core and graphalytics-core:resources. The artifactId should be graphalytics-platforms-myplatform (replacing myplatform with your platform name); the groupId is left to the authors or maintainers of the platform integration code. The directory structure should follow the Maven standard layout: src/main/java for Java code and src/main/resources for resources.

For example, the pom.xml file for myplatform:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>nl.tudelft.graphalytics</groupId>
	<artifactId>graphalytics-platforms-myplatform</artifactId>
	<version>0.1-SNAPSHOT</version>
	<packaging>jar</packaging>

	<!-- Define all dependency versions in one place for easy management -->
	<properties>
		<graphalytics.version>0.3-SNAPSHOT</graphalytics.version>
		<log4j.version>2.5</log4j.version>
	</properties>

	<dependencies>
		<!-- graphalytics-core includes everything required to integrate with Graphalytics -->
		<dependency>
			<groupId>nl.tudelft.graphalytics</groupId>
			<artifactId>graphalytics-core</artifactId>
			<version>${graphalytics.version}</version>
		</dependency>
		<!-- graphalytics-core:resources bundles all scripts and configuration files required to run Graphalytics -->
		<dependency>
			<groupId>nl.tudelft.graphalytics</groupId>
			<artifactId>graphalytics-core</artifactId>
			<version>${graphalytics.version}</version>
			<type>tar.gz</type>
			<classifier>resources</classifier>
			<scope>runtime</scope>
		</dependency>

		<!-- Use the Log4j2 backend for the Log4j2 and SLF4J APIs used by graphalytics-core and its dependencies -->
		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-core</artifactId>
			<version>${log4j.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-slf4j-impl</artifactId>
			<version>${log4j.version}</version>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<!-- Java compiler settings -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.1</version>
				<configuration>
					<source>1.7</source>
					<target>1.7</target>
				</configuration>
			</plugin>

			<!-- Maven Shade plugin used by platform modules to create fat JARs -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-shade-plugin</artifactId>
				<version>2.3</version>
				<configuration>
					<createDependencyReducedPom>false</createDependencyReducedPom>
					<minimizeJar>false</minimizeJar>
					<artifactSet>
						<excludes>
							<exclude>*:*:*:resources</exclude>
						</excludes>
					</artifactSet>
					<filters>
						<filter>
							<artifact>*:*</artifact>
							<excludes>
								<exclude>META-INF/*.SF</exclude>
								<exclude>META-INF/*.DSA</exclude>
								<exclude>META-INF/*.RSA</exclude>
							</excludes>
						</filter>
					</filters>
				</configuration>
				<executions>
					<execution>
						<phase>package</phase>
						<goals>
							<goal>shade</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>
</project>

Step 2: Create platform implementation

The integration with Graphalytics is done in a single class that implements the nl.tudelft.graphalytics.Platform interface. To do so, create a MyPlatformPlatform class (this follows the naming convention used by other platforms, such as GiraphPlatform). The uploadGraph, executeAlgorithmOnGraph, and deleteGraph methods are specific to each platform and can be implemented later, guided by the comments in the example code below and the Javadoc of the Platform interface.

The MyPlatformPlatform class is included below.

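package nl.tudelft.graphalytics.myplatform;

import nl.tudelft.graphalytics.Platform;
// Graph, Benchmark, PlatformBenchmarkResult, NestedConfiguration and PlatformExecutionException
// must also be imported from graphalytics-core.

public class MyPlatformPlatform implements Platform {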

	@Override
	public void uploadGraph(Graph graph) throws Exception {
		// This method can be used to pre-process the input graph. This can include uploading to a distributed
		// filesystem or database, converting the graph to a different file format, etc.
		// Graphalytics ensures that at any given time no more than one graph is uploaded. Before uploading the next
		// graph, any previously uploaded graph is deleted using the deleteGraph method. Algorithms are executed only
		// on a graph that has been uploaded but not deleted.
	}

	@Override
	public PlatformBenchmarkResult executeAlgorithmOnGraph(Benchmark benchmark) throws PlatformExecutionException {
		// This method is called once for every combination of algorithm and graph (encapsulated in a Benchmark object)
		// that should be executed for the Graphalytics benchmark suite.

		// If an algorithm is not supported, or a failure occurs during execution of an algorithm,
		// a PlatformExecutionException should be thrown.
		throw new PlatformExecutionException("Executing algorithm " + benchmark.getAlgorithm() + " is not yet supported");

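		// Once an algorithm is implemented, return a result instead of throwing. Java rejects statements
		// that follow an unconditional throw as unreachable, so the return is shown here as a comment:
		//   return new PlatformBenchmarkResult(NestedConfiguration.empty());
		// The PlatformBenchmarkResult class will be extended in the future to allow platforms to self-report
		// additional information on e.g. runtime breakdown, settings used by the platform.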
	}

	@Override
	public void deleteGraph(String graphName) {
		// This method is called when a graph is no longer needed.
	}

	@Override
	public String getName() {
		return "myplatform";
	}

}
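
The call order described in the comments of the class above can be illustrated with a small, self-contained simulation. This is only a sketch: MiniPlatform, RecordingPlatform and runAll are hypothetical stand-ins rather than Graphalytics types, the graph and algorithm names are placeholders, and the real benchmark driver may differ in detail. It shows each graph being uploaded once, every algorithm being executed against it, and the graph being deleted before the next upload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LifecycleSketch {

    /** Hypothetical, simplified stand-in for the Platform interface. */
    interface MiniPlatform {
        void uploadGraph(String graphName);
        void executeAlgorithmOnGraph(String algorithm, String graphName);
        void deleteGraph(String graphName);
    }

    /** Records every call so that the resulting order can be inspected. */
    static final class RecordingPlatform implements MiniPlatform {
        final List<String> calls = new ArrayList<String>();

        @Override
        public void uploadGraph(String graphName) {
            calls.add("upload " + graphName);
        }

        @Override
        public void executeAlgorithmOnGraph(String algorithm, String graphName) {
            calls.add("execute " + algorithm + " on " + graphName);
        }

        @Override
        public void deleteGraph(String graphName) {
            calls.add("delete " + graphName);
        }
    }

    /** Assumed driver loop: at most one graph is uploaded at any given time. */
    static void runAll(MiniPlatform platform, List<String> graphs, List<String> algorithms) {
        for (String graph : graphs) {
            platform.uploadGraph(graph);
            try {
                for (String algorithm : algorithms) {
                    platform.executeAlgorithmOnGraph(algorithm, graph);
                }
            } finally {
                // The graph is deleted before the next one is uploaded.
                platform.deleteGraph(graph);
            }
        }
    }

    public static void main(String[] args) {
        RecordingPlatform platform = new RecordingPlatform();
        runAll(platform, Arrays.asList("graph-a", "graph-b"), Arrays.asList("BFS"));
        for (String call : platform.calls) {
            System.out.println(call);
        }
    }
}
```

A practical consequence is that uploadGraph can keep its state (for example, the location of the converted graph) in fields of the platform class, since no second graph is uploaded until deleteGraph has been called.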

Step 3: Add glue files to allow the platform implementation to be found

The Graphalytics core includes a main function that reads the benchmark configuration, orchestrates the benchmarking process, and processes the results. To find the platform-specific integration code, Graphalytics needs two additional files. The first is a prepare-benchmark.sh script in the root of the project. This script should export the platform environment variable, set to the name of the platform. It may also set platform_classpath to add files to the classpath used when executing Graphalytics, or java_opts to pass arguments to the JVM in which Graphalytics runs. An example for myplatform is included below:

#!/bin/sh

export platform=myplatform

Graphalytics uses the platform variable to find the name of the class implementing the Platform interface. A second file is created for this; it should be saved as src/main/resources/myplatform.platform (where myplatform must be identical to the value of the platform environment variable), and it should contain exactly one line with the fully-qualified name of the class implementing the Platform interface. For example:

nl.tudelft.graphalytics.myplatform.MyPlatformPlatform
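
To illustrate the mechanism, the sketch below parses the contents of a .platform file and instantiates the named class through reflection. It is not the actual Graphalytics loader: the class and method names are hypothetical, the file contents are passed in as a string rather than read from the classpath, and java.util.ArrayList stands in for a platform class so that the example runs without Graphalytics on the classpath. It also assumes the platform class has a public no-argument constructor.

```java
final class PlatformLookupSketch {

    /** Extracts the class name from the contents of a .platform file (exactly one non-empty line). */
    static String parsePlatformFile(String contents) {
        String name = contents.trim();
        if (name.isEmpty() || name.contains("\n")) {
            throw new IllegalArgumentException(
                    "Expected exactly one line with a fully-qualified class name");
        }
        return name;
    }

    /** Loads the named class and creates an instance using its public no-argument constructor. */
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is a stand-in for a class such as MyPlatformPlatform.
        String className = parsePlatformFile("java.util.ArrayList\n");
        Object platform = instantiate(className);
        System.out.println("Loaded " + platform.getClass().getName());
    }
}
```

If a lookup of this kind fails in practice, the usual suspects are a mismatch between the file name and the value of the platform variable, extra lines in the file, or a typo in the class name.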

Step 4: Create distribution using maven-assembly-plugin

The final step to complete the skeleton for a platform integration is to add a maven-assembly descriptor for building a distribution of the benchmark. The majority of this descriptor is identical for all platforms, but minor changes may be needed to include, e.g., platform-specific configuration files. The descriptor for myplatform, saved as src/main/assembly/bin.xml, is included below:

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
	<id>bin</id>
	<formats>
		<!-- The distribution is packaged in a .tar.gz file -->
		<format>tar.gz</format>
	</formats>

	<fileSets>
		<!-- Copy the prepare-benchmark script and any configuration files to the distribution -->
		<fileSet>
			<directory>${project.basedir}</directory>
			<outputDirectory>.</outputDirectory>
			<includes>
				<include>prepare-benchmark.sh</include>
				<!-- Configuration files should be stored in config-template/ and can be included in the distribution by uncommenting the following line -->
				<!--<include>config-template/**</include>-->
			</includes>
		</fileSet>
		<!-- Copy the fat JAR produced by maven-shade-plugin to lib/ in the distribution -->
		<fileSet>
			<directory>${project.build.directory}</directory>
			<outputDirectory>lib</outputDirectory>
			<includes>
				<include>*.jar</include>
			</includes>
			<excludes>
				<!-- Exclude backup artifacts produced by maven-shade-plugin -->
				<exclude>original*</exclude>
			</excludes>
		</fileSet>
	</fileSets>

	<files>
		<!-- Copy the project README to the distribution with a different name to avoid conflicts -->
		<file>
			<source>${project.basedir}/README.md</source>
			<outputDirectory>.</outputDirectory>
			<destName>README-graphalytics-myplatform.md</destName>
		</file>
	</files>

	<dependencySets>
		<!-- Extract the scripts and other resources required by the core into the root of the distribution -->
		<dependencySet>
			<outputDirectory>.</outputDirectory>
			<!-- Matches the graphalytics-core:resources dependency and any future dependencies containing resources -->
			<includes>
				<include>*:resources</include>
			</includes>
			<!-- Unpack the .tar.gz artifacts attached to the *:resources dependencies to make scripts, etc. available in the root -->
			<unpack>true</unpack>
			<unpackOptions>
				<excludes>
					<!-- Ignore metadata files produced by Java when extracting the dependencies -->
					<exclude>META-INF/**</exclude>
				</excludes>
			</unpackOptions>
		</dependencySet>
	</dependencySets>
</assembly>

Step 5: Test run

You should now be able to follow the instructions in the Graphalytics README for existing platform integrations to build and run your own. If you configure Graphalytics correctly and add a graph, you should be able to run the benchmark and see failures for all algorithms. From here you can add implementations of the different algorithms included in Graphalytics. To aid in testing your implementations we provide the graphalytics-validation library with small sample graphs and code to verify that the output of your implementation matches the expected output. For a complete example of implementing and validating the Graphalytics benchmark, see the Graphalytics reference implementation at https://github.com/tudelft-atlarge/graphalytics-platforms-reference.
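
One way to move from a skeleton that fails every algorithm to a partial implementation is to register algorithms one at a time and keep reporting the rest as unsupported. The sketch below is an assumption about how that could be organized, not code from Graphalytics: AlgorithmDispatchSketch, Job and ExecutionFailure are hypothetical names, ExecutionFailure is an unchecked stand-in for PlatformExecutionException, and the algorithm and graph names are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

final class AlgorithmDispatchSketch {

    /** Hypothetical unit of work: runs one algorithm on the currently uploaded graph. */
    interface Job {
        void run(String graphName);
    }

    /** Hypothetical unchecked stand-in for PlatformExecutionException. */
    static final class ExecutionFailure extends RuntimeException {
        ExecutionFailure(String message) {
            super(message);
        }

        ExecutionFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final Map<String, Job> jobs = new HashMap<String, Job>();

    /** Makes an algorithm available; unregistered algorithms are reported as unsupported. */
    void register(String algorithm, Job job) {
        jobs.put(algorithm, job);
    }

    /** Runs the job registered for the algorithm, or fails if none is registered. */
    void execute(String algorithm, String graphName) {
        Job job = jobs.get(algorithm);
        if (job == null) {
            throw new ExecutionFailure("Executing algorithm " + algorithm + " is not yet supported");
        }
        try {
            job.run(graphName);
        } catch (RuntimeException e) {
            // Failures during execution are reported through the same exception type.
            throw new ExecutionFailure("Executing algorithm " + algorithm + " failed", e);
        }
    }

    public static void main(String[] args) {
        AlgorithmDispatchSketch platform = new AlgorithmDispatchSketch();
        platform.register("BFS", new Job() {
            @Override
            public void run(String graphName) {
                System.out.println("Running BFS on " + graphName);
            }
        });
        for (String algorithm : new String[] {"BFS", "NOT_YET_IMPLEMENTED"}) {
            try {
                platform.execute(algorithm, "example-graph");
            } catch (ExecutionFailure e) {
                System.out.println("FAILED: " + e.getMessage());
            }
        }
    }
}
```

With this arrangement the benchmark keeps running end to end while algorithms are added incrementally, and each newly registered algorithm can be checked against the sample graphs before the next one is started.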