Gradoop Accumulo Store

Philip Fritzsche edited this page Jan 8, 2019 · 8 revisions

Accumulo Store for Gradoop

Apache Accumulo

Apache Accumulo is a key/value store based on the design of Google's BigTable.

With this adapter implementation you can use Apache Accumulo as a DataSource or DataSink for your graph data.

Getting started

1. Adding an Accumulo Runtime Iterator

Before using Gradoop Accumulo, you must add gradoop-accumulo-iterator.jar to your Accumulo runtime library.

Run the command below to build the iterator jar:

cd gradoop-store/gradoop-accumulo && mvn clean install

Then copy gradoop-accumulo/target/iterator/*.jar to your Accumulo runtime library path (local filesystem or HDFS).

If you use Accumulo's default external library directory, just copy the jar to $ACCUMULO_HOME/lib/ext.

For more details about Accumulo's iterator settings, see the Apache Accumulo Manual.

2. Using Gradoop Accumulo

Compile Gradoop Accumulo with:

mvn clean install -DskipTests=true

Copy gradoop-accumulo/target/gradoop-accumulo-<ver>.jar into your client's library path.

Alternatively, declare the Maven dependency in your pom.xml:

<!-- Maven Gradoop Accumulo -->
<dependency>
    <groupId>org.gradoop</groupId>
    <artifactId>gradoop-accumulo</artifactId>
    <version>${gradoop.version}</version>
</dependency>

Creating an Accumulo-based graph store

// create gradoop accumulo configuration
GradoopAccumuloConfig config = GradoopAccumuloConfig.getDefaultConfig()
  .set(GradoopAccumuloConfig.ACCUMULO_USER, {user})
  .set(GradoopAccumuloConfig.ACCUMULO_INSTANCE, {instance})
  .set(GradoopAccumuloConfig.ZOOKEEPER_HOSTS, {comma separated zookeeper host list})
  .set(GradoopAccumuloConfig.ACCUMULO_PASSWD, {password})
  .set(GradoopAccumuloConfig.ACCUMULO_TABLE_PREFIX, {table prefix});

// create store
AccumuloStore graphStore = new AccumuloStore(config);
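
The placeholders in the configuration above ({user}, {instance}, and so on) have to be supplied by the application. One possible approach, sketched here in plain Java, is to keep them in a java.util.Properties object and check that none is missing before building the configuration. The class AccumuloSettings and the property names (accumulo.user etc.) are hypothetical and not part of Gradoop; only the five settings themselves come from the configuration shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for this sketch; not part of Gradoop.
final class AccumuloSettings {

  // Hypothetical property names, one per setting used in the configuration.
  static final String[] REQUIRED_KEYS = {
    "accumulo.user",
    "accumulo.instance",
    "zookeeper.hosts",
    "accumulo.passwd",
    "accumulo.table.prefix"
  };

  /** Returns the required keys that are absent or blank in the given properties. */
  static List<String> missingKeys(Properties props) {
    List<String> missing = new ArrayList<>();
    for (String key : REQUIRED_KEYS) {
      String value = props.getProperty(key);
      if (value == null || value.trim().isEmpty()) {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("accumulo.user", "root");
    props.setProperty("accumulo.instance", "instance");
    props.setProperty("zookeeper.hosts", "zk1:2181,zk2:2181");
    // accumulo.passwd and accumulo.table.prefix are intentionally left unset.
    System.out.println("missing settings: " + missingKeys(props));
  }
}
```

Once the check passes, each value can be handed to the corresponding .set(...) call on the configuration.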

Now add some graph elements:

graphStore.writeGraphHead(graphHead);
graphStore.writeVertex(vertex);
graphStore.writeEdge(edge);

graphStore.flush();

Accessing Data

Example for DataSink & DataSource

Read data from the store with Flink

// data source
GradoopFlinkConfig flinkConfig = GradoopFlinkConfig.createConfig(getExecutionEnvironment());
DataSource accumuloDataSource = new AccumuloDataSource(graphStore, flinkConfig);
GraphCollection result = accumuloDataSource.cypher(
    "MATCH (u1:Person)<-[:hasModerator]-(f:Forum) " +
    "(u2:Person)<-[:hasMember]-(f) " +
    "WHERE u1.name = \"Alice\"");

Write data to the store with Flink

// data sink
GradoopFlinkConfig flinkConfig = GradoopFlinkConfig.createConfig(getExecutionEnvironment());
DataSink accumuloSink = new AccumuloDataSink(graphStore, flinkConfig);
accumuloSink.write(result);

Store Layout

Gradoop store instances (GradoopAccumuloStore) are separated by their table name prefix (GradoopAccumuloConfig.ACCUMULO_TABLE_PREFIX). Each instance uses three tables, described below.

GraphData (Table graph)

----------*-------------*-------------------------*---------------*---------------------
  row     |     cf      |           cq            |  timestamp    |   value
----------*-------------*-------------------------*---------------*---------------------
          |   label     |                         |               |  {label}
  {id}    *-------------*-------------------------*---------------*---------------------
          |   property  |        property key     |               |  {property}
----------*-------------*-------------------------*---------------*---------------------

VertexData (Table vertex)

----------*-------------*-------------------------*---------------*---------------------
  row     |     cf      |           cq            |  timestamp    |   value
----------*-------------*-------------------------*---------------*---------------------
          |   label     |                         |               |  {label}
          *-------------*-------------------------*---------------*---------------------
  {id}    |   property  |        property key     |               |  {property}
          *-------------*-------------------------*---------------*---------------------
          |   graph     |        {graph id}       |               |
----------*-------------*-------------------------*---------------*---------------------

EdgeData (Table edge)

----------*-------------*-------------------------*---------------*---------------------
  row     |     cf      |           cq            |  timestamp    |   value
----------*-------------*-------------------------*---------------*---------------------
          |   label     |                         |               |  {label}
          *-------------*-------------------------*---------------*---------------------
          |   source    |                         |               |  {vertex id}
          *-------------*-------------------------*---------------*---------------------
  {id}    |   target    |                         |               |  {vertex id}
          *-------------*-------------------------*---------------*---------------------
          |   property  |        property key     |               |  {property}
          *-------------*-------------------------*---------------*---------------------
          |   graph     |        {graph id}       |               |
----------*-------------*-------------------------*---------------*---------------------
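
To make the edge layout above concrete, the following sketch flattens one edge into (row, cf, cq, value) entries in the same order as the table rows. It is an illustrative model only: the classes EdgeLayoutSketch and Cell are hypothetical and not part of Gradoop, values are shown as plain strings although the store presumably serializes them, and the timestamp column is left out because the table does not specify it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the edge table layout; not part of Gradoop.
final class EdgeLayoutSketch {

  /** One table entry: row id, column family (cf), column qualifier (cq), value. */
  static final class Cell {
    final String row;
    final String cf;
    final String cq;
    final String value;

    Cell(String row, String cf, String cq, String value) {
      this.row = row;
      this.cf = cf;
      this.cq = cq;
      this.value = value;
    }

    @Override
    public String toString() {
      return row + " | " + cf + " | " + cq + " | " + value;
    }
  }

  /** Flattens an edge into cells, following the rows of the edge table. */
  static List<Cell> edgeToCells(String id, String label, String sourceId, String targetId,
      Map<String, String> properties, List<String> graphIds) {
    List<Cell> cells = new ArrayList<>();
    cells.add(new Cell(id, "label", "", label));
    cells.add(new Cell(id, "source", "", sourceId));
    cells.add(new Cell(id, "target", "", targetId));
    // One cell per property: the property key is the qualifier.
    for (Map.Entry<String, String> property : properties.entrySet()) {
      cells.add(new Cell(id, "property", property.getKey(), property.getValue()));
    }
    // One cell per containing graph: the graph id is the qualifier, the value is empty.
    for (String graphId : graphIds) {
      cells.add(new Cell(id, "graph", graphId, ""));
    }
    return cells;
  }

  public static void main(String[] args) {
    Map<String, String> properties = new LinkedHashMap<>();
    properties.put("since", "2014");
    List<Cell> cells = edgeToCells("e1", "knows", "v1", "v2", properties,
        Arrays.asList("g1", "g2"));
    for (Cell cell : cells) {
      System.out.println(cell);
    }
  }
}
```

The vertex and graph tables follow the same pattern with fewer column families: a vertex has no source or target cells, and a graph head has only label and property cells.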