Gradoop Accumulo Store

手不要乱摸 edited this page Jun 29, 2018 · 8 revisions

Accumulo Store for Gradoop

Apache Accumulo

Apache Accumulo is a key/value store based on the design of Google's BigTable.

With this adapter implementation you can use Apache Accumulo as a DataSource or DataSink for your graph data.

Using gradoop accumulo

Build gradoop-accumulo with mvn clean install -DskipTests=true, then copy gradoop-accumulo/target/gradoop-accumulo-<ver>.jar into your client's library directory.

Alternatively, declare the dependency in your Maven pom.xml:

<!-- Maven Gradoop Accumulo -->
<dependency>
    <groupId>org.gradoop</groupId>
    <artifactId>gradoop-accumulo</artifactId>
    <version>${gradoop.version}</version>
</dependency>

Adding an Accumulo Runtime Iterator

Before using gradoop-accumulo, you must add gradoop-accumulo-iterator.jar to your Accumulo runtime library.

Run the command below to build the iterator jar:

cd gradoop-store/gradoop-accumulo && mvn clean install

Then copy gradoop-accumulo/target/iterator/*.jar to your Accumulo runtime library path (local filesystem or HDFS).

If you use Accumulo's native external library directory, just copy the jar to $ACCUMULO_HOME/lib/ext.

For more details on configuring iterators in Accumulo, see the Apache Accumulo Manual.

Creating an Accumulo-based Graph Store

// flink execution env
ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

// create gradoop accumulo configuration
GradoopAccumuloConfig config = GradoopAccumuloConfig.create(env)
  .set(GradoopAccumuloConfig.ACCUMULO_USER, {user})
  .set(GradoopAccumuloConfig.ACCUMULO_INSTANCE, {instance})
  .set(GradoopAccumuloConfig.ZOOKEEPER_HOSTS, {comma separated zookeeper host list})
  .set(GradoopAccumuloConfig.ACCUMULO_PASSWD, {password})
  .set(GradoopAccumuloConfig.ACCUMULO_TABLE_PREFIX, {table prefix});

// create store
AccumuloStore graphStore = new AccumuloStore(config);
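
The ZOOKEEPER_HOSTS setting expects a single comma-separated string. As a minimal sketch, independent of Gradoop, the value can be assembled from individual host entries as shown below. ZookeeperHosts and its join method are hypothetical names introduced for illustration, and appending port 2181 (ZooKeeper's conventional client port) to hosts without an explicit port is an assumption about your cluster setup.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper, not part of Gradoop: builds a comma-separated ZooKeeper host list. */
final class ZookeeperHosts {

  /** Assumption: hosts without an explicit port listen on ZooKeeper's conventional port. */
  static final int DEFAULT_PORT = 2181;

  /** Trims each entry, drops empty ones, adds the default port where none is given. */
  static String join(List<String> hosts) {
    return hosts.stream()
        .map(String::trim)
        .filter(host -> !host.isEmpty())
        .map(host -> host.contains(":") ? host : host + ":" + DEFAULT_PORT)
        .collect(Collectors.joining(","));
  }

  public static void main(String[] args) {
    // The resulting string is what would be passed as the ZOOKEEPER_HOSTS value.
    System.out.println(join(List.of("zk1.example.org", "zk2.example.org:2182")));
  }
}
```

The host names are placeholders; substitute the quorum members of your own Accumulo instance.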

Now add some graph elements:

graphStore.writeGraphHead(graphHead);
graphStore.writeVertex(vertex);
graphStore.writeEdge(edge);

graphStore.flush();

Accessing Data

Example for DataSink & DataSource

// data source
DataSource accumuloDataSource = new AccumuloDataSource(config);

GraphCollection result = accumuloDataSource.cypher(
    "MATCH (u1:Person)<-[:hasModerator]-(f:Forum) " +
    "(u2:Person)<-[:hasMember]-(f) " +
    "WHERE u1.name = \"Alice\"");

// data sink
DataSink accumuloSink = new AccumuloDataSink(config);

accumuloSink.write(result);
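
Java joins string literals with + exactly as written and inserts no whitespace between them, so a query spread over several lines is easier to check when its fragments are joined explicitly. The following is a minimal sketch using a hypothetical helper; QueryText is not part of Gradoop and only manipulates strings.

```java
/** Hypothetical helper, not part of Gradoop: joins query fragments with single spaces. */
final class QueryText {

  /** Trims every fragment, skips empty ones, and separates the rest by one space. */
  static String join(String... fragments) {
    StringBuilder query = new StringBuilder();
    for (String fragment : fragments) {
      String trimmed = fragment.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      if (query.length() > 0) {
        query.append(' ');
      }
      query.append(trimmed);
    }
    return query.toString();
  }

  public static void main(String[] args) {
    // Prints the query on one line so it can be inspected before it is executed.
    System.out.println(join(
        "MATCH (u1:Person)<-[:hasModerator]-(f:Forum)",
        "(u2:Person)<-[:hasMember]-(f)",
        "WHERE u1.name = \"Alice\""));
  }
}
```

Printing the assembled query once before running it is a cheap way to catch fused tokens.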

Store Layout

Gradoop store instances (GradoopAccumuloStore) are kept apart by their table name prefix (GradoopAccumuloConfig.ACCUMULO_TABLE_PREFIX). In the tables below, cf denotes the column family and cq the column qualifier.
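
As a rough illustration of how one Accumulo instance can host several stores, the sketch below derives the three table names from a prefix. It assumes that a table name is the prefix followed directly by the table names used in the headings below (graph, vertex, edge); the actual naming is decided by GradoopAccumuloConfig and may differ. PrefixedTables is a hypothetical class, not Gradoop code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration, not Gradoop code: table names derived from a store prefix. */
final class PrefixedTables {

  /** Assumption: a table name is the prefix followed directly by the base name. */
  static Map<String, String> tablesFor(String prefix) {
    Map<String, String> tables = new LinkedHashMap<>();
    for (String base : new String[] {"graph", "vertex", "edge"}) {
      tables.put(base, prefix + base);
    }
    return tables;
  }

  public static void main(String[] args) {
    // Two stores with different prefixes end up with disjoint sets of tables.
    System.out.println(tablesFor("social_"));
    System.out.println(tablesFor("citation_"));
  }
}
```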

GraphData (Table graph)

----------*-------------*-------------------------*---------------*---------------------
  row     |     cf      |           cq            |  timestamp    |   value
----------*-------------*-------------------------*---------------*---------------------
          |   label     |                         |               |  {label}
  {id}    *-------------*-------------------------*---------------*---------------------
          |   property  |        property key     |               |  {property}
----------*-------------*-------------------------*---------------*---------------------

VertexData (Table vertex)

----------*-------------*-------------------------*---------------*---------------------
  row     |     cf      |           cq            |  timestamp    |   value
----------*-------------*-------------------------*---------------*---------------------
          |   label     |                         |               |  {label}
          *-------------*-------------------------*---------------*---------------------
  {id}    |   property  |        property key     |               |  {property}
          *-------------*-------------------------*---------------*---------------------
          |   graph     |        {graph id}       |               |
----------*-------------*-------------------------*---------------*---------------------

EdgeData (Table edge)

----------*-------------*-------------------------*---------------*---------------------
  row     |     cf      |           cq            |  timestamp    |   value
----------*-------------*-------------------------*---------------*---------------------
          |   label     |                         |               |  {label}
          *-------------*-------------------------*---------------*---------------------
          |   source    |                         |               |  {vertex id}
          *-------------*-------------------------*---------------*---------------------
  {id}    |   target    |                         |               |  {vertex id}
          *-------------*-------------------------*---------------*---------------------
          |   property  |        property key     |               |  {property}
          *-------------*-------------------------*---------------*---------------------
          |   graph     |        {graph id}       |               |
----------*-------------*-------------------------*---------------*---------------------
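
To make the layout concrete, the sketch below flattens one edge into (row, cf, cq, value) cells following the EdgeData table. It is plain Java and does not use Accumulo's classes; EdgeLayoutSketch and Cell are hypothetical names, and ids and property values are plain strings here for readability, whereas the real store writes them in its own serialized form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of the EdgeData layout; plain Java, no Accumulo classes. */
final class EdgeLayoutSketch {

  /** One key/value cell: row id, column family (cf), column qualifier (cq), value. */
  static final class Cell {
    final String row;
    final String cf;
    final String cq;
    final String value;

    Cell(String row, String cf, String cq, String value) {
      this.row = row;
      this.cf = cf;
      this.cq = cq;
      this.value = value;
    }

    @Override
    public String toString() {
      return row + " | " + cf + " | " + cq + " | " + value;
    }
  }

  /** Flattens one edge into cells, in the order of the lines of the EdgeData table. */
  static List<Cell> toCells(String id, String label, String source, String target,
      Map<String, String> properties, List<String> graphIds) {
    List<Cell> cells = new ArrayList<>();
    cells.add(new Cell(id, "label", "", label));
    cells.add(new Cell(id, "source", "", source));
    cells.add(new Cell(id, "target", "", target));
    // One cell per property: the key is the qualifier, the property is the value.
    new TreeMap<>(properties).forEach(
        (key, property) -> cells.add(new Cell(id, "property", key, property)));
    // Graph membership is carried by the qualifier alone; the value stays empty.
    for (String graphId : graphIds) {
      cells.add(new Cell(id, "graph", graphId, ""));
    }
    return cells;
  }

  public static void main(String[] args) {
    toCells("e1", "hasMember", "v7", "v3", Map.of("since", "2018"), List.of("g1", "g2"))
        .forEach(System.out::println);
  }
}
```

Accumulo itself keeps cells sorted by row, column family, and column qualifier, so the order in which the cells are produced here is only for readability. The vertex layout follows the same pattern without the source and target cells.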