enis edited this page Sep 14, 2010 · 6 revisions

Design goals

Gora is an ORM framework for column-oriented data stores. Its design goal is to provide a common API to access and manage multiple data stores. Gora differs from other Java ORM frameworks in that it focuses specifically on column-oriented databases such as Apache HBase and Apache Cassandra; persisting objects through JDBC is also planned for the future.

The aim of Gora is similar to that of the JDO/DataNucleus projects, but Gora differs from them in the following respects:
- JDO is complex. Although we very much love JDO, full JDO support is not planned for the short term.
- JDO is explicitly based on Java. Although Gora is primarily written in Java, using Apache Avro will ease multi-language support.
- JDO depends on bytecode enhancement of objects, while Gora extends the Avro compilers to generate objects from Avro schema definitions.

Data stores

Gora supports, or plans to support, the following backends:

Store                    Status
Apache HBase             implemented
Apache Cassandra         planned
SQL                      implemented
Avro DataFiles           implemented
Hadoop TFile/MapFile     planned
CSV                      planned

On top of these, Gora intends to develop plugins for Pig, Cassandra and the like, so that any Gora data store can be used from them and rich data structures can be mapped along DataStore → Gora → Pig/Cassandra.

Design Overview

GoraCompiler

Data beans in Gora are defined by Avro schemas. Gora extends the Avro specific compiler and adds the fields needed to track the persistence state of data objects. GoraCompiler compiles the objects defined in JSON schemas into Java classes. The compiled objects keep state information about their fields: when a field's value changes through its setter method, that field becomes dirty.
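To make the dirty-tracking idea concrete, here is a minimal hand-written sketch of what a compiled bean could look like, assuming a schema with two string fields. The class name WebPage, the field indexes and the BitSet layout are hypothetical; the code GoraCompiler actually generates differs.

```java
import java.util.BitSet;

// Hypothetical stand-in for a class generated from an Avro record schema with
// two fields, "url" and "content". Names and layout are illustrative only.
class WebPage {
    static final int URL = 0;
    static final int CONTENT = 1;
    static final int FIELD_COUNT = 2;

    private String url;
    private String content;
    // One bit per schema field; a set bit means the field changed since the
    // state was last cleared (for example after a successful write).
    private final BitSet dirty = new BitSet(FIELD_COUNT);

    String getUrl() { return url; }

    void setUrl(String value) {
        this.url = value;
        dirty.set(URL); // setters mark their own field dirty
    }

    String getContent() { return content; }

    void setContent(String value) {
        this.content = value;
        dirty.set(CONTENT);
    }

    boolean isDirty(int fieldIndex) { return dirty.get(fieldIndex); }

    boolean isDirty() { return !dirty.isEmpty(); }

    void clearDirty() { dirty.clear(); }
}
```

A setter call such as setUrl(...) flips only that field's bit, so a store could write just the changed columns and then call clearDirty().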

Persistent

Gora data beans implement the Persistent interface. This interface offers methods to introspect an object's persistence state. Data beans generated by GoraCompiler already implement it.
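The sketch below illustrates how a store might use persistence-state introspection, assuming a simplified interface with getFields(), isNew(), isDirty(int) and clearDirty(). The interface here is a hypothetical stand-in named Persistent for readability; it is not Gora's actual Persistent interface, and User and PersistenceState are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified Persistent-style interface (not Gora's real one).
interface Persistent {
    String[] getFields();            // field names in schema order
    boolean isNew();                 // never written to a store yet
    boolean isDirty(int fieldIndex); // changed since last clearDirty()
    void clearDirty();
}

final class PersistenceState {
    private PersistenceState() {}

    // A store could write every field of a new object, but only the
    // changed fields of an object that was already persisted.
    static List<String> fieldsToWrite(Persistent obj) {
        String[] fields = obj.getFields();
        if (obj.isNew()) {
            return Arrays.asList(fields);
        }
        List<String> changed = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            if (obj.isDirty(i)) {
                changed.add(fields[i]);
            }
        }
        return changed;
    }
}

// Minimal hypothetical bean implementing the interface by hand.
class User implements Persistent {
    private static final String[] FIELDS = {"name", "email"};
    private final boolean[] dirty = new boolean[FIELDS.length];
    private boolean isNew = true;
    private String name;
    private String email;

    void setName(String name) { this.name = name; dirty[0] = true; }
    void setEmail(String email) { this.email = email; dirty[1] = true; }

    // Hypothetical hook a store would call after a successful write.
    void markPersisted() { isNew = false; clearDirty(); }

    @Override public String[] getFields() { return FIELDS.clone(); }
    @Override public boolean isNew() { return isNew; }
    @Override public boolean isDirty(int i) { return dirty[i]; }
    @Override public void clearDirty() { Arrays.fill(dirty, false); }
}
```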

DataStore and DataStoreFactory

DataStore handles the actual object persistence. DataStores offer methods for retrieving, persisting, deleting and querying objects. The data model in Gora is a mapping from keys to persistent objects: the key plays the role of a primary key in SQL databases, whereas the persistent object holds the actual data. All operations require the key to be present.
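As a minimal sketch of the key-to-object data model, assuming a hypothetical SimpleDataStore interface with only get, put and delete, an in-memory implementation backed by a map could look like this. It is not Gora's DataStore API, which also covers querying and store-specific configuration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal store contract keyed by K (not Gora's DataStore).
interface SimpleDataStore<K, T> {
    T get(K key);
    void put(K key, T obj);
    boolean delete(K key);
}

// In-memory implementation that only shows the key -> object model.
class InMemoryDataStore<K, T> implements SimpleDataStore<K, T> {
    private final Map<K, T> rows = new ConcurrentHashMap<>();

    @Override
    public T get(K key) {
        // Every operation is addressed by key, like a primary key in SQL.
        return rows.get(Objects.requireNonNull(key, "key"));
    }

    @Override
    public void put(K key, T obj) {
        rows.put(Objects.requireNonNull(key, "key"), Objects.requireNonNull(obj, "obj"));
    }

    @Override
    public boolean delete(K key) {
        // Returns true only if a row existed for this key.
        return rows.remove(Objects.requireNonNull(key, "key")) != null;
    }
}
```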

DataStoreFactory is a factory for DataStores. It reads the required properties from the gora.properties file on the classpath.
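The factory's role can be sketched as follows, assuming a hypothetical property key named datastore.default and a registry of store constructors in place of whatever lookup mechanism Gora actually uses. Only the classpath loading through ClassLoader.getResourceAsStream and Properties.load is standard Java.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical factory sketch; "datastore.default" and the registry are assumptions.
final class StoreFactory {
    private final Map<String, Supplier<Object>> registry = new HashMap<>();

    void register(String name, Supplier<Object> constructor) {
        registry.put(name, constructor);
    }

    // Loads a properties file from the classpath; a missing resource
    // yields empty properties rather than an error.
    static Properties loadFromClasspath(String resource) throws IOException {
        Properties props = new Properties();
        try (InputStream in = StoreFactory.class.getClassLoader().getResourceAsStream(resource)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }

    // Instantiates the store named by the hypothetical "datastore.default" key.
    Object createDefault(Properties props) {
        String name = props.getProperty("datastore.default");
        if (name == null) {
            throw new IllegalStateException("datastore.default is not set");
        }
        Supplier<Object> ctor = registry.get(name);
        if (ctor == null) {
            throw new IllegalArgumentException("unknown data store: " + name);
        }
        return ctor.get();
    }
}
```

In practice a caller would register the available stores and then pass StoreFactory.loadFromClasspath("gora.properties") to createDefault.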

Mapping

The actual mapping of object fields to data store concepts is defined in XML mapping files. Gora does not assume a particular layout or data model for the data stores, so the schema of the mapping files is specific to each data store (however, we may later add generic column-oriented, key-value and SQL-based mappings).
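For a column-oriented store, a mapping file might pair each object field with a column family and qualifier. The sketch below reads such a file with the JDK's DOM parser, assuming hypothetical field elements carrying name, family and qualifier attributes; every Gora store defines its own mapping schema, so these element and attribute names are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical reader for a column-oriented mapping document.
final class ColumnMappingReader {
    private ColumnMappingReader() {}

    // Returns field name -> "family:qualifier", in document order.
    static Map<String, String> read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList fields = doc.getElementsByTagName("field");
            Map<String, String> mapping = new LinkedHashMap<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                mapping.put(field.getAttribute("name"),
                        field.getAttribute("family") + ":" + field.getAttribute("qualifier"));
            }
            return mapping;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("invalid mapping document", e);
        }
    }
}
```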

Query / Result

Data stores are queried through an object implementing the Query interface, which is obtained from the DataStore#newQuery() method.

The Query interface defines methods for setting the key range, time range, limit, etc. Moreover, a SQL-like, String-based query interface and Query.setFilter(String) are planned to be implemented soon.

The results of the Query can be iterated with the Result interface.
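A key-range query with a limit, and cursor-style iteration over its results, can be sketched over a sorted in-memory map. SortedStore, SimpleQuery and SimpleResult are hypothetical names; the inclusive end key and the use of -1 for "no limit" are assumptions of this sketch rather than documented Gora semantics, and the time-range setting is omitted.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sorted in-memory store used only to illustrate query/result.
class SortedStore<V> {
    private final NavigableMap<String, V> rows = new TreeMap<>();

    void put(String key, V value) { rows.put(key, value); }

    SimpleQuery newQuery() { return new SimpleQuery(); }

    SimpleResult<V> execute(SimpleQuery q) {
        NavigableMap<String, V> range = rows;
        if (q.startKey != null && q.endKey != null && q.endKey.compareTo(q.startKey) < 0) {
            range = Collections.emptyNavigableMap(); // inverted range selects nothing
        } else {
            if (q.startKey != null) range = range.tailMap(q.startKey, true);
            if (q.endKey != null) range = range.headMap(q.endKey, true); // inclusive (assumed)
        }
        return new SimpleResult<>(range.entrySet().iterator(), q.limit);
    }
}

// Fluent query object holding a key range and a limit (-1 = unlimited).
class SimpleQuery {
    String startKey;
    String endKey;
    long limit = -1;

    SimpleQuery setStartKey(String k) { startKey = k; return this; }
    SimpleQuery setEndKey(String k) { endKey = k; return this; }
    SimpleQuery setLimit(long n) { limit = n; return this; }
}

// Cursor-style result: next() advances, getKey()/get() read the current row.
class SimpleResult<V> {
    private final Iterator<Map.Entry<String, V>> it;
    private final long limit;
    private long read;
    private Map.Entry<String, V> current;

    SimpleResult(Iterator<Map.Entry<String, V>> it, long limit) {
        this.it = it;
        this.limit = limit;
    }

    boolean next() {
        if ((limit >= 0 && read >= limit) || !it.hasNext()) {
            current = null;
            return false;
        }
        current = it.next();
        read++;
        return true;
    }

    String getKey() { return current.getKey(); }

    V get() { return current.getValue(); }
}
```

A typical loop is while (result.next()) { ... result.getKey() ... result.get() ... }, which keeps only one row in hand at a time.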

An optional extension of queries is the PartitionQuery interface. PartitionQuery extends the Query interface with a getLocations() method, which returns the addresses of the nodes on which the query can run locally. This is especially useful for creating input splits for MapReduce.
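How partitioning can yield locality information is sketched below, assuming a store whose key space is split into regions, each served by one host, in the spirit of HBase regions. PartitionPlanner, Partition and the half-open [start, end) range convention are hypothetical; each Partition's getLocations() plays the role described above for PartitionQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical partition: a key sub-range plus the host that stores it.
final class Partition {
    final String startKey; // inclusive
    final String endKey;   // exclusive; null means unbounded
    final String location;

    Partition(String startKey, String endKey, String location) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.location = location;
    }

    // Nodes on which this part of the query could run locally.
    String[] getLocations() { return new String[] {location}; }
}

final class PartitionPlanner {
    private PartitionPlanner() {}

    // regions maps region start key -> host; the first region is assumed to
    // start at "" so that every key falls into some region.
    static List<Partition> plan(NavigableMap<String, String> regions, String start, String end) {
        List<Partition> parts = new ArrayList<>();
        if (end != null && start.compareTo(end) >= 0) {
            return parts; // empty range
        }
        String regionStart = regions.floorKey(start); // region containing start
        while (regionStart != null && (end == null || regionStart.compareTo(end) < 0)) {
            String nextStart = regions.higherKey(regionStart);
            // Clip the region to the query range on both sides.
            String partStart = regionStart.compareTo(start) > 0 ? regionStart : start;
            String partEnd;
            if (nextStart == null) {
                partEnd = end;
            } else if (end == null) {
                partEnd = nextStart;
            } else {
                partEnd = nextStart.compareTo(end) < 0 ? nextStart : end;
            }
            parts.add(new Partition(partStart, partEnd, regions.get(regionStart)));
            regionStart = nextStart;
        }
        return parts;
    }
}
```

One partition per overlapping region means a MapReduce scheduler could place each split on the host that already holds its rows.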

MapReduce

Gora supports MapReduce out of the box: Gora data stores can be used as the inputs and outputs of jobs. Moreover, objects can be serialized and passed between tasks while keeping their persistence state. For this serialization, Gora extends Avro DatumWriters.
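Carrying persistence state across tasks can be illustrated with a record whose serialized form includes its dirty flags. This sketch substitutes plain java.io DataOutputStream/DataInputStream for Gora's Avro DatumWriter extension, and TrackedRecord with its one-byte flag layout is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical two-field record whose wire format carries its dirty flags.
final class TrackedRecord {
    String url = "";
    String title = "";
    boolean urlDirty;
    boolean titleDirty;

    void setUrl(String v) { url = v; urlDirty = true; }
    void setTitle(String v) { title = v; titleDirty = true; }

    byte[] serialize() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(bytes)) {
            // Persistence state first: one bit per field.
            out.writeByte((urlDirty ? 1 : 0) | (titleDirty ? 2 : 0));
            out.writeUTF(url);
            out.writeUTF(title);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static TrackedRecord deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            TrackedRecord r = new TrackedRecord();
            int flags = in.readUnsignedByte();
            r.url = in.readUTF();
            r.title = in.readUTF();
            // Restore flags directly instead of calling setters, so that
            // reading does not mark every field dirty.
            r.urlDirty = (flags & 1) != 0;
            r.titleDirty = (flags & 2) != 0;
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reducer receiving such a record could still tell which fields the mapper changed and write only those.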
