Design
Gora is an ORM framework for column-oriented data stores. The design goal is a common API for accessing and managing multiple data stores. Gora differs from other Java ORM frameworks in that it gives special focus to column-oriented databases such as Apache HBase and Apache Cassandra.
The aim of Gora is similar to that of the JDO/DataNucleus projects, but Gora differs from them in the following respects:
- JDO is complex. Although we very much love JDO, full JDO support is not planned for the short term.
- JDO is explicitly based on Java. Although Gora is primarily written in Java, using Apache Avro will ease multi-language support.
- JDO depends on bytecode enhancement of objects, while Gora extends the Avro compilers to generate objects from Avro schema definitions.
Gora supports or plans to support the following back ends:
Store | Status |
---|---|
Apache HBase | implemented |
Apache Cassandra | planned |
SQL | implemented |
Avro DataFiles | implemented |
Hadoop TFile/MapFile | planned |
CSV | planned |
On top of these, Gora intends to develop plugins for Pig, Cassandra and the like, so that any Gora data store can be used and the mapping of rich data structures from DataStore → Gora → Pig/Cassandra is realized.
Data beans in Gora are defined by Avro schemas. Gora extends the Avro specific compiler and adds the fields needed to track the persistency state of the data objects. Using GoraCompiler, the objects defined in JSON are compiled into Java objects. The compiled objects keep state information about their fields: when a field value changes through a setter method, that field becomes dirty.
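The dirty-field idea can be illustrated with a small hand-written bean. This is only a sketch of the technique: `EmployeeSketch`, its fields, and its method names are hypothetical and are not the code that GoraCompiler generates.

```java
import java.util.BitSet;

// Hypothetical hand-written bean; real Gora beans are generated from an Avro schema.
class EmployeeSketch {
    static final int NAME = 0;    // index of the "name" field
    static final int SALARY = 1;  // index of the "salary" field

    private final BitSet dirty = new BitSet(2); // one bit per field
    private String name;
    private int salary;

    public String getName() { return name; }
    public int getSalary() { return salary; }

    // Each setter stores the value and marks its field as dirty.
    public void setName(String name) {
        this.name = name;
        dirty.set(NAME);
    }

    public void setSalary(int salary) {
        this.salary = salary;
        dirty.set(SALARY);
    }

    // True if the given field was changed since the last clearDirty().
    public boolean isDirty(int fieldIndex) { return dirty.get(fieldIndex); }

    // True if any field was changed since the last clearDirty().
    public boolean isDirty() { return !dirty.isEmpty(); }

    // A store would typically call this after writing the object.
    public void clearDirty() { dirty.clear(); }
}
```

With per-field dirty bits, a store only needs to write the fields that actually changed, which matters for column-oriented back ends where each field may map to its own column.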
Gora data beans implement the Persistent interface, which offers methods for introspecting an object's persistency state. Data beans generated by the GoraCompiler already implement this interface.
DataStore handles actual object persistence. DataStores offer methods for retrieving, persisting, deleting and querying objects. The data model in Gora is a mapping from a key to a persistent object. The key is like the primary key of an SQL database, whereas the persistent object is the actual data. All operations require the key to be present.
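The key-to-object model can be sketched with an in-memory map. `InMemoryStoreSketch` and its method signatures are assumptions made for illustration, not Gora's DataStore interface; the sketch only shows that every operation is addressed by key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory store; illustrates the key -> persistent object model only.
class InMemoryStoreSketch<K, T> {
    private final Map<K, T> rows = new HashMap<>();

    // Persist (insert or overwrite) the object stored under the key.
    public void put(K key, T obj) {
        Objects.requireNonNull(key, "key is required");
        rows.put(key, obj);
    }

    // Retrieve the object stored under the key, or null if there is none.
    public T get(K key) {
        Objects.requireNonNull(key, "key is required");
        return rows.get(key);
    }

    // Delete the object stored under the key; returns true if one existed.
    public boolean delete(K key) {
        Objects.requireNonNull(key, "key is required");
        if (!rows.containsKey(key)) {
            return false;
        }
        rows.remove(key);
        return true;
    }
}
```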
DataStoreFactory is a factory for DataStores. It reads the required properties from the gora.properties file on the classpath.
The actual mapping of object fields to data store concepts is defined by XML mapping files. Gora does not assume the actual layout or data model of the data stores, so the schema of the mapping files is unique to each data store (however, we may later add column-oriented, key-value, and SQL base mappings).
The data stores can be queried using an object implementing the Query interface. The query object can be instantiated from the DataStore#newQuery() method.
The Query interface defines methods for setting the key range, time range, limit, etc. Moreover, a SQL-like, String-based query interface and Query.setFilter(String) are planned to be implemented soon.
The results of the Query can be iterated with the Result interface.
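A key-range query with a limit can be sketched over a sorted in-memory map. `RangeQuerySketch` and its methods are hypothetical, and the inclusive bounds are an assumption of this sketch rather than a statement about Gora's Query or Result interfaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical query object over a sorted map of rows keyed by String.
class RangeQuerySketch<T> {
    private final NavigableMap<String, T> rows;
    private String startKey;  // inclusive; null means unbounded
    private String endKey;    // inclusive; null means unbounded
    private long limit = -1;  // -1 means no limit

    RangeQuerySketch(NavigableMap<String, T> rows) {
        this.rows = rows;
    }

    RangeQuerySketch<T> setKeyRange(String startKey, String endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
        return this;
    }

    RangeQuerySketch<T> setLimit(long limit) {
        this.limit = limit;
        return this;
    }

    // Runs the query and returns an iterator over the matching rows in key order.
    Iterator<Map.Entry<String, T>> execute() {
        if (startKey != null && endKey != null && startKey.compareTo(endKey) > 0) {
            return Collections.emptyIterator(); // empty range
        }
        NavigableMap<String, T> view = rows;
        if (startKey != null) {
            view = view.tailMap(startKey, true);
        }
        if (endKey != null) {
            view = view.headMap(endKey, true);
        }
        List<Map.Entry<String, T>> out = new ArrayList<>();
        for (Map.Entry<String, T> row : view.entrySet()) {
            if (limit >= 0 && out.size() >= limit) {
                break; // stop once the limit is reached
            }
            out.add(row);
        }
        return out.iterator();
    }
}
```

The caller configures the query first and only then iterates over the results, which mirrors the split between a query object and a result iterator described above.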
An optional operation regarding queries is the PartitionQuery interface. PartitionQuery extends the Query interface with a getLocations() method, which returns the addresses of the nodes on which the query will run locally. This is especially useful for creating splits for MapReduce.
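As a rough illustration of how partitions with locations could feed split creation, the sketch below cuts a key space along region boundaries and attaches the hosting node to each piece. Everything except the `getLocations()` name is hypothetical: `PartitionSketch`, `PartitionPlannerSketch`, and the region-to-host map are assumptions made for this example, not how any Gora store computes partitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical stand-in for one partition of a larger query.
class PartitionSketch {
    final String startKey;            // inclusive
    final String endKey;              // exclusive; null means open-ended
    private final String[] locations; // nodes that hold this key range

    PartitionSketch(String startKey, String endKey, String... locations) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.locations = locations.clone();
    }

    // Addresses of the nodes on which this partition can run locally.
    String[] getLocations() {
        return locations.clone();
    }
}

class PartitionPlannerSketch {
    // regionHosts maps each region's first key to the node serving it (assumed layout).
    static List<PartitionSketch> plan(NavigableMap<String, String> regionHosts) {
        List<PartitionSketch> parts = new ArrayList<>();
        for (Map.Entry<String, String> region : regionHosts.entrySet()) {
            // The next region's first key bounds this one; null for the last region.
            String nextStart = regionHosts.higherKey(region.getKey());
            parts.add(new PartitionSketch(region.getKey(), nextStart, region.getValue()));
        }
        return parts;
    }
}
```

A job could then create one input split per partition and pass the partition's locations to the scheduler as locality hints.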
Gora supports MapReduce out of the box. Gora data stores can be used as inputs and outputs of jobs. Moreover, the objects can be serialized and passed between tasks while keeping their persistency state. For serialization, Gora extends Avro DatumWriters.