-
Notifications
You must be signed in to change notification settings - Fork 11
High Level Architecture Design
Welcome to the goffish_v3 wiki!
Broad classification of required components:
- Programming Model
Specifies the programming abstraction provided by the framework using which applications can be written. Popular choices of programming model includes vertex-centric and subgraph-centric. - Runtime Model
Responsible for dynamic operations which can be performed during the run time. - Storage Model
Deals with efficient storage and retrieval of various types of graphs. - Admin Model
Responsible for admin tasks including logging, setup, UI etc.
Specific requirements:
-
Support for time series graphs and dynamic graphs [1][2]
In addition to the static graphs, time series and dynamic graphs should be natively supported by the framework. Should be able to process multiple graphs at the same time. -
Dynamic arrival of graphs [2]
Process streaming graph data. Existing Systems with this feature: Streaming Graphs supported in Kineograph. -
Subgraph centric programming model [1]
Programming model which should be exposed to the programmers to use the framework. Support to define subgraph (WCC,SCC,etc.) -
Repartitioning on the fly [2]
Repartitioning during run time to support efficient processing of graph. -
Modify topology [1][2]
Insertion, deletion and changing edge direction in the graph structure. -
Updates to storage [3]
Update persistent stored data on the file system. -
Operations on meta-graph [1][2]
Processing meta-graph instead of the whole graph.
Ability to run analytics on meta-graph before running actual algorithm, result of which can be used for resource allocation. -
Transaction jobs and long running jobs [1][2]
Support for both type of processing: batch and real time. -
Scale out and in the storage [3]
Native support for horizontal scaling of storage. -
Analyzing packing and reliability [2]
Efficient packing of messages and reliably delivering them. -
Worker tasks allocation and scheduling [2]
Allocation of tasks to worker nodes and scheduling tasks. Ability to specify subgraphID, workerID, HostID for a vertex in the input format itself. -
Dynamic mapping of workers to VMs [2]
Dynamic mapping to efficiently use resources and reduce execution time. Ability to specify threshold value for parameters like outgoing message, compute time, beyond which mapping will be recomputed. -
Composition of phases, tasks etc [1][2]
Applications can be designed as collection of phases which can have specific order and resource requirements. Generalization of master-compute, BSP, reduce, init and conclusion. -
Flexible graph API [1]
Flexible graph API to support various graph data structure as per the application requirements. -
Logging, setup, UI, reliability [4]
Admin tasks to ease development of applications. -
Write out non-graph data to file system [3]
Write results to the file system. -
Pre-computation stats [1][2]
Stats to gain insights about the graph and perform dynamic operations. -
Check points [2][3]
Check points to restart and migrate graph data and worker tasks to help in recovery in case of failures. -
Different types of input, application and architecture [1][2]
For applications to leverage the 3-dimensions: input, application and architecture. -
Support for scale in/out for VMs Same deployment can be used with subset of available VMs.
Dynamically scale in/out support for VM like Storm.