Skip to content

High Level Architecture Design

ravikant7890 edited this page Jul 18, 2016 · 16 revisions

Welcome to the goffish_v3 wiki!

Broad classification of required components:

  1. Programming Model
    Specifies the programming abstraction provided by the framework using which applications can be written. Popular choices of programming model includes vertex-centric and subgraph-centric.
  2. Runtime Model
    Responsible for dynamic operations which can be performed during the run time.
  3. Storage Model
    Deals with efficient storage and retrieval of various types of graphs.
  4. Admin Model
    Responsible for admin tasks including logging, setup, UI etc.

Specific requirements:

  • Support for time series graphs and dynamic graphs [1][2]
    In addition to the static graphs, time series and dynamic graphs should be natively supported by the framework. Should be able to process multiple graphs at the same time.

  • Dynamic arrival of graphs [2]
    Process streaming graph data. Existing Systems with this feature: Streaming Graphs supported in Kineograph.

  • Subgraph centric programming model [1]
    Programming model which should be exposed to the programmers to use the framework. Support to define subgraph (WCC,SCC,etc.)

  • Repartitioning on the fly [2]
    Repartitioning during run time to support efficient processing of graph.

  • Modify topology [1][2]
    Insertion, deletion and changing edge direction in the graph structure.

  • Updates to storage [3]
    Update persistent stored data on the file system.

  • Operations on meta-graph [1][2]
    Processing meta-graph instead of the whole graph.
    Ability to run analytics on meta-graph before running actual algorithm, result of which can be used for resource allocation.

  • Transaction jobs and long running jobs [1][2]
    Support for both type of processing: batch and real time.

  • Scale out and in the storage [3]
    Native support for horizontal scaling of storage.

  • Analyzing packing and reliability [2]
    Efficient packing of messages and reliably delivering them.

  • Worker tasks allocation and scheduling [2]
    Allocation of tasks to worker nodes and scheduling tasks. Ability to specify subgraphID, workerID, HostID for a vertex in the input format itself.

  • Dynamic mapping of workers to VMs [2]
    Dynamic mapping to efficiently use resources and reduce execution time. Ability to specify threshold value for parameters like outgoing message, compute time, beyond which mapping will be recomputed.

  • Composition of phases, tasks etc [1][2]
    Applications can be designed as collection of phases which can have specific order and resource requirements. Generalization of master-compute, BSP, reduce, init and conclusion.

  • Flexible graph API [1]
    Flexible graph API to support various graph data structure as per the application requirements.

  • Logging, setup, UI, reliability [4]
    Admin tasks to ease development of applications.

  • Write out non-graph data to file system [3]
    Write results to the file system.

  • Pre-computation stats [1][2]
    Stats to gain insights about the graph and perform dynamic operations.

  • Check points [2][3]
    Check points to restart and migrate graph data and worker tasks to help in recovery in case of failures.

  • Different types of input, application and architecture [1][2]
    For applications to leverage the 3-dimensions: input, application and architecture.

  • Support for scale in/out for VMs Same deployment can be used with subset of available VMs.
    Dynamically scale in/out support for VM like Storm.

Clone this wiki locally