Parrot's System Design

Parrot is a distributed serving system for LLM-based applications. It is organized into three layers: the frontend, the Serve layer, and the Engine layer.

Overview

The Parrot API with Semantic Variable is served by a centralized cluster manager called ServeCore, which manages many Engine instances.

ServeCore serves the Parrot APIs with Semantic Variable. It is also responsible for managing the cluster and scheduling requests (GlobalScheduler). Most optimizations and scheduling strategies in Parrot are implemented in ServeCore.
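To make the manager/Engine relationship concrete, here is a minimal sketch of a centralized manager that registers Engines and dispatches requests to them. It is not Parrot's actual ServeCore or GlobalScheduler: the names `ToyServeCore`, `EngineInfo`, `register_engine`, `schedule`, and `finish` are hypothetical, and the least-loaded policy is an assumption chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EngineInfo:
    """Hypothetical record a manager might keep per registered Engine."""
    name: str
    model: str
    pending: int = 0  # requests dispatched but not yet finished


@dataclass
class ToyServeCore:
    """Illustrative cluster manager; not Parrot's real ServeCore."""
    engines: dict = field(default_factory=dict)

    def register_engine(self, name: str, model: str) -> None:
        # Each Engine serves exactly one model.
        self.engines[name] = EngineInfo(name=name, model=model)

    def schedule(self, model: str) -> str:
        """Pick the least-loaded Engine serving `model` (assumed policy)."""
        candidates = [e for e in self.engines.values() if e.model == model]
        if not candidates:
            raise LookupError(f"no engine registered for model {model!r}")
        chosen = min(candidates, key=lambda e: e.pending)
        chosen.pending += 1
        return chosen.name

    def finish(self, name: str) -> None:
        # Called when an Engine reports that a request has completed.
        self.engines[name].pending -= 1


core = ToyServeCore()
core.register_engine("engine-0", "llama")
core.register_engine("engine-1", "llama")
first = core.schedule("llama")
second = core.schedule("llama")
print(first, second)
```

Because the manager sees the load on every Engine, cluster-wide decisions such as this one can be made in one place, which is the role the text assigns to ServeCore.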

Each Parrot Engine runs a single LLM and communicates with ServeCore through contextual Fill/Gen APIs. Note that:

  • Engine server is independent: Each Engine can provide language model services on its own. Each Engine has its own scheduler (called LocalScheduler) that performs common techniques such as Continuous Batching. Our built-in engine implementation also includes many kernel-level optimizations (e.g., PagedAttention, Sharing Prompts).

  • Engine is an abstraction: Any server that implements our internal Engine APIs can be registered as an Engine in Parrot. The system is therefore horizontally scalable, and many types of Engines (e.g., vLLM, FasterTransformer) can be integrated into Parrot easily.

    For example, you can use a distributed serving mechanism (like tensor parallelism) on a single multi-GPU machine or across multiple machines, expose a single HTTP server with our Engine APIs, and register it as an Engine.
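The Engine abstraction described above can be sketched as a small interface with a stand-in backend. The method names `fill` and `gen`, their parameters, and the `EchoEngine` class are assumptions made for illustration. Parrot's internal Engine APIs may differ, and a real Engine would run a model behind an HTTP server rather than echo stored text.

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    """Hypothetical interface; method names and arguments are assumptions."""

    @abstractmethod
    def fill(self, context_id: int, text: str) -> int:
        """Append prompt text to a context; return its length in words."""

    @abstractmethod
    def gen(self, context_id: int, max_tokens: int) -> str:
        """Generate text conditioned on everything already in the context."""


class EchoEngine(Engine):
    """Stand-in backend that stores text instead of running a model."""

    def __init__(self) -> None:
        self.contexts = {}  # context_id -> list of text chunks

    def fill(self, context_id, text):
        chunks = self.contexts.setdefault(context_id, [])
        chunks.append(text)
        return sum(len(c.split()) for c in chunks)

    def gen(self, context_id, max_tokens):
        # "Generate" by repeating the last `max_tokens` words of the context.
        words = " ".join(self.contexts.get(context_id, [])).split()
        output = " ".join(words[-max_tokens:])
        # Generated text becomes part of the context, like a real decode step.
        self.contexts.setdefault(context_id, []).append(output)
        return output


engine = EchoEngine()
engine.fill(0, "Hello Parrot")
engine.fill(0, "serving system")
out = engine.gen(0, max_tokens=2)
print(out)
```

Any backend that satisfies such an interface, whether a single-GPU process or a tensor-parallel deployment behind one server, looks the same to the manager, which is what allows different Engine types to be mixed in one cluster.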

The following picture illustrates the overall architecture of Parrot. Please refer to our OSDI'24 paper, Parrot: Efficient Serving of LLM-based Applications with Semantic Variable, for more details.

Code Structure

The code of Parrot is organized according to the three-layer architecture above.

parrot/
    frontend/
        pfunc/ # PFunc frontend
    serve/ # Serve Layer
    engine/ # Engine Layer
    protocol/ # Common Protocols & APIs
    utils/ # Utilities (logging, async, recycle pool, ...)
    testing/ # Test related tools