docs: enhance project documentation for improved clarity and usability #208

Open · wants to merge 5 commits into `main`
186 changes: 186 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,186 @@
# MegaParse Architecture

This document provides a comprehensive overview of the MegaParse system architecture, including component relationships, data flow, and core implementation details.

## System Components

### 1. Core Parser Library (megaparse)

The core library provides the fundamental parsing capabilities:

```
libs/megaparse/
├── src/megaparse/
│   ├── parser/                    # Parser implementations
│   │   ├── base.py                # Abstract base parser
│   │   ├── unstructured_parser.py
│   │   ├── megaparse_vision.py
│   │   ├── llama.py
│   │   └── doctr_parser.py
│   ├── api/                       # FastAPI application
│   │   └── app.py                 # API endpoints
│   └── checker/                   # Format utilities
```

### 2. Client SDK (megaparse_sdk)

The SDK provides a high-level interface for API interaction:

```
libs/megaparse_sdk/
├── src/megaparse_sdk/
│   ├── client/    # API client implementation
│   └── schema/    # Data models and configurations
```
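
For illustration, a call through the SDK might look roughly like the sketch below. The client class name, constructor arguments, and `file.upload` method are assumptions made for this sketch, not the SDK's documented API.

```python
# Hypothetical sketch of SDK usage; the client class and method names below
# are assumptions, not the SDK's confirmed public interface.
import asyncio

from megaparse_sdk import MegaParseSDK  # assumed entry point


async def main() -> None:
    client = MegaParseSDK(api_key="your-api-key")  # assumed constructor
    result = await client.file.upload(file_path="./document.pdf")  # assumed method
    print(result)


asyncio.run(main())
```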

### 3. FastAPI Interface

The API layer exposes parsing capabilities as HTTP endpoints:

- `/v1/file`: File upload and parsing
- `/v1/url`: URL content extraction and parsing
- `/healthz`: Health check endpoint
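
As an illustration, a direct HTTP upload to the file endpoint could look like the following sketch using `requests`; the multipart field name and the shape of the JSON response are assumptions, not confirmed from the API source.

```python
# Sketch of a direct API call; the "file" field name and the JSON response
# structure are assumptions, not confirmed from the API source.
import requests

with open("./document.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/file",
        files={"file": ("document.pdf", f, "application/pdf")},
    )
resp.raise_for_status()
print(resp.json())
```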

## Data Flow

1. **Document Input**
```
Client → SDK → API → Parser Library
```
- Client submits document through SDK
- SDK validates and sends to API
- API routes to appropriate parser
- Parser processes and returns results

2. **Parser Selection**
```
Input → Strategy Selection → Parser Assignment → Processing
```
- Input type determines available strategies
- Strategy influences parser selection
- Parser processes according to strategy

## Core Classes and Flow

### MegaParse Class

The central orchestrator managing the parsing workflow:

```python
class MegaParse:
    def __init__(self, parser: BaseParser):
        self.parser = parser

    def load(self, file_path: str, strategy: StrategyEnum = StrategyEnum.AUTO) -> str:
        # 1. Validate input
        # 2. Select strategy
        # 3. Process document
        # 4. Format output
        ...
```

### Parser Hierarchy

```
BaseParser (Abstract)
├── UnstructuredParser
│   └── Basic document parsing
├── MegaParseVision
│   └── AI-powered parsing (GPT-4V)
├── LlamaParser
│   └── Enhanced PDF parsing
└── DoctrParser
    └── OCR-based parsing
```

### Strategy Selection

The `StrategyEnum` determines parsing behavior:

- `AUTO`: Automatic strategy selection based on input
- `FAST`: Optimized for speed (simple documents)
- `HI_RES`: Maximum accuracy (complex documents)
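
For example, a caller can pin a strategy explicitly (the `StrategyEnum` import path below mirrors the one used in the README examples):

```python
from megaparse import MegaParse
from megaparse.parser.strategy import StrategyEnum  # import path as used in the README examples
from megaparse.parser.unstructured_parser import UnstructuredParser

megaparse = MegaParse(UnstructuredParser())

# Force the high-accuracy path for a complex, image-heavy document
response = megaparse.load("./scanned_report.pdf", strategy=StrategyEnum.HI_RES)
```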

## Implementation Details

### Parser Selection Logic

1. **Input Analysis**
- File type detection
- Content complexity assessment
- Available parser evaluation

2. **Strategy Application**
- AUTO: Selects optimal parser
- FAST: Prioritizes UnstructuredParser
- HI_RES: Prefers MegaParseVision/LlamaParser
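
A simplified sketch of the dispatch this implies is shown below; it is illustrative only and does not reproduce the library's actual selection code.

```python
# Illustrative only — not MegaParse's actual selection logic.
from megaparse.parser.base import BaseParser        # path per the component tree above
from megaparse.parser.strategy import StrategyEnum  # import path assumed from the README examples


def pick_parser(strategy: StrategyEnum, parsers: dict[str, BaseParser]) -> BaseParser:
    if strategy == StrategyEnum.FAST:
        # Speed first: plain structural parsing
        return parsers["unstructured"]
    if strategy == StrategyEnum.HI_RES:
        # Accuracy first: prefer vision/LLM-backed parsers when configured
        return parsers.get("vision") or parsers.get("llama") or parsers["unstructured"]
    # AUTO: fall back to a heuristic on the input (file type, complexity, ...)
    return parsers["unstructured"]
```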

### Error Handling

The system implements multiple error handling layers:

1. **SDK Level**
- Input validation
- Connection error handling
- Rate limiting management

2. **API Level**
- Request validation
- Authentication
- Resource management

3. **Parser Level**
- Format-specific error handling
- Processing error recovery
- Output validation
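
At the application level, this typically reduces to wrapping a load call; the sketch below uses broad exception handling because the specific exception types raised by MegaParse are not enumerated in this document.

```python
# Minimal sketch: the exact exception classes raised by MegaParse are not
# listed here, so a broad handler is shown.
from megaparse import MegaParse
from megaparse.parser.unstructured_parser import UnstructuredParser

megaparse = MegaParse(UnstructuredParser())
try:
    text = megaparse.load("./document.pdf")
except FileNotFoundError:
    print("Input file not found")
except Exception as exc:  # parser/API-level failures surface here
    print(f"Parsing failed: {exc}")
```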

## Deployment Architecture

### Docker Support

Two deployment options:

1. **Standard Image**
```bash
# Basic parsing capabilities
docker compose up
```

2. **GPU-Enabled Image**
```bash
# Enhanced processing with GPU support
docker compose -f docker-compose.gpu.yml up
```

### API Server

- FastAPI application
- Uvicorn ASGI server
- Interactive documentation at `/docs`
- Health monitoring at `/healthz`
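
For reference, the application can also be launched programmatically with Uvicorn, assuming the `megaparse.api.app:app` module path used elsewhere in this repository is importable in your environment:

```python
# Launch the FastAPI app with Uvicorn; the module path follows the
# repository layout described above.
import uvicorn

if __name__ == "__main__":
    uvicorn.run("megaparse.api.app:app", host="0.0.0.0", port=8000)
```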

## Extension Points

### Custom Parser Implementation

Extend `BaseParser` for custom parsing logic:

```python
from megaparse.parser.base import BaseParser        # abstract base, per the component tree above
from megaparse.parser.strategy import StrategyEnum  # import path as used in the README examples


class CustomParser(BaseParser):
    def convert(self, file_path: str, strategy: StrategyEnum) -> str:
        # Custom synchronous implementation
        ...

    async def aconvert(self, file_path: str, strategy: StrategyEnum) -> str:
        # Custom asynchronous implementation
        ...
```

### Strategy Customization

Custom strategies can be added by defining new strategy values. Note that a Python `Enum` with members cannot be subclassed, so in practice this means adding the value to `StrategyEnum` itself or defining a separate enum:

```python
from enum import Enum


class CustomStrategy(str, Enum):
    # Defined as its own enum because Enum classes with members cannot be
    # subclassed in Python; the behavior for this value is implemented in
    # the parser that consumes it.
    CUSTOM = "custom"
```
129 changes: 97 additions & 32 deletions README.md
@@ -6,6 +6,46 @@

MegaParse is a powerful and versatile parser that can handle various types of documents with ease. Whether you're dealing with text, PDFs, PowerPoint presentations, or Word documents, MegaParse has got you covered. The focus is on having no information loss during parsing.

## Quick Start Guide 🚀

1. **Prerequisites**
- Python >= 3.11
- Poppler (for PDF support)
- Tesseract (for OCR support)
- libmagic (for file type detection)

2. **Installation**
```bash
# Install system dependencies (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install -y poppler-utils tesseract-ocr libmagic1

# Install system dependencies (macOS)
brew install poppler tesseract libmagic

# Install MegaParse
pip install megaparse
```

3. **Environment Setup**
```bash
# Create a .env file with your API keys
OPENAI_API_KEY=your_openai_key # Required for MegaParseVision
LLAMA_CLOUD_API_KEY=your_llama_key # Optional, for LlamaParser
```
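
Once the keys are in place they can be read from the environment. The snippet below is only an explicit illustration using `python-dotenv`; MegaParse may load the `.env` file on its own, so this step is not necessarily required.

```python
# Optional illustration: load the .env file explicitly with python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
openai_key = os.getenv("OPENAI_API_KEY")
llama_key = os.getenv("LLAMA_CLOUD_API_KEY")
```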

## Project Architecture 🏗️

MegaParse is organized into two main components:

- **megaparse**: Core parsing library with multiple parsing strategies
  - UnstructuredParser: Basic document parsing
  - MegaParseVision: Advanced parsing with GPT-4V
  - LlamaParser: Enhanced PDF parsing using LlamaIndex
  - DoctrParser: OCR-based parsing

- **megaparse_sdk**: Client SDK for interacting with the MegaParse API

## Key Features 🎯

- **Versatile Parser**: MegaParse is a powerful and versatile parser that can handle various types of documents with ease.
@@ -23,62 +63,87 @@

https://github.com/QuivrHQ/MegaParse/assets/19614572/1b4cdb73-8dc2-44ef-b8b4-a7509bc8d4f3

## Usage Examples 💡

### Basic Usage with UnstructuredParser
The UnstructuredParser is the default parser that works with most document types without requiring additional API keys:

```python
from megaparse import MegaParse
from megaparse.parser.unstructured_parser import UnstructuredParser

# Initialize the parser
parser = UnstructuredParser()
megaparse = MegaParse(parser)

# Parse a document
response = megaparse.load("./document.pdf")
print(response)

# Save the parsed content as markdown
megaparse.save("./output.md")
```

### Advanced Usage with MegaParseVision
MegaParseVision uses advanced AI models for improved parsing accuracy:

```python
import os

from langchain_openai import ChatOpenAI
from megaparse import MegaParse
from megaparse.parser.megaparse_vision import MegaParseVision

# Initialize with a multimodal model (e.g. GPT-4o)
model = ChatOpenAI(model="gpt-4o", api_key=os.getenv("OPENAI_API_KEY"))
parser = MegaParseVision(model=model)
megaparse = MegaParse(parser)

# Parse with advanced features
response = megaparse.load("./complex_document.pdf")
print(response)
megaparse.save("./output.md")
```

**Supported Models**: MegaParseVision works with multimodal models:
- OpenAI: GPT-4o, GPT-4V
- Anthropic: Claude 3 Opus, Claude 3 Sonnet, Claude 3.5 Sonnet (see the sketch below)
- Custom models (implement the `BaseModel` interface)
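
For example, to pair MegaParseVision with an Anthropic model instead of OpenAI (assuming the `langchain_anthropic` package is installed; this pairing is shown as an illustration rather than a tested configuration):

```python
import os

from langchain_anthropic import ChatAnthropic
from megaparse import MegaParse
from megaparse.parser.megaparse_vision import MegaParseVision

# Illustration only: swap in an Anthropic multimodal model.
model = ChatAnthropic(model="claude-3-opus-20240229", api_key=os.getenv("ANTHROPIC_API_KEY"))
parser = MegaParseVision(model=model)
megaparse = MegaParse(parser)
```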

### Parsing Strategies
MegaParse supports different parsing strategies to balance speed and accuracy:

- **AUTO**: Automatically selects the best strategy based on document type
- **FAST**: Optimized for speed, best for simple documents
- **HI_RES**: Maximum accuracy, recommended for complex documents

```python
from megaparse.parser.strategy import StrategyEnum

# Use high-resolution parsing
response = megaparse.load("./complex_document.pdf", strategy=StrategyEnum.HI_RES)
```

## Running the API Server 🌐

### Using Docker (Recommended)
```bash
# Build and start the API server
docker compose build
docker compose up

# For GPU support
docker compose -f docker-compose.gpu.yml up
```

### Manual Setup
```bash
# Install dependencies using UV (recommended)
UV_INDEX_STRATEGY=unsafe-first-match uv pip sync

# Start the API server
uv run uvicorn megaparse.api.app:app
```

Alternatively, run `make dev` at the root of the project to start the development server via the provided Makefile.

The API will be available at http://localhost:8000, with interactive documentation at http://localhost:8000/docs.

## Benchmark
