python-doc-to-markdown-converter

Convert documents to markdown using MarkItDown and Trigger.dev

This demo showcases how to use Trigger.dev with Python to convert documents to Markdown using Microsoft's MarkItDown library. Markdown output is especially useful as input for AI applications.

Prerequisites

  • A Trigger.dev account and project set up
  • Python installed on your machine. This example requires Python 3.10 or higher.

Features

  • A Trigger.dev task which downloads a document from a URL and runs the Python script to convert it to Markdown
  • A Python script to convert documents to Markdown using Microsoft's MarkItDown library
  • Uses the Trigger.dev Python build extension to install dependencies and run Python scripts
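The conversion script described above could look roughly like the following. This is a minimal sketch, not the script shipped in this repo: it assumes the markitdown package is installed, and the function names convert_to_markdown and main are hypothetical. The only MarkItDown calls it relies on are MarkItDown().convert(...) and the text_content attribute of the result.

```python
import sys
from pathlib import Path


def convert_to_markdown(file_path: str) -> str:
    """Convert a local document to Markdown text using MarkItDown."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")

    # Imported lazily so a bad path fails fast, before the library loads.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(str(path))
    return result.text_content


def main(argv: list[str]) -> int:
    """Command-line entry point: expects exactly one file path argument."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} <path-to-document>", file=sys.stderr)
        return 2
    # Print the Markdown to stdout so a calling process can capture it.
    print(convert_to_markdown(argv[1]))
    return 0
```

To run it as a script, call sys.exit(main(sys.argv)) from an if __name__ == "__main__": guard. Writing the Markdown to stdout is one simple way to hand the result back to a calling process; the actual task may pass data differently.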

Getting Started

| This demo uses Trigger.dev v4, which is currently in beta (as of 28/04/2025). You will need to install the v4 CLI to run this project; see the v4 upgrade docs.

  1. After cloning the repo, run npm install to install the dependencies.
  2. Run the Trigger.dev CLI init command to initialize the project.
  3. Create a virtual environment: python -m venv venv
  4. Activate the virtual environment. On Mac/Linux: source venv/bin/activate. On Windows: venv\Scripts\activate
  5. Install the Python dependencies: pip install -r requirements.txt. Make sure you have Python 3.10 or higher installed.
  6. Copy the project ref from your Trigger.dev dashboard and add it to the trigger.config.ts file.
  7. In a new terminal, run the Trigger.dev CLI dev command (it may ask you to authorize the CLI if you haven't already).
  8. Test the task in the dashboard by providing valid payloads.
  9. Deploy the task to production using the Trigger.dev CLI deploy command.
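Before creating the virtual environment (steps 3 to 5), it can help to confirm that the interpreter meets the Python 3.10 requirement. The check below is a small sketch; the names MINIMUM_VERSION, meets_minimum, and version_message are hypothetical and not part of this repo.

```python
import sys

# Minimum (major, minor) version stated in the prerequisites.
MINIMUM_VERSION = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION) -> bool:
    """Return True if the given version is at least the required minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


def version_message() -> str:
    """Describe whether the running interpreter satisfies the requirement."""
    current = ".".join(str(part) for part in sys.version_info[:3])
    required = ".".join(str(part) for part in MINIMUM_VERSION)
    if meets_minimum():
        return f"Python {current} is supported (requires {required} or higher)."
    return f"Python {current} is too old (requires {required} or higher)."
```

Calling print(version_message()) with the interpreter you intend to use for the virtual environment shows whether it qualifies.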

Relevant Code

Learn more

MarkItDown Conversion Capabilities

  • Convert various file formats to Markdown:
    • Office formats (Word, PowerPoint, Excel)
    • PDFs
    • Images (with optional LLM-generated descriptions)
    • HTML, CSV, JSON, XML
    • Audio files (with optional transcription)
    • ZIP archives
    • And more
  • Preserve document structure (headings, lists, tables, etc.)
  • Handle multiple input methods (file paths, URLs, base64 data)
  • Optional Azure Document Intelligence integration for better PDF and image conversion
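
The list above mentions base64 data as an input method. One simple way to handle it, sketched here under assumptions, is to decode the payload to a temporary file and pass that file's path to MarkItDown. The helper names write_base64_to_temp_file and convert_base64_document are hypothetical and not part of MarkItDown; the sketch assumes the markitdown package is installed and uses only MarkItDown().convert(...) and text_content.

```python
import base64
import os
import tempfile


def write_base64_to_temp_file(data_b64: str, suffix: str) -> str:
    """Decode base64 data into a temporary file and return its path.

    The suffix (for example ".html") is kept so the file type stays
    recognizable from the file name.
    """
    raw = base64.b64decode(data_b64, validate=True)
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(raw)
    return path


def convert_base64_document(data_b64: str, suffix: str) -> str:
    """Convert a base64-encoded document to Markdown text."""
    # Lazy import: only needed when a conversion is actually requested.
    from markitdown import MarkItDown

    path = write_base64_to_temp_file(data_b64, suffix)
    try:
        return MarkItDown().convert(path).text_content
    finally:
        # Remove the temporary file whether or not conversion succeeded.
        os.remove(path)
```

Going through a temporary file keeps the sketch to a single, well-known conversion call. MarkItDown may offer more direct ways to pass in-memory data; check its documentation before relying on this approach.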