This demo shows how to use Trigger.dev with Python to convert documents to Markdown using Microsoft's MarkItDown library. Markdown output is especially useful as input for AI applications.
- A Trigger.dev account and project set up
- Python installed on your machine. This example requires Python 3.10 or higher.
- A Trigger.dev task which downloads a document from a URL and runs the Python script to convert it to Markdown
- A Python script to convert documents to Markdown using Microsoft's MarkItDown library
- Uses the Trigger.dev Python build extension to install dependencies and run Python scripts
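The fetch step from the feature list (download a document from a URL before handing it to the converter) can be illustrated in Python. This is a minimal sketch for illustration only, not the demo's own code: the helper name `download_to_temp` is hypothetical, and it uses only the standard library.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_to_temp(url: str) -> Path:
    """Fetch `url` and save it to a temporary file (hypothetical helper)."""
    # Keep the suffix (e.g. ".pdf") so a converter can use it as a format hint.
    suffix = Path(urlparse(url).path).suffix
    with urllib.request.urlopen(url) as response:
        # delete=False keeps the file on disk after the handle is closed,
        # so the caller is responsible for removing it afterwards.
        with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
            shutil.copyfileobj(response, tmp)
            return Path(tmp.name)
```

The returned path can then be passed to whatever runs the conversion. The caller should delete the temporary file once the Markdown has been produced.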
> This demo uses Trigger.dev v4, which is currently in beta (as of 28/04/2025). You will need to install the v4 CLI to run this project. See the v4 upgrade docs.
- After cloning the repo, run `npm install` to install the dependencies.
- Run the Trigger.dev CLI init command to initialize the project.
- Create a virtual environment: `python -m venv venv`
- Activate the virtual environment, depending on your OS. On Mac/Linux: `source venv/bin/activate`. On Windows: `venv\Scripts\activate`
- Install the Python dependencies: `pip install -r requirements.txt`. Make sure you have Python 3.10 or higher installed.
- Copy the project ref from your Trigger.dev dashboard and add it to the `trigger.config.ts` file.
- In a new terminal, run the Trigger.dev CLI dev command (it may ask you to authorize the CLI if you haven't already).
- Test the task in the dashboard by providing valid payloads.
- Deploy the task to production using the Trigger.dev CLI deploy command.
- convertToMarkdown.ts defines the Trigger.dev task which orchestrates the document conversion
- markdown-converter.py contains the Python code for converting documents to Markdown
- trigger.config.ts uses the Trigger.dev Python extension to install the dependencies and run the script
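As a rough idea of what a conversion script such as `markdown-converter.py` might contain, here is a minimal sketch. It assumes the script receives a file path as its only argument and prints the Markdown to standard output; that interface, and the names `convert_to_markdown` and `main`, are assumptions rather than the repo's actual code. The `MarkItDown().convert(...)` call and the `text_content` attribute follow MarkItDown's documented basic usage.

```python
import sys


def convert_to_markdown(path: str, converter=None) -> str:
    """Convert the document at `path` to Markdown text."""
    if converter is None:
        # Imported lazily so the helper can be exercised with a stand-in converter.
        from markitdown import MarkItDown

        converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content


def main(argv: list[str]) -> int:
    """Hypothetical CLI: print the Markdown for the file path given as argv[1]."""
    if len(argv) != 2:
        print("usage: markdown-converter.py <file-path>", file=sys.stderr)
        return 1
    print(convert_to_markdown(argv[1]))
    return 0
```

To run it as a script, call `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard. Printing to standard output is one simple way for a calling process to collect the result.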
- Learn more about the Trigger.dev Python extension
- Learn more about MarkItDown
- Convert various file formats to Markdown:
  - Office formats (Word, PowerPoint, Excel)
  - PDFs
  - Images (with optional LLM-generated descriptions)
  - HTML, CSV, JSON, XML
  - Audio files (with optional transcription)
  - ZIP archives
  - And more
- Preserve document structure (headings, lists, tables, etc.)
- Handle multiple input methods (file paths, URLs, base64 data)
- Optional Azure Document Intelligence integration for better PDF and image conversion
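The different input methods listed above can be handled with a small dispatch helper. The sketch below is an illustration, not part of the demo: `to_markdown` and `from_base64` are hypothetical names, and `converter` is assumed to be a `MarkItDown` instance, whose `convert` method accepts a local path or URL and whose `convert_stream` method accepts a binary stream.

```python
import base64
import io


def to_markdown(source, converter) -> str:
    """Convert a path/URL string or raw bytes to Markdown with the given converter."""
    if isinstance(source, (bytes, bytearray)):
        # In-memory data goes through the stream entry point.
        result = converter.convert_stream(io.BytesIO(bytes(source)))
    else:
        # Strings are treated as a local file path or an http(s) URL.
        result = converter.convert(str(source))
    return result.text_content


def from_base64(encoded: str, converter) -> str:
    """Decode base64 document data, then convert the resulting bytes."""
    return to_markdown(base64.b64decode(encoded), converter)
```

With the library installed, `converter` would typically be created as `MarkItDown()` from the `markitdown` package. According to the MarkItDown README, the optional features are configured on the constructor, for example `llm_client` and `llm_model` for image descriptions and `docintel_endpoint` for Azure Document Intelligence.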