Skip to content

A Rust library designed to facilitate the conversion of various document formats into markdown text.

License

Notifications You must be signed in to change notification settings

uhobnil/markitdown-rs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

markitdown-rs

markitdown-rs is a Rust library designed to facilitate the conversion of various document formats into markdown text. It is a Rust implementation of the original markitdown Python library.

Features

It supports:

  • Excel(.xlsx)
  • Word(.docx)
  • PowerPoint
  • PDF
  • Images
  • Audio
  • HTML
  • CSV(UTF-8)
  • Text-based formats (.xml, .rss, .atom)
  • ZIP

Usage

Command-Line

Installation

cargo install markitdown

Convert a File

markitdown path-to-file.pdf

Or use -o to specify the output file:

markitdown path-to-file.pdf -o document.md

Rust API

Installation

Add the following to your Cargo.toml:

[dependencies]
markitdown = "0.1.10"

Initialize MarkItDown

use markitdown::MarkItDown;

let mut md = MarkItDown::new();

Convert a File

use markitdown::{ConversionOptions, DocumentConverterResult, MarkItDown};

// Basic conversion - file type is auto-detected
let result = md.convert("path/to/file.xlsx", None)?;

// Or explicitly specify options
let options = ConversionOptions {
    file_extension: Some(".xlsx".to_string()),
    url: None,
    llm_client: None,
    llm_model: None,
};

let result = md.convert("path/to/file.xlsx", Some(options))?;

// To use Large Language Models for image descriptions
let options = ConversionOptions {
    file_extension: Some(".jpg".to_string()),
    url: None,
    llm_client: Some("gemini".to_string()),
    llm_model: Some("gemini-2.0-flash".to_string()),
};

let result = md.convert("path/to/file.jpg", Some(options))?;

if let Some(conversion_result) = result {
    println!("Converted Text: {}", conversion_result.text_content);
} else {
    println!("Conversion failed or unsupported file type.");
}

Convert from Bytes

use markitdown::{ConversionOptions, MarkItDown};

let file_bytes = std::fs::read("path/to/file.pdf")?;

// Auto-detect file type from bytes
let result = md.convert_bytes(&file_bytes, None)?;

// Or specify options explicitly
let options = ConversionOptions {
    file_extension: Some(".pdf".to_string()),
    url: None,
    llm_client: None,
    llm_model: None,
};

let result = md.convert_bytes(&file_bytes, Some(options))?;

if let Some(conversion_result) = result {
    println!("Converted Text: {}", conversion_result.text_content);
}

Register a Custom Converter

You can extend MarkItDown by implementing the DocumentConverter trait for your custom converters and registering them:

use markitdown::{DocumentConverter, DocumentConverterResult, ConversionOptions, MarkItDown};
use markitdown::error::MarkitdownError;

struct MyCustomConverter;

impl DocumentConverter for MyCustomConverter {
    fn convert(
        &self,
        local_path: &str,
        args: Option<ConversionOptions>,
    ) -> Result<DocumentConverterResult, MarkitdownError> {
        // Implement file conversion logic
        todo!()
    }

    fn convert_bytes(
        &self,
        bytes: &[u8],
        args: Option<ConversionOptions>,
    ) -> Result<DocumentConverterResult, MarkitdownError> {
        // Implement bytes conversion logic
        todo!()
    }
}

let mut md = MarkItDown::new();
md.register_converter(Box::new(MyCustomConverter));

License

MarkItDown is licensed under the MIT License. See LICENSE for more details.

About

A Rust library designed to facilitate the conversion of various document formats into markdown text.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages