microsoft/markitdown

๐ŸŽ–๏ธ ๐Ÿ ๐Ÿ  MCP tool access to MarkItDown -a library that converts many file formats (local or remote) to Markdown for LLM consumption.

#markdown #file-conversion #LLM
Publisher: microsoft/markitdown
Submitted date: 4/19/2025

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Overview of MarkItDown

MarkItDown: A lightweight Python utility for converting various files to Markdown for use with LLMs and text analysis pipelines.

How to Use

Command-Line

markitdown path-to-file.pdf > document.md

Or specify an output file with -o:

markitdown path-to-file.pdf -o document.md

Piping content is also supported:

cat path-to-file.pdf | markitdown
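The same stdin piping can be driven from a Python script with the standard subprocess module. This is a minimal sketch, assuming the markitdown command is installed and on PATH; pipe_through is a hypothetical helper name, and the command is a parameter so it could be swapped for another stdin-reading invocation such as docker run --rm -i markitdown:latest.

```python
import subprocess
from typing import Sequence


def pipe_through(data: bytes, cmd: Sequence[str] = ("markitdown",)) -> str:
    """Feed raw file bytes to a converter command on stdin and return its stdout.

    Mirrors `cat path-to-file.pdf | markitdown`. The default `cmd` assumes the
    markitdown CLI is installed and on PATH (hypothetical helper, not part of MarkItDown).
    """
    completed = subprocess.run(
        list(cmd),
        input=data,           # bytes written to the child's stdin
        capture_output=True,  # collect stdout/stderr instead of printing them
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.decode("utf-8")


# Typical use:
#   from pathlib import Path
#   markdown = pipe_through(Path("path-to-file.pdf").read_bytes())
```

Passing bytes rather than a path keeps the helper usable for content that never touches disk, such as an HTTP response body.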

Python API

from markitdown import MarkItDown

md = MarkItDown(enable_plugins=False)
result = md.convert("test.xlsx")
print(result.text_content)
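Because convert() returns a result whose text_content attribute holds the Markdown, converting a whole folder reduces to a short loop. This is a minimal sketch: convert_folder is a hypothetical helper, not part of MarkItDown, and it accepts any object with a compatible convert() method so the loop can be exercised without the library installed.

```python
from pathlib import Path
from typing import Any


def convert_folder(src_dir: str, out_dir: str, converter: Any, pattern: str = "*") -> list[Path]:
    """Convert every file matching `pattern` in `src_dir` into a .md file in `out_dir`.

    `converter` is assumed to behave like MarkItDown(): convert(path) returns an
    object exposing the Markdown as `.text_content`. Files that share a stem
    (e.g. a.pdf and a.docx) would overwrite each other's output.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for src in sorted(Path(src_dir).glob(pattern)):
        if not src.is_file():
            continue  # skip subdirectories
        result = converter.convert(str(src))
        target = out / (src.stem + ".md")
        target.write_text(result.text_content, encoding="utf-8")
        written.append(target)
    return written


# Typical use (requires pip install 'markitdown[all]'):
#   from markitdown import MarkItDown
#   convert_folder("docs_in", "docs_out", MarkItDown(enable_plugins=False))
```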

Docker

docker build -t markitdown:latest .
docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md

Key Features

  • Wide Format Support: Converts PDF, PowerPoint, Word, Excel, images, audio, HTML, text-based formats, ZIP files, YouTube URLs, and more to Markdown.
  • Optional Dependencies: Install only the dependencies you need (e.g., pip install 'markitdown[pdf, docx, pptx]').
  • Plugins: Supports third-party plugins for extended functionality.
  • Azure Document Intelligence Integration: Optionally sends documents (e.g., PDFs) to a Microsoft Azure Document Intelligence endpoint for conversion.
  • LLM Integration: Optionally uses a Large Language Model to generate image descriptions.
  • Lightweight and Efficient: Preserves important document structure (headings, lists, tables, links) as Markdown for text analysis tools.
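Several of these features are switched on through MarkItDown constructor arguments; the project README uses enable_plugins, llm_client with llm_model, and docintel_endpoint. The sketch below assembles only the options that are actually set. converter_options and make_converter are hypothetical helpers, and the argument names should be checked against the installed MarkItDown version.

```python
from typing import Any, Optional


def converter_options(
    enable_plugins: bool = False,
    llm_client: Any = None,
    llm_model: Optional[str] = None,
    docintel_endpoint: Optional[str] = None,
) -> dict:
    """Build keyword arguments for MarkItDown(), omitting features that are not configured."""
    opts: dict = {"enable_plugins": enable_plugins}
    if llm_client is not None:
        # This sketch insists on a model name alongside the client, as in the README example.
        if not llm_model:
            raise ValueError("llm_model is required when llm_client is given")
        opts["llm_client"] = llm_client
        opts["llm_model"] = llm_model
    if docintel_endpoint:
        # URL of an Azure Document Intelligence resource.
        opts["docintel_endpoint"] = docintel_endpoint
    return opts


def make_converter(**kwargs: Any):
    """Create a MarkItDown instance; imported lazily so converter_options works without it."""
    from markitdown import MarkItDown
    return MarkItDown(**converter_options(**kwargs))
```

For example, with the openai package installed, make_converter(llm_client=OpenAI(), llm_model="gpt-4o") would request LLM-generated image descriptions, while make_converter() keeps the plain built-in converters.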

Use Cases

  • LLM Applications: Convert documents to Markdown as input for LLMs such as GPT-4o.
  • Text Analysis Pipelines: Process various file formats into a consistent Markdown format for analysis.
  • Automated Documentation: Generate Markdown from diverse sources for documentation purposes.
  • Content Processing: Extract and structure content from PDFs, presentations, spreadsheets, and more.
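Because the output keeps headings, a converted document can be split into sections before it is sent to an LLM or an analysis step. This is a minimal sketch of one such post-processing step; split_by_headings is a hypothetical helper, not part of MarkItDown, and it only recognizes ATX-style (#) headings.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str, max_level: int = 2) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) sections at headings of level <= max_level.

    Text before the first qualifying heading is returned under the heading "".
    Deeper headings stay inside their parent's body. Lines starting with '#'
    inside fenced code blocks are also treated as headings (a known simplification).
    """
    sections: list[tuple[str, str]] = []
    title, lines = "", []
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m and len(m.group(1)) <= max_level:
            # Close the previous section unless it is an empty preamble.
            if title or any(ln.strip() for ln in lines):
                sections.append((title, "\n".join(lines).strip()))
            title, lines = m.group(2).strip(), []
        else:
            lines.append(line)
    if title or any(ln.strip() for ln in lines):
        sections.append((title, "\n".join(lines).strip()))
    return sections
```

Splitting at level 2 by default keeps each chunk reasonably large while still respecting the document's own structure; raising max_level produces finer chunks.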

Installation

Install via pip:

pip install 'markitdown[all]'

Or install from source:

git clone git@github.com:microsoft/markitdown.git
cd markitdown
pip install -e 'packages/markitdown[all]'
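After either installation route, a quick check confirms that the package is importable by the current interpreter and that the markitdown command is on PATH. This is a minimal sketch using only the standard library; installed_version and cli_available are hypothetical helper names.

```python
import shutil
from importlib import metadata
from typing import Optional


def installed_version(package: str = "markitdown") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def cli_available(command: str = "markitdown") -> bool:
    """Return True if `command` resolves to an executable on PATH."""
    return shutil.which(command) is not None


# Typical use:
#   print(installed_version() or "markitdown is not installed")
#   print("CLI on PATH:", cli_available())
```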
