GitHub - xo/usql: Universal command-line interface for SQL databases
MarkItDown is an open-source Python tool from Microsoft that converts many file types into Markdown: PDFs, Office documents, images (with OCR and EXIF metadata), audio (with speech transcription), HTML, and archives such as ZIP files.
The tool is built around a modular set of converters, one registered per format, and uses optional dependencies so that support for a given format is installed only when it is needed. Its command-line interface and Python API are aimed at feeding documents into LLMs and text-analysis pipelines, and it can be extended with plugins and run in Docker.
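To make the converter-registry idea concrete, here is a minimal sketch under stated assumptions. `ConverterRegistry`, `register_if_available`, and the converter functions are hypothetical names, not MarkItDown's actual classes or API. The sketch only illustrates the general pattern: dispatch on file extension, enable a format only when its optional library is importable, and let a later registration (for example, from a plugin) replace an earlier one.

```python
import csv
import importlib.util
import io
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical converter signature: raw file bytes in, Markdown text out.
Converter = Callable[[bytes], str]


class ConverterRegistry:
    """Toy registry mapping file extensions to converters (illustrative only)."""

    def __init__(self) -> None:
        self._converters: Dict[str, Converter] = {}

    def register(self, extension: str, converter: Converter) -> None:
        # A later registration replaces an earlier one, which is one simple way
        # a plugin could override a built-in converter.
        self._converters[extension.lower()] = converter

    def supported(self) -> List[str]:
        return sorted(self._converters)

    def convert(self, filename: str, data: bytes) -> str:
        # Dispatch on the lower-cased file extension.
        ext = Path(filename).suffix.lower()
        converter = self._converters.get(ext)
        if converter is None:
            raise ValueError(f"no converter registered for {ext!r}")
        return converter(data)


def register_if_available(registry: ConverterRegistry, extension: str,
                          module_name: str, converter: Converter) -> bool:
    # Enable a format only when its backing library is importable,
    # mimicking opt-in optional dependencies.
    if importlib.util.find_spec(module_name) is None:
        return False
    registry.register(extension, converter)
    return True


def text_to_markdown(data: bytes) -> str:
    # Plain text is already valid Markdown; just decode it.
    return data.decode("utf-8")


def csv_to_markdown(data: bytes) -> str:
    # Render CSV as a Markdown table; the first row is treated as the header.
    rows = [row for row in csv.reader(io.StringIO(data.decode("utf-8"))) if row]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def pdf_to_markdown(data: bytes) -> str:
    # Placeholder: a real converter would call a PDF text-extraction library.
    raise NotImplementedError("hypothetical PDF backend")


def build_registry() -> ConverterRegistry:
    registry = ConverterRegistry()
    registry.register(".txt", text_to_markdown)
    registry.register(".csv", csv_to_markdown)
    # "hypothetical_pdf_backend" is a made-up module name, so .pdf stays disabled.
    register_if_available(registry, ".pdf", "hypothetical_pdf_backend", pdf_to_markdown)
    return registry


if __name__ == "__main__":
    registry = build_registry()
    print(registry.supported())
    print(registry.convert("formats.csv", b"format,extension\nCSV,.csv\n"))
```

Running it prints the enabled extensions (`['.csv', '.txt']`) followed by a three-line Markdown table. Keying converters by extension keeps each format isolated, so adding or overriding one never touches the others.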
PDFs must already contain extractable text: MarkItDown has no built-in OCR for PDF documents, so scanned, image-only PDFs need to be run through OCR before conversion.
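One way to decide which PDFs need that extra OCR step is a simple pre-check on the text a non-OCR extraction pass returns. This is a minimal sketch assuming you already have that extracted text and the page count; `needs_ocr` and its 25-characters-per-page threshold are hypothetical, not part of MarkItDown. Documents it flags would go through an external OCR tool first and be converted afterward.

```python
def needs_ocr(extracted_text: str, page_count: int, min_chars_per_page: int = 25) -> bool:
    """Return True when a PDF's extracted text looks too sparse to be a real text layer.

    Hypothetical heuristic: image-only (scanned) PDFs usually yield little or no
    text from a non-OCR extractor, so they should be OCR'd before conversion.
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Count only visible characters; extractors often emit bare newlines or
    # form feeds ("\x0c") for pages that contain no text.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible / page_count < min_chars_per_page


if __name__ == "__main__":
    print(needs_ocr("\x0c\x0c", page_count=2))
    print(needs_ocr("Quarterly report " * 10, page_count=1))
```

The first call prints `True` (only page separators came back) and the second prints `False`. The threshold is a tuning knob: set it too high and sparse but genuine pages (title pages, figures) get sent to OCR unnecessarily.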