Docling

Docling parses documents and exports them to the desired format with ease and speed.

Features

  • 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to HTML, Markdown and JSON (with embedded and referenced images)
  • 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
  • 🧩 Unified, expressive DoclingDocument representation format
  • 🤖 Plug-and-play integrations incl. LangChain, LlamaIndex, Crew AI & Haystack for agentic AI
  • 🔍 OCR support for scanned PDFs
  • 💻 Simple and convenient CLI
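
The read-and-export flow listed above can be sketched in a few lines of Python. This is a minimal sketch, not an authoritative recipe: it assumes the `docling` package is installed and that `DocumentConverter`, `convert()` and `export_to_markdown()` behave as in the Docling v2 Python API. The helper `markdown_output_path` and the wrapper `convert_to_markdown` are hypothetical names introduced only for this illustration.

```python
from pathlib import Path


def markdown_output_path(source: str, out_dir: str = ".") -> Path:
    """Hypothetical helper: choose an output .md path named after the source."""
    # Take the last path segment, then drop its final dot-suffix via Path.stem.
    # Note: names with several dots lose only the last suffix.
    stem = Path(source.rstrip("/").split("/")[-1]).stem or "document"
    return Path(out_dir) / f"{stem}.md"


def convert_to_markdown(source: str, out_dir: str = ".") -> Path:
    """Convert one document and write it out as Markdown (illustrative wrapper)."""
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)  # assumed to accept a local path or URL
    markdown = result.document.export_to_markdown()

    out_path = markdown_output_path(source, out_dir)
    out_path.write_text(markdown, encoding="utf-8")
    return out_path
```

Calling `convert_to_markdown("report.pdf")` would, under these assumptions, write `report.md` to the current directory. The first conversion may be slow if layout and table models need to be fetched.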

Coming soon

  • ♾️ Equation & code extraction
  • 📝 Metadata extraction, including title, authors, references & language

Get started

IBM ❤️ Open Source AI

Docling has been brought to you by IBM.