Docling

License: MIT

Docling parses documents and exports them to the desired format with ease and speed.

Features

  • 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
  • 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
  • 🧩 Unified, expressive DoclingDocument representation format
  • 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
  • 🔍 OCR support for scanned PDFs
  • 💻 Simple and convenient CLI
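
As a rough illustration of the read-and-export flow listed above, here is a minimal Python sketch. It assumes the Docling 2.x API names (`DocumentConverter`, `convert`, `export_to_markdown`, `export_to_dict`); check them against the version you have installed. The helpers `export_paths` and `convert_and_export`, and the `out` directory, are hypothetical names introduced only for this example.

```python
import json
from pathlib import Path


def export_paths(source: str, out_dir: str = "out") -> dict:
    """Hypothetical helper: derive the Markdown and JSON output paths for an input file."""
    stem = Path(source).stem
    return {
        "markdown": Path(out_dir) / f"{stem}.md",
        "json": Path(out_dir) / f"{stem}.json",
    }


def convert_and_export(source: str, out_dir: str = "out") -> dict:
    """Convert one document and write Markdown and JSON exports next to each other."""
    # Imported lazily so this module still loads when Docling is not installed.
    # Assumption: Docling 2.x exposes DocumentConverter at this import path.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    doc = result.document  # the unified DoclingDocument representation

    paths = export_paths(source, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths["markdown"].write_text(doc.export_to_markdown(), encoding="utf-8")
    paths["json"].write_text(
        json.dumps(doc.export_to_dict(), indent=2), encoding="utf-8"
    )
    return paths
```

Calling `convert_and_export("report.pdf")` would, under these assumptions, write `out/report.md` and `out/report.json`. The first conversion may take a while if Docling needs to fetch its models.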

Coming soon

  • ♾️ Equation & code extraction
  • πŸ“ Metadata extraction, including title, authors, references & language
  • πŸ¦œπŸ”— Native LangChain extension

IBM ❤️ Open Source AI

Docling has been brought to you by IBM.