Docling

Docling parses documents and exports them to the desired format with ease and speed.

Features

  • 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to HTML, Markdown and JSON (with embedded and referenced images)
  • 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
  • 🧩 Unified, expressive DoclingDocument representation format
  • 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
  • 🔍 OCR support for scanned PDFs
  • 💻 Simple and convenient CLI
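
The parse-and-export flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete guide: it assumes the `docling` package is installed (`pip install docling`) and that the source is a local path or URL you supply yourself. The wrapper name `convert_to_markdown` is a hypothetical helper introduced only for this example.

```python
import sys


def convert_to_markdown(source: str) -> str:
    """Convert one document (local path or URL) to a Markdown string.

    Hypothetical helper for illustration; assumes `docling` is installed.
    """
    # Imported lazily so this file can still be loaded where docling
    # is not available.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    # convert() parses the input and returns a result whose `document`
    # attribute holds the unified DoclingDocument representation.
    result = converter.convert(source)
    return result.document.export_to_markdown()


if __name__ == "__main__":
    # Usage: python convert_example.py <path-or-url>
    if len(sys.argv) != 2:
        sys.exit("usage: python convert_example.py <path-or-url>")
    print(convert_to_markdown(sys.argv[1]))
```

Keeping the conversion in a small function makes it easy to swap the export step for another supported output format later; consult the Docling documentation for the exact export methods available in your installed version.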

Coming soon

  • ♾️ Equation & code extraction
  • 📝 Metadata extraction, including title, authors, references & language
  • 🦜🔗 Native LangChain extension

IBM ❤️ Open Source AI

Docling has been brought to you by IBM.