Docling
Docling parses documents and exports them to the desired format with ease and speed.
Features
- 🗂️ Reads popular document formats (PDF, DOCX, PPTX, XLSX, Images, HTML, AsciiDoc & Markdown) and exports to HTML, Markdown and JSON (with embedded and referenced images)
- 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
- 🧩 Unified, expressive DoclingDocument representation format
- 🤖 Easy integration with 🦙 LlamaIndex & 🦜🔗 LangChain for powerful RAG / QA applications
- 🔍 OCR support for scanned PDFs
- 💻 Simple and convenient CLI
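As a rough illustration of the read-then-export flow listed above, here is a minimal sketch. It assumes the `docling` Python package is installed and exposes `DocumentConverter`, and that the resulting document offers `export_to_markdown`, `export_to_html` and `export_to_dict`. The names `EXPORTERS`, `export_method_name` and `convert` are hypothetical helpers introduced only for this example, not part of Docling.

```python
import json

# Output format name -> DoclingDocument export method assumed to exist.
EXPORTERS = {
    "markdown": "export_to_markdown",
    "html": "export_to_html",
    "json": "export_to_dict",
}


def export_method_name(fmt: str) -> str:
    """Return the export method name for a format, case-insensitively."""
    key = fmt.lower()
    if key not in EXPORTERS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    return EXPORTERS[key]


def convert(source: str, fmt: str = "markdown") -> str:
    """Convert a document (path or URL) and return it as a string."""
    method = export_method_name(fmt)  # validate before the heavy import

    # Imported lazily so the helpers above work without docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    exported = getattr(result.document, method)()

    # The dict export is serialized to JSON text; other exports are strings.
    if isinstance(exported, dict):
        return json.dumps(exported, indent=2)
    return exported
```

A call such as `convert("report.pdf", "html")` (with `report.pdf` standing in for any supported input file) would then return the HTML rendering, provided the assumptions above hold for the installed version.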
Coming soon
- ♾️ Equation & code extraction
- 📝 Metadata extraction, including title, authors, references & language
- 🦜🔗 Native LangChain extension
IBM ❤️ Open Source AI
Docling has been brought to you by IBM.