Docling
Docling parses documents and exports them to the desired format with ease and speed.
Features
- 🗂️ Reads popular document formats (PDF, DOCX, PPTX, Images, HTML, AsciiDoc, Markdown) and exports to Markdown and JSON
- 📑 Advanced PDF document understanding incl. page layout, reading order & table structures
- 🧩 Unified, expressive DoclingDocument representation format
- 🤖 Easy integration with LlamaIndex 🦙 & LangChain 🦜🔗 for powerful RAG / QA applications
- 🔍 OCR support for scanned PDFs
- 💻 Simple and convenient CLI
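As a rough illustration of the read-and-export flow listed above, the sketch below wraps a single conversion in a helper. It assumes a Docling 2.x-style Python API (`DocumentConverter`, `convert`, `export_to_markdown`); the helper name `convert_to_markdown` is hypothetical and not part of Docling, so check the names against the version you have installed.

```python
from pathlib import Path
from typing import Union


def convert_to_markdown(source: Union[str, Path]) -> str:
    """Convert one document (local path or URL) to Markdown text.

    Hypothetical helper for illustration; assumes a Docling 2.x-style API.
    """
    # Imported lazily so this module still loads where docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()      # default pipeline options
    result = converter.convert(source)   # parse the input document
    # result.document is the unified DoclingDocument representation.
    return result.document.export_to_markdown()
```

Calling `convert_to_markdown("report.pdf")` (a placeholder file name) would return the Markdown as a string, which can then be written to disk or passed to a downstream RAG pipeline.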
Coming soon
- ♾️ Equation & code extraction
- 📝 Metadata extraction, including title, authors, references & language
- 🦜🔗 Native LangChain extension
IBM ❤️ Open Source AI
Docling has been brought to you by IBM.