CLI reference

This page provides documentation for our command line tools.

docling

Usage:

docling [OPTIONS] source
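
As a minimal sketch of the usage line above, the snippet below assembles an argument list in the order `docling [OPTIONS] source` and can run it through Python's `subprocess` module. The helper names `build_command` and `run_docling` are hypothetical and not part of docling, and the sketch assumes the `docling` executable is available on `PATH`.

```python
import subprocess
from typing import List, Sequence


def build_command(source: str, options: Sequence[str] = ()) -> List[str]:
    """Return the argv for `docling [OPTIONS] source` (hypothetical helper)."""
    # Options are placed before the positional source argument,
    # mirroring the usage line.
    return ["docling", *options, source]


def run_docling(source: str, options: Sequence[str] = ()) -> int:
    """Run docling and return its exit code. Assumes `docling` is on PATH."""
    completed = subprocess.run(build_command(source, options), check=False)
    return completed.returncode


# Only builds and prints the command; nothing is executed here.
print(build_command("report.pdf", ["--to", "json", "--output", "out"]))
# → ['docling', '--to', 'json', '--output', 'out', 'report.pdf']
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when the source path contains spaces.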

Options:

Name Type Description Default
--from choice (docx | pptx | html | image | pdf | asciidoc | md | xlsx) Specify input formats to convert from. Defaults to all formats. None
--to choice (md | json | html | text | doctags) Specify output formats. Defaults to Markdown. None
--image-export-mode choice (placeholder | embedded | referenced) Image export mode for the document (only in case of JSON, Markdown or HTML). With placeholder, only the position of the image is marked in the output. In embedded mode, the image is embedded as base64 encoded string. In referenced mode, the image is exported in PNG format and referenced from the main exported document. ImageRefMode.EMBEDDED
--ocr / --no-ocr boolean If enabled, the bitmap content will be processed using OCR. True
--force-ocr / --no-force-ocr boolean Replace any existing text with OCR generated text over the full content. False
--ocr-engine choice (easyocr | tesseract_cli | tesseract | ocrmac | rapidocr) The OCR engine to use. OcrEngine.EASYOCR
--ocr-lang text Provide a comma-separated list of languages used by the OCR engine. Note that each OCR engine has different values for the language names. None
--pdf-backend choice (pypdfium2 | dlparse_v1 | dlparse_v2) The PDF backend to use. PdfBackend.DLPARSE_V2
--table-mode choice (fast | accurate) The mode to use in the table structure model. TableFormerMode.FAST
--artifacts-path path If provided, the location of the model artifacts. None
--abort-on-error / --no-abort-on-error boolean If enabled, processing is aborted when the first error is encountered. False
--output path Output directory where results are saved. .
--verbose, -v integer Set the verbosity level. -v for info logging, -vv for debug logging. 0
--debug-visualize-cells / --no-debug-visualize-cells boolean Enable debug output which visualizes the PDF cells. False
--debug-visualize-ocr / --no-debug-visualize-ocr boolean Enable debug output which visualizes the OCR cells. False
--debug-visualize-layout / --no-debug-visualize-layout boolean Enable debug output which visualizes the layout clusters. False
--debug-visualize-tables / --no-debug-visualize-tables boolean Enable debug output which visualizes the table cells. False
--version boolean Show version information. None
--document-timeout float The timeout for processing each document, in seconds. None
--help boolean Show this message and exit. False
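
The option table above can be turned into a small argument builder. The following is only a sketch under stated assumptions: the function `docling_args` and its keyword names are hypothetical, the choice sets are copied from the table, and boolean options are assumed to follow the `--flag / --no-flag` pairs listed there. It covers a handful of options and only builds an argument list; it does not invoke docling itself.

```python
from typing import List, Optional, Set

# Choice values copied from the option table above.
OUTPUT_FORMATS = {"md", "json", "html", "text", "doctags"}
OCR_ENGINES = {"easyocr", "tesseract_cli", "tesseract", "ocrmac", "rapidocr"}
TABLE_MODES = {"fast", "accurate"}


def docling_args(
    source: str,
    to: Optional[str] = None,
    ocr: Optional[bool] = None,
    ocr_engine: Optional[str] = None,
    table_mode: Optional[str] = None,
    output: Optional[str] = None,
    verbosity: int = 0,
) -> List[str]:
    """Build an argv list for docling (hypothetical helper, not part of docling)."""
    args: List[str] = ["docling"]

    def add_choice(flag: str, value: Optional[str], allowed: Set[str]) -> None:
        if value is None:
            return  # option omitted, so the CLI default applies
        if value not in allowed:
            raise ValueError(f"{flag} must be one of {sorted(allowed)}, got {value!r}")
        args.extend([flag, value])

    add_choice("--to", to, OUTPUT_FORMATS)
    add_choice("--ocr-engine", ocr_engine, OCR_ENGINES)
    add_choice("--table-mode", table_mode, TABLE_MODES)

    if ocr is not None:
        # Boolean options are exposed as a --flag / --no-flag pair.
        args.append("--ocr" if ocr else "--no-ocr")
    if output is not None:
        args.extend(["--output", output])
    if verbosity > 0:
        # -v for info logging, -vv for debug logging.
        args.append("-" + "v" * verbosity)

    args.append(source)  # the positional source comes last
    return args


print(docling_args("scan.pdf", to="json", ocr_engine="tesseract", verbosity=2))
# → ['docling', '--to', 'json', '--ocr-engine', 'tesseract', '-vv', 'scan.pdf']
```

Validating choice values before launching the process surfaces typos such as `--to pdf` early, instead of relying on the CLI's own error message.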