CLI reference
This page provides documentation for our command-line tools.
docling
Usage:
docling [OPTIONS] source
Options:
Name | Type | Description | Default |
---|---|---|---|
--from | choice (docx, pptx, html, xml_pubmed, image, pdf, asciidoc, md, xlsx, xml_uspto) | Specify input formats to convert from. Defaults to all formats. | None |
--to | choice (md, json, html, text, doctags) | Specify output formats. Defaults to Markdown. | None |
--headers | text | Specify the HTTP request headers used when fetching URL input sources, as a JSON string. | None |
--image-export-mode | choice (placeholder, embedded, referenced) | Image export mode for the document (applies only to JSON, Markdown, or HTML output). In placeholder mode, only the position of the image is marked in the output. In embedded mode, the image is embedded as a base64-encoded string. In referenced mode, the image is exported in PNG format and referenced from the main exported document. | ImageRefMode.EMBEDDED |
--ocr / --no-ocr | boolean | If enabled, the bitmap content will be processed using OCR. | True |
--force-ocr / --no-force-ocr | boolean | Replace any existing text with OCR-generated text over the full content. | False |
--ocr-engine | choice (easyocr, tesseract_cli, tesseract, ocrmac, rapidocr) | The OCR engine to use. | OcrEngine.EASYOCR |
--ocr-lang | text | Provide a comma-separated list of languages used by the OCR engine. Note that each OCR engine has different values for the language names. | None |
--pdf-backend | choice (pypdfium2, dlparse_v1, dlparse_v2) | The PDF backend to use. | PdfBackend.DLPARSE_V2 |
--table-mode | choice (fast, accurate) | The mode to use in the table structure model. | TableFormerMode.FAST |
--artifacts-path | path | If provided, the location of the model artifacts. | None |
--abort-on-error / --no-abort-on-error | boolean | If enabled, processing is aborted when the first error is encountered. | False |
--output | path | Output directory where results are saved. | . |
--verbose, -v | integer | Set the verbosity level. -v for info logging, -vv for debug logging. | 0 |
--debug-visualize-cells / --no-debug-visualize-cells | boolean | Enable debug output which visualizes the PDF cells. | False |
--debug-visualize-ocr / --no-debug-visualize-ocr | boolean | Enable debug output which visualizes the OCR cells. | False |
--debug-visualize-layout / --no-debug-visualize-layout | boolean | Enable debug output which visualizes the layout clusters. | False |
--debug-visualize-tables / --no-debug-visualize-tables | boolean | Enable debug output which visualizes the table cells. | False |
--version | boolean | Show version information. | None |
--document-timeout | float | The timeout for processing each document, in seconds. | None |
--num-threads | integer | Number of threads. | 4 |
--device | choice (auto, cpu, cuda, mps) | Accelerator device. | AcceleratorDevice.AUTO |
--help | boolean | Show this message and exit. | False |
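
Several options take structured values: `--headers` expects a JSON string and `--ocr-lang` a comma-separated list. The sketch below shows one way to assemble such an invocation from Python as an argument list. The helper `build_docling_command`, the example URL, the bearer token, and the language codes are all hypothetical placeholders chosen for illustration; only the flag names and the value formats come from the table above.

```python
import json


def build_docling_command(source, output_dir=".", to_format="md",
                          ocr_langs=None, headers=None,
                          table_mode=None, force_ocr=False):
    """Hypothetical helper: assemble an argv list for the docling CLI."""
    cmd = ["docling", "--to", to_format, "--output", str(output_dir)]
    if ocr_langs:
        # --ocr-lang takes a single comma-separated string.
        cmd += ["--ocr-lang", ",".join(ocr_langs)]
    if headers:
        # --headers takes the HTTP request headers as a JSON string.
        cmd += ["--headers", json.dumps(headers)]
    if table_mode:
        cmd += ["--table-mode", table_mode]
    if force_ocr:
        cmd.append("--force-ocr")
    # The positional source comes last, matching the usage line.
    cmd.append(source)
    return cmd


cmd = build_docling_command(
    "https://example.com/report.pdf",            # placeholder URL
    output_dir="out",
    to_format="json",
    ocr_langs=["en", "de"],                      # illustrative; names are engine-specific
    headers={"Authorization": "Bearer TOKEN"},   # placeholder credential
    table_mode="accurate",
)
print(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` when the `docling` executable is installed. Building a list rather than a single shell string avoids quoting problems with the JSON value of `--headers`.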