Cleanup Scans / OCR (Optical Character Recognition)

The OCR/Scan Cleanup feature runs optical character recognition (OCR) on scanned PDF documents to extract their text content and improve their readability. Users can choose the OCR language(s) and OCR mode, and set additional options such as deskewing (correcting skewed scan angles), cleaning pages, and forcing OCR. This is useful whenever text has to be recovered from scans. In archive management, for example, text may need to be extracted from old scanned documents for archiving and auditing. In academic research, it may need to be extracted from scanned papers or reports for citation and analysis. The cleanup options can also improve the visual quality of scanned pages so their content displays better. As a result, scanned documents become readable, their text can be extracted, and they are easier to manage and distribute.

This service uses OCRmyPDF and Tesseract for OCR.
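The options described above correspond closely to OCRmyPDF command-line flags. Below is a minimal sketch, not this service's actual implementation, of how such a request might be translated into an `ocrmypdf` invocation. The helper name `build_ocrmypdf_command` and the mode names `"skip-text"` / `"force-ocr"` are hypothetical. The flags `-l`, `--skip-text`, `--force-ocr`, `--deskew` and `--clean` are real OCRmyPDF options (`--clean` also requires `unpaper` to be installed). Multiple Tesseract languages are joined with `+`.

```python
def build_ocrmypdf_command(input_pdf, output_pdf, languages,
                           mode="skip-text", deskew=False, clean=False):
    """Build an ocrmypdf argument list (hypothetical helper, for illustration)."""
    mode_flags = {
        # Pages that already contain text are passed through without OCR.
        "skip-text": "--skip-text",
        # Every page is rasterized and OCR'd, even if it already has text.
        "force-ocr": "--force-ocr",
    }
    if mode not in mode_flags:
        raise ValueError(f"unknown OCR mode: {mode!r}")
    if not languages:
        raise ValueError("at least one Tesseract language code is required")

    # Tesseract expects multiple languages joined with '+', e.g. 'eng+deu'.
    cmd = ["ocrmypdf", "-l", "+".join(languages), mode_flags[mode]]
    if deskew:
        cmd.append("--deskew")  # straighten pages scanned at an angle
    if clean:
        cmd.append("--clean")   # clean pages with unpaper before OCR
    cmd += [input_pdf, output_pdf]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building an argument list instead of a shell string avoids quoting problems with file names that contain spaces.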

Please read this documentation for instructions on using OCR with other languages and/or outside of Docker.
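Tesseract can only recognize languages whose trained data files are installed. When running outside Docker, it can therefore help to check which languages are available before requesting them. The sketch below assumes that `tesseract --list-langs` prints a header line beginning with "List of available languages" followed by one language code per line on stdout, as recent Tesseract versions do. The helper names are hypothetical.

```python
import subprocess


def parse_tesseract_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the header line, e.g. 'List of available languages in "..." (3):'.
    return [line for line in lines if not line.startswith("List of available languages")]


def installed_tesseract_languages():
    """Run Tesseract and return the installed language codes (assumes stdout output)."""
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    return parse_tesseract_langs(result.stdout)


def missing_languages(requested, installed):
    """Return the requested language codes that are not installed."""
    return [lang for lang in requested if lang not in installed]
```

For example, `missing_languages(["eng", "fra"], installed_tesseract_languages())` returns the codes whose trained data still has to be installed.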