How to OCR PDF
Optical Character Recognition (OCR) allows you to extract editable text from images or scanned PDF documents. With Tenorshare PDNob, you can easily perform OCR on PDF files to make them searchable and editable. This guide provides a straightforward, step-by-step tutorial on how to install the OCR feature and use it on your PDFs.
1. What Is OCR
OCR (Optical Character Recognition) is a technology that converts scanned images or image-based PDF files into editable and searchable text. It's especially useful when dealing with scanned documents or printed materials in digital form.
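To make the idea concrete, here is a toy sketch of character recognition by template matching. It is an illustration only and does not represent how PDNob or any production OCR engine works; the 3x5 bitmaps, the `TEMPLATES` table, and the function names are all made up for this example.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each glyph is a 3x5 bitmap written as strings ('#' = ink, '.' = background).
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def count_mismatches(a, b):
    """Count pixels that differ between two same-sized glyph bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize_glyph(glyph):
    """Return the template character whose bitmap differs in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: count_mismatches(glyph, TEMPLATES[ch]))


def recognize_line(glyphs):
    """Recognize a sequence of glyph bitmaps and join the results into text."""
    return "".join(recognize_glyph(g) for g in glyphs)


# A "scanned" T with one stray pixel still matches the T template best.
noisy_t = ("###", ".#.", ".#.", "##.", ".#.")
text = recognize_line([TEMPLATES["L"], TEMPLATES["I"], noisy_t])
```

The point of the sketch is that OCR maps pixel patterns to characters, which is why a clean, well-aligned scan tends to produce better results than a noisy or skewed one.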
2. Download and Install OCR
- Launch the Tenorshare PDNob software on your computer.
- On the main interface, click the "OCR PDF" button.
- Select the desired files from your computer, then click "Open" to load them into PDNob.
- Click the "Download" button to download the OCR functionality if prompted.
3. How to OCR PDFs
- Once the OCR feature is installed, click the "OCR" button on the top toolbar to initiate the OCR process.
- A settings window will appear. Choose the OCR mode based on your needs:
  - Scan to Editable Text – Converts scanned text into fully editable content. After OCR, you can edit, move, copy, or delete text in the PDF like a regular document.
  - Scan to Searchable Text in Image – Makes text searchable and selectable while keeping the original scanned image unchanged. Ideal for text lookup and indexing, without editing.
- After selecting the appropriate option, click "Perform OCR" to begin text recognition.
- After OCR is complete, the recognized text becomes searchable and copyable. When "Scan to Editable Text" is selected, you can edit the text directly in the PDF.
4. OCR Advanced Settings (Supported in PDNob 2.0)
PDNob 2.0 introduces OCR Advanced Settings, which let you tune recognition results. Enabling these settings may improve recognition accuracy but can slow down OCR processing.
To access these options, open the OCR settings panel and click Advanced Settings. From there, you can enable or disable specific image processing and text detection features.
Each setting is described below.
- Auto Crop Page – Automatically detects the valid content area in the image and trims unnecessary edges, making the page more compact while improving OCR recognition accuracy.
- Auto Deskew Page – Automatically corrects tilted pages based on the text orientation in the image, aligning text horizontally and enhancing OCR recognition precision.
- Enhance Local Contrast – Enhances contrast and sharpness in local areas of the image to improve clarity of blurry text and increase OCR success rate. This may slightly affect the original colors.
- Remove Dark Spots – Automatically detects and removes small dark noise spots in the image, making the page cleaner and improving the neatness of OCR results.
- Remove Noise – Filters out white specks and other noise in the image, reducing interference and clarifying text edges to improve OCR recognition quality.
- Detect Text on Pictures – When enabled, OCR will also recognize and extract text from image regions, capturing textual content embedded in pictures.
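As a rough illustration of what two of these preprocessing steps do conceptually, the sketch below applies speck removal and cropping to a tiny black-and-white page. It is not PDNob's implementation; the grid representation (1 = dark pixel, 0 = background), the function names, and the `max_size` threshold are assumptions chosen for this example.

```python
def remove_dark_spots(page, max_size=1):
    """Erase 4-connected groups of dark pixels no larger than max_size."""
    height, width = len(page), len(page[0])
    cleaned = [row[:] for row in page]
    seen = set()
    for y in range(height):
        for x in range(width):
            if page[y][x] != 1 or (y, x) in seen:
                continue
            # Flood-fill to collect one connected component of dark pixels.
            stack, component = [(y, x)], []
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and page[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            # Small components are treated as noise and cleared.
            if len(component) <= max_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


def auto_crop(page):
    """Trim blank rows and columns around the dark pixels."""
    rows = [y for y, row in enumerate(page) if any(row)]
    cols = [x for x in range(len(page[0])) if any(row[x] for row in page)]
    if not rows:
        return page  # blank page: nothing to crop to
    return [row[cols[0]:cols[-1] + 1] for row in page[rows[0]:rows[-1] + 1]]


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],  # isolated speck in the margin
    [0, 0, 0, 0, 0, 0],
]
cropped = auto_crop(remove_dark_spots(page))
```

In this toy example, removing the speck first lets the crop tighten around the real content; cropping the noisy page directly would keep the extra margin that the speck sits in.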