UtilityKit

500+ fast, free tools. Most run in your browser only; Image & PDF tools upload files to the backend when you run them.

PDF OCR — Make Scanned PDFs Searchable

Run OCR on a scanned PDF in your browser and download a searchable copy

About PDF OCR — Make Scanned PDFs Searchable

PDF OCR converts a scanned PDF (pages stored as flat images with no underlying text) into a searchable, copyable document by running optical character recognition directly in your browser. Drop in your file, pick which pages to process, and choose a recognition language (English, French, German, Spanish, Portuguese, Italian, or Dutch). The tool then starts Tesseract.js in a worker and reads one page at a time, showing per-page progress and confidence scores. The recognised text is layered behind each page with pdf-lib, producing a new PDF that looks identical to the original but lets you search with Ctrl+F, copy lines into other documents, and have screen readers read it aloud. Because everything runs locally, sensitive scans of contracts, receipts, medical letters, and ID cards never leave your device.
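The page-at-a-time flow described above can be sketched as a simple loop. This is a minimal, synchronous sketch under stated assumptions: `ocrPages`, `Recognizer`, and `PageResult` are hypothetical names, and the recognizer is injected as a plain function so the loop runs without a browser. With Tesseract.js the recognition call returns a Promise, so a real version would be `async` and `await` each page.

```typescript
// Hypothetical result type for one page; the tool's real data shapes are not documented.
interface PageResult {
  page: number;       // 1-based page number
  text: string;       // recognised text for the page
  confidence: number; // 0–100 score reported by the recognizer
}

// Stand-in for the OCR engine: turns a page number into text plus a confidence score.
// In the browser this would wrap a Tesseract.js worker (and be asynchronous).
type Recognizer = (page: number) => { text: string; confidence: number };

// Recognise the selected pages strictly one after another, reporting progress
// (pages done, total pages, latest result) after each page finishes.
function ocrPages(
  pages: number[],
  recognize: Recognizer,
  onProgress: (done: number, total: number, result: PageResult) => void,
): PageResult[] {
  const results: PageResult[] = [];
  for (const page of pages) {
    const { text, confidence } = recognize(page);
    const result: PageResult = { page, text, confidence };
    results.push(result);
    onProgress(results.length, pages.length, result);
  }
  return results;
}

// Usage with a fake recognizer that just echoes the page number.
const demoResults = ocrPages(
  [1, 2, 3],
  (page) => ({ text: `Text of page ${page}`, confidence: 90 }),
  (done, total, r) => console.log(`${done}/${total}: page ${r.page}, ${r.confidence}%`),
);
console.log(demoResults.length);
```

Processing pages sequentially keeps only one page image in flight at a time, which is also what makes a per-page progress bar straightforward.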

Why use PDF OCR — Make Scanned PDFs Searchable

Searchable Output

The downloaded PDF keeps the original page images and adds an invisible text layer, so Ctrl+F search and copy-paste work in Acrobat, Preview, and browser PDF viewers.
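Placing the invisible text correctly means converting where the OCR engine found each word (pixels in the rendered page image, origin at the top-left) into PDF user space (points of 1/72 inch, origin at the bottom-left). A minimal sketch, assuming the page was rendered at a known DPI and that word boxes arrive as `{ x0, y0, x1, y1 }` pixel corners; `pixelBoxToPdf` and its types are hypothetical names, not the tool's code.

```typescript
// Word bounding box in the rendered page image: pixels, origin at the top-left.
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }

// Same box in PDF user space: points (1/72 inch), origin at the bottom-left.
interface PdfBox { x: number; y: number; width: number; height: number; }

// Convert a pixel box to PDF coordinates for a page rendered at `dpi`
// whose height is `pageHeightPt` points.
function pixelBoxToPdf(box: PixelBox, dpi: number, pageHeightPt: number): PdfBox {
  const ptPerPx = 72 / dpi; // one rendered pixel spans 72/dpi points
  return {
    x: box.x0 * ptPerPx,
    // Flip the vertical axis: PDF y is measured up from the bottom edge,
    // so the box's lower edge (y1 in image space) becomes its PDF y.
    y: pageHeightPt - box.y1 * ptPerPx,
    width: (box.x1 - box.x0) * ptPerPx,
    height: (box.y1 - box.y0) * ptPerPx,
  };
}

// A word at pixels (300, 600)-(900, 660) on a US Letter page (792 pt tall) at 300 DPI.
// At 0.24 pt per pixel, x is about 72 pt (one inch in) and y about 792 - 158.4 = 633.6 pt.
console.log(pixelBoxToPdf({ x0: 300, y0: 600, x1: 900, y1: 660 }, 300, 792));
```

Anchoring at the box's bottom edge is a simplification; a more careful layer would position each word on its text baseline, which sits above the descenders.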

100% Browser Processing

Tesseract.js and pdf-lib run locally in your browser, with OCR in a worker, so your scans are never uploaded to a server. That matters for contracts, IDs, and medical paperwork.

Multi-Language Recognition

Switch between English, French, German, Spanish, Portuguese, Italian, and Dutch with one dropdown — the right language model dramatically improves accuracy.
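Each dropdown entry corresponds to a Tesseract trained data file named with a three-letter code (for example `fra.traineddata` for French). A small lookup sketch; the `TESSERACT_LANG` map and `tesseractCode` helper are illustrative names, while the codes are Tesseract's standard ones. In recent Tesseract.js versions the code is what you would pass to `createWorker`, e.g. `createWorker("fra")`.

```typescript
// Picker languages mapped to Tesseract's three-letter trained data codes
// (the code is also the file name stem, e.g. "fra" -> fra.traineddata).
const TESSERACT_LANG = {
  English: "eng",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Portuguese: "por",
  Italian: "ita",
  Dutch: "nld",
} as const;

type PickerLanguage = keyof typeof TESSERACT_LANG;

// Look up the code to load for the language chosen in the dropdown.
function tesseractCode(language: PickerLanguage): string {
  return TESSERACT_LANG[language];
}

console.log(tesseractCode("German")); // deu
```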

Page Picker

Process only the pages that matter — handy for 200-page document dumps where you only need text from chapters 3 and 7.

Per-Page Confidence

Each page reports a confidence score so you can spot pages that need a sharper rescan before relying on the OCR text.
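Acting on those scores can be as simple as listing the pages that fall below a cutoff. A minimal sketch; `pagesToRescan` is a hypothetical helper, and the default threshold of 80 is an arbitrary example, not a value the tool is documented to use.

```typescript
// Page-level confidence on Tesseract's 0–100 scale.
interface PageConfidence { page: number; confidence: number; }

// Pages scoring below `threshold`, worst first, as candidates for a rescan.
// The default of 80 is an arbitrary example cutoff.
function pagesToRescan(results: PageConfidence[], threshold = 80): number[] {
  return results
    .filter((r) => r.confidence < threshold) // filter copies, so sort won't mutate the input
    .sort((a, b) => a.confidence - b.confidence)
    .map((r) => r.page);
}

console.log(pagesToRescan([
  { page: 1, confidence: 96 },
  { page: 2, confidence: 61 },
  { page: 3, confidence: 78 },
])); // [ 2, 3 ]
```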

Free with No Watermark

Convert as many scans as your laptop can chew through — no daily quota, no signup, no branded output.

How to use PDF OCR — Make Scanned PDFs Searchable

  1. Drop or select a scanned PDF — the tool reads its page count and shows a thumbnail picker.
  2. Pick which pages to OCR (default: all pages) — useful if you only need text from a few key pages.
  3. Choose the document's primary language so Tesseract loads the matching trained data file.
  4. Click Run OCR and watch the per-page progress bar — each page reports a confidence percentage as it finishes.
  5. Review the extracted text in the side panel and copy any snippet to the clipboard if needed.
  6. Click Download Searchable PDF to save a new copy with an invisible text layer behind every page.

When to use PDF OCR — Make Scanned PDFs Searchable

  • When a scanned contract or other PDF arrives by email and you need to search it for clauses or names without retyping anything.
  • When archiving paper receipts and tax slips into a searchable PDF library before tossing the originals.
  • When making old academic books or articles accessible to screen readers and full-text search.
  • When preparing legal discovery materials so attorneys can grep for terms across thousands of scanned pages.
  • When digitising printed meeting minutes or typed notes so they can be indexed by note-taking apps.
  • When you receive a scanned ID or passport copy and need to extract its text without uploading the file to a third-party service.

Examples

Searchable scanned contract

Input: scan-12pp.pdf (12 image-only pages, English), all pages selected

Output: scan-12pp-ocr.pdf (12 pages, identical look, full-text searchable in Acrobat)

Selective OCR on a long report

Input: annual-report.pdf (180 pages), pages 5, 22-24, 110 selected, language: English

Output: annual-report-ocr.pdf with text layer added only on the 5 selected pages
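A selection like the one in this example (pages 5, 22-24, and 110) expands to five pages. The sketch below parses that kind of comma-separated list; `parsePageSelection` and the text syntax are assumptions for illustration, since the tool itself uses a thumbnail picker. Pages beyond the document's page count are silently dropped, and malformed tokens throw.

```typescript
// Expand a 1-based selection such as "5, 22-24, 110" into a sorted list of
// unique page numbers, keeping only pages within 1..pageCount.
function parsePageSelection(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    // Either a single number ("110") or an inclusive range ("22-24").
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (match === null) throw new Error(`Invalid page token: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (end < start) throw new Error(`Range runs backwards: "${token}"`);
    for (let p = Math.max(start, 1); p <= Math.min(end, pageCount); p++) {
      pages.add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(parsePageSelection("5, 22-24, 110", 180)); // [ 5, 22, 23, 24, 110 ]
```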

French invoice batch

Input: factures-q1.pdf (8 pages), language: French

Output: factures-q1-ocr.pdf with French diacritics correctly recognised and searchable

Tips

  • Higher-resolution scans (≥ 300 DPI) produce dramatically better OCR results — rescan low-quality faxes if you can.
  • If a page mixes two languages, pick the one with more text — Tesseract handles single-language documents most accurately.
  • Crop or deskew the original before running OCR; even 2–3° of skew can noticeably reduce recognition accuracy.
  • On mobile devices, process big PDFs in batches of 20–30 pages to avoid running out of memory.
  • After downloading, verify by searching for a phrase you know is on page 1 — if it doesn't match, the language or scan quality probably needs adjusting.
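Splitting a long selection into fixed-size batches, as the mobile tip above suggests, takes one small function. A minimal sketch; `toBatches` is a hypothetical helper, and the default of 25 pages simply sits inside the suggested 20–30 range.

```typescript
// Split page numbers into consecutive batches of at most `size` pages.
function toBatches(pages: number[], size = 25): number[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: number[][] = [];
  for (let i = 0; i < pages.length; i += size) {
    batches.push(pages.slice(i, i + size)); // slice clamps at the end of the array
  }
  return batches;
}

// A 60-page selection becomes batches of 25, 25, and 10 pages.
const sixtyPages = Array.from({ length: 60 }, (_, i) => i + 1);
console.log(toBatches(sixtyPages).map((b) => b.length)); // [ 25, 25, 10 ]
```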

Frequently Asked Questions

Does the file upload anywhere?
No. Tesseract.js, pdf.js, and pdf-lib all run in your browser. The PDF and recognised text never leave your device.
Why is the first page slow?
Tesseract has to download the trained data file for the chosen language (~10–20 MB) on first use. Subsequent pages are much faster — the model stays cached for the session.
What languages are supported?
English, French, German, Spanish, Portuguese, Italian, and Dutch are available in the picker, covering many common Latin-script documents.
Will the output PDF look exactly like the input?
Yes — the original page images are kept as-is and a hidden text layer is added behind them. Visually the file is identical.
How accurate is the OCR?
On clean 300 DPI scans, expect 95%+ accuracy on common Latin-script languages. Faxes, handwriting, and skewed scans drop the accuracy substantially.
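One common way to put a number on OCR accuracy is the character error rate (CER): the edit distance between the OCR output and a hand-checked transcript, divided by the transcript's length. Character accuracy is then 1 - CER, so a 95% figure would mean roughly 5 wrong, missing, or extra characters per 100. The sketch below assumes you have such a reference transcript; the function names are illustrative, and this is not how the tool produces its confidence scores, which come from Tesseract itself.

```typescript
// Levenshtein edit distance (insertions, deletions, substitutions each cost 1),
// computed with two rows instead of a full matrix. Compares UTF-16 code units.
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character accuracy = 1 - CER, where CER = edits / reference length (floored at 0).
function characterAccuracy(reference: string, ocrText: string): number {
  if (reference.length === 0) return ocrText.length === 0 ? 1 : 0;
  const cer = editDistance(reference, ocrText) / reference.length;
  return Math.max(0, 1 - cer);
}

// Two substitutions in a 12-character reference: about 83% character accuracy.
console.log(characterAccuracy("Facture 2024", "Factvre 2O24"));
```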
Can I OCR a password-protected PDF?
Not directly — unlock the file first with the PDF Unlock tool, then run it through PDF OCR.
What's the file-size limit?
There's no hard limit, but practical browser memory tops out around 200–300 MB on desktop and far less on mobile. Use the page picker for very large PDFs.
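Why memory runs out is easy to estimate: a page rasterised for OCR is typically held as an RGBA bitmap at 4 bytes per pixel. A rough sketch, assuming that representation and ignoring Tesseract's own working memory; `bitmapBytes` is an illustrative helper, not part of the tool. A US Letter page at 300 DPI is 2550 × 3300 pixels, about 32 MiB uncompressed, so keeping 30 such pages in memory at once would need close to 1 GB.

```typescript
// Bytes needed for `pages` uncompressed RGBA bitmaps of widthPx × heightPx
// (4 bytes per pixel: red, green, blue, alpha).
function bitmapBytes(widthPx: number, heightPx: number, pages = 1): number {
  return widthPx * heightPx * 4 * pages;
}

const MiB = 1024 * 1024;
// US Letter (8.5 × 11 in) rendered at 300 DPI is 2550 × 3300 pixels.
console.log((bitmapBytes(2550, 3300) / MiB).toFixed(1));     // 32.1
console.log((bitmapBytes(2550, 3300, 30) / MiB).toFixed(0)); // 963
```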
Does it handle handwriting?
Tesseract was trained on printed text and is unreliable for handwriting. For handwritten notes consider a specialised handwriting OCR service.


Glossary

OCR
Optical Character Recognition — the process of converting page images into machine-readable text.
Searchable PDF
A PDF that retains its original visual layout but also embeds a text layer so the contents can be searched, copied, and read aloud.
Text Layer
An invisible layer behind a PDF's images that holds the recognised text and its on-page coordinates.
Tesseract.js
A JavaScript port of Google's Tesseract OCR engine that runs entirely in the browser via WebAssembly.
Confidence Score
A 0–100 score Tesseract reports for recognised text, indicating how certain it is of the read; the per-page figure is averaged over the words on that page.
DPI
Dots per inch — the resolution of a scanned image; ≥ 300 DPI is the practical floor for reliable OCR.
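PDF page sizes are expressed in points (1/72 inch), so rasterising a page for OCR at a target DPI means scaling by DPI / 72; 300 DPI works out to a scale of about 4.17. A minimal sketch with illustrative helper names. With pdf.js, this is the kind of value you would pass as the `scale` option of `page.getViewport`, though how this tool renders its pages is not documented.

```typescript
// Scale factor from PDF points (1/72 in) to pixels at the given DPI.
function renderScaleForDpi(dpi: number): number {
  return dpi / 72;
}

// Pixel size of a page of widthPt × heightPt points rendered at `dpi`.
// Multiplying before dividing keeps whole-number cases exact; results are
// rounded to the nearest pixel.
function renderedSizePx(widthPt: number, heightPt: number, dpi: number): { width: number; height: number } {
  return {
    width: Math.round((widthPt * dpi) / 72),
    height: Math.round((heightPt * dpi) / 72),
  };
}

console.log(renderedSizePx(612, 792, 300)); // { width: 2550, height: 3300 } (US Letter)
console.log(renderedSizePx(595, 842, 300)); // { width: 2479, height: 3508 } (A4)
```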