OCR PDF — Text from Scans, in Your Browser

Tesseract.js · WASM · runs locally · no upload

Drop a scanned PDF and Tesseract runs entirely in your browser to recognise the text on each page. Choose plain text output, or a "searchable PDF" where the original page image stays visible but selectable text sits behind it. OCR is slow — expect tens of seconds per page on a laptop.

Multiple files allowed · OCR runs locally (Tesseract.js)

Options: Output · Language · DPI for OCR

When to use this tool

Scanned receipts, old book pages, contracts you only have on paper — anything whose "text" is actually pixels. OCR adds a real text layer you can copy, search, or feed into another tool. The "searchable PDF" mode is best when you need the document to look unchanged but want Ctrl-F to work.
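For the curious, a searchable PDF works only if each recognised word is placed, as invisible text, at the same spot as its pixels. The sketch below shows just that coordinate conversion. It assumes word boxes arrive in pixels with a top-left origin and that the page was rasterised at a known DPI; the `{x0, y0, x1, y1}` box shape and all names are illustrative assumptions, and how this tool actually writes its text layer is not described here. PDF user space defaults to 72 units per inch with the origin at the bottom-left.

```typescript
// Hypothetical shape for an OCR word box, in pixels, origin at top-left.
interface PixelBox { x0: number; y0: number; x1: number; y1: number; }

// Where the invisible text would go, in PDF points, origin at bottom-left.
interface PdfRect { x: number; y: number; width: number; height: number; }

const POINTS_PER_INCH = 72; // default PDF user-space unit

function pixelBoxToPdfRect(box: PixelBox, dpi: number, pageHeightPt: number): PdfRect {
  if (dpi <= 0) throw new RangeError("dpi must be positive");
  const pointsPerPixel = POINTS_PER_INCH / dpi;
  return {
    x: box.x0 * pointsPerPixel,
    // Flip the vertical axis: the box's bottom edge (y1) becomes the rect's y.
    y: pageHeightPt - box.y1 * pointsPerPixel,
    width: (box.x1 - box.x0) * pointsPerPixel,
    height: (box.y1 - box.y0) * pointsPerPixel,
  };
}

// A word starting 1 inch from the left and 1 inch from the top of a
// 300 dpi rasterisation of a US Letter page (792 pt tall).
const wordRect = pixelBoxToPdfRect({ x0: 300, y0: 300, x1: 900, y1: 360 }, 300, 792);
console.log(wordRect);
```

The word ends up about 72 pt from the left edge and a little over 700 pt up from the bottom, which is where a viewer's text selection will highlight it.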

Step by step

Drop the PDFs. Each page will be rasterised, then OCR'd.
Pick output. Plain text is fastest. Searchable PDF preserves the visual page.
Pick a language. If your document mixes two languages, pick the dominant one; Latin-script text in the other language is usually still recognised, though typically less accurately.
Click "Run OCR". Expect tens of seconds per page. Status updates show which page is being processed.
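
The steps above amount to a per-page loop. The sketch below shows that control flow only, with the rasteriser and the OCR engine passed in as functions. All names and types are hypothetical, not this tool's actual code; in a browser, the `recognise` argument could wrap a Tesseract.js worker's `recognize` call.

```typescript
// Hypothetical types: the tool's real internals are not shown on this page.
interface PageImage { pageNumber: number; widthPx: number; heightPx: number; }
type Rasterise = (pageNumber: number) => Promise<PageImage>;
type Recognise = (image: PageImage) => Promise<string>;

async function ocrPages(
  pageCount: number,
  rasterise: Rasterise,
  recognise: Recognise,
  onStatus: (message: string) => void,
): Promise<string[]> {
  const pageTexts: string[] = [];
  // Pages are processed one at a time so status messages stay in order.
  for (let page = 1; page <= pageCount; page++) {
    onStatus(`OCR page ${page} of ${pageCount}`);
    const image = await rasterise(page);
    pageTexts.push(await recognise(image));
  }
  onStatus("Done");
  return pageTexts;
}

// Stand-ins so the sketch runs without a PDF renderer or an OCR engine.
const fakeRasterise: Rasterise = async (pageNumber) =>
  ({ pageNumber, widthPx: 2550, heightPx: 3300 });
const fakeRecognise: Recognise = async (image) => `text of page ${image.pageNumber}`;

const statusLog: string[] = [];
const ocrResult = ocrPages(2, fakeRasterise, fakeRecognise, (m) => statusLog.push(m));
ocrResult.then((texts) => console.log(texts.join("\n")));
```

Passing the two stages in as functions keeps the loop independent of any particular renderer or OCR library, so the same flow serves both plain text and searchable PDF output.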

FAQ

Why so slow?

OCR is computationally expensive, and it runs locally in WebAssembly so that your files stay private. Server-side OCR services are faster but require uploading your PDF.

How accurate is it?

Accuracy is good (95%+) on clean scans at 300 dpi and poor on low-resolution, handwritten, or skewed input. For best results, start from a clean scan and set the DPI for OCR to 200–300.
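
As a rough guide to what the DPI setting means in pixels: PDF page sizes are measured in points, 72 to the inch by default, so rasterising at a given DPI scales the page by dpi / 72. The sketch below is illustrative only; the function name is hypothetical and the tool's own rasteriser is not shown on this page.

```typescript
const PDF_POINTS_PER_INCH = 72; // default PDF user-space unit

interface RasterSize { scale: number; widthPx: number; heightPx: number; }

function rasterSizeAtDpi(widthPt: number, heightPt: number, dpi: number): RasterSize {
  if (dpi <= 0) throw new RangeError("dpi must be positive");
  const scale = dpi / PDF_POINTS_PER_INCH;
  return {
    scale,
    widthPx: Math.round(widthPt * scale),
    heightPx: Math.round(heightPt * scale),
  };
}

// US Letter is 612 x 792 pt (8.5 x 11 in).
const letterAt300 = rasterSizeAtDpi(612, 792, 300); // widthPx 2550, heightPx 3300
const letterAt200 = rasterSizeAtDpi(612, 792, 200); // widthPx 1700, heightPx 2200
console.log(letterAt300, letterAt200);
```

A 300 dpi page has 2.25 times as many pixels as a 200 dpi one, which is one reason higher settings tend to take longer.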

Are my files uploaded?

Never — Tesseract.js runs in a Web Worker inside your tab. The model data is fetched once from a CDN; your PDF stays local. See the privacy policy.