UtilityKit

500+ fast, free tools. Most run in your browser only; Image & PDF tools upload files to the backend when you run them.

PDF to Text

Convert any PDF document into a clean .txt file in your browser. Pick a page range, preserve line breaks, then copy or download the result.

About PDF to Text

PDF to Text extracts every readable string from your PDF and produces a clean .txt file you can copy or download. It uses Mozilla's pdf.js engine to walk the text layer of each page in your browser, so the file never leaves your device. Four options control the output: a page-range filter so you can extract just chapter 3, a 'Preserve line breaks' switch so paragraphs stay readable, a whitespace trimmer that collapses the messy multi-spaces some exporters leave behind, and a page-header toggle that either labels each page in the output or merges everything into a single flow. The tool works on text-layer PDFs (those exported from Word, Google Docs, LaTeX, or any normal authoring tool). Scanned, image-only PDFs have no text layer to extract, so run them through the PDF OCR tool first.
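As a rough illustration of what "walking the text layer" can look like, here is a minimal TypeScript sketch. It is not this tool's actual source. The interfaces DocLike, PageLike and TextItemLike are hypothetical stand-ins that declare only the parts of the pdf.js API the sketch reads (numPages, getPage, getTextContent, and the str and hasEOL fields of a text item), and the function name extractPages is made up for the example.

```typescript
// Hypothetical structural stand-ins for the pdf.js objects used below.
// Only the members this sketch reads are declared.
interface TextItemLike { str?: string; hasEOL?: boolean; }
interface PageLike { getTextContent(): Promise<{ items: TextItemLike[] }>; }
interface DocLike { numPages: number; getPage(pageNumber: number): Promise<PageLike>; }

// Walk every page (page numbers are 1-based) and return one string per page.
async function extractPages(doc: DocLike): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    let text = "";
    for (const item of content.items) {
      // Entries without a str field carry no text, so skip them.
      if (typeof item.str !== "string") continue;
      text += item.str;
      // hasEOL marks an item that is followed by a line break.
      if (item.hasEOL) text += "\n";
    }
    pages.push(text);
  }
  return pages;
}
```

With the real library, the object passed in would be the document that pdf.js resolves after loading a file. A scanned page would simply yield an empty string, because its items list has no text.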

Why use PDF to Text

Browser-Only

pdf.js runs locally as JavaScript in your browser, so your PDF is never uploaded, never logged, and never stored on a remote server.

Page Range Filter

Pull just the chapter or section you need instead of dumping the entire book into a text file.
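A page range such as '1-5,8' can be expanded into a list of page numbers with a few lines of code. The sketch below is one possible parser, not the tool's own. The function name parsePageRange is hypothetical, and two behaviours are assumptions: an empty string selects all pages, and page numbers beyond the end of the document are silently ignored.

```typescript
// Expand a range string like "1-5,8" into sorted, unique, 1-based page numbers.
// Assumptions: "" means all pages; pages past pageCount are ignored.
function parsePageRange(spec: string, pageCount: number): number[] {
  const selected: boolean[] = [];
  const trimmed = spec.trim();
  if (trimmed === "") {
    for (let p = 1; p <= pageCount; p++) selected[p] = true;
  } else {
    for (const rawPart of trimmed.split(",")) {
      const part = rawPart.trim();
      // Accept either "N" or "N-M", with optional spaces around the dash.
      const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
      if (!match) throw new Error("Invalid page range part: " + JSON.stringify(part));
      const start = Number(match[1]);
      const end = match[2] === undefined ? start : Number(match[2]);
      if (start < 1 || end < start) throw new Error("Invalid page range part: " + JSON.stringify(part));
      for (let p = start; p <= Math.min(end, pageCount); p++) selected[p] = true;
    }
  }
  const pages: number[] = [];
  for (let p = 1; p <= pageCount; p++) {
    if (selected[p]) pages.push(p);
  }
  return pages;
}
```

For a 10-page document, parsePageRange("1-5,8", 10) yields pages 1 to 5 plus page 8. Overlapping parts such as "1-3,2-4" are merged rather than repeated.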

Smart Line-Break Handling

Optional 'Preserve line breaks' switch keeps paragraphs and lists readable rather than running everything together.
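One way such a switch could work is to decide what to emit wherever the text layer reports an end of line. The sketch below assumes each extracted item carries a str string and a hasEOL flag, as pdf.js text items do. The type ExtractedItem and the function joinItems are hypothetical names for this example, and the sketch does not try to rejoin words hyphenated across lines.

```typescript
// Hypothetical shape of one extracted text item.
interface ExtractedItem { str: string; hasEOL: boolean; }

// Join items into one string. At each end-of-line flag, emit a newline when
// line breaks are preserved, or a single space when merging into one flow.
function joinItems(items: ExtractedItem[], preserveLineBreaks: boolean): string {
  let out = "";
  for (const item of items) {
    out += item.str;
    if (item.hasEOL) out += preserveLineBreaks ? "\n" : " ";
  }
  return out.trim();
}
```

With the switch on, the output keeps one line per source line. With it off, the same items come out as a single continuous string.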

Whitespace Trimming

Collapses double spaces and ragged column gaps that PDF exporters sometimes leave behind.
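A whitespace trimmer of this kind can be approximated with a regular expression applied line by line. The following is a minimal sketch under that assumption, with the hypothetical function name trimWhitespace. It collapses runs of spaces and tabs to a single space and strips the ends of each line, while leaving line breaks untouched.

```typescript
// Collapse runs of spaces and tabs inside each line and strip line ends.
// Line breaks themselves are kept, so paragraph structure is unaffected.
function trimWhitespace(text: string): string {
  return text
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .join("\n");
}
```

Working per line keeps this step independent of the line-break setting: it cleans column gaps without merging lines.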

Page Header Markers

Optional '--- Page N ---' separators make it easy to cite or navigate a long extraction.
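The separators can be pictured as a small formatting step applied after extraction. This sketch assumes each page's text is already available along with its page number. The names PageText and formatPages are hypothetical, and the blank line placed between pages when headers are on is a choice made for the example, not a documented behaviour of the tool.

```typescript
// Hypothetical container for one extracted page.
interface PageText { pageNumber: number; text: string; }

// Join pages into one output string, optionally prefixing each page
// with a "--- Page N ---" marker line.
function formatPages(pages: PageText[], includeHeaders: boolean): string {
  const parts: string[] = [];
  for (const page of pages) {
    parts.push(
      includeHeaders ? "--- Page " + page.pageNumber + " ---\n" + page.text : page.text
    );
  }
  // Assumption: a blank line between pages with headers, a plain newline without.
  return parts.join(includeHeaders ? "\n\n" : "\n");
}
```

Because the marker uses the original page number, extracting pages 45 to 67 still labels the first page as Page 45, which keeps citations accurate.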

Free, No Signup, No Watermark

One-click extraction with copy and download; no daily quota or account needed.

How to use PDF to Text

  1. Drop your PDF onto the upload area or click to browse.
  2. Optional: enter a page range like '1-5,8' to extract only specific pages — leave empty for all pages.
  3. Toggle 'Include page headers' to add or remove '--- Page 1 ---' markers in the output.
  4. Toggle 'Preserve line breaks' to keep paragraph structure (recommended for reading) or merge into one flow.
  5. Toggle 'Trim whitespace' to collapse multiple spaces — useful for cleaning columns from journal articles.
  6. Click 'Extract text' — the result appears in a textarea you can copy or download as a .txt file.

When to use PDF to Text

  • When you need to copy text out of a PDF for an email or chat without manual selection.
  • When extracting an article from a research paper for use in a citation manager or notes app.
  • When converting a PDF eBook into a plain-text file for an e-reader that wants .txt input.
  • When grabbing a transcript or speech body from a published PDF for analysis.
  • When pulling product descriptions from a vendor catalogue PDF for a spreadsheet import.
  • When you need a diff-friendly text version of a contract draft to compare against another version.

Examples

Extract a chapter

Input: book.pdf, range '45-67', headers on, preserve breaks on

Output: book-text.txt — pages 45–67 with --- Page N --- markers, paragraph structure intact

Whole document for analysis

Input: report.pdf, range empty, preserve breaks off, trim whitespace on

Output: report-text.txt — single continuous flow ideal for grep or LLM prompt

Just the abstract

Input: paper.pdf, range '1', headers off, breaks on

Output: paper-text.txt — first page only, clean paragraph form, no '--- Page 1 ---' header

Tips

  • Use the page range filter aggressively — extracting only what you need keeps the output focused and the .txt file small.
  • If the output looks mashed together, turn 'Preserve line breaks' on; if you want a continuous flow for an LLM prompt, turn it off.
  • 'Trim whitespace' cleans up journal-style two-column PDFs, which often leave wide gaps between columns in the text layer.
  • Scanned PDFs (image-only) produce empty output — run them through PDF OCR first to create a text layer.
  • For very large books, extract chapter-by-chapter so the textarea doesn't lag your browser.

Frequently Asked Questions

Why does my scanned PDF produce empty output?
Scanned PDFs are images of text, not real text. There is nothing in the text layer to extract. Run the file through OCR first (any modern OCR tool) to embed a text layer, then convert to text here.
Does this preserve formatting like bold or italics?
No — .txt is a plain text format. Only the characters and line breaks survive. Use a Word-compatible export if you need formatting.
What about tables?
Tables flatten into rows of text separated by spaces or tabs depending on how the PDF stored them. Complex multi-column tables may need post-processing in a spreadsheet.
Are my files uploaded?
No. pdf.js runs in your browser as JavaScript. The PDF stays on your device the entire time.
Is there a maximum page count?
The only limit is browser memory. Books with hundreds of pages work fine; extract chapter-by-chapter for very long documents to keep the UI responsive.
What encoding is the output?
UTF-8 — supports every Unicode character including accents, CJK, Arabic, and emoji that may appear in your source PDF.
Can I extract from password-protected PDFs?
Encrypted PDFs need to be unlocked first via the PDF Unlock tool. Once decrypted, this extractor reads them like any other file.
Does the tool work offline?
Once the page has loaded, pdf.js is cached and the tool works without an active internet connection. No round-trip to a server is needed for extraction.

Glossary

Text Layer
The portion of a PDF that contains actual text characters (as opposed to images of text); the source this tool reads from.
OCR
Optical Character Recognition — converts an image of text into a real text layer; needed for scanned PDFs before text extraction works.
EOL Marker
The end-of-line flag (hasEOL) that pdf.js attaches to a text item to signal that a line break follows it in the text layer.
UTF-8
The character encoding used for the output .txt — supports every Unicode character including accents, emoji, and CJK scripts.
Page Range Syntax
Comma- and dash-separated notation: '1-5,8,10-12' selects pages 1 through 5, page 8, and pages 10 through 12.
pdf.js
Mozilla's JavaScript PDF parser and renderer; the engine that walks the text layer for this extractor.