Listen to a PDF Aloud — Free, In-Browser, No Upload
If you have a text-based PDF and a browser, you can listen to it read aloud, free, with no account and no upload. Here's how the parsing actually works, why some PDFs read cleaner than others, and what to do with scanned ones that have no text in them at all.
How it works
Drop a PDF into Quick TTS and the file is parsed in your browser using pdf.js, Mozilla's open-source PDF rendering library — the same engine Firefox uses to display PDFs natively. The text layer is extracted page by page, joined in reading order, and handed to whichever voice engine you've picked. Audio starts within seconds and the rest streams while you listen. Nothing is uploaded.
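As a rough sketch of the "extracted page by page, joined" step: pdf.js's `getTextContent()` returns each page's text as a list of items that carry, among other fields, a `str` and a `hasEOL` flag. The `TextRun` interface below is a pared-down stand-in for those items, and `joinPage` / `joinPages` are hypothetical helper names for illustration, not Quick TTS's actual code.

```typescript
// Pared-down stand-in for a pdf.js text item; only the two fields used here.
interface TextRun {
  str: string;     // the text of one positioned run
  hasEOL: boolean; // true when the run is followed by a line break
}

// Join one page's runs in the order the file stores them.
function joinPage(runs: TextRun[]): string {
  let out = "";
  for (const run of runs) {
    out += run.str;
    if (run.hasEOL) out += "\n";
  }
  // Collapse runs of spaces/tabs, trim each line, drop empty lines.
  return out
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Join pages with a blank line between them, skipping pages with no text.
function joinPages(pages: TextRun[][]): string {
  return pages
    .map(joinPage)
    .filter((pageText) => pageText.length > 0)
    .join("\n\n");
}
```

Note that this keeps the stored order as-is. Whether that matches the visual reading order depends on how the PDF was produced.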
Three voice options:
- Browser TTS — your operating system's built-in voice. Works on every device. Sounds like a 2015-era GPS unit; fine for utility, less great for immersion.
- Piper — a ~60MB neural model that runs in WebAssembly. Works in any modern desktop or mobile browser. Dramatically more natural than Browser TTS.
- Kokoro HQ — ~80MB, runs on your GPU via WebGPU. Desktop Chrome or Edge for now. Closer to a real audiobook narrator than to a robot — the only option that holds up for a long textbook or research paper.
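The choice between the three can be written down as a small decision function. This is a simplified sketch: the `Capabilities` fields and `pickVoice` are hypothetical names, and it treats WebGPU support as the only requirement for Kokoro HQ, which glosses over the desktop Chrome/Edge restriction mentioned above.

```typescript
type Voice = "Kokoro HQ" | "Piper" | "Browser TTS";

interface Capabilities {
  hasWebGPU: boolean;          // in a browser, e.g. the result of `"gpu" in navigator`
  hasWebAssembly: boolean;     // in a browser, e.g. `typeof WebAssembly === "object"`
  allowModelDownload: boolean; // whether the user accepts a one-time model download
}

// Prefer the most natural voice the environment can run.
function pickVoice(caps: Capabilities): Voice {
  if (!caps.allowModelDownload) return "Browser TTS";
  if (caps.hasWebGPU) return "Kokoro HQ";
  if (caps.hasWebAssembly) return "Piper";
  return "Browser TTS";
}
```

For example, a browser with WebAssembly but no WebGPU, and a user happy to download a model, ends up on Piper.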
Why PDFs are harder than ebooks
PDF is a print-layout format, not a semantic document format. The job of a PDF is to put glyphs at exact x/y coordinates so the page looks identical on every screen and every printer. There's no inherent concept of "this is a paragraph," "this is a footnote," or "this is the next column" — just positioned text runs.
That makes reading order a guess. A two-column academic paper might be stored left-column then right-column, or interleaved line by line. Footnotes might appear after the body text, before it, or in the middle. Page headers and running footers repeat on every page and bleed into the prose.
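To make the "guess" concrete, here is a minimal sketch of one common heuristic for a two-column page: split the runs at the page midpoint, then read each column top to bottom. The `PositionedRun` shape, the fixed midpoint split, and the `lineTolerance` default are all assumptions for illustration. A full-width title or a figure spanning both columns would defeat it.

```typescript
interface PositionedRun {
  str: string;
  x: number; // left edge of the run, in PDF units
  y: number; // baseline; default PDF user space has y increasing upward
}

// Sort one column top-to-bottom, then left-to-right within a line.
// Baselines are bucketed by `lineTolerance` so small jitter stays on one line.
function sortColumn(runs: PositionedRun[], lineTolerance: number): PositionedRun[] {
  const row = (r: PositionedRun): number => Math.round(r.y / lineTolerance);
  return runs.slice().sort((a, b) => {
    const rowDiff = row(b) - row(a); // higher y comes first (top of page)
    return rowDiff !== 0 ? rowDiff : a.x - b.x;
  });
}

// Read the whole left column, then the whole right column.
function readTwoColumns(
  runs: PositionedRun[],
  pageWidth: number,
  lineTolerance = 2
): string[] {
  const mid = pageWidth / 2;
  const left = runs.filter((r) => r.x < mid);
  const right = runs.filter((r) => r.x >= mid);
  return sortColumn(left, lineTolerance)
    .concat(sortColumn(right, lineTolerance))
    .map((r) => r.str);
}
```

Given runs stored interleaved line by line (left line 1, right line 1, left line 2, ...), this returns all left-column lines first, then the right-column lines.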
Tagged PDFs embed the semantic structure (paragraphs, columns, reading order) alongside the visual layout (see the tagged-PDF section of the PDF spec). PDFs exported from Word, Pages, or modern InDesign are usually tagged and tend to read cleanly. Older scientific papers, government forms, and anything from a print pipeline often aren't, and the reading order will be approximate.
Practical workflow
1. Drop the PDF onto Quick TTS. A few hundred pages is fine; the limit is your browser's memory.
2. Wait a beat. A 200-page PDF takes a couple of seconds to parse. The extracted text appears in the textarea so you can see what the engine got.
3. Skim for junk: repeating page headers, a "© 2024 Publisher" footer on every page, the table of contents rendered as a string of dotted leaders. Delete the worst of it before pressing play.
4. Pick your voice. Kokoro HQ on desktop Chrome or Edge with WebGPU; Piper otherwise; Browser TTS if you don't want a model download.
5. Press play. 1.3× on the speed slider is a comfortable spot for non-fiction once your ear adapts.
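If you'd rather script the "skim for junk" step than do it by hand, the idea can be sketched like this: count how many pages each line appears on, drop lines that recur on most pages, and drop lines containing dotted leaders. `stripJunk`, `normalize`, and the 60% threshold are hypothetical choices for illustration, and a body sentence that really does repeat on most pages would be removed too.

```typescript
// Digits become "#" so "Page 3" and "Page 4" count as the same line.
function normalize(line: string): string {
  return line.trim().toLowerCase().replace(/\d+/g, "#");
}

// pages[i] holds the extracted lines of page i.
function stripJunk(pages: string[][], minShare = 0.6): string[][] {
  // Count, per normalized line, the number of pages it appears on.
  const counts: Record<string, number> = Object.create(null);
  for (const lines of pages) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const line of lines) {
      const key = normalize(line);
      if (key.length === 0 || seen[key]) continue;
      seen[key] = true;
      counts[key] = (counts[key] || 0) + 1;
    }
  }

  // A line must recur on at least two pages to be treated as a header/footer.
  const threshold = Math.max(2, Math.ceil(pages.length * minShare));
  // Four or more dots in a row ("....") or spaced dots (". . . .").
  const dottedLeader = /\.{4,}|(\. ){4,}/;

  return pages.map((lines) =>
    lines.filter((line) => {
      if ((counts[normalize(line)] || 0) >= threshold) return false;
      return !dottedLeader.test(line);
    })
  );
}
```

It's worth eyeballing the result before pressing play, since the heuristic can't tell a running header from a refrain.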
What to do when the reading order is wrong
If the audio jumps between columns, reads footnote numbers mid-sentence, or repeats a header on every page, the PDF wasn't tagged well. A few fixes:
- Edit in the textarea. The extracted text is right there. Delete the repeating header, strip footnote markers, fix column order by cut-and-paste. Faster than it sounds for a chapter-length section.
- Try the publisher's HTML version. Many academic papers are published as both PDF and HTML. The HTML usually has clean reading order, and Quick TTS reads HTML files directly.
- Re-export through a word processor. Open the PDF in Word, Pages, or LibreOffice, save as DOCX or ODT, then drop that in. The round-trip often produces cleaner reading order than the PDF itself.
Scanned PDFs (image-of-text) — the honest limit
No TTS engine can read a scanned PDF directly, including this one. A scanned PDF is a stack of page-sized images with no text layer, so pdf.js finds nothing to extract, only pixels. You need to run OCR (optical character recognition) first to generate the text layer.
To tell what you have: open the PDF in any viewer and try to select a sentence with the cursor. If you can highlight text, Quick TTS will read it. If the cursor selects a rectangular image region, it's scanned.
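The same check can be approximated in code once the text has been extracted: a page that yields almost no characters is probably an image. This is a sketch under assumptions; `PageText`, `pagesNeedingOcr`, `looksScanned`, and the 20-character threshold are hypothetical, and a blank page would be flagged as well.

```typescript
interface PageText {
  pageNumber: number; // 1-based page number
  text: string;       // whatever text extraction returned for this page
}

// Pages with fewer than `minChars` non-whitespace characters.
function pagesNeedingOcr(pages: PageText[], minChars = 20): number[] {
  return pages
    .filter((p) => p.text.replace(/\s+/g, "").length < minChars)
    .map((p) => p.pageNumber);
}

// True when every page looks like an image with no text layer.
function looksScanned(pages: PageText[], minChars = 20): boolean {
  return pages.length > 0 && pagesNeedingOcr(pages, minChars).length === pages.length;
}
```

A mixed result (some pages flagged, some not) usually means a text-based PDF with a few scanned inserts, such as a signed page.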
OCR options, paid and free:
- Adobe Acrobat's built-in OCR: paid but ubiquitous. "Scan & OCR" → "Recognize Text" produces a searchable PDF you can then drop into Quick TTS. Adobe's online OCR is the web version.
- Free desktop OCR: Tesseract is the open-source standard and powers front ends such as gImageReader (a GUI) and OCRmyPDF (a command-line tool that adds a text layer to a scanned PDF). Slower than Adobe but free and runs offline.
- NaturalReader and Speechify bundle OCR inside their paid tiers, so a scanned PDF goes straight to audio without a two-step workflow. If you process scanned PDFs frequently, either one is the right tool for that step. Quick TTS doesn't try to compete on OCR.
For text-based PDFs (the majority of modern documents), Quick TTS reads them as-is with no OCR step.
Privacy: your PDF never leaves your browser
This is the part most "PDF to MP3" sites don't say out loud. Server-based converters require you to upload your PDF — meaning a copy of the file, including any internal markings, watermarks, redactions, or metadata it carries, ends up on their server, gets logged, and may be retained.
Quick TTS doesn't upload anything. The PDF is parsed locally by pdf.js, and audio comes either from your operating system's TTS engine or from neural model files cached in your browser. The only outbound network requests are the one-time downloads of the AI voice models from jsDelivr and Hugging Face — after that, the tool works offline. That matters most for the PDFs you don't want on a stranger's server: contracts, medical records, internal company documents, legal filings.
Limitations worth knowing
- Scanned PDFs need OCR first (covered above). Without a text layer, there is nothing for Quick TTS to read.
- Footnotes, page headers, and running footers often read inline because the PDF doesn't tag them as separate from the body. Edit in the textarea before playing if it bothers you.
- Two-column layouts in untagged PDFs may read in the wrong order. Re-exporting via Word or grabbing the publisher's HTML version usually fixes it.
- Mathematical formulas, code blocks, and tabular data read as literal punctuation ("equals sign, x, plus..."). Delete them in the textarea, or skip past them while listening, if the noise bothers you.
- Forms and signed PDFs may have field text stored separately from the body. The body reads fine; the form values may not appear.
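For the formula-and-table case, one way to find candidates for deletion is to flag lines where a large share of the visible characters are symbols. This is a crude sketch: `symbolShare`, `flagSymbolHeavyLines`, and the 0.3 cutoff are hypothetical, and the character class assumes plain English text, so accented or non-Latin letters would be miscounted as symbols.

```typescript
// Fraction of non-whitespace characters that are not letters, digits,
// or ordinary prose punctuation.
function symbolShare(line: string): number {
  const visible = line.replace(/\s+/g, "");
  if (visible.length === 0) return 0;
  const symbols = visible.replace(/[A-Za-z0-9.,;:'"!?()-]/g, "");
  return symbols.length / visible.length;
}

// Indexes of lines whose symbol share exceeds `maxShare`.
function flagSymbolHeavyLines(lines: string[], maxShare = 0.3): number[] {
  const flagged: number[] = [];
  lines.forEach((line, index) => {
    if (symbolShare(line) > maxShare) flagged.push(index);
  });
  return flagged;
}
```

The flagged indexes are only suggestions; short code lines with few operators will slip under the cutoff.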
Try it
Open Quick TTS, drop a PDF, pick Kokoro HQ (or Piper if your browser doesn't support WebGPU), and press play. Nothing to install, nothing to sign up for, nothing that phones home.
For specific questions — file size limits, AI voice details, commercial use — the FAQ covers them. The guide walks through nine other use cases. For ebooks, the EPUB-to-speech post handles that side. If you're weighing this against paid tools, the comparison page lays out where Quick TTS wins and where NaturalReader and Speechify do.