Listen to a PDF Aloud — Free, In-Browser, No Upload
If you have a text-based PDF and a browser, you can listen to it read aloud, free, with no account and no upload. Here's how the parsing actually works, why some PDFs read cleaner than others, and what to do with scanned ones that have no text in them at all.
How it works
Drop a PDF into Quick TTS and the file is parsed in your browser using pdf.js, Mozilla's open-source PDF rendering library — the same engine Firefox uses to display PDFs natively. The text layer is extracted page by page, joined in reading order, and handed to whichever voice engine you've picked. Audio starts within seconds and the rest streams while you listen. Nothing is uploaded.
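As a rough illustration of the "extracted page by page, joined in reading order" step, here is a minimal sketch. It assumes each page has already been turned into a list of text runs shaped like the items pdf.js returns from `page.getTextContent()` (only the `str` and `hasEOL` fields are used). The function names `pageToText` and `documentToText` are hypothetical, and Quick TTS's actual joining logic may differ.

```typescript
// Minimal shape of one text run, modeled on the items pdf.js returns from
// page.getTextContent(). Only the two fields used here are declared.
interface TextRun {
  str: string; // the text of the run
  hasEOL: boolean; // true when the run ends a line
}

// Join the runs of one page into plain text, one output line per PDF line.
// Hypothetical helper for illustration.
function pageToText(runs: TextRun[]): string {
  let text = "";
  for (const run of runs) {
    text += run.str;
    if (run.hasEOL) text += "\n";
  }
  return text.trim();
}

// Join pages in document order, skipping pages that produced no text
// (for example, scanned pages with no text layer).
function documentToText(pages: TextRun[][]): string {
  return pages
    .map(pageToText)
    .filter((page) => page.length > 0)
    .join("\n\n");
}
```

The runs are joined in the order the PDF stores them, which is why the quality of the result depends so heavily on how the file was produced.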
Three voice options:
- Browser TTS — your operating system's built-in voice. Works on every device. Sounds like a 2015-era GPS unit; fine for utility, less great for immersion.
- Piper — a ~60MB neural model that runs in WebAssembly. Works in any modern desktop or mobile browser. Dramatically more natural than Browser TTS.
- Kokoro HQ — ~80MB, runs on your GPU via WebGPU. Desktop Chrome or Edge for now. Closer to a real audiobook narrator than to a robot — the only option that holds up for a long textbook or research paper.
Why PDFs are harder than ebooks
PDF is a print-layout format, not a semantic document format. The job of a PDF is to put glyphs at exact x/y coordinates so the page looks identical on every screen and every printer. There's no inherent concept of "this is a paragraph," "this is a footnote," or "this is the next column" — just positioned text runs.
That makes reading order a guess. A two-column academic paper might be stored left-column then right-column, or interleaved line by line. Footnotes might appear after the body text, before it, or in the middle. Page headers and running footers repeat on every page and bleed into the prose.
Tagged PDFs embed the semantic structure (paragraphs, columns, reading order) alongside the visual layout — see the tagged-PDF section of the PDF spec. Most PDFs exported from Word, Pages, or modern InDesign are tagged and read cleanly. Older scientific papers, government forms, and anything from a print pipeline often aren't, and the reading order will be approximate.
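To see why untagged reading order is a guess, consider a small sketch with four positioned text runs on a two-column page. The `PositionedRun` shape, the function names, and the idea of a known gutter position are all assumptions made for illustration; real extractors have to infer the gutter, and PDF coordinates actually grow upward rather than downward.

```typescript
// A positioned text run: its text plus the coordinates of its origin.
// Hypothetical shape; y grows downward here to keep the example simple.
interface PositionedRun {
  text: string;
  x: number;
  y: number;
}

// Naive order: top to bottom, then left to right.
// On a two-column page this interleaves the columns line by line.
function naiveOrder(runs: PositionedRun[]): string[] {
  return runs
    .slice()
    .sort((a, b) => a.y - b.y || a.x - b.x)
    .map((r) => r.text);
}

// Column-aware order: split at a known gutter position, read the left
// column top to bottom, then the right column top to bottom.
function twoColumnOrder(runs: PositionedRun[], gutterX: number): string[] {
  const byY = (a: PositionedRun, b: PositionedRun) => a.y - b.y;
  const left = runs.filter((r) => r.x < gutterX).sort(byY);
  const right = runs.filter((r) => r.x >= gutterX).sort(byY);
  return left.concat(right).map((r) => r.text);
}
```

Both orderings are defensible from coordinates alone. A tagged PDF removes the ambiguity by stating the intended order outright.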
Practical workflow
1. Drop the PDF onto Quick TTS. A few hundred pages is fine; the limit is your browser's memory.
2. Wait a beat. A 200-page PDF takes a couple of seconds to parse. The extracted text appears in the textarea so you can see what the engine got.
3. Skim for junk. Repeating page headers, a "© 2024 Publisher" footer on every page, the table of contents as a string of dotted leaders. Delete the worst of it before pressing play.
4. Pick your voice. Kokoro HQ on desktop Chrome or Edge with WebGPU; Piper otherwise; Browser TTS if you don't want a model download.
5. Press play. 1.3× on the speed slider is a comfortable spot for non-fiction once your ear adapts.
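The voice choice in step 4 boils down to a small decision rule. The sketch below is one way to write it down, not Quick TTS's actual code: the `Capabilities` shape, the voice identifiers, and `pickVoice` are hypothetical, and the rule ignores the desktop Chrome or Edge requirement for brevity. In a browser, the two capability flags could be filled in with checks such as `"gpu" in navigator` and `typeof WebAssembly === "object"`.

```typescript
// Hypothetical identifiers for the three voice options described above.
type Voice = "kokoro-hq" | "piper" | "browser-tts";

// What the page knows about the device and the user's preference.
interface Capabilities {
  hasWebGPU: boolean; // e.g. "gpu" in navigator
  hasWebAssembly: boolean; // e.g. typeof WebAssembly === "object"
  allowModelDownload: boolean; // false if the user wants no model download
}

// Prefer the most natural voice the device can run.
function pickVoice(caps: Capabilities): Voice {
  if (!caps.allowModelDownload) return "browser-tts";
  if (caps.hasWebGPU) return "kokoro-hq";
  if (caps.hasWebAssembly) return "piper";
  return "browser-tts";
}
```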
What to do when the reading order is wrong
If the audio jumps between columns, reads footnote numbers mid-sentence, or repeats a header on every page, the PDF wasn't tagged well. A few fixes:
- Edit in the textarea. The extracted text is right there. Delete the repeating header, strip footnote markers, fix column order by cut-and-paste. Faster than it sounds for a chapter-length section.
- Try the publisher's HTML version. Many academic papers ship as both. The HTML usually has clean reading order, and Quick TTS reads HTML files directly.
- Re-export through a word processor. Open the PDF in Word, Pages, or LibreOffice, save as DOCX or ODT, then drop that in. The round-trip often produces cleaner reading order than the PDF itself.
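If you would rather script the header cleanup than do it by hand, the idea is simple: a line that shows up on most pages is probably a running header or footer. The sketch below is a hypothetical helper, not a Quick TTS feature. It assumes you have the text of each page as a separate string, treats digit runs as interchangeable so that "Page 3" and "Page 4" match, and uses an arbitrary 60% threshold.

```typescript
// Replace digit runs so "Page 3" and "Page 4" count as the same line.
function normalizeLine(line: string): string {
  return line.trim().replace(/\d+/g, "#");
}

// Remove lines that appear on at least `minShare` of the pages
// (and on at least two pages). Hypothetical helper for illustration.
function stripRepeatedLines(pages: string[], minShare = 0.6): string[] {
  const counts = new Map<string, number>();
  for (const page of pages) {
    // Count each distinct line once per page.
    const seen = new Set<string>();
    for (const line of page.split("\n")) {
      const key = normalizeLine(line);
      if (key.length > 0) seen.add(key);
    }
    seen.forEach((key) => counts.set(key, (counts.get(key) ?? 0) + 1));
  }
  const threshold = Math.max(2, Math.ceil(pages.length * minShare));
  return pages.map((page) =>
    page
      .split("\n")
      .filter((line) => (counts.get(normalizeLine(line)) ?? 0) < threshold)
      .join("\n")
  );
}
```

The digit normalization is a blunt instrument: a body line such as a numbered caption that recurs on most pages would be removed too, so skim the result before pressing play.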
Scanned PDFs (image-of-text): now read in-browser with OCR
A scanned PDF is a stack of page-sized images with no text layer — to pdf.js, literally no text to extract, just pixels. Historically that meant you had to run OCR (optical character recognition) somewhere else first. Quick TTS now does that step for you, in the browser.
When you open a PDF, Quick TTS reads the embedded text layer page by page as before. For any page that comes back empty — a scan — it falls back to OCR automatically: the page is rendered to a canvas and run through Tesseract.js, the open-source OCR engine compiled to WebAssembly. It even auto-detects sideways scans (a landscape slide deck exported onto portrait pages) by trying all four rotations on the first scanned page and reusing the orientation that reads cleanest. No upload, no second tool, no searchable-PDF round-trip.
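The two decisions in that fallback (does this page need OCR, and which rotation read cleanest) can be sketched as small pure functions. This is an illustration under stated assumptions, not Quick TTS's implementation: "reads cleanest" is interpreted here as the highest recognition confidence, the kind of 0 to 100 score Tesseract.js reports with its results, and the `OcrAttempt` shape and function names are hypothetical.

```typescript
type Rotation = 0 | 90 | 180 | 270;

// One OCR attempt on a page rendered at a given rotation.
// Hypothetical shape; confidence is assumed to be a 0-100 score.
interface OcrAttempt {
  rotation: Rotation;
  text: string;
  confidence: number;
}

// A page needs OCR when its text layer is empty or whitespace only.
function needsOcr(textLayer: string): boolean {
  return textLayer.trim().length === 0;
}

// Pick the attempt with the highest confidence; ties keep the earlier one.
function bestAttempt(attempts: OcrAttempt[]): OcrAttempt {
  if (attempts.length === 0) throw new Error("no OCR attempts to choose from");
  return attempts.reduce((best, cur) =>
    cur.confidence > best.confidence ? cur : best
  );
}
```

Running all four rotations only on the first scanned page, then reusing the winner, keeps the cost of orientation detection to a few extra recognitions per document instead of per page.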
A few honest caveats so you know what to expect:
- It's slower than reading a text layer. The OCR engine is a one-time ~3 MB download the first time a scan is detected (cached after), and each scanned page takes a second or two to recognize. Born-digital PDFs still read instantly — OCR only kicks in for the pages that need it.
- English text for now. The bundled OCR model is English (eng). It will read English scans well; other scripts aren't covered yet. The 16-locale interface is unrelated to which languages OCR can read.
- Accuracy tracks scan quality. A clean 300 DPI scan reads very well; a skewed phone photo of a page or a faint fax will have errors. For high-volume or archival OCR, a dedicated pipeline (OCRmyPDF, Adobe Acrobat) still produces a cleaner searchable PDF, but for "I just want to hear this scan read aloud," you no longer have to leave the page.
To tell what you have before you start: open the PDF in any viewer and try to select a sentence with the cursor. If you can highlight text, Quick TTS reads the text layer directly. If the cursor selects a rectangular image region, it's scanned — and OCR takes over automatically. Either way, it stays in your browser.
Privacy: your PDF never leaves your browser
This is the part most "PDF to MP3" sites don't say out loud. Server-based converters require you to upload your PDF, which means a copy of the file, including any internal markings, watermarks, redactions, or metadata it carries, ends up on their server, where it may be logged and retained.
Quick TTS doesn't upload anything. The PDF is parsed locally by pdf.js, and audio comes either from your operating system's TTS engine or from neural model files cached in your browser. The only outbound network requests are one-time downloads: the AI voice models from jsDelivr and Hugging Face and, the first time a scan is detected, the OCR engine. After that, the tool works offline. That matters most for the PDFs you don't want on a stranger's server: contracts, medical records, internal company documents, legal filings.
Limitations worth knowing
- Scanned PDFs are OCR'd in-browser (covered above) — slower than a text layer, English-only for now, and accuracy tracks scan quality.
- Footnotes, page headers, and running footers often read inline because the PDF doesn't tag them as separate from the body. Edit in the textarea before playing if it bothers you.
- Two-column layouts in untagged PDFs may read in the wrong order. Re-exporting via Word or grabbing the publisher's HTML version usually fixes it.
- Mathematical formulas, code blocks, and tabular data read as literal punctuation ("equals sign, x, plus..."). Skip those manually if they matter.
- Forms and signed PDFs may have field text stored separately from the body. The body reads fine; the form values may not appear.
Try it
Open Quick TTS, drop a PDF, pick Kokoro HQ (or Piper if your browser doesn't support WebGPU), and press play. Nothing to install, nothing to sign up for, nothing that phones home.
For specific questions — file size limits, AI voice details, commercial use — the FAQ covers them. The guide walks through nine other use cases. For ebooks, the EPUB-to-speech post handles that side. If you're weighing this against paid tools, the comparison page lays out where Quick TTS wins and where NaturalReader and Speechify do.