PDF → Markdown

heading inference from font-size jumps

Drop a PDF and get a Markdown file back. Headings are inferred by comparing each line's font size against the document's body-text size — bigger lines become # / ## / ### headings. Bullet markers are normalised to -.

When to use this tool

Use this when you want the prose and some structure — not just a wall of text. The common case is moving a long PDF (an article, a manual, a whitepaper) into a wiki, a blog, or a Markdown-based note system. The result needs cleanup, but it's a much better starting point than plain extracted text.

Step by step

Drop the PDFs. Each input produces a .md file.
Click "Convert & download". The tool measures font sizes, infers headings from the larger ones, and writes Markdown.
Open the .md file in your preferred editor and clean up the rough edges (false-positive headings, mis-ordered columns, broken hyphens).
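
For the broken-hyphen part of that cleanup, a small script can do a first pass. The following is a minimal Python sketch, not part of the tool. It assumes a split word survives in the .md file as a hyphen followed directly by a line break, and `join_broken_hyphens` is a hypothetical helper name.

```python
import re


def join_broken_hyphens(text: str) -> str:
    """Rejoin words split by a hyphen at a line break, e.g. "conver-\\nsion"."""
    # Join only when a lowercase letter sits on both sides of "-\n", so list
    # items ("- item") and hyphens next to digits or capitals are left alone.
    # Caveat: a genuine compound that happened to break at its hyphen
    # ("well-\nknown") is merged too, so review the result.
    return re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
```

Run it over the file contents before the manual pass. It only removes the hyphen and the line break, so everything else in the file is left untouched.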

How heading inference works

The tool measures every text fragment's font size, takes the median across the document as the "body" size, and compares each line's largest fragment against that:

1.6× body size or bigger → # H1
1.3× body size or bigger → ## H2
1.15× body size or bigger → ### H3
Everything else → plain paragraph
Lines starting with bullet markers (•, ·, -, *) → unordered list items
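
The rules above can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's actual code: the input shape (each line as a list of `(text, font_size)` fragments) and the names `classify` and `to_markdown` are hypothetical, and each ratio is treated as a lower bound ("this size or bigger").

```python
from statistics import median

# Hypothetical input shape: each line is a list of (text, font_size) fragments.
Line = list[tuple[str, float]]

BULLETS = ("•", "·", "-", "*")


def classify(line_size: float, body_size: float) -> str:
    """Return the Markdown heading prefix for a line, or "" for body text."""
    ratio = line_size / body_size
    if ratio >= 1.6:
        return "# "
    if ratio >= 1.3:
        return "## "
    if ratio >= 1.15:
        return "### "
    return ""


def to_markdown(lines: list[Line]) -> str:
    # Body size = median font size over every fragment in the document.
    body = median(size for line in lines for _, size in line)
    out = []
    for line in lines:
        text = " ".join(t for t, _ in line).strip()
        if not text:
            continue
        # Compare the line's largest fragment against the body size.
        prefix = classify(max(size for _, size in line), body)
        if not prefix and text[0] in BULLETS:
            # Normalise any bullet marker to "- ".
            text = "- " + text[1:].lstrip()
        out.append(prefix + text)
    return "\n\n".join(out)
```

Because the body size is a median, a handful of large headings does not shift it. A document made up mostly of large text (a slide deck, for instance) would shift it, and fewer lines would then qualify as headings.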

Common use cases

Wiki / Confluence migration. Convert a stack of PDFs into Markdown, paste into the wiki one page at a time.
Blog drafts from research. Get a paper's content into a Markdown editor with the headings already marked up.
Notes from textbooks. Quick way to pull a chapter into your note system for editing.
Static-site content. Migrating a PDF library to a Hugo / Jekyll / Astro site.
LLM-friendly chunking. Markdown with headings is easier for an LLM to navigate than raw text.
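
For the chunking case, the inferred headings give natural split points. Here is a minimal Python sketch, assuming the output uses only #, ## and ### headings as described on this page; `chunk_by_heading` and the dictionary keys are hypothetical names chosen for illustration.

```python
import re

# Matches the three heading levels this page describes: "# ", "## ", "### ".
HEADING = re.compile(r"^(#{1,3}) +(.*)$")


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per heading, keeping level and title."""
    # The first chunk collects any text that appears before the first heading.
    chunks = [{"level": 0, "title": "", "body": []}]
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            chunks.append({"level": len(m.group(1)), "title": m.group(2).strip(), "body": []})
        else:
            chunks[-1]["body"].append(line)
    result = []
    for c in chunks:
        body = "\n".join(c["body"]).strip()
        if c["title"] or body:  # drop an empty leading chunk
            result.append({"level": c["level"], "title": c["title"], "body": body})
    return result
```

The sketch does not look inside code fences, so a line such as "# comment" in fenced code would be treated as a heading. Review the headings first, since a falsely promoted caption would start a chunk of its own.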

Common mistakes

Trusting the headings. Heading inference is a heuristic — always review and clean up. Some bold-but-not-bigger headings will be missed; some larger captions will be falsely promoted.
Multi-column layouts. Two-column papers may interleave text from both columns. Manual reflow is sometimes required.
Page boundaries mid-paragraph. The tool inserts page-break markers but doesn't try to bridge paragraphs across pages, so review the text around those markers.

FAQ

How is this different from PDF → Text?

PDF → Text gives you a flat text file. PDF → Markdown adds heading detection and bullet-list normalisation. Pick text if you want raw words; pick Markdown if you want minimal structure to start editing from.

Will it preserve tables?

Tables come out as runs of words (no Markdown table syntax). Recreating tables would need column detection, which is hard in pure browser JS. Use a dedicated tool (Tabula, pdfplumber) for serious table extraction.
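
Once a dedicated tool has recovered the rows, turning them into Markdown is the easy part. Below is a minimal Python sketch that renders rows as a GitHub-style pipe table. It assumes the first row is the header and that empty cells may arrive as None; `rows_to_markdown` is a hypothetical helper, not part of this tool.

```python
from typing import Optional


def rows_to_markdown(rows: list[list[Optional[str]]]) -> str:
    """Render a list of rows (first row = header) as a Markdown pipe table."""

    def cell(value: Optional[str]) -> str:
        # Treat None as empty; escape pipes and flatten newlines so the
        # cell cannot break the table row.
        return (value or "").replace("|", "\\|").replace("\n", " ").strip()

    # Pad short rows so every row has the same number of columns.
    width = max(len(r) for r in rows)
    padded = [[cell(v) for v in r] + [""] * (width - len(r)) for r in rows]
    header, *body = padded
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)
```

Rows in this shape (a list of lists of strings) are what pdfplumber's `page.extract_tables()` returns for each detected table, so its output should slot in directly. Pipe tables are a common Markdown extension rather than core syntax, so check that your target renderer supports them.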

Are images extracted into the Markdown?

No — images are skipped. Markdown lacks a way to embed image data inline cleanly, and pulling images out of PDFs is a separate operation.

Does it handle code blocks?

Not specifically. Code set in monospaced fonts may come through as plain text, so you need to wrap it in code fences (` ``` `) by hand.

Is there a "high-fidelity" mode?

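Not in this tool. For higher fidelity, try a converter that does layout analysis, commercial or open-source. Note that pandoc can write PDF but cannot read it as an input format, so it only helps after another tool has converted the PDF to something it does read, such as HTML or DOCX.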