pdftoolskit.org
PDF (Portable Document Format) utilities, in the browser

PDF → Markdown

heading inference from font-size jumps

Drop a PDF and get a Markdown file back. Headings are inferred by comparing each line's font size against the document's body-text size — bigger lines become # / ## / ### headings. Bullet markers are normalised to -.

    When to use this tool

    Use this when you want the prose and some structure — not just a wall of text. The common case is moving a long PDF (an article, a manual, a whitepaper) into a wiki, a blog, or a Markdown-based note system. The result needs cleanup, but it's a much better starting point than plain extracted text.

    Step by step

    1. Drop the PDFs. Each input produces a .md file.
    2. Click "Convert & download". The tool measures font sizes, infers headings from the larger ones, and writes Markdown.
    3. Open the .md file in your preferred editor and clean up the rough edges (false-positive headings, mis-ordered columns, broken hyphens).
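One of the cleanup chores in step 3, rejoining words that were hyphenated across a line break, is easy to script. The sketch below is not part of the tool; `rejoin_hyphenated` is a hypothetical helper, and the heuristic is deliberately simple.

```python
import re

# A word split across a line break: letters, a hyphen, a newline,
# then a continuation that starts in lowercase.
_BROKEN_HYPHEN = re.compile(r"([A-Za-z]+)-\n([a-z]+)")


def rejoin_hyphenated(text: str) -> str:
    """Join words that were hyphenated across a line break.

    This cannot tell a line-break hyphen from a genuine compound that
    happens to break at its hyphen ("browser-" / "based"), which gets
    merged too, so review the result.
    """
    return _BROKEN_HYPHEN.sub(r"\1\2", text)


sample = "The tool infers head-\nings from font sizes.\n- first item\n"
print(rejoin_hyphenated(sample))
```

List items are left alone because their `-` is followed by a space, not a line break.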

    How heading inference works

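    The tool measures every text fragment's font size and takes the median across the document as the "body" size. It then compares the largest fragment on each line against that body size. Lines whose largest fragment is bigger than the body size become #, ##, or ### headings.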
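A minimal sketch of that idea, written in Python for brevity (the tool itself runs in the browser). The ratio thresholds in `HEADING_THRESHOLDS` and the `(text, size)` fragment shape are assumptions made for illustration, not the tool's actual values.

```python
from statistics import median

# Hypothetical thresholds: ratio of a line's largest font size to the body size.
# Ordered from largest to smallest; the first match wins.
HEADING_THRESHOLDS = [(1.8, "#"), (1.4, "##"), (1.15, "###")]


def body_size(lines):
    """Median font size over every fragment in the document."""
    return median(size for line in lines for _, size in line)


def heading_prefix(line, body):
    """Return '#', '##', '###', or '' for a line of (text, size) fragments."""
    ratio = max(size for _, size in line) / body
    for threshold, prefix in HEADING_THRESHOLDS:
        if ratio >= threshold:
            return prefix
    return ""


def to_markdown(lines):
    """Join each line's fragments and prepend the inferred heading marker."""
    body = body_size(lines)
    out = []
    for line in lines:
        text = " ".join(fragment for fragment, _ in line)
        prefix = heading_prefix(line, body)
        out.append(f"{prefix} {text}" if prefix else text)
    return "\n".join(out)


doc = [
    [("Installation guide", 24.0)],
    [("The converter runs in any", 11.0), ("modern browser.", 11.0)],
    [("Requirements", 16.0)],
    [("You need a PDF", 11.0), ("and a browser.", 11.0)],
    [("Limits", 13.0)],
    [("Large files are slow.", 11.0)],
]
print(to_markdown(doc))
```

Using the median rather than the mean keeps a few very large title fragments from dragging the body size upward. It also explains the false-positive headings: any line that is slightly larger than body text, such as a pull quote or a caption, can cross a threshold.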

    Common use cases

    Common mistakes

    FAQ

    How is this different from PDF → Text?

    PDF → Text gives you a flat text file. PDF → Markdown adds heading detection and bullet-list normalisation. Pick text if you want raw words; pick Markdown if you want minimal structure to start editing from.
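As a rough picture of the bullet-list normalisation mentioned here, the following sketch rewrites a leading bullet glyph as `-`. The set of glyphs in `_BULLET` is an assumption; real PDFs may use others, and the tool's own list is not documented on this page.

```python
import re

# Hypothetical set of bullet glyphs, matched only at the start of a line
# and only when followed by whitespace.
_BULLET = re.compile(r"^(\s*)[•◦▪‣●○∙·*+]\s+")


def normalise_bullet(line: str) -> str:
    """Rewrite a leading bullet glyph as '- ', keeping any indentation."""
    return _BULLET.sub(r"\1- ", line)


for raw in ["• First point", "  ◦ Nested point", "- Already fine", "Plain text"]:
    print(normalise_bullet(raw))
```

Requiring whitespace after the glyph keeps Markdown emphasis such as `**bold**` from being mistaken for a bullet.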

    Will it preserve tables?

    Tables come out as runs of words (no Markdown table syntax). Recreating tables would need column detection, which is hard in pure browser JS. Use a dedicated tool (Tabula, pdfplumber) for serious table extraction.
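If you do extract a table with a dedicated tool, turning the rows into Markdown is the easy part. The sketch below assumes the rows arrive as a list of lists with the header first, which is, as far as I know, the shape pdfplumber's `page.extract_table()` returns; `rows_to_markdown_table` is a hypothetical helper, and it emits pipe-table syntax, which is a GitHub Flavored Markdown extension rather than core Markdown.

```python
def rows_to_markdown_table(rows):
    """Render a list of rows (first row = header) as a Markdown pipe table.

    Cells may be None (extractors often report empty cells that way);
    they are rendered as empty strings.
    """
    def clean(cell):
        text = "" if cell is None else str(cell)
        # Pipes and newlines inside a cell would break the table syntax.
        return text.replace("|", "\\|").replace("\n", " ").strip()

    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of columns.
    padded = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


rows = [["Name", "Pages"], ["manual.pdf", "12"], ["notes.pdf", None]]
print(rows_to_markdown_table(rows))
```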

    Are images extracted into the Markdown?

    No, images are skipped. Markdown references images by path or URL rather than embedding the image data, and pulling images out of a PDF is a separate operation.

    Does it handle code blocks?

    Not specifically. Code set in a monospaced font may come through as plain text, so you will need to add the ` ``` ` fences by hand.

    Is there a "high-fidelity" mode?

    Not in this tool. For higher fidelity, use a converter that does layout analysis, such as a commercial one. pandoc is not an option here: it can write PDF but does not accept PDF as an input format.