Extract Images from PDF
Pull every embedded image out of a PDF and download them individually or as a single .zip. Runs in your browser — no upload.
- Drop a PDF or click "browse for one".
- Click "Extract images" — a thumbnail grid appears.
- Click any thumbnail to download that image, or "Download all as .zip" for a bundle.
- Images are saved as PNG; the original image data is decoded onto a canvas first.
What does it do?
Walks every page of the PDF, locates every paintImageXObject operation, and extracts the underlying image bitmap. Each extracted image is normalized into a PNG via canvas — JPEG sources lose their original compression, but the output pixels are the same as what the PDF rendered. Images stored in unsupported codecs (JBIG2, CCITT for fax, JPX for JPEG 2000) are reported in the count but not decoded — those would require dedicated codec libraries beyond pdfjs-dist.
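The page walk described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: `OperatorListLike` and `findImageObjectNames` are hypothetical names, the operator list is assumed to have the parallel `fnArray`/`argsArray` shape that pdfjs-dist returns from `page.getOperatorList()`, and the first argument of each paint operation is assumed to be the image's object name.

```typescript
// Structural stand-in for a pdfjs-dist operator list (assumed shape):
// fnArray holds opcodes, argsArray holds the arguments for each opcode.
interface OperatorListLike {
  fnArray: ArrayLike<number>;
  argsArray: ArrayLike<unknown[] | null>;
}

// Collects the object name referenced by every image-paint operation,
// in paint order. An image painted twice is listed twice.
function findImageObjectNames(
  ops: OperatorListLike,
  paintImageCode: number, // assumed to be OPS.paintImageXObject from pdfjs-dist
): string[] {
  const names: string[] = [];
  for (let i = 0; i < ops.fnArray.length; i++) {
    if (ops.fnArray[i] !== paintImageCode) continue;
    const args = ops.argsArray[i];
    // Assumption: the first argument is the image's object name.
    const name = args ? args[0] : undefined;
    if (typeof name === "string") names.push(name);
  }
  return names;
}
```

In a real run, `paintImageCode` would presumably be `OPS.paintImageXObject` from pdfjs-dist, and each returned name would then be looked up in the page's object store to obtain the decoded bitmap.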
Common issues
PDF image extraction is fundamentally codec-dependent. Most PDFs work — these are the patterns where extraction may produce surprising results.
- Unsupported image codecs. JBIG2 (some scanned documents), CCITT (fax-style scans), and JPEG 2000 (JPX) are not decoded. The status line reports how many were skipped. To extract those, render via /pdf-to-images instead — that rasterizes the whole page including the image.
- Original JPEG encoding lost. Images are exported as PNG to preserve transparency and avoid double-compression artifacts. If your source was a JPEG embedded in the PDF, the PNG output is larger but pixel-identical to what pdfjs-dist decoded.
- Inline images missed. Some PDFs use inline image data (BI/ID/EI operators) instead of XObjects — typically very small images. v1 does not extract these. Most photos and screenshots are XObjects and are extracted correctly.
- Same image, multiple times. PDFs often reference one image XObject from multiple pages. v1 extracts the image once per paintImageXObject call, so a repeated logo appears once per use, each time under its own filename. De-duplicate by content hash if needed.
- Encrypted PDFs. Password-protected PDFs cannot be opened without the password. Run them through /pdf-unlock first if you have the owner password.
- Very large PDFs. Each extracted image lives in browser memory until you clear or navigate away. PDFs with hundreds of high-resolution images can use hundreds of megabytes of RAM. Download the .zip promptly and click Clear when done.
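For the repeated-image case in the list above, one way to de-duplicate after extraction is to hash the decoded pixels. The sketch below is an assumption-laden illustration, not part of the tool: `ExtractedImage`, `fnv1a`, and `dedupeImages` are hypothetical names, and it uses a simple 32-bit FNV-1a hash with a byte-for-byte comparison as a safeguard against collisions.

```typescript
// Hypothetical shape for an extracted image held in memory.
interface ExtractedImage {
  name: string; // e.g. "page1-img1.png"
  width: number;
  height: number;
  pixels: Uint8Array; // decoded pixel bytes
}

// 32-bit FNV-1a hash over a byte array.
function fnv1a(bytes: Uint8Array): number {
  let hash = 0x811c9dc5;
  bytes.forEach((b) => {
    hash ^= b;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  });
  return hash;
}

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}

// Keeps the first occurrence of each distinct bitmap, preserving order.
function dedupeImages(images: ExtractedImage[]): ExtractedImage[] {
  const buckets = new Map<string, ExtractedImage[]>();
  const unique: ExtractedImage[] = [];
  for (const img of images) {
    const key = `${img.width}x${img.height}:${fnv1a(img.pixels)}`;
    const bucket = buckets.get(key) ?? [];
    // Confirm with a full comparison so a hash collision cannot drop an image.
    if (!bucket.some((seen) => sameBytes(seen.pixels, img.pixels))) {
      bucket.push(img);
      unique.push(img);
    }
    buckets.set(key, bucket);
  }
  return unique;
}
```

Hashing the downloaded PNG files instead would also work, provided the encoder produces identical bytes for identical pixels.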
Frequently asked questions
Why are my images PNG, not JPG?
PNG preserves transparency and avoids re-encoding artifacts. The pixels are the same as the original; the file is larger because PNG is lossless. To save space, run the result through /image-compress in WebP mode.
How do I tell which page each image came from?
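Filenames follow the pattern `pageN-imgM.png`, where N is the source page number and M is a sequence number within that page. Sort by filename to see the order they appear in the PDF; use a natural (numeric-aware) sort, since a plain alphabetical sort places page10 before page2.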
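If you post-process the downloaded files in a script, the naming pattern can be parsed and sorted numerically. This is a small sketch under the assumption that names follow `pageN-imgM.png` exactly, with no zero padding; the function names are illustrative only.

```typescript
// Builds a filename in the documented pageN-imgM.png pattern.
function imageFilename(page: number, index: number): string {
  return `page${page}-img${index}.png`;
}

// Parses a filename back into its page and sequence numbers,
// or returns null if the name does not match the pattern.
function parseImageFilename(name: string): { page: number; index: number } | null {
  const m = /^page(\d+)-img(\d+)\.png$/.exec(name);
  return m ? { page: Number(m[1]), index: Number(m[2]) } : null;
}

// Comparator that orders by page, then by sequence number.
// Names that do not match the pattern fall back to a string comparison.
function compareImageFilenames(a: string, b: string): number {
  const pa = parseImageFilename(a);
  const pb = parseImageFilename(b);
  if (!pa || !pb) return a.localeCompare(b);
  return pa.page - pb.page || pa.index - pb.index;
}
```

For example, `names.sort(compareImageFilenames)` keeps `page2-img1.png` ahead of `page10-img1.png`, which a default string sort would not.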
Will encrypted images come out scrambled?
Image data inside an unencrypted PDF is not separately encrypted — it decodes normally. If the PDF itself is encrypted, the tool cannot read it at all (see the encrypted-PDFs note above).
Why is the count higher than the visible images?
Some PDFs use multiple image XObjects per visible image (e.g., a soft-mask alpha channel stored as a separate grayscale image). v1 extracts each one — the soft-mask is what makes the main image look right when composited, but on its own it appears as a black-and-white silhouette.
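To see how a soft mask relates to its main image, here is a rough sketch of recombining the two after extraction. It assumes the main image has been read back as packed RGB bytes and the mask as one grayscale byte per pixel at the same dimensions (real PDFs may store the mask at a different resolution); `applySoftMask` is a hypothetical helper, not something the tool exposes.

```typescript
// Combines packed RGB pixels with a grayscale soft mask into RGBA,
// using each mask value as the alpha channel (0 = transparent, 255 = opaque).
function applySoftMask(
  rgb: Uint8Array,
  mask: Uint8Array,
  width: number,
  height: number,
): Uint8ClampedArray {
  const pixelCount = width * height;
  if (rgb.length !== pixelCount * 3 || mask.length !== pixelCount) {
    throw new Error("image and mask sizes do not match the given dimensions");
  }
  const rgba = new Uint8ClampedArray(pixelCount * 4);
  for (let p = 0; p < pixelCount; p++) {
    rgba[p * 4] = rgb[p * 3]; // red
    rgba[p * 4 + 1] = rgb[p * 3 + 1]; // green
    rgba[p * 4 + 2] = rgb[p * 3 + 2]; // blue
    rgba[p * 4 + 3] = mask[p]; // alpha taken from the soft mask
  }
  return rgba;
}
```

In a browser, the returned array could be wrapped in an `ImageData` and drawn to a canvas with `putImageData` to get the composited result.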
Is my PDF uploaded?
No. Everything runs in your browser: your PDF is parsed by pdfjs-dist and image bitmaps are rendered via canvas, all client-side. No network requests are made.
How big a PDF can I extract from?
Up to about 100 MB of PDF before the browser starts to feel sluggish. The hard limit is your tab's memory: if extraction runs out of memory, split the PDF via /pdf-split and run each section separately.
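As a rough guide to memory use, a decoded canvas bitmap takes 4 bytes per pixel (RGBA), regardless of how small the image was when compressed inside the PDF. The sketch below estimates that footprint; the function names are illustrative, and the figure is a lower bound because the browser may also hold the encoded PNG and other copies.

```typescript
// Estimates the bytes needed to hold every image as a decoded RGBA bitmap.
function estimateDecodedBytes(images: Array<{ width: number; height: number }>): number {
  return images.reduce((sum, img) => sum + img.width * img.height * 4, 0);
}

// Converts a byte count to mebibytes.
function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}

// Example: one 1024x1024 image plus one 512x512 image.
const exampleBytes = estimateDecodedBytes([
  { width: 1024, height: 1024 },
  { width: 512, height: 512 },
]);
console.log(`${toMiB(exampleBytes)} MiB`); // prints "5 MiB"
```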