Function: deduplicateImages()
deduplicateImages(doc): DeduplicationReport
Defined in: src/assets/image/deduplicateImages.ts:110
Deduplicate identical images in a PDF document.
Scans all image XObjects, hashes their compressed stream data (plus dimensions and filter), and replaces duplicate references in page resource dictionaries with the canonical (first-seen) copy.
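The matching step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the `ImageXObject` shape, the `imageKey` and `findDuplicates` helpers, and the choice of SHA-256 are all assumptions made for the example.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical, simplified stand-in for an image XObject (not the library's type).
interface ImageXObject {
  ref: number;      // object number in the registry
  width: number;
  height: number;
  filter: string;   // e.g. 'DCTDecode' or 'FlateDecode'
  data: Uint8Array; // compressed stream bytes, hashed without decoding
}

// Build a key from dimensions, filter, and a digest of the compressed bytes.
// SHA-256 is an assumption; the source does not name the hash function.
function imageKey(img: ImageXObject): string {
  const digest = createHash('sha256').update(img.data).digest('hex');
  return `${img.width}x${img.height}:${img.filter}:${digest}`;
}

// Map each duplicate's ref to the ref of the first-seen (canonical) copy.
function findDuplicates(images: ImageXObject[]): Map<number, number> {
  const canonicalByKey = new Map<string, number>();
  const replacements = new Map<number, number>();
  for (const img of images) {
    const key = imageKey(img);
    const canonical = canonicalByKey.get(key);
    if (canonical === undefined) {
      canonicalByKey.set(key, img.ref); // first occurrence becomes canonical
    } else {
      replacements.set(img.ref, canonical); // later occurrences are duplicates
    }
  }
  return replacements;
}
```

Including the dimensions and filter in the key means two streams with identical bytes but different metadata are treated as distinct images.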
This operation modifies the document in place. Duplicate streams are not removed from the object registry; they become unreferenced and are omitted on save only if the writer supports garbage collection.
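To illustrate what "unreferenced" means here, the sketch below rewrites references in a simplified page model and then lists the registry entries that no page points at any more. The `PageResources` shape and both helper functions are hypothetical and exist only for this example; the library's real data structures are not shown in this page.

```typescript
// Hypothetical page model: XObject resource names mapped to object numbers.
interface PageResources {
  xObjects: Record<string, number>;
}

// Point duplicate references at their canonical object, mutating pages in place.
// Returns the number of references that were rewritten.
function rewriteReferences(
  pages: PageResources[],
  replacements: Map<number, number>,
): number {
  let rewritten = 0;
  for (const page of pages) {
    for (const [name, ref] of Object.entries(page.xObjects)) {
      const canonical = replacements.get(ref);
      if (canonical !== undefined) {
        page.xObjects[name] = canonical;
        rewritten++;
      }
    }
  }
  return rewritten;
}

// Registry entries that no page references; a garbage-collecting writer
// could skip these on save, while a writer without it would still emit them.
function unreferenced(registry: number[], pages: PageResources[]): number[] {
  const live = new Set<number>();
  for (const page of pages) {
    for (const ref of Object.values(page.xObjects)) live.add(ref);
  }
  return registry.filter((ref) => !live.has(ref));
}
```

The duplicate stays in the registry after the rewrite, which is why any size reduction depends on how the document is written out.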
Parameters
doc
A parsed PdfDocument (from loadPdf()).
Returns
A report summarizing deduplication results.
Example
ts
import { loadPdf, deduplicateImages } from 'modern-pdf-lib';
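// pdfBytes is assumed to hold the raw bytes of the source PDF.
const doc = await loadPdf(pdfBytes);
// Mutates doc in place and returns a DeduplicationReport.
const report = deduplicateImages(doc);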
console.log(`Removed ${report.duplicatesRemoved} duplicate images`);
console.log(`Saved ~${(report.bytesSaved / 1024).toFixed(0)} KB`);
const optimizedBytes = await doc.save();