
modern-pdf-lib



Function: deduplicateImages()

deduplicateImages(doc): DeduplicationReport

Defined in: src/assets/image/deduplicateImages.ts:110

Deduplicate identical images in a PDF document.

Scans all image XObjects, hashes their compressed stream data (plus dimensions and filter), and rewrites page resource dictionaries so that every reference to a duplicate points at the canonical (first-seen) copy.
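The scan, hash, and remap steps described above can be sketched as follows. This is a minimal illustration over a simplified in-memory model, not the library's code: `ImageXObject`, `Page`, `SketchReport`, `imageKey`, and `dedupeSketch` are hypothetical names, SHA-256 from `node:crypto` is an assumed hash choice, and counting `bytesSaved` as the total length of each redundant stream is an assumption about what the real report measures.

```ts
import { createHash } from "node:crypto";

// Hypothetical, simplified model of an image XObject (not modern-pdf-lib's own types).
interface ImageXObject {
  objectNumber: number;
  width: number;
  height: number;
  filter: string; // e.g. "/DCTDecode"
  data: Uint8Array; // compressed stream bytes, hashed without decoding
}

// Hypothetical page model: XObject resource name (e.g. "Im0") -> object number.
interface Page {
  xObjects: Map<string, number>;
}

interface SketchReport {
  duplicatesRemoved: number;
  bytesSaved: number;
}

// Key = SHA-256 over dimensions, filter, and the compressed bytes.
function imageKey(img: ImageXObject): string {
  const hash = createHash("sha256");
  hash.update(`${img.width}x${img.height}|${img.filter}|`);
  hash.update(img.data);
  return hash.digest("hex");
}

function dedupeSketch(images: ImageXObject[], pages: Page[]): SketchReport {
  const canonicalByKey = new Map<string, number>(); // key -> first-seen object number
  const remap = new Map<number, number>(); // duplicate -> canonical object number
  let bytesSaved = 0;

  for (const img of images) {
    const key = imageKey(img);
    const canonical = canonicalByKey.get(key);
    if (canonical === undefined) {
      canonicalByKey.set(key, img.objectNumber); // first seen becomes canonical
    } else {
      remap.set(img.objectNumber, canonical);
      bytesSaved += img.data.length; // assumed metric: size of each redundant stream
    }
  }

  // Rewrite page resource entries in place; the duplicate objects stay in the table.
  for (const page of pages) {
    for (const [name, objNum] of page.xObjects) {
      const target = remap.get(objNum);
      if (target !== undefined) page.xObjects.set(name, target);
    }
  }

  return { duplicatesRemoved: remap.size, bytesSaved };
}

// Demo: objects 10 and 11 match; 12 has the same bytes but different dimensions.
const pixels = new Uint8Array([1, 2, 3]);
const demoPages: Page[] = [
  { xObjects: new Map([["Im0", 10]]) },
  { xObjects: new Map([["Im0", 11], ["Im1", 12]]) },
];
const demoReport = dedupeSketch(
  [
    { objectNumber: 10, width: 2, height: 2, filter: "/FlateDecode", data: pixels },
    { objectNumber: 11, width: 2, height: 2, filter: "/FlateDecode", data: pixels.slice() },
    { objectNumber: 12, width: 4, height: 1, filter: "/FlateDecode", data: pixels.slice() },
  ],
  demoPages,
);
console.log(demoReport); // { duplicatesRemoved: 1, bytesSaved: 3 }
console.log(demoPages[1].xObjects.get("Im0")); // 10
```

Folding the dimensions and filter into the key keeps two streams with identical bytes but different metadata (object 12 above) from being merged. Because the first image seen for a key becomes canonical, the result depends on scan order, which matches the "first-seen" rule described above.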

This operation modifies the document in place. Duplicate streams are not removed from the object registry: they become unreferenced and are omitted on save only if the writer garbage-collects unreferenced objects.
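The garbage-collection caveat can be illustrated with a reachability pass: a writer that garbage-collects generally keeps only objects reachable from the document's root objects (such as the catalog), so streams orphaned by deduplication drop out. The sketch below assumes a hypothetical object graph; `ObjectGraph`, `reachableObjects`, and `unreferencedObjects` are illustrative names, and this is not modern-pdf-lib's writer.

```ts
// Hypothetical object table: object number -> object numbers it references.
type ObjectGraph = Map<number, number[]>;

// Mark phase: depth-first walk from the root objects (e.g. the catalog).
function reachableObjects(graph: ObjectGraph, roots: number[]): Set<number> {
  const live = new Set<number>();
  const stack = roots.slice();
  while (stack.length > 0) {
    const n = stack.pop()!;
    if (live.has(n)) continue; // already visited; also handles reference cycles
    live.add(n);
    for (const ref of graph.get(n) ?? []) {
      if (!live.has(ref)) stack.push(ref);
    }
  }
  return live;
}

// Sweep: objects a garbage-collecting writer would omit on save.
function unreferencedObjects(graph: ObjectGraph, roots: number[]): number[] {
  const live = reachableObjects(graph, roots);
  return Array.from(graph.keys())
    .filter((n) => !live.has(n))
    .sort((a, b) => a - b);
}

// Demo: catalog 1 -> page tree 2 -> pages 3 and 4, both now using image 10.
// Image 11 was the duplicate; after remapping nothing points at it.
const graph: ObjectGraph = new Map<number, number[]>([
  [1, [2]],
  [2, [3, 4]],
  [3, [10]],
  [4, [10]],
  [10, []],
  [11, []],
]);
console.log(unreferencedObjects(graph, [1])); // [ 11 ]
```

Without such a pass, object 11 would still be serialized, which is why the size reduction reported by deduplication only materializes in the saved file when the writer collects unreferenced objects.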

Parameters

doc

PdfDocument

A parsed PdfDocument (from loadPdf()).

Returns

DeduplicationReport

A report summarizing deduplication results.

Example

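```ts
import { readFile, writeFile } from 'node:fs/promises';
import { loadPdf, deduplicateImages } from 'modern-pdf-lib';

// 'input.pdf' and 'output.pdf' are placeholder paths.
const pdfBytes = new Uint8Array(await readFile('input.pdf'));
const doc = await loadPdf(pdfBytes);

// Per the signature above, this returns a DeduplicationReport directly, so no await is needed.
const report = deduplicateImages(doc);

console.log(`Removed ${report.duplicatesRemoved} duplicate images`);
console.log(`Saved ~${(report.bytesSaved / 1024).toFixed(0)} KB`);

// Duplicates are now unreferenced; they are dropped here only if the writer garbage-collects.
const optimizedBytes = await doc.save();
await writeFile('output.pdf', optimizedBytes);
```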

Released under the MIT License.