I wanted to do this (diff PDFs) recently with these requirements:
- ignore whitespace, line breaks, page breaks, etc.
- easily see when just a couple of words have changed, not just entire lines/paragraphs.
- color diff output
I installed pdftotext, wdiff, and colordiff, available in various package managers. (With MacPorts: sudo port install poppler wdiff colordiff)
Then:
wdiff <(pdftotext old.pdf -) <(pdftotext new.pdf -) | colordiff
Now I can see which words, nicely colored, have changed.
More details: http://philfreo.com/blog/how-to-view-a-color-diff-of-text-from-two-pdfs/
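The one-liner above relies on process substitution, which bash and zsh support but plain sh does not. As a rough sketch, the same pipeline can be wrapped in a function that uses temporary files instead. The name pdfwdiff and the temp-file layout are my own assumptions, not something from the linked post.

```shell
# pdfwdiff OLD.pdf NEW.pdf
# Hypothetical helper name; the tools and the pipeline are the ones used above.
pdfwdiff() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdfwdiff OLD.pdf NEW.pdf" >&2
        return 2
    fi
    # Fail early with a readable message if a tool is not installed.
    for tool in pdftotext wdiff colordiff; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "pdfwdiff: missing required tool: $tool" >&2
            return 127
        fi
    done
    tmpdir=$(mktemp -d) || return 1
    # Extract the text of each PDF into its own temp file.
    if pdftotext "$1" "$tmpdir/old.txt" && pdftotext "$2" "$tmpdir/new.txt"; then
        # Word-level diff, then colorize the result.
        wdiff "$tmpdir/old.txt" "$tmpdir/new.txt" | colordiff
        status=$?
    else
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

After sourcing this, pdfwdiff old.pdf new.pdf should print the same colored word diff as the one-liner.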
Variation:
Using dwdiff can produce slightly better results.
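For terminal output, dwdiff can colorize its own output with --color, so colordiff is not needed. A minimal sketch, assuming a hypothetical helper named pdfdwdiff that extracts the text to temporary files first:

```shell
# pdfdwdiff OLD.pdf NEW.pdf
# Hypothetical helper name, used here only for illustration.
pdfdwdiff() {
    old_txt=$(mktemp) || return 1
    new_txt=$(mktemp) || { rm -f "$old_txt"; return 1; }
    pdftotext "$1" "$old_txt"
    pdftotext "$2" "$new_txt"
    # --color makes dwdiff mark deleted and inserted words with colors.
    dwdiff --color "$old_txt" "$new_txt"
    status=$?
    rm -f "$old_txt" "$new_txt"
    # Pass dwdiff's exit status back to the caller.
    return "$status"
}
```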
I also wanted HTML output so this tiny script makes a basic web page with a bit of CSS.
bash pc-script.bash old.pdf new.pdf > q.html
Then open q.html with your web browser.
pc-script.bash file:
#!/bin/bash
OLD="$1"
NEW="$2"
cat <<EOF
<html><head><meta charset="UTF-8"/><title>Changes from $OLD to $NEW</title></head><style>
.plus { color: green; background: #E7E7E7; }
.minus { color: red; background: #D7D7D7; text-decoration: line-through; }
</style><body><h1>Changes from [ <span class="minus">$OLD</span> ] to [ <span class="plus">$NEW</span> ]</h1><pre>
EOF
dwdiff -i -A best -P \
    --start-delete='<span class="minus">' --stop-delete='</span>' \
    --start-insert='<span class="plus" >' --stop-insert='</span>' \
    <( pdftotext -enc UTF-8 -layout "$OLD" - ) \
    <( pdftotext -enc UTF-8 -layout "$NEW" - )
cat <<EOF
</pre></body></html>
EOF
An example of output can be seen here
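One caveat with the script: the extracted text goes into the page unescaped, so a literal <, > or & in the PDF text may be interpreted as HTML by the browser. A possible fix, sketched here with a hypothetical html_escape helper that is not part of the original script, is to escape the text before dwdiff adds its span tags:

```shell
# html_escape: hypothetical filter, reads text on stdin, writes escaped text.
html_escape() {
    # Escape & first so the entities produced by the later rules stay intact.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}
```

In pc-script.bash this would go inside each process substitution, for example <( pdftotext -enc UTF-8 -layout "$OLD" - | html_escape ).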