JPEG uses a “lossy” compression algorithm. That’s basically the secret of why a JPEG version of an image can be so much smaller than a PNG or TIFF; it’s because the JPEG throws away some of the information. This is not really a secret; it’s obvious from the algorithm, and it’s explicitly stated in nearly all articles about the format (for example, Wikipedia).
One of the main uses for JPEG is compressing images to be displayed on web sites. For that use, the quality doesn’t need to be too high (they’re being looked at full-size on the screen, not blown up, and not printed at high quality), and keeping the size small is fairly important. Also, if that’s the intended use of an image, you can look at the JPEG you’ve made of it and decide for yourself whether the quality is high enough (and if it isn’t, resave at a higher quality level).
The lossiness of JPEG isn’t too important in that application; the purpose is to make an image that satisfies human perception, and the human preparing the image can easily examine the result and decide if it is indeed satisfactory.
The trouble comes if somebody opens a JPEG image, edits it some, and then saves it again as a new JPEG. When this is done, the image is compressed again. It is widely asserted (and it is true) that each time this happens, a bit more damage is done to the image. In fact, it isn’t necessary to modify the image; simply uncompressing the JPEG into an image and recompressing that image into a new JPEG does additional damage to the image, despite the exact same algorithm being applied both times.
In a workflow where you modify and resave an image, you should avoid using any lossy compression mechanism for the intermediate saves (JPEG is by far the most widely used lossy compression scheme for still images).
The Experiment
I’m going to demonstrate this experimentally. I load an image from a JPEG, resave it as a new JPEG with the same settings, and then compare the two files. They are different.
Since there’s some possibility that the difference found between the files is in metadata rather than in the actual image, we’ll go a step further and do a visual comparison of the two copies of the image, using Photoshop, and show that they do indeed differ.
Generation 2
To be sure that both JPEGs are saved with identical settings, I loaded an image into IrfanView and then saved it as a progressive JPEG with quality 50%, under the name gen02.jpg.
Here’s that image:
Generation 3
I then loaded gen02.jpg into IrfanView and resaved it as a new JPEG, gen03.jpg, using the same settings as above.
Here is that image:
Difference
The files are definitely different. The Unix binary file compare program says so:
[annie]$ cmp gen02.jpg gen03.jpg
gen02.jpg gen03.jpg differ: byte 1449, line 4
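If cmp isn’t handy, the same check is easy to sketch in Python. This is only an illustration of what cmp reports, not how cmp is implemented; the function names first_difference and compare_files are my own, and the handling of files of unequal length is a simplification (cmp itself prints an EOF message in that case).

```python
from pathlib import Path
from typing import Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[Tuple[int, int]]:
    """Return (byte_offset, line_number) of the first mismatch, both 1-based,
    or None if the two byte strings are identical."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            # Like cmp, count the line as 1 plus the newline bytes seen so far.
            line = a.count(b"\n", 0, index) + 1
            return index + 1, line
    if len(a) != len(b):
        # Simplification: one input is a prefix of the other, so report the
        # first position past the end of the shorter one.
        shorter = min(len(a), len(b))
        line = a.count(b"\n", 0, shorter) + 1
        return shorter + 1, line
    return None


def compare_files(path_a: str, path_b: str) -> Optional[Tuple[int, int]]:
    """Read both files fully and compare them byte by byte."""
    return first_difference(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

Calling compare_files("gen02.jpg", "gen03.jpg") on the two files from this experiment should return a pair of numbers rather than None, since the files differ.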
The MD5 checksum algorithm also notices the differences:
[annie]$ md5sum gen02.jpg gen03.jpg
b2ca69d3215d5618142eb277e805fda8 gen02.jpg
63dde13daccf49e59ce7b9e4e76d0fb5 gen03.jpg
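The checksum comparison can also be done from Python’s standard hashlib module. This is a minimal sketch; md5_hex and md5_of_file are names I made up for the example, and the chunk size is an arbitrary choice.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If md5_of_file gives different strings for gen02.jpg and gen03.jpg, the files differ somewhere, which is all md5sum tells us too. It says nothing about where they differ or whether the difference is in the image data.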
However, JPEG files are complicated; it’s not beyond the bounds of possibility that the differences being detected are in metadata of some sort, rather than in the actual bits of the image. So let’s find a way to compare just the images. One way would be to save the images, and no metadata, in some simple format like BMP, and then compare those files. However, a more striking result is possible by subtracting the two images in Photoshop to produce a visual differences image.
I loaded both images into Photoshop. I then used “Image/Apply Image” with Subtract blending to subtract one image from the other.
The resulting difference image looked pretty much like a black rectangle, which would mean that the images were the same. However, close examination of the histogram showed that the pixels weren’t in fact all black; there was some very small amount of variation.
To make it more visible, I used auto levels on it, resulting in the following exaggerated differences image:
So, as is quite clear, the two images were not in fact identical. The simple act of recompressing gen02 into a new JPEG, which became gen03, caused further changes to the image, beyond what happened the first time it was saved as a JPEG.
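For anyone without Photoshop, the subtract-and-exaggerate procedure can be sketched on plain lists of 8-bit sample values. This is a rough stand-in under stated assumptions, not what Photoshop does internally: it takes the absolute difference rather than Photoshop’s Subtract blend, and it uses a simple min-max stretch in place of Auto Levels. The function names and the sample values in the usage note are invented for illustration, and decoding the JPEGs into sample lists is left to whatever image library you have.

```python
from collections import Counter
from typing import List, Sequence


def absolute_difference(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Per-sample absolute difference of two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of samples")
    return [abs(x - y) for x, y in zip(a, b)]


def histogram(values: Sequence[int]) -> Counter:
    """Count how often each value occurs; all-zero means identical images."""
    return Counter(values)


def stretch_contrast(values: Sequence[int]) -> List[int]:
    """Linearly rescale so the smallest value maps to 0 and the largest to 255."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat image has no contrast to stretch; return it unchanged.
        return list(values)
    return [round((v - lo) * 255 / (hi - lo)) for v in values]
```

With made-up samples [120, 121, 119, 200] and [120, 124, 118, 200], absolute_difference gives [0, 3, 1, 0], which would look almost black on screen, and stretch_contrast turns that into [0, 255, 85, 0], where the differences are easy to see. The histogram of the raw difference plays the same role as the Photoshop histogram above: any count at a nonzero value means the pixels are not identical.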