Learning Images : WebP
Originally developed by Google, WebP began as a lossy image format designed to supersede JPEG, capable of producing smaller files than JPEG-encoded images of equivalent quality. Later updates to the format introduced options like lossless compression, PNG-like alpha channel transparency, and GIF-like animation, all of which can be used alongside JPEG-style lossy compression. WebP is an incredibly versatile format.
WebP's lossy compression algorithm is based on the method used by the VP8 video codec to compress keyframes in videos. At a high level, it resembles JPEG encoding: WebP operates in "blocks" rather than individual pixels and maintains a similar division between luminance and chrominance. WebP's luminance blocks are 16x16 and chrominance blocks are 8x8, with these "macroblocks" further subdivided into 4x4 sub-blocks.
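To make the block structure concrete, here is a minimal sketch of the arithmetic, assuming the encoder pads the image up to a whole number of 16x16 macroblocks. The function names are ours, for illustration only, and the 1676x1418 dimensions are those of the butterfly image that appears in the cwebp output later in this article.

```python
MACROBLOCK = 16      # luminance block edge, in pixels
CHROMA_BLOCK = 8     # chrominance block edge, in pixels
SUB_BLOCK = 4        # sub-block edge, in pixels


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks needed to cover the image."""
    # Ceiling division: a partially covered edge still needs a full macroblock.
    cols = -(-width // MACROBLOCK)
    rows = -(-height // MACROBLOCK)
    return cols, rows


def sub_blocks_per_macroblock():
    """Count 4x4 sub-blocks: one luminance plane plus two chrominance planes."""
    luma = (MACROBLOCK // SUB_BLOCK) ** 2
    chroma = 2 * (CHROMA_BLOCK // SUB_BLOCK) ** 2
    return luma, chroma


cols, rows = macroblock_grid(1676, 1418)
print(cols, rows, cols * rows)        # 105 89 9345
print(sub_blocks_per_macroblock())    # (16, 8)
```

The total of 9345 agrees with the macroblock count that cwebp reports for the same image.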
Two fundamental features that distinguish WebP from JPEG are "block prediction" and "adaptive block quantization."
Block Prediction
Block prediction refers to the process of predicting the content of each chrominance and luminance block based on the values of its surrounding blocks, particularly those above and to the left. As you might imagine, the algorithms performing this task are quite complex, but in simple terms, it's like saying, "If the blocks above and to the left are blue, assume this block is blue too."
In reality, PNG and JPEG also perform some form of prediction. However, WebP's uniqueness lies in how it samples data from surrounding blocks and then attempts to fill the current block using several different "prediction modes," effectively trying to "paint" the missing parts of the image. Each prediction mode's result is then compared to the actual image data, and the closest match is selected.
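The "try several modes, keep the closest" step can be sketched in a few lines. This is a simplified illustration, not libwebp's implementation: it assumes a 4x4 block, only three modes (an average, a copy of the row above, and a copy of the column to the left), and the sum of absolute differences as the measure of closeness. All function names are hypothetical.

```python
def predict_dc(above, left):
    """Fill the block with the average of the neighboring pixels."""
    mean = (sum(above) + sum(left)) // (len(above) + len(left))
    return [[mean] * len(above) for _ in left]


def predict_vertical(above, left):
    """Repeat the row of pixels above the block downward."""
    return [list(above) for _ in left]


def predict_horizontal(above, left):
    """Repeat the column of pixels left of the block across each row."""
    return [[value] * len(above) for value in left]


MODES = {
    "dc": predict_dc,
    "vertical": predict_vertical,
    "horizontal": predict_horizontal,
}


def sad(block, predicted):
    """Sum of absolute differences between two blocks."""
    return sum(abs(a - p)
               for row_a, row_p in zip(block, predicted)
               for a, p in zip(row_a, row_p))


def choose_mode(block, above, left):
    """Return the closest mode's name and what is left over (the residual)."""
    best = min(MODES, key=lambda name: sad(block, MODES[name](above, left)))
    predicted = MODES[best](above, left)
    residual = [[a - p for a, p in zip(row_a, row_p)]
                for row_a, row_p in zip(block, predicted)]
    return best, residual


above = [10, 20, 30, 40]            # pixels directly above the block
left = [10, 10, 10, 10]             # pixels directly left of the block
block = [[10, 20, 31, 40]] * 4      # actual pixels: vertical stripes
mode, residual = choose_mode(block, above, left)
print(mode)          # vertical
print(residual[0])   # [0, 0, 1, 0]
```

Because the block really does continue the stripes above it, the vertical mode wins and almost nothing remains to be stored.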
Of course, even the closest prediction won't be entirely accurate, so the difference between the predicted and actual values for the block is encoded into the file. When decoding the image, the rendering engine applies the same prediction logic using the same data to generate identical predicted values for each block. The encoded differences between the predicted and actual images are then applied to these predictions—similar to how a Git commit represents a diff patch applied to local files rather than an entirely new copy.
For example: to avoid delving into the complex math of the actual prediction algorithms, we'll invent a WebP-like encoding with a single prediction mode and use it to efficiently relay a grid of numbers, as older formats might. Our algorithm has one prediction mode, called "Prediction Mode One": each block's value is the sum of the values of the blocks above it and to its left, starting from 1 in the top-left corner (a missing neighbor counts as 0).
Now, suppose we start with the following real image data:
111151111
122456389
Using our prediction model to determine the contents of a 2x9 grid, we'd get:
111111111
123456789
Our data fits well with our invented prediction algorithm—the predicted data closely matches the real data. Of course, it's not a perfect match—some blocks differ. Thus, the encoded file includes not only the prediction method but also the differences for any blocks that deviate from the predictions.
_ _ _ _ +4 _ _ _ _
_ _ -1 _ _ _ -4 _ _
In other words, all the file needs to say is: "Using Prediction Mode One for a 2x9 grid: +4 at 1x5, -1 at 2x3, -4 at 2x7," with positions given as row x column.
The result is an incredibly efficient encoded file.
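The invented encoding above is small enough to run. The sketch below implements only our made-up "Prediction Mode One", not anything from the real WebP format, and the function names are our own. It assumes a missing neighbor counts as 0 and that the top-left block starts at 1.

```python
def predict_mode_one(rows, cols):
    """Each value is the sum of the values above and to the left, starting at 1."""
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if r == 0 and c == 0:
                grid[r][c] = 1
                continue
            above = grid[r - 1][c] if r > 0 else 0
            left = grid[r][c - 1] if c > 0 else 0
            grid[r][c] = above + left
    return grid


def encode(actual):
    """Store the grid size plus only the values that differ from the prediction."""
    rows, cols = len(actual), len(actual[0])
    predicted = predict_mode_one(rows, cols)
    residuals = [(r + 1, c + 1, actual[r][c] - predicted[r][c])
                 for r in range(rows)
                 for c in range(cols)
                 if actual[r][c] != predicted[r][c]]
    return rows, cols, residuals


def decode(rows, cols, residuals):
    """Rebuild the same prediction, then apply the stored differences."""
    grid = predict_mode_one(rows, cols)
    for row, col, delta in residuals:
        grid[row - 1][col - 1] += delta
    return grid


actual = [[int(d) for d in "111151111"],
          [int(d) for d in "122456389"]]
rows, cols, residuals = encode(actual)
print(residuals)                              # [(1, 5, 4), (2, 3, -1), (2, 7, -4)]
print(decode(rows, cols, residuals) == actual)  # True
```

Eighteen values shrink to a grid size and three small corrections, and the decoder recovers the original exactly because it runs the same prediction.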
Adaptive Block Quantization
JPEG compression is a uniform operation, applying the same quantization level to every block in the image. This makes sense for images with uniform composition—but real-world photos are no more uniform than the world around us. In practice, this means our JPEG compression settings are dictated not by high-frequency details (which JPEG handles well) but by the parts of the image most likely to show compression artifacts.
In this example, the butterfly's wings in the foreground look relatively clear: slightly grainy next to the high-resolution original, but hardly noticeable without a direct comparison. The same goes for the bee balm flowers and the foreground leaves. Even at this excessive level of compression, you and I might spot some artifacts, but the foreground remains passable. The low-frequency information in the top-left corner, the blurry green background leaves, looks terrible, however. Even an untrained eye would immediately notice the quality issues: subtle gradients in the background are compressed into jagged, solid-colored blocks.
To avoid this, WebP employs adaptive quantization: The image is divided into up to four visually similar segments, and compression parameters are adjusted independently for each. Applying the same excessive compression with WebP:
Both image files are roughly the same size. When examining the butterfly's wings, their quality is similar—tiny differences might be visible under close scrutiny, but overall, there's no significant quality gap. In WebP, the bee balm flowers are slightly sharper—again, unnoticeable unless comparing side by side. The background, however, is entirely different: It shows almost none of JPEG's glaring artifacts. WebP delivers the same file size but higher image quality—except for minor details our visual system wouldn't notice without close comparison.
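The difference between uniform and per-segment quantization can be illustrated with a toy model. Everything here is an assumption made for illustration: the "activity" measure, the thresholds, and the step sizes are invented, and the real encoder's segmentation heuristics are considerably more involved. The sketch only shows why giving smooth regions a finer quantization step preserves gradients that a single coarse step would flatten.

```python
# Hypothetical values: up to four segments, finest step for the smoothest one.
SEGMENT_STEPS = [2, 4, 8, 16]
ACTIVITY_LIMITS = [4, 16, 64]


def quantize(values, step):
    """Snap each value to the nearest multiple of step (ties round up)."""
    return [step * ((v + step // 2) // step) for v in values]


def segment_for(block):
    """Pick a segment index from how much the block's values vary."""
    activity = max(block) - min(block)
    for index, limit in enumerate(ACTIVITY_LIMITS):
        if activity < limit:
            return index
    return len(ACTIVITY_LIMITS)


def uniform_quantize(blocks, step):
    """JPEG-style: every block gets the same step."""
    return [quantize(block, step) for block in blocks]


def adaptive_quantize(blocks):
    """Segment-style: each block gets the step of its segment."""
    return [quantize(block, SEGMENT_STEPS[segment_for(block)])
            for block in blocks]


smooth = [100, 101, 102, 103]   # a gentle gradient, like the blurry background
busy = [10, 200, 35, 180]       # high-contrast detail, like the wings

print(uniform_quantize([smooth, busy], 16))
# [[96, 96, 96, 96], [16, 208, 32, 176]]
print(adaptive_quantize([smooth, busy]))
# [[100, 102, 102, 104], [16, 208, 32, 176]]
```

With one coarse step, the gradient collapses into a single flat value, which is the banding described above. With per-segment steps, the busy block is treated exactly as before while the gradient survives.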
Using WebP
WebP's internal mechanics may be far more complex than JPEG encoding, but for daily work, it's just as simple: All of WebP's encoding complexity is normalized into a single "quality" value, ranging from 0 to 100, just like JPEG. That doesn't mean you're limited to a single "quality" setting, though. You can—and should—tweak all the details of WebP encoding, if only to better understand how these typically invisible settings affect file size and quality.
Google provides a command-line encoder called cwebp for converting or compressing single files or entire directories of images:
$ cwebp -q 80 butterfly.jpg -o butterfly.webp

Saving file 'butterfly.webp'
File:      butterfly.jpg
Dimension: 1676 x 1418
Output:    208418 bytes Y-U-V-All-PSNR 41.00 43.99 44.95   41.87 dB
           (0.70 bpp)
block count:  intra4:      7644  (81.80%)
              Intra16:     1701  (18.20%)
              Skipped:       63  (0.67%)
bytes used:   header:            249  (0.1%)
              mode-partition:  36885  (17.7%)
 Residuals bytes  |segment 1|segment 2|segment 3|segment 4|  total
    macroblocks:  |       8%|      22%|      26%|      44%|   9345
      quantizer:  |      27 |      25 |      21 |      13 |
   filter level:  |       8 |       6 |      19 |      16 |
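Since each cwebp invocation like the one above names one input file and one output file, a small script is a convenient way to cover a whole directory. The following is a minimal sketch that only builds the command lines; it assumes cwebp is installed and on your PATH when you actually run them. The helper names are ours, while `-q`, `-m`, `-segments`, and `-o` are existing cwebp options, shown here with cwebp's default method (4) and segment count (4).

```python
from pathlib import Path


def cwebp_command(src, quality=80, method=4, segments=4):
    """Build the argument list for encoding one image with cwebp."""
    if not 0 <= quality <= 100:
        raise ValueError("quality must be between 0 and 100")
    if not 1 <= segments <= 4:
        raise ValueError("segments must be between 1 and 4")
    src = Path(src)
    return [
        "cwebp",
        "-q", str(quality),          # overall quality, 0 to 100
        "-m", str(method),           # effort: 0 is fastest, 6 is slowest
        "-segments", str(segments),  # segments for adaptive quantization
        str(src),
        "-o", str(src.with_suffix(".webp")),
    ]


def commands_for_directory(directory, **options):
    """Build one cwebp command per .jpg file found in the directory."""
    return [cwebp_command(path, **options)
            for path in sorted(Path(directory).glob("*.jpg"))]


print(" ".join(cwebp_command("butterfly.jpg")))
# cwebp -q 80 -m 4 -segments 4 butterfly.jpg -o butterfly.webp
```

Each list can be passed to `subprocess.run(command, check=True)` to perform the encoding. Varying `quality` or `segments` and comparing the resulting files is an easy way to see how these settings affect size and quality.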
If you're not inclined to use the command line, Squoosh also offers WebP encoding. It allows side-by-side comparisons between different encodings, settings, quality levels, and file sizes versus JPEG.