The discussion in the comments of my recent article about HD Photo (a.k.a. JPEG-XR) got me thinking about all of the different beasts that go by the name “JPEG.”

JPEG: What most of us consider to be “JPEG” is just one of many processes for image encoding and decoding defined within the same specification. The process that makes up 99.99999% of all of the JPEGs ever created is “JPEG Baseline (Process 1)” for 8-bit lossy compression. (That’s just my estimate, which is probably low. It’s probably better to say “almost 100% of all JPEGs.”)

This process divides an image into a bunch of 8×8 blocks, uses the discrete cosine transform (DCT) to move the data into the frequency domain, and compresses the data by (among other things) removing some of the high-frequency data that the human visual system usually can’t detect. You can think about it as abridging a novel by taking out a few sentences per paragraph. Unfortunately, if the quality settings are too low, it’s really easy to notice that something has gone missing; and if a scene has a lot of information (lots of fine detail, for example), there will be blocky artifacts where the detail should be.
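To make the "transform, then throw away high frequencies" idea concrete, here is a minimal sketch of one 8×8 block going through a naive DCT and a quantizer. The `step` function is a made-up rule that simply gets coarser as frequency rises; it is an assumption for illustration, not one of the quantization tables from the standard, and real encoders also do color conversion, zig-zag scanning, and entropy coding that are left out here.

```python
import math

N = 8  # baseline JPEG works on 8x8 blocks

def _scale(k):
    # Orthonormal DCT-II scaling factor
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def _basis(i, k):
    # Cosine basis function k sampled at position i
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct_2d(block):
    """Naive 2-D DCT-II of an 8x8 block given as a list of lists."""
    return [[_scale(u) * _scale(v)
             * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse of dct_2d."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def step(u, v):
    # Hypothetical quantization step: coarser for higher frequencies.
    return 1 + 2 * (u + v)

def quantize(coeffs):
    return [[round(coeffs[u][v] / step(u, v)) for v in range(N)]
            for u in range(N)]

def dequantize(q):
    return [[q[u][v] * step(u, v) for v in range(N)] for u in range(N)]

# A smooth gradient block: most of its energy sits in a few low frequencies.
block = [[100 + 10 * x + 5 * y for y in range(N)] for x in range(N)]
q = quantize(dct_2d(block))
restored = idct_2d(dequantize(q))

zeros = sum(row.count(0) for row in q)  # coefficients that cost almost nothing to store
max_error = max(abs(block[x][y] - restored[x][y])
                for x in range(N) for y in range(N))
```

For a smooth block like this one, most of the quantized coefficients come out as zero, which is where the compression comes from; `max_error` is the price paid for the rounding.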

While the removal of high-frequency detail is inherently lossy, even with a maximum quality setting, the original JPEG standard specified a separate lossless mode not based on the DCT. Images compressed this way can be completely retrieved from the compressed data. This is important when you need to preserve all of the data within an image or when adding artifacts can have devastating consequences. “Is that a nodule in the patient’s chest X-ray or a JPEG compression artifact? I guess we’d better do a biopsy just in case….” In fact, the lossless modes for JPEG are really only used within DICOM files, the format used for digital imaging and communications in medicine.
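As far as I understand it, the lossless mode works by predicting each sample from its already-coded neighbors and storing only the difference. The sketch below is a simplified illustration of that idea: it uses only the left-hand neighbor as the predictor, restarts every row from an assumed value of 128, and skips the entropy coding of the residuals entirely. The function names are mine, not anything from the standard.

```python
def encode_rows(image, start=128):
    """Replace each sample with its difference from the sample to its left."""
    residuals = []
    for row in image:
        prev = start  # assumed starting prediction for each row
        out = []
        for sample in row:
            out.append(sample - prev)
            prev = sample
        residuals.append(out)
    return residuals

def decode_rows(residuals, start=128):
    """Undo encode_rows by accumulating the differences."""
    image = []
    for row in residuals:
        prev = start
        out = []
        for diff in row:
            prev = prev + diff
            out.append(prev)
        image.append(out)
    return image

image = [[120, 121, 123, 123],
         [119, 120, 122, 124]]
residuals = encode_rows(image)
```

Because nothing is rounded or discarded, `decode_rows(residuals)` gives back exactly the original samples; the savings come from the residuals being small numbers that an entropy coder can store compactly.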

Old school JPEG also supports 12 and 16 bits of data in each channel of a pixel. For color images, this is the difference between about 17 million colors for an 8-bit image, 68 billion colors for a 12-bit image, and 281 trillion colors when using 16 bits. Once again, only those medical imaging people use the extra bit depths, and they just use the gray colors.
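Those color counts are just powers of two: three channels at a given bit depth give 2^(3 × bits) combinations. A quick check (the helper name is my own):

```python
def color_count(bits_per_channel, channels=3):
    """Number of distinct values representable across all channels."""
    return 2 ** (bits_per_channel * channels)

# Full-color counts for the three bit depths mentioned above.
counts = {bits: color_count(bits) for bits in (8, 12, 16)}

# Grayscale only has one channel, so the counts are much smaller.
gray_levels = {bits: color_count(bits, channels=1) for bits in (8, 12, 16)}
```

So a 12-bit grayscale image has 4,096 levels to work with and a 16-bit one has 65,536, compared with 256 for ordinary 8-bit data.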

JPEG-LS was supposed to be a better lossless format but never really got going. The promises of JPEG 2000 probably had a lot to do with this.

JPEG 2000 is (1) a wavelet-based compression method, (2) a scheme for encoding wavelet-compressed images into randomly accessible “codestreams”, and (3) a file format for encapsulating compressed codestreams. Because it uses a discrete wavelet transform (DWT), the results are generally better than the older JPEG format when comparing images with the same compression ratio.
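To give a feel for what a wavelet transform does, here is a minimal one-dimensional sketch of an integer 5/3 lifting step, which is the kind of reversible filter I understand JPEG 2000 uses on its lossless path. Treat it as an illustration under assumptions: it handles only even-length lists, does a single decomposition level, and mirrors the signal at the edges. A real codec applies this to rows and columns over several levels.

```python
def forward_53(x):
    """One level of an integer 5/3 lifting transform on an even-length list.

    Returns (s, d): a half-length smooth (low-pass) signal and the
    half-length detail (high-pass) signal.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch only handles even lengths >= 2")
    half = n // 2
    d = []
    for i in range(half):
        left = x[2 * i]
        # Mirror at the right edge: x[n] is treated as x[n - 2].
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - (left + right) // 2)
    s = []
    for i in range(half):
        # Mirror at the left edge: d[-1] is treated as d[0].
        prev = d[i - 1] if i > 0 else d[0]
        s.append(x[2 * i] + (prev + d[i] + 2) // 4)
    return s, d

def inverse_53(s, d):
    """Exactly undo forward_53."""
    half = len(s)
    x = [0] * (2 * half)
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (prev + d[i] + 2) // 4
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < 2 * half else x[2 * i]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x

signal = [10, 12, 14, 13, 11, 40, 42, 41]
smooth, detail = forward_53(signal)
```

The detail values are near zero wherever the signal is smooth and large only around the jump, and because every step is integer arithmetic that the inverse repeats in reverse, `inverse_53(smooth, detail)` returns the original list exactly.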

Images in JPEG 2000 can have an arbitrary bit depth (1 to 32 bits per sample), and different planes can have different bit depths. (For example, the luminance channel of a YCbCr image can have a high bit depth to support HDR imagery.) Certain portions of an image can have higher spatial resolution or be encoded at a different compression level. JPEG 2000 has both lossy and lossless components as part of the baseline. Several colorspaces are supported, including bi-level, grayscale, sRGB, YCbCr, and indexed imagery. Hyperspectral and n-sample images are supported using a somewhat convoluted “multi-component” schema. Images can also include alpha channels for transparency. A really amazing thing about JPEG 2000 is that it’s possible to reorder the parts of the codestream to change how the data is accessed (e.g. access regions faster vs. access different resolutions faster) without decompressing and recompressing the data, which can be expensive.

The JPEG 2000 file format uses about 20 hierarchical “boxes” to nest metadata about the compressed codestreams. While the file format is technically unnecessary to read and process a JPEG 2000 image, the extra formatting facilitates random data access, long-term cataloguing and IP management, and efficient transmission. JPEG 2000 files can also contain a limited subset of ICC color profiles. EXIF metadata support is not part of the JPEG 2000 standard, although it can appear as a private metadata field.
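The box structure is simple enough to walk by hand. As I understand the layout, each box starts with a 4-byte big-endian length and a 4-byte type, with a length of 1 meaning an 8-byte extended length follows and a length of 0 meaning "runs to the end of the file." The sketch below walks boxes in a byte string under those assumptions; `iter_boxes` and `make_box` are hypothetical helper names, and the sample data is hand-built with a placeholder payload rather than taken from a real image.

```python
import struct

def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload) for each box found in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        length, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if length == 1:
            # Extended 64-bit length follows the type field.
            (length,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif length == 0:
            # Box extends to the end of the enclosing data.
            length = end - offset
        if length < header or offset + length > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("latin-1"), data[offset + header:offset + length]
        offset += length

def make_box(box_type, payload):
    """Build a box with a plain 32-bit length (helper for the demo only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Hand-built sample: signature box, file type box, and a header superbox
# containing one child box with a 14-byte placeholder payload.
sample = (
    make_box(b"jP  ", b"\r\n\x87\n")
    + make_box(b"ftyp", b"jp2 " + struct.pack(">I", 0) + b"jp2 ")
    + make_box(b"jp2h", make_box(b"ihdr", b"\x00" * 14))
)

top_level = [(box_type, len(payload)) for box_type, payload in iter_boxes(sample)]
header_payload = dict(iter_boxes(sample))["jp2h"]
nested = [box_type for box_type, _ in iter_boxes(header_payload)]
```

The nesting falls out naturally: a "superbox" is just a box whose payload is itself a sequence of boxes, so the same function can be called again on the payload.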

JPEG 2000 was touted as the format to replace the 1991 JPEG standard, but this didn’t happen for several reasons. Perhaps most important, the algorithms at the heart of JPEG 2000 require a lot of processing power, making it slower to render than old-school JPEG on desktop computers and prohibitive for many embedded devices. As of 2007, few Web browsers have built-in support for it, and consumer-level digital cameras don’t produce imagery in the format. In 2007, Adobe Photoshop CS3 stopped including the JPEG 2000 export module in a typical installation.

But because of the smaller file size, flexibility, and more pleasing artifact appearance, the medical and remote sensing communities have adopted it. Both NITF and DICOM have incorporated JPEG 2000 data into their files. NITF is the friendly format used for “national imagery.” I will let you Google that so the NSA can start tracking you.

JPEG-XR is the name that Microsoft’s HD Photo format might have if it’s standardized, which I sincerely hope it will be. JPEG-XR uses a principal components photo core transform (PCT), which I know absolutely nothing about but which promises performance equivalent to JPEG 2000 with lower computational complexity (which means you can put it on a consumer device more easily) and much better size-versus-quality performance than the original JPEG format. It also supports more bit depths, high dynamic range imagery, lossy and lossless encoding/decoding using the same algorithm, and wide gamut color; uses a linear light gamma, making it possibly suitable to replace RAW formats or enable post-CRT workflows; and can store bucketloads of metadata, including EXIF and XMP.

JPEG-Plus. And then there’s JPEG+, which you might reasonably call JPEG - 20% because it’s essentially the same as the original DCT-based JPEG with a modest file-size performance improvement and some claims about better visual appearance. I’m not holding my breath for it; but given the 29+ processes that made up the original JPEG standard, what’s an extra one that no one will implement?

Update: For posterity, the PCT stands for “Photo Core Transform” not “Principal Component Transform”. Thomas Richter said this about it on sci.image.processing:

The transform is an overlapped 4×4 block transform that is related to a traditional DCT scheme, or at least approximates it closely. The encoding is a simple adaptive huffman with a move-to-front list defining the scanning order, and an inter-block prediction for the DC and the lowest-frequency AC path of the transformation.

Some parts are really close to H264 I-frame compression, i.e. the idea to use a pyramidal transformation scheme and transform low-passes again (here with the same, in H264 with a simpler transformation).

The good part is that lossy and lossless use the same transformation. The bad part is that the quantizer is the same for all frequencies, meaning there is no CSF adaption, and the entropy coder back-end is not state of the art.