How a digital camera works

I began this technical article a week ago to synthesize different things I know about the digital camera pipeline and digital image formation. Photographers may be interested in learning what happens after they trip the shutter and before they import their images into Photoshop. If you’re not at all technical, you still may enjoy several of the linked articles, some of which show off optical illusions.

How does a digital camera make an image? It seems like such a simple question with a really obvious answer: Light strikes a sensor which converts it to a grid of numbers that is written to a file.

Well, yeah. But how does that grid of numbers (the image) get formed? Recently I helped someone at work answer that question. The result was this chart:

Not every camera does everything the same way. It’s something of a dark art, really. Camera manufacturers do all sorts of secret, proprietary stuff to tweak images. It’s a fact that vendors don’t like to talk about at parties, but, contrary to popular belief, even RAW files don’t contain the raw sensor data. . . . But I’m getting ahead of myself. Let’s look at the various stages in image formation.

Capturing light with a bucket

To be honest, this is the part that I know the least about. There are hardware guys and software guys. I’ve never used a soldering iron and have probably broken Ohm’s Law on a number of occasions. But here’s the important thing to know. In a digital camera, light (energy) strikes a photosensitive material, which induces a current in a circuit attached to the sensor. More brightness, more energy. More energy, more current. An analog-to-digital converter (ADC) then turns the current into a count; CCD sensors require a separate ADC chip, while CMOS sensors perform the conversion on the chip itself.

The current gets counted at each of the millions of “buckets” on the sensor. In order to create a color image from colorful light, you need to break it down into its red, green, and blue components. The eye does this in a very sophisticated way. A camera is much simpler, separating light into three (sometimes four) components via millions of tiny filters, allowing the components to be individually counted.

“It goes up to 11…”

At this point, light has been converted into millions of counts, which range from 0 (no light) to some theoretical maximum. The count depends on the detector’s sensitivity, the ADC, the amount of light, and the color of the light. The maximum possible count determines how many individual brightness levels can be recorded. This precision is typically expressed as “bit depth.” An 8-bit sensor can record 256 (2^8) different levels of light. The more bits, the more distinguishable counts.

Bit depth   Number of levels   Minimum value   Maximum value
8           2^8  = 256         0               255
10          2^10 = 1024        0               1023
12          2^12 = 4096        0               4095
14          2^14 = 16384       0               16383
16          2^16 = 65536       0               65535
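The relationship between bit depth and counts can be sketched in a few lines of code. This is an idealized, noise-free model; the `quantize` helper is purely illustrative, not something a real camera exposes.

```python
def quantize(intensity, bits):
    """Map a normalized light intensity in [0.0, 1.0] to an integer
    count, modeling an idealized, noise-free ADC."""
    max_count = 2 ** bits - 1
    clamped = min(max(intensity, 0.0), 1.0)
    return round(clamped * max_count)

# The table above, computed directly:
for bits in (8, 10, 12, 14, 16):
    print(f"{bits}-bit: {2 ** bits} levels, counts 0..{2 ** bits - 1}")

print(quantize(0.5, 12))   # a mid-level intensity lands near the middle: 2048
```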

Not all pixels are created equal. Some buckets on the image sensor are defective and are always on (“hot”) or always off (“dead”). The sensor itself produces heat (“dark noise”), which it detects when recording the scene. And charge “leaks” from the sensor before it gets digitized: imagine a bucket brigade that loses a little bit of water with each hand-off. Better sensors have fewer problems.

The result is noise. Sometimes it’s speckled (so-called “salt and pepper” noise). Often it isn’t uniform across the sensor. In every case, it should be corrected. Hot and dead pixels are averaged with their neighbors (to the dismay of astrophotographers if done on the image data), or, if their locations are known, masked out entirely. Another technique, “dark subtraction,” attempts to remove thermal noise by subtracting a dark frame from the exposure. Often this happens before the “raw” image is recorded, making it impossible to get the actual sensor data.
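The two corrections above can be sketched with NumPy. This is a minimal illustration, not any vendor’s actual algorithm: it assumes you already have a dark frame and a map of the defective pixels.

```python
import numpy as np

def correct_sensor_noise(raw, dark_frame, hot_mask):
    """Sketch of two common raw-domain corrections.

    raw        -- 2-D array of sensor counts
    dark_frame -- counts recorded with the shutter closed (thermal noise)
    hot_mask   -- boolean array marking known hot/dead pixels
    """
    # Dark subtraction: remove the thermal signal, clamping at zero.
    corrected = np.clip(raw.astype(np.int64) - dark_frame, 0, None)

    # Replace each defective pixel with the mean of the non-defective
    # pixels in its 3x3 neighborhood.
    out = corrected.astype(np.float64)
    h, w = out.shape
    for y, x in zip(*np.nonzero(hot_mask)):
        y0, y1 = max(0, y - 1), min(h, y + 2)
        x0, x1 = max(0, x - 1), min(w, x + 2)
        patch = corrected[y0:y1, x0:x1]
        good = ~hot_mask[y0:y1, x0:x1]
        if good.any():
            out[y, x] = patch[good].mean()
    return out
```

For example, a stuck-on pixel reading 255 in a flat field of 10s (with a uniform dark level of 2) comes back as 8, matching its corrected neighbors.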

Noise disproportionately affects the dark parts of the image. Why? Camera sensors record light differently than we perceive it. A sensor’s counts are linear: doubling the intensity of light (as measured in absolute units, like lux) doubles the count. Human vision, however, is roughly logarithmic, so each doubling looks like one equal step in brightness. As a result, equal perceptual steps span very different numbers of counts. At the darker end of a sensor’s range a perceived doubling of brightness may span only 32 or 64 values, while at the brighter end the same perceptual change might span 1024 or 2048 levels. (Charles Poynton describes “gamma” in exhaustive detail.) Consequently, small numerical changes make big visible noise differences in the darker parts of the image.
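The arithmetic behind this is simple enough to compute. For a linear sensor, each stop below clipping contains half as many counts as the one above it; the helper name below is mine, chosen for illustration.

```python
def levels_in_stop(stop, bits=12):
    """Distinct counts available in the Nth stop below clipping for a
    linear sensor (stop 1 is the brightest stop)."""
    max_count = 2 ** bits
    return (max_count >> (stop - 1)) - (max_count >> stop)

# Half of a 12-bit file's 4096 levels describe the single brightest
# stop; the deepest shadows get almost nothing.
for stop in (1, 2, 6, 12):
    print(f"stop {stop} below clipping: {levels_in_stop(stop)} levels")
```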

“This ain’t no image, no RGBG…”

After digitization and noise removal we still don’t have a recognizable image. There’s no color to it yet, and the brightness won’t look right either. Remember that millions of red, green, blue, and (sometimes) cyan filters cover the image sensor in order to capture color information. Each filter is sensitive to only one color — blue, for example — but if manufacturers use the right pattern of filters, an algorithm can calculate the other colors (green and red) at the blue filter’s location. The sensor’s pseudo-image is known as a “Bayer pattern.” A “raw” file contains this pseudo-image along with the metadata needed to construct a final image.

Here is a typical color filter array (Bayer) pattern. Don’t blame me if looking at this image makes you have a seizure. If the image appears to be moving or breathing, well, that’s just a result of simultaneous contrast. It’s natural.

A number of different algorithms “demosaic” this pattern data into the RGB image we’re expecting. Typically, speed and accuracy compete to determine the “best” demosaicing algorithm. Some algorithms handle images that contain a lot of pronounced edges better than others. In any case, these algorithms look at neighboring pixels to infer the missing colors. Collecting these constructed RGB values into separate red, green, and blue color planes (or channels) triples the amount of data in the image. Adding data to an image post facto almost always involves subjective judgments about image quality.
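The simplest of these algorithms, bilinear interpolation, can be sketched as follows. This assumes an RGGB filter layout; real cameras use edge-aware methods that fringe less at pronounced edges.

```python
import numpy as np

def demosaic_bilinear(mosaic):
    """Bilinear demosaic of an RGGB Bayer mosaic (2-D float array).

    Known samples are kept; missing samples are filled with the mean
    of the same-color sites in the surrounding 3x3 neighborhood.
    """
    h, w = mosaic.shape
    ys, xs = np.mgrid[0:h, 0:w]
    masks = {
        "r": (ys % 2 == 0) & (xs % 2 == 0),   # red on even rows and cols
        "g": (ys % 2) != (xs % 2),            # green on the two diagonals
        "b": (ys % 2 == 1) & (xs % 2 == 1),   # blue on odd rows and cols
    }
    out = np.zeros((h, w, 3))
    for c, name in enumerate("rgb"):
        vals = np.pad(np.where(masks[name], mosaic, 0.0), 1)
        wts = np.pad(masks[name].astype(float), 1)
        # 3x3 box sums of the sampled values and of the site counts.
        sv = sum(vals[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
        sw = sum(wts[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
        interp = sv / np.maximum(sw, 1e-9)
        out[:, :, c] = np.where(masks[name], mosaic, interp)
    return out
```

On a uniformly lit patch (all red sites equal, all green sites equal, all blue sites equal) this reconstructs three flat channels, which is a handy sanity check for any demosaicer.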

After demosaicing, the image is more or less what you would expect a color image to look like, but probably nowhere near the final state. Remember that the image contains intensity counts at each pixel location, which is not exactly how we experience image brightness or color. The image requires gamma correction and probably appears dark. In addition, the color balance is probably incorrect, requiring a white point adjustment.

Gamma correction applies a nonlinear power function to the pixel values, making them match perceived brightness. Often a separate “tone response curve” is applied to each color channel. The power function compresses the bright end of the range more than the dark end, which reduces the image’s effective bit-depth (and its dynamic range) as visually redundant information is squeezed out. Some camera vendors, such as Nikon in its D70 model, perform this gamma compression at an earlier stage in order to reduce RAW file size.
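A pure power law makes the idea concrete. Real pipelines often use a piecewise curve (the sRGB transfer function, for instance) rather than this simple form, and the exponent of 2.2 here is an assumed, illustrative value.

```python
import numpy as np

def gamma_encode(linear, gamma=2.2):
    """Apply a power-law tone curve to linear values in [0, 1]."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

# A linear mid-tone of 0.18 ("18% gray") encodes to roughly 0.46 --
# near the middle of the output range. Shadows get stretched upward
# into more codes; highlights get squeezed into fewer.
print(gamma_encode(np.array([0.0, 0.18, 0.5, 1.0])))
```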

It’s not uncommon somewhere in this stage to remap an image’s values from its effective bit-depth to 8 or 16 bits. Image processing and image manipulation applications usually need this in order to display the image correctly. Unlike gamma correction, this is typically a linear remapping.
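Such a linear remapping is just a rescale and round. A sketch, with a made-up function name:

```python
import numpy as np

def remap_depth(counts, src_bits, dst_bits):
    """Linearly rescale integer counts from one bit depth to another."""
    src_max = 2 ** src_bits - 1
    dst_max = 2 ** dst_bits - 1
    scaled = counts.astype(np.float64) * dst_max / src_max
    return np.round(scaled).astype(np.uint16)

raw12 = np.array([0, 2048, 4095])          # black, middle, clipping in 12-bit
print(remap_depth(raw12, 12, 8))           # the same tones in 8-bit
```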

Making pretty images

Tone mapping adjusts the perceived brightness of image pixels but has very little effect on the color balance of those pixels. Several factors determine the color of image pixels:

  • The individual red, green, and blue (R,G,B) pixel values
  • The color of the “pure” red, green, and blue primaries
  • The white point — that is, the color of white

The primaries determine the color of the most saturated red, green, and blue colors that can be recorded or displayed. In an RGB system, every color is a combination of various intensities of these three primaries. If you change one primary’s definition, every other color is changed accordingly. If you’re having trouble with the concept of multiple “pure” red colors being called “red”, just consider what happens when you fiddle with the color controls on your monitor or television set. You aren’t changing the input values that are displayed, but different colors show up on the screen.

Different devices have different red, green, and blue sensitivities, so it’s necessary to take the (R,G,B) values from the camera and put them into a well-understood color space where those particular (R,G,B) values have specific color meanings. Some of these color spaces include Adobe RGB (1998), ProPhoto RGB, and sRGB. Industry-wide adoption of ICC color profiles has largely standardized these color translations.
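Mechanically, moving linear camera values into one of these spaces is a 3x3 matrix multiply. The matrix values below are invented purely to show the mechanics; a real matrix comes from profiling the sensor against known targets. Each row sums to 1.0 so that camera white stays white.

```python
import numpy as np

# HYPOTHETICAL camera-RGB -> linear-sRGB matrix (illustrative only).
CAM_TO_SRGB = np.array([
    [ 1.6, -0.4, -0.2],
    [-0.3,  1.5, -0.2],
    [ 0.0, -0.4,  1.4],
])

def to_srgb_linear(camera_rgb):
    """Apply the matrix to an (..., 3) array of linear camera values,
    clipping out-of-gamut results into [0, 1]."""
    return np.clip(camera_rgb @ CAM_TO_SRGB.T, 0.0, 1.0)
```

Note that the negative off-diagonal terms are typical: they “unmix” the overlap between the camera’s color filters, which is also why saturated colors can clip after conversion.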

So how does the white point fit in? When you take your camera into a scene with a different color of light, the camera’s color sensitivity doesn’t change. But the human visual system’s sensitivity sure does! Our brains adapt to the scene’s white point, but the camera does not. As a result, the images that we saw and the camera recorded aren’t the same. A simple white point change corrects this problem. (Some cameras have the ability to record the color of the ambient lighting and store it in the RAW file for later use, which is pretty cool if you ask me.)
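A simple white point correction just rescales the three channels so that the recorded neutral becomes gray. A sketch, assuming you have a recorded neutral color (say, the stored ambient-light reading mentioned above, or a gray card in the scene):

```python
import numpy as np

def white_balance(rgb, neutral):
    """Scale each channel so the recorded 'neutral' color becomes gray.

    neutral -- the (r, g, b) the camera recorded for a known white or
               gray patch.
    """
    neutral = np.asarray(neutral, dtype=float)
    gains = neutral.mean() / neutral
    return rgb * gains

# Under warm tungsten light a gray card might record as (0.8, 0.5, 0.3);
# after balancing, its three channels come out equal.
card = np.array([0.8, 0.5, 0.3])
print(white_balance(card, card))
```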

Digital cameras can store other information in a RAW file — we’ll get to those special files in a second — that helps create “good looking” final images. Some cameras store mask information to hide low-quality portions of the image. Any system that samples data into discrete values introduces errors, which often appear as “steppy edges.” Specifying how much chroma blur to apply reduces these unwanted color artifacts. (Any image that is blurred — also a result of quantization — probably deserves some sharpening, too.)
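That sharpening is usually some variant of unsharp masking: add back the difference between the signal and a blurred copy of itself. A one-dimensional sketch (cameras work in 2-D, often on luminance only, with tunable radius and threshold):

```python
import numpy as np

def unsharp_mask(channel, amount=0.5):
    """Sharpen a 1-D signal by adding back 'amount' times the
    difference between it and a small blur of itself."""
    blurred = np.convolve(channel, [0.25, 0.5, 0.25], mode="same")
    return channel + amount * (channel - blurred)
```

Flat regions pass through untouched, while edges gain the slight overshoot that our eyes read as crispness.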

Most mid-level and high-end cameras store the Bayer pattern “image” and the hints for reconstructing it in one of the many RAW file formats vendors have defined. Vendors stuff lots of other information there as well, possibly including a lower resolution lossy JPEG thumbnail and EXIF, XMP, and IPTC metadata. Many manufacturers perform lossless compression on the pixels to reduce the size of the RAW file, but this takes time to do. The possibilities and permutations are endless. DNG is a relatively new format whose proponents hope to unify these proprietary formats.

Now go outside and play

I hope you’ve enjoyed this (sort of) brief description of digital image formation and that you now have a better sense of the factors that impact “raw” images. The imaging world is finishing up its amazing transition from film to digital capture, but the image processing tools and digital pipeline are still evolving. Nowadays, knowing what goes into your raw images is akin to knowing how the Zone System impacts film-based photography. If you know of anything that’s out-of-date, please leave a comment.

This entry was posted in Color and Vision, Computing, Fodder for Techno-weenies, Photography.

7 Responses to How a digital camera works

  1. hari says:

    how do we convert *.raw file to *.jpg/tif using Matlab?

  2. Jeff Mather says:

    Hi Hari,

    There isn’t anything in MATLAB proper to read RAW files. You might find this MATLAB Central file submission useful to read them. Of course, it’s about three years old, so it doesn’t support all cameras.

    Once you get the image in, you can easily write it out using IMWRITE.

  3. hari says:

    Thank you for the reply. Excluding MATLAB, is there any other way to do it? what i mean is, i would like to collect the raw file from the camera and convert it to jpg and then read it into MATLAB using IMREAD.

  4. Jeff Mather says:

    I think the easiest thing to do might be this:

    1) Bring the RAW images from your camera using whatever process you normally use, such as copying them via a card reader. You may just be able to plug your camera or card reader into a USB port and then CD to the directory containing the images.

    2) Use dcraw or a similar batch converter to transcode the raw images into JPEG. The dcraw program is free, and many people like it.

    3) Call IMREAD on the converted file.

  5. ego says:

The author is wrong about how the sensor reacts differently to light than the eye. It has the same response as the eye; twice the amount of light is perceived in both cases as twice as bright. What confuses people is the fact that between the raw and jpeg image a non-linear tone-curve is applied, but this is to compensate for the non-linear effect that was present in old cathode-ray monitors, and which is still adjusted for in video graphics cards even though the LCD monitor is also linear.

  6. Jeff Mather says:

    Ego: You are correct, of course. I mistakenly conflated the compressed, tone-mapped pixel values with the pixel counts before they were changed by the gamma curve. Rookie mistake made at the end of the work week. :)

  7. D S PATIL says:

    really,i like to read your info about digital camera THANKING YOU
