Monthly Archives: January 2008

Electronic Imaging 2008: Megapixels matter

I snap a lot of pictures with the camera in my mobile phone. This is a relatively new thing for me; I own several high quality cameras, and it didn’t occur to me for quite a while that it was okay to make low quality (640×480) snapshots of the random things that happen when I’m out and about.

Turns out, I’m late to the game. In a session yesterday, Reiner Fageth of CeWe Color AG reported that 3% of all of the images that their customers upload for them to print come from mobile phone cameras. That surprised me and many people in the audience. It’s rare that my pictures from my moby are good enough to consider printing. Usually the resolution is just too low, the colors are off, or they’re a bit shaky.

Bror Hultgren discussed the quality issue today in a paper entitled “Megapixel mythology and photospace: estimating photospace for camera phones from large image sets.” Basically, he and Dirk Hertel (his collaborator) wanted to answer the question of whether more megapixels make for better images from cameras in mobile devices.

They began by collecting images of people and places taken on mobile devices. Some of these images they created themselves, while they used Flickr for the rest. (This wasn’t the first time this week I’ve heard of Flickr being used as a serious tool for image quality research.) They grouped these images into a “photospace” with two independent axes: illumination level and distance to the subject. (E.g., a landscape of the beach = high illumination and far distance; shots of your girlfriend at the pub = low illumination and (hopefully) close distance — unless you’re stalking your imaginary girlfriend. Hey, we’ve all been there.) Each axis is a continuum, which they divided into four or five segments for simplicity.

To evaluate the images they asked human observers to rate the images on a quality scale ranging from “very poor” to “excellent” via a software application. Observers were also asked what was objectionable about low-rated images, such as “too blurry,” “too dark,” “not sharp enough,” etc.

The results. Images from camera phones were rated lowest when they fell into the “dark close-up” bucket. This isn’t surprising. But what was unexpected in his research was that this section of the photospace represented the largest segment of photographs made on mobile devices. In fact, the quality of photos is strongly negatively correlated with the number of photographs they found in each part of the photospace. We bring our mobile phones with us everywhere, and we do things with our friends after dark or indoors. And most of us use our devices to make pictures of people more than we do to make pictures of things. So camera phone manufacturers are producing devices that perform poorest in the situations where we most want to use the devices. (The major failure mode was blur, by the way. No surprises there.)

So do megapixels matter? Can you get better results with more megapixels? Hultgren and Hertel say “yes.” Quality is directly proportional to megapixel count. But here’s the catch: It’s only statistically significant for the best photos. In their study these were images in the 90th percentile of quality. For images in the 75th percentile and lower for quality, the statistics suggest a correlation between megapixel count and quality but not with the required level of certainty to draw rigorous conclusions.

Vik Muniz at the San Jose Museum of Art

Spreading bacteria at the Tech Museum

Posted in Computing, Fodder for Techno-weenies, Photography | Leave a comment

Electronic Imaging 2008: Digital Image Forensics

Now that Steve has linked to me again, I guess it would be timely to post something from the Electronic Imaging symposium. Let’s start with something fun: digital image forensics.

I find this topic really interesting for a few reasons. (1) I’m a photographer who believes that all images contain a seed of untruth but feel ambivalent about what that means in our information age where images form the basis of what most people “know.” (2) Practical applications of image processing are always interesting. (3) It’s fresh in my mind, since I recently helped someone show that an Ethiopian passport was a fraud. (Image processing played only a small part — the content and font on the machine readable section didn’t match the international standard — but showing that parts were pasted in digitally helped create a preponderance of evidence.)

Hany Farid of Dartmouth gave the first plenary session: “Digital Forensics”. Prof. Farid started by quoting a science journal editor’s staggering statistic: 20-30% of submitted journal entries need images resubmitted because of “inappropriate image manipulation.” His lab’s work aims to out these digital forgeries. Here are some techniques that you can use to identify likely forgeries.

JPEG quality tables (Q-Tables) — These 8-by-8 tables of quantization values are stored in each JPEG file and used to decompress the images. Something I did not know was that most camera vendors use unique Q-tables and that they frequently change them when they introduce new camera models. Photoshop, on the other hand, has not changed their Q-tables since version 1. So you can extract these values from a file and see if it has been saved by Photoshop, which might hint at manipulation.
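As a rough illustration of the Q-table idea, here is a minimal sketch that pulls the quantization tables out of a JPEG byte stream using only the Python standard library. It is not Farid's tool; the function name `read_quantization_tables` is made up for this example, and it only reads tables that appear before the first scan. Comparing the result against known camera or Photoshop tables would need a reference database, which is not included here.

```python
import struct

def read_quantization_tables(data: bytes) -> dict:
    """Return {table_id: [64 values]} from the DQT segments of a JPEG byte string.

    Values are returned in the order they are stored in the file (zigzag order).
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG stream")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at byte {pos}")
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):
            # Start of scan or end of image: this sketch only reads the header.
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDB:  # DQT: one or more tables packed into one segment
            segment = data[pos + 4:pos + 2 + length]
            i = 0
            while i < len(segment):
                precision, table_id = segment[i] >> 4, segment[i] & 0x0F
                i += 1
                if precision == 0:  # 8-bit entries
                    values = list(segment[i:i + 64])
                    i += 64
                else:               # 16-bit big-endian entries
                    values = list(struct.unpack(">64H", segment[i:i + 128]))
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables
```

To try it on a real file, pass the file's bytes, for example `read_quantization_tables(open("photo.jpg", "rb").read())`, where `photo.jpg` is a placeholder name.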

Cloning — Partition the image into blocks, do principal component analysis, lexicographically sort the results, do region growing, and look for similar regions.
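The sort-and-compare core of that recipe can be sketched in a few lines. This is a deliberately simplified toy, not the method from the talk: it skips the principal component analysis and the region growing, matches only bit-for-bit identical blocks, and assumes a grayscale image stored as a list of rows. The name `find_cloned_offsets` and its parameters are invented for the example.

```python
from collections import Counter

def find_cloned_offsets(image, block=4, min_matches=4):
    """Return {(dy, dx): count} for offsets shared by many identical block pairs.

    image is a list of rows of integer pixel values. Every overlapping
    block-by-block window is flattened, the windows are sorted
    lexicographically, and neighbours in the sorted list are compared.
    """
    height, width = len(image), len(image[0])
    blocks = []
    for y in range(height - block + 1):
        for x in range(width - block + 1):
            pixels = tuple(image[y + j][x + i]
                           for j in range(block) for i in range(block))
            blocks.append((pixels, y, x))
    blocks.sort()  # identical blocks end up next to each other
    offsets = Counter()
    for (p1, y1, x1), (p2, y2, x2) in zip(blocks, blocks[1:]):
        if p1 == p2:
            offsets[(y2 - y1, x2 - x1)] += 1
    # A copied region yields many block pairs with the same displacement.
    return {off: n for off, n in offsets.items() if n >= min_matches}
```

Exact matching breaks down as soon as the pasted region is recompressed or retouched, and flat areas such as clear sky produce false matches. Comparing reduced, approximate block descriptors rather than raw pixels is presumably what the PCA step is for.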

Resampling (shrinking, growing, rotating, etc.) — Use statistical correlation to look for simple interpolation between values.
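To see why interpolation leaves a statistical trace, consider a one-dimensional toy case. The sketch below is an assumption-laden illustration rather than the detector described in the talk: it upsamples a signal by a factor of two with linear interpolation and then measures how well each sample is predicted by the average of its two neighbours, separately for even and odd positions. Both function names are invented for the example.

```python
def upsample_linear_2x(signal):
    """Insert the midpoint between every pair of neighbouring samples."""
    out = []
    for a, b in zip(signal, signal[1:]):
        out += [a, (a + b) / 2]
    out.append(signal[-1])
    return out

def interpolation_residuals(signal):
    """Mean absolute prediction error, split by even and odd sample index.

    Each interior sample is predicted as the average of its two neighbours.
    Returns [error_at_even_indices, error_at_odd_indices].
    """
    sums = [0.0, 0.0]
    counts = [0, 0]
    for i in range(1, len(signal) - 1):
        predicted = (signal[i - 1] + signal[i + 1]) / 2
        sums[i % 2] += abs(signal[i] - predicted)
        counts[i % 2] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

For an untouched signal both errors are typically nonzero, while in the upsampled signal every odd-indexed sample is predicted perfectly. That periodic pattern of "too predictable" samples is the kind of correlation a resampling detector looks for; real images need a more robust estimator than this exact comparison.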

Included objects frequently have a different color filter array (CFA) pattern than the rest of the image, possibly because of resizing or different in-camera decoding. Also you can create a vector field of the chromatic aberration color fringing throughout the image; inserted parts will likely have vectors pointing the wrong direction.

In addition, the human visual system (HVS) doesn’t easily notice subtle differences in lighting or shadows in parts of a composite image, but Farid laid out some techniques for estimating lighting direction. One impressive method involves determining the direction of the light (in 3-space, mind you) by looking at the specular highlights in the eye. (See Micah Kimo Johnson’s thesis for full details.)

During the Q&A, one wag asked the question on many of our minds. Are there tools or techniques available to mask all of these forgery detection techniques? Apparently yes, since Prof. Farid consults with Adobe on issues of making more realistic photo manipulations.

Update — 10 August 2008: You can see a thirteen minute video with Dr. Farid at PBS’s Nova web site. (Thanks to Jeff Tranberry.)

Posted in Color and Vision, Computing, Fodder for Techno-weenies, Photography | 3 Comments

Me Gusta Tacos y Burritos

I’m full. Sated. Happy.

Fish tacos, my friends. That’s what has brought me to this wonderful place. Batter-fried fish on corn tortillas with salsa, finely cut cabbage, and some secret ingredient that Rubio’s puts in their “Pesky” tacos. With tortilla chips and refried beans on the side. Mmm. . .

It would be wrong to say that I came to California for fish tacos, but I was certainly looking forward to it for several months.

No, I’m in San Jose attending the 20th annual joint SPIE/IS&T Electronic Imaging symposium. Sunday I attended nine hours of short courses: “Color Processing and its Characterization for Digital Photography” and “Perceptual Metrics for Image Quality Evaluation.” Today — before the fish tacos — I sat in on a full day of paper presentations spread over the “Human Vision and Electronic Imaging XIII”, “Image Quality and System Performance V”, and “Digital Photography IV” conferences. (Tomorrow: “Color Imaging XIII: Processing, Hardcopy, and Applications” and “Rocky IV”.) I had hoped to see “Inferring illumination direction estimated from disparate sources in paintings: an investigation into Jan Vermeer’s ‘Girl with a Pearl Earring’”, but I was glad that I heard Daniel Tamburrino’s paper “Digital camera workflow for high-dynamic-range images using a model of retinal processing” instead.

(Supposing that I feel motivated, I will try to post some of my notes here in the coming days.)

I’ve only been here two and a half days, but I already feel like I’ve done so much. As I advocate in my book How to Get Rich through Petty Cash, I’m taking the opportunity to enjoy myself while on a business trip. If you travel around the country and only see the inside of your hotel room, you’re wasting your time on this earth. You’ve got to get out of the hotel, out of the high-priced bubble that surrounds any place where convention-goers congregate, and out of town if possible.

The trip from Boston on Saturday was one of the nicest cross-country flights I’ve ever had. It was my first time flying Jet Blue — I like the seats but don’t think I made the most of the seat-back amenities — and I was the only person in my row. Looking out my window I saw the white, snow-covered fields, ice-covered ponds and the sluggish rivers of New England give way to a deep shag of clouds over Minnesota and the corrugated origami of South Dakota and Wyoming. Rapid City, which I’ve never liked at ground-level, was a fine, delicate etched glass trophy on the edge of the blue Black Hills, the gateway to my old flame. Were those spiral holes in the ground and the furrows of overburden south of Gillette there last time? Surely those oil wells near Midwest still make the same unearthly bullfrog croaking I remember when we stopped the car to look at the bright smear of the milky way on a cold winter’s night more than a dozen years ago. And there’s the interstate leading to my city, my river, my mountain, my mother, my long-gone adolescent angst. I press my hand to the window. Clouds and snow fill in the depths of the Wind River Range, the last of the Rockies before the great folds in the earth when we enter Utah and then Terra Incognita and Terra Nullius in Nevada. Lake Tahoe, defiant, is not frozen but a deep black, unlike the muddy water covering fields in the Central Valley. And there’s glorious Point Reyes, unbelievably beautiful in the light of a western sun shining through broken clouds. Finally, the Golden Gate and its fabulous bridge.

After getting my rental car, I immediately headed to the SFMoMA, to see the Jeff Wall exhibit, which closed Sunday while I was in my classes. I like many of Jeff Wall’s photographs, but I’m deeply ambivalent about his work in general. First, he has a reputation as the most cerebral living photographer, but I often feel that the nonstop art historical references — I doubt I caught even a quarter of them — get in the way of making a photograph that’s pleasing to look at. Should we really let folks like Wolfgang Tillmans, Jürgen Teller (NSFW), and Wall — or me for that matter — revel in elevating every ordinary scene and still claim a fig leaf of art historical pretension? Perhaps if I were more of an insider, I would be less ambivalent; but such is the way with me and all modern art. An-My Lê’s photographs from her Small Wars and 29 Palms series were perfect. And the black-and-white and color work of the Silicon Valley from Gabriele Basilico had me amazed and envious. Finally, I have to admit that despite liking monographs better than surveys, and themed exhibitions better than a hodge-podge of recent acquisitions, I liked Picturing Modernity, a hodge-podge survey of photographs from the museum’s collection, very loosely grouped around a two word title and including several pictures that (too conveniently) would have fit in recent exhibits at the National Gallery of Art (like this one and that one). Surely it was the luscious, large deadpan photos at the exhibit entrance that enticed me to give it a free pass.

Well, that’s all that’s new from the other coast. I wish Lisa were here, and I miss having the cat lie upon my lap while I scratch under his chin. But these small prices must be paid by an international playboy with a conference to attend.

Posted in Color and Vision, OPP, Photography, This is who we are, Travel, USA | 1 Comment

Privilege survey

As seen at The Clutter Museum:

Bold the statements that are true.

  1. Father went to college
  2. Father finished college (don’t know . . . maybe)
  3. Mother went to college
  4. Mother finished college (The same year I did, for which we were all very proud.)
  5. Have any relative who is an attorney, physician, or professor.
  6. Were the same or higher class than your high school teachers.
  7. Had more than 50 books in your childhood home. (You really only needed one book.)
  8. Had more than 500 books in your childhood home.
  9. Were read children’s books by a parent
  10. Had lessons of any kind before you turned 18
  11. Had more than two kinds of lessons before you turned 18
  12. The people who dress and talk like me are portrayed in the media
  13. Had a credit card with your name on it before you turned 18
  14. Your parents (or a trust) paid for the majority of your college costs (I don’t think so. It was all very sudden.)
  15. Your parents (or a trust) paid for all of your college costs
  16. Went to a private high school
  17. Went to summer camp
  18. Had a private tutor before you turned 18
  19. Family vacations involved staying at hotels
  20. Your clothing was all bought new before you turned 18 (after a while)
  21. Your parents bought you a car that was not a hand-me-down from them (I loved that ’63 Dodge Dart.)
  22. There was original art in your house when you were a child
  23. You and your family lived in a single-family house (about half my childhood)
  24. Your parent(s) owned their own house or apartment before you left home
  25. You had your own room as a child
  26. Participated in a SAT/ACT prep course
  27. Had your own TV in your room in high school
  28. Owned a mutual fund or IRA in high school or college
  29. Flew anywhere on a commercial airline before you turned 16 (I won a trip to Chicago, which was exciting.)
  30. Went on a cruise with your family
  31. Went on more than one cruise with your family
  32. Your parents took you to museums and art galleries as you grew up
  33. You were unaware of how much heating bills were for your family

12/34 — Now it’s your turn. Feel free to use the comments if you don’t have your own web log.

Posted in General, This is who we are, Uncategorized | Leave a comment

A type of addiction

As seen in the comments of Coding Horror’s article Typography: Where Engineers and Designers Meet:

Typefaces are the gateway drug of typography. It starts with fonts, then you’ll be hand kerning and eventually you’ll be up all night with grids and vertical rhythm and column width. Be careful, kids.

Don’t I know it! One day you’re saying “I’ll never use Arial” and changing the font in your annual review to something with more appeal than Times New Roman. Then you’re downloading Linotype’s FontExplorer™ X and reading Wikipedia articles about Apple Advanced Typography. A bit later you’re learning about how to create an input method editor for Mac OS X and writing about typography weblogs. After a while, you’re standing in the middle of the mall pointing out all of the Helvetica inspired typefaces to your spouse. . . .

Posted in Computing, I like type | 1 Comment

Agile Software Development for Executives

A quick post today containing two links:

“Agile” is a family of light-weight, quality-based software development methodologies that have many things going for them. But over the last three semesters in my software engineering program, I have noticed that very few people work in agile organizations or teams and that there’s a great deal of skepticism from the top down. I personally like it a lot, though my work team is still learning how to make the most out of it.

Perhaps these two links from Brad Appleton’s ACME Blog will help convince some people to jump into the modern age. (Seen at Software Best Practices.)

Posted in Software Engineering | Leave a comment

The JPEG Family Circus

The discussion in the comments of my recent article about HD Photo (a.k.a. JPEG-XR) got me thinking about all of the different beasts that go by the name “JPEG.”

JPEG: What most of us consider to be “JPEG” is just one of many processes for image encoding and decoding defined within the same specification. The process that makes up 99.99999% of all of the JPEGs ever created is “JPEG Baseline (Process 1)” for 8-bit lossy compression. (That’s just my estimate, which is probably low. It’s probably better to say “almost 100% of all JPEGs.”)

This process divides an image into a bunch of 8×8 blocks, uses the discrete cosine transform (DCT) to move the data into the frequency domain, and compresses the data by (among other things) removing some of the high frequency data that the human visual system usually can’t detect. You can think about it as abridging a novel by taking out a few sentences per paragraph. Unfortunately, if the quality settings are too low, it’s really easy to notice that something has gone missing; or if a scene has a lot of information — such as one with lots of fine detail — there will be blocky artifacts where the detail should be.
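The transform-then-discard idea can be sketched for a single 8×8 block. This is a slow, textbook-style illustration under simplifying assumptions, not a JPEG codec: it uses one uniform quantization step instead of a per-frequency table, and it leaves out the level shift, zigzag ordering, and entropy coding. All function names are invented for the example.

```python
import math

N = 8  # baseline JPEG works on 8x8 blocks

def _scale(k):
    # Orthonormal DCT-II scaling; for N = 8 this matches the usual JPEG formula.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def _basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct_2d(block):
    """Forward 2-D DCT: pixels block[y][x] -> coefficients out[u][v]."""
    return [[_scale(u) * _scale(v)
             * sum(block[y][x] * _basis(y, u) * _basis(x, v)
                   for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse 2-D DCT: coefficients coeffs[u][v] -> pixels out[y][x]."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v]
                 * _basis(y, u) * _basis(x, v)
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]

def quantize(coeffs, step):
    """The lossy step: small coefficients round to zero and are discarded."""
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]
```

Running `idct_2d(dequantize(quantize(dct_2d(block), step), step))` gives back an approximation of `block`; the larger the step, the more coefficients vanish and the more visible the loss becomes.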

While the removal of high-frequency detail is inherently lossy, even with a maximum quality setting, the original JPEG standard specified a separate lossless mode not based on the DCT. Images compressed this way can be completely retrieved from the compressed data. This is important when you need to preserve all of the data within an image or when adding artifacts can have devastating consequences. “Is that a nodule in the patient’s chest X-ray or a JPEG compression artifact? I guess we’d better do a biopsy just in case….” In fact, the lossless modes for JPEG are really only used within DICOM files, the format used for digital imaging and communications in medicine.

Old school JPEG also supports 12 and 16 bits of data in each channel of a pixel. For color images, this is the difference between about 17 million colors for an 8-bit image, 68 billion colors for a 12-bit image, and 281 trillion colors when using 16 bits. Once again, only those medical imaging people use the extra bit depths, and they just use the gray colors.
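Those color counts are just powers of two: with b bits in each of three channels there are 2^(3b) possible values per pixel. A tiny sketch to check the arithmetic (the function name is made up for the example):

```python
def color_count(bits_per_channel, channels=3):
    """Number of distinct pixel values for a given bit depth and channel count."""
    return 2 ** (bits_per_channel * channels)

for bits in (8, 12, 16):
    print(f"{bits:2d} bits/channel: {color_count(bits):,} colors, "
          f"{color_count(bits, channels=1):,} gray levels")
```

The single-channel figures (256, 4,096, and 65,536 gray levels) are the ones that matter for the medical imaging case.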

JPEG-LS was supposed to be a better lossless format but never really got going. The promises of JPEG 2000 probably had a lot to do with this.

JPEG 2000 is (1) a wavelet-based compression method, (2) a scheme for encoding wavelet-compressed images into randomly accessible “codestreams”, and (3) a file format for encapsulating compressed codestreams. Because it uses a discrete wavelet transform (DWT) the results are generally better than the older JPEG format when comparing images with the same compression ratio.

Images in JPEG 2000 can have an arbitrary bit depth (1 – 32 bps), and different planes can have different bit depths. (For example the luminance channel of a YCbCr image can have a high bit depth to support HDR imagery.) Certain portions of an image can have higher spatial resolution or be encoded at a different compression level. JPEG 2000 has both lossy and lossless components as part of the baseline. Several colorspaces are supported, including bi-level, grayscale, sRGB, YCbCr, and indexed imagery. Hyperspectral and n-sample images are supported using a somewhat convoluted “multi-component” schema. Images can also include alpha channels for transparency. A really amazing thing about JPEG 2000 is that it’s possible to reorder the parts of the codestream to change how the data is accessed (e.g. access regions faster v. access different resolutions faster) without decompressing and recompressing the data, which can be expensive.

The JPEG 2000 file format uses about 20 hierarchical “boxes” to nest metadata about the compressed codestreams. While the file format is technically unnecessary to read and process a JPEG 2000 image, the extra formatting facilitates random data access, long-term cataloguing and IP management, and efficient transmission. JPEG 2000 files can also contain a limited subset of ICC color profiles. EXIF metadata support is not part of the JPEG 2000 standard, although it can appear as a private metadata field.

JPEG 2000 was touted as the format to replace the 1991 JPEG standard, but this didn’t happen for several reasons. Perhaps most important, the algorithms at the heart of JPEG 2000 require a lot of processing power, making it slower for desktop computers than rendering old-school JPEG and prohibitive for many embedded devices. As of 2007, few Web browsers have built-in support for it, and consumer-level digital cameras don’t produce imagery in the format. In 2007, Adobe Photoshop CS3 stopped including the JPEG 2000 export module in a typical installation.

But because of the smaller file size, flexibility, and more pleasing artifact appearance, the medical and remote sensing communities have adopted it. Both NITF and DICOM have incorporated JPEG 2000 data into their files. NITF is the friendly format used for “national imagery.” I will let you Google that so the NSA can start tracking you.

JPEG-XR is the name that Microsoft’s HD Photo format might have if it’s standardized, which I sincerely hope it will be. JPEG-XR uses a photo core transform (PCT), which I know absolutely nothing about but which promises equivalent performance to JPEG 2000 with lower computational complexity — which means you can put it on a consumer device more easily — and much better size-versus-quality performance compared to the original JPEG format. It also supports more bit depths, high dynamic range imagery, lossy and lossless encoding/decoding using the same algorithm, and wide gamut color; uses a linear light gamma making it possibly suitable to replace RAW formats or enable post-CRT workflows; and can store bucketloads of metadata including EXIF and XMP.

JPEG-Plus. And then there’s JPEG+, which you might reasonably call JPEG – 20% because it’s essentially the same as the original DCT-based JPEG with a modest file-size performance improvement and some claims about better visual appearance. I’m not holding my breath for it; but given the 29+ processes that made up the original JPEG standard, what’s an extra one that no one will implement?

Update: For posterity, the PCT stands for “Photo Core Transform” not “Principal Component Transform”. Thomas Richter said this about it on sci.image.processing:

The transform is an overlapped 4×4 block transform that is related to a traditional DCT scheme, or at least approximates it closely. The encoding is a simple adaptive huffman with a move-to-front list defining the scanning order, and an inter-block prediction for the DC and the lowest-frequency AC path of the transformation.

Some parts are really close to H264 I-frame compression, i.e. the idea to use a pyramidal transformation scheme and transform low-passes again (here with the same, in H264 with a simpler transformation).

The good part is that lossy and lossless use the same transformation. The bad part is that the quantizer is the same for all frequencies, meaning there is no CSF adaption, and the entropy coder back-end is not state of the art.

Update 3-February-2009: JPEG-XR has advanced to “draft standard balloting,” which means it’s very likely it will become a standard (unless everyone hates it, of course). More info…

Posted in Computing, File Formats, Fodder for Techno-weenies, Photography | 8 Comments

Microsoft HD Photo

For my 9-5, I wrangle file formats, making it possible for people to read images into MATLAB and then do something useful with them. I’m always on the look out for what’s new, and there’s always something new. (I only joke about employing a legion of developers in Bangalore to develop new formats so that I can stay employed.) Lately I’ve been looking at Microsoft’s HD Photo format (a.k.a. Windows Media Photo, or WMP, format).

Once upon a time you could count on Microsoft’s image file formats to suck. Just take BMP as an example: a more-or-less unpublished format that changed between revisions of Windows and which you were supposed to access primarily through Windows API calls. File I/O people (like me) needed to have the bits of information that we could crib together handy so that we could figure out what datatype a given piece of data should have because the file didn’t carry that information in it. Nor was there a data dictionary to programmatically look it up. They couldn’t have designed it worse if they had tried; but they clearly didn’t do any (good) design on the format. My professional opinion: totally sucked.

But perhaps they’ve learned the error of their ways with their new HD Photo format, which they have submitted for standardization. It appears that they’ve talked to a lot of digital image users about their needs and given a lot of thought to the format. I’m about halfway through the WinHEC 2007: HD Photo Implementation Guidelines presentation, which certainly says all the right things.

My only worry is that they’re saying so many of the right things that they’re possibly heading toward a couple of bad outcomes. Let’s call these the unintended consequences of designing a “good” format:

  1. The “Kitchen Sink Problem” (a.k.a. DICOM) — Everybody gets what they want, including things of questionable utility like CMYK + alpha and 5-6-5 or 10-10-10 RGB encoding. The problem is that format readers and writers have to decide what to support or use somebody else’s (possibly license-encumbered) code to read a simple file. The imperfect solution to this — which is what DICOM, NITF, and TIFF have done — is to create a mechanism for specifying compliance level.
  2. The “Infinite Configurability Problem” (a.k.a. JPEG-2000) — As a file I/O developer, you either have to take everything that comes your way or you have to pick what you think is important and hope that you pick the right set of features. HD Photo supports settings for chroma subsampling, overlap processing, frequency or spatial decoding priority, alpha interleaved by pixel or as a separate component, spatial transformations, gamut-management as part of the format, etc. These all have their place, but it might be a bit much. And JPEG-2000 has shown us that when you make a format too smart, it takes a long time to get adopted.

This brings me ’round to a thought I’ve had a lot recently. Image file formats have become platforms for working with data rather than containers for communicating images. The important thing is to get the data down on disk as quickly as possible and then change the interpretation of the pixels later. It looks like HD Photo might be self-contained, but some formats (like DNG or various HDR formats, including HD Photo) don’t include all of the information about how the image should look. If you add Adobe’s special sauce while reading the image in the DNG file, you’ll get a beautifully rendered image. If you use another vendor’s tools or provide other raw conversion settings, you will likely get an image that looks rather different. Similarly, there is no one correct rendering of an HDR image, and I don’t know of any formats that specify a preferred way of tone mapping.

It’s a brave new world of image file formats. We file format developers live in interesting times.

Posted in Computing, File Formats, Fodder for Techno-weenies, Photography | 5 Comments