How a picture can be represented as a collection of numbers

James Milch, Eastman Kodak Company

All sorts of images have "gone digital" in the last five years: digital television, digital photos, and digital movies. The essential element of all these new technologies is the translation of a scene or picture into a collection of numbers: a digital image. These numbers can then be stored, modified, transmitted, and finally converted back into something we can see. The physical device that creates the digital image is usually called a camera or a scanner; the physical device that converts the digits into something we can see is called a printer or display.

The first step in creating a digital image is to select the area or scene to be captured. Sometimes this is obvious (e.g. a printed photograph). In a camera, the photographer uses the viewfinder to "frame" the picture. For simplicity, digital images usually represent a rectangular area of view. Figure 1 shows an example.

Figure 1.

Every scene is a pattern of light and dark regions. Our brains interpret the pattern and thus we see people, buildings, pillars, roofs, and lions. The process of creating a digital image ignores the objects in the picture. It just deals with dark and light. (This tutorial will cover only black-and-white pictures. The extension of the information here to color pictures is explained in another tutorial.)

The second step in creating a digital image is to divide the rectangular image into little squares. Figure 2 shows the general idea: many small squares, all the same size. (But the squares shown here are much too big, as will become clear in the next step.) Each square is called a "pixel". This is an abbreviation for the phrase "picture element".

Figure 2.

So far, there are no numbers in this description, just rectangles and squares. The third step creates the numbers that describe the image. This is how it is done:

  • Find the darkest part of the image. Call that level of brightness zero.
  • Find the lightest part of the image. Assign a positive integer, like 1000, to that level of brightness.
  • Finally, for every pixel, average the level of brightness in the part of the scene covered by that pixel, and assign a number between 0 and 1000 to the average value. (No fractions or decimals, just whole numbers!)
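The three steps above can be sketched in a few lines of Python. This is only an illustration, not the method used by any particular camera: the function name `digitize`, the tiny 4-by-4 `scene` of made-up light measurements, and the choice of 2-by-2 pixels are all assumptions for the example.

```python
def digitize(scene, pixel_size, max_level=1000):
    """Turn a grid of brightness samples into whole numbers from 0 to max_level."""
    # Find the darkest and lightest parts of the scene.
    darkest = min(min(row) for row in scene)
    lightest = max(max(row) for row in scene)
    span = (lightest - darkest) or 1  # avoid dividing by zero for a flat scene

    image = []
    for top in range(0, len(scene) - pixel_size + 1, pixel_size):
        image_row = []
        for left in range(0, len(scene[0]) - pixel_size + 1, pixel_size):
            # Average the brightness in the part of the scene covered by this pixel.
            block = [scene[top + i][left + j]
                     for i in range(pixel_size)
                     for j in range(pixel_size)]
            average = sum(block) / len(block)
            # Darkest maps to 0, lightest to max_level, whole numbers only.
            image_row.append(round((average - darkest) / span * max_level))
        image.append(image_row)
    return image


# Hypothetical brightness samples (arbitrary units), 4 samples by 4 samples.
scene = [
    [10, 10, 60, 60],
    [10, 10, 60, 60],
    [30, 50, 110, 110],
    [70, 90, 110, 110],
]

print(digitize(scene, pixel_size=2))  # [[0, 500], [500, 1000]]
```

Notice that the lower left pixel covers four quite different brightness values (30, 50, 70, 90) but is recorded as the single number 500.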

Now the picture is "gone". It has been replaced by the numbers associated with every pixel. These numbers represent the average brightness in each pixel. Therefore, the pixels should be very small, and there should be many of them. The squares in Figure 2 are fine for the sky area, but elsewhere they are so large that there is significant brightness variation. The average values for each pixel will not represent the picture very faithfully.

In Figure 3, an enlargement of one part of the picture (the heads of the people in the lower left corner) is shown with more practical pixel sizes. These pixels are almost small enough to faithfully represent the detail in the picture. The entire picture is covered with these small pixels.

Figure 3.

There are many small variations on this "digitization" process. Often the range of 0 to 255 is used because each of these numbers can be stored in one byte of memory. Sometimes zero is the brightest pixel and 1000 is the darkest. There are even cases in which the pixel is shaped like a rectangle, rather than a square. None of these variations changes the basic idea, but it is important to know just how the digitization was done. (See the discussion of "metadata" below.)

There are two important decisions buried in the process described above. They determine how faithfully, or accurately, the digital image represents the real scene. The two decisions are:

  1. How many pixels are needed?
  2. How many different values are needed to cover the range of brightness in the pixels without the steps between them becoming visible?

The number of pixels that are needed and the range of values that must be available for each pixel determine how much digital space is needed to hold the digital image and how much time is needed to handle it. Both should be as small as possible, without degrading the image quality. This is such an important issue that a new area of computer science, image compression, was invented to squeeze digital images into small spaces. You can read a separate tutorial on image compression.

Each of the two questions can be very deep. The right answers depend on the construction of the camera or scanner, the content of the picture, the nature of human vision, and the intended use of the digital image. There is no single, perfect answer. In practice, one set of answers has been chosen for each major category of digital image and everyone uses that standard approach for the category.

First, how many pixels are needed to represent the image well? This is sometimes called the "resolution" of the image. (There are many meanings of the word "resolution" as applied to digital images. The number of pixels is one way to measure the spatial resolution of the image.) Video images have only about 307,000 pixels (640 pixels across the width and 480 pixels down the height of the rectangle). A mid-range digital camera provides about 5 million pixels, or 5 megapixels (2500 across and 2000 down). A digital image from a medical x-ray system contains about 9 megapixels.

Remember that the pixel value is the average of the brightness in its little square. Intuitively, then, any detail in the scene which is smaller than a pixel will not be captured in the digital image. It will just be averaged away. In fact, objects must be several pixels wide to be reproduced well in the digital image. The precise answer to this question comes from careful application of an advanced mathematical theory called Fourier Analysis to the imaging system. The smaller the details in the picture, and the better the quality of all the physical system components, the more pixels are needed to do a good job.
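A small one-dimensional sketch shows the averaging effect. The strip of brightness samples below is invented for illustration, and `average_blocks` is a hypothetical helper name. The strip contains one bright detail that is a single sample wide and another that is four samples wide.

```python
def average_blocks(samples, pixel_width):
    """Average consecutive groups of pixel_width samples, one group per pixel."""
    return [sum(samples[i:i + pixel_width]) / pixel_width
            for i in range(0, len(samples) - pixel_width + 1, pixel_width)]


# Invented brightness samples along one line of a scene.
strip = [0, 0, 0, 0,          # dark background
         0, 100, 0, 0,        # bright detail, one sample wide
         100, 100, 100, 100,  # bright detail, four samples wide
         0, 0, 0, 0]          # dark background

print(average_blocks(strip, pixel_width=4))  # [0.0, 25.0, 100.0, 0.0]
```

With pixels four samples wide, the wide detail keeps its full brightness of 100, while the narrow one is diluted to 25 and can no longer be told apart from a broad patch of dim gray.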

Second, how many different values are needed to cover the range of brightness in the picture? This is related to the "pixel depth" of the image. Modern digital systems are all built on binary arithmetic (base 2), so the number of values is usually a power of 2. Common choices are 256 levels, 1024 levels, 4096 levels, or even 65,536 levels (pixel depths of 8, 10, 12, or 16 bits). Video images use 256 levels. A digital image from a medical x-ray system usually keeps at least 4096 levels of brightness.
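The arithmetic connecting pixel depth, number of levels, and storage space can be sketched as follows. The function names are made up for this example, the calculation assumes the pixel values are packed together with no wasted bits and no compression, and the 3000-by-3000 x-ray size is an assumed shape for "about 9 megapixels", since the exact dimensions are not given here.

```python
def levels(bits_per_pixel):
    """Number of distinct brightness values a given pixel depth can hold."""
    return 2 ** bits_per_pixel


def storage_bytes(width, height, bits_per_pixel):
    """Bytes needed for the raw image data, assuming tightly packed bits."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole number of bytes


for depth in (8, 10, 12, 16):
    print(depth, "bits ->", levels(depth), "levels")

print(storage_bytes(640, 480, 8))     # video frame: 307200 bytes
print(storage_bytes(2500, 2000, 8))   # 5 megapixel camera: 5000000 bytes
print(storage_bytes(3000, 3000, 12))  # assumed x-ray size: 13500000 bytes
```

In practice, many systems store 10-bit or 12-bit values in two full bytes per pixel for convenience, which would make the real figures larger than this tightly packed estimate.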

The best pixel depth for a specific case depends very much on how the digital image will be viewed. The human eye can only distinguish about 100 different levels of lightness at the same time. That should mean that 256 different values in the image are plenty. But it is not quite that simple. First, the primary purpose of the image may not be direct human viewing. Second, the image may get modified (see the tutorial on image processing) before it gets viewed. And third, the 100 different levels of lightness are not equally spaced. Humans can see more levels in the darker parts of a picture than in the lighter parts.

The cure for the first two issues is simple: store more than 256 levels. (This is the reason that medical images use more than 8 bits.) There is a neat solution for the third problem that is used by many digital systems. Instead of storing the average brightness in a pixel, they store the square root (approximately) of the brightness. For example, if the brightest part of the image is stored as 255, a pixel that is only 10% as bright is stored as 80. (Do the math to convince yourself that the answer is 80, instead of 25, as one might expect: the square root of 0.10 is about 0.316, and 0.316 times 255 is about 80.) This gives more levels in the darker part of the picture.
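The square-root idea can be checked with a short Python sketch. The function names are hypothetical, and the code drops the fractional part (rather than rounding) so that it reproduces the 80 and 25 quoted above; real systems use their own, more carefully defined curves.

```python
import math


def encode_linear(brightness, brightest, max_code=255):
    """Store a value proportional to the brightness itself."""
    return int(brightness / brightest * max_code)


def encode_sqrt(brightness, brightest, max_code=255):
    """Store a value proportional to the square root of the brightness."""
    return int(math.sqrt(brightness / brightest) * max_code)


# A pixel that is 10% as bright as the brightest part of the image.
print(encode_linear(10, 100))  # 25
print(encode_sqrt(10, 100))    # 80

# The brightest pixel still gets 255 either way.
print(encode_sqrt(100, 100))   # 255

# The darker half of the brightness range gets codes 0 to 180,
# instead of only 0 to 127 with linear storage.
print(encode_linear(50, 100))  # 127
print(encode_sqrt(50, 100))    # 180
```

The last two lines show where the extra levels in the dark regions come from: well over half of the 256 available codes are spent on the darker half of the brightness range.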

The result of the process described so far is a collection of numbers, one for each pixel. These numbers are called the "image data". Usually they are listed in some standard way, like "start in the upper left-hand corner and go across left to right, then do the next line down from left to right". The order makes no difference, as long as the user of the image knows how it was done. This is one more piece of information (like the number of levels used, or the pixel count) that describes the digital image. All this extra information is called "metadata". When digital cameras create an image, they store lots of information as metadata, like the time the picture was taken, the distance of the subject from the camera, and the exposure time.
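As a rough sketch of how image data and metadata fit together, the Python below lists a tiny 2-by-2 image in the "left to right, then next line down" order and shows that the picture can be rebuilt only because the metadata records the width. The dictionary keys and function names are invented for this example and do not correspond to any real file format.

```python
def to_image_data(pixels):
    """List pixel values row by row, starting in the upper left-hand corner."""
    return [value for row in pixels for value in row]


def from_image_data(image_data, metadata):
    """Rebuild the rows of pixels, using the width recorded in the metadata."""
    width = metadata["width"]
    return [image_data[i:i + width] for i in range(0, len(image_data), width)]


pixels = [[0, 500],
          [500, 1000]]

# Hypothetical metadata: the extra facts a reader needs to interpret the list.
metadata = {
    "width": 2,
    "height": 2,
    "levels": 1001,  # whole numbers 0 through 1000
    "order": "rows from top to bottom, each row left to right",
}

image_data = to_image_data(pixels)
print(image_data)                             # [0, 500, 500, 1000]
print(from_image_data(image_data, metadata))  # [[0, 500], [500, 1000]]
```

Without the width, the same four numbers could just as well describe a picture one pixel tall and four pixels wide.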

This tutorial describes the creation of a digital image from a real picture or scene. There is actually a quite different approach that is often used for digital images that are created entirely inside a computer. A "graphical description language" (sometimes called a "page description language") gives step-by-step instructions for creating the image. Graphical description languages work best for images made up only of text characters and geometrical forms. There is more information on this method of describing an image in another tutorial.

Here is a summary of the process for creating a digital image, for the specific case of a 5 megapixel digital camera:

  • Use the zoom control and the viewfinder to identify the part of the scene that will be recorded.
  • Divide the scene into pixels: 2500 across the width of the scene and 2000 down the height.
  • Measure the average brightness in each pixel and calculate its square root. Record the number 255 for the brightest pixels, the number 0 where there was no light in the scene, and a proportional number in between for each of the other pixels.
  • List all 5,000,000 numbers, starting in the upper left-hand corner.
  • Tack on a few extra numbers that will tell everyone how to interpret this long list of numbers.

Now all the information about that scene is expressed in a series of numbers. The numbers can be stored in any sort of digital memory and transmitted by many different devices. The numbers don't change, so the image never changes. Every copy is a perfect copy. The image can be changed intentionally by a process called digital image processing. You can read about this powerful tool in another tutorial.

Published: 8/27/2009