15-01-2013, 01:58 PM
Fundamentals of Embedded Video Processing
Human Visual Perception
Let’s start by discussing a little physiology. As we’ll see, understanding how our eyes work has paved an important path in the evolution of video and imaging.
Our eyes contain two types of vision cells: rods and cones. Rods are primarily sensitive to light intensity rather than color, and they give us night vision capability. Cones, on the other hand, are not tuned to intensity, but instead are sensitive to wavelengths of light between 400 nm (violet) and 770 nm (red). Thus, the cones provide the foundation for our color perception.
There are three types of cones, each with a different pigment that is most sensitive to either red, green or blue energy, although there’s a lot of overlap between the three responses. Taken together, the response of our cones peaks in the green region, at around 555 nm. This is why, as we’ll see, we can make compromises in LCD displays by assigning the Green channel more bits of resolution than the Red or Blue channels.
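That compromise shows up concretely in the common 16-bit RGB565 pixel format, where green keeps 6 bits while red and blue keep only 5. The sketch below is illustrative, not tied to any particular display driver:

```python
def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit R, G, B samples into a 16-bit RGB565 word.

    Green keeps 6 bits while red and blue keep only 5, reflecting
    the eye's peak sensitivity in the green region.
    """
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

# Pure white saturates all three reduced-width channels.
print(hex(rgb888_to_rgb565(255, 255, 255)))  # 0xffff
```

Dropping the low bits of red and blue costs little perceptually, which is exactly why the extra bit goes to green.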
The discovery of the Red, Green and Blue cones ties into the development of the trichromatic color theory, which states that almost any color of light can be conveyed by combining proportions of monochromatic Red, Green and Blue wavelengths.
Because our eyes contain many more rods than cones, they are more sensitive to intensity than to color. This allows us to save bandwidth in video and image representations by subsampling the color information.
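As a rough sketch of how chroma subsampling works, the snippet below halves one row of chroma samples horizontally (the 4:4:4 to 4:2:2 case) by averaging neighboring pairs, while luma would be kept at full resolution:

```python
def subsample_422(chroma_row):
    """Horizontally subsample one row of chroma by 2 (4:4:4 -> 4:2:2),
    averaging each pair of neighboring samples."""
    return [(chroma_row[i] + chroma_row[i + 1]) // 2
            for i in range(0, len(chroma_row) - 1, 2)]

row = [100, 102, 110, 114, 120, 120]
print(subsample_422(row))  # [101, 112, 120]
```

With both chroma channels halved and luma untouched, a 4:2:2 stream carries two-thirds the data of the full 4:4:4 representation.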
Our perception of brightness is logarithmic, not linear. In other words, the actual intensity required to produce a 50% gray image (midway between total black and total white) is only around 18% of the intensity needed to produce total white. This characteristic is extremely important in camera sensor and display technology, as we’ll see in our discussion of gamma correction. This effect also reduces our sensitivity to quantization distortion at high intensities, a trait that many media encoding algorithms use to their advantage.
Broadcast TV – NTSC and PAL
Analog video standards differ in the ways they encode brightness and color information. Two standards dominate the broadcast television realm – NTSC and PAL. NTSC, devised by the National Television System Committee, is prevalent in Asia and North America, whereas PAL (“Phase Alternation Line”) dominates Europe and South America. PAL developed as an offshoot of NTSC, improving on its color distortion performance. A third standard, SECAM, is popular in France and parts of eastern Europe, but many of these areas use PAL as well. Our discussions will center on NTSC systems, but the results relate also to PAL‐based systems.
Video Resolution
Horizontal resolution indicates the number of pixels on each line of the image, and vertical resolution designates how many horizontal lines are displayed on the screen to create the entire frame. Standard definition (SD) NTSC systems are interlaced‐scan, with 480 lines of active pixels, each with 720 active pixels per line (i.e., 720x480 pixels). Frames refresh at a rate of roughly 30 frames/second (actually 29.97 fps), with interlaced fields updating at a rate of 60 fields/second (actually 59.94 fields/sec).
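These numbers translate directly into the data rates an embedded processor must sustain. A quick back-of-the-envelope calculation for the active region, assuming 8-bit samples in a 4:2:2 scheme (16 bits per pixel), looks like this:

```python
# Active-pixel data rate for SD NTSC video.
width, height = 720, 480
frame_rate = 30000 / 1001            # the exact value behind "29.97 fps"
pixels_per_sec = width * height * frame_rate
bytes_per_sec = pixels_per_sec * 2   # 16 bits/pixel assumed (8-bit 4:2:2)
print(f"{pixels_per_sec / 1e6:.2f} Mpixels/s, {bytes_per_sec / 1e6:.2f} MB/s")
```

That is on the order of 10 Mpixels/s, or about 20 MB/s of active video, before any blanking-interval overhead is counted.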
High definition systems (HD) often employ progressive scanning and can have much higher horizontal and vertical resolutions than SD systems. We will focus on SD systems rather than HD systems, but most of our discussion also generalizes to the higher frame and pixel rates of the high‐definition systems.
When discussing video, there are two main branches along which resolutions and frame rates have evolved: computer graphics formats and broadcast video formats. Table 1 shows some common screen resolutions and frame rates belonging to each category. Even though these two branches emerged from separate domains with different requirements (for instance, computer graphics uses RGB progressive‐scan schemes, while broadcast video uses YCbCr interlaced schemes), today they are used almost interchangeably in the embedded world. For example, VGA compares closely with the NTSC “D‐1” broadcast format, and QVGA parallels CIF. It should be noted that although D‐1 is 720 pixels x 486 rows, it’s commonly referred to as being 720x480 pixels (which is really the arrangement of the NTSC “DV” format used for DVDs and other digital video).
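To make the comparison concrete, here are the active-pixel dimensions of the formats just mentioned (a small summary, not a substitute for Table 1):

```python
# Common graphics and broadcast formats (width x height in active pixels).
formats = {
    "QVGA": (320, 240),
    "CIF": (352, 288),
    "VGA": (640, 480),
    "D-1 (NTSC)": (720, 486),
    "DV (NTSC)": (720, 480),
}
for name, (w, h) in formats.items():
    print(f"{name:12s} {w}x{h} = {w * h:>7} pixels")
```

Note how close VGA and D-1 are in total pixel count, which is why embedded systems often treat them interchangeably.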
Color Spaces
There are many different ways of representing color, and each color system is suited for different purposes. The most fundamental representation is RGB color space.
RGB stands for “Red‐Green‐Blue,” and it is a color system commonly employed in camera sensors and computer graphics displays. The three primary colors sum to form white light, and they can combine in proportion to create almost any color in the visible spectrum. RGB is the basis for all other color spaces, and it is the overwhelming choice of color space for computer graphics.
Gamma Correction
“Gamma” is a crucial phenomenon to understand when dealing with color spaces. This term describes the nonlinear nature of luminance perception and display. Note that this is a twofold manifestation: the human eye perceives brightness in a nonlinear manner, and physical output devices (such as CRTs and LCDs) display brightness nonlinearly. It turns out, coincidentally, that the eye’s luminance sensitivity is almost exactly the inverse of a CRT’s output characteristic.
Stated another way, luminance on a display is roughly proportional to the input analog signal voltage raised to the power of gamma. On a CRT or LCD display, this value is ordinarily between 2.2 and 2.5. A camera’s precompensation, then, scales the RGB values to the power of (1/gamma).
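This relationship can be sketched in a few lines. Assuming a plain power-law model with gamma = 2.2 (real standards such as sRGB add a small linear segment near black), the camera's precompensation and the display's response cancel each other:

```python
def gamma_encode(linear, gamma=2.2):
    """Camera-side precompensation: raise linear intensity (0..1) to 1/gamma."""
    return linear ** (1.0 / gamma)

def display_transfer(encoded, gamma=2.2):
    """Display response: output luminance ~ input signal raised to gamma."""
    return encoded ** gamma

x = 0.18                                  # a linear scene intensity
encoded = gamma_encode(x)                 # ~0.46 leaves the camera
recovered = display_transfer(encoded)     # display's nonlinearity undoes it
print(round(recovered, 6))                # back to ~0.18
```

A useful side effect of the encoding is that more code values end up devoted to dark tones, where the eye is most sensitive.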
The upshot of this effect is that video cameras and computer graphics routines, through a process called “gamma correction,” prewarp their RGB output stream both to compensate for the target display’s nonlinearity and to create a realistic model of how the eye actually views the scene. Figure 1 illustrates this process.