
Fundamentals of Embedded Video Processing (Part 1 of a 5-part series)

By David Katz and Rick Gentile, Analog Devices, Inc.

As consumers, we’re intimately familiar with video systems in many embodiments.  However, from the embedded developer’s viewpoint, video represents a tangled web of different resolutions, formats, standards, sources and displays.   

In this series, we will strive to untangle some of this intricate web, focusing on the most common circumstances you’re likely to face in today’s media processing systems.  After reviewing the basics of video, we will discuss some common scenarios you may encounter in embedded video design and provide some tips and tricks for dealing with challenging video design issues.  

Human Visual Perception  

Let’s start by discussing a little physiology.  As we’ll see, understanding how our eyes work has paved an important path in the evolution of video and imaging.

Our eyes contain two types of vision cells: rods and cones.  Rods are primarily sensitive to light intensity as opposed to color, and they give us night vision capability. Cones, on the other hand, are not tuned to intensity, but instead are sensitive to wavelengths of light between 400 nm (violet) and 770 nm (red).  Thus, the cones provide the foundation for our color perception.

There are 3 types of cones, each with a different pigment that’s either most sensitive to red, green or blue energy, although there’s a lot of overlap between the three responses. Taken together, the response of our cones peaks in the green region, at around 555 nm.  This is why, as we’ll see, we can make compromises in LCD displays by assigning the Green channel more bits of resolution than the Red or Blue channels.  

The discovery of the Red, Green and Blue cones ties into the development of the trichromatic color theory, which states that almost any color of light can be conveyed by combining proportions of monochromatic Red, Green and Blue wavelengths. 

Because our eyes have lots more rods than cones, they are more sensitive to intensity rather than actual color.  This allows us to save bandwidth in video and image representations by subsampling the color information.

Our perception of brightness is logarithmic, not linear. In other words, the actual intensity required to produce a 50% gray image (exactly between total black and total white) is only around 18% of the intensity we need to produce total white.  This characteristic is extremely important in camera sensor and display technology, as we’ll see in our discussion of gamma correction. Also, this effect leads to a reduced sensitivity to quantization distortion at high intensities, a trait that many media encoding algorithms use to their advantage.  

Another visual novelty is that our eyes adjust to the viewing environment, always creating their own reference for white, even in low-lighting or artificial-lighting situations.  Because camera sensors don’t innately act the same way, this gives rise to a white balance control in which the camera picks its reference point for absolute white.

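As an illustration of how a camera might pick that white reference automatically, below is a minimal C sketch of the classic "gray world" heuristic, which assumes the scene averages out to neutral gray and scales the red and blue channels so their means match the green mean. The function name and interleaved 8-bit RGB layout are assumptions for this example, and this is only one of many auto-white-balance strategies:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical "gray world" automatic white balance: assume the scene
 * averages to neutral gray, then scale R and B so their channel means
 * match the G mean.  Input is interleaved 8-bit RGB. */
void gray_world_awb(uint8_t *rgb, size_t npixels)
{
    uint64_t sum_r = 0, sum_g = 0, sum_b = 0;

    for (size_t i = 0; i < npixels; i++) {
        sum_r += rgb[3 * i + 0];
        sum_g += rgb[3 * i + 1];
        sum_b += rgb[3 * i + 2];
    }
    if (sum_r == 0 || sum_b == 0)
        return;                      /* degenerate frame; leave untouched */

    for (size_t i = 0; i < npixels; i++) {
        uint64_t r = rgb[3 * i + 0] * sum_g / sum_r;  /* gain = Gavg/Ravg */
        uint64_t b = rgb[3 * i + 2] * sum_g / sum_b;  /* gain = Gavg/Bavg */
        rgb[3 * i + 0] = (uint8_t)(r > 255 ? 255 : r);
        rgb[3 * i + 2] = (uint8_t)(b > 255 ? 255 : b);
    }
}
```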

The eye is less sensitive to high-frequency information than low-frequency information. What’s more, although it can detect fine details and color resolution in still images, it cannot do so for rapidly moving images. As a result, transform coding (DCT, FFT, etc.) and low-pass filtering can be used to reduce total bandwidth needed to represent an image or video sequence.   

Our eyes can notice a “flicker” effect at image update rates less than 50-60 times per second, or 50-60 Hz, in bright light. Under dim lighting conditions, this rate drops to about 24 Hz. Additionally, we tend to notice flicker in large uniform regions more so than in localized areas. These traits have important implications for interlaced video, refresh rates and display technologies.  

What’s a video signal? 

Figure 1: Composition of Luma signal    

Figure labels: Breakdown of Luma Signal; Back Porch; Horizontal Sync; White Level; Grey Level; Black Level.

At its root, a video signal is basically just a two-dimensional array of intensity and color data that is updated at a regular frame rate, conveying the perception of motion. On conventional cathode-ray tube (CRT) TVs and monitors, an electron beam modulated by the analog video signal shown in Figure 1 illuminates phosphors on the screen in a top-bottom, left-right fashion.  Synchronization signals embedded in the analog signal define when the beam is actively “painting” phosphors and when it is inactive, so that the electron beam can retrace from right to left to start on the next row, or from bottom to top to begin the next video field or frame.  These synchronization signals are represented in Figure 2.    

HSYNC is the horizontal synchronization signal. It demarcates the start of active video on each row (left to right) of a video frame. Horizontal Blanking is the interval in which the electron gun retraces from the right side of the screen back over to the next row on the left side.     

VSYNC is the vertical synchronization signal. It defines the start (top to bottom) of a new video image. Vertical Blanking is the interval in which the electron gun retraces from the bottom right corner of the screen image back up to the top left corner.

FIELD distinguishes, for interlaced video, which field is currently being displayed.  This signal is not applicable for progressive-scan video systems.   

Figure 2: Typical timing relationships between HSYNC, VSYNC, FIELD

The transmission of video information originated as a display of relative luminance from black to white – thus was born the black-and-white television system. The voltage level at a given point in space correlates to the brightness level of the image at that point.  

When color TV became available, it had to be backward-compatible with B/W systems, so the color burst information was added on top of the existing luminance signal, as shown in Figure 3. Color information is also called chrominance. We’ll talk more about it in our discussion on color spaces (in part 2 of this series).   

Figure 3: Analog video signal with color burst

Figure labels: Luma Channel; Chroma Channel; Composite Video Signal; Color Burst Demodulation Reference Signal.

Broadcast TV – NTSC and PAL    

Analog video standards differ in the ways they encode brightness and color information. Two standards dominate the broadcast television realm – NTSC and PAL.  NTSC, devised by the National Television System Committee, is prevalent in Asia and North America, whereas PAL (“Phase Alternation Line”) dominates Europe and South America.  PAL developed as an offshoot of NTSC, improving on its color distortion performance. A third standard, SECAM, is popular in France and parts of eastern Europe, but many of these areas use PAL as well.  Our discussions will center on NTSC systems, but the results relate also to PAL-based systems.    

Video Resolution  

Horizontal resolution indicates the number of pixels on each line of the image, and vertical resolution designates how many horizontal lines are displayed on the screen to create the entire frame.  Standard definition (SD) NTSC systems are interlaced-scan, with 480 lines of active pixels, each with 720 active pixels per line (i.e., 720x480 pixels).  Frames refresh at a rate of roughly 30 frames/second (actually 29.97 fps), with interlaced fields updating at a rate of 60 fields/second (actually 59.94 fields/sec).   

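To put these numbers in perspective, consider the raw data rate such a stream implies. With 4:2:2 YCbCr sampling (covered in Part 2), each pixel averages 2 bytes, so the active video alone runs at about 720 x 480 x 2 bytes x 29.97 frames/sec, or roughly 20.7 MB/s. Counting the blanking regions as well (858 total samples per line x 525 total lines), the stream reaches 27 MB/s, which is why a 27 MHz clock appears in the BT.656 interface discussed in Part 2.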

High definition systems (HD) often employ progressive scanning and can have much higher horizontal and vertical resolutions than SD systems.  We will focus on SD systems rather than HD systems, but most of our discussion also generalizes to the higher frame and pixel rates of the high-definition systems.   

When discussing video, there are two main branches along which resolutions and frame rates have evolved.  These are computer graphics formats and broadcast video formats.  Table 1 shows some common screen resolutions and frame rates belonging to each category.  Even though these two branches emerged from separate domains with different requirements (for instance, computer graphics uses RGB progressive-scan schemes, while broadcast video uses YCbCr interlaced schemes), today they are used almost interchangeably in the embedded world.  That is, VGA  compares closely with the NTSC “D-1” broadcast format, and QVGA parallels CIF.  It should be noted that although D-1 is 720 pixels x 486 rows, it’s commonly referred to as being 720x480 pixels (which is really the arrangement of the NTSC “DV” format used for DVDs and other digital video). 

Table 1: Graphics vs Broadcast standards    

Source            | Standard    | Horizontal Resolution (pixels) | Vertical Resolution (pixels) | Total Pixels
------------------|-------------|--------------------------------|------------------------------|-------------
Broadcast         | QCIF        | 176                            | 144                          | 25,344
Computer Graphics | QVGA        | 320                            | 240                          | 76,800
Broadcast         | CIF         | 352                            | 288                          | 101,376
Computer Graphics | VGA         | 640                            | 480                          | 307,200
Broadcast         | NTSC        | 720                            | 480                          | 345,600
Broadcast         | PAL         | 720                            | 576                          | 414,720
Computer Graphics | SVGA        | 800                            | 600                          | 480,000
Computer Graphics | XGA         | 1024                           | 768                          | 786,432
Broadcast         | HDTV (720p) | 1280                           | 720                          | 921,600
Computer Graphics | SXGA        | 1280                           | 1024                         | 1,310,720
Computer Graphics | UXGA        | 1600                           | 1200                         | 1,920,000
Computer Graphics | QXGA        | 2048                           | 1536                         | 3,145,728

Interlaced vs. Progressive Scanning  

Interlaced scanning originates from early analog television broadcast, where the image needed to be updated rapidly in order to minimize visual flicker, but the technology available did not allow for refreshing the entire screen this quickly.  Therefore, each frame was “interlaced,” or split into two fields, one consisting of odd-numbered scan lines, and the other composed of even-numbered scan lines, as depicted in Figure 4.  The frame refresh rate for NTSC/(PAL) was set at approximately 30/(25) frames/sec. Thus, large areas flicker at 60 (50) Hz, while localized regions flicker at 30 (25) Hz.  This was a compromise to conserve bandwidth while accounting for the eye’s greater sensitivity to flicker in large uniform regions.   

Not only does some flickering persist, but interlacing also causes other artifacts.  For one, the scan lines themselves are often visible. Because each NTSC field is a snapshot of activity occurring at 1/60 second intervals, a video frame consists of two temporally different fields.  This isn’t a problem when you’re watching the display, because it presents the video in a temporally appropriate manner.  However, converting interlaced fields into progressive frames (a process known as “deinterlacing”), can cause jagged edges when there’s motion in an image. Deinterlacing is important because it’s often more efficient to process video frames as a series of adjacent lines.   

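To make this concrete, below is a minimal C sketch of the simplest deinterlacing method, "weave," which just interleaves the two fields back into one frame. The function name and planar 8-bit layout are assumed for illustration; because the fields are captured 1/60 second apart, this approach shows exactly the jagged "combing" on moving edges described above, which is why practical deinterlacers add motion-adaptive filtering:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* "Weave" deinterlace: copy the even field into the even rows of a
 * progressive frame and the odd field into the odd rows.  Sharp for
 * static content; moving objects exhibit combing, since the two
 * fields are temporally 1/60 second apart. */
void weave_fields(const uint8_t *even_field, const uint8_t *odd_field,
                  uint8_t *frame, size_t width, size_t field_rows)
{
    for (size_t r = 0; r < field_rows; r++) {
        memcpy(frame + (2 * r)     * width, even_field + r * width, width);
        memcpy(frame + (2 * r + 1) * width, odd_field  + r * width, width);
    }
}
```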

With the advent of digital television, progressive (that is, non-interlaced) scan has become a very popular input and output video format for improved image quality.  Here, the entire image updates sequentially from top to bottom, at twice the scan rate of a comparable interlaced system. This eliminates many of the artifacts associated with interlaced scanning. In progressive scanning, the notion of two fields composing a video frame does not apply.    

Figure 4: Interlaced Scan vs Progressive Scan illustration

Figure labels: 486 Lines: One Frame; Line; Interlaced: frame is split into two fields; Progressive: frame is displayed in sequence as a single field.

Now that we’ve briefly discussed the basis for video signals and some common terminology, we’re almost ready to move to the really interesting stuff – digital video.  We’ll get to that in the next installment of this series.  

In this article, the second installment in a five-part series, we’ll discuss the basic concepts of digital video.  Before we do this, however, we need to explain a few things about color spaces.  

Color Spaces 

There are many different ways of representing color, and each color system is suited for different purposes.  The most fundamental representation is RGB color space.   

RGB stands for “Red-Green-Blue,” and it is a color system commonly employed in camera sensors and computer graphics displays. As the three primary colors that sum to form white light, they can combine in proportion to create most any color in the visible spectrum.  RGB is the basis for all other color spaces, and it is the overwhelming choice of color space for computer graphics.   

Gamma Correction     

“Gamma” is a crucial phenomenon to understand when dealing with color spaces. This term describes the nonlinear nature of luminance perception and display.  Note that this is a twofold manifestation:  the human eye perceives brightness in a nonlinear manner, and physical output devices (such as CRTs and LCDs) display brightness nonlinearly.  It turns out, by way of coincidence, that human perception of luminance sensitivity is almost exactly the inverse of a CRT’s output characteristics. 

Stated another way, luminance on a display is roughly proportional to the input analog signal voltage raised to the power of gamma.  On a CRT or LCD display, this value is ordinarily between 2.2 and 2.5.   A camera’s precompensation, then, scales the RGB values to the power of (1/gamma).   

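In practice, gamma correction is usually implemented with a lookup table rather than by computing pow() per pixel. The following minimal C sketch (function name assumed for illustration) builds an 8-bit precompensation table for a nominal display gamma of 2.2:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Build an 8-bit gamma-precompensation LUT: each code v maps to
 * 255 * (v/255)^(1/gamma), the prewarp applied at the camera so that
 * a display with output ~ input^gamma yields linear perceived
 * intensity end to end. */
static void build_gamma_lut(uint8_t lut[256], double gamma)
{
    for (int v = 0; v < 256; v++)
        lut[v] = (uint8_t)(255.0 * pow(v / 255.0, 1.0 / gamma) + 0.5);
}

int main(void)
{
    uint8_t lut[256];
    build_gamma_lut(lut, 2.2);   /* typical CRT/LCD display gamma */

    /* ~18% linear intensity (code 46) lands near mid-scale (~117),
     * consistent with the perceptual effect described earlier. */
    printf("code 46 -> %u\n", lut[46]);
    return 0;
}
```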

The upshot of this effect is that video cameras and computer graphics routines, through a process called “gamma correction,” prewarp their RGB output stream both to compensate for the target display’s nonlinearity and to create a realistic model of how the eye actually views the scene.  Figure 1 illustrates this process. 

Gamma-corrected RGB coordinates are referred to as R’G’B’ space, and the luma value Y’ is derived from these coordinates. Strictly speaking, the term “luma” should only refer to this gamma-corrected luminance value, whereas the true “luminance” Y is a color science term formed from a weighted sum of R, G, and B (with no gamma correction applied).  

Often when we talk about YCbCr and RGB color spaces in this series, we are referring to gamma-corrected components – in other words, Y’CbCr or R’G’B’.  However, because this notation can be distracting and doesn’t affect the substance of our discussion, and since it’s clear that gamma correction needs to take place at sensor and/or display interfaces to a processor, we will confine ourselves to the YCbCr/RGB nomenclature even in cases where gamma adjustment has been applied.  The exception to this convention is when we discuss actual color space conversion equations.   

Figure 1:  Gamma correction linearizes the intensity produced for a given input amplitude

Figure labels: Linear input; Display causes nonlinear output intensity; Gamma Correction; Linear output intensity on display.

While RGB channel format is a natural scheme for representing real-world color, each of the three channels is highly correlated with the other two.  You can see this by independently viewing the R, G, and B channels of a given image – you’ll be able to perceive the entire image in each channel. Also, RGB is not a preferred choice for image processing because changes to one channel must be performed in the other two channels as well, and each channel has equivalent bandwidth.

To reduce required transmission bandwidths and increase video compression ratios, other color spaces were devised that are highly uncorrelated, thus providing better compression characteristics than RGB does. The most popular ones – YPbPr, YCbCr, and YUV -- all separate a luminance component from two chrominance components.  This separation is performed via scaled color difference factors (B’-Y’) and (R’-Y’).  The Pb/Cb/U term corresponds to the (B’-Y’) factor, and the Pr/Cr/V term corresponds to the (R’-Y’) parameter. YPbPr is used in component analog video, YUV applies to composite NTSC and PAL systems, and YCbCr relates to component digital video. 

Separating luminance and chrominance information saves image processing bandwidth.  Also, as we’ll see shortly, we can reduce chrominance bandwidth considerably via subsampling, without much loss in visual perception. This is a welcome feature for video-intensive systems. 

As an example of how to convert between color spaces,  the following equations illustrate translation between 8-bit representations of Y’CbCr and R’G’B’ color spaces, where Y’, R’, G’ and B’ normally range from 16-235, and Cr and Cb range from 16-240.

Y' = 0.299 R' + 0.587 G' + 0.114 B'

Cb = -0.168 R' - 0.330 G' + 0.498 B' + 128

Cr = 0.498 R' - 0.417 G' - 0.081 B' + 128

R' = Y' + 1.397 (Cr - 128)

G' = Y' - 0.711 (Cr - 128) - 0.343 (Cb - 128)

B' = Y' + 1.765 (Cb - 128)
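As a concrete illustration of the forward equations above, here is a minimal C sketch (function and helper names are assumptions) that converts one R'G'B' pixel to Y'CbCr and clamps the results to the nominal ranges just described. An embedded implementation would more likely use fixed-point arithmetic:

```c
#include <stdint.h>

/* Clamp to the BT.601 nominal ranges noted above. */
static uint8_t clamp_u8(int v, int lo, int hi)
{
    if (v < lo) return (uint8_t)lo;
    if (v > hi) return (uint8_t)hi;
    return (uint8_t)v;
}

/* Straightforward floating-point version of the R'G'B' -> Y'CbCr
 * equations above. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    int yy  = (int)( 0.299 * r + 0.587 * g + 0.114 * b + 0.5);
    int cbb = (int)(-0.168 * r - 0.330 * g + 0.498 * b + 128.5);
    int crr = (int)( 0.498 * r - 0.417 * g - 0.081 * b + 128.5);

    *y  = clamp_u8(yy,  16, 235);   /* luma:   16..235 */
    *cb = clamp_u8(cbb, 16, 240);   /* chroma: 16..240 */
    *cr = clamp_u8(crr, 16, 240);
}
```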

Chroma subsampling  

With many more rods than cones, the human eye is more attuned to brightness and less to color differences. As luck (or really, design) would have it, the YCbCr color system allows us to pay more attention to Y, and less to Cb and Cr.  As a result, by subsampling these chroma values, video standards and compression algorithms can achieve large savings in video bandwidth.   

Before discussing this further, let’s get some nomenclature straight. Before subsampling, let’s assume we have a full-bandwidth YCbCr stream.  That is, a video source generates a stream of pixel components in the form of Figure 2a.  This is called “4:4:4 YCbCr.” This notation looks rather odd, but the simple explanation is this:  the first number is always ‘4’, corresponding historically to the ratio between the luma sampling frequency and the NTSC color subcarrier frequency. The second number corresponds to the ratio between luma and chroma within a given line (horizontally): if there’s no downsampling of chroma with respect to luma, this number is ‘4.’ The third number, if it’s the same as the second digit, implies no vertical subsampling of chroma. On the other hand, if it’s a 0, there is a 2:1 chroma subsampling between lines. Therefore, 4:4:4 implies that each pixel on every line has its own unique Y, Cr and Cb components.

Now, if we filter a 4:4:4 YCbCr signal by subsampling the chroma by a factor of 2 horizontally, we end up with 4:2:2 YCbCr. ‘4:2:2’ implies that there are 4 luma values for every 2 chroma values on a given video line. Each (Y,Cb) or (Y,Cr) pair represents one pixel value. Another way to say this is that a chroma pair coincides spatially with every other luma value, as shown in Figure 2b.  Believe it or not, 4:2:2 YCbCr qualitatively shows little loss in image quality compared with its 4:4:4 YCbCr source, even though it represents a savings of 33% in bandwidth over 4:4:4 YCbCr.  As we’ll discuss soon, 4:2:2 YCbCr is a foundation for the ITU-R BT.601 video recommendation, and it is the most common format for transferring digital video between subsystem components.   

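Below is a minimal C sketch (names and buffer layouts assumed) of this horizontal 2:1 chroma decimation. It emits the Cb, Y, Cr, Y byte order commonly used for 4:2:2 streams; the simple pair-averaging stands in for the chroma low-pass filtering a production implementation would perform:

```c
#include <stdint.h>
#include <stddef.h>

/* Convert one line of 4:4:4 YCbCr (interleaved Y,Cb,Cr per pixel) to
 * 4:2:2 in Cb,Y,Cr,Y byte order, averaging each horizontal pair of
 * chroma samples.  'width' is assumed even.  Output is 2 bytes/pixel
 * instead of 3, the 33% bandwidth savings noted above. */
void ycbcr444_to_422(const uint8_t *src, uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x += 2) {
        const uint8_t *p0 = src + 3 * x;        /* pixel x   */
        const uint8_t *p1 = src + 3 * (x + 1);  /* pixel x+1 */

        *dst++ = (uint8_t)((p0[1] + p1[1] + 1) >> 1);  /* Cb (averaged) */
        *dst++ = p0[0];                                /* Y0            */
        *dst++ = (uint8_t)((p0[2] + p1[2] + 1) >> 1);  /* Cr (averaged) */
        *dst++ = p1[0];                                /* Y1            */
    }
}
```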

Figure 2: a) 4:4:4 vs. b) 4:2:2 YCbCr pixel sampling  

Figure labels: Sampling; 3 bytes per pixel; 2 bytes per pixel.

Note that 4:2:2 is not the only chroma subsampling scheme. Figure 3 shows others in popular use. For instance, we could subsample the chroma of a 4:4:4 YCbCr stream by a factor of 4 horizontally, as shown in Figure 3c, to end up with a 4:1:1 YCbCr stream.  Here, the chroma pairs are spatially coincident with every fourth luma value. This chroma filtering scheme results in a 50% bandwidth savings.  4:1:1 YCbCr is a popular format for inputs to video compression algorithms and outputs from video decompression algorithms. 

Another format popular in video compression/uncompression is 4:2:0 YCbCr, and it’s more complex than the others we’ve described for a couple of reasons.  For one, the Cb and Cr components are each subsampled by 2 horizontally and vertically.  This means we have to store multiple video lines in order to generate this subsampled stream.  What’s more, there are 2 popular formats for 4:2:0 YCbCr.  MPEG-2 compression uses a horizontally co-located scheme (Figure 3d, top), whereas MPEG-1 and JPEG algorithms use a form where the chroma are centered between Y samples (Figure 3d, bottom).

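As a sketch of why line buffering is needed, the C routine below (names assumed, planar chroma buffers) halves the vertical chroma resolution of 4:2:2 data by averaging each vertical pair of chroma lines; the choice of filter weights is what distinguishes co-sited from interstitially sited 4:2:0 variants:

```c
#include <stdint.h>
#include <stddef.h>

/* Reduce planar 4:2:2 chroma (Cb or Cr plane) to 4:2:0 by averaging
 * vertical pairs of chroma lines.  chroma422 holds 'rows' lines of
 * chroma_width samples (rows even); chroma420 receives rows/2 lines.
 * Different weightings would produce different chroma sitings. */
void chroma422_to_420(const uint8_t *chroma422, uint8_t *chroma420,
                      size_t chroma_width, size_t rows)
{
    for (size_t r = 0; r < rows; r += 2) {
        const uint8_t *top = chroma422 + r * chroma_width;
        const uint8_t *bot = top + chroma_width;
        uint8_t *out = chroma420 + (r / 2) * chroma_width;

        for (size_t x = 0; x < chroma_width; x++)
            out[x] = (uint8_t)((top[x] + bot[x] + 1) >> 1);
    }
}
```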

Figure 3: (a) YCbCr 4:4:4 stream and its chroma-subsampled derivatives (b) 4:2:2 (c) 4:1:1 (d) 4:2:0 

Figure labels: Luma; Chroma; Total bytes; Luma + Chroma Component; Co-sited; Interstitial. (Figure note: progressive scan is depicted for clarity, and each luma/chroma component is one byte; the component labels indicate serialization order, not spatial position.)

Digital Video 

Before the mid-1990’s, nearly all video was in analog form.  Only then did forces like the advent of MPEG-2 compression, proliferation of streaming media on the Internet, and the FCC’s adoption of a Digital Television (DTV) Standard create a “perfect storm” that brought the benefits of digital representation into the video world. These advantages over analog include better signal-to-noise performance, improved bandwidth utilization (fitting several digital video channels into each existing analog channel), and reduction in storage space through digital compression techniques.   

At its root, digitizing video involves both sampling and quantizing the analog video signal.  In the 2D context of a video frame, sampling entails dividing the image space, gridlike, into small regions and assigning relative amplitude values based on the intensities of color space components in each region. Note that analog video is already sampled vertically (discrete number of rows) and temporally (discrete number of frames per second).  

Quantization is the process that determines these discrete amplitude values assigned during the sampling process.  8-bit video is common in consumer applications, where a value of 0 is darkest (total black) and 255 is brightest (white) for each color channel (R, G, B or YCbCr).  However, it should be noted that 10-bit and 12-bit quantization per color channel is rapidly entering mainstream video products, allowing extra precision that can be useful in reducing received image noise by avoiding roundoff error.

The advent of digital video provided an excellent opportunity to standardize, to a large degree, the interfaces to NTSC and PAL systems.  When the ITU (International Telecommunication Union) met to define recommendations for digital video standards, it focused on achieving a large degree of commonality between NTSC and PAL formats, such that the two could share the same coding formats.    

They defined 2 separate recommendations – ITU-R BT.601 and ITU-R BT.656.  Together, these two define a structure that enables different digital video system components to interoperate.  Whereas BT.601 defines the parameters for digital video transfer, BT.656 defines the interface itself.   

ITU-R BT.601 (formerly CCIR-601)    

BT.601 specifies methods for digitally coding video signals, using the YCbCr color space for better use of channel bandwidth. It proposes 4:2:2 YCbCr as a preferred format for broadcast video.  Synchronization signals (HSYNC, VSYNC, FIELD) and a clock are also provided to delineate the boundaries of active video regions.   Figure 4 shows typical timing relationships between sync signals, clock and data.   

Figure 4: Common Digital Video Format Timing

Figure labels: Video Data; Clock; Digital RGB format.

Each BT.601 pixel component (Y, Cr, or Cb) is quantized to either 8 or 10 bits, and both NTSC and PAL have 720 pixels of active video per line. However, they differ in their vertical resolution.  While 30 frames/sec NTSC has 525 lines (including vertical blanking, or retrace, regions), the 25 frame/sec rate of PAL is accommodated by adding 100 extra lines, or 625 total, to the PAL frame.   

BT.601 specifies Y with a nominal range from 16 (total black) to 235 (total white). The color components Cb and Cr span from 16 to 240, but a value of 128 corresponds to no color. Sometimes, due to noise or rounding errors, a value might dip outside the nominal boundaries, but never all the way to 0 or 255.  

ITU-R BT.656 (formerly CCIR-656)   

Whereas BT.601 outlines how to digitally encode video, BT.656 actually defines the physical interfaces and data streams necessary to implement BT.601.  It defines both bit-parallel and bit-serial modes.  The bit-parallel mode requires only a 27 MHz clock (for NTSC 30 frames/sec) and 8 or 10 data lines (depending on the pixel resolution).  All synchronization signals are embedded in the data stream, so no extra hardware lines are required. 

The bit-serial mode requires only a multiplexed 10 bit/pixel serial data stream over a single channel, but it involves complex synchronization, spectral shaping and clock recovery conditioning.  Furthermore, the bit clock rate runs close to 300 MHz, so it can be challenging to implement bit-serial BT.656 in many systems.  For our purposes, we’ll focus our attention on the bit-parallel mode only. 

The frame partitioning and data stream characteristics of ITU-R BT.656 are shown in Figures 5 and 6, respectively, for 525/60 (NTSC) and 625/50 (PAL) systems.   

Figure 5: ITU-R BT.656 Frame Partitioning

Figure labels: Horizontal Blanking; Vertical Blanking; Field; Active Video.

Figure 6: ITU-R BT.656 Data Stream

Figure labels: End of Active Video (EAV); Horizontal Blanking; Start of Active Video (SAV); Start of next line; Digital Video Stream; Control Byte.

In BT.656, the Horizontal (H), Vertical (V), and Field (F) signals are sent as an embedded part of the video data stream in a series of bytes that form a control word. The Start of Active Video (SAV) and End of Active Video (EAV) signals indicate the beginning and end of data elements to read in on each line. SAV occurs on a 1-to-0 transition of H, and EAV begins on a 0-to-1 transition of H. An entire field of video is comprised of Active Video + Horizontal Blanking (the space between an EAV and SAV code) and Vertical Blanking (the space where V = 1). 

A field of video commences on a transition of the F bit. The “odd field” is denoted by a value of F = 0, whereas F = 1 denotes an even field. Progressive video makes no distinction between Field 1 and Field 2, whereas interlaced video requires each field to be handled uniquely, because alternate rows of each field combine to create the actual video image.  

The SAV and EAV codes are shown in more detail in Figure 7. Note there is a defined preamble of three bytes (0xFF, 0x00, 0x00 for 8-bit video, or 0x3FF, 0x000, 0x000 for 10-bit video), followed by the XY Status word, which, aside from the F (Field), V (Vertical Blanking) and H (Horizontal Blanking) bits, contains four protection bits for single-bit error detection and correction. Note that F and V are only allowed to change as part of EAV sequences (that is, transitions from H = 0 to H = 1).  Also, notice that for 10-bit video, the two additional bits are actually the least-significant bits, not the most-significant bits. 

Figure 7: SAV/EAV Preamble codes  

Figure labels: 8-bit Data; 10-bit Data.

The bit definitions are as follows:

- F = 0 for Field 1
- F = 1 for Field 2
- V = 1 during Vertical Blanking
- V = 0 when not in Vertical Blanking
- H = 0 at SAV
- H = 1 at EAV
- P3 = V XOR H
- P2 = F XOR H
- P1 = F XOR V
- P0 = F XOR V XOR H
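These definitions translate directly into code. The minimal C sketch below (helper names assumed) builds the XY status byte (bit layout, MSB first: 1, F, V, H, P3, P2, P1, P0) and decodes one received after the 0xFF, 0x00, 0x00 preamble:

```c
#include <stdbool.h>
#include <stdint.h>

/* Build the BT.656 XY status byte: 1 F V H P3 P2 P1 P0 (MSB first). */
static uint8_t bt656_xy(bool f, bool v, bool h)
{
    uint8_t p3 = v ^ h;
    uint8_t p2 = f ^ h;
    uint8_t p1 = f ^ v;
    uint8_t p0 = f ^ v ^ h;

    return (uint8_t)(0x80 | (f << 6) | (v << 5) | (h << 4) |
                     (p3 << 3) | (p2 << 2) | (p1 << 1) | p0);
}

/* Extract F, V, H from an XY byte that followed an 0xFF,0x00,0x00
 * preamble; H = 1 marks an EAV code, H = 0 an SAV code. */
static void bt656_parse_xy(uint8_t xy, bool *f, bool *v, bool *h)
{
    *f = (xy >> 6) & 1;
    *v = (xy >> 5) & 1;
    *h = (xy >> 4) & 1;
}
```

For example, bt656_xy(0, 0, 1) evaluates to 0x9D, the EAV code for active video in Field 1, and bt656_xy(0, 0, 0) gives the corresponding SAV code 0x80.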

The vertical blanking interval (the time during which V=1) can be used to send non-video information, like audio, teletext, closed-captioning, or even data for interactive television applications.  BT.656 accommodates this functionality through the use of ancillary data packets.  Instead of the “0xFF, 0x00, 0x00” preamble that normally precedes control codes, the ancillary data packets all begin with a “0x00, 0xFF, 0xFF” preamble.  

Assuming that ancillary data is not being sent, during horizontal and vertical blanking intervals the (Cb, Y, Cr, Y, Cb, Y, …) stream is (0x80, 0x10, 0x80, 0x10, 0x80, 0x10…).  Also, note that because the values 0x00 and 0xFF hold special value as control preamble demarcators, they are not allowed as part of the active video stream.  In 10-bit systems, the values (0x000 through 0x003) and (0x3FC through 0x3FF) are also reserved, so as not to cause problems in 8-bit implementations.    

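For reference, generating a blanking span without ancillary data can be as simple as the following sketch (function name assumed), which writes the idle Cb/Y pattern described above:

```c
#include <stdint.h>
#include <stddef.h>

/* Fill a blanking span of a BT.656 stream with the idle (Cb,Y)
 * pattern 0x80,0x10 used when no ancillary data is being sent. */
void fill_blanking(uint8_t *buf, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        buf[i] = (i & 1) ? 0x10 : 0x80;   /* even: chroma, odd: luma */
}
```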

So that’s a wrap on our discussion of digital video concepts.  In the next series installment, we’ll turn our focus to a systems view of video, covering how video streams enter and exit embedded systems.  

In this article, the third installment in a five-part series, we’ll look at video flows from a system level, discussing the types of video sources and displays that comprise an embedded video application.   

Figure 1 shows a typical end-to-end embedded digital video system.  In one case, a video source feeds into a media processor (after being digitized by a video decoder, if necessary).  There, it might be compressed via a software encode operation before being stored locally or sent over the network.  

Figure 1: System Video Flow for Analog/Digital Sources and Displays

Figure labels: Outside World; Media Processor; Video Sources (Digital CMOS sensor; Analog video camera or CCD; HW Decoder / A-D converter); Video Displays (Digital LCD panel; TV or Monitor; HW Encoder / D-A converter); Storage media; SW Encode (compression); SW Decode (decompression).

In an opposite flow, a compressed stream is retrieved from a network or mass storage.  It is then decompressed via a software decode operation and sent directly to a digital output display (like a TFT-LCD panel), perhaps being first converted to analog form by a video encoder for display on a conventional CRT. 

Keep in mind that compression/decompression represent only a subset of possible video processing algorithms that might run on the media processor. Still, for our purposes, they set a convenient template for discussion. Let’s examine in more detail the video-specific portions of these data flows.   

Analog Video Sources  

Embedded processor cores cannot deal with analog video directly. Instead, the video must be digitized first, via a video decoder. This device converts an analog video signal (e.g., NTSC, PAL, CVBS, S-Video) into a digital form (usually of the ITU-R BT.601/656 YCbCr or RGB variety).  This is a complex, multi-stage process.  It involves extracting timing information from the input, separating luma from chroma, separating chroma into Cr and Cb components, sampling the output data, and arranging it into the appropriate format. A serial interface such as SPI or I2C configures the decoder’s operating parameters. Figure 2 shows a block diagram of a representative video decoder.  

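As an illustration of what that configuration step might look like from a host running Linux, the sketch below uses the standard i2c-dev interface to write one decoder register. The bus path, device address (0x21), and register/value pair are placeholders rather than values from any particular decoder's datasheet:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c-dev.h>

/* Write one 8-bit register on an I2C-controlled video decoder. */
static int decoder_write_reg(int fd, uint8_t reg, uint8_t val)
{
    uint8_t buf[2] = { reg, val };
    return (write(fd, buf, 2) == 2) ? 0 : -1;
}

int main(void)
{
    int fd = open("/dev/i2c-0", O_RDWR);             /* host I2C bus */
    if (fd < 0 || ioctl(fd, I2C_SLAVE, 0x21) < 0) {  /* placeholder addr */
        perror("i2c setup");
        return 1;
    }

    /* Placeholder register write, e.g. selecting an analog input mux. */
    if (decoder_write_reg(fd, 0x00, 0x04) != 0)
        perror("register write");

    close(fd);
    return 0;
}
```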

Figure 2: Block diagram of ADV7183B video decoder

Figure labels: Analog In; Input MUX; Data Preprocessor; Decimation and Downsampling Filters; SYNC Processing and Clock Generation; Serial Interface Control and VBI Data; Luma/Chroma Digital Fine Clamp; Luma Filter; Chroma Filter; Gain Control; Luma/Chroma Resample; Luma/Chroma 2D Comb (4H max); SYNC Extract; Line Length Predictor; Resample Control; AV Code Insertion; FSC Recovery; Chroma Demod; VBI Data Recovery; Macrovision Detection; Global Control; Standard Autodetection; Synthesized LLC Control; Free-Run Output Control; Output Formatter.

Digital Video Sources 

Camera sources today are overwhelmingly based on either Charge-Coupled Device (CCD) or CMOS technology.  Both of these technologies convert light into electrical signals, but they differ in how this conversion occurs.  

CMOS sensors ordinarily output a parallel digital stream of pixel components in either YCbCr or RGB format, along with horizontal and vertical synchronization and a pixel clock. Sometimes, they allow for an external clock and sync signals to control the transfer of image frames out from the sensor.   

CCDs, on the other hand, usually hook up to an "Analog Front End" (AFE) chip, such as the AD9948, that processes the analog output signal, digitizes it, and generates appropriate timing to scan the CCD array. A processor supplies synchronization signals to the AFE, which needs this control to manage the CCD array. The digitized parallel output stream from the AFE might be in 10-bit, or even 12-bit, resolution per pixel component.   

For a more detailed discussion on tradeoffs between CMOS and CCD sensors, as well as an overview of a typical image processing pipeline, please refer to the following article: 

http://www.videsignline.com/howto/sensorsoptics/189600793

Analog video displays  

Video Encoder  

A video encoder converts a digital video stream into an analog video signal.  It typically accepts a YCbCr or RGB video stream in either ITU-R BT.656 or BT.601 format and converts to a signal compliant with one of several different output standards (e.g., NTSC, PAL, SECAM).  A host processor controls the encoder via a 2- or 3-wire serial interface like SPI or I2C, programming such settings as pixel timing, input/output formats, and luma/chroma filtering. Figure 3 shows a block diagram of a representative encoder. Video encoders commonly output in one or more of the following analog formats: 

Figure 3: Block diagram of ADV7179 video encoder

Figure labels: Power Management Control; Sleep Mode; Interpolator; CGMS and WSS Insertion Block; Teletext Insertion Block; Programmable Luminance Filter; Programmable Chrominance Filter; YCrCb-to-YUV Matrix; Video Timing Generator; I2C MPU Port; Real-Time Control Circuit; Sin/Cos DDS Block; YUV-to-RGB Matrix; Multiplexer; Voltage Reference Circuits.

CVBS – This acronym stands for Composite Video Baseband Signal (or Composite Video Blanking and Syncs.)  Composite video connects through the ubiquitous yellow RCA jack shown in Figure 4a.  It contains Luma, Chroma, Sync and Color Burst information all on the same wire. 

S Video, using the jack shown in Figure 4b, sends the luma and chroma content separately. Separating the brightness information from the color difference signals dramatically improves image quality, which accounts for the popularity of S Video connections on today’s home theater equipment.  

Component Video – Also known as YPbPr, this is the analog version of YCbCr digital video.  Here, the luma and each chroma channel are brought out separately, each with its own timing. This offers maximum image quality for analog transmission. Component connections are very popular on higher-end home theatre system components like DVD players and A/V receivers (Figure 4c).

Analog RGB has separate channels for Red, Green and Blue signals.  This offers similar image quality to Component Video, but it’s normally used in the computer graphics realm, whereas Component Video is primarily employed in the consumer electronics arena.  RGB connectors are usually of the BNC variety, shown in Figure 4d.  

Figure 4: Common Analog Video Connectors  

Cathode Ray Tubes (CRTs)   

On the display side, RGB is the most popular interface to computer monitors and LCD panels.  Most older computer monitors accept analog RGB inputs on 3 separate pins from the PC video card and modulate 3 separate electron gun beams to generate the image. Depending on which beam(s) excite a phosphor point on the screen, that point will glow either red, green, blue, or some combination of these colors. This is different from analog television, where a composite signal, one that includes all color information superimposed on a single input, modulates a single electron beam. Newer computer monitors use DVI, or Digital Visual Interface, to accept RGB information in both digital and analog formats.     

The main advantages to CRTs are that they’re very inexpensive and can produce more colors than a comparably sized LCD panel. Also, unlike LCDs, they can be viewed from any angle.   On the downside, CRTs are very bulky, emit considerable electromagnetic radiation, and can cause eyestrain due to their refresh-induced flicker.   

Liquid Crystal Display (LCD) Panels  

There are two main categories of LCD technology: passive matrix and active matrix.  In the former (whose common family members include STN, or “Super Twisted Nematic,” derivatives), a glass substrate imprinted with rows forms a “liquid crystal sandwich” with a substrate imprinted with columns. Pixels are constructed as row-column intersections.  Therefore, to activate a given pixel, a timing circuit energizes the pixel’s column while grounding its row.  The resultant voltage differential untwists the liquid crystal at that pixel location, which causes it to become opaque and block light from coming through. 

Straightforward as it is, passive matrix technology does have some shortcomings.  For one, screen refresh times are relatively slow (which can result in “ghosting” for fast-moving images).  Also, there is a tendency for the voltage at a row-column intersection to “bleed” over into neighboring pixels, partly untwisting the liquid crystals and blocking some light from passing through the surrounding pixel area. To the observer, this blurs the image and reduces contrast.  Moreover, viewing angle is relatively narrow.  

Active matrix LCD technology improves greatly upon passive technology in these respects. Basically, each pixel consists of a capacitor and transistor switch. This arrangement gives rise to the more popular term, “Thin-Film Transistor (TFT) Display.” To address a particular pixel, its row is enabled, and then a voltage is applied to its column.  This has the effect of isolating only the pixel of interest, so others in the vicinity don’t turn on.  Also, since the current to control a given pixel is reduced, pixels can be switched at a faster rate, which leads to faster refresh rates for TFTs over passive displays. What’s more, modulating the voltage level applied to the pixel allows many discrete levels of brightness.  Today, it is common to have 256 levels, corresponding to 8 bits of intensity.  

Connecting to a TFT-LCD panel can be a confusing endeavor due to all of the different components involved. First, there’s the panel itself, which houses an array of pixels arranged for strobing by row and column, referenced to the pixel clock frequency. 

The backlight is often a CCFL (Cold Cathode Fluorescent Lamp), which excites gas molecules to emit bright light while generating very little heat. Other reasons for their suitability to LCD panel applications are their durability, long life, and straightforward drive requirements.  LEDs are also a popular backlight method, mainly for small- to mid-sized panels. They have the advantages of low cost, low operating voltage, long life, and good intensity control. However, for larger panel sizes, LED backlights can draw a lot of power compared with comparable CCFL solutions. 

An LCD controller contains most of the circuitry needed to convert an input video signal into the proper format for display on the LCD panel.  It usually includes a timing generator that controls the synchronization and pixel clock timing to the individual pixels on the panel.  However, in order to meet LCD panel size and cost requirements, sometimes timing generation circuitry needs to be supplied externally in a “Timing Generator” or “Timing ASIC” chip. In addition to the standard synchronization and data lines, timing signals are needed to drive the individual rows and columns of the LCD panel.  Sometimes, spare general-purpose PWM (pulse-width modulation) timers on a media processor can substitute for this separate chip, saving system cost.   

Additional features of LCD controller chips are things like on-screen display support, graphics overlay blending, color lookup tables, dithering and image rotation.  The more elaborate chips can be very expensive, often surpassing the cost of the processor to which they’re connected.

An LCD driver chip is necessary to generate the proper voltage levels to the LCD panel.  It serves as the “translator” between the output of the LCD Controller and the LCD Panel.  The rows and columns are usually driven separately, with timing controlled by the timing generator. Liquid crystal must be driven with periodic polarity inversions, because a dc current will stress the crystal structure and ultimately deteriorate it.  Therefore, the voltage polarity applied to each pixel varies either on a per-frame, per-line, or per-pixel basis, depending on the implementation. 

With the trend toward smaller, cheaper multimedia devices, there has been a push to integrate these various components of the LCD system.  Today, integrated TFT-LCD modules exist that include timing generation and drive circuitry, requiring only a data bus connection, clocking/synchronization lines, and power supplies. The electrical interface on an integrated TFT-LCD display module is straightforward.  It typically consists of data lines, synchronization lines, power supply lines, and a clock.  Some panels are also available with a composite analog video input, instead of parallel digital inputs. 

OLED (Organic Light-Emitting Diode) Displays    

The “Organic” in OLED refers to the material that’s encased between two electrodes.  When charge is applied through this organic substance, it emits light. This display technology is still very new, but it holds promise by improving upon several deficiencies in LCD displays.  For one, it’s a self-emissive technology and does not require a backlight.  This has huge implications for saving panel power, cost and weight – an OLED panel can be extremely thin. Additionally, it can support a wider range of colors than comparable LCD panels can, and its display of moving images is also superior to that of LCDs. What’s more, it supports a wide viewing angle and provides high contrast. OLEDs have an electrical signaling and data interface similar to that of TFT LCD panels. 

For all its advantages, so far the most restrictive aspect of the OLED display is its limited lifetime.  The organic material breaks down after a few thousand hours of use, although this number has now improved in some displays to over 10,000 hours – quite suitable for many portable multimedia applications.  It is here where OLEDs have their brightest future -- in cellphones, digital cameras, and the like.  However, it is also quite possible that we’ll ultimately see televisions or computer monitors based on OLED technology in the near future.  For the time being, as LCD panel technology keeps improving, the OLED mass production timeline gets pushed out incrementally. 

Now that we’ve covered the basics of connecting video streams within a system, it’s time to take a look inside the processor, to see how it handles video efficiently.  This will be the subject of the next article in this series. 
