Abstract
In this paper, we present an overview of rate control in HEVC and insight into rate control in SHVC. We introduce rate control in video coding, outlining the basic operating principle of a rate control algorithm, its inputs and outputs, and its performance measures. We describe the components of a typical rate control algorithm and delve into each component; in summary, we explain the structure of a rate control algorithm. We then review some of the rate control algorithms of prior video coding standards. The features of HEVC whose structural changes have driven the need for new rate control techniques are outlined, and possible classification criteria are indicated. We then classify rate control algorithms into categories and study them independently, attempting to identify a discerning feature in each algorithm based on its application. An introduction to the scalable extension of HEVC, namely SHVC, is provided, and the possible challenges in SHVC rate control design are highlighted. Finally, we identify two prominent unresolved research issues in HEVC rate control and outline possible future directions.
1. Introduction
Rate control is one of the most significant coding tools in any video coding standard. It plays a central role in the transmission of compressed video over band-limited channels. With growing multimedia traffic over the Internet and a wide variety of end users requesting high-quality content, it is imperative that an effective rate control algorithm be employed in any commercial encoder. Rate control is not a normative coding tool of any video coding standard; however, it is included as an informative part of most standards in their recommendations. A rate control algorithm, fundamentally, regulates the output bit rate of an encoder so that it meets the bandwidth requirements of the network over which the bitstream is to be transmitted.
A rate control algorithm first allocates a budget of bits to the groups of pictures (GOPs), frames, or coding units, depending on the level at which rate control is applied. The bits allocated to each unit at its level are then used to compute model parameters, which in turn are used to encode the video so as to reach the target bit rate on which the allocation is based. In video compression, it is the quantization step size that determines the degree of spatial detail retained: the smaller the quantization step size, the more spatial detail is preserved. The parameter that regulates the quantization step size is called the quantization parameter (QP). The QP could be supplied manually at the encoder input, which would encode the video at nearly constant quality; however, the output bit rate could then vary dramatically, which is undesirable in any commercial encoder. To keep variations in the output bit rate to a minimum, the QP is instead modelled as an output of the rate control algorithm, which takes into account various characteristics of the video sequence as well as network conditions.
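As a concrete point of reference, the QP controls the step size through an exponential mapping; in H.264 and HEVC (a property of those standards, stated here for orientation rather than derived) the step size roughly doubles for every increase of six in QP:

Q_step(QP) ≈ 2^((QP-4)/6), QP ∈ [0, 51]

so adjusting the QP by ±1 scales the step size by about 12%.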
The input to a typical rate control algorithm is the target bit rate for the video sequence to be encoded. Depending on the design of the algorithm, the target bits are distributed at different levels of granularity. The fundamental consideration of any rate control algorithm is rate-distortion optimization: the parameters chosen to achieve the target bit rate must also ensure that distortion is minimized. This is a constrained optimization problem, explained in the next section. Generally, the QP is the ultimate output of any rate control algorithm, and it is used to encode the sequence to meet the target bit rate.
The two main performance metrics of any rate control algorithm are the accuracy of bit rate achievement and the Peak Signal-to-Noise Ratio (PSNR). The former indicates how close the final encoded bit rate is to the requested target, while the latter indicates the quality of the encoded sequence. These metrics can be quantified by comparing the R-D performance of an algorithm with that of a known anchor; that is, values relative to the anchor are taken for the performance comparison of different rate control algorithms. These relative PSNR and bit rate values are called BD-PSNR and BD-Bitrate.
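As an illustration, the BD metrics are computed by fitting third-order polynomials to the R-D points of the anchor and the test algorithm and averaging the gap between the fitted curves. Below is a minimal Python sketch of the BD-Bitrate computation following Bjøntegaard's method; the function name, the use of numpy, and the four-point inputs are our own illustrative choices:

    import numpy as np

    def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
        """Bjontegaard delta bitrate (%) of a test codec against an anchor.

        Each argument is a list of four R-D points, as in the original
        Bjontegaard method. A negative result means the test saves bitrate.
        """
        # Work in the log-rate domain, where R-D curves are near-polynomial.
        lr_a, lr_t = np.log10(rate_anchor), np.log10(rate_test)

        # Fit third-order polynomials of log-rate as a function of PSNR.
        p_a = np.polyfit(psnr_anchor, lr_a, 3)
        p_t = np.polyfit(psnr_test, lr_t, 3)

        # Integrate both fits over the overlapping PSNR interval.
        lo = max(min(psnr_anchor), min(psnr_test))
        hi = min(max(psnr_anchor), max(psnr_test))
        int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
        int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)

        # Average log-rate difference, converted back to a percentage.
        avg_diff = (int_t - int_a) / (hi - lo)
        return (10 ** avg_diff - 1) * 100

BD-PSNR is obtained analogously by fitting PSNR as a function of log-rate and averaging the vertical gap between the curves instead.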
2. Components of a Rate Control Algorithm
Despite the varied approaches employed by the rate control algorithms in the literature, we can outline a general structure for a rate control algorithm. This structure outlines the components shared by most algorithms.
2.1 Bit allocation
As noted above, the starting point of any rate control algorithm is to allocate the target bit budget among the different units at the corresponding granularity. The bit budget is a limited resource, and hence the sum of the bits allocated to all the units of a video sequence should not exceed the budget:
∑_(u=1)^N r_u ≤ R (1)
Bit allocation typically follows a strategy: an optimal division of the budget is sought with respect to some cost or performance function. The simplest objectives are to find an allocation that minimizes distortion or one that maximizes perceived quality. Once an objective function has been defined, bit allocation proceeds hierarchically, from the GOP level to the frame level and from the frame level to the coding unit level, while respecting the budget constraint. This is a critical step in a rate control algorithm, and a variety of approaches appear in the literature depending on the end application. A cooperative game-theoretic approach can be used when fairness and efficiency are the principal criteria: I. Ahmad et al. [#] proposed a cooperative game-theoretic solution to the bit allocation problem in H.264, based on finding a Nash bargaining solution to cooperative games played by the macroblocks within a frame. Cheng-Hsin Hsu et al. [#] proposed a dynamic programming method to find the optimal bit allocation for multilayer streams in the scalable extension of H.264. Zhongzhu Yang et al. [#] proposed a buffer-status-based bit allocation for low-delay applications that have a small buffer size constraint. Lin Sun et al. [#] proposed an allocation based on an edge complexity measure of I frames, distributing bits according to that measure. Shengxi Li et al. [#] proposed a weight-based bit allocation for regions of interest (facial features) in conversational videos. These works give a good indication of the range of objective-driven bit allocation schemes that can be employed.
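To make the hierarchy concrete, the following Python sketch allocates a sequence budget down to the frame level using normalized complexity weights. The function name and weight inputs are hypothetical; a real algorithm would derive the weights from a complexity measure such as MAD and recurse once more to the coding unit level:

    def allocate_bits(total_bits, gop_weights, frame_weights_per_gop):
        """Two-level hierarchical bit allocation sketch.

        total_bits            -- bit budget for the whole sequence
        gop_weights           -- relative weight (complexity) of each GOP
        frame_weights_per_gop -- for each GOP, the relative weight of its frames
        Returns a list of per-frame bit targets per GOP, respecting the budget
        constraint of equation (1) by construction.
        """
        allocation = []
        gop_total = sum(gop_weights)
        for g_w, f_weights in zip(gop_weights, frame_weights_per_gop):
            gop_bits = total_bits * g_w / gop_total  # GOP-level share
            f_total = sum(f_weights)
            # Frame-level share; the same rule would recurse to CU level.
            allocation.append([gop_bits * w / f_total for w in f_weights])
        return allocation

For example, allocate_bits(1_000_000, [1.0, 1.2], [[3, 1, 2, 1], [3, 1, 2, 1]]) gives each GOP a share proportional to its weight and favours the more complex frames within each GOP.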
2.2 Rate distortion optimization (RDO)
As discussed earlier, this is the fundamental step in any rate control application and is closely related to the bit allocation problem, since more often than not it forms the basis of bit allocation. Rate-distortion optimization seeks to minimize the distortion of the encoded video without violating the constraint on the bit rate. It also has a bearing on the design of the quantizer and the weighted quantization matrix. Accurately deriving a rate-distortion relationship is a complex process: the relationship is not known beforehand, and multiple encoding passes would be needed to determine a QP that satisfies the optimality conditions. Hence, in real applications, the R-D model is estimated by fitting rate and distortion curves obtained from pre-encoded sequences under different conditions.
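Formally, the constrained problem and its common Lagrangian relaxation (a standard formulation, not specific to any one algorithm surveyed here) read:

min Σ_u D_u subject to Σ_u R_u ≤ R_T

which is solved in practice by minimizing the unconstrained cost

J = Σ_u D_u + λ * Σ_u R_u

where the Lagrange multiplier λ equals the negative slope -∂D/∂R of the R-D curve at the operating point; sweeping λ traces out the achievable rate-distortion trade-offs.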
Based on the premise discussed so far, different classes of rate-distortion models have been developed in the literature. One class of rate control algorithms attempts to model the relationship between the bit rate R and the quantization Q. These are called Q-domain R-D models [#]. Q-domain R-D models are primarily of two kinds:
Linear Q-domain R-D model, wherein the bit rate varies linearly with the reciprocal of the quantization parameter or step size:
R(QP)=(α*S)/QP (2)
where α is a model parameter which can be determined by linear regression and S is the complexity of the source.
Quadratic Q-domain R-D model, where there is a quadratic relation between the bit rate and the reciprocal of the quantization parameter or step size:
R(Q) = (x1*MAD)/Q + (x2*MAD)/Q^2 + R_h (3)
where x1 and x2 are model parameters related to the video content, updated via linear regression, MAD is the mean absolute difference of the source prediction residual, and R_h is the number of header bits.
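The update of x1 and x2 reduces to an ordinary least-squares fit once the measured texture bits are normalized by MAD. Below is a minimal Python sketch of that regression step (the function name and the use of numpy's lstsq are our illustrative choices; practical encoders add sliding windows and outlier rejection):

    import numpy as np

    def fit_quadratic_rd(q_vals, bits, mads, header_bits):
        """Least-squares update of the quadratic R-D model parameters x1, x2.

        Uses observations from previously encoded units: quantization step q,
        total bits, MAD of the residual, and header bits, per equation (3).
        """
        q = np.asarray(q_vals, dtype=float)
        # Normalized texture bits: (R - R_h) / MAD = x1/Q + x2/Q^2
        y = (np.asarray(bits, float) - np.asarray(header_bits, float)) \
            / np.asarray(mads, float)
        A = np.column_stack([1.0 / q, 1.0 / q ** 2])
        (x1, x2), *_ = np.linalg.lstsq(A, y, rcond=None)
        return x1, x2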
Another class of rate control algorithms models the relationship between the percentage of zeroes among the quantized transform coefficients, ρ, and the bit rate R. This class of R-D models is called ρ-domain R-D models [10-12 of [3]].
R=θ(1-ρ) (4)
where θ is a model parameter related to the video content and ρ is the percentage of zeroes among the quantized transform coefficients.
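As an illustrative sketch (not drawn from any specific paper surveyed here), a ρ-domain controller can update θ from already-coded units as θ = R_actual/(1 - ρ_actual), derive the target ρ from equation (4), and then scan candidate quantization steps until the measured fraction of zero-quantized coefficients reaches the target. The Python below ignores dead-zone offsets and loosely treats the step index as a QP-like range for brevity:

    import numpy as np

    def rho_domain_step(coeffs, target_bits, theta):
        """Pick a quantization step from the rho-domain model R = theta*(1 - rho)."""
        c = np.abs(np.asarray(coeffs, dtype=float).ravel())
        target_rho = 1.0 - target_bits / theta   # rho needed to hit the budget
        for q in range(1, 52):                   # scan candidate step sizes
            rho = np.mean(c < q)                 # fraction that quantizes to zero
            if rho >= target_rho:
                return q
        return 51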
The third class, which has proven to model the R-D function most accurately, is the λ-domain R-D model [3], where λ is the slope of the R-D curve and bears an accurate relationship to the number of bits used for encoding:
R=α*λ^β (5)
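In the λ-domain controller adopted in the HEVC test model (HM), this relation is applied per picture at the bits-per-pixel level: λ is computed from the target bpp via the power relation of equation (5), with α and β as source-dependent parameters, and then mapped to a QP through a logarithmic fit. The Python sketch below uses the initial α, β and λ-to-QP constants quoted from the HM reference implementation; the per-picture re-estimation of α and β is omitted:

    import math

    def lambda_domain_qp(target_bits, width, height, alpha=3.2003, beta=-1.367):
        """Map a per-picture bit target to a QP via the R-lambda model."""
        bpp = target_bits / (width * height)    # target bits per pixel
        lam = alpha * (bpp ** beta)             # lambda = alpha * bpp^beta
        qp = 4.2005 * math.log(lam) + 13.7122   # HM's lambda-to-QP fit
        return int(round(min(max(qp, 0), 51)))  # clip to the valid HEVC QP range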
Furthermore, the relationship between R and D, irrespective of the domain, is modelled based on the probability distribution of the transformed coefficients, where the Gaussian distribution [34], mixture Gaussian distribution [35], generalized Gaussian distribution (GGD) [36][37], or Cauchy distribution [38] is employed.
A hyperbolic R-D relationship based on the Cauchy distribution is expressed as below:
D(R)=C*R^(-K) (6)
where C and K are model parameters. Differentiating this hyperbolic curve links it directly to the λ-domain model, since λ = -∂D/∂R = C*K*R^(-(K+1)).
2.3 Quantization parameter (QP) determination
The determination of the quantization parameter governs the actual degree of spatial detail retained. It follows the bit allocation and the modelling of the R-D function. Depending on the domain of the R-D model, the quantization parameter or the step size is determined and used for subsequent encoding. The QP of each unit is typically clipped to within a couple of values of the QPs of neighboring units to ensure smooth quality variation.
All of the components described above are applied at different levels of the video content, starting at the GOP level, followed by the frame level and then the coding unit level, until no unit remains to be encoded. The model parameters are recomputed after the encoding of each unit, and the buffer status is updated accordingly.
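Putting the pieces together for the quadratic model of equation (3): the target texture bits give a quadratic in 1/Q, whose positive root yields the step size; the step size maps to a QP through the exponential relation noted in Section 1, and the result is clipped against the previous unit's QP. A Python sketch follows (the function name and the ±2 clipping window are illustrative choices; real encoders add buffer-based adjustments):

    import math

    def determine_qp(target_bits, mad, header_bits, x1, x2, prev_qp, max_delta=2):
        """Solve model (3) for the step size, convert to QP, clip for smoothness."""
        texture_bits = max(target_bits - header_bits, 1.0)
        # x2*MAD*(1/Q)^2 + x1*MAD*(1/Q) - texture_bits = 0
        a, b, c = x2 * mad, x1 * mad, -texture_bits
        if a != 0:
            inv_q = (-b + math.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a)
        else:                                     # model degenerates to linear
            inv_q = texture_bits / max(b, 1e-9)
        q_step = 1.0 / max(inv_q, 1e-9)
        qp = round(6.0 * math.log2(q_step) + 4)   # Q_step ~ 2^((QP-4)/6)
        qp = min(max(qp, prev_qp - max_delta), prev_qp + max_delta)  # smoothing
        return min(max(qp, 0), 51)                # valid HEVC QP range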
3. Rate Control in Previous Coding Standards and Comparison with HEVC
Over the years, rate control has been a focal point of research in video coding, with many efforts attaining considerable success. Rate control for video coding was proposed as early as 1992.
TM5 was one of the first rate control algorithms in the literature. Its principle was to distribute the coding bits so as to prevent buffer overflow and underflow while taking source complexity into account. Hierarchical bit allocation was performed as the first step of the algorithm. The QP was then computed for each macroblock based on how accurately the bit rate target was met by the previous macroblock. Adaptive quantization followed, to ensure there were no abrupt changes in QP and thereby achieve uniform picture quality.
The MPEG-4 verification model (VM) 8 rate control involved initialization of first- and second-order model coefficients, followed by a weighted-average-based target bit rate computation and QP calculation for encoding. The QP was clipped between 1 and 31 in this algorithm.
The test model near-term 8 (TMN8) algorithm, used by the H.263 standard, fundamentally had two steps: bit allocation at the frame level and adaptive quantization at the macroblock level. The frame to be encoded was decomposed into 16×16 macroblocks, and the four 8×8 blocks within a macroblock were transformed into DCT coefficients. The coefficients were then quantized and encoded using variable-length coding.
The H.264 rate control algorithm employed a linear prediction model of the mean absolute difference of the prediction error. It introduced a novel scheme of breaking rate control down into three levels: the GOP level, the frame level, and the basic unit level. Rate allocation was considered at each level according to the source complexity, and the QP was computed accordingly.
HEVC, or High Efficiency Video Coding, was standardized jointly by ISO/IEC MPEG and ITU-T VCEG, and the standard promised a bit rate reduction of nearly 50% for a given quality. There are several differences between HEVC and its predecessor H.264. Among them, the expanded prediction and transform block sizes, together with multi-depth quad-tree partitioning, make the coding structure extremely flexible, allowing HEVC to encode both high- and low-resolution videos efficiently. The macroblock of H.264 was replaced by the Coding Unit (CU), which is composed of Prediction Units (PUs) and Transform Units (TUs). Coding tools such as motion estimation and compensation (ME and MC), quantization, transform, and entropy coding are all applied at this level. The size of the largest coding unit and the number of predefined depths are signaled at the sequence level. PU block sizes range from N×N to 2N×2N, including non-uniform block sizes, while TU block sizes range from 4×4 to 32×32. Fig 4(a) and (b) show the HEVC block partitions and the quad-tree partitions of the CU and TU blocks. Within a CU, the PU performs ME and MC to produce a residual signal with different prediction block sizes, and transform and quantization of the residual are performed in the TU. With this variably sized coding structure, it is challenging to model the residual signals for rate control applications. Since HEVC's inception as a standard, several works have been published on the development of rate control techniques for it, owing to its flexible coding structure. This paper attempts to classify and discuss some of these techniques based on the goal of each.
Conventional rate control algorithms have proven useful for predecessor coding standards, all of which have a relatively simple coding structure. These algorithms employ the same rate and distortion models across all coding levels; they typically model all transformed coefficients with a single PDF and, in the case of HEVC, do not consider the various CU depths or the characteristics of the residual signals arising from the variable size and depth of ME/MC and of the transform. The very first HEVC test model transplanted the rate control algorithm of the H.264 test model, and this gave reduced performance compared with the case of no rate control at all. Additionally, the increased number of prediction modes in intra coding makes it essential to treat intra frames as a special case with a differentiated bit allocation strategy. The flexible coding structure of HEVC can also be leveraged for specific applications, such as low-delay scenarios where a particular frame structure is adopted, or conversational video, where an approach focusing on specific regions of interest in a frame is appropriate. The versatility of the coding structure, in essence, drives the need for new rate control algorithms for HEVC. In the next section we classify some of these algorithms and discuss them under their respective categories.