25-01-2013, 02:05 PM
Data Hiding in Motion Vectors of Compressed Video
Based on Their Associated Prediction Error
Abstract—
This paper deals with data hiding in compressed
video. Data hiding in images and raw video operates on the
images themselves, in the spatial or transform domain, and is
therefore vulnerable to steganalysis. In contrast, we target the
motion vectors used to encode and reconstruct both the forward
predictive (P)-frames and the bidirectional (B)-frames in
compressed video. The choice of the candidate subset of these
motion vectors is based on their associated macroblock prediction
error, which differs from approaches based on motion vector
attributes such as the magnitude and phase angle. A greedy adaptive threshold
is searched for every frame to achieve robustness while maintaining
a low prediction error level. The secret message bitstream
is embedded in the least significant bit of both components of
the candidate motion vectors. The method is implemented and
tested for hiding data in natural sequences of multiple groups of
pictures and the results are evaluated. The evaluation is based on
two criteria: minimum distortion to the reconstructed video and
minimum overhead on the compressed video size. Based on the
aforementioned criteria, the proposed method is found to perform
well and is compared to a motion vector attribute-based method
from the literature.
Index Terms—Data hiding, motion vectors, Moving Picture Experts
Group (MPEG), steganography.
I. INTRODUCTION
DATA hiding [1] and watermarking in digital images
and raw video have an extensive literature. This paper targets
the internal dynamics of video compression, specifically the
motion estimation stage. We have chosen this stage because
its contents are processed internally during video encoding and
decoding, which makes hidden data hard to detect with image
steganalysis methods, and because the motion vectors are losslessly
coded and thus not prone to quantization distortion. In the
literature, most work on data hiding in motion vectors relies on changing the motion
vectors based on their attributes such as their magnitude, phase
angle, etc. In [2] and [3], the data bits of the message are hidden
in the motion vectors whose magnitude is above a
predefined threshold; these are called candidate motion vectors
(CMVs). A single bit is hidden in the least significant bit of the
larger component of each CMV. In [4], the data is encoded as a
region where the motion estimation is only allowed to generate
motion vectors in that specified region. Using variable block sizes,
the authors in [5] used every 2 bits from the message bitstream
to select one of the four sizes for the motion estimation process.
The authors in [6] and [7] embed the data in video using the
phase angle between two consecutive CMVs. These CMVs are
selected based on the magnitude of the motion vectors as in [2].
The message bitstream is encoded as the phase angle difference in
sectors between CMVs. The block matching is constrained to
search within the selected sector for a magnitude larger
than the predefined threshold.
Manuscript received June 11, 2010; revised September 11, 2010; accepted
October 24, 2010. Date of publication November 01, 2010; date of current version
February 16, 2011. The associate editor coordinating the review of this
manuscript and approving it for publication was Dr. Wenjun Zeng.
The author is with the Military Technical College (MTC), Ministry of Defense,
Cairo, Egypt (e-mail: haly[at]ieee.org; h.aly[at]alumni.uottawa.ca).
Color versions of one or more of the figures in this paper are available online
The methods in [2]–[7] focused on finding a direct reversible
way to identify the CMV at the decoder and thus relied on the
attributes of the motion vectors. In this paper, we take a different
approach directed towards achieving a minimum distortion to
the prediction error and the data size overhead. This approach
is based on the associated prediction error, and we face
the difficulty of dealing with the nonlinear quantization process;
thus we use an adaptive threshold as discussed in Section IV.
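To make the attribute-based selection concrete, the magnitude-threshold scheme attributed above to [2] and [3] can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of those papers: the function names `embed_magnitude_based` and `extract_magnitude_based`, the integer vector representation, and the example threshold are assumptions made here. The sketch also ignores the corner cases in which changing a least significant bit moves a vector across the threshold or swaps which component is larger; a real scheme has to handle them so that the decoder selects the same CMVs.

```python
import math


def _is_cmv(dx, dy, threshold):
    # Assumption: a candidate motion vector (CMV) is one whose
    # Euclidean magnitude exceeds the predefined threshold.
    return math.hypot(dx, dy) > threshold


def embed_magnitude_based(motion_vectors, bits, threshold):
    """Hide one bit in the LSB of the larger component of each CMV."""
    remaining = iter(bits)
    stego = []
    for dx, dy in motion_vectors:
        if _is_cmv(dx, dy, threshold):
            bit = next(remaining, None)
            if bit is not None:
                # Clear the LSB, then set it to the message bit.
                if abs(dx) >= abs(dy):
                    dx = (dx & ~1) | bit
                else:
                    dy = (dy & ~1) | bit
        stego.append((dx, dy))
    return stego


def extract_magnitude_based(motion_vectors, threshold, count=None):
    """Read back the LSB of the larger component of each CMV."""
    bits = []
    for dx, dy in motion_vectors:
        if _is_cmv(dx, dy, threshold):
            larger = dx if abs(dx) >= abs(dy) else dy
            bits.append(larger & 1)
    return bits if count is None else bits[:count]


if __name__ == "__main__":
    vectors = [(8, 2), (1, 1), (-3, 10), (0, 0), (12, -5)]
    message = [1, 0, 1]
    hidden = embed_magnitude_based(vectors, message, threshold=4)
    print(hidden)
    print(extract_magnitude_based(hidden, threshold=4, count=len(message)))
```

Because the selection depends only on the vectors themselves, the decoder can repeat it without any side information, which is the "direct reversible" property noted above.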
The rest of the paper is organized as follows: in Section II
we overview the terms of video compression and decompression.
The problem definition is given in Section III along with
the evaluation criteria used in the paper. Our proposed method
is given in Section IV followed by the results and analyses in
Section V. Finally, the paper is concluded in Section VI.
II. BACKGROUND AND NOTATIONS
In this section, we overview lossy video compression to define
our notation and evaluation metrics. At the encoder, the intrapredicted
(I)-frame is encoded using regular image compression
techniques similar to JPEG but with a different quantization
table and step; hence the decoder can reconstruct it independently.
The I-frame is used as a reference frame for encoding a
group of forward motion-compensated prediction (P)- or bidirectionally
predicted (B)-frames. In the commonly used Moving
Picture Experts Group MPEG-2 standard [8], the video is ordered
into groups of pictures (GOPs) whose frames can be encoded
in the sequence [I,B,B,P,B,B,P,B,B]. The temporal redundancy
between frames is exploited using block-based motion
estimation that is applied on macroblocks of size $b \times b$
in $P$ or $B$ and searched in the target frame(s). Generally, the
motion field in video compression is assumed to be translational
with horizontal component $d^x$ and vertical component $d^y$,
and is denoted in vector form by $d(\mathbf{x})$ for the spatial variables
$\mathbf{x} = (x, y)$ in the underlying image. The search window is constrained
by assigning a limited number of bits $n$ to $d$; in other words, both
$d^x$ and $d^y \in [-2^{n-1}, 2^{n-1}-1]$, which corresponds to
$[-2^{n-1}/2, (2^{n-1}-1)/2]$ pixels if the motion vectors are
computed with half-pixel accuracy. An exhaustive search over
this window can be done to find the optimal
motion vector satisfying the search criterion, which needs many
computations, or suboptimal motion vectors can be obtained
using faster methods such as the three-step search; the choice
depends on the processing power of the video encoding device, the required
compression ratio, and the reconstruction quality. Since the estimated
motion field $d$ does not represent the true motion in the video, the compensated
frame $\tilde{P}$ obtained using $(\mathbf{x} + d(\mathbf{x}))$ must be associated with a
prediction error $E(\mathbf{x}) = (P - \tilde{P})(\mathbf{x})$ in order to be able to reconstruct
$P = \tilde{P} + E$ with minimum distortion at the decoder in
the case of a P-frame. A similar operation is done for the B-frame but
with the average of both the forward compensation from a previous
reference frame and the backward compensation from a next
reference frame. $E$ is of the size of an image and is thus lossy
compressed using JPEG compression, reducing its data size. The
quantization stage of the lossy compression is a nonlinear process, and
thus for every motion estimation method the pair $(d, E)$ will
be different and the data size $D$ of the compressed error $\tilde{E}$ will
be different. The motion vectors $d$ are losslessly coded and thus
become an attractive place to hide a message that can be blindly
extracted by a special decoder.
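The block matching and prediction error described above can be illustrated with a small pure-Python sketch. It is a sketch under stated assumptions rather than an encoder implementation: it uses integer-pixel motion vectors (the text mentions half-pixel accuracy), the sum of absolute differences as the search criterion (the text does not name one), frames stored as lists of rows, and the hypothetical function names `sad`, `full_search`, and `prediction_error`.

```python
def sad(ref, cur, bx, by, dx, dy, b):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by the candidate vector (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + j + dy][bx + i + dx])
        for j in range(b)
        for i in range(b)
    )


def full_search(ref, cur, bx, by, b, r):
    """Exhaustive search over all displacements in [-r, r] x [-r, r].
    Returns (dx, dy, cost) for the lowest-cost candidate."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates whose displaced block leaves the frame.
            inside = (0 <= bx + dx and bx + dx + b <= width
                      and 0 <= by + dy and by + dy + b <= height)
            if not inside:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, b)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def prediction_error(ref, cur, bx, by, dx, dy, b):
    """Residual block: current block minus motion-compensated block."""
    return [
        [cur[by + j][bx + i] - ref[by + j + dy][bx + i + dx]
         for i in range(b)]
        for j in range(b)
    ]


if __name__ == "__main__":
    # Synthetic frames: the current frame is the reference shifted by (1, 2).
    def pattern(x, y):
        return x * x + 3 * y * y

    ref = [[pattern(x, y) for x in range(12)] for y in range(12)]
    cur = [[pattern(x + 1, y + 2) for x in range(12)] for y in range(12)]
    dx, dy, cost = full_search(ref, cur, bx=4, by=4, b=4, r=2)
    print(dx, dy, cost)
    print(prediction_error(ref, cur, 4, 4, dx, dy, 4))
```

The cost of the exhaustive search grows with the square of the search radius, which is why the faster suboptimal searches mentioned above are attractive on constrained encoders.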
The decoder receives the pair $(d, \tilde{E})$ of motion vectors and compressed
prediction error, applies motion compensation
to form $\tilde{P}$ or $\tilde{B}$, and decompresses $\tilde{E}$ to obtain a reconstructed
$E_r$. Since $E$ and $E_r$ differ because of
the quantization, the decoder is unable to reconstruct $P$
identically; instead it reconstructs $P_r = \tilde{P} + E_r$. The
reconstruction quality is usually measured by the mean squared
error of $P - P_r$, represented as peak signal-to-noise ratio (PSNR),
and we denote it by $R$.
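As a minimal sketch of the quality measure just described, the following Python computes the mean squared error between an original and a reconstructed frame and expresses it as PSNR using the standard definition $10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE})$. The function names `mse` and `psnr`, the list-of-rows frame layout, and the peak value of 255 (8-bit samples) are assumptions made here, not details given in the text.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(original, reconstructed):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak * peak / error)


if __name__ == "__main__":
    frame = [[10, 20], [30, 40]]
    noisy = [[10, 20], [30, 42]]
    print(psnr(frame, noisy))
```

A higher PSNR means the reconstructed frame is closer to the original, so a drop in PSNR after embedding indicates added distortion.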
III. PROBLEM DEFINITION
Data hiding in motion vectors at the encoder replaces the regular
pair $(d, \tilde{E})$ of motion vectors and compressed prediction error,
due to tampering with the motion vectors, by
$(d^h, \tilde{E}^h)$, where the superscript $h$ denotes hiding. We define
data hiding in motion vectors of compressed video in the context
of the super-channel [9]; the secret message $m$ is hidden in the
host video signal $x = (d, E)$ to produce the composite signal
$s = (d^h, E^h)$. The composite signal is subject to lossy video
compression to become $y = (d^h, \tilde{E}^h)$. The message $m$ should
survive the lossy video compression and be identically extractable
from $y$. This robustness constraint should have a low distortion
effect on the reconstructed video as well as a low effect
on the data size (bit rate). Given that $m$ can be identically extracted,
in this paper, we use two metrics to evaluate data-hiding
algorithms in compressed video, which are: