27-10-2012, 06:11 PM
Detecting Steganographic Content on the Internet
Abstract
Steganography is used to hide the occurrence of communication. Recent suggestions in US newspapers
indicate that terrorists use steganography to communicate in secret with their accomplices. In particular,
images on the Internet were mentioned as the communication medium. While the newspaper articles
sounded very dire, none substantiated these rumors.
To determine whether there is steganographic content on the Internet, this paper presents a detection
framework that includes tools to retrieve images from the World Wide Web and automatically detect
whether they might contain steganographic content. To ascertain that hidden messages exist in images, the
detection framework includes a distributed computing framework for launching dictionary attacks hosted
on a cluster of loosely coupled workstations. We have analyzed two million images downloaded from eBay
auctions but have not been able to find a single hidden message.
Introduction
Steganography is the art and science of hiding the
fact that communication is taking place. Steganographic
systems can hide messages inside images
or other digital objects. To a casual observer inspecting
these images, the messages are invisible.
In February 2000, USA Today reported that terrorists
are using steganography to hide their communication
from law enforcement [4]. According to the article,
messages are being hidden in images posted
to Internet auction sites like eBay or Amazon. The
article lacked any technical information that would
allow a reader to verify these claims. Nonetheless,
the article was echoed by a number of other news
sources.
To assess the claim that steganographic content is
regularly posted to the Internet, we need a way to
detect steganographic content in images automatically.
This paper presents a steganography detection
framework that begins with a web crawler that
downloads JPEG images from the Internet. Using
statistical analysis, a subset of images likely to contain
steganographic content is identified. Because the analysis
is statistical, there is no guarantee that an
identified image really contains a hidden message,
so we also describe a distributed computing framework
that launches a dictionary attack hosted on a
cluster of loosely coupled workstations to reveal any
hidden content.
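The division of labour in the dictionary attack can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' framework: threads stand in for the loosely coupled workstations, and try_password is a hypothetical placeholder for "extract with this password and check whether the result looks like a hidden message", modelled here as a salted SHA-256 comparison on a toy byte string.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def partition(words, n_workers):
    """Split the dictionary round-robin into n_workers disjoint chunks."""
    return [words[i::n_workers] for i in range(n_workers)]


def try_password(image, password):
    """Hypothetical check: the toy 'image' starts with a 32-byte SHA-256 tag
    of (payload + password). A real attack would run the steganographic
    extraction with this password and test the decrypted result instead."""
    tag, payload = image[:32], image[32:]
    return hashlib.sha256(payload + password.encode()).digest() == tag


def search_chunk(image, chunk):
    """Work done by one worker: try every password in its chunk."""
    for password in chunk:
        if try_password(image, password):
            return password
    return None


def dictionary_attack(image, words, n_workers=4):
    """Hand one chunk to each worker and return the first password found."""
    chunks = partition(words, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for found in pool.map(lambda chunk: search_chunk(image, chunk), chunks):
            if found is not None:
                return found
    return None


# Usage: a toy image whose tag matches the password "sesame".
payload = b"toy cover data"
image = hashlib.sha256(payload + b"sesame").digest() + payload
print(dictionary_attack(image, ["alpha", "beta", "sesame", "gamma", "delta"]))  # sesame
```

Round-robin partitioning keeps the chunks balanced without any coordination between workers, which suits a cluster where each machine only needs its own slice of the dictionary.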
Steganography Background
The term “Information Hiding” relates to both watermarking
and steganography. Watermarking usually
refers to methods that hide information in a
data object so that the information is robust to
modifications. That is, it should be impossible
to remove a watermark without degrading the
quality of the data object.
On the other hand, steganography refers to hidden
information that is fragile. Modifications to the
cover medium may destroy it.
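The fragility of steganographic content can be illustrated with the simplest method, least-significant-bit (LSB) embedding. The sketch below treats a plain list of integers as grey-scale pixel values; the function names are illustrative, not taken from any particular tool.

```python
def embed_lsb(pixels, bits):
    """Overwrite the least-significant bit of the first len(bits) pixels."""
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read the message back from the least-significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


cover = [52, 55, 61, 66, 70, 61, 64, 73]
message = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed_lsb(cover, message)
print(extract_lsb(stego, len(message)) == message)       # True

# A barely visible modification: brighten every pixel by one grey level.
brightened = [min(p + 1, 255) for p in stego]
print(extract_lsb(brightened, len(message)) == message)  # False: every bit flipped
```

No pixel changes by more than one grey level in either step, yet the second change is enough to destroy the message, which is exactly the fragility that separates steganography from robust watermarking.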
Watermarking and steganography differ in another
important way: while steganographic information
must never be apparent to a viewer unaware of its
presence, this feature is optional for a watermark.
Statistical Analysis
Statistical tests can reveal if an image has been
modified by steganography by testing whether an
image’s statistical properties deviate from a norm.
Some tests are independent of the data format and
just measure the entropy of the redundant data.
The simplest test measures the bias towards
one bits, that is, whether ones occur more or less often than zeros. A more sophisticated one is Ueli Maurer’s
“Universal Statistical Test for Random Bit Generators”
[7]. We expect images with hidden data to
have a higher entropy than those without.
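The two format-independent measurements can be sketched on the least-significant bits of a list of sample values. This is not Maurer's test: it is a plain count of one bits plus a Shannon-entropy estimate over non-overlapping blocks, and the block size of 8 bits is an arbitrary choice made for the sketch.

```python
import math
import random
from collections import Counter


def lsb_plane(values):
    """Collect the least-significant bit of every value (the redundant data)."""
    return [v & 1 for v in values]


def ones_fraction(bits):
    """Share of one bits; close to 0.5 for encrypted, random-looking data."""
    return sum(bits) / len(bits)


def block_entropy(bits, block=8):
    """Shannon entropy, in bits per block, of non-overlapping `block`-bit words.
    The maximum is `block` bits, reached only if all words are equally likely."""
    words = [tuple(bits[i:i + block]) for i in range(0, len(bits) - block + 1, block)]
    counts = Counter(words)
    n = len(words)
    return sum(c / n * math.log2(n / c) for c in counts.values())


# Smooth, structured samples versus the same samples with random low bits.
rng = random.Random(0)
smooth = [2 * (i // 16) for i in range(8000)]        # every LSB is 0
noisy = [v | rng.getrandbits(1) for v in smooth]      # LSBs replaced by random bits
for name, values in (("smooth", smooth), ("noisy", noisy)):
    bits = lsb_plane(values)
    print(name, round(ones_fraction(bits), 3), round(block_entropy(bits), 2))
```

As the text notes, such numbers only indicate how random the redundant bits look; they do not by themselves decide whether a message is present.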
These simple tests are not able to decide automatically
if an image contains a hidden message. Westfeld
and Pfitzmann have observed that embedding
encrypted data into a GIF image changes the histogram
of its color frequencies [16]. One property
of encrypted data is that ones and zeros
are equally likely. When the least-significant
bit method is used to embed encrypted data into an image
that contains the color two more often than the
color three, each pixel that carries a message bit has its
least-significant bit replaced by a random-looking bit. A pixel of color two
therefore becomes color three with probability one half, and
vice versa, so more pixels change from two to three than from three to two.
As a result, the difference in color frequency between two
and three is reduced by the embedding.
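A short simulation makes the effect concrete. It assumes the worst case for the embedder, in which every pixel carries a message bit, and models encrypted data as bits from Python's random module; the pixel counts are made up for illustration.

```python
import random
from collections import Counter


def embed_random_bits(pixels, rng):
    """Replace every pixel's LSB with an equally likely 0 or 1,
    as happens when encrypted data is embedded at full capacity."""
    return [(p & ~1) | rng.getrandbits(1) for p in pixels]


rng = random.Random(42)
cover = [2] * 9000 + [3] * 1000      # color two is nine times as frequent as three
stego = embed_random_bits(cover, rng)

before = Counter(cover)
after = Counter(stego)
print(before[2], before[3])   # 9000 1000
print(after[2], after[3])     # both near the expected value of 5000
```

The expected count of each color after embedding is (9000 + 1000) / 2 = 5000: the pair of values is equalized, which is the signature that histogram-based tests look for.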
The same is true for JPEG images. Instead of
measuring the color frequencies, we analyze the frequency
of the DCT coefficients. Figure 2 shows an
example where embedding a hidden message causes
noticeable differences to the DCT coefficient histogram.
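A sketch of the pairs-of-values χ² test of Westfeld and Pfitzmann, applied to a list of quantized DCT coefficients. Several details are assumptions made for the sketch rather than the paper's exact procedure: coefficients are paired by the least-significant bit of their magnitude, so (2, 3) and (−2, −3) form pairs; pairs whose expected count is below 5 are skipped; the caller passes only coefficients the embedding can modify; and the χ² tail probability uses the Wilson-Hilferty normal approximation instead of an exact incomplete-gamma function. A result near 1 means the pairs are suspiciously balanced, which suggests embedding; a result near 0 means they are not.

```python
import math
from collections import defaultdict


def chi2_tail(x, dof):
    """Approximate P(X >= x) for X ~ chi-square(dof) (Wilson-Hilferty)."""
    if x <= 0:
        return 1.0
    k = 2 / (9 * dof)
    z = ((x / dof) ** (1 / 3) - (1 - k)) / math.sqrt(k)
    return 0.5 * math.erfc(z / math.sqrt(2))


def embedding_probability(coefficients, min_expected=5):
    """Pairs-of-values chi-square test on a list of DCT coefficients.
    Returns None if fewer than two pairs have enough observations."""
    pairs = defaultdict(lambda: [0, 0])   # (sign, |v| >> 1) -> [even count, odd count]
    for v in coefficients:
        pairs[(v < 0, abs(v) >> 1)][abs(v) & 1] += 1
    chi2, categories = 0.0, 0
    for even, odd in pairs.values():
        expected = (even + odd) / 2       # count expected if embedding equalized the pair
        if expected < min_expected:
            continue
        chi2 += (even - expected) ** 2 / expected
        categories += 1
    if categories < 2:
        return None
    return chi2_tail(chi2, categories - 1)


clean = [2] * 1000 + [3] * 500 + [4] * 400 + [5] * 200 + [6] * 150 + [7] * 60
stego = [2] * 760 + [3] * 740 + [4] * 305 + [5] * 295 + [6] * 106 + [7] * 104
print(embedding_probability(clean))   # near 0: the pairs are unbalanced
print(embedding_probability(stego))   # high: the pairs are nearly balanced
```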
JPHide Detection
Because JPHide modifies the DCT coefficients in
a fixed order determined by a table, we rearrange
the coefficients in that order before computing the
probability of embedding. However, there are two
exceptions that influence the detection.
JPHide modifies the DCT coefficients −1, 0 and 1 in
a special way. As a result, the modifications to these
coefficients cannot be detected by the χ²-test. However,
simply ignoring these coefficients still allows us
to detect content embedded with JPHide. We also
ignore modifications to the second-least-significant
bits, which are not as frequent as modifications to
the least-significant bits.
Similar to JSteg, we stop computing the probability
of embedding once it falls below a certain threshold.
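The JPHide-specific steps can be sketched on top of a pairs-of-values χ² test. The permutation order stands in for JPHide's fixed table, which is not reproduced here; the prefix step of 500 coefficients and the threshold of 0.05 are illustrative values, not taken from the paper. As assumptions of the sketch, coefficients are paired by the least-significant bit of their magnitude and the χ² tail uses the Wilson-Hilferty approximation. Because pairs differ only in the least-significant bit, changes to the second-least-significant bit do not enter the statistic.

```python
import math
from collections import defaultdict


def chi2_tail(x, dof):
    """Approximate P(X >= x) for X ~ chi-square(dof) (Wilson-Hilferty)."""
    if x <= 0:
        return 1.0
    k = 2 / (9 * dof)
    z = ((x / dof) ** (1 / 3) - (1 - k)) / math.sqrt(k)
    return 0.5 * math.erfc(z / math.sqrt(2))


def pov_probability(coefficients, min_expected=5):
    """Pairs-of-values chi-square test; pairs differ only in the LSB of |v|."""
    pairs = defaultdict(lambda: [0, 0])
    for v in coefficients:
        pairs[(v < 0, abs(v) >> 1)][abs(v) & 1] += 1
    cells = [(e, o) for e, o in pairs.values() if (e + o) / 2 >= min_expected]
    if len(cells) < 2:
        return None
    chi2 = sum((e - (e + o) / 2) ** 2 / ((e + o) / 2) for e, o in cells)
    return chi2_tail(chi2, len(cells) - 1)


def scan_in_embedding_order(coefficients, order, step=500, threshold=0.05):
    """Rearrange coefficients into the (hypothetical) embedding order, drop
    -1, 0 and 1, and compute the probability of embedding over growing
    prefixes. Stops as soon as the probability falls below `threshold`."""
    usable = [coefficients[i] for i in order if coefficients[i] not in (-1, 0, 1)]
    results = []
    for end in range(step, len(usable) + 1, step):
        p = pov_probability(usable[:end])
        if p is None:
            continue
        results.append((end, p))
        if p < threshold:
            break
    return results


clean = ([2] * 10 + [3] * 5 + [4] * 4 + [5] * 2 + [0] * 3 + [1]) * 100
stego = ([2] * 7 + [3] * 7 + [4] * 3 + [5] * 3 + [0] * 2) * 100
print(scan_in_embedding_order(clean, list(range(len(clean)))))  # one entry: stops at the first prefix
print(scan_in_embedding_order(stego, list(range(len(stego)))))  # four prefixes, all with probability 1.0
```

Stopping early saves work on the many images that show no sign of embedding, since their probability drops below the threshold within the first prefix.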
Related Work
Fridrich et al. analyze the security of steganographic
systems that embed information in the LSB of color
images [2]. They find that the number of pairs of
“very close” colors increases when hidden messages
have been embedded. While they are able to detect
steganographic content, they are not able to differentiate
between steganographic systems.
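The quantity Fridrich et al. track can be sketched as the share of "very close" pairs among all pairs of distinct colors in an image. We assume here that two colors are very close when none of their red, green and blue components differs by more than one; the quadratic loop over unique colors is fine for a sketch but too slow for images with large palettes.

```python
from itertools import combinations


def close_pair_ratio(pixels):
    """Share of pairs of distinct colors whose R, G and B values each differ by at most 1."""
    colors = sorted(set(pixels))
    total = close = 0
    for a, b in combinations(colors, 2):
        total += 1
        if all(abs(x - y) <= 1 for x, y in zip(a, b)):
            close += 1
    return close / total if total else 0.0


image = [(10, 20, 30), (11, 20, 31), (10, 20, 30), (200, 0, 0)]
print(close_pair_ratio(image))   # 1 close pair out of 3 distinct-color pairs
```

LSB embedding tends to create many new colors that differ from existing ones only in their lowest bits, which is why this ratio rises after a message has been embedded.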
Conclusion
Steganography can be used for hidden communication.
There are widely reported rumors that images
on auction sites contain hidden messages. To verify
these claims, we developed new techniques and
software to find hidden messages on the Internet.