12-07-2012, 12:21 PM
Embedding Secret Data in Html Web Page
Abstract
In this paper, we suggest a novel technique for hiding data in an HTML web page.
HTML tags are case-insensitive, so a lowercase letter and its uppercase
counterpart inside a tag are interpreted identically by the browser; that is, a
change of case inside a tag is imperceptible to the browser. We exploit this
redundancy to embed secret data inside a web page with no change visible to the
user of the page, who therefore has no reason to suspect that data is hidden.
The embedded data can be recovered by viewing the source of the HTML page. The
technique extends readily to embedding a secret message in any piece of source
code whose standard interpreter is case-insensitive.
Introduction
Some techniques for hiding data in executables have already been proposed
(e.g., Shin et al. [4]). In this paper we introduce a very simple technique for
hiding secret message bits inside source code as well. We describe our
steganographic technique using HTML source as the cover text, but it can easily
be extended to source code in any case-insensitive language. HTML tags are
directives to the browser; they carry information about how to structure and
display the data on a web page. They are not case-sensitive, so a tag in either
case (or in mixed case) is interpreted by the browser in the same manner (e.g.,
"<head>" and "<HEAD>" refer to the same thing). Hence there is a redundancy
that we can exploit. If the cases of the letters in the tags of the HTML cover
text are manipulated according to the secret message bits, this tampering with
the cover text is ignored by the browser and is therefore imperceptible to the
user: the web page shows no visible difference, so there is no cause for
suspicion. Also, when the web page is displayed in the browser, only the text
content is displayed, not the tags (these can be seen only when the user
chooses `view source'). Hence the secret message remains hidden from the user.
arXiv:1004.0459v1 [cs.CR] 3 Apr 2010
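The case redundancy described above can be checked with a small sketch. It
uses Python's standard html.parser module as a stand-in for a browser's tag
handling (an assumption made for illustration; the paper itself does not use
Python), and the names TagCollector and parsed_tags are hypothetical helpers.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records tag names in the form the parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def parsed_tags(source):
    """Return the sequence of tag names the parser sees in source."""
    collector = TagCollector()
    collector.feed(source)
    collector.close()
    return collector.tags


plain = "<html><head></head><body><p>text</p></body></html>"
mixed = "<hTmL><HEAD></heAD><BoDy><P>text</p></bODY></HTML>"

# The parser normalizes tag names to lowercase, so both sources
# produce the same tag sequence even though their bytes differ.
same_structure = parsed_tags(plain) == parsed_tags(mixed)
print(same_structure)
```

The two strings differ only in the case of letters inside tags, which is
exactly the freedom the technique uses to carry message bits.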
Since both the redundancy and the imperceptibility conditions for data hiding
are met, we use them to embed data in HTML text. If we do not tamper with the
HTML text that the browser displays as the web page (the HTML cover text is
analogous to the cover image in steganographic techniques for images
[1, 2, 3]), the user will not suspect that the text hides any data. We change
only the case of the characters within the HTML tags (elements), in accordance
with the secret message bits that we want to embed in the HTML web page. If we
think of the browser interpreter as a function $f_B$, we see that it is
non-injective, i.e., not one-to-one, since $f_B(x) = f_B(y)$ whenever
$x \in \{\text{`A'}, \ldots, \text{`Z'}\}$,
$y \in \{\text{`a'}, \ldots, \text{`z'}\}$ and $\mathrm{Uppercase}(y) = x$.
The extraction process is also very simple: one only needs to choose `view
source', observe the case pattern of the text within the tags, and read off the
secret message (and see the unseen), while others will not know anything.
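A minimal sketch of the extraction step follows. It assumes a bit convention
that this part of the text does not fix (uppercase letter = 1, lowercase
letter = 0) and reads only the letters of tag names. The regular expression
TAG_NAME and the function extract_bits are hypothetical; the pattern is a
simplification that does not handle comments, scripts, or a stray "<" in the
displayed text.

```python
import re

# A tag name is taken to be the run of letters/digits right after "<" or "</".
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")


def extract_bits(source):
    """Read one bit per letter of each tag name: upper -> '1', lower -> '0'."""
    bits = []
    for match in TAG_NAME.finditer(source):
        for ch in match.group(1):
            if ch.isalpha():  # digits such as the 2 in h2 carry no bit
                bits.append("1" if ch.isupper() else "0")
    return "".join(bits)


stego = "<hTmL><bODy>Hello</BoDY></HtMl>"
print(extract_bits(stego))  # prints 0101011010111010
```

Under this convention a receiver needs nothing but the page source and the
agreed bit order to recover the message.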
The length (in bits) of the secret message that can be embedded is bounded
above by the sum of the sizes of the text inside the HTML tags. Here we do not
use attribute values for embedding. If attribute values are used, more care is
needed, since some of them are case-sensitive: in <A HREF="link.html">, for
example, the link file name may be case-sensitive on some systems, whereas an
attribute such as the one in <h2 align="center"> is safe. If fewer bits are to
be embedded than the cover text can hold, we can record the length of the
embedded data inside a header tag (e.g., `<Header 25 >' if the secret data to
be embedded is 25 bits long), which will not be shown in the browser
(optionally, this integer value can be encrypted with some private key). To
make this very simple algorithm more robust, one may apply some simple
encryption to the data to be embedded.
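The capacity bound can be estimated with a short sketch. It assumes that one
bit is carried per letter of every tag name and attribute name and that
attribute values are skipped, as stated above; counting attribute names is an
interpretation, not something the text spells out. CapacityCounter and
capacity_bits are hypothetical names, and Python's html.parser stands in for
the tag scanner.

```python
from html.parser import HTMLParser


class CapacityCounter(HTMLParser):
    """Counts letters in tag names and attribute names (values are skipped)."""

    def __init__(self):
        super().__init__()
        self.capacity = 0

    def handle_starttag(self, tag, attrs):
        self.capacity += sum(c.isalpha() for c in tag)
        for name, _value in attrs:
            self.capacity += sum(c.isalpha() for c in name)

    def handle_endtag(self, tag):
        self.capacity += sum(c.isalpha() for c in tag)

    def handle_startendtag(self, tag, attrs):
        # A self-closing tag appears once in the source, so count it once.
        self.handle_starttag(tag, attrs)


def capacity_bits(source):
    """Upper bound on the number of message bits the cover text can hold."""
    counter = CapacityCounter()
    counter.feed(source)
    counter.close()
    return counter.capacity


page = ('<html><body><h2 align="center">Hi</h2>'
        '<a href="link.html">x</a></body></html>')
print(capacity_bits(page))  # prints 29
```

A message longer than this bound does not fit in the cover text; a shorter one
leaves room for the length marker described above.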
The Algorithm for Embedding
The algorithm for embedding the secret message inside the HTML cover text is
very simple and straightforward. First, we need to separate out the characters
of the cover text that are candidates for embedding; these are the
case-insensitive text characters inside HTML tags. Figure 2 shows a very
simplified automaton for this purpose.
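The candidate-selection step can be sketched as a small state machine. This is
not the automaton of Figure 2, which is not reproduced here, but a hypothetical
three-state scanner written for illustration. It embeds only in tag names,
assumes the convention bit 1 = uppercase and bit 0 = lowercase, and does not
handle a ">" inside comments or attribute values.

```python
def embed_bits(cover, bits):
    """Recase tag-name letters of cover so that they carry the string bits.

    Assumed convention: '1' -> uppercase, '0' -> lowercase. Letters left over
    after the message ends are not modified.
    """
    OUTSIDE, IN_NAME, IN_REST = range(3)
    state = OUTSIDE
    remaining = iter(bits)
    out = []
    for ch in cover:
        if state == OUTSIDE:
            if ch == "<":
                state = IN_NAME
        elif state == IN_NAME:
            if ch.isalpha():
                bit = next(remaining, None)
                if bit is not None:
                    ch = ch.upper() if bit == "1" else ch.lower()
            elif ch == ">":
                state = OUTSIDE
            elif ch != "/" and not ch.isdigit():
                # Whitespace, "!" and so on: the tag name is over, and the
                # rest of the tag (attributes, doctype) is left untouched.
                state = IN_REST
        else:  # IN_REST
            if ch == ">":
                state = OUTSIDE
        out.append(ch)
    if next(remaining, None) is not None:
        raise ValueError("message is longer than the cover text capacity")
    return "".join(out)


cover = "<html><body>Hello</body></html>"
stego = embed_bits(cover, "0101011010111010")
print(stego)  # prints <hTmL><bODy>Hello</BoDY></HtMl>
```

Because only letter case changes, the result equals the cover text once both
are lowercased, which is why the rendered page is unaffected.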