06-08-2013, 03:13 PM
Secure Content Sniffing for Web Browsers, or How to Stop Papers from Reviewing Themselves
Abstract
Cross-site scripting defenses often focus on HTML documents,
neglecting attacks involving the browser’s content-sniffing
algorithm, which can treat non-HTML content as
HTML. Web applications, such as the one that manages this
conference, must defend themselves against these attacks or
risk authors uploading malicious papers that automatically
submit stellar self-reviews. In this paper, we formulate
content-sniffing XSS attacks and defenses. We study content-
sniffing XSS attacks systematically by constructing high-
fidelity models of the content-sniffing algorithms used by
four major browsers. We compare these models with Web
site content filtering policies to construct attacks. To defend
against these attacks, we propose and implement a
principled content-sniffing algorithm that provides security
while maintaining compatibility. Our principles have been
adopted, in part, by Internet Explorer 8 and, in full, by
Google Chrome and the HTML 5 working group.
Introduction
For compatibility, every Web browser employs a content-sniffing
algorithm that inspects the contents of HTTP responses
and occasionally overrides the MIME type provided
by the server. For example, these algorithms let browsers
render the approximately 1% of HTTP responses that lack a
Content-Type header. In a competitive browser market,
a browser that guesses the “correct” MIME type is more
appealing to users than a browser that fails to render these
sites. Once one browser vendor implements content sniffing,
the other browser vendors are forced to follow suit or risk
losing market share [1].
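As a rough illustration of what sniffing a response without a Content-Type header can look like, the sketch below guesses a type from the leading bytes. The function name `effective_mime_type` and the small signature table are assumptions made for this example; the magic numbers are the well-known ones for PNG, GIF, and PDF, and the table is not any browser's actual signature list.

```python
from typing import Optional

# Well-known magic numbers. The table is illustrative only and is far
# smaller than what a real browser checks.
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def effective_mime_type(declared: Optional[str], body: bytes) -> str:
    """Return the declared type if present, else guess from the leading bytes."""
    if declared:
        return declared
    for prefix, mime in MAGIC_PREFIXES:
        if body.startswith(prefix):
            return mime
    # Nothing matched: fall back to a generic binary type.
    return "application/octet-stream"
```

This sketch only sniffs when the header is missing; the browsers studied in the paper also override some declared types, which is where the security problems discussed later come from.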
Attacks
In this section, we study content-sniffing XSS attacks.
First, we provide some background information. Then, we
introduce content-sniffing XSS attacks. Next, we describe a
technique for constructing models from binaries and apply
that technique to extract models of the content-sniffing
algorithm from four major browsers. Finally, we construct
attacks against two popular Web sites by comparing their
upload filters with our models.
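The idea of comparing an upload filter with a browser model can be sketched as a search for inputs on which the two disagree. Both models below are hypothetical toys written for this example (the function names, signatures, and the 256-byte window are assumptions); they do not reproduce any real site's filter or any real browser's algorithm.

```python
# Hypothetical models: neither reflects a real site's filter or a real browser.

def filter_accepts(upload: bytes) -> bool:
    """Toy upload filter: accept anything that starts like PDF or PostScript."""
    return upload.startswith((b"%PDF-", b"%!PS-Adobe-"))


def browser_sniffs_html(upload: bytes, window: int = 256) -> bool:
    """Toy browser model: HTML if a tag appears in the first `window` bytes."""
    head = upload[:window].lower()
    return b"<html" in head or b"<script" in head


def find_mismatches(candidates):
    """Return uploads the filter accepts but the browser model treats as HTML."""
    return [c for c in candidates if filter_accepts(c) and browser_sniffs_html(c)]


candidates = [
    b"%PDF-1.4\n%%EOF",
    b"%!PS-Adobe-3.0\n<html><script>/* payload */</script></html>",
    b"<html>rejected by the filter</html>",
]
attacks = find_mismatches(candidates)
```

Only the second candidate passes the toy filter while also matching the toy HTML signature, so it is the only one reported as a potential attack.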
Background
In this section, we provide background information about
how servers identify the type of content included in an HTTP
response. We do this in the context of a Web site that allows
its users to upload content that can later be downloaded by
other users, such as in a photograph sharing or a conference
management site.
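One common way for a server to choose the Content-Type of an uploaded file is to look at its file extension. The sketch below assumes that approach and uses Python's standard `mimetypes` module; the helper names `content_type_for` and `response_headers` are made up for this example and do not describe any particular Web server.

```python
import mimetypes
from typing import Optional


def content_type_for(filename: str) -> Optional[str]:
    """Return a Content-Type guessed from the extension, or None if unknown."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def response_headers(filename: str) -> dict:
    """Build response headers, omitting Content-Type for unknown extensions."""
    headers = {}
    mime = content_type_for(filename)
    if mime is not None:
        headers["Content-Type"] = mime
    return headers
```

When the extension is not recognized, this sketch sends no Content-Type header at all, which leaves the decision entirely to the browser's sniffing algorithm.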
Model Extraction
We investigate content-sniffing XSS attacks by extracting
high-fidelity models of content-sniffing algorithms from
browsers and Web sites. When source code is available,
we manually analyze the source code to build the model.
Specifically, we manually extract models of the content-
sniffing algorithms from the source code of two browsers,
Firefox 3 and Google Chrome, and the upload filter of two
Web sites, Wikipedia [13] and HotCRP [14].
Extracting models from Internet Explorer 7 and Safari 3.1
is more difficult because their source code is not publicly
available. We could use black-box testing to construct models
by observing the outputs generated from selected inputs, but
models extracted by black-box testing are often insufficiently
accurate for our purpose. For example, the Wine project [15]
used black-box testing and documentation [16] to re-implement
Internet Explorer’s content-sniffing algorithm, but Wine’s
version differs significantly from the original: the Wine
signature for HTML contains just the <html tag instead of the
10 tags we find in Internet Explorer’s content-sniffing
algorithm by white-box exploration.
Content-Sniffing Algorithms
We analyze the content-sniffing algorithms used by four
browsers: Internet Explorer 7, Firefox 3, Safari 3.1, and
Google Chrome. We discover that the algorithms follow
roughly the same design but that subtle differences between
the algorithms have dramatic consequences for security. We
compare the algorithms on several key points: the number
of bytes used by the algorithm, the conditions that trigger
sniffing, the signatures themselves, and restrictions on the
HTML signature. We also discuss the “fast path” we observe
in one browser.
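The comparison points listed above (buffer size, trigger conditions, signatures, and restrictions on the HTML signature) can be read as the parameters of a small model. The sketch below is one way to write that down; the class `SnifferModel`, its field names, and every parameter value are assumptions for illustration and do not correspond to any of the four browsers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SnifferModel:
    name: str
    buffer_size: int          # how many leading bytes the algorithm inspects
    trigger_types: frozenset  # declared types that trigger sniffing (None = no header)
    html_tags: tuple          # byte strings making up the HTML signature
    html_at_start_only: bool  # restriction: must a tag begin the content?

    def sniffs_html(self, declared: Optional[str], body: bytes) -> bool:
        """Return True if this model would treat the response as HTML."""
        if declared not in self.trigger_types:
            return False
        head = body[: self.buffer_size].lower()
        if self.html_at_start_only:
            # Leading whitespace is skipped before matching a tag.
            return head.lstrip().startswith(self.html_tags)
        return any(tag in head for tag in self.html_tags)


# All parameter values below are made up for illustration.
LOOSE = SnifferModel("loose", 256, frozenset({None, "text/plain"}),
                     (b"<html", b"<script"), False)
STRICT = SnifferModel("strict", 512, frozenset({None}), (b"<html",), True)

body = b"GIF89a<script>alert(1)</script>"
```

With these made-up parameters, the loose model treats `body` as HTML when it is served as `text/plain`, while the strict model does not, even when the header is missing. Small differences in any one parameter change which uploads are dangerous.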
Security Analysis
Avoiding privilege escalation protects Web sites that restrict
the values of the Content-Type header they attach
to untrusted content because the browser will not upgrade
attacker-supplied content to HTML (or another dangerous
type) and will not run the attacker’s malicious JavaScript.
Unfortunately, avoiding privilege escalation is insufficient
to protect all sites that filter uploads. For example, if a
site serves content without a Content-Type header (e.g.,
if the site stores uploaded files in the file system and the
Web server does not recognize the file extension), then the
browser might sniff the uploaded content as HTML, opening
the site up to attack.
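A minimal sketch of the avoid-privilege-escalation rule, and of the gap described above, might look as follows. The set `DANGEROUS_TYPES`, the toy signature in `looks_like_html`, and the function `final_type` are all simplifying assumptions for this example rather than the algorithm proposed in the paper.

```python
from typing import Optional

# Assumed, simplified set of types that can run script in the site's origin.
DANGEROUS_TYPES = {"text/html"}


def looks_like_html(body: bytes) -> bool:
    """Toy HTML signature, used only for this sketch."""
    return body.lstrip().lower().startswith((b"<!doctype html", b"<html"))


def final_type(declared: Optional[str], body: bytes) -> str:
    """Pick the type to render, never upgrading a declared type to HTML."""
    sniffed = "text/html" if looks_like_html(body) else None
    if declared is None:
        # No header: there is no declared type to escalate from,
        # so the sniffed type wins. This is the case that remains risky.
        return sniffed or "application/octet-stream"
    if sniffed in DANGEROUS_TYPES and declared not in DANGEROUS_TYPES:
        # Avoid privilege escalation: keep the server's less-privileged type.
        return declared
    return sniffed or declared
```

In this sketch, content served as `image/gif` stays `image/gif` even if it looks like HTML, but the same bytes served with no Content-Type header are still rendered as HTML.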
Conclusions
Browser content-sniffing algorithms have long been one of
the least-understood facets of the browser security landscape.
In this paper, we study content-sniffing XSS attacks and
defenses. To understand content-sniffing XSS attacks, we
use string-enhanced white-box exploration and source code
inspection to construct high-fidelity models of the content-
sniffing algorithms used by Internet Explorer 7, Firefox 3,
Safari 3.1, and Google Chrome. We use these models to
construct attacks against two Web applications: HotCRP and
Wikipedia.
We describe two defenses for these attacks. For Web sites,
we provide a filter based on our models that blocks content-
sniffing XSS attacks. To protect sites that do not deploy our
filter, we propose two design principles for securing browser
content-sniffing algorithms: avoid privilege escalation and
use prefix-disjoint signatures. We evaluate the security of
these principles in a threat model based on case studies,
and we evaluate the compatibility of these principles using
Google’s search database and metrics from over a billion
HTTP responses.
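To give a feel for the second principle, the sketch below checks a simplified notion of prefix-disjointness. It assumes every signature is a literal byte string matched at the start of the content, in which case two signatures can both match the same content only if one literal is a prefix of the other. The function names and the signature table are illustrative assumptions, and the definition used in the paper is more general than this one.

```python
def literals_prefix_disjoint(a: bytes, b: bytes) -> bool:
    """Two literal-prefix signatures can both match some content only if
    one literal is a prefix of the other."""
    return not (a.startswith(b) or b.startswith(a))


def all_prefix_disjoint(signatures: dict) -> bool:
    """Check every pair of signatures that map to different types."""
    items = list(signatures.items())
    return all(
        literals_prefix_disjoint(sig_a, sig_b)
        for i, (sig_a, type_a) in enumerate(items)
        for sig_b, type_b in items[i + 1:]
        if type_a != type_b
    )


# Illustrative literals only, not a real browser's table.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"<html": "text/html",
}
```

Under this simplified reading, a signature that searches for a tag anywhere in the buffer (rather than at the start) cannot be checked this way, because content such as `%PDF-<html>` would match both it and the PDF signature.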