DoubleGuard: Detecting Intrusions in Multitier Web Applications
DoubleGuard.pdf (Size: 1.38 MB / Downloads: 171)
Abstract
Internet services and applications have become an inextricable part of daily life, enabling communication and the
management of personal information from anywhere. To accommodate this increase in application and data complexity, web services
have moved to a multitiered design wherein the webserver runs the application front-end logic and data are outsourced to a database
or file server. In this paper, we present DoubleGuard, an intrusion detection system (IDS) that models the network behavior of user sessions across both
the front-end webserver and the back-end database. By monitoring both web and subsequent database requests, we are able to ferret
out attacks that an independent IDS would not be able to identify. Furthermore, we quantify the limitations of any multitier IDS in terms
of training sessions and functionality coverage. We implemented DoubleGuard using an Apache webserver with MySQL and
lightweight virtualization. We then collected and processed real-world traffic over a 15-day period of system deployment in both
dynamic and static web applications. Finally, using DoubleGuard, we were able to expose a wide range of attacks with 100 percent
accuracy while maintaining 0 percent false positives for static web services and 0.6 percent false positives for dynamic web services.
Index Terms—Anomaly detection, virtualization, multitier web application.
INTRODUCTION
WEB-DELIVERED services and applications have increased
in both popularity and complexity over the past few
years. Daily tasks, such as banking, travel, and social
networking, are all done via the web. Such services typically
employ a webserver front end that runs the application user
interface logic, as well as a back-end server that consists of a
database or file server. Due to their ubiquitous use for
personal and/or corporate data, web services have always
been the target of attacks. These attacks have recently
become more diverse, as attention has shifted from attacking
the front end to exploiting vulnerabilities of the web
applications [6], [5], [1] in order to corrupt the back-end
database system [40] (e.g., SQL injection attacks [20], [43]). A
plethora of Intrusion Detection Systems (IDSs) currently
examine network packets individually within both the
webserver and the database system. However, there is very
little work being performed on multitiered Anomaly
Detection (AD) systems that generate models of network
behavior for both web and database network interactions. In
such multitiered architectures, the back-end database server
is often protected behind a firewall while the webservers are
remotely accessible over the Internet. Unfortunately, though
they are protected from direct remote attacks, the back-end
systems are susceptible to attacks that use web requests as a
means to exploit the back end.
To protect multitiered web services, intrusion detection
systems have been widely used to detect known attacks by
matching misuse traffic patterns or signatures [34], [30],
[33], [22]. A class of IDS that leverages machine learning can
also detect unknown attacks by identifying abnormal network
traffic that deviates from the so-called “normal”
behavior previously profiled during the IDS training phase.
Individually, the web IDS and the database IDS can detect
abnormal network traffic sent to either of them. However, we
found that these IDSs cannot detect cases wherein normal
traffic is used to attack the webserver and the database
server. For example, if an attacker with nonadmin privileges
can log in to a webserver using normal-user access
credentials, he/she can find a way to issue a privileged
database query by exploiting vulnerabilities in the webserver.
Neither the web IDS nor the database IDS would detect
this type of attack since the web IDS would merely see typical
user login traffic and the database IDS would see only the
normal traffic of a privileged user. This type of attack can be
readily detected if the database IDS can identify that a
privileged request from the webserver is not associated with
user-privileged access. Unfortunately, within the current
multithreaded webserver architecture, it is not feasible to
detect or profile such causal mapping between webserver
traffic and DB server traffic since traffic cannot be clearly
attributed to user sessions.
In this paper, we present DoubleGuard, a system used to
detect attacks in multitiered web services. Our approach can
create normality models of isolated user sessions that include
both the web front-end (HTTP) and back-end (File or SQL)
network transactions. To achieve this, we employ a lightweight
virtualization technique to assign each user’s web
session to a dedicated container, an isolated virtual computing
environment. We use the container ID to accurately
associate the web request with the subsequent DB queries.
Thus, DoubleGuard can build a causal mapping profile by
taking both the webserver and DB traffic into account.
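As a rough illustration of this idea (not the authors' implementation), the sketch below pairs front-end requests with back-end queries by the container that issued them. The log formats, field names, and example values are hypothetical assumptions for illustration only.

```python
# Minimal sketch (not the paper's code): pairing front-end web requests with the
# back-end SQL queries issued from the same per-session container. Record formats
# and the example log entries are hypothetical assumptions.
from collections import defaultdict

def pair_by_container(web_log, db_log):
    """Group web requests and DB queries by container ID so that each
    session yields a (requests, queries) pair for model building."""
    sessions = defaultdict(lambda: {"web": [], "db": []})
    for container_id, http_request in web_log:      # e.g. ("c17", "GET /index.php")
        sessions[container_id]["web"].append(http_request)
    for container_id, sql_query in db_log:          # e.g. ("c17", "SELECT * FROM posts")
        sessions[container_id]["db"].append(sql_query)
    return sessions

if __name__ == "__main__":
    web_log = [("c1", "GET /read.php?id=3"), ("c2", "GET /index.php")]
    db_log = [("c1", "SELECT * FROM posts WHERE id = ?"), ("c2", "SELECT * FROM posts")]
    for cid, traffic in pair_by_container(web_log, db_log).items():
        print(cid, traffic["web"], "->", traffic["db"])
```

Because every request and query carries the ID of its dedicated container, the pairing is exact rather than heuristic, which is what makes the causal mapping profile possible.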
We have implemented our DoubleGuard container
architecture using OpenVZ [14], and performance testing
shows that it has reasonable performance overhead and is
practical for most web applications. When the request rate is
moderate (e.g., under 110 requests per second), there is
almost no overhead in comparison to an unprotected vanilla
system. Even in a worst case scenario when the server was
already overloaded, we observed only 26 percent performance
overhead. The container-based web architecture not
only fosters the profiling of causal mapping, but it also
provides an isolation that prevents future session-hijacking
attacks. Within a lightweight virtualization environment, we
ran many copies of the webserver instances in different
containers so that each one was isolated from the rest. As
ephemeral containers can be easily instantiated and destroyed,
we assigned each client session a dedicated container
so that, even if an attacker is able to compromise a single
session, the damage is confined to that session; other user
sessions remain unaffected.
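The per-session container lifecycle can be sketched as below. This is a conceptual outline only; the start_container and destroy_container helpers are hypothetical placeholders standing in for the actual OpenVZ container management used in the paper.

```python
# Conceptual sketch: each client session gets its own ephemeral webserver
# container, so a compromise stays confined to that session. The helpers are
# placeholders, not real OpenVZ commands.
import itertools

_ctid = itertools.count(1000)

def start_container():
    """Placeholder: instantiate a fresh webserver container and return its ID."""
    return f"ct{next(_ctid)}"

def destroy_container(container_id):
    """Placeholder: tear the container down once the session ends."""
    print(f"destroyed {container_id}")

class SessionDispatcher:
    def __init__(self):
        self.session_to_container = {}

    def on_new_session(self, session_id):
        # One dedicated container per session; an existing session keeps its own.
        if session_id not in self.session_to_container:
            self.session_to_container[session_id] = start_container()
        return self.session_to_container[session_id]

    def on_session_end(self, session_id):
        container_id = self.session_to_container.pop(session_id, None)
        if container_id:
            destroy_container(container_id)

if __name__ == "__main__":
    dispatcher = SessionDispatcher()
    print(dispatcher.on_new_session("alice"))
    print(dispatcher.on_new_session("bob"))
    dispatcher.on_session_end("alice")
```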
Using our prototype, we show that, for websites that do
not permit content modification from users, there is a direct
causal relationship between the requests received by the
front-end webserver and those generated for the database
back end. In fact, we show that this causality-mapping model
can be generated accurately and without prior knowledge of
web application functionality. Our experimental evaluation,
using real-world network traffic obtained from the web and
database requests of a large center, showed that we were able
to extract 100 percent of functionality mapping by using as
few as 35 sessions in the training phase. Of course, we also
showed that this depends on the size and functionality of the
web service or application. However, it does not depend on
content changes if those changes can be performed through a
controlled environment and retrofitted into the training
model. We refer to such sites as “static” because, though they
do change over time, they do so in a controlled fashion that
allows the changes to propagate to the sites’ normality
models.
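For the static case, one simple way to learn such a deterministic request-to-query mapping is to intersect, across training sessions, the query sets observed together with each request. The sketch below is a simplified heuristic under that assumption, not the paper's exact algorithm; it reuses the hypothetical per-session (requests, queries) pairs from the earlier sketch.

```python
# Rough sketch: learn a deterministic mapping from each web request to the SQL
# queries it triggers, using per-session training traffic. Simplified heuristic,
# not the paper's algorithm in full detail.
def train_static_model(training_sessions):
    """training_sessions: iterable of (web_requests, db_queries) per session."""
    model = {}
    for web_requests, db_queries in training_sessions:
        for request in set(web_requests):
            queries = set(db_queries)
            # Intersect across sessions: queries that always co-occur with the
            # request survive; unrelated queries from other requests drop out.
            model[request] = model[request] & queries if request in model else queries
    return model

if __name__ == "__main__":
    sessions = [
        (["GET /index.php"], ["SELECT * FROM posts"]),
        (["GET /about.php"], []),                       # request that issues no query
        (["GET /index.php", "GET /about.php"], ["SELECT * FROM posts"]),
    ]
    for request, queries in train_static_model(sessions).items():
        print(request, "->", sorted(queries))
```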
In addition to this static website case, there are web
services that permit persistent back-end data modifications.
These services, which we call dynamic, allow HTTP
requests to include parameters that are variable and depend
on user input. Therefore, our ability to model the causal
relationship between the front end and back end is not
always deterministic and depends primarily upon the
application logic. For instance, we observed that the back-end
queries can vary based on the value of the parameters
passed in the HTTP requests and the previous application
state. Sometimes, the same application’s primitive functionality
(i.e., accessing a table) can be triggered by many different
webpages. Therefore, the resulting mapping between web
and database requests can range from one to many,
depending on the value of the parameters passed in the
web request.
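Handling this variability typically requires abstracting away the user-supplied values so that requests and queries are compared by structure rather than by literal parameters. The following is an illustrative sketch only; the regular expressions are simplistic assumptions and not the paper's normalization procedure.

```python
# Illustrative sketch: normalize dynamic requests and queries by replacing
# variable, user-supplied values with placeholders. Simplistic assumption,
# not the paper's normalizer.
import re

def normalize_query(sql):
    """Replace string and numeric literals with a generic placeholder."""
    sql = re.sub(r"'[^']*'", "?", sql)   # 'some text' -> ?
    sql = re.sub(r"\b\d+\b", "?", sql)   # 42          -> ?
    return sql

def normalize_request(request):
    """Keep the path and parameter names, drop the variable parameter values."""
    path, _, query = request.partition("?")
    keys = sorted(kv.split("=")[0] for kv in query.split("&") if kv)
    return path + ("?" + "&".join(k + "=?" for k in keys) if keys else "")

if __name__ == "__main__":
    print(normalize_request("GET /post.php?id=42&author=alice"))
    print(normalize_query("SELECT * FROM posts WHERE id = 42 AND author = 'alice'"))
```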
To address this challenge while building a mapping model
for dynamic webpages, we first generated an individual
training model for the basic operations provided by the web
services. We demonstrate that this approach works well
in practice by using traffic from a live blog where we
progressively modeled nine operations. Our results show
that we were able to identify all attacks, covering more than
99 percent of the normal traffic as the training model is
refined.
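Once per-operation models are in place, detection amounts to checking whether every back-end query in a session can be explained by some front-end request in that same session. The sketch below is a hedged illustration of that check, using the hypothetical model format from the earlier sketches; a query with no matching request, such as an injected privileged query, would be flagged.

```python
# Hedged sketch of the detection step: a session's paired traffic is checked
# against the learned mapping model, and anything the model cannot explain is
# flagged. Model format follows the earlier hypothetical sketches.
def check_session(model, web_requests, db_queries):
    """Return the DB queries in this session that no observed web request
    is allowed to produce according to the trained mapping model."""
    allowed = set()
    for request in web_requests:
        allowed |= model.get(request, set())
    unmatched = [q for q in db_queries if q not in allowed]
    return unmatched   # a non-empty list -> raise an alert for this session

if __name__ == "__main__":
    model = {"GET /index.php": {"SELECT * FROM posts"}}
    # A privileged query with no matching front-end request is flagged.
    alerts = check_session(model, ["GET /index.php"],
                           ["SELECT * FROM posts", "DROP TABLE users"])
    print("suspicious queries:", alerts)
```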
RELATED WORK
A network Intrusion Detection System can be classified into
two types: anomaly detection and misuse detection. Anomaly
detection first requires the IDS to define and characterize
the correct and acceptable static form and dynamic behavior
of the system, which can then be used to detect abnormal
changes or anomalous behaviors [26], [48]. The boundary
between acceptable and anomalous forms of stored code and
data is precisely definable. Behavior models are built by
performing a statistical analysis on historical data [31], [49],
[25] or by using rule-based approaches to specify behavior
patterns [39]. An anomaly detector then compares actual
usage patterns against established models to identify
abnormal events. Our detection approach belongs to anomaly
detection, and we depend on a training phase to build the
correct model. As some legitimate updates may cause model
drift, a number of approaches [45] attempt to address this
problem. Our detection may run into the same issue; in such
a case, our model should be retrained after each such shift.
Intrusion alerts correlation [47] provides a collection of
components that transform intrusion detection sensor alerts
into succinct intrusion reports in order to reduce the number
of replicated alerts, false positives, and nonrelevant positives.
It also fuses the alerts from different levels describing a single
attack, with the goal of producing a succinct overview of
security-related activity on the network. It focuses primarily
on abstracting the low-level sensor alerts and providing
compound, logical, high-level alert events to the users.
DoubleGuard differs from this type of approach, which
correlates alerts from independent IDSs. Rather, DoubleGuard
operates on multiple feeds of network traffic using a
single IDS that looks across sessions to produce an alert
without correlating or summarizing the alerts produced by
other independent IDSs.