DoubleGuard: Detecting Intrusions in Multi-tier Web Applications
Abstract—
Internet services and applications have become an
inextricable part of daily life, enabling communication and
the management of personal information from anywhere. To
accommodate this increase in application and data complexity,
web services have moved to a multi-tiered design wherein the web
server runs the application front-end logic and data is outsourced
to a database or file server.
In this paper, we present DoubleGuard, an IDS that
models the network behavior of user sessions across both the
front-end web server and the back-end database. By monitoring
both web and subsequent database requests, we are able to
ferret out attacks that an independent IDS would not be able to
identify. Furthermore, we quantify the limitations of any multi-tier
IDS in terms of training sessions and functionality coverage.
We implemented DoubleGuard using an Apache web server
with MySQL and lightweight virtualization. We then collected
and processed real-world traffic over a 15-day period of system
deployment in both dynamic and static web applications. Finally,
using DoubleGuard, we were able to expose a wide range of
attacks with 100% accuracy while maintaining 0% false positives
for static web services and 0.6% false positives for dynamic web
services.
INTRODUCTION
Web-delivered services and applications have increased in
both popularity and complexity over the past few years. Daily
tasks, such as banking, travel, and social networking, are
all done via the web. Such services typically employ a web
server front-end that runs the application user interface logic,
as well as a back-end server that consists of a database or
file server. Due to their ubiquitous use for personal and/or
corporate data, web services have always been the target of
attacks. These attacks have recently become more diverse, as
attention has shifted from attacking the front-end to exploiting
vulnerabilities of the web applications [6], [5], [1] in order to
corrupt the back-end database system [40] (e.g., SQL injection
attacks [20], [43]). A plethora of Intrusion Detection Systems
(IDS) currently examine network packets individually within
both the web server and the database system. However, there
is very little work being performed on multi-tiered Anomaly
Detection (AD) systems that generate models of network
behavior for both web and database network interactions. In
such multi-tiered architectures, the back-end database server
is often protected behind a firewall while the web servers are
remotely accessible over the Internet. Unfortunately, though
they are protected from direct remote attacks, the back-end
systems are susceptible to attacks that use web requests as a
means to exploit the back-end.
To protect multi-tiered web services, intrusion detection systems (IDSs) have been widely used to detect known attacks by matching misuse patterns or signatures [34], [30],
[33], [22]. A class of IDS that leverages machine learning can
also detect unknown attacks by identifying abnormal network
traffic that deviates from the so-called “normal” behavior
previously profiled during the IDS training phase. Individually,
the web IDS and the database IDS can detect abnormal
network traffic sent to either of them. However, we found
that these IDSs cannot detect cases wherein normal traffic is
used to attack the web server and the database server. For
example, if an attacker with non-admin privileges can log in to
a web server using normal-user access credentials, he/she can
find a way to issue a privileged database query by exploiting
vulnerabilities in the web server. Neither the web IDS nor
the database IDS would detect this type of attack since the
web IDS would merely see typical user login traffic and the
database IDS would see only the normal traffic of a privileged
user. This type of attack can be readily detected if the database
IDS can identify that a privileged request from the web server
is not associated with user-privileged access. Unfortunately,
within the current multi-threaded web server architecture, it is
not feasible to detect or profile such causal mapping between
web server traffic and DB server traffic since traffic cannot be
clearly attributed to user sessions.
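To make this concrete, consider the minimal sketch below (the rule sets, request strings, and query strings are hypothetical illustrations, not part of any real deployment): two per-tier checks can both pass while a per-session check of the request-to-query pairing fails.

    # Minimal illustration: each tier's IDS accepts its own traffic in isolation,
    # yet the per-session pairing of web request and database query is abnormal.
    WEB_NORMAL = {"POST /login", "GET /profile"}
    DB_NORMAL = {
        "SELECT * FROM users WHERE id = ?",
        "UPDATE accounts SET role = 'admin' WHERE id = ?",  # legal for admin sessions
    }

    # Which queries each web request is allowed to cause within one session.
    ALLOWED_MAPPING = {
        "POST /login":  {"SELECT * FROM users WHERE id = ?"},
        "GET /profile": {"SELECT * FROM users WHERE id = ?"},
    }

    def independent_checks(web_req, db_query):
        # What two separate IDSes see: both events look normal on their own.
        return web_req in WEB_NORMAL and db_query in DB_NORMAL

    def joint_check(web_req, db_query):
        # A session-aware check asks whether this request ever causes this query.
        return db_query in ALLOWED_MAPPING.get(web_req, set())

    attack = ("GET /profile", "UPDATE accounts SET role = 'admin' WHERE id = ?")
    print(independent_checks(*attack))  # True  -> missed by the separate IDSes
    print(joint_check(*attack))         # False -> caught by the per-session mapping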
In this paper, we present DoubleGuard, a system used to
detect attacks in multi-tiered web services. Our approach can
create normality models of isolated user sessions that include
both the web front-end (HTTP) and back-end (File or SQL)
network transactions. To achieve this, we employ a lightweight
virtualization technique to assign each user’s web session to
a dedicated container, an isolated virtual computing environment.
We use the container ID to accurately associate the web
request with the subsequent DB queries. Thus, DoubleGuard
can build a causal mapping profile by taking both the web
server and DB traffic into account.
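As a simplified sketch of this association step (the event format and field names below are our own assumptions for illustration), traffic from both tiers can be grouped by container ID so that each session's web requests and subsequent database queries are paired unambiguously:

    from collections import defaultdict

    # Hypothetical merged event feed from the web tier and the database tier;
    # each event carries the ID of the container that served the session.
    events = [
        {"container": "ct101", "tier": "web", "msg": "GET /index.php"},
        {"container": "ct101", "tier": "db",  "msg": "SELECT * FROM posts"},
        {"container": "ct102", "tier": "web", "msg": "GET /index.php"},
        {"container": "ct102", "tier": "db",  "msg": "SELECT * FROM posts"},
    ]

    # Group by container ID: no guessing across interleaved users is needed.
    sessions = defaultdict(lambda: {"web": [], "db": []})
    for ev in events:
        sessions[ev["container"]][ev["tier"]].append(ev["msg"])

    for cid, s in sessions.items():
        print(cid, s["web"], "->", s["db"])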
We have implemented our DoubleGuard container architecture
using OpenVZ [14], and performance testing shows that
it has reasonable performance overhead and is practical for
most web applications. When the request rate is moderate (e.g.,
under 110 requests per second), there is almost no overhead
in comparison to an unprotected vanilla system. Even in a
worst case scenario when the server was already overloaded,
we observed only 26% performance overhead. The container-based web architecture not only fosters the profiling of causal mapping, but it also provides isolation that prevents future
session-hijacking attacks. Within a lightweight virtualization
environment, we ran many copies of the web server instances
in different containers so that each one was isolated from
the rest. As ephemeral containers can be easily instantiated
and destroyed, we assigned each client session a dedicated
container so that, even if an attacker is able to compromise a single session, the damage is confined to that session; other user sessions remain unaffected.
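The per-session lifecycle can be pictured as follows (a sketch only: the dispatcher and the container start/stop stubs are hypothetical and stand in for the OpenVZ tooling our prototype relies on):

    import itertools

    _next_id = itertools.count(1000)
    active = {}  # session cookie -> container ID

    def create_container():
        # In a real deployment this would instantiate a web-server container
        # from a template (e.g., with OpenVZ tools); stubbed here for brevity.
        cid = "ct%d" % next(_next_id)
        print("[+] started container", cid)
        return cid

    def destroy_container(cid):
        # Destroying the container discards any state an attacker planted,
        # so damage stays confined to that one session.
        print("[-] destroyed container", cid)

    def on_new_session(cookie):
        active[cookie] = create_container()

    def on_session_end(cookie):
        destroy_container(active.pop(cookie))

    # Two concurrent users get two isolated web-server instances.
    on_new_session("alice-cookie")
    on_new_session("bob-cookie")
    on_session_end("alice-cookie")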
Using our prototype, we show that, for websites that do
not permit content modification from users, there is a direct
causal relationship between the requests received by the front-end web server and those generated for the database back-end.
In fact, we show that this causality-mapping model
can be generated accurately and without prior knowledge of
web application functionality. Our experimental evaluation,
using real-world network traffic obtained from the web and
database requests of a large center, showed that we were
able to extract 100% of functionality mapping by using as
few as 35 sessions in the training phase. Of course, we also
showed that this depends on the size and functionality of the
web service or application. However, it does not depend on
content changes if those changes can be performed through a
controlled environment and retrofitted into the training model.
We refer to such sites as “static” because, though they do
change over time, they do so in a controlled fashion that allows
the changes to propagate to the sites’ normality models.
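A minimal sketch of how such a deterministic model could be learned and then applied (the helper names and the training-session format are our own, not the prototype's internals):

    # Training: for a static site, each web request pattern maps to a fixed set
    # of database query patterns, so the union over enough sessions converges
    # to the full mapping.
    def train(sessions):
        model = {}
        for web_req, db_queries in sessions:
            model.setdefault(web_req, set()).update(db_queries)
        return model

    # Detection: a session is anomalous if any query falls outside the set
    # learned for the web request that caused it.
    def is_anomalous(model, web_req, db_queries):
        allowed = model.get(web_req)
        return allowed is None or not set(db_queries) <= allowed

    training = [
        ("GET /about.html", ["SELECT body FROM pages WHERE name = 'about'"]),
        ("GET /news.html",  ["SELECT * FROM news ORDER BY date"]),
    ]
    model = train(training)
    print(is_anomalous(model, "GET /about.html",
                       ["SELECT body FROM pages WHERE name = 'about'"]))  # False
    print(is_anomalous(model, "GET /about.html",
                       ["DROP TABLE pages"]))                             # True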
In addition to this static website case, there are web services
that permit persistent back-end data modifications. These
services, which we call dynamic, allow HTTP requests to
include parameters that are variable and depend on user input.
Therefore, our ability to model the causal relationship between
the front-end and back-end is not always deterministic and
depends primarily upon the application logic. For instance,
we observed that the back-end queries can vary based on the
value of the parameters passed in the HTTP requests and the
previous application state. Sometimes, the same application’s
primitive functionality (i.e., accessing a table) can be triggered
by many different web pages. Therefore, the resulting mapping
between web and database requests can range from one to
many, depending on the value of the parameters passed in the
web request.
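The consequence, sketched below under our own simplifying assumptions about how requests and queries are abstracted into patterns, is that one request pattern may map to several legitimate query sets, and a session conforms if it matches any of them:

    # A dynamic page: the same request pattern may legitimately trigger
    # different query sets depending on parameter values and prior state,
    # so the learned mapping is one-to-many rather than one-to-one.
    model = {
        "GET /post.php?id": {
            # a post with no comments
            ("SELECT * FROM posts WHERE id = ?",),
            # a post with comments: an extra query against the comments table
            ("SELECT * FROM posts WHERE id = ?",
             "SELECT * FROM comments WHERE post_id = ?"),
        }
    }

    def conforms(request_pattern, query_patterns):
        # Normal if the observed query set matches ANY learned alternative.
        return tuple(query_patterns) in model.get(request_pattern, set())

    print(conforms("GET /post.php?id", ["SELECT * FROM posts WHERE id = ?"]))  # True
    print(conforms("GET /post.php?id", ["DELETE FROM posts WHERE id = ?"]))    # False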
To address this challenge while building a mapping model
for dynamic web pages, we first generated an individual
training model for the basic operations provided by the web
services. We demonstrate that this approach works well in
practice by using traffic from a live blog where we progressively
modeled nine operations. Our results show that we were able to identify all attacks while covering more than 99% of the normal traffic as the training model was refined.
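One way to picture this per-operation training (the operation names and query shapes below are purely illustrative, not the nine operations actually modeled for the blog) is as a union of small sub-models whose coverage of normal traffic grows as more operations are added:

    # Hypothetical per-operation sub-models for a blog-like application.
    operation_models = {
        "read_post":    {("SELECT * FROM posts WHERE id = ?",)},
        "post_comment": {("INSERT INTO comments VALUES (?, ?, ?)",
                          "SELECT * FROM comments WHERE post_id = ?")},
    }

    # The combined model is the union of the operations' learned query sets;
    # traffic outside the union is flagged until the missing operation is modeled.
    combined = set().union(*operation_models.values())
    print(("SELECT * FROM posts WHERE id = ?",) in combined)   # covered
    print(("DELETE FROM users",) in combined)                  # not covered -> alert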
RELATED WORK
A network Intrusion Detection System (IDS) can be classified
into two types: anomaly detection and misuse detection.
Anomaly detection first requires the IDS to define and characterize
the correct and acceptable static form and dynamic
behavior of the system, which can then be used to detect
abnormal changes or anomalous behaviors [26], [48]. The
boundary between acceptable and anomalous forms of stored
code and data is precisely definable. Behavior models are built
by performing a statistical analysis on historical data [31],
[49], [25] or by using rule-based approaches to specify behavior
patterns [39]. An anomaly detector then compares actual
usage patterns against established models to identify abnormal
events. Our detection approach belongs to anomaly detection,
and we depend on a training phase to build the correct model.
As some legitimate updates may cause model drift, a number of approaches [45] attempt to address this problem. Our detection may run into the same issue; in such a case, our model should be retrained after each shift.
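Conceptually, the profile-then-compare loop and the retraining step look like the toy sketch below (the class, its check, and the retraining policy are illustrative assumptions, not the paper's algorithm):

    class AnomalyDetector:
        """Toy profile-then-compare detector for illustration only."""

        def __init__(self):
            self.profile = set()

        def train(self, normal_events):
            # Build the model of acceptable behavior from historical data.
            self.profile = set(normal_events)

        def check(self, event):
            # Compare actual usage against the established model.
            return "normal" if event in self.profile else "anomalous"

        def retrain(self, new_normal_events):
            # Legitimate updates can shift "normal" behavior (model drift);
            # the simplest remedy is to rebuild the profile after each shift.
            self.train(new_normal_events)

    det = AnomalyDetector()
    det.train(["GET /index.html", "GET /news.html"])
    print(det.check("GET /admin/drop_all"))                           # anomalous
    det.retrain(["GET /index.html", "GET /news.html", "GET /blog.html"])
    print(det.check("GET /blog.html"))                                # normal after retraining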
Intrusion alerts correlation [47] provides a collection of
components that transform intrusion detection sensor alerts
into succinct intrusion reports in order to reduce the number
of replicated alerts, false positives, and non-relevant positives.
It also fuses the alerts from different levels describing a single
attack, with the goal of producing a succinct overview of
security-related activity on the network. It focuses primarily
on abstracting the low-level sensor alerts and providing
compound, logical, high-level alert events to the users. DoubleGuard differs from this type of approach, which correlates alerts from independent IDSs. Rather, DoubleGuard operates
on multiple feeds of network traffic using a single IDS that
looks across sessions to produce an alert without correlating or
summarizing the alerts produced by other independent IDSs.
An IDS such as [42] also uses temporal information to
detect intrusions. DoubleGuard, however, does not correlate events on a time basis, an approach that runs the risk of mistakenly treating independent but concurrent events as correlated. DoubleGuard avoids this limitation because it uses the container ID of each session to causally map the related events, whether or not they are concurrent.
Since databases typically hold the most valuable information in a multi-tiered service, they should receive the highest level of protection.
Therefore, significant research efforts have been made on
database IDS [32], [28], [44] and database firewalls [21].
Such tools, e.g., GreenSQL [7], work as a reverse proxy for database connections. Instead of connecting to the database server directly, web applications first connect to the database firewall, where SQL queries are analyzed and, if deemed safe, forwarded to the back-end database server.
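The check-then-forward behavior of such a proxy can be sketched generically as below (the patterns and function names are illustrative; this is not GreenSQL's actual policy or interface):

    import re

    # A deliberately tiny policy: block stacked DROPs, classic tautologies,
    # and UNION-based probes; forward everything else to the real database.
    SUSPICIOUS = [re.compile(p, re.IGNORECASE) for p in (
        r";\s*drop\s+table", r"\bor\s+1\s*=\s*1\b", r"union\s+select",
    )]

    def firewall(query, forward):
        if any(p.search(query) for p in SUSPICIOUS):
            return "blocked"
        return forward(query)  # hand the vetted query to the back-end server

    def fake_backend(query):
        return "executed: " + query

    print(firewall("SELECT name FROM users WHERE id = 7", fake_backend))
    print(firewall("SELECT name FROM users WHERE id = 7 OR 1=1", fake_backend))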
The system proposed in [50] composes both web IDS and
database IDS to achieve more accurate detection, and it also
uses a reverse HTTP proxy to maintain a reduced level of
service in the presence of false positives. However, we found
that certain types of attack use normal traffic and cannot
be detected by either the web IDS or the database IDS. In
such cases, there would be no alerts to correlate.
Some previous approaches have detected intrusions or vulnerabilities
by statically analyzing the source code or executables
[52], [24], [27]. Others [41], [46], [51] dynamically
track the information flow to understand taint propagation and
detect intrusions. In DoubleGuard, the new container-based
web server architecture enables us to separate the information flows of different sessions. This provides a means
of tracking the information flow from the web server to the
database server for each session. Our approach also does not require us to analyze the source code or know the application logic. For static web pages, DoubleGuard builds its model without any knowledge of the application logic. For dynamic web services, as we will discuss, we do not need the full application logic, but we do need to know the basic user operations in order to model normal behavior.
In addition, validating input is useful to detect or prevent
SQL or XSS injection attacks [23], [36]. This is orthogonal to
the DoubleGuard approach, which can utilize input validation
as an additional defense. However, we have found that DoubleGuard can detect SQL injection attacks by examining the structures of web requests and database queries without looking into the values of input parameters (i.e., it performs no input validation at the web server).
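As a rough illustration of structure-only checking (the normalization rule and the login query are our own example, not the exact model the prototype builds), stripping literal values still exposes the injected clause because it changes the query's shape:

    import re

    def structure(query):
        # Strip string and numeric literals so only the query's shape remains.
        return re.sub(r"\b\d+\b", "?", re.sub(r"'[^']*'", "?", query))

    # Structure learned during training for a login page.
    learned = {"SELECT * FROM users WHERE name = ? AND pass = ?"}

    benign   = "SELECT * FROM users WHERE name = 'bob' AND pass = 'pw'"
    injected = "SELECT * FROM users WHERE name = '' OR '1'='1' --' AND pass = ''"

    print(structure(benign) in learned)    # True: same shape, values ignored
    print(structure(injected) in learned)  # False: the injected OR clause alters the shape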
Virtualization is used to isolate objects and enhance security. Full virtualization and para-virtualization are not the only approaches being taken; an alternative is lightweight virtualization, such as OpenVZ [14], Parallels Virtuozzo
[17], or Linux-VServer [11]. In general, these are based
on some sort of container concept. With containers, a group of
processes still appears to have its own dedicated system, yet
it is running in an isolated environment. At the same time, lightweight containers can have considerable performance
advantages over full virtualization or para-virtualization. Thousands
of containers can run on a single physical host. There
are also some desktop systems [37], [29] that use lightweight
virtualization to isolate different application instances. Such
virtualization techniques are commonly used for isolation and
containment of attacks. In DoubleGuard, however, we utilize the container ID to separate session traffic as a way
of extracting and identifying causal relationships between web
server requests and database query events.