07-01-2014, 04:58 PM
A Framework for Constructing Features and Models for Intrusion Detection Systems
ABSTRACT
Intrusion detection (ID) is an important component of infrastructure protection mechanisms.
Intrusion detection systems (IDSs) need to be accurate, adaptive, and extensible. Given these
requirements and the complexities of today’s network environments, we need a more
systematic and automated IDS development process rather than the pure knowledge encoding and
engineering approaches. This article describes a novel framework, MADAM ID, for Mining
Audit Data for Automated Models for Intrusion Detection. This framework uses data mining
algorithms to compute activity patterns from system audit data and extracts predictive
features from the patterns. It then applies machine learning algorithms to the audit records
that are processed according to the feature definitions to generate intrusion detection rules.
Results from the 1998 DARPA Intrusion Detection Evaluation showed that our ID model was
one of the best performing of all the participating systems. We also briefly discuss our
experience in converting the detection models produced by off-line data mining programs to
real-time modules of existing IDSs.
INTRODUCTION
As network-based computer systems play increasingly vital roles in modern
society, they have become the target of intrusions by our enemies and
criminals. In addition to intrusion prevention techniques, such as user
authentication and authorization, encryption, and defensive programming,
intrusion detection is often used as another wall to protect computer
systems.
The two main intrusion detection techniques are misuse detection and
anomaly detection. Misuse detection systems, for example, IDIOT [Kumar
and Spafford 1995] and STAT [Ilgun et al. 1995], use patterns of
well-known attacks or weak spots of the system to match and identify known
intrusions. For example, a signature rule for the “guessing password
attack” can be “there are more than four failed login attempts within two
minutes.” Misuse detection techniques in general are not effective against
novel attacks that have no matching rules or patterns yet. Anomaly
detection (sub)systems, for example, the anomaly detector of IDES [Lunt et al.
1992], flag observed activities that deviate significantly from the
established normal usage profiles as anomalies, that is, possible intrusions. For
example, the normal profile of a user may contain the averaged frequencies
of some system commands used in his or her login sessions.
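As a minimal illustration of the two techniques (not code from MADAM ID), the Python sketch below encodes the “guessing password attack” signature as a two-minute sliding window over failed-login timestamps, and builds an anomaly profile from the averaged relative frequencies of commands in normal sessions. The function names, the input formats, and the use of L1 distance as the deviation measure are assumptions made for illustration; the statistical measure actually used by IDES is not reproduced here.

```python
from collections import Counter, deque

# Misuse detection: the "guessing password attack" signature,
# "more than four failed login attempts within two minutes".
WINDOW_SECONDS = 120
MAX_FAILURES = 4


def password_guessing_alerts(failed_login_times):
    """Return the timestamps at which the signature fires.

    failed_login_times: sorted timestamps (in seconds) of failed logins
    for one account (an assumed input format).
    """
    window = deque()
    alerts = []
    for t in failed_login_times:
        window.append(t)
        # Keep only failures from the last two minutes, i.e. in (t - 120, t].
        while window[0] <= t - WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_FAILURES:
            alerts.append(t)
    return alerts


# Anomaly detection: a profile of averaged command frequencies.
def command_profile(sessions):
    """Average, over normal sessions, of each command's relative frequency."""
    totals = Counter()
    for session in sessions:
        n = len(session)
        for cmd, c in Counter(session).items():
            totals[cmd] += c / n
    return {cmd: v / len(sessions) for cmd, v in totals.items()}


def deviation(profile, session):
    """L1 distance between a session's command frequencies and the profile
    (an assumed measure; larger values mean a more unusual session)."""
    counts = Counter(session)
    n = len(session)
    cmds = set(profile) | set(counts)
    return sum(abs(counts[c] / n - profile.get(c, 0.0)) for c in cmds)


# Hypothetical example data.
alerts = password_guessing_alerts([0, 10, 20, 30, 40, 300])
normal_sessions = [["ls", "cd", "ls", "vi"], ["ls", "cd", "make", "vi"]]
profile = command_profile(normal_sessions)
typical = deviation(profile, ["ls", "cd", "ls", "vi"])
unusual = deviation(profile, ["rm", "rm", "chmod", "nc"])
```

The signature fires at the fifth failure inside two minutes and not for the isolated failure at t = 300. For the anomaly side, a session would be flagged when its deviation exceeds a threshold chosen from normal data; that threshold is left to the caller in this sketch.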
A SYSTEMATIC FRAMEWORK
A basic premise for intrusion detection is that when audit mechanisms are
enabled to record system events, distinct evidence of legitimate activities
and intrusions will be manifested in the audit data. Because of the sheer
volume of audit data, both in the amount of audit records and in the
number of system features (i.e., the fields describing the audit records),
efficient and intelligent data analysis tools are required to discover the
behavior of system activities.
Data mining generally refers to the process of extracting useful models
from large stores of data [Fayyad et al. 1996]. The recent rapid
development in data mining has made available a wide variety of algorithms,
drawn from the fields of statistics, pattern recognition, machine learning,
and databases.
Association Rules
There is empirical evidence that program executions and user activities
exhibit frequent correlations among system features. For example, certain
privileged programs only access certain system files in specific directories
[Ko et al. 1994], programmers edit and compile C files frequently, and so
on. These consistent behavior patterns should be included in normal usage
profiles.
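One assumed way to measure such correlations is sketched below: each audit record is treated as a set of (feature, value) items, and a rule X → Y is kept when its support (the fraction of records containing both X and Y) and its confidence (support of X and Y divided by support of X) meet user-given minimums. The sketch only enumerates single-item rules by brute force, a simplification of standard association rule mining such as Apriori; the record fields, values, and thresholds are hypothetical.

```python
from itertools import combinations


def items(record):
    """View an audit record (a dict of feature -> value) as a set of items."""
    return frozenset(record.items())


def support(records, itemset):
    """Fraction of records that contain every item in itemset."""
    return sum(1 for r in records if itemset <= items(r)) / len(records)


def single_item_rules(records, min_support, min_confidence):
    """Brute-force search for rules {a} -> {b} over pairs of items.

    Returns (lhs, rhs, support, confidence) tuples.
    """
    all_items = set().union(*(items(r) for r in records))
    rules = []
    for a, b in combinations(sorted(all_items), 2):
        if a[0] == b[0]:
            continue  # two values of the same feature never co-occur in one record
        both = support(records, frozenset({a, b}))
        if both < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / support(records, frozenset({lhs}))
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, confidence))
    return rules


# Hypothetical shell-command records: programmers edit and compile C files.
records = [
    {"command": "vi", "arg": "C file"},
    {"command": "gcc", "arg": "C file"},
    {"command": "vi", "arg": "C file"},
    {"command": "ls", "arg": "directory"},
    {"command": "gcc", "arg": "C file"},
]
rules = single_item_rules(records, min_support=0.3, min_confidence=0.8)
```

With these five records, both command=gcc → arg=C file and command=vi → arg=C file hold with support 0.4 and confidence 1.0, while the reverse rules reach only confidence 0.5 and are discarded.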
FEATURE CONSTRUCTION
We use the frequent episodes mined from audit records, which also contain
associations among the features, as guidelines to construct temporal
statistical features for building classification models. This process involves first
identifying the intrusion-only patterns, then parsing these patterns to
define features accordingly. In this section, we use network connection data
as an example to illustrate the feature construction process.
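The step of identifying intrusion-only patterns can be sketched under a simplified, assumed notion of a frequent episode: a pair of (feature, value) items, one from a record and one from a later record no more than `window` seconds after it, is counted at most once per starting record, and the pair is frequent when its count divided by the number of records reaches `min_freq`. Patterns that are frequent in intrusion data but not in normal data are then kept as intrusion-only. This is not the episode-mining algorithm of MADAM ID; the record layout, window, thresholds, and the toy SYN-flood-like data are hypothetical.

```python
from collections import Counter


def frequent_episodes(records, window, min_freq):
    """Frequent length-2 episodes under a simplified, assumed definition.

    records: list of (timestamp, {feature: value}) sorted by timestamp.
    An episode (a, b) is an item a of one record followed by an item b of a
    later record at most `window` seconds after it; each episode is counted
    at most once per starting record.
    """
    counts = Counter()
    for i, (t_i, r_i) in enumerate(records):
        seen = set()
        for t_j, r_j in records[i + 1:]:
            if t_j - t_i > window:
                break  # records are sorted, so later ones are outside the window too
            for a in r_i.items():
                for b in r_j.items():
                    seen.add((a, b))
        counts.update(seen)
    n = len(records)
    return {ep for ep, c in counts.items() if c / n >= min_freq}


# Hypothetical connection data (only two features kept per record).
normal = [
    (0.0, {"flag": "SF", "dst_host": "A"}),
    (1.0, {"flag": "SF", "dst_host": "B"}),
    (5.0, {"flag": "SF", "dst_host": "A"}),
    (9.0, {"flag": "SF", "dst_host": "C"}),
]
syn_flood = [
    (0.0, {"flag": "S0", "dst_host": "victim"}),
    (0.5, {"flag": "S0", "dst_host": "victim"}),
    (1.0, {"flag": "S0", "dst_host": "victim"}),
    (1.5, {"flag": "S0", "dst_host": "victim"}),
]

# Intrusion-only patterns: frequent in the attack data, not in normal data.
intrusion_only = (frequent_episodes(syn_flood, window=2.0, min_freq=0.5)
                  - frequent_episodes(normal, window=2.0, min_freq=0.5))
```

The four resulting episodes all relate S0 connections to the host victim, which is the kind of pattern that would then be parsed into feature definitions.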
Raw tcpdump output is first summarized into network connection records
using preprocessing programs, where each record has a set of intrinsic
features. For example, the duration, service, src_host and dst_host (source
and destination hosts), src_port (source port), src_bytes and dst_bytes
(number of data bytes sent from source to destination and from destination
to source, respectively), a flag indicating normal or error status according to
the protocols, and so on, are intrinsic features of a single connection.
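A connection record holding the intrinsic features listed above can be kept in a small dataclass, and temporal statistical features can then be computed over a time window. The sketch below derives, for each connection, the number of earlier connections to the same dst_host within the past `window` seconds and the fraction of those whose flag is S0 (connection attempt seen, no reply). The `timestamp` field, the 2-second default window, the flag values, and the names `same_host_features`, `count`, and `s0_rate` are assumptions for illustration, not the feature definitions of the article.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    timestamp: float  # start time in seconds (assumed field)
    duration: float
    service: str
    src_host: str
    dst_host: str
    src_port: int
    src_bytes: int
    dst_bytes: int
    flag: str  # e.g. "SF" (normal) or "S0" (attempt seen, no reply)


def same_host_features(conns, window=2.0):
    """Temporal statistical features for each connection in conns.

    conns must be sorted by timestamp. For each connection, look at earlier
    connections to the same dst_host within the past `window` seconds.
    """
    features = []
    for i, c in enumerate(conns):
        recent = [p for p in conns[:i]
                  if p.dst_host == c.dst_host
                  and c.timestamp - p.timestamp <= window]
        count = len(recent)
        s0_rate = sum(p.flag == "S0" for p in recent) / count if count else 0.0
        features.append({"count": count, "s0_rate": s0_rate})
    return features


# Hypothetical records: three half-open connections, then a normal one.
conns = [
    Connection(0.0, 0.0, "http", "attacker", "victim", 1025, 0, 0, "S0"),
    Connection(0.5, 0.0, "http", "attacker", "victim", 1026, 0, 0, "S0"),
    Connection(1.0, 0.0, "http", "attacker", "victim", 1027, 0, 0, "S0"),
    Connection(5.0, 1.2, "http", "client", "victim", 2040, 300, 4500, "SF"),
]
feats = same_host_features(conns)
```

The linear scan over all earlier connections keeps the sketch short; a real preprocessor would keep a sliding window per host instead of rescanning the whole history.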
Experiments on BSM Data
The DARPA data also contains Solaris BSM (Basic Security Module)
[SunSoft 1995] audit data for a designated host, pascal. In this section, we
describe our experiments in building host-based intrusion detection models
using BSM data. The purpose of these experiments was to show that our
algorithms for pattern mining and feature construction are not specific to a
particular audit data source, for example, tcpdump. We also wanted to
investigate whether combining models from tcpdump and BSM can result
in better detection performance.