21-05-2012, 01:21 PM
Detecting Network Intrusions through the Data Mining of Network Packet Data Using the ACT Algorithm
Introduction
Data mining is the automatic extraction of patterns, associations, and anomalies from large data sets. An interesting application is the use of data mining to detect network intrusions through the analysis of network packet data or computer audit data. With the rise of the internet, the timely detection of intrusions and attacks has grown in importance.
A common technique in data mining is to use classification and regression trees (CART) [Breiman, 1984]. A naïve application of CART and related techniques did not prove successful at detecting network intrusions. This seems to be due to several factors, the most important of which are:
1. Network intrusions are very rare and detecting rare events is much harder than detecting more common events.
2. The size of the data analyzed was very large. Sampling the data reduced the performance of our system, while working with the entire data set requires some sort of data management facility. Growing a single tree on the entire data set was problematic.
Averaging Classification Trees (ACT)
In this section, we briefly review the ACT and ART algorithms, following [Grossman, 1997b]. Here is the basic setup:
1. We are given a collection of objects x1, …, xn with attributes xj[1], …, xj[m].
2. The algorithm builds a finite, rooted tree, each node of which, except for the leaves, is labeled with a decision involving one or more attributes. In the simplest case, a binary tree is built by labeling each node with a threshold comparison of the form: is attribute j < a threshold c? If so, the object is assigned to the left child; if not, it is assigned to the right child. Initially all the objects are assigned to the root of the tree.
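The threshold comparison described above can be illustrated with a minimal Python sketch. It shows only how an object is routed through an already-built binary tree; it does not show how ACT grows or averages trees. The names Leaf, Node, and classify and the example tree are hypothetical, and attributes are indexed from 0 here rather than from 1 as in the text.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: str              # class assigned to objects that reach this leaf


@dataclass
class Node:
    attribute: int          # index j of the attribute tested at this node
    threshold: float        # threshold c in the test "x[j] < c"
    left: "Tree"            # child taken when the test is true
    right: "Tree"           # child taken when the test is false


Tree = Union[Leaf, Node]


def classify(tree: Tree, x: Sequence[float]) -> str:
    """Route object x from the root down to a leaf and return the leaf's label."""
    node = tree
    while isinstance(node, Node):
        node = node.left if x[node.attribute] < node.threshold else node.right
    return node.label


# Hypothetical two-level tree over two attributes.
tree = Node(0, 5.0, Leaf("A"), Node(1, 2.0, Leaf("B"), Leaf("C")))

print(classify(tree, [3.0, 9.0]))  # A: attribute 0 is below 5.0
print(classify(tree, [7.0, 1.0]))  # B: attribute 0 is not below 5.0, attribute 1 is below 2.0
```

Keeping leaves and internal nodes as separate types makes the routing loop stop naturally at the first leaf it reaches.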
Detecting Intrusions Using ACT
In this section, we follow [Bodek, 1997] and outline the approach we used for detecting network intrusions using ACT:
1. Instead of using ACT to model individual packets directly, we used ACT to model windows containing 100 packets. We broke each packet into 17 attributes, so in essence we modeled the network using attribute vectors of length 1,700 (100 packets × 17 attributes).
2. We collected network packet data of the network of interest during a "normal" period, that is, a period without network intrusions. Intrusions are modeled simply as statistical deviations from the "normal" behavior of the network.
3. Next, we defined a cover for the data set and then used the ACT algorithm to average the collection of trees built from the cover.
4. As a final step, certain anomalous behavior was filtered out since it represented normal, but unusual, behavior of the system. An example is the network traffic created by the network monitor's use of the network file system.
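Steps 1 and 3 above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the ACT implementation: the source does not say whether windows overlap, how the cover is defined, or how the trees are averaged. Here windows do not overlap, the cover is a simple round-robin partition, and fit_stand_in is a hypothetical placeholder for growing one tree on one piece of the cover. All function names are invented for the example.

```python
from statistics import mean
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], float]

PACKETS_PER_WINDOW = 100  # window size stated in the text
ATTRS_PER_PACKET = 17     # attributes per packet stated in the text


def make_windows(packets: Sequence[Sequence[float]],
                 size: int = PACKETS_PER_WINDOW) -> List[Vector]:
    """Flatten each run of `size` consecutive packets into one vector.

    Assumption: windows do not overlap and a trailing partial window is dropped.
    """
    windows = []
    for start in range(0, len(packets) - size + 1, size):
        windows.append([v for packet in packets[start:start + size] for v in packet])
    return windows


def make_cover(items: Sequence[Vector], n_parts: int) -> List[List[Vector]]:
    """Split items into n_parts subsets whose union is the whole data set.

    Assumption: a round-robin partition; a cover could also use overlapping subsets.
    """
    return [list(items[i::n_parts]) for i in range(n_parts)]


def fit_stand_in(subset: Sequence[Vector]) -> Model:
    """Hypothetical stand-in for growing one tree on one subset of the cover.

    It scores a window by how far its mean value lies from the subset's mean.
    """
    centre = mean(mean(w) for w in subset)
    return lambda window: abs(mean(window) - centre)


def averaged_score(models: Sequence[Model], window: Vector) -> float:
    """Average the outputs of the per-subset models for one window."""
    return mean(model(window) for model in models)


# Synthetic "normal" traffic: 400 packets whose attributes are all 1.0.
normal_packets = [[1.0] * ATTRS_PER_PACKET for _ in range(400)]
windows = make_windows(normal_packets)
models = [fit_stand_in(part) for part in make_cover(windows, n_parts=2)]

print(len(windows), len(windows[0]))                     # 4 1700
print(averaged_score(models, windows[0]))                # 0.0 for a normal window
print(averaged_score(models, [5.0] * len(windows[0])))   # 4.0 for a deviating window
```

Building one model per piece of the cover avoids growing a single model on the entire data set, and averaging the per-piece outputs gives one deviation score per window.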
Experimental Study
In this section, we follow [Bodek, 1997] and describe an experimental study of our algorithm for detecting network intrusions.
For this study, we used a parallel implementation of the ACT algorithm available in Version 1.1 of the PATTERN™ Data Mining System developed by Magnify, Inc. We performed the study during the Supercomputing 96 conference, which took place in Pittsburgh from November 16 to 23. During the conference, the Magnify network was monitored continuously for network intrusions and the results were displayed in real time on the floor of the conference.