08-05-2012, 03:06 PM
Data Mining for Network Intrusion Detection: How to Get Started
Network Intrusion Detection: What is it?
Intrusion detection starts with instrumentation of a computer network for data collection.
Pattern-based software ‘sensors’ monitor the network traffic and raise ‘alarms’ when the traffic
matches a saved pattern. Security analysts decide whether these alarms indicate an event serious
enough to warrant a response. A response might be to shut down a part of the network, to phone
the internet service provider associated with suspicious traffic, or to simply make note of unusual
traffic for future reference.
If the network is small and signatures are kept up to date, relying on human analysts for
intrusion detection works well. But when organizations have a large, complex network, the
human analysts quickly become overwhelmed by the number of alarms they need to review. The
sensors on the MITRE network, for example, currently generate over one million alarms per day.
And that number is increasing. This situation arises from ever-increasing attacks on the network,
as well as a tendency for sensor patterns to be insufficiently selective (i.e., raise too many false
alarms). Commercial tools typically do not provide an enterprise level view of alarms generated
by multiple sensor vendors. Commercial intrusion detection software packages tend to be
signature-oriented with little or no state information maintained. These limitations led us to
investigate the application of data mining to this problem.
Intrusion Detection before Data Mining
When we first began to do intrusion detection on our network, we didn’t focus on data
mining, but rather on more fundamental issues: How would the sensors perform? How much data
would we get? How would we display the data? What kind of data did we want to see, and what
queries would be best to highlight that data? Next, as the data came in, sensor tuning, incident
investigation, and system performance commanded our attention. The analyst team grew to
handle the load, and training and team coordination were the issues of the day. But the level of
reconnaissance and attack on the internet was constantly increasing, along with the amount of
data we were collecting and putting in front of our analysts. We began to suspect that our system
was inadequate for detecting the most dangerous attacks—those performed by adversaries using
attacks that are new, stealthy, or both. So we considered data mining with two questions in mind:
• Can we develop a way to minimize what the analysts need to look at daily?
• Can data mining help us find attacks that the sensors and analysts did not find?
Data Mining: What is it?
Data mining is, at its core, pattern finding. Data miners are experts at using specialized
software to find regularities (and irregularities) in large data sets. Here are a few specific things
that data mining might contribute to an intrusion detection project:
• Remove normal activity from alarm data to allow analysts to focus on real attacks
• Identify false alarm generators and “bad” sensor signatures
• Find anomalous activity that uncovers a real attack
• Identify long, ongoing patterns (different IP address, same activity)
To accomplish these tasks, data miners use one or more of the following techniques:
• Data summarization with statistics, including finding outliers
• Visualization: presenting a graphical summary of the data
• Clustering of the data into natural categories [Manganaris et al., 2000]
• Association rule discovery: defining normal activity and enabling the discovery of
anomalies [Clifton and Gengo, 2000; Barbara et al., 2001]
• Classification: predicting the category to which a particular record belongs [Lee and
Stolfo, 1998]
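
As an illustration of the first technique in the list (summarization with outlier finding), the sketch below counts alarms per signature and flags signatures whose counts are far above the median. It is only a minimal example, not the method used in this project: the alarm record layout (a dict with a "signature" key), the function name noisy_signatures, and the 3.5 cutoff on the modified z-score are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def noisy_signatures(alarms, threshold=3.5):
    """Return signatures whose alarm count is an unusually high outlier.

    `alarms` is a hypothetical list of dicts, each with a "signature" key.
    Outliers are found with the modified z-score, which is based on the
    median absolute deviation (MAD) rather than the mean.
    """
    counts = Counter(a["signature"] for a in alarms)
    values = list(counts.values())
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All counts are (nearly) identical, so nothing stands out.
        return []
    return sorted(
        sig for sig, n in counts.items()
        if 0.6745 * (n - med) / mad > threshold
    )
```

A signature flagged this way is only a candidate false alarm generator; an analyst would still need to confirm that the traffic behind it is benign before tuning or filtering it.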
Start by Making Your Requirements Realistic
The seductive vision of automation is that it can and will solve all your problems, making
human involvement unnecessary. This is a mirage in intrusion detection. Human analysts will
always be needed to verify that the automated system is performing as desired, to identify new
categories of attacks, and to analyze the more sophisticated attacks. In our case, our primary
concern was relieving the analysts’ day-to-day burden.
Real-time automated response is very desirable in some intrusion detection contexts. But
this puts a large demand on database performance. The database must be fast enough to record
alarms and produce query results simultaneously. Real-time scoring of anomaly or classification
models is possible, but this should not be confused with real-time model building. There is
research in this area [Domingos and Hulten, 2000], but data mining is not currently capable of
learning from large amounts of real-time, dynamically changing data. It is better suited to batch
processing of a number of collected records. Therefore, we adopted a daily processing regime,
rather than an hourly or minute-by-minute scheme.
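
The distinction between batch model building and real-time scoring can be sketched as follows. This is a hypothetical illustration only, not the system described here: the alarm fields ("signature", "dst_port"), the function names build_model and score, and the frequency-based rarity score are all assumptions. The model is built once from a collected batch (for example, one day of alarms), and each new alarm is then scored with a cheap dictionary lookup.

```python
from collections import Counter


def build_model(batch):
    """Batch step: relative frequency of each (signature, dst_port) pair.

    `batch` is a hypothetical list of alarm dicts collected over a period
    such as one day.
    """
    counts = Counter((a["signature"], a["dst_port"]) for a in batch)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def score(model, alarm):
    """Real-time step: rarity score in [0, 1].

    A score of 1.0 means this (signature, dst_port) pair never appeared
    in the batch the model was built from.
    """
    key = (alarm["signature"], alarm["dst_port"])
    return 1.0 - model.get(key, 0.0)
```

Under this split, only build_model has to read the full data set, so it can run on the daily schedule, while score does a constant amount of work per alarm.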
Choose a Broad and Capable Project Staff
Your staff will need skills in three areas: network security, data mining, and database
application development.
• Of course the security staff need a solid grounding in networking and intrusion detection,
but they also need to be able to tackle big, abstract problems.