20-09-2012, 04:57 PM
Data Mining & Intrusion Detection
What is an intrusion?
An intrusion can be defined as “any set of actions that attempt to compromise the:
integrity,
confidentiality, or
availability
of a resource”.
Intrusion Detection System
A combination of software and hardware that attempts to perform intrusion detection and raises an alarm when a possible intrusion happens.
IDS Categories
Intrusion detection systems are split into two groups:
Anomaly detection systems
Identify malicious traffic based on deviations from established normal network behavior.
Misuse detection systems
Identify intrusions based on known patterns (signatures) of malicious activity.
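The two categories can be contrasted with a minimal Python sketch. The signature strings, the baseline request rates, and the 3-standard-deviation threshold are illustrative assumptions, not taken from any real IDS.

```python
from statistics import mean, stdev

# Hypothetical signatures of known attacks (misuse detection).
SIGNATURES = ["/etc/passwd", "' OR 1=1", "cmd.exe"]

def misuse_detect(payload):
    """Flag a payload that contains any known attack signature."""
    return any(sig in payload for sig in SIGNATURES)

def anomaly_detect(value, baseline, k=3.0):
    """Flag a value more than k standard deviations away from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) > k * sigma

# Assumed "normal" requests per minute, observed during attack-free operation.
normal_rates = [48, 52, 50, 47, 53, 51, 49, 50]

print(misuse_detect("GET /index.php?id=1' OR 1=1"))  # matches a signature
print(misuse_detect("GET /index.html"))              # no signature matches
print(anomaly_detect(51, normal_rates))              # close to the baseline
print(anomaly_detect(400, normal_rates))             # far from the baseline
```

The misuse detector can only flag what is already in its signature list, while the anomaly detector flags anything unusual, including benign but rare events.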
Misuse Detection
Goal of Intrusion Detection Systems (IDS):
To detect an intrusion as it happens and be able to respond to it.
False positives:
A false positive is a situation where something abnormal (as defined by the IDS) happens, but it is not an intrusion.
Too many false positives:
Users will stop monitoring the IDS because of the noise.
False negatives:
A false negative is a situation where an intrusion is really happening, but the IDS doesn't catch it.
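As a rough illustration, both error rates can be computed from a labeled set of events. The function name alarm_rates and the sample data are made up for this sketch.

```python
def alarm_rates(alerts, truths):
    """Return (false_positive_rate, false_negative_rate) for paired booleans.

    alerts[i] is True if the IDS raised an alarm for event i;
    truths[i] is True if event i really was an intrusion.
    """
    fp = sum(a and not t for a, t in zip(alerts, truths))
    fn = sum(t and not a for a, t in zip(alerts, truths))
    negatives = sum(not t for t in truths)
    positives = sum(truths)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# Illustrative data: 8 events, 2 of them real intrusions.
alerts = [True, True, False, False, True, False, False, False]
truths = [True, False, False, False, False, True, False, False]

fpr, fnr = alarm_rates(alerts, truths)
print(f"false positive rate: {fpr:.2f}")
print(f"false negative rate: {fnr:.2f}")
```

Here 2 of the 6 harmless events raised an alarm and 1 of the 2 intrusions was missed.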
Data Mining vs. KDD
Knowledge Discovery in Databases (KDD): The whole process of finding useful information and patterns in data
Data Mining: Use of algorithms to extract the information and patterns derived by the KDD process
Data mining is the core of the knowledge discovery process
Why Can Data Mining Help?
Learn from traffic data
Supervised learning: learn precise models from past intrusions
Unsupervised learning: identify suspicious activities
Maintain models on dynamic data
Correlation of suspicious events across network sites
Helps detect sophisticated attacks that single-site analysis cannot identify
Analysis of long term data (months/years)
Uncover suspicious stealth activities (e.g. insiders leaking/modifying information)
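The cross-site correlation idea can be sketched as follows. The (site, source) event format, the site names, the addresses, and the min_sites parameter are all assumptions made for illustration.

```python
from collections import defaultdict

def correlate(events, min_sites=2):
    """Return sources flagged as suspicious at min_sites or more distinct sites."""
    sites_by_source = defaultdict(set)
    for site, source in events:
        sites_by_source[source].add(site)
    return {src: sorted(sites)
            for src, sites in sites_by_source.items()
            if len(sites) >= min_sites}

# Hypothetical suspicious events reported independently by three sites.
events = [
    ("site-a", "10.0.0.7"),
    ("site-b", "10.0.0.7"),
    ("site-a", "192.168.1.20"),
    ("site-c", "10.0.0.7"),
    ("site-a", "192.168.1.20"),
]

print(correlate(events))
```

A source that looks only mildly suspicious at each individual site stands out once the reports are combined.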
Intrusion Detection
Traditional intrusion detection system (IDS) tools (e.g. SNORT) are based on signatures of known attacks
Limitations
Signature database has to be manually revised for each new type of discovered intrusion
They cannot detect emerging cyber threats
Substantial latency in deployment of newly created signatures across the computer system
Frequent pattern mining
Patterns that occur frequently in a database
Mining Frequent patterns – finding regularities
Process of Mining Frequent patterns for intrusion detection
Phase I: mine a repository of normal frequent itemsets for attack-free data
Phase II: find frequent itemsets in the last n connections and compare the patterns to the normal profile
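The two phases can be sketched in Python as below. This is a brute-force enumeration of itemsets up to size 2 rather than an optimized miner such as Apriori, and the connection attributes, the 0.5 support threshold, and the helper name frequent_itemsets are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (frozensets) present in at least min_support of the transactions."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(transactions)
    return {s for s, c in counts.items() if c >= threshold}

# Phase I: build the normal profile from attack-free connections.
attack_free = [
    {"service=http", "flag=SF"},
    {"service=http", "flag=SF"},
    {"service=smtp", "flag=SF"},
    {"service=http", "flag=SF"},
]
normal_profile = frequent_itemsets(attack_free, min_support=0.5)

# Phase II: mine the last n connections and compare against the profile.
recent = [
    {"service=http", "flag=S0"},
    {"service=http", "flag=S0"},
    {"service=http", "flag=SF"},
    {"service=http", "flag=S0"},
]
unseen = frequent_itemsets(recent, min_support=0.5) - normal_profile

for itemset in sorted(unseen, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```

Itemsets that are frequent in the recent window but absent from the normal profile are the candidates for closer inspection.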
Sequential Pattern Analysis
Models sequence patterns
(Temporal) order is important in many situations
Time-series databases and sequence databases
Frequent patterns → (frequent) sequential patterns
Sequential patterns for intrusion detection
Capture the signatures for attacks in a series of packets
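A minimal sketch of matching an ordered signature against a series of packets is shown here. The packet labels and the signature itself are hypothetical, and gaps between matching packets are allowed.

```python
def contains_sequence(events, pattern):
    """True if pattern occurs in events in the given order (gaps allowed)."""
    it = iter(events)
    # Each `step in it` consumes the iterator up to the first match,
    # so later steps can only match later events.
    return all(step in it for step in pattern)

# Hypothetical attack signature expressed as an ordered list of packet labels.
SIGNATURE = ["USER", "PASS", "RETR_PASSWD"]

trace_in_order = ["SYN", "USER", "PASS", "LIST", "RETR_PASSWD"]
trace_out_of_order = ["PASS", "USER", "RETR_PASSWD"]

print(contains_sequence(trace_in_order, SIGNATURE))
print(contains_sequence(trace_out_of_order, SIGNATURE))
```

The second trace contains the same packet labels but in a different order, so it does not match; this is the sense in which temporal order matters.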
Neural classification: HIDE
“A hierarchical network intrusion detection system using statistical preprocessing and neural network classification” by Zhang et al.
Five major components
Probes collect traffic data
Event preprocessor preprocesses traffic data and feeds the statistical model
Statistical processor maintains a model for normal activities and generates vectors for new events
Neural network classifies the vectors of new events
Post processor generates reports
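The data flow between the five components can be sketched as below. This is not the HIDE implementation: every stage is a trivial stand-in, the traffic records are hard-coded, and in particular the neural network classifier is replaced by a fixed threshold so the example stays short.

```python
from statistics import mean

def probe():
    """Probe: collect raw traffic records (hard-coded (source, bytes) pairs here)."""
    return [("10.0.0.5", 500), ("10.0.0.5", 520), ("10.0.0.9", 9000)]

def preprocess(records):
    """Event preprocessor: reduce raw records to numeric events."""
    return [size for _, size in records]

class StatisticalProcessor:
    """Keeps a model of normal activity and turns new events into vectors."""
    def __init__(self, normal_sizes):
        self.normal_mean = mean(normal_sizes)

    def vectorize(self, event):
        # One-element vector: ratio of the event to the normal mean.
        return [event / self.normal_mean]

def classify(vector, threshold=3.0):
    """Stand-in for the neural network classifier: a fixed threshold."""
    return "intrusion" if vector[0] > threshold else "normal"

def report(labels):
    """Post processor: summarize the classification results."""
    return {label: labels.count(label) for label in sorted(set(labels))}

stats = StatisticalProcessor(normal_sizes=[480, 500, 520])
events = preprocess(probe())
labels = [classify(stats.vectorize(e)) for e in events]
summary = report(labels)
print(summary)
```

In a real system the classify stage would be a trained model and the statistical processor would produce richer vectors; only the order of the stages is taken from the component list.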
Clustering
Clustering Approaches
Partitioning algorithms
– Partition the objects into k clusters
– Iteratively reallocate objects to improve the clustering
Hierarchical algorithms
– Agglomerative: each object starts as its own cluster; clusters are merged to form larger ones
– Divisive: all objects start in one cluster, which is split into smaller clusters
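A partitioning algorithm can be sketched with a small one-dimensional k-means. The connection durations and the initial centers are made-up values, and the function name kmeans_1d is only for this example.

```python
def kmeans_1d(points, centers, iterations=10):
    """Partition points into len(centers) clusters by iterative reallocation."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # no center moved, so the partition is stable
        centers = new_centers
    return centers, clusters

# Hypothetical connection durations in seconds.
durations = [1.0, 1.2, 0.8, 30.0, 31.0, 29.0]
centers, clusters = kmeans_1d(durations, centers=[0.0, 10.0])

print(clusters)
```

With these values the short and the long connections end up in separate clusters after the first reallocation.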
Mining Data Streams for Intrusion Detection
Maintaining profiles of normal activities
The profiles of normal activities may drift
Identifying novel attacks
Identifying clusters and outliers in traffic data streams
Reduce the future alarm load by writing filtering rules that automatically discard well-understood false positives
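A drifting profile and alarm filtering might look like the sketch below. The exponential decay factor, the 3-standard-deviation outlier test, the starting mean and variance, and the backup-server filtering rule are all assumptions chosen for illustration.

```python
class DriftingProfile:
    """Running mean and variance of a traffic metric, updated with exponential decay."""
    def __init__(self, mean, var, alpha=0.1, k=3.0):
        self.mean, self.var, self.alpha, self.k = mean, var, alpha, k

    def observe(self, x):
        """Return True if x is an outlier; otherwise absorb it into the profile."""
        if abs(x - self.mean) > self.k * self.var ** 0.5:
            return True  # outliers do not update the profile
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        return False

# Hypothetical filtering rules for well-understood false positives.
FILTER_RULES = [lambda alarm: alarm["source"] == "backup-server"]

def keep_alarm(alarm):
    """Keep an alarm only if no filtering rule discards it."""
    return not any(rule(alarm) for rule in FILTER_RULES)

profile = DriftingProfile(mean=100.0, var=25.0)
stream = [102, 104, 106, 108, 110, 300]
flags = [profile.observe(x) for x in stream]
print(flags)

alarms = [{"source": "backup-server", "value": 300},
          {"source": "10.0.0.9", "value": 300}]
print([a for a in alarms if keep_alarm(a)])
```

The slowly rising values are absorbed and shift the profile, while the sudden jump is flagged; the filtering rule then drops the alarm from the known-benign source.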