Data Stream Intrusion Alert Aggregation for Distributed Heterogeneous Sources

**seminar flower** · 05-10-2012, 02:56 PM

Abstract

This work presents an efficient intrusion alert aggregation strategy for distributed
heterogeneous sources. The primary objective is to generate meta-alerts using a probabilistic
technique with offline and online alert aggregation. The proposed approach has two distinct
properties. First, it is a generative modeling approach using probabilistic methods: attack
instances are assumed to be random processes producing alerts, and these processes are
modeled using approximate maximum likelihood parameter estimation techniques. Thus, the
beginning as well as the completion of attack instances can be detected. Second, it is a data
stream approach, i.e., each observed alert is processed only a few times. Thus, it can be
applied on-line and under harsh timing constraints.
The main objective is to identify and cluster the different alerts produced by low-level
intrusion detection systems, firewalls, etc. that belong to a specific attack instance
initiated by an attacker at a certain point in time. Meta-alerts can then be generated for
the clusters. They contain all the relevant information, while the amount of data (i.e.,
alerts) is reduced substantially. Meta-alerts are the basis for reporting to security
experts or for communication within a distributed intrusion detection system. With
benchmark data sets, this work demonstrates that it is possible to achieve reduction rates
of up to 97% while the number of missing meta-alerts is extremely low. In addition,
meta-alerts are generated with a delay of typically only a few seconds after observing the
first alert belonging to a new attack instance.
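
The data stream property can be illustrated with a minimal single-pass sketch in Python. It is not the method of the paper: a simple time-gap rule per source stands in for the probabilistic model, and the function name `aggregate_stream`, the `(timestamp, source)` alert layout, and the `max_gap` parameter are assumptions made for illustration. Each alert is touched exactly once, and a meta-alert is opened as soon as the first alert of a new component is seen.

```python
def aggregate_stream(alerts, max_gap=30.0):
    """Single-pass aggregation of (timestamp, source) alerts sorted by time.

    An alert joins the open component of its source if that component's last
    alert is at most `max_gap` seconds old; otherwise a new component is
    opened. The gap rule is an assumed simplification, not the paper's model.
    """
    open_components = {}  # source -> most recent component for that source
    meta_alerts = []
    for ts, src in alerts:
        comp = open_components.get(src)
        if comp is None or ts - comp["end"] > max_gap:
            comp = {"source": src, "start": ts, "end": ts, "count": 0}
            open_components[src] = comp
            meta_alerts.append(comp)  # reported at the first alert of a component
        comp["end"] = ts
        comp["count"] += 1
    return meta_alerts


# Hypothetical stream: two sources, source "a" returns after a long pause.
stream = [(0.0, "a"), (4.0, "a"), (6.0, "b"), (100.0, "a")]
metas = aggregate_stream(stream)
```

In this toy stream, four alerts collapse into three components because the last alert from source "a" arrives after the assumed gap has expired.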

Introduction

Intrusion detection systems (IDS) usually focus on detecting attack types, but not on
distinguishing between different attack instances. In addition, even low rates of false alerts
can easily result in a high total number of false alerts if thousands of network packets or log
file entries are inspected. As a consequence, IDS create many alerts at a low level of
abstraction. It is extremely difficult for a human security expert to inspect this flood of
alerts, and decisions that follow from single alerts might be wrong with a relatively high
probability. Alerts may originate from low-level IDS, from firewalls, etc. Alerts that belong
to one attack instance must be clustered together, and meta-alerts must be generated for these
clusters. The proposed technique is a generative modeling approach using probabilistic methods:
it assumes that attack instances can be regarded as random processes “producing” alerts.
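
To make the generative view concrete, here is a minimal Python sketch. It assumes, purely for illustration, that an alert has one categorical attribute (a source address) and one continuous attribute (a timestamp), and that one attack instance is described by a categorical distribution and a Gaussian. The names `Alert` and `fit_instance` are hypothetical, and the closed-form estimates are plain maximum likelihood rather than the approximate scheme the post refers to.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Alert:
    source: str       # categorical attribute (assumed for illustration)
    timestamp: float  # continuous attribute in seconds (assumed)


def fit_instance(alerts):
    """Maximum likelihood estimates for one assumed attack instance.

    Categorical attribute: relative frequencies of the observed values.
    Continuous attribute: sample mean and (biased) ML variance.
    """
    n = len(alerts)
    counts = Counter(a.source for a in alerts)
    source_probs = {s: c / n for s, c in counts.items()}
    mean = fmean([a.timestamp for a in alerts])
    var = fmean([(a.timestamp - mean) ** 2 for a in alerts])
    return source_probs, mean, var


# Hypothetical alerts assumed to come from the same attack instance.
alerts = [Alert("10.0.0.5", 1.0), Alert("10.0.0.5", 2.0), Alert("10.0.0.7", 3.0)]
source_probs, mean, var = fit_instance(alerts)
```

Under these assumptions, a new alert that is very unlikely under every fitted instance would hint at the beginning of a new attack instance.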

Problem Definition for On-Line Intrusion Attacks

A “perfect” IDS should be situation aware in the sense that at any point in time it should
“know” what is going on in its environment regarding attack instances (of various types) and
attackers. In this article we make an important step towards this goal by introducing and
evaluating a new technique for alert aggregation. Alerts may originate from low-level IDS,
from firewalls, etc. Alerts that belong to one attack instance must be clustered together, and
meta-alerts must be generated for these clusters. The main goal is to reduce the number of
alerts substantially without losing any important information that is necessary to identify
ongoing attack instances. We want to have no missing meta-alerts, but in turn we accept false
or redundant meta-alerts to a certain degree. This problem is not new, but current solutions are
typically based on a quite simple sorting of alerts, e.g., according to their source,
destination, and attack type. Under real conditions, such an approach fails quite often, for
instance in the presence of classification errors of the low-level IDS (e.g., false alerts),
uncertainty with respect to the source of the attack due to spoofed IP addresses, or wrongly
adjusted time windows.
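
The simple sorting approach and its weakness can be sketched in a few lines of Python. The tuple layout, the function name `sort_alerts`, and the 60-second window are assumptions chosen for illustration, not taken from any of the cited systems.

```python
from collections import defaultdict


def sort_alerts(alerts, window=60.0):
    """Group alerts by (source, destination, attack type) and fixed time window.

    `alerts` is an iterable of (timestamp, source, destination, attack_type)
    tuples; the layout and the default window length are assumptions.
    """
    groups = defaultdict(list)
    for ts, src, dst, attack_type in alerts:
        bucket = int(ts // window)  # index of the fixed-length time window
        groups[(src, dst, attack_type, bucket)].append(ts)
    return dict(groups)


# Hypothetical alerts that all belong to one port scan.
alerts = [
    (5.0, "10.0.0.5", "192.168.1.2", "portscan"),
    (20.0, "10.0.0.5", "192.168.1.2", "portscan"),
    (25.0, "10.0.0.9", "192.168.1.2", "portscan"),  # spoofed source address
    (65.0, "10.0.0.5", "192.168.1.2", "portscan"),  # falls into the next window
]
groups = sort_alerts(alerts)
```

In this toy example, a single attack instance ends up in three groups: one spoofed address and one window boundary are enough to split it.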

Review of Literature

Most existing IDS are optimized to detect attacks with high accuracy. However, they still have
various disadvantages that have been outlined in a number of publications, and a lot of work has
been done to analyze IDS in order to direct future research [5]. One step in an existing
correlation approach is attack thread reconstruction, which can be seen as a kind of attack
instance recognition. No clustering algorithm is used. Instead, alerts are strictly sorted
within a temporal window of fixed length according to source, destination, and attack
classification (attack type). In [7], a similar approach is used to eliminate duplicates, i.e.,
alerts that share the same quadruple of source and destination address as well as source and
destination port. In addition, alerts are aggregated (on-line) into predefined clusters
(so-called situations) in order to provide a more condensed view of the current attack
situation. The definition of such situations is also used in [8] to cluster alerts. In [9],
alert clustering is used to group alerts that belong to the same attack occurrence.

Alert Generation

This section comments on the information contained in alerts, the objects that are aggregated,
and on their format. Sensors determine the values of attributes that are used as input for the
detectors and for alert clustering. Attributes in an event that are independent of a particular
attack instance are used for classification. Attributes that are (or might be) dependent on the
attack instance are used in the alert aggregation process to distinguish different attack
instances. Some attributes are clearly dependent, such as the source IP address, which can
identify the attacker. Some are clearly independent, such as the destination port, which is
usually 80 in the case of web-based attacks. Some can be both, such as the destination port,
which can hint at the attacker’s actual target service as well as at an attack tool
specifically designed to target a particular service only.
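
The two roles of attributes can be shown with a small Python sketch. Which attribute plays which role here is an assumption made for illustration; the attribute names, the constants, and the function `split_event` are hypothetical.

```python
# Hypothetical attribute roles; a real deployment would choose its own.
CLASSIFICATION_ATTRS = ("payload_length", "tcp_flags")  # assumed instance-independent
AGGREGATION_ATTRS = ("src_ip", "dst_port")              # assumed instance-dependent


def split_event(event):
    """Return (detector input, aggregation input) for one event dict."""
    detector_input = {name: event[name] for name in CLASSIFICATION_ATTRS}
    aggregation_input = {name: event[name] for name in AGGREGATION_ATTRS}
    return detector_input, aggregation_input


# Hypothetical event as a sensor might report it.
event = {"src_ip": "10.0.0.5", "dst_port": 80, "payload_length": 512, "tcp_flags": "S"}
detector_input, aggregation_input = split_event(event)
```

An attribute that can play both roles, such as the destination port, could simply be listed in both tuples.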

Offline Alert Aggregation

The off-line algorithm for alert aggregation described here will be extended to a data stream
algorithm for on-line aggregation. One or several attackers launch several attack instances
belonging to various attack types. Each attack instance causes a number of alerts with various
attribute values. The task of alert aggregation is to estimate the assignment of alerts to
instances by using the unlabeled observations only and by analyzing the cluster structure in
the attribute space, i.e., to reconstruct the attack situation. The meta-alerts generated are
basically an abstract description of the cluster of alerts assumed to originate from one attack
instance. The amount of data is reduced substantially without losing important information.
Different potentially problematic situations can occur. False alerts may not be recognized as
such and may be wrongly assigned to clusters. This is acceptable as long as the number of false
alerts is comparably low. True alerts may be wrongly assigned to clusters. This is not really
problematic as long as the majority of alerts belonging to that cluster is correctly assigned,
so that no attack instance is missed.
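
The off-line task can be sketched in Python as follows. This is a simplified hard-assignment clustering (in the spirit of k-means with a mixed distance), not the probabilistic algorithm of the paper. The `(timestamp, source)` alert layout, the function names, the parameters `k`, `iterations`, and `time_scale`, and the definition of the reduction rate as one minus the ratio of meta-alerts to alerts are all assumptions.

```python
from collections import Counter
from statistics import fmean


def cluster_alerts(alerts, k, iterations=10, time_scale=60.0):
    """Hard-assignment clustering of (timestamp, source) alerts.

    Each cluster is summarized by a mean timestamp and its most frequent
    source. `k`, `iterations`, and `time_scale` are assumed parameters.
    """
    # Initialize prototypes from k alerts spread evenly over the sorted list.
    ordered = sorted(alerts)
    step = len(ordered) / k
    protos = [ordered[int(i * step + step / 2)] for i in range(k)]
    labels = [0] * len(alerts)
    for _ in range(iterations):
        # Assignment step: nearest prototype under a mixed distance
        # (scaled squared time difference plus 1 for a source mismatch).
        for i, (ts, src) in enumerate(alerts):
            labels[i] = min(
                range(k),
                key=lambda c: ((ts - protos[c][0]) / time_scale) ** 2
                + (src != protos[c][1]),
            )
        # Update step: re-estimate each prototype from its members.
        for c in range(k):
            members = [a for a, lab in zip(alerts, labels) if lab == c]
            if members:
                mean_ts = fmean([ts for ts, _ in members])
                top_src = Counter(src for _, src in members).most_common(1)[0][0]
                protos[c] = (mean_ts, top_src)
    return labels


def meta_alert(members):
    """Summarize one cluster of alerts as an abstract description."""
    times = [ts for ts, _ in members]
    return {
        "start": min(times),
        "end": max(times),
        "sources": sorted({src for _, src in members}),
        "count": len(members),
    }


# Hypothetical alerts from two attack instances roughly ten minutes apart.
alerts = [
    (0.0, "10.0.0.5"), (5.0, "10.0.0.5"), (9.0, "10.0.0.6"),
    (600.0, "10.0.0.9"), (605.0, "10.0.0.9"), (611.0, "10.0.0.9"),
]
labels = cluster_alerts(alerts, k=2)
metas = [
    meta_alert([a for a, lab in zip(alerts, labels) if lab == c]) for c in range(2)
]
reduction_rate = 1 - len(metas) / len(alerts)  # assumed definition
```

Note that the alert from `10.0.0.6` is absorbed into the first cluster because it is close in time. This mirrors the situation described above, where a possibly misassigned alert is tolerable as long as no attack instance is missed.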

Conclusion

Intrusion alert aggregation is an important subtask of intrusion detection. The different alerts
produced by low-level intrusion detection systems, firewalls, etc. are evaluated to make the
system more foolproof. The project identifies and clusters intrusion alerts in order to separate
the various attack instances. To improve the efficacy of the intrusion detection system,
meta-alerts are generated that contain all the relevant information. The experiments
demonstrated the broad applicability of the proposed on-line alert aggregation approach. The
simulation was conducted for two different data sets and showed that machine-learning-based
detectors, conventional signature-based detectors, and even firewalls can be used as alert
generators. In all cases, the amount of data could be reduced substantially. Although there are
problematic situations, such as clusters that are wrongly split, none or only very few attack
instances were missed. Run-time and component creation delay are well suited for an on-line
application.
