A Lightweight Algorithm for Message Type Extraction in System Application Logs pdf

**project girl** · 31-01-2013, 04:34 PM

Abstract

Message type or message cluster extraction is an important task in the analysis of system logs in computer
networks. Defining these message types automatically facilitates the automatic analysis of system logs. When the
message types that exist in a log file are represented explicitly, they can form the basis for carrying out other automatic
application log analysis tasks. In this paper, we introduce a novel algorithm for carrying out message type extraction
from event log files. IPLoM, which stands for Iterative Partitioning Log Mining, works through a 4-step process. The
first 3 steps hierarchically partition the event log into groups of event log messages or event clusters. In its 4th and
final stage, IPLoM produces a message type description or line format for each of the message clusters. IPLoM is
able to find clusters in data irrespective of the frequency of their instances in the data, it scales gracefully in the
case of long message type patterns, and it produces message type descriptions at a level of abstraction preferred
by a human observer. Evaluations show that IPLoM outperforms similar algorithms by a statistically significant margin.

INTRODUCTION

The goal of autonomic computing, as espoused by IBM’s senior vice president of research, Paul Horn, in
March 2001, can be defined as the goal of building self-managing computing systems [1]. The four key concepts of
self-management in autonomic computing are self-configuration, self-optimization, self-healing and self-protection.
Given the increasing complexity of computing infrastructure, which is stretching the human capability to manage it
to its limits, the goal of autonomic computing is a desirable one. However, it is a long-term goal, which must first
start with the building of computing systems that can automatically gather and analyze information about their
states to support decisions made by human administrators [1].
Event logs generated by applications that run on a system consist of independent lines of text data, which contain
information that pertains to events that occur within a system. This makes them an important source of information
to system administrators in fault management and for intrusion detection and prevention. With regard to autonomic
systems, these two tasks are important cornerstones for self-healing and self-protection, respectively. Therefore, as
we move toward the goal of building systems that are capable of self-healing and self-protection, an important step
would be to build systems that can automatically analyze the contents of their log files, in addition to
measured system metrics [2], [3], to provide useful information to system administrators.

Previous Work

Data clustering, as a technique in data mining or machine learning, is a process whereby entities are sorted into
groups called clusters, where members of each cluster are similar to each other and dissimilar to members
of other clusters. Clustering can be useful in the interpretation and classification of datasets too large to analyze
manually. Clustering can therefore be a useful first step in the automatic analysis of event logs.

The IPLoM Algorithm

The IPLoM algorithm is designed as a log data clustering algorithm. It works by iteratively partitioning a set of
log messages used as training exemplars. At each step of the partitioning process the resultant partitions come closer
to containing only log messages which are produced by the same line format. At the end of the partitioning process
the algorithm attempts to discover the line formats that produced the lines in each partition. These discovered
partitions and line formats are the output of the algorithm.
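
The idea of iterative partitioning followed by line format discovery can be illustrated with a minimal Python sketch. It is not the authors' implementation: the excerpt above does not state the partitioning criteria, so the two criteria used here (token count, then the token position with the fewest distinct values) and the `min_constant_ratio` stopping rule are assumptions made for illustration, and all function names are hypothetical.

```python
from collections import defaultdict


def partition_by_token_count(lines):
    """Assumed first step: group messages with the same number of tokens."""
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            groups[len(tokens)].append(tokens)
    return list(groups.values())


def split_on_token_position(partition, min_constant_ratio):
    """Assumed second step: split on the column with the fewest distinct tokens.

    A partition is left alone once enough of its columns are constant
    (hypothetical stopping rule controlled by min_constant_ratio).
    """
    columns = list(zip(*partition))
    distinct = [len(set(column)) for column in columns]
    constant_ratio = distinct.count(1) / len(columns)
    varying = [(count, index) for index, count in enumerate(distinct) if count > 1]
    if constant_ratio >= min_constant_ratio or not varying:
        return [partition]
    _, split_index = min(varying)
    groups = defaultdict(list)
    for tokens in partition:
        groups[tokens[split_index]].append(tokens)
    return list(groups.values())


def line_format(partition, wildcard="*"):
    """Final step: keep constant tokens, replace varying ones with a wildcard."""
    return " ".join(
        column[0] if len(set(column)) == 1 else wildcard
        for column in zip(*partition)
    )


def extract_message_types(lines, min_constant_ratio=0.5):
    """Partition repeatedly, then describe each final partition."""
    pending = partition_by_token_count(lines)
    formats = []
    while pending:
        partition = pending.pop()
        pieces = split_on_token_position(partition, min_constant_ratio)
        if len(pieces) == 1:
            formats.append(line_format(pieces[0]))
        else:
            pending.extend(pieces)
    return sorted(formats)


sample = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Disk sda1 is full",
    "Disk sdb2 is full",
    "Link eth0 is down now",
    "Link eth1 is down now",
]
print(extract_message_types(sample))
# ['Connection from * closed', 'Disk * is full', 'Link * is down now']
```

Note that no frequency threshold appears anywhere in the sketch: a message type with two instances is recovered in the same way as one with thousands, which is the property the abstract emphasizes.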

RESULTS

Our goal in the design of IPLoM was threefold. The first was to design an algorithm that is able to find all
message types that may exist in a given log file. The second was to give every message type an equal chance of
being found irrespective of the frequency of its instances in the data. The third was to design an algorithm that will
produce message types at an abstraction level preferred by a human observer. We therefore begin this section by
describing the setup of our experiments in Section IV-A. In Section IV-B we provide results that show how these
goals have been met under a default scenario for running the algorithm, i.e., when we want to find all message
types. Section IV-B also reports resource consumption (CPU and memory) for SLCT, Loghound and IPLoM. In
Sections IV-C and IV-D we show how varying the line support threshold (FST) using low absolute counts and
percentage values, respectively, affects the results of the algorithms. SLCT,
Loghound and Teiresias need a line support threshold to produce clusters, while IPLoM does not. For SLCT and
Loghound this support value can be specified either as a percentage of the number of events in the event log or as an
absolute value. For this reason we run two sets of experiments using support values specified as percentages and as
absolute values.
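
To make the two ways of specifying a support threshold concrete, here is a small Python sketch. It does not reproduce how SLCT or Loghound handle the value internally: the function names, the round-up rule for percentages, and the cluster sizes are all hypothetical and chosen only to show the effect of a threshold on infrequent message types.

```python
import math


def minimum_line_count(threshold, total_lines, as_percentage):
    """Turn a support threshold into a minimum number of lines.

    Assumption: a percentage is rounded up to the next whole line.
    """
    if as_percentage:
        return math.ceil(threshold * total_lines / 100)
    return int(threshold)


def clusters_meeting_support(cluster_sizes, min_lines):
    """Keep only clusters with at least min_lines member lines."""
    return {name: size for name, size in cluster_sizes.items() if size >= min_lines}


# Hypothetical cluster sizes for a log of 1,000 lines.
cluster_sizes = {
    "Connection from * closed": 950,
    "Disk * is full": 45,
    "Link * is down now": 5,
}
total = sum(cluster_sizes.values())

as_percent = minimum_line_count(1, total, as_percentage=True)    # 1% of 1,000 lines
as_absolute = minimum_line_count(2, total, as_percentage=False)  # at least 2 lines

print(sorted(clusters_meeting_support(cluster_sizes, as_percent)))
print(sorted(clusters_meeting_support(cluster_sizes, as_absolute)))
```

With these made-up numbers, the 1% threshold corresponds to 10 lines and drops the five-line message type, while the absolute threshold of 2 keeps all three. This is why the choice between percentage and absolute values matters for threshold-based algorithms.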

CONCLUSION AND FUTURE WORK

Due to the size and complexity of sources of information used by system administrators in fault management,
it has become imperative to find ways to manage these sources of information automatically. Application logs are
one such source. We present our work on designing a novel algorithm for message type extraction from log files,
IPLoM. So far there is no standard approach to tackling this problem in the literature [9]. Message types are semantic
groupings of system log messages.
