Detecting large-scale system problems by mining console logs

**project girl** · 01-02-2013, 04:38 PM

ABSTRACT

Today’s large-scale Internet services run in large server clusters in data centers and cloud computing environments. These system architectures enable highly scalable Internet services at a relatively low cost. However, detecting and diagnosing problems in such systems bring new challenges for both system developers and operators. One significant problem is that as the system scales, the amount of information operators need to process goes far beyond the level that can be handled manually, and thus there is a huge demand for automatic processing of monitoring data. Much work has been done on automatic problem detection and diagnosis in such systems.
Researchers and operators have been using all kinds of monitoring data, from the simplest numerical metrics such as resource utilization counts (Lakhina et al., 2004; Cohen et al., 2005; Bodik et al., 2010) to system events (Hellerstein et al., 2002; Ma & Hellerstein, 2001) to more detailed tracing such as execution paths (Chen et al., 2002; Chen & Brewer, 2004). However, console logs, the debugging information built into almost every piece of software, are rarely studied by either operators or the research community. Since the dawn of programming, developers have used everything from printf to complex logging and monitoring libraries (Fonseca et al., 2007; Gulcu, 2002) to record program variable values, trace execution, report runtime statistics, and even print out full-sentence messages designed to be read by a human, usually the developer.
However, modern large-scale services usually combine large open-source components authored by hundreds of developers, and the people scouring the logs (part integrator, part developer, part operator, and charged with fixing the problem) are usually not the people who chose what to log or why.
Furthermore, even in well-tested code, many operational problems depend on the deployment and runtime environment and cannot be easily reproduced by the developer. Thus, it is unavoidable that people other than the original developers need to examine logs from time to time when diagnosing problems.
Our goal is to provide them with better tools to extract value from the console logs. As logs are too large to examine manually and too unstructured to analyze automatically, operators typically create ad hoc scripts to search for keywords such as “error” or “critical,” but this has been shown to be insufficient for determining problems (Jiang et al., 2009; Oliner & Stearley, 2007). Rule-based processing (Prewett, 2003) is an improvement, but the operators’ lack of detailed knowledge about specific components and their interactions makes it difficult to write rules that pick out the most relevant sets of events for problem detection.
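
To make the limitation of keyword search concrete, here is a minimal Python sketch of the kind of ad hoc script described above. It is an illustration only, not the paper's method: the function name `keyword_hits`, the keyword list (taken from the two examples in the text), and the sample log lines are all assumptions made up for this example.

```python
import re

# Keywords taken from the examples in the text; a real script might use others.
KEYWORDS = re.compile(r"\b(error|critical)\b", re.IGNORECASE)


def keyword_hits(lines):
    """Return the log lines that contain one of the keywords."""
    return [line for line in lines if KEYWORDS.search(line)]


# Made-up log lines, for illustration only.
SAMPLE_LOG = [
    "2013-01-02 16:38:01 INFO  Receiving block blk_1",
    "2013-01-02 16:38:02 WARN  Retrying connection, no error reported by peer",
    "2013-01-02 16:38:03 INFO  Deleting block blk_1",
    "2013-01-02 16:38:04 ERROR Disk quota exceeded",
]

hits = keyword_hits(SAMPLE_LOG)
for line in hits:
    print(line)
```

In this made-up sample the script flags the WARN line only because the word "error" appears in it, while any problem that shows up as an unusual sequence of ordinary INFO messages would pass unnoticed. That is the sort of gap the text attributes to keyword search.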
