13-05-2013, 04:27 PM
Generalization-Based Mining of Plan Databases by Divide-and-Conquer
INTRODUCTION
To show how generalization can play an important role in mining complex databases,
we examine a case of mining significant patterns of successful actions in a plan database
using a divide-and-conquer strategy.
A plan consists of a variable sequence of actions. A plan database, or simply a
planbase, is a large collection of plans. Plan mining is the task of mining significant
patterns or knowledge from a planbase. Plan mining can be used to discover travel
patterns of business passengers in an air flight database or to find significant patterns
from the sequences of actions in the repair of automobiles. Plan mining is different
from sequential pattern mining, where a large number of frequently occurring
sequences are mined at a very detailed level. Instead, plan mining is the extraction
of important or significant generalized (sequential) patterns from a planbase.
Let’s examine the plan mining process using an air travel example.
Example 10.4 An air flight planbase. Suppose that the air travel planbase shown in Table 10.1 stores
customer flight sequences, where each record corresponds to an action in a sequential
database, and a sequence of records sharing the same plan number is considered as one
plan with a sequence of actions. The columns departure and arrival specify the codes of
the airports involved. Table 10.2 stores information about each airport.
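To make the record-to-plan relationship concrete, here is a minimal sketch of how action records sharing a plan number could be grouped into plans. The rows and the names `records` and `build_planbase` are hypothetical illustrations, not the actual contents of Table 10.1.

```python
from collections import defaultdict

# Hypothetical records in the spirit of Table 10.1 (NOT the table's real rows):
# (plan_number, action_number, departure, arrival)
records = [
    (1, 2, "JFK", "ORD"),
    (1, 1, "ALB", "JFK"),
    (1, 3, "ORD", "LAX"),
    (2, 1, "SPI", "ORD"),
    (2, 2, "ORD", "JFK"),
]

def build_planbase(rows):
    """Group action records by plan number, ordered by action number."""
    plans = defaultdict(list)
    # Sorting tuples orders by plan number first, then action number.
    for plan_no, _action_no, departure, arrival in sorted(rows):
        plans[plan_no].append((departure, arrival))
    return dict(plans)

planbase = build_planbase(records)
for plan_no, actions in planbase.items():
    print(plan_no, actions)
```

Each plan ends up as an ordered list of (departure, arrival) actions, which is the form the later mining steps would work on.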
There could be many patterns mined from a planbase like Table 10.1. For example,
we may discover that most flights from cities in the Atlantic United States to Midwestern
cities have a stopover at ORD in Chicago, which could be because ORD is the principal
hub for several major airlines. Notice that the airports that act as airline hubs (such
as LAX in Los Angeles, ORD in Chicago, and JFK in New York) can easily be derived
from Table 10.2 based on airport size. However, there could be hundreds of hubs in a
travel database. Indiscriminate mining may result in a large number of “rules” that lack
substantial support, without providing a clear overall picture.
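One way to picture the generalization idea is to replace each airport code by its size class and then count how many plans share the same generalized pattern. The sketch below assumes a hypothetical size table (standing in for Table 10.2), hypothetical plans, and an illustrative choice to merge consecutive identical size classes; none of these come from the original tables.

```python
from collections import Counter
from itertools import groupby

# Hypothetical stand-in for Table 10.2: airport code -> size class
# ("S" = small, "L" = large). The assignments are assumptions for illustration.
airport_size = {"ALB": "S", "SPI": "S", "BGM": "S",
                "JFK": "L", "ORD": "L", "LAX": "L"}

# Hypothetical plans, each given as the sequence of airports visited.
plans = {
    1: ["ALB", "JFK", "ORD", "SPI"],
    2: ["BGM", "ORD", "SPI"],
    3: ["SPI", "ORD", "JFK", "ALB"],
    4: ["ALB", "ORD", "BGM"],
    5: ["JFK", "LAX"],
}

def generalize(plan, size_of):
    """Map airports to size classes and merge consecutive identical classes."""
    sizes = [size_of[code] for code in plan]
    return tuple(key for key, _group in groupby(sizes))

def frequent_patterns(plans, size_of, min_support):
    """Return generalized patterns whose fraction of plans meets min_support."""
    counts = Counter(generalize(p, size_of) for p in plans.values())
    total = len(plans)
    return {pat: n / total for pat, n in counts.items() if n / total >= min_support}

patterns = frequent_patterns(plans, airport_size, min_support=0.5)
print(patterns)
```

Working at the size-class level collapses many airport-specific sequences into a few generalized ones, so a pattern such as small-large-small can gather enough support to stand out where individual airport-level rules would not.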
SEQUENTIAL DATA MINING
Sequence mining is a topic of data mining concerned with finding statistically relevant patterns between data examples where the values are delivered in a sequence.[1] It is usually presumed that the values are discrete, and thus time series mining is closely related, but usually considered a different activity. Sequence mining is a special case of structured data mining.
There are several key traditional computational problems addressed within this field. These include building efficient databases and indexes for sequence information, extracting the frequently occurring patterns, comparing sequences for similarity, and recovering missing sequence members. In general, sequence mining problems can be classified as string mining, which is typically based on string processing algorithms, and itemset mining, which is typically based on association rule learning.
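As a rough illustration of extracting frequently occurring patterns in the string mining sense, the sketch below counts contiguous subsequences of a fixed length and keeps those that appear in enough sequences. The function name `frequent_subsequences`, the sample strings, and the threshold are assumptions made for this example; it is a simple counting approach, not a specific published algorithm.

```python
from collections import Counter

def frequent_subsequences(sequences, length, min_count):
    """Count contiguous subsequences of the given length.

    Each input sequence contributes at most once per pattern, so the
    count is the number of sequences that contain the pattern.
    """
    counts = Counter()
    for seq in sequences:
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        counts.update(seen)
    return {pat: n for pat, n in counts.items() if n >= min_count}

# Hypothetical sequences of discrete symbols.
sequences = ["abcab", "bcab", "cabd"]
result = frequent_subsequences(sequences, length=2, min_count=3)
print(result)
```

Here only the pairs found in all three sample sequences survive the threshold; lowering `min_count` would admit less common patterns.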