20-03-2014, 12:22 PM
DATA MINING PROCESS
INTRODUCTION
Dramatic advances in data capture, processing power, data transmission, and
storage capabilities are enabling organizations to integrate their various databases
into data warehouses. Data warehousing is defined as a process of centralized data
management and retrieval. Data warehousing, like data mining, is a relatively new
term although the concept itself has been around for years. Data warehousing
represents an ideal vision of maintaining a central repository of all organizational
data. Centralization of data is needed to maximize user access and analysis.
Dramatic technological advances are making this vision a reality for many
companies, and equally dramatic advances in data analysis software are allowing
users to access this data freely. It is this data analysis software that supports
data mining.
NECESSITY OF DATA MINING
Data mining is primarily used today by companies with a strong consumer focus -
retail, financial, communication, and marketing organizations. It enables these
companies to determine relationships among "internal" factors such as price,
product positioning, or staff skills, and "external" factors such as economic
indicators, competition, and customer demographics. And, it enables them to
determine the impact on sales, customer satisfaction, and corporate profits. Finally,
it enables them to "drill down" into summary information to view detail
transactional data.
STATISTICA TOOL
STATISTICA is a statistics and analytics software package developed by StatSoft.
STATISTICA provides data analysis, data management, data mining, and data
visualization procedures. STATISTICA product categories include Enterprise (for use
across a site or organization), Web-Based (for use with a server and web browser),
Concurrent Network Desktop, and Single-User Desktop.
Different packages of analytical techniques are available in six product lines:
(1) Desktop, (2) Data Mining, (3) Enterprise, (4) Web-Based, (5) Connectivity
and Data Integration Solutions, and (6) Power Solutions.
DATA SET
A data set (or dataset) is a collection of data, usually presented in tabular form.
Each column represents a particular variable. Each row corresponds to a given
member of the data set in question. It lists values for each of the variables, such as
height and weight of an object. Each value is known as a datum. The data set may
comprise data for one or more members, corresponding to the number of rows.
Non-tabular data sets can take the form of marked-up strings of characters, such
as an XML file.
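The row/column structure described above can be sketched in a few lines. The variable names (name, height_cm, weight_kg) and values here are illustrative assumptions, not taken from the text:

```python
# A minimal sketch of a tabular data set: each column is a variable,
# each row is one member of the data set, and each value is a datum.
import csv
import io

raw = """name,height_cm,weight_kg
alpha,172,68
beta,165,59
gamma,180,81
"""

# Parse the table into rows; every row maps variable names to values.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["name"], row["height_cm"], row["weight_kg"])

# The data set comprises as many members as it has rows.
print("members:", len(rows))  # members: 3
```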
DATA DEDUPLICATION
In computing, data deduplication is a specialized data compression technique for
eliminating coarse-grained redundant data. The technique is used to improve storage
utilization and can also be applied to network data transfers to reduce the number of
bytes that must be sent across a link. In the deduplication process, unique chunks of
data, or byte patterns, are identified and stored during a process of analysis. As the
analysis continues, other chunks are compared to the stored copies and, whenever a
match occurs, the redundant chunk is replaced with a small reference that points to
the stored chunk.
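The chunk-and-reference process described above can be sketched as follows. The fixed 8-byte chunk size and the SHA-256 digest used as the reference are illustrative assumptions; real systems typically use much larger (and often variable-sized) chunks:

```python
# Sketch of chunk-level deduplication: split the data into chunks, store
# each unique chunk once, and represent the stream as a list of small
# references (content hashes) into the chunk store.
import hashlib

CHUNK_SIZE = 8  # bytes; tiny on purpose, for illustration only

def deduplicate(data: bytes):
    store = {}       # digest -> unique chunk bytes (each stored once)
    references = []  # ordered digests that reconstruct the original data
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk      # first occurrence: keep the chunk
        references.append(digest)      # redundant chunks become references
    return store, references

def reconstruct(store, references):
    # Follow each reference back to its stored chunk.
    return b"".join(store[d] for d in references)

data = b"ABCDEFGH" * 4 + b"12345678"   # highly redundant input (40 bytes)
store, refs = deduplicate(data)
print(len(refs), "chunks referenced,", len(store), "stored uniquely")
# -> 5 chunks referenced, 2 stored uniquely
assert reconstruct(store, refs) == data
```

Storage improves because the four identical "ABCDEFGH" chunks are kept once and referenced four times; the same idea applied to a network link means only previously unseen chunks need to cross the wire.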