Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

**project girl** · 13-11-2012, 02:07 PM

ABSTRACT

Preparing a data set for analysis is generally the most
time-consuming task in a data mining project, requiring
many complex SQL queries, joining tables, and
aggregating columns. Existing SQL aggregations are of
limited use for preparing data sets because they return one
column per aggregated group. In general, significant
manual effort is required to build data sets where a
horizontal layout is needed. We propose simple, yet
powerful, methods to generate SQL code to return
aggregated columns in a horizontal tabular layout,
returning a set of numbers instead of one number per
row. This new class of functions is called horizontal
aggregations. Horizontal aggregations build data sets
with a horizontal denormalized layout (e.g., point-dimension,
observation-variable, instance-feature), which
is the standard layout required by most data mining
algorithms. We propose three fundamental methods to
evaluate horizontal aggregations: CASE, which exploits the
programming CASE construct; SPJ, which is based on standard
relational algebra operators (SPJ queries); and PIVOT, which uses
the PIVOT operator offered by some DBMSs.
Experiments with large tables compare the proposed
query evaluation methods. Our CASE method has similar
speed to the PIVOT operator and it is much faster than
the SPJ method. In general, the CASE and PIVOT
methods exhibit linear scalability, whereas the SPJ
method does not.
