Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis Project

**seminar code** · 13-09-2014, 01:55 PM

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis Project Report

.pdf

Horizontal Aggregations.pdf (Size: 265.16 KB / Downloads: 19)

Abstract

Preparing a data set for analysis is generally the
most time consuming task in a data mining project, requiring
many complex SQL queries, joining tables and aggregating
columns. Existing SQL aggregations have limitations to prepare
data sets because they return one column per aggregated
group. In general, a significant manual effort is required to
build data sets, where a horizontal layout is required. We
propose simple, yet powerful, methods to generate SQL code
to return aggregated columns in a horizontal tabular layout,
returning a set of numbers instead of one number per row.
This new class of functions is called horizontal aggregations.
Horizontal aggregations build data sets with a horizontal
denormalized layout (e.g. point-dimension, observation-variable,
instance-feature), which is the standard layout required by
most data mining algorithms. We propose three fundamental
methods to evaluate horizontal aggregations: CASE: Exploiting
the programming CASE construct; SPJ: Based on standard
relational algebra operators (SPJ queries); PIVOT: Using the
PIVOT operator, which is offered by some DBMSs. Experiments
with large tables compare the proposed query evaluation
methods. Our CASE method has similar speed to the PIVOT
operator and it is much faster than the SPJ method. In general,
the CASE and PIVOT methods exhibit linear scalability, whereas
the SPJ method does not.

Advantages

Our proposed horizontal aggregations provide several
unique features and advantages. First, they represent a template
to generate SQL code from a data mining tool. Such SQL code
automates writing SQL queries, optimizing them and testing
them for correctness. This SQL code reduces manual work in
the data preparation phase in a data mining project. Second,
since SQL code is automatically generated it is likely to be
more efficient than SQL code written by an end user. For
instance, a person who does not know SQL well or someone
who is not familiar with the database schema (e.g. a data
mining practitioner). Therefore, data sets can be created in
less time. Third, the data set can be created entirely inside
the DBMS. In modern database environments it is common
to export denormalized data sets to be further cleaned and
transformed outside a DBMS in external tools (e.g. statistical
packages). Unfortunately, exporting large tables outside a
DBMS is slow, creates inconsistent copies of the same data and
compromises database security. Therefore, we provide a more
efficient, better integrated and more secure solution compared
to external data mining tools. Horizontal aggregations just
require a small syntax extension to aggregate functions called
in a SELECT statement. Alternatively, horizontal aggregations
can be used to generate SQL code from a data mining tool to
build data sets for data mining analysis

HORIZONTAL AGGREGATIONS

We introduce a new class of aggregations that have similar
behavior to SQL standard aggregations, but which produce
tables with a horizontal layout. In contrast, we call standard
SQL aggregations vertical aggregations since they produce
tables with a vertical layout. Horizontal aggregations just
require a small syntax extension to aggregate functions called
in a SELECT statement. Alternatively, horizontal aggregations
can be used to generate SQL code from a data mining tool to
build data sets for data mining analysis. We start by explaining
how to automatically generate SQL code

Proposed Syntax in Extended SQL

We now turn our attention to a small syntax extension to the
SELECT statement, which allows understanding our proposal
in an intuitive manner. We must point out the proposed
extension represents non-standard SQL because the columns
in the output table are not known when the query is parsed. We
assume F does not change while a horizontal aggregation is
evaluated because new values may create new result columns.
Conceptually, we extend standard SQL aggregate functions
with a “transposing” BY clause followed by a list of columns
(i.e. R1, . . . , Rk), to produce a horizontal set of numbers
instead of one number. Our proposed syntax is as follows

CONCLUSIONS

We introduced a new class of extended aggregate functions,
called horizontal aggregations which help preparing data sets
for data mining and OLAP cube exploration. Specifically,
horizontal aggregations are useful to create data sets with
a horizontal layout, as commonly required by data mining
algorithms and OLAP cross-tabulation. Basically, a horizontal
aggregation returns a set of numbers instead of a single number
for each group, resembling a multi-dimensional vector. We
proposed an abstract, but minimal

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	data mining full report	project report tiger	37	374,184,749	16-03-2019, 05:22 PM Last Post: TitkinWY
	A Novel Data Embedding Method Using Adaptive Pixel Pair Matching Report	project girl	3	4,489	15-01-2018, 01:56 PM Last Post: dhanabhagya
	SURA Project Monitors Report	seminar projects maker	1	1,177	21-09-2017, 01:30 PM Last Post: jaseela123
	THREE DIMENSIONAL PASSWORD FOR MORE SECURE AUTHENTICATION A MAIN PROJECT REPORT	study tips	1	1,300	21-09-2017, 12:56 PM Last Post: jaseela123
	CONSUMER PERCEPTION TOWARDS ONLINE GROCERY STORES” PROJECT REPORT	project maker	1	1,966	21-09-2017, 12:41 PM Last Post: jaseela123
	Detecting False Data in Wireless Sensor Network using Efficient Becan Scheme	seminar tips	1	3,235	20-09-2017, 01:03 PM Last Post: jaseela123
	Different Initialization Data and the Performance by the BFM	seminar flower	1	680	20-09-2017, 12:44 PM Last Post: jaseela123
	Wide Area Mobile Data Services	seminar ideas	1	2,373	19-09-2017, 02:35 PM Last Post: jaseela123
	System Analysis (Modeling of the Existing and Proposed System using OOD)	seminar flower	1	2,459	15-09-2017, 03:39 PM Last Post: jaseela123
	Integrating and Designing the Data Mining Technique System Based on Customer	seminar projects maker	1	782	15-09-2017, 02:45 PM Last Post: jaseela123

Quick Reply
Message Type your reply to this message here. Disable Smilies	You have selected one or more posts to quote. Quote these posts now or deselect them.