14-06-2013, 02:53 PM
Optimization of Horizontal Aggregation in SQL
Introduction
To analyze data efficiently, data mining systems widely use data sets whose columns are in a horizontal tabular layout. Preparing a data set is one of the more complex tasks in a data mining project: it requires many SQL queries, joining tables, and aggregating columns.
Conventional RDBMSs usually manage tables in vertical form. Aggregated columns in a horizontal tabular layout return a set of numbers per row, instead of one number per row. The system uses one parent table and several child tables; operations are then performed on the data loaded from these tables.
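To make the difference between the two layouts concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and the sample rows are invented for illustration only. SQLite has no PIVOT operator, so the horizontal result is produced with CASE expressions inside SUM, which is a different technique from PIVOT but yields the same shape of output.

```python
import sqlite3

# Hypothetical sample data: one row per individual sale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "Q1", 10.0), ("A", "Q1", 5.0), ("A", "Q2", 7.0),
     ("B", "Q1", 3.0), ("B", "Q2", 4.0), ("B", "Q2", 6.0)],
)

# Vertical aggregation: one number per (store, quarter) row.
vertical = conn.execute(
    "SELECT store, quarter, SUM(amount) FROM sales "
    "GROUP BY store, quarter ORDER BY store, quarter"
).fetchall()

# Horizontal aggregation: one row per store, one column per quarter.
horizontal = conn.execute(
    "SELECT store, "
    "SUM(CASE WHEN quarter = 'Q1' THEN amount END) AS q1, "
    "SUM(CASE WHEN quarter = 'Q2' THEN amount END) AS q2 "
    "FROM sales GROUP BY store ORDER BY store"
).fetchall()

print(vertical)
print(horizontal)
```

The vertical query returns four rows here, while the horizontal query returns two rows with the quarterly totals side by side, which is the layout most mining algorithms expect as input.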
The PIVOT operator, offered by some RDBMSs, is used to calculate the aggregate operations. The PIVOT method is much faster and offers better scalability. Partitioning the large data set obtained from horizontal aggregation into homogeneous clusters is an important task in this system. A K-means algorithm implemented in SQL is well suited to this operation.
Overview of Application:
• Horizontal aggregation is a new class of functions that return aggregated columns in a horizontal layout. Most algorithms require as input a data set in horizontal layout, with several records and one variable or dimension per column.
• Managing large data sets without DBMS support can be a difficult task. Trying different subsets of data points and dimensions is more flexible, faster, and easier to do inside a relational database with SQL queries than outside it with alternative tools.
• The advantage of horizontal aggregation is that it provides a template for generating SQL code from a data mining tool. This SQL code reduces manual work in the data preparation phase of a data mining project.
• K-means clustering is used to cluster the attributes that result from horizontal aggregation.
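The idea of a template that generates SQL code can be sketched as a small helper that builds the horizontal-aggregation query from the distinct values found in the data. This is only an illustration under stated assumptions: the function name `horizontal_aggregation_sql`, the `txn` table, and its columns are hypothetical, the pivot values are assumed to be text, and CASE expressions are used in place of PIVOT because SQLite does not provide that operator.

```python
import sqlite3

def horizontal_aggregation_sql(conn, table, group_col, pivot_col, value_col, agg="SUM"):
    """Build a SELECT with one aggregated column per distinct pivot value."""
    if agg not in {"SUM", "AVG", "MIN", "MAX", "COUNT"}:
        raise ValueError(f"unsupported aggregate: {agg}")
    # Discover the output columns from the data itself.
    values = [row[0] for row in conn.execute(
        f'SELECT DISTINCT "{pivot_col}" FROM "{table}" ORDER BY 1')]
    columns = []
    for v in values:
        literal = "'" + str(v).replace("'", "''") + "'"   # escaped SQL string literal
        alias = '"' + f"{value_col}_{v}".replace('"', '""') + '"'
        columns.append(
            f'{agg}(CASE WHEN "{pivot_col}" = {literal} THEN "{value_col}" END) AS {alias}')
    return (f'SELECT "{group_col}", ' + ", ".join(columns) +
            f' FROM "{table}" GROUP BY "{group_col}" ORDER BY "{group_col}"')

# Hypothetical transaction table used only to exercise the generator.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (customer TEXT, day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?)",
    [("c1", "Mon", 2.0), ("c1", "Tue", 3.0), ("c2", "Mon", 4.0), ("c1", "Mon", 1.0)],
)

sql = horizontal_aggregation_sql(conn, "txn", "customer", "day", "amount")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

A customer with no rows for a given day gets NULL in that column (None in Python), since the CASE expression produces no value for SUM to add up. The table and column names are interpolated into the query text, so they should come from trusted configuration rather than user input.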