04-08-2012, 01:16 PM
CORRELATION DATABASE MANAGEMENT SYSTEM
CORRELATION DATABASE
A correlation database is a database management system (DBMS) that is data-model-independent and designed to efficiently handle unplanned queries in an analytical system environment. It was developed in 2005 by database architect Joseph Foley, whose background includes more than 30 years of research and development work in data warehousing and business intelligence across a variety of industries.
Unlike relational database management systems, which use a record-based storage approach, or column-oriented database systems, which use a column-based storage method, a correlation database uses a value-based storage (VBS) architecture in which each unique data value is stored only once and an auto-generated indexing system maintains the context for all values.
STRUCTURE
Because a correlation DBMS (CDBMS) stores each unique data value only once, the physical database is significantly smaller than an equivalent relational or column-oriented database, without the use of data compression techniques. For data sets above approximately 30 GB, a correlation database may become smaller than the raw data set itself.
The VBS model used by a CDBMS consists of three primary physical sets of objects that are stored and managed:
a data dictionary (metadata);
an indexing and linking data set (additional metadata); and
the actual data values that comprise the stored information.
In the VBS model, each unique value in the raw data is stored only once; therefore, the data is always normalized at the level of unique values. This eliminates the need to normalize data sets in the logical schema.
Data values are stored together in ordered sets based on data types: all integers in one set, characters in another, etc. This optimizes the data handling processes that access the values.
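The idea of keeping unique values in ordered, per-type sets can be illustrated with a small Python sketch. This is not the storage engine of any actual CDBMS; the class name TypedValueSets and its methods are hypothetical, and plain sorted lists stand in for whatever on-disk structure a real product uses.

```python
from bisect import bisect_left


class TypedValueSets:
    """Hypothetical container: one sorted list of unique values per data type."""

    def __init__(self):
        self.sets = {}  # type name -> sorted list of unique values

    def add(self, value):
        # Pick the set for this value's type, creating it on first use.
        bucket = self.sets.setdefault(type(value).__name__, [])
        pos = bisect_left(bucket, value)
        # Insert only if the value is not already present (stored once).
        if pos == len(bucket) or bucket[pos] != value:
            bucket.insert(pos, value)

    def contains(self, value):
        # Binary search inside the set that matches the value's type.
        bucket = self.sets.get(type(value).__name__, [])
        pos = bisect_left(bucket, value)
        return pos < len(bucket) and bucket[pos] == value


store = TypedValueSets()
for v in [42, "MN", 7, "WI", 42, "MN"]:  # made-up sample values
    store.add(v)

print(store.sets)
```

Because each set holds a single type in sorted order, a lookup only has to search values of that type, which is one plausible reading of how this layout helps the data handling processes.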
Storage in CDBMS
In the VBS structure used in a CDBMS, each unique value is stored once and given an abstract (numeric) identifier, regardless of the number of occurrences or locations in the original data set. The original data set is then reconstructed by referencing those logical identifiers. For example, a value such as "MN" that occurs many times in the source data is included only once in the correlation index. As the amount of repeated data grows, this benefit multiplies.
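A minimal Python sketch of this identifier scheme follows. The class ValueStore, its method names, and the sample rows are all invented for illustration; the sketch only shows the principle of storing each value once and rebuilding records from identifiers, not how a real CDBMS lays out its index.

```python
class ValueStore:
    """Hypothetical value-based store: unique values plus records of identifiers."""

    def __init__(self):
        self.id_of = {}    # (type, value) -> numeric identifier
        self.values = []   # numeric identifier -> value
        self.records = []  # each record is a tuple of identifiers

    def intern(self, value):
        # Key on the type as well, so that 1 and True are kept apart.
        key = (type(value), value)
        if key not in self.id_of:
            self.id_of[key] = len(self.values)
            self.values.append(value)
        return self.id_of[key]

    def load(self, record):
        # Store the record as references to values, never the values themselves.
        self.records.append(tuple(self.intern(v) for v in record))

    def rebuild(self, index):
        # Reconstruct the original record by following the identifiers.
        return tuple(self.values[i] for i in self.records[index])


# Made-up sample data; "MN" appears in two records.
rows = [
    ("Alice", "St. Paul", "MN"),
    ("Bob", "Duluth", "MN"),
    ("Carol", "Madison", "WI"),
]

store = ValueStore()
for row in rows:
    store.load(row)

print(store.values)     # each unique value appears once
print(store.records)    # records hold identifiers only
print(store.rebuild(1))
```

In this sketch the two records that contain "MN" share one identifier, so adding further "MN" rows grows only the list of records, not the list of stored values.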
This correlation process is a form of database normalization. Just as one can achieve some benefits of column-oriented storage within an RDBMS, so too can one achieve some benefits of the correlation database through database normalization. However, in a traditional RDBMS this normalization process requires work in the form of table configuration, stored procedures, and SQL statements. We say that a database is a correlation database when it naturally expresses a fully normalized schema without this extra configuration. As a result, a correlation database may have more focused optimizations for this fully normalized structure.
This correlation process is similar to what occurs in a text-search-oriented inverted index, which maps each term to the list of places where it occurs.
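To make the comparison with an inverted index concrete, here is a small Python sketch that maps every value to the positions where it occurs. The function name build_value_index and the sample rows are assumptions made for illustration, and a real correlation index is likely to be organized differently.

```python
from collections import defaultdict


def build_value_index(rows):
    """Map each value to the (row, column) positions where it occurs."""
    index = defaultdict(list)
    for row_no, row in enumerate(rows):
        for col_no, value in enumerate(row):
            index[value].append((row_no, col_no))
    return index


# Made-up sample data.
rows = [
    ("Alice", "St. Paul", "MN"),
    ("Bob", "Duluth", "MN"),
    ("Carol", "Madison", "WI"),
]

index = build_value_index(rows)
print(index["MN"])  # every position that holds "MN"
```

As in a text-search inverted index, the lookup starts from the value and leads to its occurrences, rather than scanning records to find the value.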
Advantages
For analytical data warehouse applications, a CDBMS has several advantages over alternative database structures.
Because the database engine itself indexes all data and auto-generates its own schema on the fly while loading, it can be implemented quickly and is easy to update. There is no need for physical pre-design and no need to ever restructure the database.
A CDBMS enables creation and execution of complex queries such as associative queries (show everything that is related to x) that are difficult if not impossible to model in SQL.
The primary advantage of the CDBMS is that it is optimized for executing ad hoc queries, that is, queries not anticipated during the data warehouse design phase.
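An associative query of the "show everything that is related to x" kind can be sketched in Python on top of a value index. The functions build_index and related_to, the field names, and the sample records are hypothetical; the sketch treats two values as related when they appear in the same record, which is only one possible definition.

```python
from collections import defaultdict


def build_index(records):
    """Map each value to the set of record numbers that contain it."""
    index = defaultdict(set)
    for rec_no, record in enumerate(records):
        for value in record.values():
            index[value].add(rec_no)
    return index


def related_to(value, records, index):
    """Collect, per field, every other value that shares a record with `value`."""
    related = defaultdict(set)
    for rec_no in index.get(value, ()):
        for field, other in records[rec_no].items():
            if other != value:
                related[field].add(other)
    return dict(related)


# Made-up sample data.
records = [
    {"name": "Alice", "city": "St. Paul", "state": "MN"},
    {"name": "Bob", "city": "Duluth", "state": "MN"},
    {"name": "Carol", "city": "Madison", "state": "WI"},
]

index = build_index(records)
print(related_to("MN", records, index))
```

The caller does not name a table or column for "MN"; the query starts from the value alone, which is the property that makes this kind of question awkward to express against a fixed relational schema.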
Disadvantages
A CDBMS has two drawbacks in comparison to database alternatives.
Unlike relational databases, which can be used in a wide variety of applications, a correlation database is designed specifically for analytical applications and does not provide transaction management features; it cannot be used for transactional processing.
Because it indexes all data during the load process, the physical load speed of a CDBMS is slower than that of relational or column-oriented structures.
However, because it eliminates the need for logical or physical pre-design, the overall "time to use" of a CDBMS is generally similar to or somewhat faster than alternative structures.
Conclusion
A correlation database is well suited to analytical workloads but lacks the transaction support that transactional applications need.
Its storage structure has advantages over the traditional storage structures of relational and column-oriented DBMSs.
It requires less disk space, and hence less memory and less physical and logical I/O, leading to an increase in performance.