03-05-2014, 12:12 PM
Ontological OLAP for Integrating Energy Sensor Data
Abstract:
In this paper we propose an ontological OLAP framework to integrate distributed energy sensor data.
The OLAP data cubes in the framework are annotated with semantics and, together with other supporting
mechanisms, can deal with the schema inconsistencies that may result from integrating heterogeneous
data sources. The proposed approach provides a way of storing, reusing and composing OLAP cubes in
order to increase system usability. A prototype of the proposed framework based on a number of
existing tools such as Protégé, Jess, and FuzzyJ has been developed to demonstrate its feasibility.
Introduction
Global warming is a serious issue facing the modern world. One of the main contributing factors to
global warming is the emission of CO2. Increased industrialisation and a lifestyle that relies heavily
on a growing number of household appliances and devices have drastically increased energy
consumption per capita. This energy consumption is directly linked to the emission of CO2.
Initiatives such as the United Nations-sponsored Kyoto Protocol [60], which legally binds member
states to reduce greenhouse gases, and the European Climate and Energy Package, agreed by the EU to
reduce greenhouse gas emissions in the EU by 20% by 2020 [61], came into existence to address the
issue of global warming.
These initiatives led to the exploration of methods for reducing domestic energy waste and promoting
efficient use of energy. One such method is to put consumers in control of their energy usage by
allowing them to monitor their energy consumption in real time. Various energy monitoring devices
have started to find their way onto the market to implement this method.
Data Warehouse
A data warehouse can generally be described as a decision-support tool that collects its data from
operational databases and various external sources, transforms it into information and makes that
information available to decision-makers in a consolidated and consistent manner [1, 2]. A data
warehouse organizes, analyzes and synthesizes huge amounts of raw data so that it is intelligible and
useful to the users [3]. A modern data warehouse is a collection of information stored in a way that
improves access to data [4, 5, 6, 7, 8].
The best practices that were implemented as part of the successful data warehousing projects within
various sectors (government agencies, military institutions and organizations directly supporting them)
are presented in [9]. A generic framework for summarizing distributed data streams to be loaded into a
data warehouse is proposed in [10]. The proposed approach limits the load of the central server hosting
the data warehouse and optimizes the sampling rates assigned to each input stream by minimizing the
sum of squared errors. This optimization is dynamic and adapts continuously to the contents of the
streams.
Multidimensional DBMS and Data Warehouse
One of the technologies often discussed in the context of the data warehouse is multidimensional
DBMS processing, often called OLAP processing [4, 15, 16, 14]. Multidimensional database
management systems, or data marts, provide an information system with a structure that allows an
organization to have flexible access to data, to slice and dice data in any number of ways, and to
dynamically explore the relationship between summary and detailed data [16, 2, 8, 14].
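The slice, dice and summary operations mentioned above can be illustrated with a minimal Python sketch. It assumes a tiny in-memory cube of energy readings; the dimension names, the sample values and the helper functions `slice_cube`, `dice_cube` and `roll_up` are hypothetical and are not taken from any of the cited systems.

```python
from collections import defaultdict

# Hypothetical fact rows: (household, appliance, day, kWh). Values are made up.
FACTS = [
    ("h1", "fridge", "mon", 1.2),
    ("h1", "heater", "mon", 3.0),
    ("h1", "fridge", "tue", 1.1),
    ("h2", "fridge", "mon", 1.4),
    ("h2", "heater", "tue", 2.5),
]
DIMS = ("household", "appliance", "day")


def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def dice_cube(facts, allowed):
    """Dice: keep rows whose dimension values fall in the allowed sets."""
    return [
        row for row in facts
        if all(row[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


def roll_up(facts, keep):
    """Roll up: sum the measure over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


monday = slice_cube(FACTS, "day", "mon")
h1_fridge = dice_cube(FACTS, {"household": {"h1"}, "appliance": {"fridge"}})
per_household = roll_up(FACTS, ("household",))
```

Here `per_household` is the summary view and `FACTS` the detailed view; moving between the two is the kind of exploration the paragraph describes.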
A data warehouse holds massive amounts of data; the multidimensional DBMS holds at least an
order of magnitude less data. A data warehouse contains data with a lengthy time horizon (from 5 to
10 years); the multidimensional DBMS holds a much shorter time horizon of data. The data warehouse
allows access to its data in a constrained fashion, whereas the multidimensional DBMS allows its
users unfettered access.
One of the interesting features of the relationship between the data warehouse and the
multidimensional DBMS is that the data warehouse can provide a basis for detailed data that is
normally not found in the multidimensional DBMS. The data warehouse can contain a fine degree of
detail, which is lightly summarized as it is passed up to the multidimensional DBMS [16, 11, 7].
Distributed Warehouse Data Model
The heart of any data warehouse is its database, where all the information is stored [15, 25, 26].
Most traditional data warehouses use one of the relational products for this purpose. Because they can
manage extremely large amounts of data (hundreds of terabytes), mainframe relational databases such as
DB2 are used for some of the world's largest data warehouses. Universal data servers such as those from
Oracle or Informix may be a good choice for medium-size warehouses because they manage a variety
of data types. Multidimensional databases are becoming increasingly popular, but they limit the size of
a warehouse to less than 5 gigabytes [27, 3].
In record-oriented (relational) storage systems, the attributes of a tuple are placed contiguously in
storage. With this row store architecture, a single disk write suffices to push all of the fields of a
single record out to disk. Hence, high-performance writes are achieved, and a DBMS with a row store
architecture is called a write-optimized system [28]. In contrast, systems oriented toward querying a
large amount
of data should be read-optimized. Data warehouses represent one class of read-optimized system in
which periodically a bulk load of new data is performed, followed by a relatively long period of ad-hoc
queries. In such environments, a column store architecture, in which values for each single column (or
attribute) are stored contiguously, should be more efficient. With column store architecture, a DBMS
need only read values of columns required for processing a given query, and can avoid bringing into
memory irrelevant attributes.
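The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: records are in-memory dictionaries rather than disk pages, the `touched` counter stands in for the number of attribute values brought into memory, and all names and values are hypothetical.

```python
# Hypothetical readings table with three attributes per record.
ROWS = [
    {"sensor": "s1", "hour": 0, "kwh": 0.5},
    {"sensor": "s1", "hour": 1, "kwh": 0.7},
    {"sensor": "s2", "hour": 0, "kwh": 0.3},
]


def to_column_store(rows):
    """Re-lay records so that each attribute's values are contiguous."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(rows, attr):
    """Row layout: every whole record is visited to read one attribute."""
    touched = 0
    total = 0.0
    for row in rows:
        touched += len(row)  # all fields of the record are brought in
        total += row[attr]
    return total, touched


def total_column_store(columns, attr):
    """Column layout: only the requested column is scanned."""
    values = columns[attr]
    return sum(values), len(values)


COLUMNS = to_column_store(ROWS)
row_total, row_touched = total_row_store(ROWS, "kwh")
col_total, col_touched = total_column_store(COLUMNS, "kwh")
```

Both calls compute the same total, but the row layout touches every attribute of every record while the column layout touches only the `kwh` values, which is the read-side advantage described above.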
Conclusion
In this paper, we have proposed an ontological OLAP framework to integrate distributed energy
sensor data. The data cubes in the framework are annotated with semantics and backed by supporting
mechanisms that alleviate schema inconsistency across data sources. Mechanisms such as rules and
procedural functions have been introduced to the framework to support the mapping between data
schemas and to carry out data transformation. Once a cube is generated, it can be stored in a
repository for future use. As a result, existing OLAP cubes can be reused and composed to form a new cube.
The proposed ontological OLAP framework provides mechanisms to support the necessary steps
required for composition, as the cubes may have attribute differences and missing data. Developers
can take advantage of these existing cubes and the supporting functions to produce composite
multidimensional cubes for analysis and decision making.
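As a rough illustration of the composition step, the following Python sketch aligns two stored cubes whose attributes differ and then combines them while filling missing cells. It does not reproduce the ontology- and rule-based mechanisms of the prototype; the cubes, the mapping table `RULES_B`, the unit conversion and the zero default for missing data are all assumptions made for the example.

```python
# Two hypothetical stored cubes keyed by (household, day); attribute names and
# units differ between sources. All names and values are illustrative.
CUBE_A = {("h1", "mon"): {"kwh": 4.2}, ("h1", "tue"): {"kwh": 1.1}}
CUBE_B = {("h1", "mon"): {"energy_wh": 300.0}, ("h2", "mon"): {"energy_wh": 1400.0}}

# Assumed mapping rules: source attribute -> (target attribute, conversion).
RULES_B = {"energy_wh": ("kwh", lambda wh: wh / 1000.0)}


def apply_rules(cube, rules):
    """Rename and convert attributes so the cube matches the target schema."""
    out = {}
    for key, cell in cube.items():
        new_cell = {}
        for attr, value in cell.items():
            # Attributes without a rule are passed through unchanged.
            target, convert = rules.get(attr, (attr, lambda v: v))
            new_cell[target] = convert(value)
        out[key] = new_cell
    return out


def compose(cube_x, cube_y, measure, missing=0.0):
    """Combine two aligned cubes cell by cell, filling absent cells."""
    result = {}
    for key in sorted(set(cube_x) | set(cube_y)):
        x = cube_x.get(key, {}).get(measure, missing)
        y = cube_y.get(key, {}).get(measure, missing)
        result[key] = x + y
    return result


composite = compose(CUBE_A, apply_rules(CUBE_B, RULES_B), "kwh")
```

Treating a missing cell as zero is only one possible policy; a real deployment might instead interpolate or flag the gap, depending on the analysis.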