01-10-2016, 12:09 PM
Abstract—Nowadays Big Data are becoming a popular
topic and a comparatively new technological concept relevant
to many different disciplines, such as environmental science,
social media and networks, industry and healthcare. Data
volumes are on an upward trajectory, together with
increasing data velocity and variety. Furthermore, these data are
needed to develop effective solutions to support intelligent,
proactive and predictive processes. In this paper we exploit
Big Data concepts for environmental sciences and water
resources. The aim of this article is to present the concept
and architecture of our Big Data Open Platform used for
supporting Water Resources Management. This Platform has
been designed to provide effective tools that allow water
system managers to solve complex water resources problems
and water modeling issues, and to support decision making. The
Platform brings together a variety of information technology tools
including stochastic aspects, high performance computing,
simulation models, hydraulic and hydrological models, grid
computing, decision tools, Big Data analysis system,
communication and diffusion system, database management,
geographic information system (GIS) and knowledge-based
expert system. The operational objectives of this Big Data
Open Platform are: to address water resources
problems that are characterized by a huge volume of collected,
analyzed and visualized data; to analyze the heterogeneity of
data resulting from various sources, including structured,
unstructured and semi-structured data; and to prevent and/or
avoid catastrophic events related to floods and/or droughts,
through hydraulic infrastructures designed for such purposes
or through strategic planning. This first paper focuses on the first
part developed, which is based on the J2EE platform, and specifically
on the hypsometrical approach, considered as a decision tool
that allows users to compare the effects of different current and
future management scenarios and to make choices that preserve
the environment and natural resources.
I. INTRODUCTION
A. Definition and characteristics of Big Data
There has been a lot of talk about Big Data in the last few
years, and it has generated a lot of buzz
along with the launch of several successful Big Data
products. If we ask for a definition of what Big
Data means, we get different answers. Big Data is
defined as a “collection of data sets so large and
complex that it becomes difficult to process using on-hand
database management tools or traditional data
processing applications”. The challenges include
capture, storage, search, sharing, transfer, analysis, and
visualization [1]. The data is too big, moves too fast, or
does not fit the structures of database architectures. To
gain value from this data, we have to choose an
alternative way to process it. Big Data is
generally composed of structured, unstructured and semi-structured
data, both open and private. It is
characterized by the “three Vs”:
significant growth in the volume, velocity, and variety of
data [2].
The three Vs are usually described as
follows:
Volume: refers to the mass quantities of data that
organizations are trying to harness to improve decision
making. Data volumes continue to increase at an
unprecedented rate; Big Data sizes are reported in
multiple terabytes and petabytes [3].
Variety: refers to the structural heterogeneity in a
dataset. Technological advances allow firms to use
various types of structured, semi-structured, and
unstructured data. Structured data refers to the tabular
data found in spreadsheets or relational databases, while
unstructured data refers to text, images, audio, and video
[4].
Velocity: refers to the rate at which data are generated
and the speed at which they should be analyzed and acted
upon. The proliferation of digital devices such as
smartphones and sensors has led to an unprecedented rate of
data creation and is driving a growing need for real-time
analytics and evidence-based planning [4].
Big Data is also characterized by two other Vs:
Veracity and Value.
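As a rough illustration of how velocity drives volume, the following sketch estimates the raw data volume produced per day by a sensor network. The function name and all numbers (sensor count, sampling interval, record size) are hypothetical values chosen for illustration and are not figures from this paper.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_bytes(n_sensors: int, interval_s: float, record_bytes: int) -> float:
    """Raw bytes produced per day by n_sensors, each emitting one
    record of record_bytes every interval_s seconds (hypothetical setup)."""
    records_per_sensor = SECONDS_PER_DAY / interval_s
    return n_sensors * records_per_sensor * record_bytes


# Assumed example: 1000 sensors, one 100-byte record per minute each.
volume = daily_volume_bytes(n_sensors=1000, interval_s=60, record_bytes=100)
print(f"{volume / 1e6:.0f} MB per day")
```

Shortening the sampling interval (higher velocity) increases the daily volume in direct proportion, which is why the two characteristics are usually discussed together.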
B. Relevance of Big Data to water resources
Environmental science, and water resources in particular,
constitutes a growing Big Data issue. The
recent evolutions in web technology and computer
science provide the water resources discipline with
continuously expanding tools for data collection and
analysis that pose challenges to the design of analysis
methods and to interaction with data sets [5]. The demand
for water has increased due to population growth as a
result of economic development, while several regions
suffer from flooding and drought, leading to water
resources mismanagement. On the other hand, climate
change exerts great impacts on water systems and has caused
great changes in water resources due to its direct effects
on hydrological processes such as precipitation,
evaporation and humidity. The combination of growing
demand for water, climate change and hydrological gaps
has pushed decision makers and managers of water resources
to look for strategies for effective management of water
resources [6].
The complexity of water resources problems is
characterized by the interaction of several physical
phenomena. Water problems include the preservation of
irrigation water, watershed management, dam
construction for flood mitigation and/or conservation
purposes, river management, basin management and
pollution control [7]. The problems relating to water
resources are characterized by:
• A huge volume of collected, analyzed and
visualized data.
• Collected data that are complex in dimension, size
and heterogeneity (tsunamis of data, or Data
Deluge).
• Many different data sources, multiple scales and multiple models.
• The need for a multidisciplinary approach.
• Spatial data and remote sensing in real time.
• Heterogeneous data resulting from various
sophisticated simulation models that can,
ironically, create more of a Big Data challenge
than the experimental sciences they are supposed
to complement or replace.
• Large simulations, which are becoming unavoidable
for tackling all the scientific aspects at multiple
scales.
Generally, facilitating the management of water
resources calls for improving the adoption of
new technologies. In this paper, our efforts focus on
addressing the Big Data challenges related to water and on
providing new and higher-quality information through
better measurement techniques.
In addition, we aim at improving the use of the available
information during model identification and prediction,
and at developing improved models based on a better
understanding of physical processes, mathematical
representation and approximation [8]. We are moving
toward implementing and developing a Big
Data Open Platform for supporting, modeling and
managing water resources. In the following section of
this paper we provide an overview of our platform
concepts and its components. The last section is an
application of one part of this platform to study the
hypsometrical approach in the “Foum Tillicht” watershed in
the Ziz basin in Morocco.
II. BIG DATA OPEN PLATFORM CONCEPTS, COMPONENTS,
STRUCTURE AND METHODOLOGY
A. Big Data Platform basic concepts and methodology
According to [9], a Decision Support System (DSS) is a
computerized management advisory system that utilizes
databases, models, and dialog systems to provide
decision makers with timely management information
and to let them interact with the system.
Recently, various specific applications of DSS to
water resources problems have been reported.
For example, the development of a DSS for facilitating
water quality management is described in [10].
A prototype DSS for analyzing the impact of catchment
policy scenarios adopted by watershed authorities
at multiple scales was presented in [11]. A DSS to
support reservoir operational management is reported in
[12]. An integrated scenario-based multi-criteria
decision support system for planning water
resources management was developed in [13].
Our Big Data Open Platform attempts to offer some of the
functionalities of the published works cited above and
adds others. It provides an environment that permits the
acquisition of data from different sources, the integration of
databases, the use of GIS technology, and the development of
decision tools and models, as well as multi-criteria
analysis and the hypsometrical approach. Furthermore, it
permits the use of hydraulic and hydrological models,
High Performance Computing (HPC), grid computing
and simulation models.
The requirements assigned to our Big Data Open
Platform are to develop a system that:
• Can manipulate a diversity of heterogeneous
database management systems and includes tools
for pre- and post-processing of Big
Data.
• Is modular. This was considered a very
important aspect, since the infrastructure
developed should be open, distributed and capable
of manipulating a variety of models and
databases.
• Can be used with a variety of computer platforms
and distributed architectures.
• Provides access to existing hydraulic and
hydrological models.
• Interacts with a Big Data analysis system for
collecting, analyzing and visualizing data.
• Can efficiently handle geodatabases by employing
Geographic Information System (GIS)
technology. GIS have been used in
conjunction with Big Data systems, and together they are the
best combination for manipulating spatial data.
• Can access or share databases, data sets and simulation
results with other distant or local systems via
Web services or APIs.
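The modularity requirement can be illustrated with a minimal plug-in sketch. It is written in Python for brevity and is not the platform's J2EE implementation; the names `WaterModel`, `ModelRegistry` and `LinearReservoir` are hypothetical, and the linear reservoir is only a toy stand-in for an existing hydrological model.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class WaterModel(ABC):
    """Common interface that every pluggable model implements (hypothetical)."""

    name: str

    @abstractmethod
    def run(self, inflow: List[float]) -> List[float]:
        """Map an inflow series to an outflow series."""


class LinearReservoir(WaterModel):
    """Toy model: at each step, outflow is a fixed fraction k of storage."""

    name = "linear_reservoir"

    def __init__(self, k: float = 0.5) -> None:
        self.k = k

    def run(self, inflow: List[float]) -> List[float]:
        storage, outflow = 0.0, []
        for q_in in inflow:
            storage += q_in
            q_out = self.k * storage
            storage -= q_out
            outflow.append(q_out)
        return outflow


class ModelRegistry:
    """Keeps models behind one interface so new ones can be added
    without changing the code that calls them."""

    def __init__(self) -> None:
        self._models: Dict[str, WaterModel] = {}

    def register(self, model: WaterModel) -> None:
        self._models[model.name] = model

    def run(self, name: str, inflow: List[float]) -> List[float]:
        return self._models[name].run(inflow)


registry = ModelRegistry()
registry.register(LinearReservoir(k=0.5))
result = registry.run("linear_reservoir", [10.0, 0.0])
```

The calling code depends only on the `WaterModel` interface, so a wrapper around an existing hydraulic or hydrological model could be registered in the same way.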
B. Big Data Platform components and structure
The conceptual structure of the Big Data Open
Platform for supporting water resources management is
shown in Fig. 1. It consists of nine blocs, as follows:
1) Decision Support Tools, 2) Knowledge Based
System, 3) Geographic Information System (GIS), 4)
Big Data Analysis System, 5) Simulation Models, 6)
Computation and Processing, 7) Communication
System, 8) Search Engine and 9) Users Interfaces.
We explain each bloc as follows:
Decision Support Tools, the first bloc, includes
decision support techniques to solve real-world decision
problems. The difficulty of assessing the quality of decision
support tools is partly due to the variety of decision
support techniques available, which can potentially lead to
different decisions. In other words, selecting the best
decision method is itself a decision problem which, to be
solved, presumes that the best decision method is
already known [14].
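The point that different decision techniques can lead to different decisions can be made concrete with a small sketch. It compares two standard multi-criteria rules, a weighted sum and a maximin rule, on the same data. The scenario names, criterion scores and weights are hypothetical, and nothing here is claimed to be the method used in the platform.

```python
from typing import Dict, List

# Hypothetical normalized scores in [0, 1] (higher is better) for three
# management scenarios on three criteria.
scores: Dict[str, List[float]] = {
    "A": [0.9, 0.2, 0.8],
    "B": [0.6, 0.6, 0.6],
    "C": [0.3, 0.9, 0.4],
}
weights = [0.5, 0.2, 0.3]  # assumed criterion weights, summing to 1


def weighted_sum_choice(scores: Dict[str, List[float]], weights: List[float]) -> str:
    """Pick the scenario with the highest weighted average score."""
    return max(scores, key=lambda s: sum(w * x for w, x in zip(weights, scores[s])))


def maximin_choice(scores: Dict[str, List[float]]) -> str:
    """Pick the scenario whose worst criterion score is highest."""
    return max(scores, key=lambda s: min(scores[s]))


by_weighted_sum = weighted_sum_choice(scores, weights)
by_maximin = maximin_choice(scores)
print(by_weighted_sum, by_maximin)
```

With these assumed inputs the weighted sum favors scenario A, which is strong on the most heavily weighted criterion, while maximin favors scenario B, which has no weak criterion. The two rules disagree on identical data, which is the selection problem described in [14].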
The second bloc consists of implementing a
Knowledge Based System. This bloc concerns the
collection and storage of quantitative and qualitative data
on the hydrological cycle, and access to physical, socio-economic,
demographic and water data. It is used for data and
knowledge exchange between stakeholders, including
water professionals and experts. It could contain data
from existing spreadsheets or database management
systems.
The Geographic Information System (GIS), the third bloc,
is used to capture, store, integrate, manipulate and
analyze data. These are basic data such as
hydrological data, meteorological data for hydrological
modeling, basin characteristics, and water supply
and water demand information. According to [16],
ArcGIS is considered an essential GIS software package.
It brings together maps, applications, data and users. It allows
geographic information to be used for in-depth analysis and a
better understanding of data, in order to quickly make the
best decisions.
The fourth bloc consists of implementing a Big Data
Analysis System that interacts with the application of our
Big Data Open Platform. This bloc contains a set of
tools used to manage, analyze, visualize, and extract
useful information from huge quantities and varieties of
data sets, so as to accelerate the progress of environmental
discovery and innovation and to encourage the
development of new data analytic tools and algorithms.
It requires particular technologies to efficiently process
large quantities of data, including Big Data
Computing, Big Data Mining, Big Data Analytics and
Big Data Security.
The fifth bloc provides a set of Simulation Models for
water resources. These simulation models are linked to
data extracted from the third bloc (GIS) in order to
simulate water problems using interfaces and simulators.
Computation and Processing, the sixth bloc,
provides a container of tools including hydraulic and
hydrological models, high performance computing and
grid computing. The purpose of these tools is to
demonstrate the readiness of such facilities in environmental
science for advancing water resources prediction,
particularly where large-scale water problems are
concerned.
The seventh bloc, the Communication System, takes advantage of
rapid developments in communication technology. It
could be used to support data communication and
management requests for capacity building in the water
resources sector. To achieve efficiency and
effectiveness, it makes relevant data and information
available and establishes communication
with other related distant or local systems.
The eighth bloc, the Search Engine, answers the
following question: how can users find the appropriate
specialized information in our Big Data Open
Platform? The answer is to provide
users with an automatic indexing
mechanism that returns precise search results.
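One common way to realize an automatic indexing mechanism is an inverted index, which maps each term to the set of documents that contain it. The paper does not state which mechanism the platform uses, so the following is only a minimal sketch under that assumption; the class name, document identifiers and document texts are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document ids containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add("d1", "Flood risk in the Ziz basin")
index.add("d2", "Drought indicators for the Ziz basin")
index.add("d3", "Flood mitigation by dams")
hits = index.search("ziz flood")
```

Requiring all query terms to match keeps results precise at the cost of recall; ranking, stemming and stop-word handling are left out of this sketch.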
Users Interfaces, the ninth bloc, provide users with the
capability of communicating with the platform. They help
users formulate the problem, input data, present
results and graphics, and visualize data. They can also help
users obtain information related to water resources.