08-01-2013, 12:14 PM
Data Management & Data Process of Environmental Observation
Data Management.pptx (Size: 1,010.07 KB / Downloads: 20)
Motivation behind Environmental DB
The task of creating a data model for environmental observations that can sufficiently capture and disseminate disparate data types has been a multi-year, multi-pronged collaborative effort between domain experts ranging in expertise from field research to database architecture; using cutting edge technologies in interoperability, data organization and intelligent service designs
A primary goal of the EnviroDB project is to promote and encourage community involvement around all facets of environmental data collection and dissemination
In addition to encouraging a data community, EnviroDB seeks to interact with other data schemas that are in use for environmental observations today including the Observations Data Model (ODM) created by CUAHSI and Utah State University (USU).
Database Framework and Support
EnviroDB is implemented using SQL Server2008 [18] for reasons of performance, stability, interoperability and the ability to plug directly into the tooling and resources of a proven enterprise class commercial framework. The use of a commercially supported database has been mirrored worldwide as adoption of SQL Server database software in particular climbed 16.5 percent in 2008 to $3.1 billion dollars [19]. With the release of SQL Server 2008, and Microsoft becoming a principal member of the OGC, geospatial data standards are now natively supported by the database. Data types such as geometry [20] and geography [21] are implemented providing rich functionality for parsing and manipulating spatial data, and serve as core drivers for storing and extracting spatial observations. It is similarly worth mentioning that hierarchical support is also native to SQL Server 2008 in the form of HierarchyID. As its name implies, it is designed to store parent-child relationships between records in a single table adding hierarchical functionality into the relational model. These newly supported SQLCLR Data Types are key components in storing and extracting spatial observation information used in environmental sensing.
Entity Data Model
Visual Studio 2010 [22] enhances the ability to connect programmatically to the database using Object-Relational Mapping (ORM). ORM is a programming technique for converting data between incompatible type systems in relational databases and object-oriented programming languages. This creates in effect a "virtual object database" that can be used from within the programming language. The ADO.NET Entity Framework is Microsoft’s instantiation of this, and as of Visual Studio 2008 allows a programmatic connection to the database schema using an Entity Data Model (EDM) or Entity-Relationship data model. Entities are instances of Entity Types (e.g., Customer, Employee), which are richly structured records with a key. An entity key is formed from a subset of properties of the Entity Type. The key (e.g., CustId, EmpId) is a fundamental concept to uniquely identify and update entity instances and to allow entity instances to participate in relationships. Entities are grouped in Entity Sets (i.e., Customers is a set of Customer instances). Relationships are instances of Relationship Types which are associations among two or more entity types (e.g., Employee Works for Department). Relationships are grouped in Relationship Sets. EnviroDB makes extensive use of the Entity Framework in order to facilitate straightforward database access for developers using any .NET language without having to understand the complexities of EnviroDB’s data schema.
The EDM works in conjunction with a new query language Entity SQL (eSQL), an SQL-like query language designed to enable set-oriented, declarative queries and updates in terms of entities and relationships. EDM can work with other query languages as well. The EDM and eSQL together represent a richer data model and query language and have been designed to enable applications such as CRM, ERP, data-intensive services such as Reporting Services, Business Intelligence, Replication, Synchronization, and data-intensive application developers to model and manipulate data at a level of structure and semantics that is closer to their needs.
Stored Procedures
Stored procedures are mechanisms that simplify the database development process by grouping Transact-SQL statements into manageable blocks. Key benefits in using stored procedures are precompiled execution, reduced client/server traffic, efficient reuse of code and programming abstraction, and enhanced security controls. Business layer code can be reduced by encapsulating logic into stored procedures and improving the performance of execution. Additionally client overhead, SQL traffic, can be reduced by calling pre-compiled and cached stored procedures. Stored procedures that are created with generality in compliance to the EnviroDB model can also be portable, adding extended functionality to the EnviroDB user community. It has been our observation that due to relative complexity of the EnviroDB model, it would be beneficial to have stored procedures that support a smart search feature (or Metadata Search) of the SCO CRONOS dataset. For example, querying “Feature of Interest” with its children and corresponding attributes.
Functions
Similar to stored procedures, functions are used to improve the performance of the database but are additionally expected to return a scalar or table value. Functions can only be called from within another SQL statement and have no pre-planned caches. EnviroDB employs the user-defined functions as described below:
Get Data From Trajectory (Table valued)
Extract all observations, including spatial information, for a particular series, such as the values tagged for a hurricane for example.
Get Raster Metadata (Table valued)
Extract and parse metadata for a specified file containing raster data.
Import Raster (Table valued)
Import raster data from a specified file as table output
Get Parent FeatureID (Scalar valued)
Return the parent feature ID for a Feature ID using the functionality methods of system data type HierarchyID.
In general functions are more suitable for row centric operations when it comes to overall SQL Server performance.