03-01-2013, 12:28 PM
Bi-directional Translation of Relational Data into Virtual RDF Stores
Abstract
A vast majority of the world’s valuable data currently
exists in relational databases and other legacy storage
systems. For Semantic Web applications to access such
legacy data without replicating or synchronizing it,
the gap between the two must be bridged. Several efforts exist
that publish relational data as Resource Description Framework
(RDF) triples; however, almost all current work in this arena is
uni-directional, exposing existing and new data from an
underlying relational database through a corresponding virtual RDF
store in a read-only manner. This paper expands on previous
relational-to-RDF bridging work by making the bridge
bi-directional, allowing data updates specified as triples
to be propagated back to the relational database as tuples.
We present algorithms that translate the triples to be inserted,
updated, or deleted into equivalent relational attributes and
tuples whenever possible. To demonstrate the bi-directionality
of our translation process, we enhance D2RQ, a widely embraced
open-source tool, with these algorithms.
INTRODUCTION
The ability of the World Wide Web to serve as a universal
medium for the exchange of data and knowledge has increased
its popularity and spawned related efforts such as the Semantic
Web. The Semantic Web is a rapidly maturing initiative that
is envisioned to improve machines' understanding of the
subject matter of web pages through the use of meta-data
rather than keywords. Further, Semantic Web technologies and
specifications such as the Resource Description Framework
(RDF) and the Web Ontology Language (OWL) provide
a means to integrate disparate data sources and reuse data
across applications through the use of ontologies. These
factors, coupled with the simplicity and flexibility offered by
RDF and other Semantic Web technologies, have driven their
widespread adoption. Relational databases, on the other hand,
having been in existence for several decades, remain
the most commonly used storage solutions in production
environments.
RELATED WORK
Several research efforts exist that attempt to bring relational
database concepts and Semantic Web concepts together in
a uni-directional, read-only manner. One such effort is the
D2RQ project [2], which is essentially a mapping between
relational schema and OWL/RDF-Schema (RDFS) concepts.
D2RQ takes a relational database schema as input and presents
an RDF interface to it as output. Our work in this
paper is centered entirely on D2RQ and extends it
to permit insert, update, and delete operations
on the underlying RDBMS. The work in [3] is yet another
effort that, like D2RQ, uses a declarative meta schema
consisting of quad map patterns that define the mapping of
relational data to RDF ontologies. RDF123 [6], an open-source
translation tool, also uses a mapping concept; however, its
domain is spreadsheet data, and it attempts to achieve richer
spreadsheet-to-RDF translation by allowing users to define
mappings between spreadsheet semantics and RDF graphs.
The works in [7], [8], and [9] describe more mapping attempts in
the reverse direction. In [7], the authors use relational.OWL
to extract the semantics of a relational database and automatically
transform them into a machine-readable RDF/OWL ontology.
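The read-only direction that D2RQ provides can be pictured as follows: each row becomes a subject URI and each mapped column becomes a predicate. The sketch below is a minimal illustration only, assuming a hypothetical Python dictionary in place of D2RQ's real mapping file, an in-memory SQLite table, and made-up `example.org` URIs; the `MAPPING` structure and `table_to_triples` function are invented for this example and are not how D2RQ itself is implemented.

```python
import sqlite3

# Hypothetical mapping, loosely modeled on D2RQ's idea of one class map per
# table (with a URI pattern for subjects) and one property bridge per column.
# This is NOT D2RQ's actual mapping syntax; all URIs are illustrative.
MAPPING = {
    "employee": {
        "uri_pattern": "http://example.org/employee/{id}",
        "key": "id",
        "columns": {
            "name": "http://example.org/vocab#name",
            "email": "http://example.org/vocab#email",
        },
    }
}


def table_to_triples(conn, table):
    """Yield one (subject, predicate, object) triple per non-NULL mapped cell."""
    spec = MAPPING[table]
    cols = [spec["key"], *spec["columns"]]
    # Table and column names come from the trusted mapping, never from users.
    for row in conn.execute(f"SELECT {', '.join(cols)} FROM {table}"):
        record = dict(zip(cols, row))
        subject = spec["uri_pattern"].format(**record)
        for column, predicate in spec["columns"].items():
            if record[column] is not None:  # NULL cells produce no triple
                yield (subject, predicate, record[column])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Alice', NULL)")
for triple in table_to_triples(conn, "employee"):
    print(triple)
```

In this sketch the triples are produced on demand from the current table contents rather than copied into a separate store, which is the sense in which such an RDF store is virtual.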
D2RQ++ Algorithms
Algorithm 1 is fairly straightforward and applies to simple
triples whose object is a literal or a resource and that do NOT
involve any blank nodes. Under this algorithm,
an INSERT statement is executed against the
underlying relational schema only when the predicate exists as
a column in the table to which the subject of the triple
belongs and the subject value does not yet exist as a primary
key value in that table. When the predicate exists as a
column in the table and the subject exists as a primary key
value in the same table, the object value is written (using
an SQL UPDATE statement) only if the corresponding cell
in the relational schema is empty. Otherwise, under the
Open-World Assumption, the object in the input triple is treated
as an additional (duplicate) value for the corresponding column
and is preserved by storing the triple in the native RDF store.
Subsequent queries on that column will return both values,
i.e., the cell value stored in the relational database as well
as the object value for the corresponding predicate stored in
the native RDF store. If the predicate of the input
triple does not map to an equivalent column in the underlying
relational schema, as specified in the mapping file, the input
triple is always added to the native RDF store.
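The decision procedure of Algorithm 1 can be sketched in a few lines. This is a minimal illustration, not the authors' D2RQ++ code: the dictionaries standing in for the mapping file, the helpers `_locate`, `add_triple`, and `values_for`, the SQLite schema, and the use of a Python `set` as the native RDF store are all hypothetical. It also assumes, beyond what the text states, that a triple whose subject does not match the table's URI prefix is sent to the native store.

```python
import sqlite3

# Hypothetical mapping (illustrative only): predicate URI -> (table, column),
# and per table a subject-URI prefix whose suffix is the primary-key value.
PREDICATES = {
    "http://example.org/vocab#name": ("employee", "name"),
    "http://example.org/vocab#email": ("employee", "email"),
}
SUBJECT_PREFIX = {"employee": "http://example.org/employee/"}
KEY = {"employee": "id"}


def _locate(s, p):
    """Return (table, column, key, key_value) for a mapped triple, else None."""
    target = PREDICATES.get(p)
    if target is None:
        return None  # predicate has no equivalent column
    table, column = target
    prefix = SUBJECT_PREFIX[table]
    if not s.startswith(prefix):
        return None  # assumption: subject outside this table -> native store
    return table, column, KEY[table], s[len(prefix):]


def add_triple(conn, native_store, s, p, o):
    """Sketch of Algorithm 1 for a blank-node-free triple; returns the action."""
    loc = _locate(s, p)
    if loc is None:
        native_store.add((s, p, o))
        return "native"
    table, column, key, key_value = loc
    # Identifiers come from the trusted mapping; values are bound parameters.
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE {key} = ?", (key_value,)
    ).fetchone()
    if row is None:  # subject is not yet a primary-key value: new tuple
        conn.execute(
            f"INSERT INTO {table} ({key}, {column}) VALUES (?, ?)", (key_value, o)
        )
        return "insert"
    if row[0] is None:  # tuple exists but the cell is empty: fill it
        conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE {key} = ?", (o, key_value)
        )
        return "update"
    native_store.add((s, p, o))  # cell already set: keep both (Open World)
    return "native"


def values_for(conn, native_store, s, p):
    """Answer (s, p, ?o) from the relational cell plus the native RDF store."""
    values = []
    loc = _locate(s, p)
    if loc is not None:
        table, column, key, key_value = loc
        row = conn.execute(
            f"SELECT {column} FROM {table} WHERE {key} = ?", (key_value,)
        ).fetchone()
        if row is not None and row[0] is not None:
            values.append(row[0])
    values += [o for (s2, p2, o) in native_store if (s2, p2) == (s, p)]
    return values


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id TEXT PRIMARY KEY, name TEXT, email TEXT)")
native = set()
emp = "http://example.org/employee/1"
name, email = "http://example.org/vocab#name", "http://example.org/vocab#email"
print(add_triple(conn, native, emp, name, "Alice"))           # insert
print(add_triple(conn, native, emp, email, "a@example.org"))  # update
print(add_triple(conn, native, emp, name, "Alicia"))          # native
print(values_for(conn, native, emp, name))                    # ['Alice', 'Alicia']
```

Keeping the conflicting value in the native store rather than overwriting the cell is what lets the query return both `'Alice'` and `'Alicia'`, mirroring the Open-World behavior described above. For simplicity, each INSERT here fills only the key and one column, so a real schema with NOT NULL constraints on other columns would need more care.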
CONCLUSION
A bi-directional translation mechanism between relational
databases and RDF data stores, D2RQ++, was presented in
this paper. This work was motivated by the need to propagate DML
operations back to the underlying relational
database whenever possible while continuing to maintain the
Open-World Assumption during the propagation process. One
of the requirements of our bi-directional translation work was
to reuse and extend existing uni-directional translation
solutions in order to avoid reinventing the wheel
wherever possible. Thus, D2RQ, a highly popular and widely
adopted uni-directional translation tool, was chosen as the
foundation upon which our bi-directional algorithms were
built. In essence, D2RQ++ is a wrapper around
D2RQ that transforms the latter from a read-only application
into a read-write application. The various algorithms comprising
D2RQ++ were presented, and the feasibility of the proposed
framework was demonstrated through a variety of experimental
results in the form of screenshots and performance graphs.