24-08-2012, 10:14 AM
Natural Language processing in order to generate SPARQL queries
Abstract
Viewed historically, the development of computer languages is a series of encapsulations, each layered over the previous one so that the machine becomes easier to use. Computers (quantum computers aside) work only on 1s and 0s. Because instructing the machine directly in binary proved difficult, assembly language emerged, followed by compiled programming languages, which in turn led to numerous development frameworks. Even object orientation can be seen as a logical illusion layered over the machine in this way. The existing programming languages all draw on a natural language, namely English, so the current focus is on the natural language, English, itself.
Querying a database with a query language is a persistent difficulty for non-programmers, and this complexity keeps ordinary users from using the available resources effectively. A facility is therefore needed that lets non-programmers enter their query in natural language and obtain the desired result. In this paper, the input sentence is tokenized and parsed, and the desired query is then generated by classifying the user question.
INTRODUCTION
A huge amount of data is now available in every company and organization. However, only users who are expert in data query methods are able to use this raw data. A path must therefore be created so that natural language can also be used as a means of retrieving data from the warehouse, which is the motivation for this paper. The conversion from natural language to the query language is carried out through the steps shown in Figure 1.
Here, the database is represented as an ontology, in the belief that this design of the warehouse leads to a meaningful extraction [1][2].
THE PATH
The route that produces the required query is shown in Figure 2. Here, the semantics referred to in Figure 1 are taken into account in depth. The possible questions are classified by properties such as the number of constraints, the distance between the classes (that is, the relations between them), the name attribute and so on [3]. The question format is first determined from the parse tree by identifying the nouns that correspond to classes defined in the ontology file. Next, the output class, its attributes and then the input are identified. Finally, if the question is of distance 2 or more, the relationships between the classes are found. The query can then be generated from the template file for the corresponding question format.
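The analysis steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the toy university ontology, the helper names (tokenize, find_classes, class_distance, analyze), the naive plural stripping that stands in for noun detection on a real parse tree, and the rule that the first class mentioned is the output class are all hypothetical.

```python
import re
from collections import deque

# Hypothetical toy ontology: class -> {property name: neighbouring class}.
ONTOLOGY = {
    "student": {"enrolledIn": "course"},
    "course": {"taughtBy": "professor"},
    "professor": {},
}


def tokenize(question):
    """Lowercase the question and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def find_classes(tokens, ontology):
    """Return the ontology classes mentioned in the tokens, in order of appearance."""
    found = []
    for tok in tokens:
        # Naive plural stripping; a real system would read nouns off the parse tree.
        singular = tok[:-1] if tok.endswith("s") else tok
        for cand in (tok, singular):
            if cand in ontology and cand not in found:
                found.append(cand)
    return found


def class_distance(ontology, src, dst):
    """Number of relations on the shortest path between two classes (BFS)."""
    # Relations are treated as undirected when measuring distance.
    neighbours = {cls: set() for cls in ontology}
    for cls, props in ontology.items():
        for target in props.values():
            neighbours[cls].add(target)
            neighbours[target].add(cls)
    queue = deque([(src, 0)])
    seen = {src}
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in sorted(neighbours[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the classes are not connected


def analyze(question, ontology=ONTOLOGY):
    """Assumed convention: first class mentioned is the output, last is the input."""
    classes = find_classes(tokenize(question), ontology)
    if len(classes) < 2:
        return None
    out_class, in_class = classes[0], classes[-1]
    return {
        "output_class": out_class,
        "input_class": in_class,
        "distance": class_distance(ontology, out_class, in_class),
    }


print(analyze("Which professors teach the student John?"))
```

In this sketch a question such as "Which professors teach the student John?" is placed in the distance-2 category, because professor and student are connected only through course.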
Database design
As usual, the database is ultimately stored as 0s and 1s, encapsulated by the logical layer above it, which is rows and columns. To make the schema more relational and meaningful, the database schema is built using an ontology.
An ontology is a formal representation of knowledge as a set of concepts specific to a domain and the relationships among those concepts. It can be described as a formal data structure that uses an XML file to describe the database. An ontology consists of individuals, classes and properties. Individuals represent objects in the domain. An upper ontology describes common concepts that are generally applicable across a wide range of domains. A domain ontology is used in restricted-domain question answering systems to formalize domain knowledge and to represent natural language questions and the underlying unstructured information sources.
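To make the three building blocks concrete, here is a minimal sketch that reads classes, properties and individuals out of a small XML ontology using only the Python standard library. The source says only that an XML file is used; the OWL-in-RDF/XML layout, the example.org namespace and the Student, Course, enrolledIn and john names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/univ#"  # hypothetical domain namespace

# Hypothetical ontology file content: two classes, one property, one individual.
OWL_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:ex="http://example.org/univ#">
  <owl:Class rdf:about="http://example.org/univ#Student"/>
  <owl:Class rdf:about="http://example.org/univ#Course"/>
  <owl:ObjectProperty rdf:about="http://example.org/univ#enrolledIn">
    <rdfs:domain rdf:resource="http://example.org/univ#Student"/>
    <rdfs:range rdf:resource="http://example.org/univ#Course"/>
  </owl:ObjectProperty>
  <ex:Student rdf:about="http://example.org/univ#john"/>
</rdf:RDF>
"""


def local_name(iri):
    """Return the part of an IRI after the '#'."""
    return iri.rsplit("#", 1)[-1]


def summarize(xml_text):
    """Return (classes, properties, individuals) found at the top level of the file."""
    root = ET.fromstring(xml_text)
    about = f"{{{RDF}}}about"
    classes = [local_name(e.get(about)) for e in root.findall(f"{{{OWL}}}Class")]
    properties = [
        local_name(e.get(about)) for e in root.findall(f"{{{OWL}}}ObjectProperty")
    ]
    individuals = []
    for elem in root:
        # ElementTree tags look like "{namespace}LocalName".
        namespace, _, tag = elem.tag[1:].partition("}")
        if namespace == EX:
            # An element typed by a domain class is an individual of that class.
            individuals.append((local_name(elem.get(about)), tag))
    return classes, properties, individuals


classes, properties, individuals = summarize(OWL_XML)
print("classes:", classes)
print("properties:", properties)
print("individuals:", individuals)
```

A production system would normally load the ontology through an RDF library rather than raw XML parsing; the plain parser is used here only to keep the sketch self-contained.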
CONCLUSION AND FUTURE WORK
After the user question is analysed, the input and output classes, their attributes and the input values are extracted from it to generate the corresponding SPARQL query with the help of query templates selected according to the classification criteria. Thus only a subset of the templates needs to be searched while generating the SPARQL query. The SPARQL query is then used to retrieve the answers to the user question from the RDF database through Jena. In this way, very high-level programming has been put into practice for querying the database.
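Template-based generation of this kind might look like the sketch below. The templates, the ex: prefix, the slot names and the build_query helper are hypothetical, and the sketch assumes that relations run from the input class towards the output class. Only query construction is shown; running the query against an RDF store (through Jena in the paper) is outside its scope.

```python
PREFIX = "PREFIX ex: <http://example.org/univ#>\n"  # hypothetical namespace

# One hypothetical template per question category, keyed by class distance.
TEMPLATES = {
    1: (
        "SELECT ?answer WHERE {{\n"
        "  ?answer a ex:{out_class} .\n"
        "  ?given a ex:{in_class} ;\n"
        "         ex:{attr} \"{value}\" .\n"
        "  ?given ex:{rel1} ?answer .\n"
        "}}"
    ),
    2: (
        "SELECT ?answer WHERE {{\n"
        "  ?answer a ex:{out_class} .\n"
        "  ?given a ex:{in_class} ;\n"
        "         ex:{attr} \"{value}\" .\n"
        "  ?given ex:{rel1} ?mid .\n"
        "  ?mid ex:{rel2} ?answer .\n"
        "}}"
    ),
}


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(distance, out_class, in_class, attr, value, relations):
    """Fill the template for the given distance with the extracted slots."""
    if distance not in TEMPLATES:
        raise ValueError(f"no template for distance {distance}")
    if len(relations) != distance:
        raise ValueError("one relation is needed per step between the classes")
    slots = {
        "out_class": out_class,
        "in_class": in_class,
        "attr": attr,
        "value": escape_literal(value),
    }
    # Relations are numbered rel1, rel2, ... in path order.
    for index, relation in enumerate(relations, start=1):
        slots[f"rel{index}"] = relation
    return PREFIX + TEMPLATES[distance].format(**slots)


query = build_query(
    2, "Professor", "Student", "name", "John", ["enrolledIn", "taughtBy"]
)
print(query)
```

Because the question category selects the template directly, only the templates for that category need to be considered, which matches the point about searching a subset of the templates.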