12-07-2014, 02:53 PM
Information Retrieval Using Customized Ontology Model Based On Clustering
Abstract—
The explosion of data raises the problem of how to retrieve information accurately and effectively. To address this issue, ontologies are widely used to represent user profiles in personalized web information gathering. Most models use knowledge from either a global knowledge base or user local information, but not both. In this paper, a non-content based customized ontology model is proposed for knowledge representation and reasoning over user profiles. This model generates a user Local Instance Repository, which includes non-content based descriptors referring to the subjects. The proposed customized ontology model is evaluated by comparing it against the previously proposed content-based ontology model for web information gathering. The results show that this model improves on the former models in the hit/miss ratio, recall and precision parameters.
Keywords— Ontology, user profiles, non-content based descriptors, local instance repository, global knowledge
I. INTRODUCTION
In recent times, the amount of web information has grown rapidly. Gathering useful information from the web has become a challenge for web users. In most of the models that have been developed to address this issue, user profiles are created to capture user background knowledge [1],[5],[9],[10].
User profiles contain the concept model, which represents the background knowledge possessed by the users. A superior representation of user profiles can be built by simulating the user’s concept model. A concept model is implicitly possessed by users and is generated from their background knowledge. While this concept model cannot be proven in laboratories, many web ontologists have observed it in user behaviour [10].
For simulating user concept models, ontologies—a knowledge description and formalization model—are utilized in personalized web information gathering. Such ontologies are called ontological user profiles or personalized ontologies.
The user background knowledge can be analysed through global and local analysis. Global analysis uses the existing global knowledge bases for user background knowledge representation. Local analysis is used for extracting user behaviour from the user profiles. Both global and local information are used for discovering the user background knowledge in a better way. This discovery can be further improved by using ontological user profiles.
The commonly used knowledge bases include generic ontologies (e.g. WordNet), thesauruses and digital libraries. WordNet has been reported as helpful in capturing user interest and is used in creating ontological user profiles.
The goal of ontology learning is to semi-automatically extract relevant concepts and relations from a given corpus or other kinds of data sets to form an ontology. In this paper, a customized ontology model is proposed to evaluate this hypothesis.
The ideas implemented in this paper are:
1. Global search produces search results based on the existing global knowledge.
2. Local search produces search results based on the user's interest, which is analysed using user profiles.
3. Content-based clustering matches the query not only against the document name but also against the content of the document.
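As a minimal sketch of idea 3, assuming documents are held in memory as name/content pairs, matching a query against both the document name and its content might look like the following. The `Document` class and the `search` function are hypothetical names introduced for illustration, and plain substring matching stands in for whatever matching the authors used.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical in-memory document: a file name plus its text content.
    name: str
    content: str


def search(query, documents):
    """Return documents whose name or content contains every query term.

    Matching is case-insensitive substring matching (an assumption).
    """
    terms = query.lower().split()
    hits = []
    for doc in documents:
        # Search the name and the content together, not the name alone.
        haystack = f"{doc.name} {doc.content}".lower()
        if all(term in haystack for term in terms):
            hits.append(doc)
    return hits
```

With this sketch, a document named `travel_notes.txt` whose text mentions Singapore is returned for the query "Singapore", although its name does not contain the term.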
All local and global repositories have content-based descriptors referring to the subjects. However, a large volume of documents on the web may not have such content-based descriptors. To handle these non-content based descriptors, a clustering technique is used, which also groups the documents that do not have descriptors. Compared with other benchmark models, the customized ontology model is successful.
International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181
www.ijert.org
Vol. 2 Issue 6, June - 2013
IJERTV2IS60022
IV. CUSTOMIZED ONTOLOGY CONSTRUCTION USING CLUSTERING TECHNIQUE
A customized ontology is constructed to describe the user's background knowledge. Different users may have different expectations for the same query. For example, for the query “Singapore”, business travellers may expect different results from leisure travellers. A user’s concept model may also change according to different information needs.
Constructing a customized ontology groups the related documents for the given query. However, documents whose content is related to the query but whose names do not match it would be left unsearched. Using a clustering technique, content-based clustering is performed, which matches the query not only against the document name but also against the document content.
A. World Knowledge Representation
Global knowledge representation is the study of how to reason accurately and effectively, and of how best to use a set of symbols to represent a set of facts within a knowledge domain.
In this model, user background knowledge is extracted from the set of files, documents and links loaded on the server.
The initial step is the construction of the world knowledge base. Because users expect varied results for a single query, the world knowledge base should cover a wide range of topics.
The world knowledge base is created by the administrator, who uploads the files, documents and links that are commonly referred to by the users.
B. Ontology Construction
An ontology is constructed using the feedback provided for the subjects by the user for the given topic. The structure of the ontology is based on the semantic relations linking those subjects.
Depending on the user's interest, the subjects are assigned ranks; based on these ranks, the data are classified and the customized ontology for each user is constructed.
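The ranking step can be illustrated with a small sketch. It assumes that user feedback is available as a numeric interest score per subject and that semantic relations are given as (parent, child) pairs; the function name `build_customized_ontology`, the `min_score` threshold and the returned dictionary layout are all hypothetical, since the paper does not specify these details.

```python
def build_customized_ontology(feedback, relations, min_score=1):
    """Build a per-user ontology from subject feedback (illustrative only).

    feedback:  dict mapping subject -> interest score given by the user
    relations: list of (parent, child) semantic relations between subjects
    min_score: assumed cut-off below which a subject is left out
    """
    # Rank subjects by descending score, breaking ties alphabetically.
    ranked = sorted(feedback, key=lambda s: (-feedback[s], s))
    selected = [s for s in ranked if feedback[s] >= min_score]
    keep = set(selected)
    # Keep only relations that link two subjects the user is interested in.
    edges = [(p, c) for p, c in relations if p in keep and c in keep]
    return {"ranked_subjects": selected, "relations": edges}
```

Subjects that fall below the threshold are dropped together with any relation that touches them, so each user ends up with an ontology restricted to their own interests.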
During the global search, after the construction of the ontology, the data are retrieved based on the information given in the user’s profile.
V. PROPOSED MODEL
In the proposed model, two types of search operations are performed: global search and customized search.
The global search considers the subjects provided in the world knowledge base. The customized search considers only the subjects provided by the individual based on their interests.
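The difference between the two search types can be sketched as follows, assuming the world knowledge base is a mapping from subject to document names. The function names, the data layout and the substring matching are illustrative assumptions rather than the authors' implementation.

```python
def global_search(query, knowledge_base):
    """Search every subject in the world knowledge base.

    knowledge_base: dict mapping subject -> list of document names (assumed layout)
    """
    q = query.lower()
    # A set avoids duplicates when a document is filed under several subjects.
    return sorted({
        doc
        for docs in knowledge_base.values()
        for doc in docs
        if q in doc.lower()
    })


def customized_search(query, knowledge_base, user_subjects):
    """Search only the subjects the individual user has selected."""
    restricted = {
        subject: docs
        for subject, docs in knowledge_base.items()
        if subject in user_subjects
    }
    return global_search(query, restricted)
```

For a knowledge base with "Business" and "Leisure" subjects, a business traveller whose profile contains only "Business" would receive only the business-related documents for the query "singapore", while the global search returns both.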
Clustering is used in the information retrieval systems to enhance the efficiency and effectiveness of the retrieval process.
Clustering is the division of data into groups of similar objects. Each group consists of objects that are similar to one another and dissimilar to objects of other groups. In our proposed model, clustering is applied at the initial level, i.e. the global knowledge representation level, which lets the user search within the domain of the given keyword. This results in an effective search and accurate output.
We use relationships between the keywords to cluster the documents. The relationships are retrieved from the WordNet ontology and represented in the form of a graph. The document graphs, which reflect the essence of the documents, are searched in order to find the frequent subgraphs. To discover the frequent subgraphs, we use the Frequent Pattern Growth (FP-growth) approach. The common frequent subgraphs discovered by the FP-growth approach are later used to cluster the documents.
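The graph-based clustering idea can be illustrated with a deliberately simplified sketch. Three assumptions are made and should be read as such: the `HYPERNYMS` table is a toy stand-in for WordNet relations, only single-edge subgraphs are considered, and a plain support count replaces FP-growth. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy stand-in for WordNet hypernym relations (assumption, not real WordNet data).
HYPERNYMS = {
    "hotel": "lodging",
    "resort": "lodging",
    "lodging": "travel",
    "flight": "travel",
    "stock": "security",
    "bond": "security",
    "security": "finance",
}


def document_graph(keywords):
    """Represent a document as the set of (word, hypernym) edges above its keywords."""
    edges = set()
    for word in keywords:
        # Walk up the hypernym chain until a root concept is reached.
        while word in HYPERNYMS:
            parent = HYPERNYMS[word]
            edges.add((word, parent))
            word = parent
    return edges


def frequent_edges(graphs, min_support=2):
    """Edges that occur in at least min_support document graphs.

    A naive count; the paper uses FP-growth for this step.
    """
    counts = Counter(edge for graph in graphs.values() for edge in graph)
    return {edge for edge, n in counts.items() if n >= min_support}


def cluster_by_frequent_edges(graphs, min_support=2):
    """Group documents that contain the same set of frequent edges."""
    frequent = frequent_edges(graphs, min_support)
    clusters = defaultdict(list)
    for name, graph in graphs.items():
        clusters[frozenset(graph & frequent)].append(name)
    return dict(clusters)
```

In this sketch, a document about hotels and one about resorts share the edge from "lodging" to "travel", so they land in the same cluster even though they have no keyword in common.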
CONCLUSION AND FUTURE WORK
A. Conclusion
The customized ontology model for information retrieval produces more accurate results by clustering text documents based on their content. Clustering of documents improves the recall parameter by 80%, which in turn increases the precision parameter value. Because the results are more accurate, the user can find documents relevant to their interest in a single search.
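For reference, the recall and precision parameters mentioned above follow the standard information retrieval definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. A minimal sketch, with a hypothetical function name and made-up document identifiers in the usage note:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: documents returned by the search
    relevant:  documents judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against division by zero when either set is empty.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a search returns four documents of which two are relevant, and three relevant documents exist in total, the precision is 2/4 and the recall is 2/3.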
B. Future direction
Future work will experiment with extending the algorithm so that the search covers all kinds of documents by varying parameters. Multilingual concepts can also be introduced. Since ontologies are constructed in the language that the developers use, the search query and the results will be in that same language, and a person who does not know that language will not be able to search. Therefore, before the customized ontology module, a dictionary/WordNet module can be introduced to retrieve all words semantically related to the given keyword, followed by a multilingual terms module to obtain those words in the language that the user specifies. This extends the system to different languages, allowing speakers of different languages to use it.