Implementation of Conflation algorithm.

**seminar ideas** · 14-06-2012, 02:28 PM

Introduction to Information Retrieval

In today's era of information explosion, the growing demand for quicker dissemination of
information, from content stored in a variety of forms, requires speedy search and timely
retrieval. The value of a document is measured by the information it contains, but that
information is useless until it is brought out for use by readers. This
may be done either by subject analysis or by representing the terms through symbols. The
needs of scholars, and the long-standing concern of library organizers to extract
content quickly and exhaustively, brought forward the concept of information retrieval.

Meaning & Definition:
Calvin Mooers coined the term information retrieval in 1950. In the context of library and
information science, it means getting back information that is, in a way, hidden from
normal sight.
According to J.H. Shera, it is "the process of locating and selecting
data, relevant to a given requirement."
According to Calvin Mooers, it is "Searching and retrieval of information from storage, according to
specification by subject."

Functions:
The major functions of an information retrieval system are: Acquisition,
Analysis, Representation of information, Organisation of the indexes, Matching, Retrieving,
Readjustment and Feedback.

Components of Information Retrieval System:
A study of the functions of an information retrieval system (IRS) brings out the essential
components on which the proper functioning of the system depends. According to Lancaster, an
information retrieval system consists of six basic subsystems. They are as follows:
1. The document selection subsystem
2. The indexing subsystem
3. The vocabulary subsystem
4. The searching subsystem
5. The user-system interface
6. The matching subsystem
All the above subsystems may be grouped under two headings: subject/content analysis and
search strategy. Subject or content analysis includes the tasks of analysis, organisation and
storage of information. Search strategy includes analysis of user queries, creation of the search
formula and the actual searching.

Conflation Algorithm

Ultimately one would like to develop a text processing system which, by means of computable methods and with a minimum of human intervention, will generate from the input text (full text, abstract, or title) a document representative adequate for use in an automatic retrieval system. This is a tall order and can only be partially met. The output of such a system is a set of classes, one for each stem detected; a document is indexed by a class name if one of its significant words occurs as a member of that class.

Such a system will usually consist of three parts:

(1) removal of high frequency words,
(2) suffix stripping,
(3) detecting equivalent stems.

The removal of high frequency words, also called 'stop' words or 'fluff' words, is one way of
implementing Luhn's upper cut-off. This is normally done by comparing the input text with a
'stop list' of words which are to be removed. The advantages of the process are not only that
non-significant words are removed and will therefore not interfere during retrieval, but also
that the size of the total document file can be reduced by between 30 and 50 per cent.
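
As a rough illustration of the stop-list comparison described above, the following Python sketch filters tokens against a stop list. The stop list contents, the tokenizing regular expression and the function names (tokenize, remove_stop_words, reduction_ratio) are assumptions made for this example and are not taken from the original write-up.

```python
import re

# Hypothetical, deliberately tiny stop list; a real one would be read from a file.
STOP_WORDS = {"a", "an", "and", "are", "by", "in", "is", "of", "the", "to", "which"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not in the stop list, preserving order."""
    return [t for t in tokens if t not in stop_words]


def reduction_ratio(tokens, kept):
    """Fraction of tokens removed by the stop list (0.0 for empty input)."""
    return 0.0 if not tokens else 1 - len(kept) / len(tokens)


if __name__ == "__main__":
    sample = "The removal of high frequency words is one way of implementing the cut-off"
    tokens = tokenize(sample)
    kept = remove_stop_words(tokens)
    print(kept)
    print(f"removed {reduction_ratio(tokens, kept):.0%} of the tokens")
```

The ratio printed for a single sentence will vary widely; the 30 to 50 per cent figure quoted above refers to the size of a whole document file.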

INPUT:
1. A text file containing stop words
2. A document to be searched and indexed according to word frequency

OUTPUT:
A document containing the frequently appearing words, with stop words removed and
suffixes stripped (stemming applied).

OPERATIONAL STEPS REQUIRED: The conflation algorithm requires the following steps:
1) Removal of High Frequency words.
2) Suffix Stripping (Stemming).
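
The two steps above, followed by a word-frequency count, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the suffix list is a naive, hypothetical one chosen for illustration (it is not Porter's algorithm or any other published stemmer), and the names strip_suffix, conflate and load_stop_words are invented for this example.

```python
import re
from collections import Counter

# Hypothetical suffix list, ordered longest first so that "ations" is tried before "s".
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "ed", "es", "s")


def strip_suffix(word, min_stem=3):
    """Remove the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def load_stop_words(path):
    """Read a stop list from a text file with one word per line (assumed layout)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def conflate(text, stop_words):
    """Return stem frequencies for the words in text that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 1: drop stop words. Step 2: strip suffixes from what remains.
    stems = (strip_suffix(t) for t in tokens if t not in stop_words)
    return Counter(stems)


if __name__ == "__main__":
    text = "Indexing the indexed documents and the document indexes"
    index = conflate(text, {"the", "and"})
    for stem, count in index.most_common():
        print(stem, count)
```

A crude suffix list like this one will both over-strip and under-strip real words; it is only meant to show where stemming sits in the pipeline relative to stop word removal.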

ADVANTAGES:
The advantages of the algorithm are not only that non-significant words are removed and will therefore not interfere during retrieval, but also that the size of the total document file can be reduced by between 30 and 50 per cent.
