18-01-2013, 01:01 PM
A new fuzzy logic based information retrieval model
Abstract
We propose a comprehensive model of information retrieval (IR) based on Zadeh's linguistic statements. Its characteristic feature is the capability to take into account both the imprecision and the uncertainty pervading the representation of textual information. It extends earlier IR models based on fuzzy logic in the broad sense. Moreover, some techniques for obtaining quantitative representations of documents and queries are proposed.
Introduction
Generally, (textual) information retrieval deals with the storage and processing of textual information in a broad sense. A basic task is the retrieval of those documents from a collection that are relevant, i.e., that match the information needs of a user expressed as a query. Relevance may be treated as binary, i.e., a document is regarded as either relevant or irrelevant; the answer to a query is then the set of documents considered relevant. More generally, a matching degree is computed for each document as an assessment of its relevance. The answer to a query is then a list of documents in non-increasing order of matching degree.
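The two kinds of answer described above can be illustrated with a minimal Python sketch. The function names, document identifiers and degree values are hypothetical and serve only as an illustration; how the matching degrees are actually computed is the subject of the model itself and is not shown here.

```python
from typing import Dict, List, Set, Tuple


def binary_answer(relevant: Dict[str, bool]) -> Set[str]:
    """Binary relevance: the answer is the set of documents judged relevant."""
    return {doc for doc, is_relevant in relevant.items() if is_relevant}


def ranked_answer(matching: Dict[str, float]) -> List[Tuple[str, float]]:
    """Graded relevance: documents in non-increasing order of matching degree."""
    return sorted(matching.items(), key=lambda item: item[1], reverse=True)


# Hypothetical judgements and matching degrees, for illustration only.
judgements = {"d1": False, "d2": True, "d3": True}
degrees = {"d1": 0.2, "d2": 0.9, "d3": 0.5}

answer_set = binary_answer(judgements)
answer_list = ranked_answer(degrees)
```

The binary answer carries no order, whereas the ranked answer lets the user inspect the best-matching documents first.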
Fuzzy logic in the sense of Zadeh
We assume a standard notation in which a fuzzy set F in the universe U is defined by its membership function μF : U → [0, 1], μF(x) ∈ [0, 1], ∀x ∈ U, where μF(x) is the membership degree of x in the set F. The family of all fuzzy sets defined in U is denoted F(U). The membership degree has its counterpart in the truth value of multivalued logic. For example, the statement "John is young", depending on the actual age of John, denoted x, may be treated as true to a certain degree, with this truth degree (value) equated with the membership degree μF(x), where F is a fuzzy set modelling the linguistic term "young".
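As a minimal sketch of the "John is young" example, the following Python function gives one possible membership function for the term "young". The piecewise-linear shape and the breakpoints (25 and 40 years) are assumptions chosen for illustration; the text does not prescribe any particular form.

```python
def mu_young(age: float, full: float = 25.0, zero: float = 40.0) -> float:
    """Membership degree of `age` in a hypothetical fuzzy set "young".

    Assumed shape: degree 1 up to `full`, degree 0 from `zero` onward,
    and a linear decrease in between. Both breakpoints are illustrative.
    """
    if age <= full:
        return 1.0
    if age >= zero:
        return 0.0
    return (zero - age) / (zero - full)


# Truth degree of "John is young" for a few hypothetical ages of John.
truth_at_20 = mu_young(20.0)
truth_at_32_5 = mu_young(32.5)
truth_at_45 = mu_young(45.0)
```

Under these assumed breakpoints the statement is fully true for a 20-year-old, true to degree 0.5 at 32.5 years, and false at 45.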
The model
Our starting point is the classical Boolean model (cf. [1]) and its fuzzy logic based extension, notably due to Bordogna and Pasi [4]. We employ statements of the type (2) to represent documents and queries. Thus we treat the importance of keywords as a linguistic variable, and the statement "Xi IS A" is meant as a generic form of expressions exemplified by "Keyword ti is fairly important for the representation of the content of the document (query)", so that we can model imprecision concerning the actual importance of the keywords. To also capture uncertainty, the certainty-qualified statements (10) are employed.
Pragmatic aspects of the model
The question of the actual form of (14) used to represent documents and queries goes beyond the proposed model. However, we have experimented with some possible ways of determining it at two levels of abstraction. Namely, we have made some tests with: the general shapes (templates) of the membership functions of the linguistic terms appearing in (14), and ways of automatically determining concrete forms of these membership functions during the indexing process. Here we will only discuss the former. Moreover, since min in (7) does not lead to satisfactory results, we have also tested some alternatives.
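This excerpt does not reproduce (7) or name the alternatives that were tested. The Python sketch below therefore only illustrates a well-known general property of min as an aggregation operator: it ignores every argument except the smallest. The product and the arithmetic mean are arbitrary stand-ins chosen for comparison, not the operators used in the paper, and the partial degrees are hypothetical.

```python
from math import prod
from typing import Sequence


def agg_min(degrees: Sequence[float]) -> float:
    """Aggregate partial matching degrees with the minimum."""
    return min(degrees)


def agg_product(degrees: Sequence[float]) -> float:
    """Aggregate with the product (a stand-in alternative, not from the paper)."""
    return prod(degrees)


def agg_mean(degrees: Sequence[float]) -> float:
    """Aggregate with the arithmetic mean (another stand-in alternative)."""
    return sum(degrees) / len(degrees)


# Hypothetical partial degrees for two documents against a three-keyword query.
doc_a = [0.3, 0.3, 0.3]
doc_b = [0.3, 0.9, 0.9]

min_scores = (agg_min(doc_a), agg_min(doc_b))
product_scores = (agg_product(doc_a), agg_product(doc_b))
mean_scores = (agg_mean(doc_a), agg_mean(doc_b))
```

With min, both documents receive the same score although the second matches two of the three keywords much better; the other two operators rank the second document higher. Whether this is the shortcoming the authors encountered is not stated in the excerpt.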
Concluding remarks
We presented a new fuzzy logic based information retrieval model to directly represent imprecision and uncertainty of the IR processes within the formal framework of fuzzy logic. Pragmatic aspects of the proposed model are discussed. Three templates for the representation of keyword importance are proposed.