Metrics and Information Retrieval
Introduction:
Primary Concern:
Effectiveness in the Information Retrieval (IR) field.
Information retrieval (IR) is the task of representing, storing, organizing, and offering access to information items.
To evaluate an IR system is to measure how well the system meets the information needs of the users.
This is difficult, given that the same result set might be interpreted differently by distinct users.
To deal with this problem, metrics have been defined that, on average, correlate with the preferences of a group of users.
Without proper retrieval evaluation, one cannot:
determine how well the IR system is performing;
objectively compare the performance of the IR system with that of other systems.
Retrieval evaluation is a critical and integral component of any modern IR system
Retrieval performance evaluation consists of associating a quantitative metric with the results produced by an IR system.
This metric should be directly associated with the relevance of the results to the user.
Usually, its computation requires comparing the results produced by the system with results suggested by humans for the same set of queries.
At present, a total of 44 different metrics are available.
Their classification is based on two factors: relevance and retrieval.
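As a concrete illustration of such a quantitative metric, here is a minimal Python sketch of set-based precision and recall; the document IDs and relevance judgments are invented purely for the example.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Hypothetical query: 10 documents retrieved, 5 of them among the 8 judged relevant.
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant  = ["d1", "d3", "d5", "d7", "d9", "d11", "d12", "d13"]
print(precision(retrieved, relevant))  # 5/10 = 0.5
print(recall(retrieved, relevant))     # 5/8  = 0.625
```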
Precision at N (P@N) and R-Precision:
Precision and recall are not enough for evaluating IR systems.
For example, suppose two systems each retrieve 10 documents, 5 relevant and 5 not relevant; both have precision 0.5. However, a system that ranks the 5 relevant documents first and the 5 irrelevant ones last is much better than one that ranks the 5 irrelevant documents first, because the user of the second system is forced to wade through irrelevant documents before reaching any relevant ones.
Thus, modified measures that combine precision and recall and consider the order of the retrieved documents are needed.
Some good measures are: precision at 5 retrieved documents (P@5), precision at 10 retrieved documents (P@10) or at some other cut-off point N, and R-Precision.
R-Precision is the precision at the R-th position in the ranking of the results for a query that has R known relevant documents. At that position, precision is equal to recall.
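Continuing the two-system example above, the following sketch (with hypothetical document IDs) shows how P@N and R-Precision separate two rankings that plain precision cannot:

```python
def precision_at_n(ranking, relevant, n):
    """Precision over the top-n ranked documents."""
    return sum(1 for d in ranking[:n] if d in relevant) / n

def r_precision(ranking, relevant):
    """Precision at rank R, where R = number of known relevant documents."""
    return precision_at_n(ranking, relevant, len(relevant))

relevant = {"r1", "r2", "r3", "r4", "r5"}
system_a = ["r1", "r2", "r3", "r4", "r5", "x1", "x2", "x3", "x4", "x5"]  # relevant first
system_b = ["x1", "x2", "x3", "x4", "x5", "r1", "r2", "r3", "r4", "r5"]  # relevant last

print(precision_at_n(system_a, relevant, 10))  # 0.5 -- plain precision ties them
print(precision_at_n(system_b, relevant, 10))  # 0.5
print(precision_at_n(system_a, relevant, 5))   # 1.0 -- P@5 separates them
print(precision_at_n(system_b, relevant, 5))   # 0.0
print(r_precision(system_a, relevant))         # 1.0 (here R = 5)
print(r_precision(system_b, relevant))         # 0.0
```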
Other Measures:
Relative recall: the ratio between the number of relevant documents found and the number of relevant documents the user expected to find.
Recall effort: the ratio between the number of relevant documents the user expected to find and the number of documents that had to be examined to find them.
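A minimal sketch of these two ratios, with hypothetical counts for a single search session:

```python
def relative_recall(relevant_found, relevant_expected):
    """Relevant documents found / relevant documents the user expected to find."""
    return relevant_found / relevant_expected

def recall_effort(relevant_expected, documents_examined):
    """Relevant documents expected / documents examined to find them."""
    return relevant_expected / documents_examined

# Hypothetical session: the user expected 4 relevant documents,
# found 3 of them, and had to examine 20 documents along the way.
print(relative_recall(3, 4))   # 0.75
print(recall_effort(4, 20))    # 0.2
```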
Q-measure and R-measure:
The following are some properties of Q-measure and R-measure (a computational sketch of both follows this list):
Q-measure is equal to one iff the system output is an ideal one.
R-measure is equal to one iff all of the top R documents are (at least partially) relevant. That is, it cannot tell the difference between an ideal ranked output and, say, an output that has all B-relevant (partially relevant) documents at the very top, followed by all A-relevant (relevant) ones, followed by all S-relevant (highly relevant) ones.
In a binary relevance environment, Q-measure = Average Precision (AveP) holds iff there is no relevant document below rank R, and Q-measure > AveP holds otherwise.
In a binary relevance environment, R-measure = R-Prec.
With small gain values, Q-measure behaves like AveP.
With small gain values, R-measure behaves like R-prec.
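Here is a Python sketch of both metrics, based on Sakai's published formulation of the blended ratio of cumulative gain and relevant-document count. The post gives no formulas, so this reconstruction, the S=3/A=2/B=1 gain values, and the example judgments are all assumptions for illustration.

```python
def q_and_r_measure(run_gains, all_gains):
    """Compute Q-measure and R-measure for one query.

    run_gains : gain of each document in the system's ranked output
                (0 for a non-relevant document).
    all_gains : gains of ALL judged relevant documents for the query,
                used to build the ideal ranked output.
    The run is assumed to contain at least R = len(all_gains) documents.
    """
    R = len(all_gains)
    ideal = sorted(all_gains, reverse=True)    # ideal ranked output
    cg = cg_ideal = count = 0
    blended_sum = 0.0
    r_measure = 0.0
    for r, g in enumerate(run_gains, start=1):
        cg += g                                # cumulative gain of the run
        if r <= R:
            cg_ideal += ideal[r - 1]           # cumulative gain of the ideal output
        if g > 0:                              # rank r holds a relevant document
            count += 1
            blended_sum += (cg + count) / (cg_ideal + r)   # blended ratio
        if r == R:
            r_measure = (cg + count) / (cg_ideal + r)
    return blended_sum / R, r_measure          # Q-measure, R-measure

# Hypothetical query with three relevant documents: one S (gain 3),
# one A (gain 2), one B (gain 1). The system returns them at ranks 1, 3 and 5.
q, r = q_and_r_measure([3, 0, 2, 0, 1], [3, 2, 1])
print(round(q, 3), round(r, 3))                # 0.865 0.778
```

With all gains set to 1, the blended ratio at a relevant rank r ≤ R reduces to count/r, i.e. precision at r, which is how the binary-relevance properties above (Q-measure = AveP, R-measure = R-Prec) fall out of the same code.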
IR metrics based on graded relevance are required to:
Prefer systems that return highly relevant documents to those that return partially relevant documents;
Prefer systems that have relevant documents near the top of the ranked list to those that have relevant documents near the bottom.
Q-measure is an averageable graded-relevance metric.
It is very highly correlated with AveP and is at least as stable and discriminative as AveP.
Q-measure uses recall as its basis, whereas normalized Discounted Cumulative Gain (nDCG) is rank-based.
Q-measure is more flexible than nDCG and generalized Average Precision.
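For contrast with Q-measure's recall basis, here is one common formulation of nDCG applied to the same hypothetical run as in the sketch above; nDCG has several variants, and the log2 rank discount used here is an assumption.

```python
import math

def ndcg(run_gains, all_gains, cutoff=None):
    """nDCG: DCG of the run divided by DCG of the ideal ranked output.
    Uses the common log2 rank discount; rank 1 is not discounted."""
    ideal = sorted(all_gains, reverse=True)
    if cutoff is not None:
        run_gains, ideal = run_gains[:cutoff], ideal[:cutoff]
    def dcg(gains):
        return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))
    return dcg(run_gains) / dcg(ideal)

# Same hypothetical run and judgments as in the Q-measure sketch.
print(round(ndcg([3, 0, 2, 0, 1], [3, 2, 1]), 3))   # 0.921
```

Note the structural difference: nDCG weights each gain by the document's rank alone, while Q-measure normalizes each relevant rank against the ideal cumulative gain and divides by R, the total number of relevant documents, which is what ties it to recall.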