25-02-2013, 09:16 AM
Answering General Time-Sensitive Queries
ABSTRACT
Time is an important dimension of relevance for a large number of searches, such as over blogs and news archives. So far, research on searching over such collections has largely focused on locating topically similar documents for a query. Unfortunately, topic similarity alone is not always sufficient for document ranking. In this paper, we observe that, for an important class of queries that we call time-sensitive queries, the publication time of the documents in a news archive is important and should be considered in conjunction with the topic similarity to derive the final document ranking. Earlier work has focused on improving retrieval for “recency” queries that target recent documents. We propose a more general framework for handling time-sensitive queries and we automatically identify the important time intervals that are likely to be of interest for a query. Then, we build scoring techniques that seamlessly integrate the temporal aspect into the overall ranking mechanism. We present an extensive experimental evaluation using a variety of news article data sets, including TREC data as well as real web data analyzed using the Amazon Mechanical Turk. We examine several techniques for detecting the important time intervals for a query over a news archive and for incorporating this information in the retrieval process. We show that our techniques are robust and significantly improve result quality for time-sensitive queries compared to state-of-the-art retrieval techniques.
Interaction Model
1. Client-driven interventions
Client-driven interventions protect customers from unreliable services. For example, services that miss deadlines, or that do not respond at all for an extended period, are replaced by more reliable services in future discovery operations.
2. Provider-driven interventions
Provider-driven interventions are initiated by service owners to shield themselves from malicious clients. For instance, when a client mounts a denial-of-service attack by sending many requests within a short interval, the service blocks those requests instead of processing them.
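The provider-driven case above can be illustrated with a small sliding-window request gate. This is only a sketch under assumptions: the class name RequestGate, the per-client limit, and the window length are hypothetical and do not come from the original text, which does not say how blocking is implemented.

```python
from collections import defaultdict, deque


class RequestGate:
    """Hypothetical sliding-window gate: blocks clients that send too many
    requests within a short interval (names and limits are assumptions)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Timestamps of recently accepted requests, per client.
        self._history = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True if the request should be processed, False if blocked."""
        recent = self._history[client_id]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # blocked instead of processed
        recent.append(now)
        return True


gate = RequestGate(max_requests=3, window_seconds=1.0)
decisions = [gate.allow("client-1", t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)]
```

Timestamps are passed in explicitly so the behaviour is easy to check; a real service would more likely read a monotonic clock.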
Time Interval Feedback:
Given a time-sensitive query over a news archive, our approach automatically identifies important time intervals for the query. These intervals are then used to adjust the document relevance scores by boosting the scores of documents published within the important intervals. We have implemented our system on top of Indri, a state-of-the-art search engine that combines language models and inference networks for retrieval, as well as on top of Lemur. Our system provides a web interface for searching the Newsblaster archive, an operational news archive and summarization system, and for experimenting with variations of our approach.
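As a rough illustration of the idea in this section, the sketch below marks days with unusually many matching articles as important, merges consecutive important days into intervals, and boosts documents published inside them. The threshold rule (a multiple of the mean daily count), the multiplicative boost, and all function names are assumptions made for this example; they are not taken from the paper and do not use the Indri or Lemur APIs.

```python
from datetime import date, timedelta


def important_intervals(day_counts, factor=2.0):
    """Return (start, end) runs of consecutive days whose number of matching
    articles exceeds factor * mean. The mean is taken over the days present
    in day_counts, so days with no matches should be included with count 0."""
    if not day_counts:
        return []
    mean = sum(day_counts.values()) / len(day_counts)
    hot_days = sorted(d for d, c in day_counts.items() if c > factor * mean)
    intervals = []
    for day in hot_days:
        if intervals and day - intervals[-1][1] == timedelta(days=1):
            # Extend the current interval by one day.
            intervals[-1] = (intervals[-1][0], day)
        else:
            intervals.append((day, day))
    return intervals


def boost_scores(scored_docs, intervals, boost=1.5):
    """scored_docs: list of (doc_id, score, pub_date) with positive scores.
    Multiplies the score of documents published inside an important interval
    and returns (doc_id, score) pairs sorted by descending score."""
    def is_important(pub_date):
        return any(start <= pub_date <= end for start, end in intervals)

    boosted = [
        (doc_id, score * boost if is_important(pub_date) else score)
        for doc_id, score, pub_date in scored_docs
    ]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


counts = {date(2013, 1, d): c
          for d, c in [(1, 1), (2, 10), (3, 12), (4, 1), (5, 1), (6, 1)]}
intervals = important_intervals(counts)
docs = [("a", 1.0, date(2013, 1, 5)), ("b", 0.8, date(2013, 1, 2))]
ranked = boost_scores(docs, intervals)
```

In this toy run, document "b" starts with the lower topical score but was published during the burst, so it moves ahead of "a". A multiplicative boost only makes sense for positive scores; log-probability scores would need an additive adjustment instead.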
Temporal Relevance Feedback:
We discuss several techniques to estimate the temporal relevance of a day to a query at hand. These estimation techniques use the temporal distribution of matching articles for the query to compute the probability that a day in the archive has a relevant document for the query.
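One simple way to turn the temporal distribution of matching articles into a per-day probability is a smoothed, normalized histogram of publication dates. The sketch below is an assumption for illustration only; the function name and the additive smoothing constant are hypothetical, and the techniques referred to above may be more elaborate.

```python
from collections import Counter
from datetime import date, timedelta


def day_relevance(pub_dates, start, end, smoothing=0.5):
    """Estimate, for each day in [start, end], the probability that the day
    is relevant to the query, from the publication dates of the articles
    that matched the query. Additive smoothing keeps every day non-zero."""
    n_days = (end - start).days + 1
    counts = Counter(d for d in pub_dates if start <= d <= end)
    total = sum(counts.values()) + smoothing * n_days
    return {
        start + timedelta(days=i):
            (counts[start + timedelta(days=i)] + smoothing) / total
        for i in range(n_days)
    }


matches = [date(2013, 1, 2)] * 3 + [date(2013, 1, 4)]
probs = day_relevance(matches, date(2013, 1, 1), date(2013, 1, 5))
```

The values sum to one over the chosen date range, and days with more matching articles receive proportionally more probability mass.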
Overall document ranking:
We integrate temporal relevance with state-of-the-art retrieval models, including a query likelihood model, a relevance model, a probabilistic relevance model, and a model of query expansion with pseudo-relevance feedback, to naturally process time-sensitive queries. In these models, we combine topical relevance and temporal relevance to determine the overall relevance of a document.
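To make the combination concrete, the sketch below assumes the topical component is a log-probability (as a query likelihood model would produce) and that topical and temporal relevance are treated as independent, so their log-probabilities add. Both the independence assumption and the names used here (combined_score, rank, weight, floor) are illustrative choices for this example, not a statement of how each model in the paper is extended.

```python
import math
from datetime import date


def combined_score(log_topical, p_time, weight=1.0, floor=1e-9):
    """Add the log temporal probability of the publication day to the
    topical log-score. `weight` scales the temporal component; `floor`
    avoids log(0) for days with no probability mass."""
    return log_topical + weight * math.log(max(p_time, floor))


def rank(docs, day_prob):
    """docs: list of (doc_id, log_topical, pub_date).
    day_prob: mapping from day to its estimated temporal relevance.
    Returns doc ids ordered by descending combined score."""
    scored = [
        (doc_id, combined_score(log_topical, day_prob.get(pub_date, 0.0)))
        for doc_id, log_topical, pub_date in docs
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored]


day_prob = {date(2013, 1, 1): 0.05, date(2013, 1, 2): 0.5}
docs = [("old", -2.0, date(2013, 1, 1)), ("bursty", -2.3, date(2013, 1, 2))]
order = rank(docs, day_prob)
```

Here "old" has the slightly better topical score, but "bursty" was published on a day judged far more relevant to the query and ends up ranked first. Setting weight to 0 recovers the purely topical ranking.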