13-11-2012, 01:58 PM
Optimizing Bloom Filter Settings in Peer-to-Peer Multi keyword Searching
Abstract
Peer-to-Peer multi keyword searching requires
distributed intersection/union operations across wide
area networks,
raising a large amount of traffic cost. Existing schemes
commonly utilize Bloom Filters (BFs) encoding to
effectively
reduce the traffic cost during the intersection/union
operations. In this paper, we address the problem of
optimizing the settings of a BF. We show, through
mathematical proof, that the optimal setting of BF in
terms of traffic cost is determined by the statistical
information of the involved inverted lists, not the
minimized false positive rate as claimed by previous
studies. Through numerical analysis, we demonstrate
how to obtain optimal settings. To better evaluate the
performance of this design, we conduct comprehensive
simulations on TREC WT10G test collection and query
logs of a major commercial web search engine. Results
show that our design significantly reduces the search
traffic and latency of the existing approaches.