01-01-2013, 01:03 PM
Learn to Personalized Image Search from the Photo
Sharing Websites
INTRODUCTION
Keyword-based search is the most popular search paradigm in
today’s search market. Despite its simplicity and efficiency,
the performance of keyword-based search is far from
satisfactory. One investigation indicated a poor user experience:
on Google search, for 52% of 20,000 queries, searchers did
not find any relevant results [1]. There are two reasons for this. 1)
Queries are in general short and nonspecific, e.g., the query of
Copyright © 2011 IEEE. Personal use of this material is permitted.
However, permission to use this material for any other purposes
must be obtained from the IEEE by sending a request to
pubs-permissions@ieee.org.
This work was supported in part by the National Natural Science
Foundation of China (Grant No. 90920303, 61003161) and National
Program on Key Basic Research Project (973 Program, Project No.
2012CB316304).
J. Sang and C. Xu (corresponding author) are with the National Lab
of Pattern Recognition, Institute of Automation, Chinese Academy
of Sciences, Beijing 100190, China; and also with the China-
Singapore Institute of Digital Media, Singapore, 119615 (e-mail:
jtsang[at]nlpr.ia.ac.cn; csxu[at]nlpr.ia.ac.cn).
D. Lu is with the State Key Laboratory of Intelligent Control
and Management of Complex Systems, Institute of Automation,
Chinese Academy of Sciences, Beijing 100190, China.
RELATED WORK
In recent years, extensive efforts have focused on
personalized search. In terms of the resources leveraged,
these efforts exploit explicit user profiles [17], relevance
feedback [18], user history data (browsing logs [19],
click-through data [20], [21], social annotations [11], [8], [4],
etc.), context information [23] (time, location, etc.) and social
networks [1], [3], [16]. In terms of implementation, there are
two primary strategies [24]: query refinement and result
processing. In the following, we review the related work
according to the strategy used.
Query Refinement, also called Query Expansion, refers to
modifying the original query according to user
information. It includes augmenting the query with other terms
[18], [25] and changing the original weight of each query term
[26]. Kraft et al. [18] utilized the search context information
collected from users’ explicit feedback to enrich the query
terms. Chirita et al. [25] proposed five generic techniques for
providing expansion terms, ranging from term and expression
level analysis up to global co-occurrence statistics and external
thesauri. Teevan et al. [26], by contrast, re-assigned the weights
of the original query terms using the BM25 weighting scheme to
incorporate user interests as collected by their desktop indexes.
We do not explicitly perform query refinement in this paper.
However, mapping the queries into user-specific topic spaces
can be considered as implicit query refinement.
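The two flavors of query refinement described above (adding terms and
changing term weights) can be illustrated with a minimal Python sketch.
It is not the method of [18], [25] or [26]: expansion here uses plain
tag co-occurrence counts over a user's own annotations, and the
reweighting is a simple relative-frequency boost standing in for BM25.
All function names, parameters and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tag_sets):
    """Count how often each unordered pair of tags appears on the same item."""
    pairs = Counter()
    for tags in tag_sets:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs


def expand_query(query_terms, pairs, top_k=2):
    """Augment the query with the top_k terms that co-occur most with it."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a in query_terms and b not in query_terms:
            scores[b] += n
        elif b in query_terms and a not in query_terms:
            scores[a] += n
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_terms) + [term for term, _ in ranked[:top_k]]


def reweight_query(query_terms, user_term_counts, base_weight=1.0):
    """Boost each term by its relative frequency in the user's history.

    Assumption: a simple frequency boost, not the BM25 scheme of [26].
    """
    total = sum(user_term_counts.values()) or 1
    return {
        term: base_weight + user_term_counts.get(term, 0) / total
        for term in query_terms
    }


# Toy annotation history of one hypothetical user.
user_tag_sets = [
    ["jaguar", "car", "speed"],
    ["jaguar", "car"],
    ["jaguar", "cat", "zoo"],
]
expanded = expand_query(["jaguar"], cooccurrence_counts(user_tag_sets))
weights = reweight_query(expanded, {"jaguar": 1, "car": 3})
```

For a user who mostly tags cars, the ambiguous query "jaguar" picks up
"car" as its first expansion term and that term receives the largest
weight, which is the kind of disambiguation personalized query
refinement aims for.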
CONCLUSION AND FUTURE WORK
How to effectively utilize the rich user metadata on social
sharing websites for personalized search is both challenging
and significant. In this paper we propose a novel framework
to exploit users’ social activities, such as annotations and
participation in interest groups, for personalized image
search. Query relevance and user preference are simultaneously
integrated into the final rank list. Experiments on a
large-scale Flickr dataset show that the proposed framework
greatly outperforms the baseline.
In the future, we will improve our current work along
four directions. 1) In this paper, we only consider the simple
case of single-word queries. The construction of the topic
space, however, provides a possible way to handle
complex multi-word queries. We leave this for
future work. 2) During the user-specific topic modeling
process, the obtained user-specific topics represent the user’s
distribution over the topic space and can be considered as the user’s
interest profile. Therefore, this framework can be extended to
any application based on interest profiles. 3) For a batch of
new data (new users or new images), we directly restart the
RMTF and user-specific topic modeling process. For a
small amount of new data, however, designing an appropriate
update rule is another future direction. 4) Utilizing large tensors
raises the computation cost. We plan to turn to
parallelization (e.g. parallel MATLAB) to speed up the RMTF
convergence process. Moreover, the distributed storage mechanism
of parallelization will provide a convenient way to store very
large matrices and further reduce the storage cost.
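To make the idea of an interest profile as a distribution over topics
more concrete, here is a minimal Python sketch of integrating query
relevance and user preference into one rank list. It is only an
illustration under stated assumptions, not the ranking rule of the
proposed framework: the linear mix controlled by alpha, the cosine
similarity, and the names personalized_rank and user_profile are all
hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def personalized_rank(candidates, user_profile, alpha=0.5):
    """Rank images by a mix of query relevance and profile match.

    candidates: list of (image_id, relevance in [0, 1], topic distribution).
    user_profile: the user's distribution over the same topics.
    alpha: assumed trade-off; 1.0 ignores the profile entirely.
    """
    scored = [
        (image_id, alpha * relevance + (1 - alpha) * cosine(topics, user_profile))
        for image_id, relevance, topics in candidates
    ]
    return [image_id for image_id, _ in sorted(scored, key=lambda s: -s[1])]


# Two topics; this hypothetical user is interested only in the first one.
user_profile = [1.0, 0.0]
candidates = [
    ("a", 0.9, [0.0, 1.0]),  # more relevant to the query, off-profile
    ("b", 0.6, [1.0, 0.0]),  # less relevant, matches the profile
]
ranking = personalized_rank(candidates, user_profile, alpha=0.5)
```

With alpha=0.5 the profile-matching image "b" moves ahead of "a";
with alpha=1.0 the order falls back to pure query relevance. Any other
application that consumes interest profiles could reuse the same
profile vector in place of this toy scoring rule.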