12-12-2012, 04:57 PM
AUTOMATIC SEMANTIC IMAGE ANNOTATION SYSTEM
Abstract
Most research on content-based image classification has predominantly focused on contextual information as a way to eliminate the well-known semantic gap. This paper presents a novel system with a word correlation model for learning the semantics of images, which allows us to automatically annotate an image with keywords. To exploit the context of image labels, co-occurrence between them was incorporated to ensure annotation refinement. Further, dynamic selection of annotation labels is applied instead of a fixed number. Experiments on a benchmark dataset demonstrate that the annotation system with the word correlation module achieves promising results.
Introduction
Recently, massive collections of unlabelled photos, or photos with only a few tags, have posed a great challenge for image retrieval tasks. Automatic image annotation (AIA) is one of the promising solutions to this challenge. The main idea of AIA techniques is to automatically learn semantic concept models from image samples and use these models to label new images. In a typical image annotation problem, each picture is usually associated with a number of different semantic keywords. This poses the so-called multi-label classification problem, in which each image may be associated with more than one class label. Once images are annotated with semantic labels, the image retrieval problem turns into a text retrieval task, which is also more convenient for users.
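The multi-label setting described above can be sketched with a binary indicator matrix. The vocabulary and the per-image annotations below are invented purely for illustration:

```python
import numpy as np

# Hypothetical vocabulary and annotations: each image carries several keywords.
vocab = ["sky", "sea", "beach", "tree"]
annotations = [["sky", "sea"], ["sky", "tree"], ["sea", "beach", "sky"]]

# Binary indicator matrix Y: Y[i, j] = 1 iff image i carries label j.
Y = np.zeros((len(annotations), len(vocab)), dtype=int)
for i, labels in enumerate(annotations):
    for lab in labels:
        Y[i, vocab.index(lab)] = 1

print(Y)
```

Each row may contain several ones, which is exactly what distinguishes multi-label classification from the single-label case.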
As a way to facilitate automatic image annotation, the concept of inter-relations between labels has received much research attention. The context of a picture from Flickr groups was investigated and shown to improve annotation (Ulges et al., 2010). Similarly, Sun et al. (2010) combine the annotations of similar images via a collaborative approach: similar images are retrieved with search engines, and their tags then infer, via word correlation, the annotation of the target image.
Jin et al. (2005) use WordNet for annotation refinement. WordNet is an online lexicon in which more than 150K words are hierarchically organised. Words in WordNet maintain ‘is a kind of’ or ‘is a part of’ relationships, which are used to measure similarity between words. ImageNet (Deng et al., 2009) incorporates a concept ontology such as WordNet for organising a large number of image concepts and their relevant images.
Word semantic correlation measured as the cosine between vectors representing words is applied in several annotation systems (Sun F. et al., 2010; Zhang, Ma, 2011; Yang Y. et al., 2012). The main idea is to capture label relations in a co-occurrence matrix. Gong et al. (2010) proposed a framework that uses language models to represent the word-to-word relation; this improves the performance of existing image annotation approaches that utilise probabilistic models. Inspired by ideas from this last group of research approaches, an annotation system with incorporated word correlation is proposed in this paper.
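The co-occurrence idea underlying these systems can be sketched as follows. The vocabulary, the toy image label sets and the cosine helper are all invented for illustration, not taken from the cited systems:

```python
import numpy as np

# Hypothetical training annotations (one label set per image).
vocab = ["sky", "sea", "beach", "tree", "forest"]
images = [["sky", "sea"], ["sea", "beach"], ["sky", "sea", "beach"],
          ["tree", "forest"], ["sky", "tree"]]

n = len(vocab)
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence matrix C: C[i, j] counts images where labels i and j co-occur.
C = np.zeros((n, n))
for labels in images:
    for a in labels:
        for b in labels:
            if a != b:
                C[idx[a], idx[b]] += 1

def cosine(u, v):
    """Cosine similarity between two word vectors (rows of C)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# "sea" and "beach" share images, so their context vectors should be more similar
# than those of "sea" and "forest", which never appear together.
print(cosine(C[idx["sea"]], C[idx["beach"]]),
      cosine(C[idx["sea"]], C[idx["forest"]]))
```

Rows of the co-occurrence matrix serve as context vectors, and the cosine between them gives the word-to-word correlation used for annotation refinement.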
Annotation System
In order to capture semantic relations between image labels, a novel multi-label annotation system (MLAS) with a word correlation model is proposed. Fig. 1 shows an overview of the system. Low-level features are extracted from the training images to represent each image in the learning phase, when the classifier model is learnt. Information about the correlation between labels is also obtained from the training images. In the testing phase, the classifier assigns label probabilities to a test image represented by its low-level features. These posterior probabilities are adjusted by the word correlation module, and finally the top labels are selected to describe the image.
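A minimal sketch of the adjustment and selection steps. The posteriors, the correlation weights, the mixing factor alpha and the threshold below are all invented; the paper does not specify these values or the exact combination rule:

```python
import numpy as np

# Classifier posteriors p for four labels and a symmetric word-correlation matrix W
# (made-up numbers for illustration only).
labels = ["sky", "sea", "beach", "tree"]
p = np.array([0.70, 0.40, 0.15, 0.05])          # SVM posterior probabilities
W = np.array([[1.0, 0.6, 0.3, 0.1],
              [0.6, 1.0, 0.7, 0.0],
              [0.3, 0.7, 1.0, 0.0],
              [0.1, 0.0, 0.0, 1.0]])

# Adjust each posterior by the correlation-weighted support of the other labels;
# alpha balances the classifier against the word-correlation module (assumed value).
alpha = 0.7
support = W @ p / W.sum(axis=1)
refined = alpha * p + (1 - alpha) * support

# Dynamic selection: keep every label above a threshold instead of a fixed top-k.
threshold = 0.3
selected = [labels[i] for i in np.argsort(-refined) if refined[i] >= threshold]
print(selected)
```

With these numbers only "sky" and "sea" survive the threshold; a fixed top-3 rule would also have kept the weakly supported "beach", which is the motivation for dynamic selection.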
[Fig. 1. System overview: training and testing images pass through the low-level features module into the multi-label annotation module.]
Colour is one of the most important features of images, so the Scalable Colour Descriptor (SCD), the Colour Layout Descriptor (CLD) (Manjunath et al., 2002) and a dominant colour feature (Molitorisová, 2011) are extracted as low-level features. Texture and shape are captured by the Edge Histogram Descriptor (EHD). These descriptors are suitable for describing photographs.
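The descriptors named above are MPEG-7 features. As a rough stand-in, the sketch below computes a simple quantised RGB histogram, only to illustrate turning an image into a fixed-length low-level feature vector; it is not the actual SCD/CLD/EHD computation:

```python
import numpy as np

def colour_histogram(image, bins_per_channel=4):
    """image: H x W x 3 uint8 array -> normalised histogram of length bins**3."""
    q = (image.astype(int) * bins_per_channel) // 256      # quantise each channel
    codes = q[..., 0] * bins_per_channel**2 + q[..., 1] * bins_per_channel + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins_per_channel**3)
    return hist / hist.sum()

# Synthetic 32x32 RGB image standing in for a photograph.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
feat = colour_histogram(img)
print(feat.shape)
```

The resulting fixed-length vector is what the classifier in the next step consumes, regardless of which descriptor produced it.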
For classification, a multi-class Support Vector Machine (SVM) with probabilistic output is utilised. SVMs have shown high effectiveness in high-dimensional data classification, especially when the training dataset is small, and offer state-of-the-art classification performance in many tasks, such as text classification, object recognition and image annotation. Since the SVM is a binary classifier, the “one against one” method is applied. By mapping the multi-class SVM outputs into probabilities we get the posterior probability of each image belonging to each category, denoted as
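One way to obtain such posteriors, assuming scikit-learn as the implementation (not necessarily the authors' setup): SVC trains one-against-one binary classifiers internally for multi-class data and, with probability=True, maps their outputs to class probabilities via Platt scaling. The three-class data below is synthetic:

```python
import numpy as np
from sklearn.svm import SVC

# Three toy "concept" classes with 5-dimensional feature vectors (synthetic data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(20, 5)) for c in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 20)

# probability=True enables Platt-scaled posterior estimates on top of the
# one-against-one decisions that SVC uses for multi-class problems.
clf = SVC(kernel="rbf", probability=True)
clf.fit(X, y)

# Posterior probability of a test image (feature vector) belonging to each category.
posteriors = clf.predict_proba(X[:1])
print(posteriors.round(3))
```

The row of posteriors sums to one across the categories; these are the values the word correlation module subsequently adjusts.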