22-08-2012, 03:28 PM
Report Project Datamining
datamining_2.pdf (Size: 1.16 MB / Downloads: 58)
Introduction
Given a set of images, humans generally have no problem making sense of the contents,
recognizing the objects displayed, and using associations to classify them into categories.
Automation of all but the simplest of these tasks is daunting but very much needed when looking
for specific types of images in large databases (e.g. the Internet).
We have built a system that is capable of retrieving images while taking into account the
feedback of the user requesting the images. The following chapters explain how the system works
and describe the process that led to this implementation.
Problem description
The system holds a collection of images, each of which belongs to a class (e.g. birds,
buildings, flowers).
The user selects an image from the collection and inputs it to the system. The system then returns
those images which it thinks are most similar to the given image (and its associated class).
After this first step, the user gives feedback to the system by marking those images which are from
the correct class. The system uses this feedback to improve the results in the next iteration.
These iterations continue until an acceptable number of correct images has been returned or a
specified maximum number of iterations has been reached.
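The iteration described above can be sketched roughly as follows. This is a minimal illustration, not the report's implementation: the names `retrieve` and `feedback_loop`, the Euclidean distance, the parameters `k`, `enough` and `max_iterations`, and the use of known class labels to stand in for the user's feedback are all assumptions made for the example. The query update inside the loop is only a placeholder.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, features, k):
    """Return the indices of the k images closest to the query vector."""
    ranked = sorted(range(len(features)),
                    key=lambda i: euclidean(query, features[i]))
    return ranked[:k]

def feedback_loop(query_index, features, labels, k=3, enough=3, max_iterations=5):
    """Repeat retrieval until `enough` correct images are returned or
    `max_iterations` is reached. Returns (last_results, rounds_used)."""
    query = list(features[query_index])
    target = labels[query_index]
    results = []
    for round_no in range(1, max_iterations + 1):
        results = retrieve(query, features, k)
        # Assumption: the class labels simulate the user marking correct images.
        relevant = [i for i in results if labels[i] == target]
        if len(relevant) >= enough:
            return results, round_no
        # Placeholder update: move the query to the mean of the marked images.
        if relevant:
            dims = len(query)
            query = [sum(features[i][d] for i in relevant) / len(relevant)
                     for d in range(dims)]
    return results, max_iterations
```

In a real session the `relevant` list would come from the user's selections in the GUI rather than from stored labels.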
Graphical User Interface
The Graphical User Interface (GUI) is an important part of our program because, in this
assignment, the user has to select the images that are relevant to his or her query. We chose to
use Java for the GUI because we found the MATLAB GUI development tool (GUIDE) unsuitable for
building a serious GUI.
Relevance Feedback
The first method we tried was based on Rocchio's formula. It starts out by representing all
documents (here, images) as points in an n-dimensional space, the image space D. The query image is
represented as a vector Q. After the initial run, D is split into a set of relevant images and a set
of irrelevant images, Dr and Dirr respectively. The method then updates the query vector Q using the
following formula:
Q* = Q + α Σ Dr – β Σ Dirr
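As a small worked illustration of this update, the sketch below applies the formula element-wise to plain Python lists. The function name `rocchio_update` and the default values of `alpha` and `beta` are assumptions chosen for the example; the report does not state which values were used. Note that common textbook versions of Rocchio's formula divide each sum by the size of its set, whereas this sketch follows the unnormalized form given above.

```python
def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.5):
    """Return Q* = Q + alpha * sum(Dr) - beta * sum(Dirr), element-wise.

    `relevant` and `irrelevant` are lists of feature vectors (Dr and Dirr).
    The alpha and beta defaults are illustrative, not taken from the report.
    """
    updated = list(query)
    for vec in relevant:
        for d, value in enumerate(vec):
            updated[d] += alpha * value   # pull the query toward relevant images
    for vec in irrelevant:
        for d, value in enumerate(vec):
            updated[d] -= beta * value    # push the query away from irrelevant ones
    return updated
```

For example, with Q = [1, 1], Dr = {[2, 0], [0, 2]}, Dirr = {[4, 0]}, α = 1 and β = 0.5, the updated query is [1 + 2 - 2, 1 + 2 - 0] = [1, 3].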
After some experimentation this method was rejected because it didn't produce the results we'd
hoped for. The algorithm takes a purely nearest-neighbor approach in which the query point is moved
through the image space at each iteration. This does not work if the classes are not well separated
in the feature space.
The next method assigned a weight to each individual feature instead of to each image
as a whole. Based on the relevant and the irrelevant images, a feature-weight vector was updated
to reflect the relevance or irrelevance of certain features for the currently requested image.
The idea was to have the algorithm focus on those features which were important to the
classification problem at hand and ignore features that were not important. This approach was
promising, but it did not provide the desired results either and left us stranded.
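The report does not say how the feature-weight vector was updated, so the following is only a hypothetical sketch of one way such a scheme could look. It assumes a feature deserves a high weight when the relevant and irrelevant images differ in their mean value for that feature while the relevant images agree with each other. The names `update_feature_weights` and `weighted_distance`, the scoring rule and the `eps` constant are all assumptions, not the authors' method.

```python
import math

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Population standard deviation."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def update_feature_weights(relevant, irrelevant, eps=1e-6):
    """Hypothetical rule: score = |mean_rel - mean_irr| / (std_rel + eps),
    normalized so the weights sum to 1."""
    if not relevant or not irrelevant:
        raise ValueError("need at least one relevant and one irrelevant image")
    dims = len(relevant[0])
    scores = []
    for d in range(dims):
        rel = [vec[d] for vec in relevant]
        irr = [vec[d] for vec in irrelevant]
        scores.append(abs(mean(rel) - mean(irr)) / (std(rel) + eps))
    total = sum(scores)
    if total == 0:
        return [1.0 / dims] * dims   # no feature separates the sets: stay uniform
    return [s / total for s in scores]

def weighted_distance(a, b, weights):
    """Euclidean distance in which each feature is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))
```

With such weights, features that do not help to tell the marked images apart contribute little to the distance used for the next retrieval round.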