28-05-2013, 12:32 PM
Content-Based Microscopic Image Retrieval System for Multi-Image Queries
Abstract
In this paper, we describe the design and development
of a multitiered content-based image retrieval (CBIR) system for
microscopic images utilizing a reference database that contains
images of more than one disease. The proposed CBIR system uses
a multitiered approach to classify and retrieve microscopic images
involving their specific subtypes, which are mostly difficult
to discriminate and classify. This system enables both multi-image
query and slide-level image retrieval in order to protect the semantic
consistency among the retrieved images. New weighting
terms, inspired by information retrieval (IR) theory, are defined for
multiple-image query and retrieval. The performance of the system
was tested on a dataset including 1666 imaged high power
fields extracted from 57 follicular lymphoma (FL) tissue slides
with three subtypes and 44 neuroblastoma (NB) tissue slides with
four subtypes. Each slide is semantically annotated according to
its subtype by expert pathologists. Using a leave-one-slide-out
testing scheme, the multi-image query algorithm with the proposed
weighting strategy achieves average classification accuracies of
about 93% and 86% at the first rank retrieval, outperforming
the image-level retrieval accuracy by about 38 and 26 percentage
points, for FL and NB diseases, respectively.
INTRODUCTION
Thanks to technical advances in diverse modalities
such as X-ray, computed tomography (CT), and MRI, and
their common use in clinical practice, the number of medical
images is increasing every day. These medical images provide
essential anatomical and functional information about different
body parts for detection, diagnosis, treatment planning, and
monitoring, as well as medical research and education. Exploration
and consolidation of the immense image collections
require tools to access structurally different data for research,
diagnostics, and teaching. Picture archival and communication
systems provide the hardware and software for the storage, retrieval,
and management of radiological images [1].
RELATED WORK
Most of the commercial search engines (e.g., Google, Yahoo!,
Bing Image Search) are built around a semantic search, i.e., the
user needs to type in a series of keywords and the images in
those databases are also annotated using keywords; the match
is accomplished primarily through these keywords. CBIR systems
have been developed in recent years to organize and
utilize the valuable image sources effectively and efficiently for
diverse collections of images. Most of the recent CBIR systems
in biomedicine [5], [8], [9], [20], [21] are designed to classify
and retrieve images according to the anatomical categories of
their content, i.e., head or chest X-ray images or abdominal CT
images. For example, the Automatic Search and Selection Engine
with Retrieval Tools (ASSERT) system [5] was designed
for high-resolution CT images of the lung, where each set of
features was extracted from the pathology-bearing regions. Similarly,
CBIR for CT images of three types of liver lesions was
investigated by incorporating semantic features observed by radiologists
as well as features computationally extracted from the
images [8]. Previously, a prefiltering approach [9] was proposed
to reduce the search space of query images by categorizing the
images using multiclass support vector machines (SVMs) and
fuzzy c-means clustering.
FEATURE EXTRACTION
In this section, we explain the feature extraction techniques
that we applied to the images in our database.
Low-Level Feature Extraction
There are many factors affecting the performance and accuracy
of CBIR systems, such as choosing more discriminative
features, similarity measurement criteria, query formulation,
and so on. In order to design an effective CBIR system, the
initial step in our study is to extract discriminative features from
the images in the reference database. These features will also
be calculated for query images.
One of the most discriminating characteristics of microscopic
images is color, especially when compared to most common radiological
images, which are mostly gray level. Due to the high
resolution of microscopic images, subtle changes in characteristics
of cells, combinations of cells, structures, and tissues can
also be differentiated from each other by texture characteristics.
Therefore, for our CBIR design, we make heavy use of
color and texture characteristics and extract these features using
low-level image feature extraction techniques.
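
As a rough illustration of what low-level color and texture features can look like, the sketch below computes a per-channel color histogram and a simple neighbor-contrast texture value. This excerpt does not name the specific descriptors used in the paper, so both features, the function names, and the bin count are assumptions chosen only to make the idea concrete.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each value in 0..255
Image = Sequence[Sequence[Pixel]]


def color_histogram(image: Image, bins: int = 4) -> List[float]:
    """Per-channel histogram, normalized so each channel sums to 1."""
    hist = [0.0] * (3 * bins)
    count = 0
    for row in image:
        for pixel in row:
            for channel, value in enumerate(pixel):
                # Map 0..255 onto bin indices 0..bins-1.
                index = min(value * bins // 256, bins - 1)
                hist[channel * bins + index] += 1.0
            count += 1
    if count == 0:
        return hist
    return [h / count for h in hist]


def texture_contrast(image: Image) -> float:
    """Mean absolute gray-level difference between horizontal neighbors."""
    total = 0.0
    pairs = 0
    for row in image:
        gray = [(r + g + b) / 3.0 for r, g, b in row]
        for left, right in zip(gray, gray[1:]):
            total += abs(left - right)
            pairs += 1
    return total / pairs if pairs else 0.0


def extract_features(image: Image) -> List[float]:
    """Concatenate the color histogram with the contrast scaled to 0..1."""
    return color_histogram(image) + [texture_contrast(image) / 255.0]
```

The same function would be applied to both reference and query images so that their feature vectors are directly comparable.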
TWO-TIER RETRIEVAL APPROACH FOR MULTI-IMAGE QUERIES
Our CBIR system operates at two tiers. In the first tier, the
designed classifier categorizes the query image or images into one
of the major disease types, such as FL and NB. Once the disease
category of the image is determined, the search for the query
image can be carried out among the category-relevant subtypes
in the subsequent tier. For example, when the query image belongs
to the NB disease category, the first tier filters the database
images down to NB, and the second tier searches only among the NB subtypes.
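
The two tiers can be sketched as a classify-then-search routine. The nearest-centroid classifier, the Euclidean distance, and the subtype labels used in the usage note are placeholders; this excerpt does not state which classifier or similarity measure the paper actually uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Each reference entry: (feature vector, disease label, subtype label).
Reference = Tuple[Sequence[float], str, str]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_disease(query: Sequence[float], references: List[Reference]) -> str:
    """Tier 1 (assumed): pick the disease whose mean feature vector is closest."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, disease, _ in references:
        acc = sums.setdefault(disease, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[disease] = counts.get(disease, 0) + 1
    centroids = {d: [v / counts[d] for v in acc] for d, acc in sums.items()}
    return min(centroids, key=lambda d: euclidean(query, centroids[d]))


def two_tier_retrieve(query: Sequence[float], references: List[Reference],
                      top_k: int = 3) -> Tuple[str, List[Tuple[str, float]]]:
    """Return the predicted disease and the top_k (subtype, distance) matches."""
    disease = classify_disease(query, references)
    # Tier 2: search only among references of the predicted disease.
    candidates = [r for r in references if r[1] == disease]
    ranked = sorted(candidates, key=lambda r: euclidean(query, r[0]))
    return disease, [(sub, euclidean(query, feats)) for feats, _, sub in ranked[:top_k]]
```

For instance, with hypothetical entries such as `([1.0, 1.0], "NB", "NB-a")` in the reference list, a query near that vector is first assigned to NB and is then ranked only against NB entries.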
DATASET AND EXPERIMENTAL RESULTS
Annotated Microscopic Image Dataset
Table I lists the details of the database that we used in this
study and Fig. 4 shows randomly selected sample images belonging
to different histological grades of FL cases. The number
of cropped images per slide is between 11 and 30 for FL cases
and between 7 and 35 for NB cases. For FL slides, a team of
experienced hematopathologists selected about 10 random microscopic
high power fields (HPF) to interpret the disease grade
in terms of the average number of centroblasts per HPF. Note
that, for both FL and NB, we use internationally accepted and
clinically practiced standards. For FL, our collaborating pathologists
use the World Health Organization grading system.
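
Because every slide contributes several cropped images, the leave-one-slide-out scheme mentioned in the abstract has to hold out all images of a slide together, so that no image from the query slide remains in the reference set. A minimal sketch of such a split, with hypothetical slide identifiers, is:

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_slide_out(
    slide_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_slide, reference_indices, query_indices) per slide.

    slide_ids[i] is the slide that image i was cropped from.
    """
    for held_out in sorted(set(slide_ids)):
        query = [i for i, s in enumerate(slide_ids) if s == held_out]
        reference = [i for i, s in enumerate(slide_ids) if s != held_out]
        yield held_out, reference, query
```

Each round uses the images of one slide as the multi-image query and the images of all other slides as the reference database.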
CONCLUSION
In this paper, we have presented a novel content-based microscopic
image/slide retrieval algorithm. We have demonstrated
that by using the proposed weighting scheme inspired by information retrieval (IR) theory,
the slide-level retrieval performance of the CBIR system is
considerably better than the traditional image-level retrieval accuracy
for all seven subtypes of two challenging diseases, which
have inter- and intrareading semantic variations, intraslide semantic
variations, and intersubtype visual similarities. In the
first tier, only one slide among 44 NB slides is misclassified,
and in the second tier, an improvement of about 26 percentage points
was achieved in classification accuracy at the first rank
retrieval over all diseases by using the proposed score weighting
strategy. This CBIR system can enable the user, e.g., a pathologist,
to select multiple HPF regions from a suspected tissue and
submit those images as a query to the CBIR system and retrieve
the most relevant slides with their semantic annotations with
higher accuracies. The results, achieved under those challenging
conditions, are also promising for query images that are selected
automatically and without supervision based on their HPF regions. Application
of the proposed weighting strategy, inspired by IR theory, is
not limited to microscopic images and can also be useful
for any type of multiquery search or content-based retrieval
system.
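
To make the idea of slide-level retrieval from a multi-image query more concrete, the sketch below lets every query image vote for the slides of its nearest reference images and then weights the votes with a tf-idf-like term. The weighting terms defined in the paper are not given in this excerpt, so the formula here is a stand-in borrowed from textbook IR, not the authors' method; all names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

RefImage = Tuple[Sequence[float], str]  # (feature vector, slide id)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_slides(queries: List[Sequence[float]], references: List[RefImage],
                top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank reference slides for a set of query images (assumed weighting)."""
    slide_sizes = Counter(slide for _, slide in references)
    total = len(references)
    hits: Counter = Counter()
    for q in queries:
        nearest = sorted(references, key=lambda r: distance(q, r[0]))[:top_k]
        for _, slide in nearest:
            hits[slide] += 1
    scores = {}
    for slide, count in hits.items():
        # tf-like term: share of all retrieved images that came from this slide.
        tf = count / (len(queries) * top_k)
        # idf-like term: damp slides that hold a large share of the database.
        idf = math.log(1.0 + total / slide_sizes[slide])
        scores[slide] = tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Aggregating over several query images in this way means that a single atypical field does not decide the result on its own, which is the motivation for multi-image queries described above.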