20-08-2014, 04:55 PM
Improving Offline Handwritten Text Recognition with Hybrid HMM/ANN Models
Project Report
Abstract
This paper proposes the use of hybrid Hidden Markov Model (HMM)/Artificial Neural Network (ANN) models for recognizing
unconstrained offline handwritten texts. The structural part of the optical models has been modeled with Markov chains, and a
Multilayer Perceptron is used to estimate the emission probabilities. This paper also presents new techniques to remove slope and
slant from handwritten text and to normalize the size of text images with supervised learning methods. Slope correction and size
normalization are achieved by classifying local extrema of text contours with Multilayer Perceptrons. Slant is also removed in a
nonuniform way by using Artificial Neural Networks. Experiments have been conducted on offline handwritten text lines from the IAM
database, and the recognition rates achieved, in comparison to the ones reported in the literature, are among the best for the same
task.
INTRODUCTION AND MOTIVATION
Offline handwritten text recognition is one of the most
active areas of research in computer science and it is
inherently difficult because of the high variability of writing
styles. High recognition rates are achieved in character
recognition and isolated word recognition, but we are still
far from achieving high-performance recognition systems
for unconstrained offline handwritten texts [1], [2], [3], [4],
[5], [6], [7].
PREPROCESSING AND FEATURE EXTRACTION
Handwritten image normalization from a scanned image
includes several steps, which usually begin with image
cleaning, page skew correction, and line detection [9]. A
database of skew-corrected lines has been used in all the
experiments [32]; thus page skew correction and line
detection are skipped in this work. With the handwritten
text line images, several preprocessing steps to reduce
variations in writing style are usually performed: slope and
slant removal and character size normalization.
HYBRID HMM/ANN MODELING
For small vocabulary handwriting recognition tasks (for
example, check amounts or postal addresses), it is possible to
model words individually. But, for large vocabulary or even
unconstrained tasks, the only feasible approach is to
recognize individual graphemes and map them onto
complete words belonging to a fixed vocabulary. The
same problem has to be addressed for automatic speech
recognition, and HMMs have been accepted as the standard
solution [44]. For offline handwritten text recognition, the
image is converted into a sequence $X = (x_1 \ldots x_m)$ of feature
vectors and, under the statistical approach to pattern
recognition [44], [45], the goal of general handwritten text
recognition is to find the likeliest word sequence
$W^{\star} = (w_1 \ldots w_n)$ maximizing the a posteriori probability $P(W \mid X)$.
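Written out, with $X$ the feature-vector sequence and $W^{\star}$ the likeliest word sequence, the maximization takes the following form. The Bayes decomposition shown is the usual one in the statistical approach and is given here as an assumption, not as a quotation from the report:

```latex
W^{\star} = \operatorname*{argmax}_{W} P(W \mid X)
          = \operatorname*{argmax}_{W} \frac{P(X \mid W)\, P(W)}{P(X)}
          = \operatorname*{argmax}_{W} P(X \mid W)\, P(W)
```

Here $P(X \mid W)$ is the score given by the optical models, $P(W)$ is typically supplied by a language model, and $P(X)$ can be dropped because it does not depend on $W$.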
Feature Extraction
After preprocessing, a feature extraction method is applied
to capture the most relevant characteristics of the character
to recognize. In our system, a handwritten text line image is
converted into a sequence of fixed-dimension feature
vectors. Following [10], features are extracted by applying
a grid to the image and computing three values for each cell
of the grid: the normalized gray level and the horizontal
and vertical gray level derivatives. A grid of square cells
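The grid-based extraction described above can be sketched as follows. This is a minimal, dependency-free illustration, not the method of [10]: the function names, the requirement that the cell size divides the image, and the crude half-cell difference used for the derivatives are all assumptions made for the example.

```python
def cell_mean(image, top, left, height, width):
    """Mean gray level of a rectangular region, scaled to [0, 1]."""
    total = 0
    for r in range(top, top + height):
        for c in range(left, left + width):
            total += image[r][c]
    return total / (height * width * 255.0)


def grid_features(image, cell):
    """Turn a gray-level line image into one feature vector per grid column.

    image: list of rows, each a list of ints in 0..255
    cell:  side length of the square cells (assumed even, and assumed to
           divide both image dimensions; a simplification for this sketch)
    """
    rows, cols = len(image), len(image[0])
    half = cell // 2
    sequence = []
    for left in range(0, cols, cell):      # one feature vector per column of cells
        vector = []
        for top in range(0, rows, cell):
            gray = cell_mean(image, top, left, cell, cell)
            # Crude derivative estimates (assumption): difference between the
            # mean gray levels of the two halves of the cell.
            d_horizontal = (cell_mean(image, top, left + half, cell, half)
                            - cell_mean(image, top, left, cell, half))
            d_vertical = (cell_mean(image, top + half, left, half, cell)
                          - cell_mean(image, top, left, half, cell))
            vector.extend([gray, d_horizontal, d_vertical])
        sequence.append(vector)
    return sequence


# Toy 4x4 image: dark left half, bright right half, covered by a single cell.
toy_image = [[0, 0, 255, 255] for _ in range(4)]
toy_features = grid_features(toy_image, cell=4)
```

Each column of cells yields three values per cell, so a hypothetical grid with 20 cells per column would produce 60-dimensional vectors, the dimensionality quoted in the experiments.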
MLP for Image Cleaning
As described in Section 2.1, an MLP has been used for image
cleaning by learning the appropriate filter from examples.
4.2.1 Training Data
Original noisy images from the IAM database and the same
images that were cleaned by hand formed the training pairs.
Additionally, artificially noised images (created by following
the ideas presented in [34]) were also used as training data.
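One way to turn such noisy/clean image pairs into supervised examples is sketched below. It assumes the filter maps a square window of noisy pixels to the clean value of the central pixel. The window radius, the border handling, and the function name are hypothetical choices for illustration, not details given in this report.

```python
def training_pairs(noisy, clean, radius=1):
    """Build (input window, target pixel) examples from one image pair.

    noisy, clean: same-sized images as lists of rows of gray levels in 0..255
    radius:       hypothetical context size; the window is (2*radius + 1) squared
    """
    rows, cols = len(noisy), len(noisy[0])
    pairs = []
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels reuse the nearest valid pixel.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    window.append(noisy[rr][cc] / 255.0)
            # Target: the hand-cleaned value of the central pixel.
            pairs.append((window, clean[r][c] / 255.0))
    return pairs


# Toy pair: one dark noise pixel in the noisy image, removed in the clean one.
noisy_image = [[255, 0], [255, 255]]
clean_image = [[255, 255], [255, 255]]
examples = training_pairs(noisy_image, clean_image)
```

Every pixel contributes one example, so even a small set of hand-cleaned images yields a large training set for the MLP.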
MLP for Slant Removal
As described in Section 2.3, part of the process of slant
removal needs an MLP to determine whether or not an
image has slant.
4.4.1 Training Data
The same set of 1,000 images was manually slant-corrected
in a nonuniform way by using a graphical tool. The user
specifies a series of slant angles which are interpolated for
every image column. This information is used to train the
Slant-MLP. As before, 200 images were used for validation.
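The per-column interpolation of the user-specified angles can be sketched as below. Linear interpolation is an assumption (the report only says the angles are interpolated), as are the function name and the choice to hold the nearest angle constant outside the annotated range.

```python
def interpolate_slant(anchors, width):
    """Linearly interpolate user-given slant angles to every image column.

    anchors: list of (column, angle_in_degrees) pairs with strictly
             increasing columns
    width:   number of columns in the image
    Columns outside the annotated range reuse the nearest anchor's angle.
    """
    angles = []
    for col in range(width):
        if col <= anchors[0][0]:
            angles.append(float(anchors[0][1]))
        elif col >= anchors[-1][0]:
            angles.append(float(anchors[-1][1]))
        else:
            # Find the pair of anchors that brackets this column.
            for (c0, a0), (c1, a1) in zip(anchors, anchors[1:]):
                if c0 <= col <= c1:
                    t = (col - c0) / (c1 - c0)
                    angles.append(a0 + t * (a1 - a0))
                    break
    return angles


# Two annotated columns: 10 degrees at column 0, 30 degrees at column 4.
column_angles = interpolate_slant([(0, 10), (4, 30)], width=6)
```

The resulting list holds one target angle per column, which is the form of supervision a nonuniform slant model needs.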
Experiments with Hybrid HMM/ANN Models
Hybrid HMM/ANN models, with a different number of
states and different topologies and parameters of MLP,
were tested. In all cases, the MLP input consisted of nine
consecutive feature vectors (the central feature vector and a
context of four vectors at each side). The softmax outputs
(after being divided by the prior state probabilities) were
used as emission probabilities of the states of the 78 optical
models. Thus, we trained fully connected MLPs of 540 input
units (nine 60-dimensional feature vectors). The number
of output units is determined by the total number of states
of the 78 optical models (from 78 × 6 output units for 6-state
HMMs to 78 × 9 output units for 9-state HMMs) since each
output unit of the MLP is related to one state of the HMMs.
The number of hidden units was determined empirically by
measuring the MSE on the validation set. Other parameters,
such as the learning rate and the momentum term, were
also empirically tuned with the validation data.
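The input and output arithmetic of this setup, together with the division by the priors, can be checked with a small sketch. The function names and the border padding (repeating the first or last frame) are assumptions; the report does not say how the sequence ends are handled.

```python
def stack_context(frames, context=4):
    """Concatenate each frame with `context` neighbours on each side.

    Frames beyond the sequence ends are replaced by copies of the first or
    last frame (an assumption made for this sketch).
    """
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for offset in range(-context, context + 1):
            window.extend(frames[min(max(t + offset, 0), last)])
        stacked.append(window)
    return stacked


def scaled_likelihoods(posteriors, priors):
    """Divide MLP softmax outputs P(state | x) by the state priors P(state).

    By Bayes' rule the result is proportional to p(x | state), which is what
    the HMM needs as an emission score.
    """
    return [p / q for p, q in zip(posteriors, priors)]


NUM_MODELS = 78      # optical models, as stated in the text
FEATURE_DIM = 60     # dimensionality of one feature vector

frames = [[0.0] * FEATURE_DIM for _ in range(12)]    # dummy feature vectors
mlp_inputs = stack_context(frames)
input_units = len(mlp_inputs[0])                      # 9 * 60 = 540
output_units = {states: NUM_MODELS * states for states in range(6, 10)}
```

With these numbers the MLP has 540 inputs and between 468 (6-state HMMs) and 702 (9-state HMMs) outputs. Dividing by the priors matters because the HMM decoder combines emission likelihoods, not state posteriors.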
CONCLUSIONS
In this paper, we have presented a hybrid HMM/ANN
system for recognizing unconstrained offline handwritten
text lines. The key features of the recognition system are the
novel approach to preprocessing and recognition, which
are both based on ANNs. The preprocessing is based on
using MLPs:
• to clean and enhance the images,
• to automatically classify local extrema in order to
correct the slope and to normalize the size of the text
line images, and
• to perform a nonuniform slant correction.
The recognition is based on hybrid optical HMM/ANN
models, where an MLP is used to estimate the emission
probabilities.