SIGNAL PROCESSING TOOLS FOR SPEECH RECOGNITION

ABSTRACT

This paper describes the design and development of a set
of signal processing software tools for speech
recognition. The tools were developed for inclusion in a
comprehensive public domain speech recognition toolkit.
We describe the design philosophy underlying the
development of the tools as well as the key features that
enable realization of our design goals of modularity,
extensibility, and usability. A GUI-based configuration
tool is presented that allows complicated, multi-pass front
end algorithms to be created using a graphical editor and a
library of fundamental algorithm components. We also
discuss results of tests to verify the correctness and
usability of the tool set, including benchmarks on
SWITCHBOARD, WSJ0 and the Aurora Large Vocabulary
tasks.

INTRODUCTION

The Institute for Signal and Information Processing (ISIP)
provides a comprehensive public domain toolkit [1] for
performing speech and signal processing research. Its
differentiating features include ease of use, extensibility,
and educational components. In this paper we describe the
design and implementation of its signal processing
components, which provide a GUI-based environment to
perform signal processing research and education.
An overview of a speech recognition system is
shown in Figure 1. The tool described here deals with the
block known as the Acoustic Front-end, which
encapsulates most of the signal processing portions of a
recognition system. Signal processing tools extract feature
vectors from speech data, and play a critical role in the
development of speech recognition systems.

Algorithm Library

The algorithm library contains a collection of signal
processing algorithms implemented as a hierarchy of C++
classes. The hierarchy is rooted in an abstract base class,
AlgorithmBase, whose virtual functions comprise the
interface contract. This design is the library's single most
important feature, since it makes the library extensible.
All algorithm classes are derived from this base class.
Because it is abstract, no objects are ever instantiated
from it directly. Instead, it specifies the virtual functions
that all algorithm classes must provide, and centralizes
useful protected data common to all algorithms, such as
sample frequency and frame duration. The interface
contract is summarized in Table 1.
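
The class hierarchy described above can be illustrated with a
minimal C++ sketch. Only the name AlgorithmBase and the two
protected data items (sample frequency and frame duration) come
from the text. Table 1 is not reproduced here, so every member
function name, the two derived classes, and the run_chain helper
are hypothetical stand-ins rather than the toolkit's actual
interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Abstract base class. The name comes from the text; all member names are
// hypothetical stand-ins for the interface contract of Table 1.
class AlgorithmBase {
public:
    AlgorithmBase(double sample_frequency_hz, double frame_duration_s)
        : sample_frequency_hz_(sample_frequency_hz),
          frame_duration_s_(frame_duration_s) {
        assert(sample_frequency_hz > 0.0 && frame_duration_s > 0.0);
    }
    virtual ~AlgorithmBase() = default;

    // Pure virtual functions: they make the class abstract and force every
    // derived algorithm to supply its own implementation.
    virtual std::string name() const = 0;
    virtual std::vector<double> apply(const std::vector<double>& frame) const = 0;

    // Shared helper built on the centralized data (rounded to nearest sample).
    std::size_t samples_per_frame() const {
        return static_cast<std::size_t>(sample_frequency_hz_ * frame_duration_s_ + 0.5);
    }

protected:
    // Data common to all algorithms, as described in the text.
    double sample_frequency_hz_;
    double frame_duration_s_;
};

// Hypothetical derived algorithm: first-order pre-emphasis filter,
// y[n] = x[n] - coeff * x[n-1], with x[-1] taken as 0.
class Preemphasis : public AlgorithmBase {
public:
    Preemphasis(double fs_hz, double frame_s, double coeff)
        : AlgorithmBase(fs_hz, frame_s), coeff_(coeff) {}
    std::string name() const override { return "Preemphasis"; }
    std::vector<double> apply(const std::vector<double>& frame) const override {
        std::vector<double> out(frame.size());
        double prev = 0.0;
        for (std::size_t i = 0; i < frame.size(); ++i) {
            out[i] = frame[i] - coeff_ * prev;
            prev = frame[i];
        }
        return out;
    }

private:
    double coeff_;
};

// Hypothetical derived algorithm: frame energy (sum of squared samples),
// returned as a one-element vector.
class Energy : public AlgorithmBase {
public:
    Energy(double fs_hz, double frame_s) : AlgorithmBase(fs_hz, frame_s) {}
    std::string name() const override { return "Energy"; }
    std::vector<double> apply(const std::vector<double>& frame) const override {
        double sum = 0.0;
        for (double x : frame) sum += x * x;
        return std::vector<double>(1, sum);
    }
};

// Client code sees only the base-class interface, so a new algorithm can be
// added without changing this function.
std::vector<double> run_chain(const std::vector<const AlgorithmBase*>& chain,
                              std::vector<double> frame) {
    for (const AlgorithmBase* algorithm : chain) frame = algorithm->apply(frame);
    return frame;
}
```

In this sketch, adding an algorithm means deriving one more class
and overriding the virtual functions; run_chain and any other code
written against AlgorithmBase stay unchanged, which is the sense in
which the base class makes the library extensible.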

Signal Processing Configuration Tool

Users employ the tools and libraries in three steps. First,
the signal processing configuration tool is used to
graphically specify the sequence of algorithms and their
configuration as a block diagram. The block diagram is
saved to a recipe file, which describes it as a graph data
structure whose components each carry their own
configuration. Second, a control tool accepts the speech
data file and the recipe file produced in the first step as
input, and parses the recipe file using functions provided
by the signal processing library to obtain the necessary
information for each algorithm. Finally, the control tool
processes the input speech data by calling the
corresponding methods in the algorithm library.
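
The control flow described above can be sketched in C++ as
follows. The recipe file format is not given in the text, so the
sketch starts from an already-parsed, in-memory recipe. The
Component layout, the registry of named functions standing in for
the algorithm library, and the names run_recipe, make_registry,
"gain" and "energy" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Frame = std::vector<double>;
using Params = std::map<std::string, double>;

// One node of the block diagram (hypothetical layout).
struct Component {
    std::string algorithm;  // key used to look up the algorithm
    Params params;          // this component's own configuration
    int input;              // index of the upstream component, or -1 for the speech data
};

using AlgorithmFn = std::function<Frame(const Frame&, const Params&)>;

// Stand-in for the algorithm library: maps a name to a callable.
std::map<std::string, AlgorithmFn> make_registry() {
    std::map<std::string, AlgorithmFn> registry;
    registry["gain"] = [](const Frame& in, const Params& p) -> Frame {
        Frame out(in);
        for (double& x : out) x *= p.at("factor");
        return out;
    };
    registry["energy"] = [](const Frame& in, const Params&) -> Frame {
        double sum = 0.0;
        for (double x : in) sum += x * x;
        return Frame(1, sum);
    };
    return registry;
}

// Control step: visit components in order, feed each one either the raw
// speech data or the output of an earlier component, and keep every output.
std::vector<Frame> run_recipe(const std::vector<Component>& recipe,
                              const std::map<std::string, AlgorithmFn>& registry,
                              const Frame& speech) {
    std::vector<Frame> outputs;
    outputs.reserve(recipe.size());
    for (std::size_t i = 0; i < recipe.size(); ++i) {
        const Component& c = recipe[i];
        // Components are assumed to be listed so that inputs precede consumers.
        assert(c.input < static_cast<int>(i));
        const Frame& in =
            (c.input < 0) ? speech : outputs[static_cast<std::size_t>(c.input)];
        outputs.push_back(registry.at(c.algorithm)(in, c.params));
    }
    return outputs;
}
```

Because each component names its input by index, the recipe can
express branches as well as simple chains: two components may read
the same upstream output, which is one plausible reading of the
graph structure the text mentions.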

EXPERIMENTAL RESULTS

We tested the quality of our toolkit along two dimensions:
correctness and usability. To verify the correctness of the
computation results, we built several complex front ends,
including an industry-standard front end based on
Mel-frequency cepstrum coefficients (MFCCs) [4]. Our
testing procedure entailed first comparing the data
generated by the general-purpose tools described in this
paper to similar data generated by a prior version of our
software that has been publicly available for several years,
but contains less general implementations of many
algorithms. Because there are subtle differences in the way
the components are implemented, byte-by-byte
comparisons of the data are not always possible or
desirable.
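
When byte-by-byte comparison is not appropriate, one common
alternative is to compare feature vectors element by element
within a numerical tolerance. The C++ sketch below shows that
general technique only; the text does not say how the authors
compared the two data sets, and the function names and tolerance
parameters are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute element-wise difference between two equal-length vectors.
double max_abs_diff(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}

// True when every pair of elements agrees within a combined absolute and
// relative tolerance: |a - b| <= abs_tol + rel_tol * max(|a|, |b|).
bool features_match(const std::vector<double>& a, const std::vector<double>& b,
                    double abs_tol, double rel_tol) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = std::fabs(a[i] - b[i]);
        const double scale = std::max(std::fabs(a[i]), std::fabs(b[i]));
        if (diff > abs_tol + rel_tol * scale) return false;
    }
    return true;
}
```

The absolute term handles values near zero, where a purely
relative test would be too strict, while the relative term scales
the allowance for large values. Suitable tolerances depend on the
feature type and the arithmetic precision used.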

CONCLUSIONS

This paper has presented the signal processing
component of our public domain speech recognition
toolkit. This component was designed and developed in
adherence to our philosophy of providing a flexible,
extensible software environment for speech recognition
researchers. Our goal was to enable researchers to explore
ideas freely, unencumbered by low-level programming
issues. To achieve this goal, we implemented several
critical features in our signal processing software tools,
including a library of standard algorithms for basic DSP
functions, the ability to add new algorithms to this library
easily, and a GUI-based configuration tool for creating
block diagrams to describe algorithms, allowing rapid
prototyping without programming.
