The Evolution of DSP Processors

Introduction

The number and variety of products that include some
form of digital signal processing has grown dramatically
over the last five years. DSP has become a key component
in many consumer, communications, medical, and industrial
products. These products use a variety of hardware
approaches to implement DSP, ranging from the use of
off-the-shelf microprocessors to field-programmable gate
arrays (FPGAs) to custom integrated circuits (ICs). Programmable
“DSP processors,” a class of microprocessors
optimized for DSP, are a popular solution for several reasons.
In comparison to fixed-function solutions, they have
the advantage of potentially being reprogrammed in the
field, allowing product upgrades or fixes. They are often
more cost-effective (and less risky) than custom hardware,
particularly for low-volume applications, where the
development cost of custom ICs may be prohibitive. And
in comparison to other types of microprocessors, DSP
processors often have an advantage in terms of speed,
cost, and energy efficiency.

DSP Algorithms Mold DSP Architectures

From the outset, DSP processor architectures have been
molded by DSP algorithms. For nearly every feature
found in a DSP processor, there are associated DSP algorithms
whose computation is in some way eased by inclusion
of this feature. Therefore, perhaps the best way to
understand the evolution of DSP architectures is to examine
typical DSP algorithms and identify how their computational
requirements have influenced the architectures of
DSP processors. As a case study, we will consider one of
the most common signal processing algorithms, the FIR
filter.

Fast Multipliers

The FIR filter is mathematically expressed as
y(n) = Σ h(k)·x(n−k), summing k over the taps of the filter, where x
is a vector of input data, and h is a vector of filter coefficients.
For each “tap” of the filter, a data sample is multiplied
by a filter coefficient, with the result added to a
running sum for all of the taps (for an introduction to DSP
concepts and filter theory, refer to [2]). Hence, the main
component of the FIR filter algorithm is a dot product:
multiply and add, multiply and add. These operations are
not unique to the FIR filter algorithm; in fact, multiplication
(often combined with accumulation of products) is
one of the most common operations performed in signal
processing—convolution, IIR filtering, and Fourier transforms
also all involve heavy use of multiply-accumulate
operations.

Efficient Memory Accesses

Executing a MAC in every clock cycle requires more than
just a single-cycle MAC unit. It also requires the ability to
fetch the MAC instruction, a data sample, and a filter
coefficient from memory in a single cycle. Hence, good
DSP performance requires high memory bandwidth—
higher than was supported on the general-purpose microprocessors
of the early 1980s, which typically contained
a single bus connection to memory and could only make
one access per clock cycle. To address the need for
increased memory bandwidth, early DSP processors
developed different memory architectures that could support
multiple memory accesses per cycle. The most common
approach (still commonly used) was to use two or
more separate banks of memory, each of which was
accessed by its own bus and could be read or written during
every clock cycle. Often, instructions were stored in
one memory bank, while data was stored in another. With
this arrangement, the processor could fetch an instruction
and a data operand in parallel in every cycle.

Data Format

Most DSP processors use a fixed-point numeric data type
instead of the floating-point format most commonly used
in scientific applications. In a fixed-point format, the
binary point (analogous to the decimal point in base 10
math) is located at a fixed location in the data word. This
is in contrast to floating-point formats, in which numbers
are expressed using an exponent and a mantissa; the
binary point essentially “floats” based on the value of the
exponent. Floating-point formats allow a much wider
range of values to be represented, and virtually eliminate
the hazard of numeric overflow in most applications. DSP
applications typically must pay careful attention to
numeric fidelity (e.g., avoiding overflow).

Specialized Instruction Sets

DSP processor instruction sets have traditionally been
designed with two goals in mind. The first is to make maximum
use of the processor's underlying hardware, thus
increasing efficiency. The second goal is to minimize the
amount of memory space required to store DSP programs,
since DSP applications are often quite cost-sensitive and
the cost of memory contributes substantially to overall
chip and/or system cost. To accomplish the first goal, conventional
DSP processor instruction sets generally allow
the programmer to specify several parallel operations in a
single instruction, typically including one or two data
fetches from memory (along with address pointer
updates) in parallel with the main arithmetic operation.
With the second goal in mind, instructions are kept
short (thus using less program memory) by restricting
which registers can be used with which operations, and
restricting which operations can be combined in an
instruction. To further reduce the number of bits required
to encode instructions, DSP processors often offer fewer
registers than other types of processors.

The Current DSP Landscape

Conventional DSP Processors

The performance and price range among DSP processors
is very wide. In the low-cost, low-performance range are
the industry workhorses, which are based on conventional
DSP architectures. These processors are quite similar
architecturally to the original DSP processors of the early
1980s. They issue and execute one instruction per clock
cycle, and use the complex, multi-operation type of
instructions described earlier. These processors typically
include a single multiplier or MAC unit and an ALU, but
few additional execution units, if any.