28-09-2013, 03:25 PM
A SPURIOUS-POWER SUPPRESSION TECHNIQUE FOR MULTIMEDIA/DSP APPLICATIONS
ABSTRACT
This paper presents the design exploration and applications of a spurious-power suppression technique (SPST)
which can dramatically reduce the power dissipation of combinational VLSI designs for multimedia/DSP purposes.
The proposed SPST separates the target designs into two parts, i.e., the most significant part and least significant
part (MSP and LSP).
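
The excerpt above stops before saying what is done with the two parts. As a rough illustration only, the sketch below assumes the MSP adder can be left idle whenever the upper bytes of both operands are pure sign extension (all zeros or all ones), with the upper byte of the result rebuilt from the LSP carry. The 16-bit width, the 8/8 split and the name spst_add16 are assumptions made for this sketch, not details taken from the paper.

```python
def spst_add16(a: int, b: int):
    """Add two 16-bit two's-complement words.

    Returns (sum16, msp_adder_used). Hypothetical sketch of an MSP/LSP split;
    the 8/8 split and the detection rule are assumptions for illustration.
    """
    a &= 0xFFFF
    b &= 0xFFFF
    a_msp, a_lsp = a >> 8, a & 0xFF
    b_msp, b_lsp = b >> 8, b & 0xFF

    # The least significant part is always computed.
    lsp_sum = a_lsp + b_lsp
    carry = lsp_sum >> 8
    lsp = lsp_sum & 0xFF

    # Detection: both upper bytes are all zeros or all ones.
    if a_msp in (0x00, 0xFF) and b_msp in (0x00, 0xFF):
        # 0xFF behaves as -1 modulo 256, so the upper byte follows from the
        # carry and the number of all-ones operands; no 8-bit adder is needed.
        ones = (a_msp == 0xFF) + (b_msp == 0xFF)
        msp = (carry - ones) & 0xFF
        used = False
    else:
        msp = (a_msp + b_msp + carry) & 0xFF
        used = True
    return (msp << 8) | lsp, used
```

With small operands such as spst_add16(5, 7) the second element of the result is False, meaning the upper adder would have stayed idle in this model; with 0x1234 and 0x0101 it is True.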
INTRODUCTION
Power dissipation is recognized as a critical parameter in modern VLSI design. To satisfy Moore's law and
to produce consumer electronics with longer battery backup and less weight, low-power VLSI design is necessary. Fast
multipliers are essential parts of digital signal processing systems. The speed of the multiply operation is of great
importance in digital signal processing as well as in today's general-purpose processors, especially since media
processing took off. In the past, multiplication was generally implemented via a sequence of addition,
subtraction, and shift operations. Multiplication can be considered as a series of repeated additions. The
number to be added is the multiplicand, the number of times that it is added is the multiplier, and the result is the
product. Each step of addition generates a partial product. In most computers, the operands usually contain the same
number of bits. When the operands are interpreted as integers, the product is generally twice the length of operands
in order to preserve the information content. The repeated-addition method suggested by the arithmetic
definition is so slow that it is almost always replaced by an algorithm that makes use of positional representation. It is
possible to decompose multipliers into two parts. The first part is dedicated to the generation of partial products, and
the second one collects and adds them.
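
As a minimal sketch of the two approaches described above (unsigned operands assumed, and all function names chosen here for illustration), the repeated-addition definition can be compared with a positional shift-and-add version that is split into a partial-product generator and an accumulator:

```python
def multiply_repeated_addition(multiplicand: int, multiplier: int) -> int:
    """Add the multiplicand to itself `multiplier` times (slow: one addition per count)."""
    product = 0
    for _ in range(multiplier):
        product += multiplicand
    return product


def generate_partial_products(multiplicand: int, multiplier: int, n_bits: int) -> list:
    """Part 1: one partial product per multiplier bit, shifted to its bit weight."""
    return [
        (multiplicand << i) if (multiplier >> i) & 1 else 0
        for i in range(n_bits)
    ]


def multiply_shift_add(multiplicand: int, multiplier: int, n_bits: int = 8) -> int:
    """Part 2: collect and add the partial products (unsigned n_bits operands)."""
    mask = (1 << n_bits) - 1
    partial_products = generate_partial_products(
        multiplicand & mask, multiplier & mask, n_bits
    )
    return sum(partial_products)
```

The positional version needs only n_bits additions regardless of the multiplier's value, and for n-bit unsigned operands the result always fits in 2n bits (for 8 bits, 255 x 255 = 65025, which is below 2^16).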
TYPES OF MULTIPLIERS
Webster's dictionary defines multiplication as “a mathematical operation that at its simplest is an
abbreviated process of adding an integer to itself a specified number of times”. A number (multiplicand) is added to
itself a number of times as specified by another number (multiplier) to form a result (product). In elementary
school, students learn to multiply by placing the multiplicand on top of the multiplier. The multiplicand is then
multiplied by each digit of the multiplier beginning with the rightmost, Least Significant Digit (LSD). Intermediate
results (partial-products) are placed one atop the other, offset by one digit to align digits of the same weight. The
final product is determined by summation of all the partial-products. Although most people think of multiplication
only in base 10, this technique applies equally to any base, including binary. Figure 1.1 shows the data flow for the
basic multiplication technique just described. Each black dot represents a single digit.
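
The pencil-and-paper procedure can be written down for any base. The sketch below is only an illustration (the helper names are made up here): it produces one partial product per multiplier digit, starting from the least significant digit, with each one offset by its digit position.

```python
def to_digits(value: int, base: int) -> list:
    """Digits of a non-negative integer, least significant digit first."""
    digits = []
    while value:
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits or [0]


def schoolbook_partial_products(multiplicand: int, multiplier: int, base: int = 10) -> list:
    """One partial product per multiplier digit, shifted by its position."""
    return [
        multiplicand * digit * base ** position
        for position, digit in enumerate(to_digits(multiplier, base))
    ]


# Base 10: 123 x 45 gives the rows 615 and 4920, which sum to 5535.
rows_decimal = schoolbook_partial_products(123, 45)
# Base 2: the same routine with base=2 yields shifted copies of the multiplicand.
rows_binary = schoolbook_partial_products(0b101, 0b11, base=2)
```

Summing the returned list gives the final product in either base.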
Hardware Multipliers
Direct hardware implementations of shift-and-add multipliers can increase performance over software
synthesis, but are still quite slow. The reason is that, as each additional partial product is summed, a carry must be
propagated from the least significant bit (LSB) to the most significant bit (MSB). This carry propagation is time
consuming, and must be repeated for each partial product to be summed.
Array Multipliers:
Conventional linear array multipliers consist of rows of carry-save adders (CSA). A portion of an array
multiplier with the associated routing can be seen in Figure 1.2. In a linear array multiplier, as the data propagates
down through the array, each row of CSAs adds one additional partial product to the partial sum. Since the
intermediate partial sum is kept in a redundant, carry-save form, there is no carry propagation. This means that the
delay of an array multiplier is only dependent upon the depth of the array, and is independent of the partial-product
width. Linear array multipliers are also regular, consisting of replicated rows of CSAs. Their high performance and
regular structure have perpetuated the use of array multipliers for VLSI math co-processors and special purpose DSP
chips.
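
To make the carry-save idea concrete, here is a small word-level model, assuming unsigned operands and using Python integers in place of hardware rows (function names are illustrative). Each row compresses three words into a sum word and a carry word without propagating carries along the word; a single ordinary addition at the end resolves the redundant form.

```python
def carry_save_add(x: int, y: int, z: int):
    """3:2 compression: returns (s, c) such that x + y + z == s + c.

    Every bit position is handled independently, so no carry ripples
    across the word inside this step.
    """
    s = x ^ y ^ z                                 # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1        # per-bit carry, moved up one weight
    return s, c


def array_multiply(multiplicand: int, multiplier: int, n_bits: int = 8) -> int:
    """Unsigned multiply with one carry-save row per partial product."""
    s, c = 0, 0
    for i in range(n_bits):
        partial_product = (multiplicand << i) if (multiplier >> i) & 1 else 0
        s, c = carry_save_add(s, c, partial_product)
    # Only this final step performs a carry-propagating addition.
    return s + c
```

The loop runs once per row, which mirrors the statement that the delay depends on the depth of the array rather than on the width of the partial products.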
A radix-4 modified Booth's algorithm:
Booth's algorithm is simple but powerful. The speed of the versatile multimedia functional unit (VMFU) depends on
the number of partial products and on the speed at which they are accumulated. Booth's algorithm reduces the number
of partial products. We choose the radix-4 algorithm for the following reasons.
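
One property of radix-4 recoding that can be shown directly is the halving of the partial-product count. The sketch below uses the standard textbook radix-4 modified Booth recoding, not necessarily the exact circuit of the paper; the function names and the default 16-bit width are assumptions. Each overlapping group of three multiplier bits becomes one digit in {-2, -1, 0, 1, 2}, so an n-bit multiplier yields n/2 partial products.

```python
def booth_radix4_digits(multiplier: int, n_bits: int = 16) -> list:
    """Recode an n_bits two's-complement multiplier into n_bits // 2 digits in -2..2."""
    assert n_bits % 2 == 0, "this sketch assumes an even word length"
    y = multiplier & ((1 << n_bits) - 1)
    y_ext = y << 1                      # append the implicit bit b[-1] = 0
    digits = []
    for i in range(n_bits // 2):
        triplet = (y_ext >> (2 * i)) & 0b111   # bits b[2i+1], b[2i], b[2i-1]
        b_hi = (triplet >> 2) & 1
        b_mid = (triplet >> 1) & 1
        b_lo = triplet & 1
        digits.append(-2 * b_hi + b_mid + b_lo)
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n_bits: int = 16) -> int:
    """Signed multiply: one partial product per radix-4 digit, weighted by 4**i."""
    digits = booth_radix4_digits(multiplier, n_bits)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))
```

For a 16-bit multiplier this gives 8 digits instead of 16 one-bit rows, and every digit only requires selecting 0, the multiplicand, or twice the multiplicand (a shift), possibly negated.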
Circuit Design Features:
One of the most advanced types of MAC for general-purpose digital signal processing has been proposed
by Elguibaly. It is an architecture in which accumulation has been combined with the carry save adder (CSA) tree
that compresses partial products. In that architecture, the critical path was reduced by eliminating the
adder for accumulation and decreasing the number of input bits in the final adder. While it has a better performance
because of the reduced critical path compared to the previous VMFU architectures, there is a need to improve the
output rate due to the use of the final adder results for accumulation. An architecture that merges the adder block
with the accumulator register in the VMFU operator was proposed to allow two separate N/2-bit adders, instead of one
N-bit adder, to accumulate the MAC results. Recently, Zicari proposed an architecture that uses a merging technique
to fully utilize the 4–2 compressor. It also takes this compressor as the basic building block for
the multiplication circuit.
CONCLUSION
This work presents a versatile multimedia functional unit (VMFU) designed with a low-power technique called
SPST. It is built around a 16x16 multiplier-accumulator (MAC) and supports addition, subtraction, sum of absolute
differences, and interpolation. A
Radix-2 Modified Booth multiplier circuit is used for the MAC architecture. Compared to other circuits, the Booth
multiplier has the highest operational speed and a lower hardware count. The basic building blocks of the VMFU
are identified and each block is analyzed for its performance. Power and delay are calculated for the blocks.
The MAC unit is designed with an enable signal to reduce the total power consumption, based on a block-enable
technique. Using this block, the N-bit MAC unit is constructed and its total power consumption is calculated.