Using Embedded Multipliers in Spartan-3 FPGAs pdf

**study tips** · 25-08-2017, 09:32 PM


Introduction

Spartan-3 FPGAs have a number of features to fortify the chip’s arithmetic capabilities. Carry
logic and dedicated carry routing continue to be provided as in past generations. Dedicated
AND gates in the CLBs accelerate array multiplication operations. The newest and most
significant addition is the dedicated 18x18 two’s-complement multiplier block. With 4 to 104 of
these dedicated multipliers in each device, fast arithmetic functions can be implemented with
minimal use of the general-purpose resources. In addition to the performance advantage,
dedicated multipliers require less power than CLB-based multipliers.
The embedded multipliers offer fast, efficient means to create 18-bit signed by 18-bit signed
multiplication products. The multiplier blocks share routing resources with the Block
SelectRAM™ memory, allowing for increased efficiency for many applications. Cascading of
multipliers can be implemented with additional logic resources in local Spartan-3 slices.
Applications such as signed-signed, signed-unsigned, and unsigned-unsigned multiplication;
logical, arithmetic, and barrel shifters; and two’s-complement and magnitude return are easily
implemented.
The 18-bit x 18-bit multipliers can be quickly created using the CORE Generator™ system, or
they can be instantiated (or inferred) using VHDL or Verilog.
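
As a rough illustration of the shifter and magnitude-return applications mentioned above, the following Python sketch shows the arithmetic idea only: a left shift is a multiplication by a one-hot factor 2**shift, and a magnitude return is a multiplication by +1 or -1. The function names and range checks are assumptions made for this sketch and do not come from the application note.

```python
def shift_left_via_multiply(data, shift):
    """Shift a 17-bit unsigned value left by 0..16 places with one multiply."""
    if not 0 <= data < (1 << 17):
        raise ValueError("data must fit in 17 unsigned bits")
    if not 0 <= shift <= 16:
        raise ValueError("shift must be in 0..16")
    # One-hot factor 2**shift; at most 2**16, so it is a positive
    # number when presented on an 18-bit signed input port.
    one_hot = 1 << shift
    return data * one_hot


def magnitude_via_multiply(value):
    """Return |value| for an 18-bit signed input by multiplying by +1 or -1."""
    if not -(1 << 17) <= value < (1 << 17):
        raise ValueError("value must fit in 18 signed bits")
    factor = -1 if value < 0 else 1  # selected from the sign bit
    return value * factor


if __name__ == "__main__":
    print(bin(shift_left_via_multiply(0b1011, 3)))
    print(magnitude_via_multiply(-5))
```

In hardware the one-hot factor would be produced by a small decoder in the adjacent slices; that part is not modeled here.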

Data Flow

Each embedded multiplier block (MULT18X18 primitive) supports two independent dynamic
data input ports: 18-bit signed or 17-bit unsigned. The two inputs are referred to as the
multiplicand and the multiplier, or the factors, while the output is the product. The MULT18X18
primitive is illustrated in Figure 1.
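
A minimal behavioral sketch of the data flow, written in Python, may help here. It assumes the ports carry raw 18-bit patterns that the block interprets as two's complement, and that the product is returned as a raw 36-bit pattern. The helper names `to_signed` and `mult18x18` are hypothetical and are not part of any Xilinx library.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mult18x18(a, b):
    """Model an 18x18 signed multiply on raw bit patterns.

    a, b: 18-bit input patterns (0 .. 2**18 - 1).
    Returns the 36-bit two's-complement pattern of the product.
    """
    product = to_signed(a, 18) * to_signed(b, 18)
    return product & ((1 << 36) - 1)


if __name__ == "__main__":
    # Signed use: -3 * 5
    minus_three = -3 & 0x3FFFF
    print(to_signed(mult18x18(minus_three, 5), 36))
    # Unsigned use: 17-bit values keep bit 17 at zero, so they read as positive
    print(mult18x18(0x1FFFF, 0x1FFFF) == 0x1FFFF * 0x1FFFF)
```

The second call shows why unsigned operands are limited to 17 bits: with bit 17 held at zero the signed multiplier treats the value as non-negative, so the product matches the unsigned result.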

Timing Specification

The result is generated faster for the LSBs than the MSBs, since the MSBs require more levels
of addition, so timing specifications are different for each of the 36 multiplier outputs. Designs
should use only as many output bits as are necessary. For example, if two unsigned numbers
will never have a product of $2^{35}$ or higher, the P[35] output is always zero. For any pair of signed
numbers of n bits, if you will never have $-2^{n-1} \times -2^{n-1}$, then the MSB is always identical to the
next lower-order bit (P[2n-1] = P[2n-2]). Also consider that if some outputs must have longer
routing delays, they should be put on the output LSBs to balance with the MSB delays.
For the same reason, the data input setup time for the pipelined multiplier will be shorter for the
MSBs than the LSBs, but the timing parameters do not differentiate between pins for setup
time. For additional safety margin in a design, slower inputs should be put on the MSBs. The
Reset and Clock Enable inputs have much faster setup times than any of the data inputs, and
all have zero hold times. The timing parameter name "tMULIDCK" (MULtiplier Input Data to
ClocK) is used for both the data and control inputs, but will have different values for each type.
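
The claim about the redundant MSB of a signed product can be checked exhaustively for small widths. The Python sketch below is an illustration only, with hypothetical function names. It lists every pair of n-bit signed inputs whose 2n-bit product has P[2n-1] different from P[2n-2].

```python
def product_bits(a, b, n):
    """Return the 2n-bit two's-complement pattern of a*b for n-bit signed a, b."""
    return (a * b) & ((1 << (2 * n)) - 1)


def top_two_bits_differ(a, b, n):
    """True when P[2n-1] != P[2n-2] for the product of a and b."""
    p = product_bits(a, b, n)
    msb = (p >> (2 * n - 1)) & 1
    next_bit = (p >> (2 * n - 2)) & 1
    return msb != next_bit


def exceptions(n):
    """List all n-bit signed input pairs for which the top two product bits differ."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    return [
        (a, b)
        for a in range(lo, hi + 1)
        for b in range(lo, hi + 1)
        if top_two_bits_differ(a, b, n)
    ]


if __name__ == "__main__":
    print(exceptions(4))
```

For n = 4 the only pair reported is (-8, -8), which is the most-negative-times-most-negative case described above. The same argument applies to n = 18, although an exhaustive loop at that width is impractical in plain Python.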

Multipliers in the Spartan-3 Architecture

The multipliers are located adjacent to the block RAM, making it convenient to store inputs or
results in the block memory (see Figure 4). There are two or four columns of multipliers in each
device. Where there are two columns, they have two rows of CLBs between them and the edge,
allowing the multiplier to be easily driven by CLB or IOB logic. There are four CLBs, or 16 slices
and 32 LUTs, on either side of a given multiplier block, allowing 32 input and output signals to
be connected immediately adjacent to the multiplier block. One possible high-speed layout is to
put A[15:0] on one side, B[15:0] on the other side, and intersperse the P[31:0] outputs on both
sides. For a full-size 18x18 multiplier, the extra inputs and outputs can connect to the next CLB
column. For best performance, pipeline the inputs with registers in the adjacent CLBs.

Expanding Multipliers

Multiplication using inputs with more than 18 bits is possible by decomposing the multiplication
process into smaller subprocesses. The binary representation of either input can be split at any
point, provided the proper weighting and sign of the MSBs is taken into account. Splitting off the
18 MSBs of the input makes the best use of the 18-bit signed multipliers.
For example, Figure 5 shows how a 22x16 multiplier could be implemented. The 22-bit value is
decomposed into an 18-bit signed value and a 4-bit unsigned value from the LSBs. Two partial
products are formed. The first is a 20-bit signed product, which is the result of multiplying the
16-bit signed value by the 4-bit unsigned section. The second is a 34-bit signed product, formed
by multiplying the 16-bit signed value by the 18-bit signed section. The addition process
restores the weighting of the products (note the least significant bits of the first product bypass
the addition) and forms the final 38-bit product. Since the first product is signed, the 20-bit value
needs to be sign-extended before addition. The adder itself only needs to be 34 bits, requiring
17 slices.
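
The 22x16 decomposition described above can be sketched in Python to check the arithmetic. This is a behavioral illustration under the stated split (18 signed MSBs, 4 unsigned LSBs), not a description of Figure 5 itself. The function name and the range checks are assumptions made for the sketch.

```python
def mult22x16(a, b):
    """Multiply a 22-bit signed value by a 16-bit signed value via two partial products."""
    if not -(1 << 21) <= a < (1 << 21):
        raise ValueError("a must fit in 22 signed bits")
    if not -(1 << 15) <= b < (1 << 15):
        raise ValueError("b must fit in 16 signed bits")

    a_hi = a >> 4    # 18-bit signed section (arithmetic shift keeps the sign)
    a_lo = a & 0xF   # 4-bit unsigned section from the LSBs

    p_lo = b * a_lo  # first partial product, fits in 20 signed bits
    p_hi = b * a_hi  # second partial product, fits in 34 signed bits

    # The 4 LSBs of the first product bypass the adder; the remaining
    # 16 bits are sign-extended and added to the second product.
    bypass = p_lo & 0xF
    total = p_hi + (p_lo >> 4)

    # Restore the weighting: the sum sits 4 bit positions above the bypassed bits.
    return (total << 4) + bypass


if __name__ == "__main__":
    print(mult22x16(-98765, 321) == -98765 * 321)
```

Python's `>>` on a negative integer is an arithmetic shift, which is what makes `a_hi` and `p_lo >> 4` behave like sign-extended hardware slices here.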

Design Entry

There are many options for including the Spartan-3 multiplier in a design. The library primitives
MULT18X18 and MULT18X18S described earlier can be instantiated in the schematic or HDL
code. Synthesis tools, including Xilinx XST, Synplicity Synplify, and Mentor LeonardoSpectrum,
can infer a multiplier block from the multiply operator. They will infer the MULT18X18S
when the operation is controlled by a clock for a synchronous multiplier.
LeonardoSpectrum features a pipelined multiplier that places levels of registers in the
logic to introduce parallelism and, as a result, uses CLB resources instead of the dedicated
multipliers. A specific construct in the input RTL source code is required for the
pipelined multiplier feature to take effect. See the Synthesis and Simulation Design Guide for
more information.

System Generator

The Multiplier Generator is used by the System Generator for DSP when the MULT block is
used. System Generator presents a high-level, abstract view of the design, but also
exposes key features in the underlying silicon, making it possible to build extremely
high-performance FPGA implementations. The System Generator also provides blocks for compiling
MATLAB® M-code into synthesizable HDL code. The System Generator uses the embedded
multiplier when a parallel multiplier is selected and the use of the dedicated multiplier is
checked in the System Generator interface.

MAC Cores

The CORE Generator system and the System Generator can also implement more complex
functions using the multiplier as a building block. The Multiply Accumulator (MAC) core
supports up to 32-bit inputs and optional user-defined pipelining. The options of an Embedded
or LUT Based implementation control whether the dedicated multipliers or CLB resources are
used for the function. The MAC implementation uses relatively few CLB resources beyond the
dedicated multipliers and provides flexibility that is key to matching a design to the lowest
density and lowest cost solution possible.
The MAC and MAC-based FIR filters include automatic pipeline control based on the
required system clock performance. Pipeline levels are inserted automatically according to
the design requirement to balance speed against area.
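
For readers unfamiliar with the operation, a multiply-accumulate can be sketched behaviorally in Python as below. This models only the arithmetic, not the MAC core's interface or pipelining. The function name `mac` and the default accumulator width `acc_bits=48` are assumptions chosen for illustration and are not taken from the core's documentation.

```python
def mac(samples, coeffs, acc_bits=48):
    """Accumulate sum(samples[i] * coeffs[i]) in a wrapping two's-complement accumulator.

    acc_bits is a hypothetical accumulator width used only for this sketch.
    """
    mask = (1 << acc_bits) - 1
    acc = 0
    for x, c in zip(samples, coeffs):
        acc = (acc + x * c) & mask  # one multiply and one add per input pair
    # Convert the raw accumulator pattern back to a signed value
    if acc >> (acc_bits - 1):
        return acc - (1 << acc_bits)
    return acc


if __name__ == "__main__":
    # One output sample of a 3-tap FIR filter
    print(mac([1, 2, 3], [4, 5, 6]))
```

Each loop iteration corresponds to one pass through the multiplier followed by the accumulator add, which is the step a MAC-based FIR filter repeats once per tap.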
