A VLSI Inner Product Macrocell pdf

seminar post · 29-04-2014, 12:25 PM

Abstract

Microcontrollers for embedded computer applications require a library of dedicated macrocells for specific applications. Arithmetic and basic digital signal processor (DSP) computations may be too inefficient when computed by software on the core central processing unit (CPU) of the microcontroller. Here, the architecture of a VLSI macrocell for the 8-bit ST9 microcontroller is defined and developed. The macrocell computes the inner (scalar) product of two vectors of integer numbers using the multiply/accumulate algorithm, and its arithmetic core is an integer pipeline. The macrocell fully interfaces to the ST9 environment and is optimized to achieve the maximum performance compatible with the bandwidth of the ST9 bus with the minimum consumption of silicon area. The macrocell is implemented in CMOSM5H technology (0.7 µm channel width), and its performance, measured in terms of silicon area and throughput, is evaluated.

INTRODUCTION

Microcontrollers for embedded computer applications require a library of dedicated macrocells for specific applications. Among the most frequently required functions are digital signal processing (DSP) and image processing (IP) algorithms for telecommunication applications. Such algorithms are based on a few types of fundamental arithmetic computations: addition, scaling (i.e., multiplication by a constant), multiplication, discrete convolution, inner (scalar) product of vectors, and matrix product. Some nonlinear operations (e.g., comparison) may be required too, but they are normally reducible to arithmetic operations. These algorithms work in integer arithmetic, because data (samples) are frequently of integer type at the source and, moreover, floating-point algorithms are easily reducible to algorithms working in integer arithmetic.

Arithmetic computations may be too inefficient when done by software on the core central processing unit (CPU) of the microcontroller, so a dedicated macrocell must be integrated in the microcontroller. This paper defines and develops the architecture of a VLSI macrocell for the ST9 microcontroller (see Fig. 1) [1], [2], dedicated to the computation of the inner product of two vectors of integer numbers based on the multiply/accumulate algorithm.
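
The multiply/accumulate algorithm named above can be sketched in software as a plain loop. This is a minimal C model under stated assumptions, not the macrocell's design: it assumes signed 16-bit elements and a 32-bit accumulator, and the function name `inner_product_mac` is hypothetical. Each iteration performs the one multiply and one add that a MAC pipeline would perform per element pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inner product of two signed 16-bit vectors by multiply/accumulate.
 * Assumption: the caller keeps the running sum within 32 bits, because
 * this sketch does not check the accumulator for overflow. */
int32_t inner_product_mac(const int16_t *x, const int16_t *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    int32_t acc = 0;                                     /* accumulator */
    for (size_t i = 0; i < n; i++) {
        int32_t product = (int32_t)x[i] * (int32_t)y[i]; /* multiply step */
        acc += product;                                  /* accumulate step */
    }
    return acc;
}
```

For example, the vectors (1, 2, 3) and (4, -5, 6) give 4 - 10 + 18 = 12.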

Specifications

The size and type of the data processed by the VCU (the inner product macrocell) must be programmable: the elements of the source vectors can be 8- or 16-bit unsigned or signed (two’s complement) integers, in all possible combinations, and the result of the inner product must be represented over 32 bits as an unsigned or signed integer, depending on the type of the operands. All these arithmetic situations occur frequently in applications.
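
To make the operand combinations concrete, the following sketch decodes raw 8- or 16-bit elements in any of the four formats and multiplies them into an exact sum. The enum `elem_format`, the function names, the choice of storing 8-bit elements in the low byte of a 16-bit slot, and the rule that the result is unsigned only when both vectors are unsigned are all assumptions for illustration; the text only states that the signedness of the result depends on the operand types. The overflow flag is computed on the exact final sum, a simplification that may differ from the check the VCU actually performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding of the four element formats in the specification. */
typedef enum { FMT_U8, FMT_S8, FMT_U16, FMT_S16 } elem_format;

static int format_is_signed(elem_format f) { return f == FMT_S8 || f == FMT_S16; }

/* Decode a raw element (held in a 16-bit slot) as an exact integer, using
 * explicit two's-complement sign extension for the signed formats. */
static int32_t decode_element(uint16_t raw, elem_format f)
{
    int32_t v8 = raw & 0xFF, v16 = raw;
    switch (f) {
    case FMT_U8:  return v8;
    case FMT_S8:  return v8 >= 0x80 ? v8 - 0x100 : v8;
    case FMT_U16: return v16;
    case FMT_S16: return v16 >= 0x8000 ? v16 - 0x10000 : v16;
    }
    return 0;
}

/* Exact inner product of two vectors in (possibly different) formats.
 * Sets *overflow when the final sum does not fit the 32-bit result type,
 * assumed here to be unsigned only if both operands are unsigned. */
int64_t inner_product_typed(const uint16_t *x, elem_format fx,
                            const uint16_t *y, elem_format fy,
                            size_t n, int *overflow)
{
    assert(overflow != NULL && (n == 0 || (x != NULL && y != NULL)));
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)decode_element(x[i], fx) * decode_element(y[i], fy);
    int is_signed = format_is_signed(fx) || format_is_signed(fy);
    *overflow = is_signed ? (acc < INT32_MIN || acc > INT32_MAX)
                          : (acc < 0 || acc > (int64_t)UINT32_MAX);
    return acc;
}
```

A mixed signed/unsigned product such as 0xFF (read as signed 8-bit, i.e. -1) times 255 must yield a negative value, which is why a signed result is assumed whenever either operand is signed.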

The vectors can span the whole memory or register file of the core CPU, with arbitrary base and stride; this is also required for performing matrix multiplication efficiently. Finally, the VCU must detect overflow and support suspend and resume.
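
Base-and-stride addressing is what lets one inner-product engine serve matrix multiplication: a row of a row-major matrix has stride 1, and a column has stride equal to the number of columns. The sketch below models a hypothetical job descriptor (`ip_job`, `ip_job_run`, and `mat_mul` are invented names) whose saved index and partial sum allow a computation to be suspended and resumed. It does not model overflow detection and makes no claim about how the VCU actually stores its state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job descriptor: each vector is a base address plus a
 * stride in elements, so matrix rows and columns both qualify. */
typedef struct {
    const int16_t *x_base; ptrdiff_t x_stride;
    const int16_t *y_base; ptrdiff_t y_stride;
    size_t n;      /* number of element pairs */
    size_t next;   /* index of the next pair: saved state for resume */
    int32_t acc;   /* partial sum: saved state for resume */
} ip_job;

/* Perform at most max_steps multiply/accumulate steps, then return so the
 * job can be suspended; a later call resumes from the saved index and sum.
 * Returns 1 when the whole inner product is complete. */
int ip_job_run(ip_job *job, size_t max_steps)
{
    assert(job != NULL);
    while (job->next < job->n && max_steps-- > 0) {
        ptrdiff_t i = (ptrdiff_t)job->next;
        job->acc += (int32_t)job->x_base[i * job->x_stride] *
                    job->y_base[i * job->y_stride];
        job->next++;
    }
    return job->next == job->n;
}

/* C = A * B for row-major matrices: one strided inner product per entry,
 * pairing row r of A (stride 1) with column k of B (stride cols). */
void mat_mul(const int16_t *a, const int16_t *b, int32_t *c,
             size_t rows, size_t inner, size_t cols)
{
    for (size_t r = 0; r < rows; r++)
        for (size_t k = 0; k < cols; k++) {
            ip_job job = { .x_base = a + r * inner, .x_stride = 1,
                           .y_base = b + k, .y_stride = (ptrdiff_t)cols,
                           .n = inner };
            ip_job_run(&job, SIZE_MAX);   /* run to completion */
            c[r * cols + k] = job.acc;
        }
}
```

Suspension is modeled simply by returning after `max_steps` steps: since the index and partial sum live in the descriptor, the next call continues exactly where the previous one stopped.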

The VCU is connected to the core CPU of the microcontroller as a peripheral unit and is programmable by means of a file of 16 8-bit registers, allocated in the peripheral addressing space of the core CPU (see Table I).

CONCLUSION

Comparisons with software (SW) solutions show that the VCU is far more time-efficient. For instance, compared with the best known SW routine written in ST9 machine language for computing the inner product of two vectors with 8-bit elements (unsigned or signed, but not mixed) located in memory, the VCU achieves a speed-up factor of 19. For vectors located in the register file, a speed-up factor of about 50 can be reached. A further improvement occurs when the two vectors are of mixed arithmetic type: in this case the VCU handles the mixed unsigned/signed multiplications directly, because it contains a suitable parallel multiplier, whereas the ST9 machine language lacks an appropriate instruction and requires a sequence of instructions to perform such multiplications.

It must be noted that the placement and routing of the layout were performed automatically. Manual optimization should reduce the silicon area; a reduction of about 50% is considered possible.

Future research may concentrate on further optimization of the pipeline stages, for instance reduction and accumulation, and on upscaling the VCU to larger vector elements (32 bits).