31-05-2012, 02:00 PM
FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table Based FPGA Designs
FlowMap An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup.....pdf (Size: 1.21 MB / Downloads: 35)
Abstract The field programmable gate-array (FPGA) has
become an important technology in VLSI ASIC designs. In the
past a few years, a number of heuristic algorithms have been
proposed for technology mapping in lookup-table (LUT) based
FPGA designs, but none of them guarantees optimal solutions for
general Boolean networks and Little is known about how far their
solutions are away h m th e optimal ones. This paper presents a
theoretical breakthrough which shows that the LUT-based FPGA
technology mapping problem for depth minimization can be
solved optimally in polynomial time. A key step in our algorithm
is to compute a minimum height K-feasible cut in a network,
which is solved optimally in polynomial time based on network
flow computation. Our algorithm also effectively minimizes the
number of LUT’s by maximizing the volume of each cut and
by several post-processing operations. Based on these results, we
have implemented an LUT-based FPGA mapping package called
FlowMap. We have tested FlowMap on a large set of benchmark
examples and compared it with other LUT-based FPGA mapping
algorithms for delay optimization, including Chortle-d, MIS-pgadelay,
and DAG-Map. FlowMap reduces the LUT network depth
by up to 7% and reduces the number of LUT’s by up to 50%
compared to the three previous methods.
I. INTRODUCTION
TH E SHORT DESIGN cycle and low manufacturing cost have made FPGA an important technology for VLSI
ASIC designs. The LUT-based FPGA architecture is a popular
architecture used by several FPGA manufacturers, including
Xilinx and AT&T [15], [28]. In an LUT-based FFGA chip,
the basic programmable logic block is a K-input lookup table
(K-LUT) which can implement any Boolean function of up
to K variables. The technology mapping problem in LUTbased
FPGA designs is to cover a general Boolean network
(obtained by technology independent synthesis) using KLUT’s
to obtain a functionally equivalent K-LUT network.
This paper studies the LUT-based FFGA technology mapping
problem for delay optimization.
The previous LUT-based FPGA mapping algorithms can
be roughly divided into three classes. The algorithms in the
first class emphasize on minimizing the number of LUT’s
in the mapping solutions. This class includes MIS-pga and
its enhancement, MIS-pga-new, by Murgai et al. based on
Manuscript received September 28, 1992; revised April 30, 1993. This
research was supported in part by the NSF under Grant MIP-9110511. Xilinx
Inc. and the State of California MICRO F’rogram under Grant 92-030. This
paper was recommended by Associate Editor L. Trevillyan.
The authors are with the Department of Computer Science, University of
California, Los Angeles, CA 90024.
IEEE Log Number 9212334.
several synthesis techniques [20], [22], Chortle and Chortlecrf
by Francis et al. based on tree decomposition and bin
packing techniques [l 13, [14], Xmap by Karplus based on
the if-then-else DAG representation [17], the algorithm by
Woo based on the notion of edge visibility [27], and the
work by Sawkar and Thomas based on the clique partitioning
approach [24]. The algorithms in the second class emphasize
on minimizing the delay of the mapping solutions. This class
includes MIS-pga-delay by Murgai et al. which combines
the technology mapping with layout synthesis [21], Chortled
by Francis et al. which minimizes the depth increase at
each bin packing step [12], and DAG-Map by Cong et al.
[3], [7] based on Lawler’s labeling algorithm. The mapping
algorithms in the third class, including that proposed by Bhat
and Hill [l], and that by Schlag, Kong, and Chan [26], have
the objective of maximizing the routability of the mapping
solutions. Although many existing mapping methods showed
encouraging results, these methods are heuristic in nature, and
it is difficult to determine how far the mapping solutions
of these algorithms are away from the optimal solution.’ It
has been of both theoretical and practical interest to CAD
researchers to develop optimal FPGA mapping algorithms for
general Boolean networks.
This paper presents a theoretical breakthrough which shows
that the LUT-based FPGA technology mapping problem for
depth minimization can be solved optimally in polynomial
time for general K-bounded Boolean networks. A key step
in our algorithm is to compute a minimum height K-feasible
cut in a network, which is solved optimally in polynomial time
based on efficient network flow computation. Our algorithm
also effectively minimizes the number of LUT’s by maximizing
the volume of each cut and by several post-processing
operations. Based on these results, we have implemented an
JT-based FPGA mapping package named FlowMap. We
have tested FlowMap on a set of benchmark examples and
compared it with other LUT-based FPGA mapping algorithms
for delay optimization, including Chortle-d, MIS-pga-delay,
and DAG-Map. FlowMap reduces the LUT network depth by
up to 7% and reduces the number of LUT’s by up to 50%
compared to the three previous methods.
’ Some previous algorithms achieve optimal mapping for restricted problem
domains: Chortle is optimal when the input network is a tree, Chortle-crf and
Chortle-d are optimal when the input network is a tree and h’ 5 6, and DAGMap
is optimal when the mapping constraint is monotone, which is true for
trees.
0278-0070/94$04.00 0 1994 IEEE
2 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 13, NO. 1, JANUARY 1994
Fig. 1. Mapping a Boolean network to a K-LUT network (K = 3).
Our result makes a sharp contrast with the fact that the
conventional technology mapping problem in library-based
designs is “-hard for general Boolean networks [9], [18].
Due to the inherent difficulty, most conventional technology
mapping algorithms decompose the input network into a
forest of trees and then map each tree optimally [9], [18].
Such a methodology was also used in some existing FPGA
mapping algorithms [ll], [12], [14]. However, the result in
this paper shows that K-LUT mapping can be carried out
directly on general K-bounded Boolean networks to achieve
depth-optimal solutions.
The remainder of this paper is organized as follows. Section
I1 gives a precise problem formulation and some preliminaries.
Section I11 presents our depth-optimal technology mapping
algorithm for LUT-based FPGA designs. Section IV describes
several enhancements of our algorithm for area minimization.
Experimental results and comparative study are presented in
Section V. Extensions and conclusions are presented in Section
VI.
11. PROBLEM FORMULATIOANN D PRELIMINARIES
A Boolean network can be represented as a directed acyclic
graph (DAG) where each node represents a logic gate? and a
directed edge (z,j) exists if the qutput of gate i is an input
of gate j . A primary input (PI) node has no incoming edge
and a primary output (PO) node has no outgoing edge. We use
input(v) to denote the set of nodes which are fanins of gate
U. Given a subgraph H of the Boolean network, input(H)
denotes the set of distinct nodes outside H which supply
inputs to the gates in H. For a node v in the network, a Kfeasible
cone at v, denoted C,, is a subgraph consisting of U
and its predecessors3 such that linput(C,) I 5 K and any path
connecting a node in C, and v lies entirely in C,. The level
of a node v is the length of the longest path from any PI node
to U. The level of a PI node is zero. The depth of a network
is the largest node level in the network. A Boolean network
is K-bounded if linput(v)I 5 K for each node U.
We assume that each programmable logic block in an
FPGA is a K-input one-input lookup-table (K-LUT) that
can implement any K-input Boolean function. Thus, each
’In the rest of the paper gates and nodes are used interchangeably for
Boolean networks.
3~ is a predecessor of 2) if there is a directed path from U to v.
K-LUT can implement any K-feasible cone of a Boolean
network. The technology mapping problem for K-LUT based
FPGA’s is to cover a given K-bounded Boolean network
with K-feasible cones, or equivalently, K-LUT’s4. shows an
example of mapping a Boolean network into a 3-LUT network.
Note that we allow these cones to overlap, which means that
the nodes in the overlapped region can be duplicated when
generating K-LUT’s. In fact, our algorithm is capable of
duplicating nodes automatically when necessary, in order to
achieve depth optimization. A technology mapping solution
S is a DAG in which each node is a K-feasible cone
(equivalently, a K-LUT) and the edge (C,,C,)e xists if U
is in input(C,). Our main objective is to compute a mapping
solution that results in the minimum delay.
The delay of an FPGA circuit is determined by two factors:
the delay in K-LUT’s and the delay in the interconnection
paths. Each K-LUT contributes a constant delay (the access
time of a K-LUT) independent of the function it implements.
Since layout information is not available at this stage, we
assume that each edge in the mapping solution contributes
a constant delay. In this case, the delay is determined by the
depth of the mapping solution, which is known as the unit
delay model. We say that a mapping solution is optimal if its
depth is minimum. The primary objective of our algorithm
is to find an optimal mapping solution in terms of depth
minimization, and the secondary objective is to reduce the
number of K-LUT’s used in the technology mapping solution.
Several concepts about cuts in a network will be used in
our algorithm. Given a network N = (V(N)E, (N) )w ith a
source 3 and a sink t, a cut ( X , x ) is a partition of the nodes
in V(N) such that s E X and t E x. The node cut-size of
( X , x ) , denoted n(X,X), is the number of nodes in X that
are adjacent to some node in x, i.e.,
n(X,X) = I{. : (z,y) E E(N),x E X and y E x}l
A cut ( X , X ) is K-feasible if n(X,X) 5 K. Assume that
each edge (u,u) has a non-negative capacity c(u,v). The
edge cut-size of ( X , Y ) , denoted e ( X , X ) , is the sum of the
41f the input network is not K-bounded, it may not be covered with KLUT’s
directly. In this case, nodes in the network with more than K fanins
may have to be decomposed before covering. However, we consider such a
decomposition step as part of the synthesis procedure.
CONG AND DING: FLOWMAP AN OPTIMAL TECHNOLOGY MAPPING ALGORITHM
Fig. 2. A 3-feasible cut of edge cut-size 10, volume 9, and height 2.
capacities of the forward edges that cross the cut, i.e.,
Throughout this paper, we assume that the capacity of each
edge is one unless specified otherwise. The volume of a cut
( X ,y ),d enoted woZ(X,X),i s the number of nodes in x,i. e.,
woZ(X, x) = 1x1. Moreover, assume that there is a given label
E(w) associated with each node 'U. The height of a cut ( X , x),
denoted h ( X , X ) , is defined to be the maximum label in X,
i.e.,
h ( X , X ) = max{l(x) : 5 E X }
Fig. 2 shows a cut ( X , x ) in a network with given node
labels, where n(X,X) = 3, e(Xlx) = 10, h ( X , X ) = 2,
and woZ(X,x) = 9.
111. AN OPTIMAL LUT-BASED FPGA MAPPING ALGORITHM FOR DEPTH MINIMIZATION
Our algorithm is applicable to any K-bounded Boolean
network. Given a general Boolean network as input, if it is
not K-bounded, there are a number of ways to transform
it into a K-bounded network. For example, the Roth-Karp
decomposition [23] was used in [20] to obtain a K-bounded
network. We use the algorithm DMIG presented in [3], which
is based on the Huffman coding tree construction [16], to
decompose each multiple input simple gate5 into a tree of
two-input simple gates. According to the result in [3], such
a decomposition procedure increases the network depth by
at most a small constant factor. The reason for carrying out
such a transformation is that if we think of FPGA technology
mapping as a process of packing gates into K-LUT's, then,
smaller gates will be more easily packed, and the mapping
algorithm will be able to pack more gates along critical paths to
one K-LUT, resulting smaller depth in the mapping solution.
This argument is justified by our experimental results in Table
111 shown in Section V.
In the rest of this paper, we shall assume that the input
networks are K-bounded networks. Although we transform
'We can always obtain a simple gate network by representing each complex
gate in the sum-of-products form and then replacing it with two levels of
simple gates.
Fig. 3. Constraint on the number of inputs to LUT is not monotone (I< = 3).
an input network into a network of two-input simple gates,
the optimality of our algorithm does not depend on the fact
that each node in the given Boolean network is a two-input
simple gate. The optimality of our mapping result holds as
long as the input network is a K-bounded network, in which
the gates need not to be simple.
The fundamental difficulty in the LUT-based FPGA mapping
is that the constraint on the number of inputs of a
programmable logic block is not a monotone clustering constraint.
A clustering constraint r is monotone, if knowing
that a network H satisfies r implies that any subnetwork of
H also satisfies r [19]. For example, if we assume that the
constraint for each programmable logic block is the number
of gates it may cover in the original network, it is a monotone
clustering constraint. Unfortunately, limiting the number of
distinct inputs of each programmable logic block is not a
monotone clustering constraint. For example, Fig. 3 shows
a network of three distinct inputs, which is 3-feasible. But the
subnetwork consisting of nodes t, v and 20 has four distinct
inputs, which is not 3-feasible. Clustering (or, similarly, mapping)
for a monotone clustering constraint r is much easier
because if a subnetwork H does not satisfy the constraint
r, we can conclude that H is not a part of any cluster. It
was shown that Lawler's labeling algorithm [19] can produce
a minimum depth clustering solution in polynomial time
whenever the clustering constraint is monotone. The DAGMap
algorithm developed by Cong et al. [3], [7] modified
Lawler's algorithm and applied it to the LUT-based FPGA
mapping problem. Although it achieved encouraging results
for depth minimization, it was shown that the DAG-Map
algorithm is not optimal [3].
The mapping algorithm presented in this paper successfully
overcomes the difficulty due to the nonmonotone clustering
constraint in LUT-based FPGA technology mapping. The
algorithm runs in two phases. In the first phase, it computes
a label for each node which reflects the level of the K-LUT
implementing that node in an optimal mapping solution. In the
second phase, it generates the K-LUT mapping solution based
on the node labels computed in the first phase.