26-11-2012, 06:12 PM
Code Size Reduction Technique and Implementation for Software-Pipelined DSP Applications
Code Size Reduction.pdf (Size: 231.37 KB / Downloads: 29)
Abstract
Software pipelining technique is extensively used to explore the instruction level parallelism in
loops. However, this performance optimization technique results in code size expansion. For embedded
systems with very limited on-chip memory resources, the code size becomes one of the most
important optimization concerns. This paper presents the fundamental understanding of the relationship
between code size expansion and software pipelining based on retiming. We propose a general
Code-size REDuction technique (CRED) for software-pipelined loops on various kinds of processors.
Our CRED algorithms integrate the code size reduction procedure with software pipelining to produce
minimal code size for a target schedule length. The experiments on a set of well-known benchmarks
show the effectiveness of CRED technique on both reducing the code size of software-pipelined loops
and exploring the code size/performance trade-off space.
Introduction
Software pipelining is extensively used to exploit instruction level parallelism in loops [2–4,8,9,11,12,17–
20]. Although this performance optimization technique helps to achieve a compact schedule, it expands the
total code size by introducing prologue and epilogue sections, i.e, the codes executed before entering and
after leaving the new loop body. Furthermore, the size of prologue and epilogue grows proportionally as
more iterations of the loop get overlapped in the pipeline [20]. For embedded processors with very limited
on-chip memory resources, the code size expansion becomes a major concern. Consequently, making
trade-off between code size and performance for software-pipelined applications becomes an important
task for compilers targeting at embedded systems [1, 13, 22, 23].
A simple for loop and its code after applying software pipelining are shown in Figure 1(a) and Figure
1(b). The loop schedule length is reduced from four control steps to one control step for softwarepipelined
loop. However, the code size of software-pipelined loop is three times larger than the original
code size. Figure 2(a) and Figure 2(b) show the execution records of the original loop and the softwarepipelined
loop respectively.
Basic Principles
In this section, we provide an overview of the basic principles related to our code size reduction technique.
These include data flow graph, retiming, software pipelining and rotation scheduling. We demonstrate that
retiming and software pipelining are essentially the same concept. First of all, we briefly introduce the
data flow graph.
Rotation Scheduling
Rotation scheduling is a flexible technique for scheduling cyclic DFGs with resource constraints [2]. It
produces a compact schedule iteratively. In each rotation phase, it implicitly applies retiming operations
on a set of nodes, then these nodes are rescheduled to obtain a software-pipelined schedule. The effect of
this retiming on the static schedule is that the nodes are moved to a different iteration.
Figure 6(a) to Figure 8(b) illustrate the rotation scheduling progress on the program in Figure 1(b). In
the first rotation phase, node A is rotated and rescheduled as shown in Figure 6(b) and Figure 6©. The
effect on the schedule is the same as pushing the first copy of node A into prologue and the last copy of
the other nodes into epilogue. Figure 7(a) to Figure 8(b) show the second and the last rotation phases. The
resulted schedule is optimal. The 4-cycle loop is transformed into a 1-cycle loop. The pipeline depth is
four. The italic letters in the schedule show how the second copy of the original loop body are pipelined
with other copies in a new iteration. In the process of rotation scheduling, the state of rotation can be
recorded by retiming functions.
Code Size Reduction Algorithms
In this section, we present CRED algorithms. These algorithms can be used to meet various code size
reduction requests for removing prologue and epilogue totally, partially, or only removing iterations in
either prologue or epilogue. Our CRED algorithms are integrated with rotation scheduling to become a
code size-aware software pipelining procedure. The advantage of integrating code size reduction with software
pipelining is to achieve the required code size with the least possible schedule length increment. The
algorithms are illustrated for modified TMS320 processor. The CRED algorithms on the other processor
classes can be easily implemented according to our discussion in Section 3.
Total Code Size Reduction Algorithm
Algorithm 5.1 is used to totally remove the prologue and epilogue, assuming there are sufficient conditional
registers. The code size reduction is performed during rotation scheduling. Rotation scheduling generates
a software-pipelined schedule iteratively. Each rotation phase consists of four steps. The first step does
normal rotation scheduling. It tries to find a more compact schedule by rotating the first row of the initial
schedule and rescheduling the rotated nodes. The second step assigns one conditional register to guard
the nodes with
! , i.e., the nodes not being retimed. The third step detects the distinct retiming
values produced by rotation scheduling, and assigns one conditional register for each retming value. The
loop counter and the number of consumed conditional registers are updated at the same time. Note that the
inserted decrement instructions can also be rotated and rescheduled in the rotation scheduling procedure.
Thus, our algorithm can produce the minimal code size with the least schedule length increment. For
VLIW architecture, these inserted instructions can be inserted into an empty slot of the instruction word
without increasing the schedule length in most cases.