11-12-2012, 02:09 PM
Very-Long Instruction Word (VLIW) Computer Architecture
ABSTRACT
VLIW architectures are distinct from traditional RISC and CISC architectures
implemented in current mass-market microprocessors. It is important to
distinguish instruction-set architecture—the processor programming
model—from implementation—the physical chip and its characteristics.
VLIW microprocessors and superscalar implementations of traditional
instruction sets share some characteristics—multiple execution units and the
ability to execute multiple operations simultaneously. The techniques used
to achieve high performance, however, are very different because the
parallelism is explicit in VLIW instructions but must be discovered by
hardware at run time by superscalar processors.
For very high performance, VLIW implementations are simpler. Just as RISC
architectures permit simpler, cheaper high-performance implementations
than do CISCs, VLIW architectures are simpler and cheaper than RISCs
because of further hardware simplifications. VLIW architectures, however,
require more compiler support.
INTRODUCTION AND MOTIVATION
Currently, in the mid 1990s, IC fabrication technology is advanced enough to allow unprecedented
implementations of computer architectures on a single chip. Also, the current rate of process advancement
allows implementations to be improved at a rate that is satisfying for most of the markets these
implementations serve. In particular, the vendors of general-purpose microprocessors are competing for
sockets in desktop personal computers (including workstations) by pushing the envelopes of clock rate (raw
operating speed) and parallel execution.
The market for desktop microprocessors is proving to be extremely dynamic. In particular, the x86 market
has surprised many observers by attaining performance levels and price/performance levels that many
thought were out of reach. The reason for the pessimism about the x86 was its architecture (instruction
set). Indeed, with the advent of RISC architectures, the x86 is now recognized as a deficient instruction set.
Instruction set compatibility is at the heart of the desktop microprocessor market. Because the application
programs that end users purchase are delivered in binary (directly executable by the microprocessor) form,
the end users’ desire to protect their software investments creates tremendous instruction-set inertia.
There is a different market, though, that is much less affected by instruction-set inertia. This market is
typically called the embedded market, and it is characterized by products containing factory-installed
software that runs on a microprocessor whose instruction set is not readily evident to the end user.
Although the vendor of the product containing the embedded microprocessor has an investment in the
embedded software, just like end users with their applications, there is considerably more freedom to
migrate embedded software to a new microprocessor with a different instruction set. To overcome this
lower level of instruction-set inertia, all it takes is a sufficiently better set of implementation characteristics,
particularly absolute performance and/or price-performance.
WHY VLIW?
The key to higher performance in microprocessors for a broad range of applications is the ability to exploit
fine-grain, instruction-level parallelism. Some methods for exploiting fine-grain parallelism include:
+ pipelining
+ multiple processors
+ superscalar implementation
+ specifying multiple independent operations per instruction
Pipelining is now universally implemented in high-performance processors. Little more can be gained by
improving the implementation of a single pipeline.
Using multiple processors improves performance for only a restricted set of applications.
Superscalar implementations can improve performance for all types of applications. Superscalar (super:
beyond; scalar: one dimensional) means the ability to fetch, issue to execution units, and complete more
than one instruction at a time. Superscalar implementations are required when architectural compatibility
must be preserved, and they will be used for entrenched architectures with legacy software, such as the x86
architecture that dominates the desktop computer market.
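The run-time work a superscalar processor takes on can be illustrated with a minimal sketch. It assumes a hypothetical in-order, dual-issue machine and a toy instruction format of (opcode, destination, source, source); neither comes from the original text. The check in can_pair stands in for the dependence logic that the hardware must apply to the serial instruction stream on every cycle.

```python
# Toy model of in-order dual issue (illustrative assumption, not a real design).
# Each instruction is a tuple: (opcode, dst, src1, src2).

def can_pair(first, second):
    """Return True if `second` may issue in the same cycle as `first`."""
    dst1 = first[1]
    dst2 = second[1]
    srcs2 = second[2:]
    # `second` must not read the result of `first` (read-after-write)
    # and must not write the same register (write-after-write).
    return dst1 not in srcs2 and dst1 != dst2

def count_issue_cycles(ops):
    """Count cycles needed when at most two adjacent instructions issue together."""
    cycles = 0
    i = 0
    while i < len(ops):
        if i + 1 < len(ops) and can_pair(ops[i], ops[i + 1]):
            i += 2   # both instructions issue this cycle
        else:
            i += 1   # only the older instruction issues
        cycles += 1
    return cycles

ops = [
    ("add", "r3", "r1", "r2"),
    ("add", "r4", "r3", "r1"),   # reads r3, so it cannot pair with the first add
    ("sub", "r5", "r1", "r2"),   # independent of the instruction before it
    ("mul", "r6", "r5", "r4"),
]
cycles = count_issue_cycles(ops)   # 3 cycles for these 4 instructions
```

A real superscalar decoder performs comparisons of this kind across every instruction in its issue window, which is why the cost of the logic grows quickly with the degree of parallelism.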
Specifying multiple operations per instruction creates a very-long instruction word architecture or VLIW. A
VLIW implementation has capabilities very similar to those of a superscalar processor—issuing and
completing more than one operation at a time—with one important exception: the VLIW hardware is not
responsible for discovering opportunities to execute multiple operations concurrently. For the VLIW
implementation, the long instruction word already encodes the concurrent operations. This explicit
encoding leads to dramatically reduced hardware complexity compared to a high-degree superscalar
implementation of a RISC or CISC.
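As a rough sketch of what "the long instruction word already encodes the concurrent operations" can mean, the following models a hypothetical four-slot VLIW machine. The slot count, the Op fields, and the three opcodes are assumptions chosen for illustration and do not describe any particular product. The point to notice is that execute_word performs no dependence checking at all: it trusts that the operations in one word are independent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Op:
    opcode: str   # "add", "sub", or "mul" in this toy machine
    dst: str
    src1: str
    src2: str

# One long instruction word: a fixed number of slots, one per execution unit.
# None marks an empty slot (a no-op) that the compiler could not fill.
Word = Tuple[Optional[Op], ...]

ALU = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def execute_word(word: Word, regs: Dict[str, int]) -> None:
    """Execute every operation in the word as if simultaneously."""
    # Read phase: all sources come from the register state before the word.
    results = [
        (op.dst, ALU[op.opcode](regs[op.src1], regs[op.src2]))
        for op in word
        if op is not None
    ]
    # Write phase: results are committed only after every slot has read.
    for dst, value in results:
        regs[dst] = value

regs = {"r1": 2, "r2": 3, "r3": 10, "r4": 4, "r5": 0, "r6": 0, "r7": 0}
program = [
    # Word 0: two independent operations issued together.
    (Op("add", "r5", "r1", "r2"), Op("sub", "r6", "r3", "r4"), None, None),
    # Word 1: depends on both results, so the compiler placed it in a later word.
    (Op("mul", "r7", "r5", "r6"), None, None, None),
]
for word in program:
    execute_word(word, regs)
# regs["r7"] is now 30
```

The empty slots in the second word show the cost side of the trade: when the compiler cannot find enough independent operations, part of the instruction word goes unused.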
ARCHITECTURE VS. IMPLEMENTATION
The word architecture in the context of computer science is often misused. Used accurately, architecture
refers to the instruction set and resources available to someone who writes programs. The architecture is
what is described in a definition document, often called a user’s manual. Thus, architecture contains
instruction formats, instruction semantics (operation definitions), registers, memory addressing modes,
characteristics of the address space (linear, segmented, special address regions), and anything else a
programmer would need to know.
An implementation is the hardware design that realizes the operations specified by the architecture. The
implementation determines the characteristics of a microprocessor that are most often measured: price,
performance, power consumption, heat dissipation, numbers of pins, operating frequency, and so on.
Architecture and implementation are separate, but they do interact. As many researchers into computer
architecture discovered between the mid 1970s and 1980s, architecture can have a dramatic effect on the
quality of an implementation. In the mid 1980s, IC process technology could fabricate a microcoded
implementation of a CISC instruction set and a tiny cache or MMU.
SOFTWARE INSTEAD OF HARDWARE: IMPLEMENTATION ADVANTAGES OF VLIW
A VLIW implementation achieves the same effect as a superscalar RISC or CISC implementation, but the
VLIW design does so without the two most complex parts of a high-performance superscalar design.
Because VLIW instructions explicitly specify several independent operations (that is, they explicitly specify
parallelism), it is not necessary to have decoding and dispatching hardware that tries to reconstruct
parallelism from a serial instruction stream. Instead of having hardware attempt to discover parallelism,
VLIW processors rely on the compiler that generates the VLIW code to explicitly specify parallelism. Relying
on the compiler has advantages.
First, the compiler has the ability to look at much larger windows of instructions than the hardware. For a
superscalar processor, a larger hardware window implies a larger amount of logic and therefore chip area.
At some point, there simply is not enough of either, and window size is constrained. Worse, even before a
simple limit on the amount of hardware is reached, complexity may adversely affect the speed of the logic,
so the window size is constrained to avoid reducing the clock speed of the chip. Software windows can
be arbitrarily large. Thus, looking for parallelism in a software window is likely to yield better results.
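The software-window idea can be sketched as a small scheduler that packs a serial operation list into long instruction words. This is a simplified greedy scheme written for illustration, not the algorithm of any specific VLIW compiler. The tuple format (opcode, destination, source, source), the function name schedule, and the width parameter are all assumptions, and the sketch further assumes that every operation takes one cycle and that all operations in a word read their sources before any of them writes.

```python
def schedule(ops, width):
    """Pack ops (in program order) into words of at most `width` operations."""
    words = []        # words[c] is the list of ops issued in cycle c
    last_write = {}   # register -> cycle of its most recent writer
    last_read = {}    # register -> latest cycle in which it is read
    for op in ops:
        _, dst, *srcs = op
        earliest = 0
        for reg in srcs:
            if reg in last_write:
                # Read-after-write: issue strictly after the producer.
                earliest = max(earliest, last_write[reg] + 1)
        if dst in last_write:
            # Write-after-write: keep writes to one register in program order.
            earliest = max(earliest, last_write[dst] + 1)
        if dst in last_read:
            # Write-after-read: sharing a word with the reader is allowed
            # because reads happen before writes within a word.
            earliest = max(earliest, last_read[dst])
        # The whole program is visible, so any word with a free slot will do.
        cycle = earliest
        while cycle < len(words) and len(words[cycle]) >= width:
            cycle += 1
        if cycle == len(words):
            words.append([])
        words[cycle].append(op)
        last_write[dst] = cycle
        for reg in srcs:
            last_read[reg] = max(last_read.get(reg, 0), cycle)
    return words

ops = [
    ("add", "r5", "r1", "r2"),
    ("mul", "r6", "r5", "r3"),   # needs r5
    ("sub", "r7", "r1", "r4"),   # independent, can move up into word 0
    ("add", "r8", "r2", "r4"),   # independent
    ("mul", "r9", "r6", "r7"),   # needs r6 and r7
]
words = schedule(ops, width=2)
# Five serial operations fit into three two-slot words.
```

Because the loop can place an operation in any earlier word that has room, its effective window is the entire operation list; a hardware scheduler would have to bound that search to keep its logic small and fast.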
Second, the compiler has knowledge of the source code of the program. Source code typically contains
important information about program behavior that can be used to help express maximum parallelism at
the instruction-set level. A powerful technique called trace-driven compilation can be employed to
dramatically improve the quality of code output by the compiler. Trace-driven compilation first produces a
suboptimal, but correct, VLIW program. The program has embedded routines that take note of program
behavior. The recorded program behavior—which branches are taken, how often