Seminar Topics & Project Ideas On Computer Science Electronics Electrical Mechanical Engineering Civil MBA Medicine Nursing Science Physics Mathematics Chemistry ppt pdf doc presentation downloads and Abstract

Full Version: CRUSOE processor
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
CRUSOE processor
INTRODUCTION
Mobile computing has been the buzzword for quite a long time. Mobile computing devices like laptops, webslates & notebook PCs are becoming common nowadays. The heart of every PC whether a desktop or mobile PC is the microprocessor. Several microprocessors are available in the market for desktop PCs from companies like Intel, AMD, Cyrix etc.The mobile computing market has never had a microprocessor specifically designed for it. The microprocessors used in mobile PCs are optimized versions of the desktop PC microprocessor. Mobile computing makes very different demands on processors than desktop computing, yet up until now, mobile x86 platforms have simply made do with the same old processors originally designed for desktops. Those processors consume lots of power, and they get very hot. When you're on the go, a power-hungry processor means you have to pay a price: run out of power before you've finished, run more slowly and lose application performance, or run through the airport with pounds of extra batteries. A hot processor also needs fans to cool it; making the resulting mobile computer bigger, clunkier and noisier. A newly designed microprocessor with low power consumption will still be rejected by the market if the performance is poor. So any attempt in this regard must have a proper 'performance-power' balance to ensure commercial success. A newly designed microprocessor must be fully x86 compatible that is they should run x86 applications just like conventional x86 microprocessors since most of the presently available software’s have been designed to work on x86 platform.
Crusoe is the new microprocessor which has been designed specially for the mobile computing market. It has been designed after considering the above mentioned constraints. This microprocessor was developed by a small Silicon Valley startup company called Transmeta Corp. after five years of secret toil at an expenditure of $100 million. The concept of Crusoe is well understood from the simple sketch of the processor architecture, called 'amoeba’. In this concept, the x86-architecture is an ill-defined amoeba containing features like segmentation, ASCII arithmetic, variable-length instructions etc. The amoeba explained how a traditional microprocessor was, in their design, to be divided up into hardware and software.
Thus Crusoe was conceptualized as a hybrid microprocessor that is it has a software part and a hardware part with the software layer surrounding the hardware unit. The role of software is to act as an emulator to translate x86 binaries into native code at run time. Crusoe is a 128-bit microprocessor fabricated using the CMOS process. The chip's design is based on a technique called VLIW to ensure design simplicity and high performance. Besides this it also uses Transmeta's two patented technologies, namely, Code Morphing Software and Longrun Power Management. It is a highly integrated processor available in different versions for different market segments.
Technology Perspective
The Transmeta designers have decoupled the x86 instruction set architecture (ISA) from the underlying processor hardware, which allows this hardware to be very different from a conventional x86 implementation. For the same reason, the underlying hardware can be changed radically without affecting legacy x86 software: each new CPU design only requires a new version of the Code Morphing software to translate x86 instructions to the new CPU’s native instruction set. For the initial Transmeta products, models TM3120 and TM5400, the hardware designers opted for minimal space and power. By eliminating roughly three quarters of the logic transistors that would be required for an all-hardware design of similar performance, the designers have likewise reduced power requirements and die size. However, future hardware designs can emphasize different factors and accordingly use different implementation techniques. Finally, the Code Morphing software which resides in standard Flash ROMs itself offers opportunities to improve performance without altering the underlying hardware.
2. CRUSOE PROCESSOR VLIW HARDWARE
2.1 Basic principles of VLIW Architecture
VLIW stands for Very Long Instruction Word. VLIW is a method that combines multiple standard instructions into one long instruction word. This word contains instructions that can be executed at the same time on separate chips or different parts of the same chip. This provides explicit parallelism. VLIW architectures are a suitable alternative for exploiting instruction-level parallelism (ILP) in programs, that is, for executing more than one basic (primitive) instruction at a time. By using VLIW you enable the compiler, not the chip to determine which instructions can be run concurrently. This is an advantage because the compiler knows more information about the program than the chip does by the time the code gets to the chip. These processors contain multiple functional units, fetch from the instruction cache a Very-Long Instruction Word containing several primitive instructions, and dispatch the entire VLIW for parallel execution. These capabilities are exploited by compilers which generate code that has grouped together independent primitive instructions executable in parallel. The processors have relatively simple control logic because they do not perform any dynamic scheduling or reordering of operations (as is the case in most contemporary superscalar processors). Trace scheduling is an important technique in VLIW processing. Trace scheduling is when the compiler processes the code and determines which path is used the most frequently traveled. The compiler then optimizes this path. The basic blocks that compose the path are separated from the other basic blocks. The path is then optimized and rejoined with the other basic blocks. The rejoining includes special split and rejoin blocks that help align the converted code with the original code.
Dynamic scheduling is another important method when compiling VLIW code. The process, called split-issue splits the code into two phases, phase one and phase two. This allows for multiple instructions to execute at the same time. Thus, instructions that have certain delays associated with them can be run concurrently, and out-of-order execution is possible. Hardware support is needed to implement this technique and requires delay buffers and temporary variable space in the hardware. The temporary variable space is needed to store results when they come in. The results computed in phase two are stored in temporary variables and
are loaded into the appropriate phase one register when they are needed. VLIW has been described as a natural successor to RISC, because it moves complexity from the hardware to the compiler, allowing simpler, faster processors. The objective of VLIW is to eliminate the complicated instruction scheduling and parallel dispatch that occurs in most modern microprocessors. In theory, a VLIW processor should be faster and less expensive than a comparable RISC chip.
The instruction set for a VLIW architecture tends to consist of simple instructions RISC like). The compiler must assemble many primitive operations into a single "instruction word" such that the multiple functional units are kept busy, which requires enough instruction-level parallelism (ILP) in a code sequence to fill the available operation slots. Such parallelism is uncovered by the compiler through scheduling code speculatively across basic blocks, performing software pipelining, reducing the number of operations executed, among others.
2.2 VLIW in Crusoe Microprocessor
With the Code Morphing software handling x86 compatibility, Transmeta hardware
designers created a very simple, high-performance, VLIW engine with two integer units, a floating point unit, a memory (load/store) unit, and a branch unit. A Crusoe processor long instruction word, called a molecule, can be 64 bits or 128 bits long and contain up to four RISC-like instructions, called atoms. All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units; this greatly simplifies the decode and dispatch hardware. Figure 2 shows a sample 128-bit molecule and the straightforward mapping from atom slots to functional units. Molecules are executed in order, so there is no complex out-of-order hardware. To keep the processor running at full speed, molecules are packed as fully as possible with atoms. In a later section, we describe how the Code Morphing software accomplishes this.