
Itanium Processor


INTRODUCTION

The Itanium processor family came about for several reasons, but the primary one was that the performance gains from RISC processor architectures were no longer growing at the rate seen in the 1980s and 1990s. Yet customers continued to demand greater application performance. The Itanium processor family was developed to address the future performance and growth needs of business, technical and scientific users, with greater flexibility, better performance and a much greater ‘bang for the buck’ in price/performance. Itanium is the first processor to use the EPIC (Explicit Parallel Instruction Computing) architecture. It is intended to outperform present-day RISC (Reduced Instruction Set Computing) and CISC (Complex Instruction Set Computing) processors.
The Itanium architecture pursues a more difficult goal than a processor designed with ‘price as no object’: it delivers near-peerless speed at a price that is sustainable by the mainstream corporate market.

SEQUENTIAL SEMANTICS

A program is a sequence of instructions with an implied order of execution, so there is a potential dependence from each instruction to the next. High performance, however, needs parallel execution, which in turn needs independent instructions. The hardware must therefore rediscover which instructions are independent.
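As an illustration of what "rediscovering" independent instructions involves, the following minimal sketch packs a sequential instruction list into groups that contain no register dependence. The `Instr` tuple, the register names and the greedy in-order policy are assumptions made for this example; it conservatively treats read-after-write, write-after-write and write-after-read as dependences and does not model any real processor's issue logic.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction

def parallel_groups(program):
    """Greedily pack instructions, in program order, into groups that
    contain no RAW, WAW or WAR register dependence."""
    groups, current = [], []
    written, read = set(), set()
    for ins in program:
        raw = any(s in written for s in ins.srcs)  # needs an earlier result
        waw = ins.dest in written                  # rewrites the same register
        war = ins.dest in read                     # overwrites a pending source
        if current and (raw or waw or war):
            # Dependence found: close the current group and start a new one.
            groups.append(current)
            current, written, read = [], set(), set()
        current.append(ins)
        written.add(ins.dest)
        read.update(ins.srcs)
    if current:
        groups.append(current)
    return groups

program = [
    Instr("r1", ("r2", "r3")),  # r1 = r2 + r3
    Instr("r4", ("r5", "r6")),  # r4 = r5 + r6 (independent of the first)
    Instr("r7", ("r1", "r4")),  # r7 = r1 + r4 (needs both earlier results)
]
groups = parallel_groups(program)
print([len(g) for g in groups])  # group sizes
```

Here the first two instructions can issue together, while the third must wait for both results and falls into a second group.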

LOW INSTRUCTION LEVEL PARALLELISM (ILP)

In present-day programs branches are frequent, so code blocks are small and the parallelism available within a block is limited. Wider machines need more parallel instructions, so ILP across branches needs to be exploited. But when instructions are moved across a branch, some of them can fault because the branch was predicted wrongly. In short, branches are a barrier to code motion.

BRANCH UNPREDICTABILITY

Branch predictions are not perfect, and a wrong prediction incurs a performance penalty. The penalty is larger if the wrongly executed instructions include memory operations (loads and stores) or floating-point operations. In addition, an exception raised by a speculative operation must be deferred, which requires more bookkeeping hardware.
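The cost of imperfect prediction can be estimated with a simple textbook model: each mispredicted branch adds a fixed number of stall cycles to the base cycles per instruction (CPI). The function name and all numeric values below are illustrative assumptions, not measurements of any Itanium processor.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

# Assumed workload: 20% branches, 10% of them mispredicted, 10-cycle penalty.
cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.2,
                    mispredict_rate=0.1, penalty_cycles=10)
print(f"{cpi:.2f}")  # prints 1.20
```

Under these assumed numbers, mispredictions alone slow the machine by about 20 percent, and the loss grows with deeper pipelines, where the penalty in cycles is larger.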

MEMORY DEPENDENCIES

Load instructions are usually at the top of a chain of dependent instructions, so exploiting ILP requires moving these loads earlier. Store instructions are a barrier to this movement, because a load cannot safely be moved above a store that may write the same address. Dynamic disambiguation has its limitations: it requires additional hardware, and it adds to the code size if done in software.
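The check at the heart of disambiguation is an address-range comparison between a load and every earlier store it would be moved above. The sketch below shows that comparison in software; the `(address, size)` tuple layout and both function names are assumptions made for illustration.

```python
def ranges_overlap(addr_a, size_a, addr_b, size_b):
    """True if byte ranges [addr, addr + size) intersect."""
    return addr_a < addr_b + size_b and addr_b < addr_a + size_a

def can_hoist_load(load, earlier_stores):
    """A load may move above earlier stores only if it overlaps none of them.
    `load` and each store are (address, size_in_bytes) tuples."""
    return not any(ranges_overlap(load[0], load[1], s[0], s[1])
                   for s in earlier_stores)

# The load at 0x1000 does not touch the store at 0x2000, so it can move up.
print(can_hoist_load((0x1000, 8), [(0x2000, 4)]))
# A store into bytes 0x1004..0x1007 overlaps the 8-byte load, so it cannot.
print(can_hoist_load((0x1000, 8), [(0x1004, 4)]))
```

The difficulty is that the addresses are generally known only at run time, which is why the comparison costs either extra hardware or extra instructions.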

MEMORY LATENCY

Though the speed of the ALU, decoders and other execution units has increased with time, advances in memory technology have not kept pace. So even if an instruction is decoded and executed quickly, the memory fetch it depends on takes time and slows program execution. The cache hierarchy, which reduces memory latency, has its limitations: it is managed asynchronously by hardware, it helps only if there is locality, and it consumes precious silicon area.
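The dependence on locality can be made concrete with the standard average memory access time (AMAT) formula, applied level by level through the hierarchy. The latencies and miss rates below are assumed round numbers chosen for the example, not Itanium figures.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles for one cache level."""
    return hit_time + miss_rate * miss_penalty

MEMORY_LATENCY = 200  # cycles, assumed

# Good locality: few misses at either level (assumed rates).
l2_good = amat(hit_time=10, miss_rate=0.2, miss_penalty=MEMORY_LATENCY)
good = amat(hit_time=2, miss_rate=0.05, miss_penalty=l2_good)

# Poor locality: most accesses miss in both levels (assumed rates).
l2_poor = amat(hit_time=10, miss_rate=0.8, miss_penalty=MEMORY_LATENCY)
poor = amat(hit_time=2, miss_rate=0.5, miss_penalty=l2_poor)

print(f"good locality: {good:.1f} cycles, poor locality: {poor:.1f} cycles")
```

With these assumptions the average access costs about 4.5 cycles when locality is good and about 87 cycles when it is poor, even though the cache hardware is identical in both cases.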

RESOURCE CONSTRAINTS

A small register space creates false dependencies. Shared resources such as condition flags and condition registers force dependencies between otherwise independent instructions. Floating-point resources are limited and inflexible.
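A false dependence arises when two unrelated computations are forced to reuse the same register name. The sketch below shows how, given enough registers, each result can receive its own name so that only true data dependences remain. The `(dest, srcs)` program format, the `p0`, `p1`, ... naming and the `rename` function are hypothetical and written only for this illustration.

```python
def rename(program, num_physical):
    """Give every result a fresh physical register so that only true
    (read-after-write) dependences remain.
    `program` is a list of (dest, srcs) pairs in program order."""
    mapping = {}      # architectural name -> current physical name
    next_free = 0
    renamed = []
    for dest, srcs in program:
        # Sources use the latest mapping; unmapped names are live-in values.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        if next_free >= num_physical:
            raise RuntimeError("out of physical registers")
        phys = f"p{next_free}"
        next_free += 1
        mapping[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed

program = [
    ("r1", ("a", "b")),  # r1 = a + b
    ("x",  ("r1",)),     # x  = r1 * 2
    ("r1", ("c", "d")),  # r1 = c + d  (reuses r1: a false dependence)
    ("y",  ("r1",)),     # y  = r1 * 2
]
for ins in rename(program, num_physical=8):
    print(ins)
```

After renaming, the second pair of instructions no longer shares a register with the first pair, so the two pairs can execute in parallel. With only a few registers available, this separation is not possible.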

PROCEDURE CALL & LOOP PIPELINING OVERHEAD

As modular programming is increasingly used, programs tend to be call intensive. Register space is shared by caller and callee, so each call/return requires registers to be saved and restored.
Loops are a common source of good ILP, but unrolling or software pipelining is needed to exploit it. The prologue and epilogue code these techniques add causes code expansion, so their applicability is limited.
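The code expansion is easy to see in a hand-unrolled loop. The following sketch unrolls a summation four times; the unroll factor and the function name are arbitrary choices for the example. The four accumulators are independent of one another, which is what exposes ILP, while the trailing loop is the epilogue that handles leftover elements.

```python
def sum_unrolled(values):
    """Sum a list with the loop body unrolled four times."""
    n = len(values)
    s0 = s1 = s2 = s3 = 0
    i = 0
    limit = n - n % 4          # largest multiple of 4 not exceeding n
    while i < limit:
        # Four independent additions per iteration.
        s0 += values[i]
        s1 += values[i + 1]
        s2 += values[i + 2]
        s3 += values[i + 3]
        i += 4
    total = s0 + s1 + s2 + s3
    while i < n:               # epilogue: the remaining 0 to 3 elements
        total += values[i]
        i += 1
    return total

print(sum_unrolled(list(range(10))))  # prints 45
```

A one-line loop body has grown into a main loop plus an epilogue, and software pipelining would add a prologue as well.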

RSE (REGISTER STACK ENGINE)

The general register (GR) stack reduces the need to save and restore registers across calls. Itanium has a procedure stack frame of programmable size (0 to 96 registers). This mechanism is implemented by renaming registers. The RSE automatically saves and restores registers without software intervention, providing the illusion of infinite physical registers. The RSE may be designed to use otherwise unused memory bandwidth to perform register spill and fill operations in the background.
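The spill and fill behaviour can be sketched with a toy model that only counts registers. It assumes 96 physical stacked registers, frames that do not overlap, and spilling performed eagerly at call time; the real engine differs in these respects, and the class and method names are invented for this example.

```python
class RegisterStack:
    """Toy model of a register stack: counts spills and fills only."""

    def __init__(self, physical=96):
        self.physical = physical
        self.frames = []   # frame sizes, oldest first
        self.spilled = 0   # oldest registers currently held in memory
        self.spills = 0    # total registers written to memory
        self.fills = 0     # total registers read back from memory

    def call(self, frame_size):
        if not 0 <= frame_size <= self.physical:
            raise ValueError("frame size must be 0..physical")
        self.frames.append(frame_size)
        overflow = sum(self.frames) - self.spilled - self.physical
        if overflow > 0:               # not enough physical registers:
            self.spilled += overflow   # spill the oldest ones to memory
            self.spills += overflow

    def ret(self):
        self.frames.pop()
        caller = self.frames[-1] if self.frames else 0
        below_caller = sum(self.frames) - caller
        must_fill = self.spilled - below_caller
        if must_fill > 0:              # caller's registers were spilled:
            self.spilled -= must_fill  # bring them back before it resumes
            self.fills += must_fill

rs = RegisterStack()
for _ in range(3):
    rs.call(40)          # three nested calls, 40 registers each
print(rs.spills)         # 120 registers requested, 96 available
rs.ret()
rs.ret()
print(rs.fills)
```

In this run the third call forces 24 of the oldest registers out to memory, and they are read back only when control returns to the procedure that owns them. None of the three procedures contains any save or restore code.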

IA-64 SYSTEM ENVIRONMENT OVERVIEW

The IA-64 environment is designed to support the execution of IA-64 operating systems running IA-32 or IA-64 applications. IA-32 applications can interact with IA-64 operating systems, applications and libraries within this environment. The operating system and the user-level software can execute both IA-32 application-level code and IA-64 instructions. The entire machine state, including all IA-64 resources, IA-32 general registers and floating-point registers, segment selectors and descriptors, is accessible to IA-64 code. As shown in the figure below, all major IA-32 operating modes are fully supported.
In the IA-64 system environment, IA-64 defined operating system resources supersede all IA-32 system resources. Specifically, the IA-32 defined set of control, test, debug and machine check registers, privileged instructions, and virtual paging algorithms are replaced by the IA-64 system resources. When IA-32 code is running on an IA-64 operating system, the processor directly executes all performance-critical but non-sensitive IA-32 application-level instructions. Accesses to sensitive system resources (interrupt flags, control registers, TLBs, etc.) are intercepted into the IA-64 operating system. Using this set of intervention hooks, an IA-64 operating system can emulate or virtualize an IA-32 system resource or device driver.
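The split between direct execution and interception can be pictured as a simple dispatch. The sketch below is purely conceptual: the mnemonic strings, the `SENSITIVE` set and the handler are assumptions chosen for illustration and do not reproduce the architecture's actual intercept rules.

```python
# Hypothetical labels for operations that touch sensitive system resources:
# interrupt flag (cli, sti), control registers (mov_cr), TLB (invlpg).
SENSITIVE = {"cli", "sti", "mov_cr", "invlpg"}

def run_ia32(instructions, os_handler):
    """Execute non-sensitive operations directly; hand sensitive ones
    to the operating system's intercept handler."""
    trace = []
    for op in instructions:
        if op in SENSITIVE:
            trace.append(("intercepted", os_handler(op)))
        else:
            trace.append(("direct", op))
    return trace

def emulate(op):
    """Stand-in for the IA-64 operating system emulating the resource."""
    return f"emulated {op}"

for step in run_ia32(["add", "cli", "mov", "invlpg"], emulate):
    print(step)
```

Only the two sensitive operations reach the handler; the ordinary arithmetic and data-movement operations run at full speed without operating system involvement.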