01-04-2011, 12:34 PM
PRESENTED BY:
Sasmita Kumari Misra
Itanium Processor
The Itanium processor family came about for several reasons, but the primary one was that RISC processor architectures were no longer advancing at the rate seen in the 1980s and 1990s. Yet customers continued to demand greater application performance.
The Itanium processor family was developed to address the future performance and growth needs of business, technical and scientific users, offering greater flexibility, better performance and much better price/performance (‘bang for the buck’).
The Itanium architecture therefore aims at a more difficult goal than a processor designed with ‘price as no object’.
Overview
History
32 bit Processors (Pentium Pro, Pentium Xeon)
64 bit Processors (Xeon, Itanium, Itanium 2)
ISA
EPIC
Predicated Execution (Branch Prediction)
Software Pipelining
Overview
ISA cont.
Register Stacking
IA-32 Emulation
Speculation
Architecture
Benchmarks
ISA Overview
Most Modern Processors:
Instruction Level Parallelism (ILP)
Processor, at runtime, decides which instructions have no dependencies
Hardware branch prediction
Itanium’s ISA
IA-64 – Intel’s (first) 64-bit ISA
Not an extension of x86 (a completely new ISA)
Allows for speedups without engineering “tricks”
Largely RISC
Surrounded by patents
IA-64
IA-64 largely depends on software for parallelism
VLIW – Very Long Instruction Word
EPIC – Explicitly Parallel Instruction Computing
IA-64
VLIW – Overview
RISC technique
Bundles of instructions to be run in parallel
Similar to superscalar execution
Uses the compiler, instead of runtime hardware, to find and schedule independent instructions
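As a rough illustration of the compiler's job in a VLIW design, the sketch below greedily packs a short instruction list into bundles of mutually independent operations. The `pack_bundles` name, the tuple layout, the pseudo-assembly strings and the bundle width of three are assumptions made for this example; a real IA-64 compiler also has to respect latencies, functional unit types and bundle templates.

```python
from typing import List, Tuple

# Each instruction: (text, destination register, source registers).
# The text is pseudo-assembly for illustration only.
Instr = Tuple[str, str, Tuple[str, ...]]

def pack_bundles(program: List[Instr], width: int = 3) -> List[List[str]]:
    """Greedy in-order packing. A new bundle is started when the next
    instruction reads or rewrites a register written in the current
    bundle, or when the current bundle is full."""
    bundles: List[List[str]] = []
    current: List[str] = []
    written = set()  # registers written by the current bundle
    for text, dest, srcs in program:
        depends = dest in written or any(s in written for s in srcs)
        if current and (depends or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append(text)
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    ("add r1 = r2, r3", "r1", ("r2", "r3")),
    ("sub r4 = r5, r6", "r4", ("r5", "r6")),
    ("mul r7 = r1, r4", "r7", ("r1", "r4")),  # needs r1 and r4 from above
    ("add r8 = r2, r5", "r8", ("r2", "r5")),
]
bundles = pack_bundles(program)
for b in bundles:
    print(b)
```

The first two instructions share a bundle because neither reads the other's result; the multiply has to wait for the next bundle. All of this is decided before the program runs, which is the point of the approach.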
IA-64
EPIC – Overview
Builds on VLIW
Redefines instruction format
Instruction coding tells CPU how to process data
Very compiler dependent
Predicated execution
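To make the "redefined instruction format" concrete: an IA-64 bundle is 128 bits wide and holds a 5-bit template plus three 41-bit instruction slots. The sketch below packs and unpacks that layout. The function names are made up for this example, and the slot values used are arbitrary placeholders, not real instruction encodings.

```python
TEMPLATE_BITS = 5   # template field: bits 0-4
SLOT_BITS = 41      # three instruction slots follow the template

def encode_bundle(template: int, slots: tuple) -> int:
    """Pack a template and three slot values into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle has exactly three slots")
    word = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("slot must fit in 41 bits")
        word |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return word

def decode_bundle(word: int):
    """Split a 128-bit bundle back into (template, (slot0, slot1, slot2))."""
    template = word & ((1 << TEMPLATE_BITS) - 1)
    mask = (1 << SLOT_BITS) - 1
    slots = tuple((word >> (TEMPLATE_BITS + i * SLOT_BITS)) & mask
                  for i in range(3))
    return template, slots

word = encode_bundle(0x10, (1, 2, 3))   # placeholder values
print(decode_bundle(word))
```

The template is how the instruction coding "tells the CPU how to process" the bundle: it identifies which kind of execution unit each slot targets and where groups of independent instructions end, so the hardware does not have to work that out itself.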
IA-64
Predicated Execution:
Decrease need for branch prediction
Increase number of speculative executions
Branch conditions put into predicate registers
Predicate registers discard the results of instructions from the not-taken path
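The idea can be sketched in a few lines. The example below contrasts an ordinary branch with a predicated version in which both arms are computed and only the result whose predicate is true is kept. This is a software model for illustration only; the function names and the `p1`/`p2` variables are assumptions, and on real hardware the commit step happens in the pipeline rather than in a loop.

```python
def abs_diff_branching(a: int, b: int) -> int:
    # Conventional form: the hardware must predict which way this goes.
    if a > b:
        return a - b
    return b - a

def abs_diff_predicated(a: int, b: int) -> int:
    # One compare writes a predicate and its complement.
    p1 = a > b
    p2 = not p1
    # Both arms are executed unconditionally, each tagged with a predicate.
    tagged_results = [(p1, a - b), (p2, b - a)]
    # Only the result whose predicate is true is committed;
    # the other is discarded.
    result = None
    for predicate, value in tagged_results:
        if predicate:
            result = value
    return result

print(abs_diff_predicated(7, 3), abs_diff_predicated(3, 7))
```

The predicated form does more arithmetic but contains no branch to mispredict, which is the trade the slide describes.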
IA-64
Software Pipelining:
Take advantage of programming trends and large number of available registers
Allow multiple iterations of a loop to be in flight at once
IA-64
EPIC – Pros:
Compiler has more time to spend with code
Time spent by compiler is a one-time cost
Reduces circuit complexity
IA-64
EPIC – Cons:
Runtime behavior isn’t always obvious in source code
Runtime behavior may depend on input data
Depends greatly on compiler performance
IA-64
IA-32 Support:
Done with hardware emulation
Entered and exited through special jump (escape) instructions
Slow (painfully so)
IA-64
32 Bit Hardware Emulation - Very Poor Performance
Software Emulation of x86 32-bit from either Microsoft or Linux can perform 50% better than Intel’s Hardware Emulation
Less than 1% of the chip devoted to Hardware Emulation
IA-64
IA-32 Slowness:
No out-of-order execution abilities
Functional units don’t generate flags
Multiple outstanding unaligned memory loads not supported
IA-64
IA-32 Support:
Hardware emulation augmented for Itanium 2
Software emulation (IA-32 Execution Layer) added
Runs IA-32 code at same speed as equivalently clocked Xeon
Architecture
Physical Layout
Conceptual Design Elements
Register Stack Engine (RSE)
Improve performance by removing latency associated with saving/restoring state for function calls
Hardware implementation of register stack ISA functionality
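The following is a deliberately simplified model of the register stack idea, not a description of the actual RSE hardware. It assumes 96 stacked physical registers (r32 to r127), gives each call its own frame, and spills the oldest registers to memory only when the physical file overflows. The class and its counters are hypothetical, and details such as the overlap between a caller's output registers and the callee's inputs are ignored.

```python
class RegisterStack:
    PHYSICAL = 96  # stacked general registers r32-r127

    def __init__(self):
        self.frames = []      # frame sizes, innermost frame last
        self.spilled = 0      # registers of outer frames currently in memory
        self.spill_count = 0  # total registers written to memory
        self.fill_count = 0   # total registers read back from memory

    def call(self, frame_size: int) -> None:
        """Allocate a frame for a callee; spill only on overflow."""
        if not 0 < frame_size <= self.PHYSICAL:
            raise ValueError("frame must fit in the physical register file")
        self.frames.append(frame_size)
        resident = sum(self.frames) - self.spilled
        overflow = resident - self.PHYSICAL
        if overflow > 0:
            self.spilled += overflow
            self.spill_count += overflow

    def ret(self) -> None:
        """Drop the innermost frame; refill the caller's frame if needed."""
        self.frames.pop()
        below_caller = sum(self.frames[:-1])
        if self.spilled > below_caller:
            fill = self.spilled - below_caller
            self.spilled -= fill
            self.fill_count += fill

rs = RegisterStack()
rs.call(40)
rs.call(40)   # 80 registers in use, still no memory traffic
rs.call(40)   # 120 needed, so 24 of the oldest are spilled
rs.ret()
rs.ret()      # the outermost frame needs its 24 registers back
print(rs.spill_count, rs.fill_count)
```

In this model the first two calls save and restore nothing at all, which is the latency saving the slide refers to; memory traffic appears only once the call chain outgrows the physical registers.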
Itanium Pipeline
10 Stage
Instruction Pointer Generation
Fetch
Rotate
Expand
Rename
Word-Line Decode
Register Read
Execute
Exception Detect
Write Back
Itanium 2 Pipeline
8 stage
Instruction Pointer Generation
Rotate
Expand
Rename
Register Read
Execute
Detect
Write Back
Processor Abstraction Layer (PAL)
Internal processor firmware
External system firmware
Itanium 2 Execution Core
6 multimedia units
6 integer units
2 FPU
3 branch units
4 load / store units