31-05-2012, 11:59 AM
Codesigned Virtual Machines
Applying Codesigned VMs
Advantages (performance, power efficiency, flexibility) can be achieved:
At the macro level: entirely new ISAs
VLIW: Transmeta Crusoe, IBM Daisy/BOA
OO source ISA: IBM AS/400
At the micro level
Implementation of specific performance enhancements
Instruction reordering, …
Introduction
In Jan. of 2000, Transmeta Corp. introduced the Crusoe processors.
Remarkably low power consumption
Unexpectedly, the new technology is fundamentally software-based.
The power savings come from replacing large numbers of transistors with software.
The Crusoe Processor
Consists of a hardware engine logically surrounded by a software layer.
H/W: The engine
is a VLIW CPU capable of executing up to four operations in each clock cycle.
No resemblance to the x86 instruction set.
S/W: Code Morphing Software (CMS)
Dynamically “morphs” x86 instructions into VLIW instructions
CMS technology changes the entire approach to designing microprocessors.
Demonstrates that practical microprocessors can be implemented as HW-SW hybrids.
Expands the design space
Development teams may enlist software experts, working in parallel with hardware engineers, to bring products to market faster.
Technology Perspective
CMS decouples the x86 ISA from the underlying processor hardware.
Each new CPU design only requires a new version of the Code Morphing software to translate x86 instructions to the new CPU’s native instruction set.
Because the CMS would typically reside in standard Flash ROMs on the motherboard, improved versions can even be downloaded into processors in the field.
Crusoe Processor Fundamentals
VLIW engine
Two integer units, a floating point unit, a memory (load/store) unit, and a branch unit
Molecule: a long (64- or 128-bit) instruction word containing up to four RISC-like instructions, called atoms.
All atoms within a molecule are executed in parallel, and the molecule format directly determines how atoms get routed to functional units.
This greatly simplifies the decode and dispatch hardware.
The integer register file
Has 64 registers, %r0 through %r63
CMS allocates some registers to hold x86 state while others contain state internal to the system, or can be used as temporary registers.
To keep the processor running at full speed, molecules are packed as fully as possible with atoms.
Conventional superscalar…
This type of processor hardware is much more complex than the Crusoe processor’s simple VLIW engine.
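The molecule packing described above can be illustrated with a minimal sketch. It assumes a simple greedy, in-order packer; the `Atom` fields, the per-unit slot counts in `UNIT_SLOTS`, and the dependency rule are hypothetical and are not taken from the real CMS scheduler, which the slides do not describe in detail.

```python
from dataclasses import dataclass

# Functional units named on the slide; treating each unit as one issue slot
# per molecule is an assumption made for this sketch.
UNIT_SLOTS = {"int": 2, "fp": 1, "mem": 1, "branch": 1}
MAX_ATOMS = 4  # a molecule holds up to four atoms


@dataclass
class Atom:
    name: str
    unit: str          # functional unit that executes the atom
    dest: str = ""     # register written (hypothetical encoding)
    srcs: tuple = ()   # registers read


def pack_molecules(atoms):
    """Greedily pack atoms, in program order, into molecules."""
    molecules = []
    current, used, written = [], {}, set()
    for atom in atoms:
        # Atoms in one molecule run in parallel, so an atom may not read or
        # overwrite a register written earlier in the same molecule.
        depends = any(s in written for s in atom.srcs) or atom.dest in written
        full = len(current) >= MAX_ATOMS
        unit_busy = used.get(atom.unit, 0) >= UNIT_SLOTS[atom.unit]
        if current and (depends or full or unit_busy):
            molecules.append(current)
            current, used, written = [], {}, set()
        current.append(atom)
        used[atom.unit] = used.get(atom.unit, 0) + 1
        if atom.dest:
            written.add(atom.dest)
    if current:
        molecules.append(current)
    return molecules
```

A real scheduler would also reorder atoms to fill empty slots; this sketch only shows why dependencies and unit availability limit how full a molecule can be.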
Code Morphing Software
CMS
Is fundamentally a dynamic translation system
In this case, x86 ISA -> VLIW ISA
“x86 ISA” is the only thing x86 code sees.
The only program written directly for the VLIW engine is the Code Morphing Software itself.
CMS: Drawing the HW-SW line
Choosing which functions to implement in HW and which in SW is a major engineering challenge
Involving issues such as cost and complexity, overall performance and power consumption
For example, the HW-SW line might be drawn differently for a high-end server processor.
CMS: Decoding and Scheduling
Code Morphing can translate an entire group of x86 instructions at once,
whereas a superscalar x86 processor translates single instructions in isolation.
The Code Morphing approach can amortize the cost of translation over many executions,
allowing it to use much more sophisticated translation and scheduling algorithms.
CMS: Caching
The translation cache resides in a separate memory space that is inaccessible to x86 code.
As an application executes,
Code Morphing “learns” more about the program and improves it so that it executes faster and faster.
Some benchmarks do not accurately predict the performance of Crusoe processors!
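The caching and amortization idea can be sketched roughly as follows. This is an illustrative model only: `TranslationCache`, the `translate` callback, and the cycle counts passed to `amortized_cost` are hypothetical names and numbers, not part of CMS.

```python
class TranslationCache:
    """Maps an x86 block address to its translated code (illustrative)."""

    def __init__(self, translate):
        self._translate = translate   # hypothetical translator callback
        self._cache = {}              # x86 block address -> translated code
        self.translations = 0         # how many times we paid for translation

    def lookup(self, x86_addr):
        # Translate on the first execution only; reuse the result afterwards.
        if x86_addr not in self._cache:
            self._cache[x86_addr] = self._translate(x86_addr)
            self.translations += 1
        return self._cache[x86_addr]


def amortized_cost(translate_cost, native_cost, executions):
    """Average cost per execution when translation is paid once."""
    return native_cost + translate_cost / executions
```

With made-up numbers, a block that costs 10,000 cycles to translate and 10 cycles to run averages 20 cycles per execution after 1,000 runs, which is why a one-off benchmark run may not reflect steady-state speed.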
CMS: Filtering
The translation system needs to
Choose carefully how much effort to spend on translating and optimizing a given piece of x86 code.
A wide choice of execution modes
Interpretation only(no translation)
Simple-minded code generation
Highly-optimized code generation
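One way to picture the choice among these execution modes is a threshold on how often a piece of code has run. The sketch below assumes that approach; the threshold values and the `choose_mode` function are hypothetical, since the slides do not say how CMS actually decides.

```python
INTERPRET = "interpretation only"
SIMPLE = "simple-minded code generation"
OPTIMIZED = "highly-optimized code generation"

# Hypothetical thresholds; real values would be tuned empirically.
SIMPLE_THRESHOLD = 50
OPTIMIZE_THRESHOLD = 1000


def choose_mode(execution_count):
    """Spend more translation effort on code that runs more often."""
    if execution_count < SIMPLE_THRESHOLD:
        return INTERPRET
    if execution_count < OPTIMIZE_THRESHOLD:
        return SIMPLE
    return OPTIMIZED
```

Rarely executed code stays in the cheapest mode, so translation effort is only spent where it can be recovered over many executions.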
CMS: Prediction and Path Selection
CMS can gather feedback
Instrumentation profiling
The translator adds code to collect info.
This data can be used later to decide when and what to optimize and translate.
For example, if a given branch is highly biased,…
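As a rough illustration of instrumentation profiling, the sketch below counts branch outcomes and reports how biased a branch is. `BranchProfile` and the 0.95 threshold are hypothetical; the slide does not say what CMS does once it finds a biased branch, so the sketch stops at detection.

```python
class BranchProfile:
    """Collects taken / not-taken counts per branch address (illustrative)."""

    def __init__(self):
        self._counts = {}  # branch address -> [not_taken, taken]

    def record(self, branch_addr, taken):
        # This is the kind of bookkeeping that inserted profiling code would do.
        counts = self._counts.setdefault(branch_addr, [0, 0])
        counts[1 if taken else 0] += 1

    def bias(self, branch_addr):
        """Fraction of executions that went in the more common direction."""
        not_taken, taken = self._counts.get(branch_addr, (0, 0))
        total = not_taken + taken
        return max(not_taken, taken) / total if total else 0.0

    def is_highly_biased(self, branch_addr, threshold=0.95):
        # The threshold is an assumption made for this sketch.
        return self.bias(branch_addr) >= threshold
```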
HW Support for Code Morphing
All registers holding x86 state are shadowed. (working/shadow copy)
Normal atoms only update the working copy of the register.
“commit” operation: working -> shadow regs.
“rollback” operation: shadow -> working regs.
Undoing changes to memory
Holding store data in a “gated store buffer”
Commit / rollback
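The commit and rollback operations above can be modeled in software as a minimal sketch. `ShadowedState` and its method names are hypothetical, and forwarding loads from the pending store buffer is an assumption of this model rather than something the slides state.

```python
class ShadowedState:
    """Software model of shadowed registers plus a gated store buffer."""

    def __init__(self, regs, memory):
        self.working = dict(regs)   # updated by normal atoms
        self.shadow = dict(regs)    # last committed x86 state
        self.memory = memory        # committed memory contents
        self.store_buffer = []      # stores held back until commit

    def write_reg(self, name, value):
        self.working[name] = value  # the shadow copy is untouched

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Assumption: a load sees the youngest pending store to that address.
        for pending_addr, value in reversed(self.store_buffer):
            if pending_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def commit(self):
        # working -> shadow, and release the gated stores to memory
        self.shadow = dict(self.working)
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()

    def rollback(self):
        # shadow -> working, and drop the stores that never reached memory
        self.working = dict(self.shadow)
        self.store_buffer.clear()
```

After a rollback the model is back at the last committed state, as if the speculative register writes and stores had never happened.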
Alias Hardware
When the translator moves a load operation ahead of a store operation,
it converts the load into a load-and-protect and the store into a store-under-alias-mask.
This makes it always safe to reorder memory loads and stores.
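A minimal software model of the alias check might look like the sketch below. It assumes that load-and-protect records the loaded address range, and that store-under-alias-mask raises a fault when it overlaps a protected range selected by its mask, so that the runtime can roll back and retry without the reordering. The class, the slot numbering, and the mask-as-set representation are hypothetical.

```python
class AliasFault(Exception):
    """Raised when a store touches a range protected by a hoisted load."""


class AliasHardware:
    def __init__(self):
        self._protected = {}  # alias slot -> (start address, size in bytes)

    def load_and_protect(self, slot, addr, size):
        # Remember which bytes the hoisted load has already read.
        self._protected[slot] = (addr, size)

    def store_under_alias_mask(self, mask, addr, size):
        # Check only the slots named in the mask (modeled here as a set).
        for slot in mask:
            if slot not in self._protected:
                continue
            start, length = self._protected[slot]
            if addr < start + length and start < addr + size:
                raise AliasFault(f"store at {addr:#x} overlaps slot {slot}")

    def clear(self):
        # Assumption: protections are dropped at commit or rollback.
        self._protected.clear()
```

In this model a store to an unrelated address passes silently, while an overlapping store raises `AliasFault`, which is what lets the translator reorder optimistically.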