02-05-2013, 02:11 PM
Intelligent RAM : IRAM
Problem Description: Processor-Memory Performance Gap
The development of processor and memory devices has proceeded independently. Advances in process technology, circuit design, and processor architecture have led to a near-exponential increase in processor speed and memory capacity. However, memory latencies have not improved as dramatically.
Technological trends have produced a large and growing performance gap between CPU and DRAM.
Von Neumann Model
Semiconductor industry divides into microprocessor and memory camps
Separate chips, separate packages
Memory chips: large capacity, but low power and cost
CPU chips: high speed, but high power and cost
Desktop: 1~2 CPUs, 4~32 DRAMs
Server: 2~16 CPUs, 32~256 DRAMs
Advantages of Von Neumann Model
Fabrication lines can be tailored to a device
Packages are tailored to the pinout and power of a device
The number of memory chips in a computer is independent of the number of processors
Disadvantages of Von Neumann Model
Performance gap: CPU (60% each year) vs. DRAM (7% each year)
Memory gap penalty: ever-larger caches (up to 60% of on-chip area and 90% of transistors)
Caches are purely performance-enhancement mechanisms; correctness does not depend on them
The number of DRAM chips in a PC configuration is shrinking
In the future it may be a single DRAM chip
The required minimum memory size (i.e., application and OS memory use) grows at only 50~75% of the rate of DRAM capacity
Problem Description: Off-chip Memory Bandwidth Limitation
Pin bandwidth will be a critical consideration for future microprocessors. Many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Reducing memory latency overhead aggravates the bandwidth requirement for two reasons:
First, many of the techniques that reduce latency-related stalls increase the total traffic between main memory and the processor.
Second, the reduction of memory latency overhead increases the processor bandwidth – the rate at which the processor consumes and produces operands – by reducing total execution time.
Limitation of Present Solutions
Huge caches:
Slow, and work well only if the working set fits in the cache and the access pattern has some locality
Prefetching
Hardware prefetching
Cannot be tailored for each application
Predictions are based only on past and present execution-time behavior
Software prefetching
The overheads of issuing prefetches must not outweigh the benefits
Hard to insert prefetches for irregular access patterns
SMT (simultaneous multithreading)
Enhances utilization and throughput at the thread level
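The software-prefetching trade-off above can be sketched in C using the GCC/Clang builtin __builtin_prefetch. The prefetch distance of 16 elements is an assumed tuning value, not a figure from the slides: too small and the data arrives late, too large and it may be evicted before use.

```c
/* Minimal software-prefetching sketch (GCC/Clang-specific builtin). */
#include <stddef.h>

long sum_with_prefetch(const long *a, size_t n) {
    const size_t PREFETCH_DIST = 16;   /* elements ahead; assumed tuning value */
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)
            /* args: address, rw = 0 (read), locality hint 1 (low) */
            __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 1);
        sum += a[i];
    }
    return sum;
}
```

Note how easy this is for a regular, sequential pattern; for pointer-chasing or other irregular access patterns the next address is not known in advance, which is exactly the difficulty the slide points out.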
Intelligent RAM
Unifying processing and memory into a single chip
Using DRAM rather than SRAM
DRAM is 25~50 times denser than SRAM (thanks to its 3D cell structure)
Thus on-chip memory can be much larger
Reasons for IRAM:
Memory speed limits application performance
The processor uses only about 1/3 of the die; the remaining area is big enough for a gigabit of DRAM, enough to hold entire programs
Today's DRAM processes offer more metal layers and faster transistors, making logic in a DRAM process nearly as fast and dense as in a conventional logic process
IRAM: Summary
IRAM: Merging a microprocessor and DRAM on the same chip
Performance:
Reduces latency by a factor of 5~10
Increases bandwidth by a factor of 50~100
Energy efficiency:
Saves a factor of 2~4
Cost:
Removes off-chip memory and reduces board area
IRAM is limited by the amount of memory on chip
Potential for the network computer
Could change the nature of the semiconductor industry