06-03-2013, 12:32 PM
Seminar Report on Energy Efficiency of Intelligent Ram
Abstract
Portable systems demand energy efficiency in order to maximize battery life. IRAM
architectures, which combine DRAM and a processor on the same chip in a DRAM process,
are more energy efficient than conventional systems. The high density of DRAM permits
a much larger amount of memory on-chip than a traditional SRAM cache design in a
logic process, which allows most or all IRAM memory accesses to be satisfied on-chip.
Thus there is much less need to drive high-capacitance off-chip buses, which contribute
significantly to the energy consumption of a system. To quantify this advantage, we apply
models of energy consumption in DRAM and SRAM memories to results from cache
simulations of applications reflective of personal productivity tasks on low-power systems.
We find that IRAM memory hierarchies consume as little as 22% of the energy consumed
by a conventional memory hierarchy for memory-intensive applications, while delivering
comparable performance. Furthermore, the energy consumed by a system consisting of
an IRAM memory hierarchy combined with an energy-efficient CPU core is as little as
40% of that of the same CPU core with a traditional memory hierarchy.
Introduction
Energy-efficient computing is a field whose importance is growing rapidly. Sales of laptop and
notebook computers have been steadily climbing, and with the popularity of mobile phones,
mobile computing devices are also growing in importance. Applications for portable computing
are likely to continue to grow in the near future, especially in areas such as PDAs (personal
digital assistants), smart phones, GPS (Global Positioning System) receivers, and other
"anywhere-anytime" consumer computing devices. The increasing prevalence of portable
computing has promoted energy efficiency from a concern primarily of circuit designers to an
issue of general interest to the computer architecture community. Hence this area has received
significant research attention.
Due to recent advances in flat-panel displays and disk power management, the share of
the energy in portable systems consumed by the processor and external memory is growing.
Within processors, recent research shows that a large percentage of energy is often devoted
to on-chip memory. Our goal is to reduce the energy consumed by the memory system,
which will significantly reduce the total energy consumed. Integrating a microprocessor and
DRAM memory on the same die, an idea that we call Intelligent RAM, offers the potential
for dramatic improvements in the energy consumption of the memory system. DRAM is much
denser than SRAM, which is traditionally used for on-chip memory, so most memory
accesses will be satisfied by the IRAM itself. Therefore, an IRAM will have far fewer external
memory accesses, which consume a great deal of energy to drive high-capacitance off-chip buses.
Even on-chip accesses will be more energy efficient, since on-chip DRAM consumes less energy
than either SRAM or off-chip DRAM.
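The argument above can be made concrete with a back-of-the-envelope model. The sketch below is only illustrative: the per-access energy figures and hit rates are assumed numbers chosen to show why avoiding off-chip accesses dominates the savings, not values from the report's actual models.

```python
# Toy memory-system energy model (all per-access energies are assumed,
# illustrative values, not measured figures from the report).

def memory_energy_nj(accesses, on_chip_hit_rate, e_on_chip_nj, e_off_chip_nj):
    """Total memory energy in nJ for a stream of accesses."""
    hits = accesses * on_chip_hit_rate
    misses = accesses - hits
    return hits * e_on_chip_nj + misses * e_off_chip_nj

accesses = 1_000_000

# Conventional hierarchy: a small SRAM cache, so more misses go off-chip.
conventional = memory_energy_nj(accesses, on_chip_hit_rate=0.90,
                                e_on_chip_nj=1.0,    # SRAM cache hit (assumed)
                                e_off_chip_nj=20.0)  # off-chip DRAM (assumed)

# IRAM: large on-chip DRAM satisfies nearly all accesses, and each
# on-chip DRAM access is assumed cheaper than an SRAM cache access.
iram = memory_energy_nj(accesses, on_chip_hit_rate=0.999,
                        e_on_chip_nj=0.5,
                        e_off_chip_nj=20.0)

print(f"IRAM uses {iram / conventional:.0%} of conventional memory energy")
```

Even with these crude assumptions, the expensive off-chip accesses dominate the conventional hierarchy's total, which is the effect the report quantifies.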
Intelligent RAM
Introduction to IRAM
Given the growing processor-memory performance gap and the awkwardness of high capacity
DRAM chips, we believe that it is time to consider unifying logic and DRAM. We call such a
chip an "IRAM", standing for Intelligent RAM, since most of the transistors on this merged chip
will be devoted to memory. The reason to put the processor in DRAM rather than increasing the
on-processor SRAM is that DRAM is in practice approximately 20 times denser than SRAM.
Thus, IRAM enables a much larger amount of on-chip memory than is possible in a conventional
architecture. Although others have examined this issue in the past, IRAM is attractive today
for several reasons. First, the gap between the performance of processors and DRAMs has been
widening at 50% per year for 10 years, so that despite heroic efforts by architects, compiler
writers, and applications developers, many more applications are limited by memory speed
today than in the past. Second, since the actual processor occupies only about one third of the
die, the upcoming gigabit DRAM has enough capacity that whole programs and data sets can
fit on a single chip. In the past, so little memory could fit on-chip with the CPU that IRAMs
were mainly considered as building blocks for multiprocessors. Third, DRAM dies have grown
about 50% each generation; DRAMs are being made with more metal layers to accelerate the
longer lines of these larger chips. Also, the high speed interface of synchronous DRAM will
require fast transistors on the DRAM chip.
Potential Advantages of IRAM
Higher Bandwidth
A DRAM naturally has extraordinary internal bandwidth, essentially fetching the square root
of its capacity each DRAM clock cycle; an on-chip processor can tap that bandwidth. The
potential bandwidth of the gigabit DRAM is even greater than indicated by its logical organization.
Since it is important to keep the storage cell small, the normal solution is to limit
the length of the bit lines, typically with 256 to 512 bits per sense amp. This quadruples the
number of sense amplifiers. To save die area, each block has a small number of I/O lines, which
reduces the internal bandwidth by a factor of about 5 to 10 but still meets the external demand.
One IRAM goal is to capture a larger fraction of the potential on-chip bandwidth.
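The "square root of its capacity each cycle" rule can be made concrete with a quick calculation; the 1 Gbit size and 10x I/O reduction below just follow the figures in the text.

```python
import math

# Rule of thumb from the text: a DRAM internally fetches roughly the
# square root of its capacity each DRAM cycle (one full row of a
# square cell array).
def internal_fetch_bits(capacity_bits):
    return math.isqrt(capacity_bits)

gigabit = 2**30
row_bits = internal_fetch_bits(gigabit)  # 2**15 bits
print(f"A 1 Gbit DRAM fetches ~{row_bits} bits ({row_bits // 8} bytes) per cycle")

# The text notes that I/O multiplexing then cuts this by roughly 5-10x
# before data leaves the chip; an on-chip processor could avoid that loss.
print(f"After a 10x I/O reduction: ~{row_bits // 10} bits per cycle off-chip")
```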
Lower Latency
To reduce latency, the wire length should be kept as short as possible. This suggests the fewer
bits per block the better. In addition, the DRAM cells furthest away from the processor will
be slower than the closest ones. Rather than restricting the access timing to accommodate the
worst case, the processor could be designed to be aware when it is accessing "slow" or "fast"
memory. Some additional reduction in latency can be obtained simply by not multiplexing the
address as there is no reason to do so on an IRAM. Also, being on the same chip with the
DRAM, the processor avoids driving the off-chip wires, potentially turning around the data
bus, and accessing an external memory controller. In summary, the access latency of an IRAM
processor does not need to be limited by the same constraints as a standard DRAM part. Much
lower latency may be obtained by intelligent floor-planning, utilizing faster circuit topologies,
and redesigning the address/data bussing schemes.
Memory Size and Width
Another advantage of IRAM over conventional designs is the ability to adjust both the size
and width of the on-chip DRAM. Rather than being limited by powers of 2 in length or width,
as is conventional DRAM, IRAM designers can specify exactly the number of words and their
width. This flexibility can improve the cost of IRAM solutions versus memories made from
conventional DRAMs.
Energy
As anyone who has ever used a portable computer can attest, the overriding concern for a user
of such a system is battery life. Battery life is measured in units of energy, not power, and the
two are often confused. Energy is the product of power and time: Energy = Power × Time.
For a given amount of work, what matters most to the user is how much energy is required
to do that work. Thus energy efficiency, measured in energy consumed per instruction, or
equivalently MIPS per Watt, is a better metric than power for measuring how well a given
machine utilizes limited battery life. Energy per instruction and MIPS per Watt
are inversely proportional to each other.
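The inverse relationship between the two metrics follows from the units: MIPS per Watt is millions of instructions per joule, so inverting it (with a factor of 10^6) gives energy per instruction. The 200 MIPS/W figure below is just a hypothetical example.

```python
# MIPS/W = (10^6 instructions / s) / (J / s) = 10^6 instructions per joule,
# so energy per instruction is its inverse, scaled by 10^6.
def energy_per_instruction_j(mips_per_watt):
    return 1.0 / (mips_per_watt * 1e6)

# A hypothetical 200 MIPS/W processor:
epi = energy_per_instruction_j(200)
print(f"{epi * 1e9:.1f} nJ per instruction")  # 5.0 nJ
```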
Power can be a deceiving metric, since it does not directly relate to battery life. For instance,
if the clock rate of a processor is cut in half, then the processor will consume approximately
half the power, assuming that the voltage is held constant, since dynamic power is proportional
to Frequency × Capacitance × Voltage². However, if all else is kept equal, it will now take
approximately twice as long to execute a sequence of instructions. Thus the energy consumed
by the processor for a given amount of work will be roughly the same. Even worse, since the
task will take longer, the energy consumed by the display and other components of the system
will be greater. At the extreme, slowing the frequency down to zero and putting the processor
in sleep mode wins if processor power consumption is the only metric, yet this allows no work
to be accomplished.
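The clock-halving argument is simple enough to work through with numbers. The power figures below are assumed, illustrative values, not measurements.

```python
# Toy illustration (assumed numbers) of why power alone is misleading:
# halving the clock halves CPU power but doubles run time, so CPU energy
# for the task is unchanged, while the rest of the system (display, etc.)
# burns energy for twice as long.

def task_energy_j(cpu_power_w, other_power_w, runtime_s):
    return (cpu_power_w + other_power_w) * runtime_s

full_speed = task_energy_j(cpu_power_w=2.0, other_power_w=3.0, runtime_s=10.0)
half_speed = task_energy_j(cpu_power_w=1.0, other_power_w=3.0, runtime_s=20.0)

print(f"Full clock: {full_speed:.0f} J, half clock: {half_speed:.0f} J")
# CPU energy alone is 20 J either way, but total system energy is worse
# at the lower clock because the task takes twice as long.
```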
Performance
Besides energy efficiency, users are also concerned about performance. Users of portables are
willing to tolerate lower performance than a desktop system offers, but performance still matters.
Portable systems are being used increasingly for more demanding applications that require
higher performance. Examples include preparing large, graphical presentations, audio and
video playback, handwriting recognition, and speech recognition. The goal is to be able to
deliver high performance while still maintaining energy efficiency.
Energy Components of Current Systems
Given that portable devices, and hence energy-efficient systems, are becoming more prevalent,
it is useful to examine where the energy is consumed in such a device. While the power used to
be dominated by the screen, over time the CPU and memory have become an increasingly
significant portion of the power budget. In smaller handheld portable devices there is no disk
and the screen consumes much less power. Hence, for these systems the power consumption of
the CPU and memory is an even larger fraction of the total.
The introduction of low-power CPU cores places an even greater emphasis on the energy
consumed by the memory system, both on-chip and off-chip. Considering only the on-chip
costs, several studies show a roughly equal division of power or energy (different studies used
different metrics) between the CPU and memory.