07-07-2012, 04:13 PM
An Embedded DRAM for CMOS ASICs
Abstract
The growing gap between on-chip gates and off-chip I/O bandwidth argues for ever larger
amounts of on-chip memory. Emerging portable consumer technology, such as digital
cameras, will also require more memory than can easily be supported on logic-oriented
ASIC processes. Most ASIC memory systems are P-load SRAM, but this circuit technology is
neither dense nor power efficient. This paper describes development of a DRAM,
compatible with a standard CMOS ASIC process, that provides memory density at least 4x
improved over P-load SRAM in the same layout rules. It runs at speeds comparable to
logic in the same process and uses circuitry that is reasonably simple and portable. The
design employs Vdd-precharge bit lines, half-capacitance, full-voltage dummy cells, and a
simple complementary sense amp. DRAM is organized as a number of small pages,
allowing simple circuit design and low-power operation at modest expense in area
overhead.
Introduction
Digital system designers, faced with a large and rapidly growing gap between available
chip I/O bandwidth and demand for bandwidth, attempt to find clever system partitioning
schemes to reduce demand. In a design environment in which millions of devices can be
economically integrated onto a chip, but only a few hundred I/O pins can be supported,
efficient system partitioning requires ever more on-chip storage.
ASIC designs now commonly require large amounts of on-chip memory, usually
implemented as static random-access memory (SRAM). SRAM with PFET loads can be
supported on any CMOS process but requires about 1,000 λ² of area per bit. Some vendors
offer special processes, adapted from commercial SRAM manufacture, that include
polysilicon resistor loads; resistor-load SRAM cells can be made as small as a few hundred
λ². Dynamic memory (DRAM) circuits are potentially much more compact than SRAMs.
Application
The embedded DRAM serves as a register file for an array of 256 8-bit processors in a
graphics ‘enhanced memory chip’ (EMC) [3]; each processor ‘owns’ 384 bytes of
memory. The memory layout is bit-sliced, so the organization is 2,048 bits (columns) by
384 words (rows). In the word dimension, the memory is composed of ‘pages’, each a
block of 32 words. Each page is a self-contained memory system with bit cells arrayed
along a pair of bit lines, a local sense amp and precharger, and an interface to a (differential)
data bus that delivers data between the (12) pages and the processor. The column dimension
has no decoder; on every cycle of chip operation, data is read from or written to 2,048 bits
of memory. A block diagram of a bit slice through the DRAM is shown in Figure 1; the
column dimension is horizontal in the figure.
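
The organization figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are not from the paper, and it assumes that each row stores one byte for every processor, which is what the 2,048-column by 384-row description implies.

```python
# Figures quoted in the text; the variable names are illustrative only.
NUM_PROCESSORS = 256        # 8-bit processors on the chip
BITS_PER_PROCESSOR = 8      # datapath width of each processor
BYTES_PER_PROCESSOR = 384   # memory "owned" by each processor
WORDS_PER_PAGE = 32         # rows in one self-contained page

# Bit-sliced layout: every processor bit gets its own column.
columns = NUM_PROCESSORS * BITS_PER_PROCESSOR

# Assumption: one row holds one byte for every processor,
# so the row count equals the bytes per processor.
rows = BYTES_PER_PROCESSOR

pages = rows // WORDS_PER_PAGE
total_bits = columns * rows

print(f"columns={columns} rows={rows} pages={pages}")
print(f"capacity={total_bits} bits ({total_bits // 8 // 1024} KiB)")
```

The results agree with the 2,048 columns, 384 rows, and 12 pages stated in the text.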
Data bus circuitry
More than half of the design time was spent investigating various alternatives for the data
bus and its interfaces. Before describing some of these alternatives, we motivate the
discussion by outlining the main design goal: reducing bus power to an acceptable level. In
this application, 2,048 bus pairs carry data between processors and memory, and data on the
busses can change on each cycle. These metal-3 busses are about 2.2 mm long and pass
over dense arrays of metal-2 wires running at right angles; capacitance is about 750 fF. Bus
power for full-swing signals would be 2,048 × 750 fF × (3.6 V)² × 100 MHz ≈ 2 watts. The
only practical way to reduce power was to reduce bus voltage swing. The power budget was
1 watt, so the bus voltage swing had to be less than Vdd/2.
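
The power estimate can be reproduced with a short calculation. This is a minimal sketch, not the authors' analysis: the function name and constants are hypothetical, and it assumes the bus charge is drawn from the full Vdd supply, so that power scales linearly with swing (P = N × C × Vdd × Vswing × f). Under that assumption, halving the power means roughly halving the swing. If the reduced swing came from a lower supply voltage instead, the dependence would be quadratic.

```python
def bus_power(n_buses, cap_farads, vdd, swing, freq_hz):
    """Dynamic bus power in watts.

    Assumes charge is drawn from the Vdd supply, so that
    P = N * C * Vdd * Vswing * f. With swing == vdd this reduces
    to the full-swing expression N * C * Vdd^2 * f used in the text.
    """
    return n_buses * cap_farads * vdd * swing * freq_hz


# Values quoted in the text.
N_BUSES = 2048      # bus pairs between processors and memory
CAP = 750e-15       # farads, capacitance quoted for each bus
VDD = 3.6           # volts
FREQ = 100e6        # hertz
BUDGET = 1.0        # watts

full_swing_power = bus_power(N_BUSES, CAP, VDD, VDD, FREQ)

# Largest swing that fits the budget under the linear model above.
max_swing = BUDGET / (N_BUSES * CAP * VDD * FREQ)

print(f"full-swing power: {full_swing_power:.2f} W")
print(f"swing at budget:  {max_swing:.2f} V (Vdd/2 = {VDD / 2:.2f} V)")
```

With these inputs the full-swing figure is just under 2 W and the swing that meets the budget is close to Vdd/2, matching the text's conclusion.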
Summary, previous work, and acknowledgements
This paper has described a DRAM compatible with a standard CMOS ASIC process. It
provides memory density at least 4x improved over P-load SRAM in the same layout rules,
runs at speeds comparable to logic in the same process, and uses circuitry that is reasonably
simple and straightforward. The design employs Vdd-precharge bit lines, half-capacitance,
full-voltage dummy cells, and a simple complementary sense amp. DRAM is organized as a
number of small pages, allowing simple circuit design and low-power operation at modest
expense in area overhead. The paper also described a power-conserving low-voltage-swing
bus design that connects multiple pages to a full-voltage-swing interface. This work shows
not only that it is possible to integrate fairly dense DRAM with logic on an ASIC process, but
also that it is relatively straightforward to do so, as postulated in [7].