18-07-2012, 10:20 AM
A Dynamic Memory Management Unit for Embedded Real-Time System-on-a-Chip
ABSTRACT
Dealing with global on-chip memory allocation/de-allocation in a
dynamic yet deterministic way is an important issue for upcoming
billion transistor multiprocessor System-on-a-Chip (SoC) designs.
To achieve this, we propose a new memory management hierarchy
called Two-Level Memory Management. To implement this
memory management scheme – which presents a paradigm shift in
the way designers look at on-chip dynamic memory allocation – we
present a System-on-a-Chip Dynamic Memory Management Unit
(SoCDMMU) for allocation of the global on-chip memory, which
we refer to as level two memory management (level one is the
operating system management of memory allocated to a particular
on-chip processor). In this way, heterogeneous processors in an SoC
can request and be granted portions of the global memory in twenty
clock cycles in the worst case for a four-processor SoC, which is at
least an order of magnitude faster than software-based memory
management. We present a sample implementation of the
SoCDMMU and compare hardware and software implementations.
INTRODUCTION
In the next five years it will be possible to fabricate integrated
circuits with close to one billion transistors on a single chip [11].
Such chips will no longer be individual components of a system but
“silicon boards.” A typical System-on-a-Chip (SoC), as shown in
Figure 1, will consist of multiple Processing Elements (PE's) of
various types (i.e., general purpose processors, domain-specific
CPU's such as DSP's, and custom hardware), large memory, analog
components and digital interfaces [1][5]. Architectures such as this
will be suitable for embedded real-time applications. Such
applications – especially multimedia – require great processing
power and large volume data management [6][12].
THE PROGRAMMING MODEL
We propose a programming model and memory management
scheme, which we call Two-Level Memory Management. Two-Level
Memory Management assumes that the SoCDMMU handles the
allocation of the global on-chip memory while each PE handles the
local dynamic memory allocation among the threads/processes
running on the PE, for example with a Real-Time Operating System
(RTOS) (a hardware accelerator can be used to accelerate the
memory allocation/de-allocation at the threads/processes level [4]).
The SoCDMMU manages distribution of memory between the PE's.
On the other hand, each PE manages the usage of memory by the
processes that run on that PE. So, typically, if a process requests a
memory allocation it will request it from the RTOS. If the PE has
currently allocated enough extra global memory to satisfy the
request, the RTOS will simply allocate the memory right away;
otherwise, the RTOS will request more memory from the
SoCDMMU. Thus, there are two levels: the process/thread level
managed by the RTOS (local allocation), and the PE level (global
allocation) managed by the SoCDMMU.
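The two levels described above can be sketched in C. This is a minimal illustration, not the paper's actual interface: the names `pe_malloc` and `socdmmu_request`, and the stub global pool, are all hypothetical. The point it shows is the fallback path: the RTOS-level allocator serves a request from blocks its PE already owns, and only asks the (here stubbed) SoCDMMU for another 64 KB block when its local pool is exhausted.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE (64 * 1024)

/* Level two (stub): the SoCDMMU grants whole 64 KB blocks of global
 * on-chip memory to a PE.  Here it simply carves them out of a static
 * pool; the real SoCDMMU is hardware shared by all PE's. */
static char   global_pool[4 * BLOCK_SIZE];
static size_t pool_used;

static void *socdmmu_request(size_t nblocks) {
    if (pool_used + nblocks * BLOCK_SIZE > sizeof global_pool)
        return NULL;                       /* no global memory left */
    void *blk = global_pool + pool_used;
    pool_used += nblocks * BLOCK_SIZE;
    return blk;
}

/* Level one: the RTOS hands out pieces of the block its PE already
 * owns, falling back to a level-two request only when it runs out. */
static char  *pe_block;       /* current block owned by this PE */
static size_t pe_block_free;  /* bytes still unused in it */

void *pe_malloc(size_t size) {
    if (size > BLOCK_SIZE) return NULL;    /* keep the sketch simple */
    if (size > pe_block_free) {            /* local pool exhausted: */
        pe_block = socdmmu_request(1);     /* ask the SoCDMMU */
        if (!pe_block) return NULL;
        pe_block_free = BLOCK_SIZE;
    }
    void *p = pe_block + (BLOCK_SIZE - pe_block_free);
    pe_block_free -= size;
    return p;
}
```

Note that consecutive small requests are satisfied entirely at level one; in the real system this is what keeps the SoCDMMU off the critical path of most allocations.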
EXPERIMENTS AND RESULTS
To test the effectiveness of our approach, we simulated a model for
an SoC that utilizes an SoCDMMU, using the Synopsys VCS™ Verilog
simulator. The simulated system is the one illustrated earlier in
Figure 1: four PE's (two RISC processors, e.g., MIPS, and two DSPs,
e.g., Motorola DSP56k), 16 MB of SRAM and the SoCDMMU.
The memory is divided into 256 blocks; each block is 64KB. The
SoCDMMU utilizes a 256-bit Allocation Vector and an Allocation
Table with 256 entries. We conducted a number of experiments to
test the quality of our approach. Figure 15 shows a screenshot of the
simulation of the PE-SoCDMMU interface, where four PE's are
connected to the SoCDMMU. The last four signals in the timing
diagram show the commands described earlier in Example 4 being
issued by the different PE's.
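The block bookkeeping described above (16 MB of SRAM split into 256 blocks of 64 KB, tracked by a 256-bit Allocation Vector) can be sketched as follows. The bitmap layout and the first-fit search are our own illustrative assumptions; the paper's hardware implements its search combinationally, and the function names here are invented.

```c
#include <assert.h>
#include <stdint.h>

/* 16 MB of global SRAM = 256 blocks of 64 KB; one bit per block. */
#define NUM_BLOCKS 256
#define BLOCK_SIZE (64 * 1024)

static uint8_t alloc_vector[NUM_BLOCKS / 8];  /* 256-bit bitmap */

static int block_is_free(int b) {
    return !(alloc_vector[b / 8] & (1u << (b % 8)));
}

static void mark_block(int b, int used) {
    if (used) alloc_vector[b / 8] |=  (uint8_t)(1u << (b % 8));
    else      alloc_vector[b / 8] &= (uint8_t)~(1u << (b % 8));
}

/* First-fit search for n contiguous free blocks; returns the index of
 * the first block in the run, or -1 if no run is large enough. */
int alloc_blocks(int n) {
    for (int start = 0; start + n <= NUM_BLOCKS; start++) {
        int run = 0;
        while (run < n && block_is_free(start + run)) run++;
        if (run == n) {
            for (int b = 0; b < n; b++) mark_block(start + b, 1);
            return start;
        }
        start += run;  /* skip past the occupied block that ended the run */
    }
    return -1;
}

void free_blocks(int start, int n) {
    for (int b = 0; b < n; b++) mark_block(start + b, 0);
}
```

In software this search is data-dependent and slow in the worst case, which is precisely why the paper argues for doing it in dedicated hardware with a deterministic cycle count.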
Comparison with a Micro-controller Implementation
To demonstrate the importance of building the SoCDMMU in
custom hardware, we compared the SoCDMMU's performance with
the performance of software running on a RISC micro-controller
(a Microchip PIC™ micro-controller). This software performs the
same function as the SoCDMMU. Table 7 compares the worst-case
execution time of the hardware SoCDMMU with the best-case
execution time for the micro-controller implementation of the
SoCDMMU in software. The comparison is shown in clock cycles.
We assume that the hardware SoCDMMU and the micro-controller
both have the same clock rate (e.g., 400MHz).
CONCLUSION
In this paper, we described an approach to handle on-chip memory
allocation between PE's in an SoC. Our approach is based on a
hardware SoCDMMU that allows a dynamic, fast way to
allocate/de-allocate the on-chip memory. Moreover, the
SoCDMMU allocation/de-allocation of the memory blocks is
completely deterministic, which makes it suitable for real-time
applications. Thus, this approach fits in the gap between general-purpose,
fully shared-memory multiprocessor SoCs and application-specific
SoC designs with custom memory configurations.
Currently, different types of RTOS's are being modified to extend
their memory management schemes to support the hardware
SoCDMMU. We also plan to extend the SoCDMMU to support
G_alloc_rw of the same block by multiple PE’s, thus providing fully
shared memory blocks. Finally, as future work we plan to carry out
a study comparing our multiprocessor SoC with an SoCDMMU to a
fully shared-memory multiprocessor SoC such as Hydra [5].