19-12-2012, 02:11 PM
Asynchronous vs. Synchronous Design Techniques for NoC
Abstract - Network on Chip
Network-on-a-chip (NoC) is a new approach to System-on-a-chip (SoC) design. NoC-based systems can accommodate the multiple asynchronous clock domains that many of today's complex SoC designs use. The NoC solution brings a networking approach to on-chip communication and offers notable improvements over conventional bus systems.
Network-on-Chip (NoC) is an emerging paradigm for communications within large VLSI systems implemented on a single silicon chip. In a NoC system, modules such as processor cores, memories and specialized IP blocks exchange data using a network as a "public transportation" sub-system for the information traffic. A NoC is constructed from multiple point-to-point data links interconnected by switches (routers), such that messages can be relayed from any source module to any destination module over several links, by making routing decisions at the switches.
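As a rough illustration of how a message is relayed over several links by per-switch routing decisions, here is a minimal sketch. It assumes a 2D mesh of routers and dimension-order (XY) routing; neither the topology nor the routing algorithm is specified in the text above, and the name xy_route is hypothetical.

```python
def xy_route(src, dst):
    """Return the router coordinates visited from src to dst.

    Assumption: routers sit on a 2D mesh and are addressed as (x, y).
    Each router first corrects the x coordinate, then the y coordinate.
    """
    x, y = src
    path = [(x, y)]
    # Travel along the x dimension until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then travel along the y dimension to the destination row.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route)           # routers visited, source and destination included
    print(len(route) - 1)  # number of point-to-point links traversed
```

Every hop in the returned path corresponds to one point-to-point link, and the decision at each router depends only on the destination address, which is what makes the switches simple.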
Today's design challenges push engineers to raise the performance of interconnect subsystems ever higher. The popularity of Network-on-Chip architectures comes directly from the growing interest in System-on-Chip and Multi-Processor System-on-Chip (MPSoC) technologies. In a SoC-oriented approach, the designer integrates on the same chip different Intellectual Property (IP) cores with different functions (ALUs, peripheral controllers, RAM blocks).
The design philosophy built around the Multi-Processor System-on-Chip concept is more recent still, and it pushes engineers to study and improve the interconnect technologies available today.
Introduction and Motivation
To understand the need for NoCs, it is useful to consider the challenges posed by SoC and MPSoC. Advances in silicon technology open the road to huge performance improvements: according to the International Technology Roadmap for Semiconductors, within ten years digital system designers will be able to build multi-billion-transistor chips with clock frequencies of about 10 GHz. Unfortunately, fully exploiting this technological improvement requires rethinking the current VLSI design flow.
One possible solution to these problems is a stronger adoption of hardware reuse methodologies, which lead the way to flexible platforms while minimizing the design effort for building new systems. Tomorrow's complex on-chip systems will almost always be created by assembling a large number of IP modules, developed in-house as well as by third-party partners.
Furthermore, the growing number of elements that need to be interconnected is starting to amplify the negative side effects of classical “shared bus” architectures.
Level of Explanation
This work is intended for readers with a basic knowledge of Network on Chip architectures and of asynchronous and synchronous design techniques. It does not require knowledge of past architectures, but an understanding of the fundamentals is essential.
Although the work includes basic explanations of some terminology and aspects of Network on Chip architectures and of asynchronous and synchronous design techniques, reviewing additional publications on the subject is strongly recommended. Suggestions for further reading can be found in the Reference chapter of this work.
The rest of this work is organized as follows. The third section is devoted to synchronous on-chip network architectures. The fourth section is devoted to asynchronous on-chip network architectures. The fifth section compares the synchronous and asynchronous architectures and discusses future work. The conclusion summarizes this work.
Single Cycle Speculative Router
A single-cycle router is made possible by the use of speculation. Compared to the other design, the clock period is almost unchanged (approx. 30 FO4 for a simple standard-cell design). The presence of a clock simplifies the design of both arbitration (fast combinational matrix arbiters, which can easily be extended to handle priority traffic, etc.) and speculation (aided by the clear notion of a clock “cycle” and simple abort logic).
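To make the matrix arbiter mentioned above more concrete, here is a small behavioral sketch of one common variant, the least-recently-served matrix arbiter. It models only the grant decision, one call per clock cycle, and is not the router's actual circuit; the class name MatrixArbiter and its methods are hypothetical.

```python
class MatrixArbiter:
    """Behavioral model of a least-recently-served matrix arbiter."""

    def __init__(self, n):
        self.n = n
        # prio[i][j] is True when requester i currently beats requester j.
        # Assumption: lower-numbered requesters start with higher priority.
        self.prio = [[i < j for j in range(n)] for i in range(n)]

    def arbitrate(self, requests):
        """Grant one of the active requests; return its index or None."""
        for i in range(self.n):
            if not requests[i]:
                continue
            # Requester i wins if it beats every other active requester.
            if all(not requests[j] or self.prio[i][j]
                   for j in range(self.n) if j != i):
                self._demote(i)
                return i
        return None

    def _demote(self, winner):
        # The winner becomes the lowest-priority requester for next time.
        for j in range(self.n):
            if j != winner:
                self.prio[winner][j] = False
                self.prio[j][winner] = True


if __name__ == "__main__":
    arb = MatrixArbiter(3)
    # With all three inputs requesting every cycle, grants rotate fairly.
    print([arb.arbitrate([True, True, True]) for _ in range(4)])
```

In hardware the priority matrix is a set of flip-flops and the grant logic is purely combinational, which is why the decision fits within a single clock cycle. Priority traffic could be handled by masking lower-priority requests before they reach the arbiter, though that extension is not modeled here.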
Data-Driven Local Clock
Another idea is a data-driven local clock. If data appears at any input, all inputs are sampled. The router must then determine which inputs are to be admitted on the next clock cycle, which requires a MUTEX, and ensure that data which is not admitted is “locked out” for that clock cycle. A clock pulse is generated once all MUTEXes have made a decision, and never sooner than the delay of the delay line. This architecture is similar to stoppable GALS interfaces and asynchronous priority arbiters. Some small timing constraints remain, and performance tweaks are possible. Possible extensions are to force synchronization on a subset of inputs (some inputs must be present for the clock to be generated), to generate additional clock pulses to handle pipelining (a counter and a clock-driven lock signal), and to select a different clock period (delay line) depending on which inputs have been granted (a data-dependent clock period).
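The sequence of events in one data-driven clock cycle can be sketched as a simple timing model. This is only an illustration under several assumptions that are not in the text above: times are integers in arbitrary units, an input is admitted if it arrives within a fixed sampling window after the first arrival, and each MUTEX has a known decision time. The function name local_clock_cycle and all its parameters are hypothetical.

```python
def local_clock_cycle(arrivals, window, mutex_delay, delay_line):
    """Model one data-driven clock cycle of a router.

    arrivals:    dict of input name -> arrival time, or None if idle
    window:      assumed sampling window after the first arrival
    mutex_delay: dict of input name -> assumed MUTEX decision time
    delay_line:  minimum clock period enforced by the delay line
    Returns (admitted, locked_out, pulse_time), or None if no data arrived.
    """
    present = {name: t for name, t in arrivals.items() if t is not None}
    if not present:
        return None  # no data at any input, so no clock pulse is generated

    first = min(present.values())  # first arrival triggers the sampling
    admitted = sorted(n for n, t in present.items() if t - first <= window)
    # Data that missed the window is locked out until the next cycle.
    locked_out = sorted(n for n in present if n not in admitted)

    # The pulse waits for every MUTEX and is never faster than the delay line.
    decided = max(mutex_delay[n] for n in present)
    pulse_time = first + max(decided, delay_line)
    return admitted, locked_out, pulse_time


if __name__ == "__main__":
    arrivals = {"north": 0, "east": 20, "west": 150, "south": None}
    delays = {"north": 10, "east": 90, "west": 10, "south": 10}
    print(local_clock_cycle(arrivals, window=50, mutex_delay=delays,
                            delay_line=60))
```

In this model a MUTEX that takes unusually long to resolve (for example because of metastability) simply stretches the clock period, while in the common case the delay line alone sets the period.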
Synchronous Routers - Summary
High-performance single-cycle routers can be designed synchronously, and the design is simplified by the presence of global synchrony. Distribution of the global clock can be eased by new clock generation/distribution techniques or by source-synchronous communication. Global synchrony can be relaxed further when data-driven clocking determines the most appropriate router clock frequency automatically.
Conclusions
In comparing the two architectures, we saw benefits in both approaches. For example, asynchronous architectures consume less power but require more area than synchronous architectures. The synchronous architecture delivered higher throughput than the asynchronous one, but incurs a longer data delay.
Today there is a high cost associated with both global synchrony and delay-insensitive circuits, so constraints can be relaxed in both directions. But which techniques achieve the best cost/benefit mix for on-chip networks? Data-driven clocks in a GALS network look promising.