10-08-2012, 03:16 PM
CLOCKLESS CHIPS
CLOCKLESS CHIPS.doc (Size: 546.5 KB / Downloads: 35)
INTRODUCTION
Over the years, the designers of microprocessors have resorted to all sorts of tricks to make their products run faster. Modern chips, for example, queue up several instructions in a “pipeline” and analyze them to see if switching the order in which they are executed can produce the correct result, only more quickly.
After a point, cranking up the clock speed becomes an exercise in diminishing returns. That's why a one-gigahertz chip doesn't run twice as fast as a 500-megahertz chip. The clock, through the work it must do to coordinate millions of transistors on a chip, generates its own overhead. The faster the clock, the greater the overhead becomes. The clock in a state-of-the-art microprocessor can consume up to 30 percent of the chip's computing capability, with that percentage increasing at an ever faster rate as clock speeds increase.
Faced with these diminishing returns, chip designers are dusting down two technologies, multi-threading and asynchronous logic, that were both invented decades ago. At the time, neither was competitive with conventional designs, but important uses have since emerged for each of them. Multi-threading can increase the performance of database and web servers, while asynchronous logic is ideal for wireless devices and smart cards.
CLOCK CONCEPT
The clock is a tiny crystal oscillator that resides in the heart of every microprocessor chip. It sets the basic rhythm used throughout the machine and orchestrates the synchronous dance of electrons that course through the hundreds of millions of wires and transistors of a modern computer.
Such crystals, which tick up to 2 billion times each second in the fastest of today’s desktop personal computers, dictate the timing of every circuit in every one of the chips that add, subtract, divide, multiply and move the ones and zeros that are the basic stuff of the information age.
PROBLEMS WITH SYNCHRONOUS APPROACH
Synchronous circuits are digital circuits in which parts are synchronized by clock signals. In an ideal synchronous circuit, every change in the logical levels of its storage components is simultaneous. These transitions follow the level change of a special signal called the clock signal. Ideally, the input to each storage element has reached its final value before the next clock occurs, so the behavior of the whole circuit can be predicted exactly. Practically, some delay is required for each logical operation, resulting in a maximum speed at which each synchronous system can run.
LOW SPEED
A traditional CPU cannot "go faster" than the expected worst-case performance of the slowest stage/instruction/component. When an asynchronous CPU completes an operation more quickly than anticipated, the next stage can immediately begin processing the results, rather than waiting for synchronization with a central clock. An operation might finish faster than normal because of attributes of the data being processed (e.g., multiplication can be very fast when multiplying by 0 or 1, even when running code produced by a brain-dead compiler), or because of a higher voltage or bus speed setting, or a lower ambient temperature, than normal or expected.
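The difference can be sketched with a few lines of Python. This is only an illustration: the operation names and the latency figures are made-up assumptions, and the handshake overhead of a real asynchronous design is ignored.

```python
# Hypothetical per-operation latencies in nanoseconds (illustrative values only).
LATENCY_NS = {"multiply_by_0_or_1": 1.0, "multiply_general": 4.0, "add": 1.5}

def clocked_time(ops, latency=LATENCY_NS):
    """Clocked design: every operation occupies one clock period,
    and the period must cover the slowest operation."""
    period = max(latency.values())
    return period * len(ops)

def self_timed_time(ops, latency=LATENCY_NS):
    """Self-timed design: each operation takes only as long as it needs."""
    return sum(latency[op] for op in ops)

ops = ["add", "multiply_by_0_or_1", "multiply_general", "add"]
print(clocked_time(ops))     # 16.0
print(self_timed_time(ops))  # 8.0
```

With these assumed numbers the clocked version pays the worst-case 4 ns for every operation, while the self-timed version pays it only for the one general multiplication.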
CLOCKLESS CHIPS IMPLEMENTATION
To make a design asynchronous, the circuits must be implemented without a central clock, so that components are no longer tied to a common timing signal. One technique is to use clockless chips in the circuit design. Because these chips do not depend on a central clock, their components are not tied to one another. Each component can then run at its own speed, and the system as a whole operates asynchronously.
THROWING AWAY GLOBAL CLOCK
An asynchronous design cannot succeed if a global clock still manages the timing signals of the whole system. Because the clock exists only to synchronize components, removing it lets the components run completely unsynchronized, communicating with one another only through a handshaking mechanism.
STANDARDIZATION OF COMPONENTS
In a synchronous system, all the components are bound together so that a central clock can manage them. This synchrony can be broken up if the components are not bound together, and standardizing the components is one way to do so. Each component is specified to operate within a given range of performance and speed. The system is then designed around an average speed rather than around worst-case execution.
WORKING PRINCIPLE
HOW CLOCKLESS CHIPS WORK
There are no purely asynchronous chips yet. Instead, today’s clockless processors are actually clocked processors with asynchronous elements. Clockless elements use perfect clock gating, in which circuits operate only when they have work to do, not whenever a clock ticks. Instead of clock-based synchronization, local handshaking controls the passing of data between logic modules. The asynchronous processor places the location of the stored data it wants to read onto the address bus and issues a request for the information. The memory reads the address off the bus, finds the information, and places it on the data bus. The memory then acknowledges that it has read the data. Finally, the processor grabs the information from the data bus.
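The memory read described above can be sketched as a small Python model. This is a simplified, sequential illustration and not a hardware description: the `bus` dictionary, the `Memory` class, the `read` function and the return-to-idle step (a four-phase style handshake) are all assumptions made for the example.

```python
class Memory:
    """Toy memory that answers requests placed on a shared bus."""

    def __init__(self, cells):
        self.cells = cells

    def serve(self, bus):
        # Read the address off the bus, drive the data bus, then acknowledge.
        if bus["req"] and not bus["ack"]:
            bus["data"] = self.cells[bus["addr"]]
            bus["ack"] = True

def read(bus, memory, addr):
    """Perform one handshaked read and return the value plus an event log."""
    log = []
    bus["addr"] = addr            # processor places the address on the bus
    bus["req"] = True             # and issues the request
    log.append("request")
    memory.serve(bus)             # memory finds the data and acknowledges
    assert bus["ack"], "processor must not proceed before the acknowledge"
    log.append("acknowledge")
    value = bus["data"]           # processor grabs the data from the data bus
    log.append("data taken")
    bus["req"] = False            # assumed four-phase style: return to idle
    bus["ack"] = False
    log.append("idle")
    return value, log

bus = {"addr": None, "data": None, "req": False, "ack": False}
memory = Memory({0x10: 42, 0x14: 7})
value, log = read(bus, memory, 0x10)
print(value, log)
```

No clock appears anywhere in the model: each step starts only because the previous one signalled that it was done.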
According to Jorgenson, “Data arrives at any rate and leaves at any rate. When the arrival rate exceeds the departure rate, the circuit stalls the input until the output catches up.”
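The stalling behaviour in the quotation can be illustrated with a bounded buffer. The sketch below is a hypothetical model, not taken from any real chip: the `Stage` class, its `capacity`, and the `drain_every` parameter (the output takes one item every that many steps) are assumptions chosen to make arrivals outpace departures.

```python
from collections import deque

class Stage:
    """A buffer of limited capacity sitting between a producer and a consumer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def offer(self, item):
        """Accept the item if there is room; return False to stall the sender."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def take(self):
        """Hand the oldest item to the output, or None if the stage is empty."""
        return self.items.popleft() if self.items else None

def run(arrivals, capacity, drain_every):
    """Offer one item per step; the output drains one item every `drain_every` steps."""
    stage = Stage(capacity)
    pending = deque(arrivals)
    delivered, stalls, step = [], 0, 0
    while pending or stage.items:
        step += 1
        if pending:
            if stage.offer(pending[0]):
                pending.popleft()
            else:
                stalls += 1       # input is held back until the output catches up
        if step % drain_every == 0:
            item = stage.take()
            if item is not None:
                delivered.append(item)
    return delivered, stalls

delivered, stalls = run(range(6), capacity=2, drain_every=2)
print(delivered, stalls)
```

Every item is delivered in order and nothing is lost; the faster input simply waits whenever the stage is full.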
The many handshakes themselves require more power than a clock’s operations. However, clockless systems more than offset this because, unlike synchronous chips, each circuit uses power only when it performs work.
BUCKET BRIGADE
To describe how asynchronous systems work, we often use the metaphor of the bucket brigade. A clocked system is like a bucket brigade in which each person must pass and receive buckets according to the tick tock rhythm of the clock. When the clock ticks, each person pushes a bucket forward to the next person down the line. When the clock tocks, each person grasps the bucket pushed forward by the preceding person.
The rhythm of this brigade cannot go faster than the time it takes the slowest person to move the heaviest bucket. Even if most of the buckets are light, everyone in the line must wait for the clock to tick before passing the next bucket.
CONCLUSION
Clocks are getting faster, while chips are getting bigger, both of which make clock distribution harder. Chips are also becoming more heterogeneous, with functions like memory and network interfaces being considered, all of which complicates the global timing analysis necessary for a synchronous design. Finally, we are entering an age when processors will be just about everywhere, and this will require very low power designs. It’s just not practical to expect a clean, skew-free clock for every (say) piece of clothing with a processing element.
But a move to clockless chips can only happen if more focus, especially at the university level, is given to asynchronous design. Most of today’s designers don’t understand it well enough to use it, and may even regard it with suspicion. It is certainly a challenge, but just as the software community is moving towards more concurrency, the hardware community must move to incorporate asynchronous logic.