27-09-2016, 09:36 AM
1 Introduction to VLSI
VLSI stands for "Very Large Scale Integration". It is the field concerned with packing more and more logic devices into smaller and smaller areas. With VLSI, circuits that would once have occupied entire boards can now fit into a space a few millimeters across! VLSI circuits are everywhere: in your computer, your car, your brand new state-of-the-art digital camera, your cell phone, and so on. All this demands expertise on many fronts within the same field, which we will look at in later sections. The way familiar blocks like latches and gates are implemented differs from what students have seen so far, but the behavior remains the same. All this miniaturization brings new considerations, and a great deal of thought has to go into actual implementation as well as design.
Large, complicated circuits running at very high frequencies face one big problem: delays in the propagation of signals through gates and wires, even over areas only a few micrometers across. Operating speeds are now so high that, as these delays add up, they can actually become comparable to the clock period itself.
Another effect of high operating frequencies is increased power consumption. This has a twofold effect: devices drain batteries faster, and heat dissipation increases. Coupled with the fact that surface areas have shrunk, heat poses a major threat to the stability of the circuit itself.
Laying out the circuit components is a task common to all branches of electronics. What is special in our case is that there are many possible ways to do it: there can be multiple layers of different materials on the same silicon, there can be different arrangements of the smaller parts for the same component, and so on. The choice among these alternatives is determined by how we choose to lay out the circuit components. Layout also affects the fabrication of VLSI chips, making the components either easy or difficult to implement on the silicon.
1.2 Introduction to VHDL
A digital system can be described at different levels of abstraction and from different points of view. An HDL should faithfully and accurately model and describe a circuit, whether already built or under development, from either the structural or behavioral views, at the desired level of abstraction. Because HDLs are modeled after hardware, their semantics and use are very different from those of traditional programming languages.
1.3 Limitations of traditional programming languages
There is a wide variety of computer programming languages, from Fortran to C to Java. Unfortunately, they are not adequate to model digital hardware. To understand their limitations, it is beneficial to examine the development of a language. A programming language is characterized by its syntax and semantics. The syntax comprises the grammatical rules used to write a program, and the semantics is the "meaning" associated with language constructs. When a new computer language is developed, the designers first study the characteristics of the underlying processes and then develop syntactic constructs and their associated semantics to model and express these characteristics.
Most traditional general-purpose programming languages, such as C, are modeled after a sequential process. In this process, operations are performed in sequential order, one operation at a time. Since an operation frequently depends on the result of an earlier operation, the order of execution cannot be altered at will. The sequential process model has two major benefits. At the abstract level, it helps the human thinking process to develop an algorithm step by step. At the implementation level, the sequential process resembles the operation of a basic computer model and thus allows efficient translation from an algorithm to machine instructions.
The characteristics of digital hardware, on the other hand, are very different from those of the sequential model. A typical digital system is normally built from smaller parts, with customized wiring connecting the input and output ports of these parts. When a signal changes, the parts connected to that signal are activated and a set of new operations is initiated accordingly. These operations are performed concurrently, and each takes a specific amount of time, representing the propagation delay of the particular part, to complete. After completion, each part updates the value of its output port. If the value has changed, the output signal in turn activates all the connected parts and initiates another round of operations. This description reveals several unique characteristics of digital systems: the connections of parts, concurrent operations, and the concepts of propagation delay and timing. The sequential model used in traditional programming languages cannot capture these characteristics, hence the need for special languages (HDLs) designed to model digital hardware.
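The event-driven behavior described above can be sketched in a few lines of Python. This is an illustration only, not an HDL: the `simulate` helper, the gate dictionary, and the delay model are all invented for this sketch.

```python
import heapq

def simulate(gates, inputs, duration):
    """Tiny event-driven simulator.
    gates:  name -> (function, list of input signal names, propagation delay)
    inputs: signal name -> initial value
    Returns the final value of every signal."""
    signals = dict(inputs)
    for name in gates:
        signals.setdefault(name, 0)
    # evaluate every gate once at t = 0, like an initial settling pass
    events = [(0, name) for name in sorted(gates)]
    heapq.heapify(events)
    while events:
        t, name = heapq.heappop(events)
        if t > duration:
            break
        func, srcs, _ = gates[name]
        new = func(*(signals[s] for s in srcs))
        if new != signals[name]:
            signals[name] = new
            # a changed output activates every part that reads this signal;
            # each reacts after its own propagation delay
            for other, (_, osrcs, odelay) in gates.items():
                if name in osrcs:
                    heapq.heappush(events, (t + odelay, other))
    return signals

# a half adder described structurally: two concurrent "parts"
gates = {"sum": (lambda a, b: a ^ b, ["a", "b"], 1),
         "carry": (lambda a, b: a & b, ["a", "b"], 1)}
```

With a = b = 1, the model settles to sum = 0 and carry = 1; the carry update is driven by an event on its inputs, not by statement order, which is the key difference from a sequential program.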
VHDL includes facilities for describing logical structure and function of digital systems at a number of levels of abstraction, from system level down to the gate level. It is intended, among other things, as a modeling language for specification and simulation. We can also use it for hardware synthesis if we restrict ourselves to a subset that can be automatically translated into hardware.
VHDL arose out of the United States government’s Very High Speed Integrated Circuits (VHSIC) program. In the course of this program, it became clear that there was a need for a standard language for describing the structure and function of integrated circuits (ICs). Hence the VHSIC Hardware Description Language (VHDL) was developed. It was subsequently developed further under the auspices of the Institute of Electrical and Electronic Engineers (IEEE) and adopted in the form of the IEEE Standard 1076, Standard VHDL Language Reference Manual, in 1987. This first standard version of the language is often referred to as VHDL-87.
After the initial release, various extensions were developed to meet evolving design and modeling requirements. These extensions are documented in several subsequent IEEE standards.
1.4 Field-Programmable Gate Array
A field-programmable gate array (FPGA) is a semiconductor device that can be configured by the customer or designer after manufacturing—hence the name "field-programmable". To program an FPGA, one specifies how the chip should work using a logic circuit diagram or source code in a hardware description language (HDL). FPGAs can implement any logical function that an application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping offers advantages for many applications.
FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together"—somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
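The sense in which a logic block is "configured" can be illustrated with a short sketch. This is a simplification (real logic blocks also contain carry logic and flip-flops), and the `make_lut` helper is invented for this example.

```python
def make_lut(func, k):
    """Enumerate func over all 2**k input combinations into a truth table,
    then return a gate that works purely by table lookup, the way an FPGA
    logic block does once its configuration bits are loaded."""
    table = [func(*(((i >> b) & 1) for b in range(k))) for i in range(2 ** k)]
    def lut(*bits):
        index = sum(bit << b for b, bit in enumerate(bits))
        return table[index]
    return lut

# the same physical block becomes an AND gate or an XOR gate purely by
# loading a different table
and2 = make_lut(lambda a, b: a & b, 2)
xor2 = make_lut(lambda a, b: a ^ b, 2)
```

Once built, `and2` and `xor2` never evaluate a Boolean expression; they only look up a stored bit, which is how a configured logic block realizes "complex combinational functions or merely simple gates" with identical hardware.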
For any given semiconductor process, FPGAs are usually slower than their fixed ASIC counterparts. They also draw more power, and generally achieve less functionality for a given amount of circuit complexity. Their advantages, however, include a shorter time to market, the ability to re-program in the field to fix bugs, and lower non-recurring engineering costs. Vendors can also take a middle road: develop the hardware on ordinary FPGAs, then manufacture a final version that can no longer be modified once the design has been committed.
Field Programmable Gate Array (FPGA) devices were introduced by Xilinx in the mid-1980s. They differ from CPLDs in architecture, storage technology, number of built-in features, and cost, and are aimed at the implementation of high performance, large-size circuits.
The basic architecture of an FPGA is illustrated in figure 2. It consists of a matrix of CLBs (Configurable Logic Blocks), interconnected by an array of switch matrices.
The internal architecture of a CLB is different from that of a PLD. First, instead of implementing SOP expressions with AND gates followed by OR gates (as in SPLDs), its operation is normally based on a LUT (lookup table). Moreover, flip-flops are much more abundant in an FPGA than in a CPLD, allowing the construction of more sophisticated sequential circuits. Besides JTAG support and interfaces to diverse logic levels, other features are also included in FPGA chips, such as SRAM memory, clock multiplication (PLL or DLL), and PCI interfaces. Some chips also include dedicated blocks, like multipliers, DSPs, and microprocessors.
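The LUT-plus-flip-flop structure of a CLB can be caricatured as follows. This is a toy model with invented names, not any vendor's actual CLB.

```python
class CLB:
    """Toy model of a CLB: a LUT whose output can be captured by a
    flip-flop on each clock edge, or bypass it combinationally."""
    def __init__(self, table, registered=True):
        self.table = table          # one output bit per input combination
        self.registered = registered
        self.q = 0                  # flip-flop state
    def clock(self, *bits):
        index = sum(bit << i for i, bit in enumerate(bits))
        value = self.table[index]
        if self.registered:
            self.q = value          # captured on the clock edge
            return self.q
        return value                # combinational bypass

# a toggle flip-flop: a 1-input LUT computing NOT, with q fed back
tff = CLB(table=[1, 0])
```

Feeding the state back (`tff.clock(tff.q)`) yields the alternating sequence 1, 0, 1, 0, a minimal sequential circuit of the kind the abundant flip-flops in an FPGA make possible.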
Another fundamental difference between an FPGA and a CPLD concerns the storage of the interconnect configuration. While CPLDs are non-volatile (using antifuse, EEPROM, Flash, etc.), most FPGAs store their configuration in SRAM and are therefore volatile. This approach saves space and lowers the cost of the chip, because FPGAs contain a very large number of programmable interconnections, but it requires an external ROM to hold the configuration. There are, however, non-volatile FPGAs (with antifuse), which can be advantageous when reprogramming is not necessary.
FPGAs can be very sophisticated. Chips manufactured with state-of-the-art 0.09 µm (90 nm) CMOS technology, with nine copper layers and over 1,000 I/O pins, are currently available. A few examples of FPGA packages are illustrated in figure A6, which shows one of the smallest FPGA packages on the left (64 pins), a medium-size package in the middle (324 pins), and a large package (1,152 pins) on the right. Several companies manufacture FPGAs, including Xilinx, Actel, Altera, and QuickLogic.
[1] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey."
We survey recent developments in the design of large-capacity content-addressable memory (CAM). A CAM is a memory that implements the lookup-table function in a single clock cycle using dedicated comparison circuitry. CAMs are especially popular in network routers for packet forwarding and packet classification, but they are also beneficial in a variety of other applications that require high-speed table lookup. The main CAM-design challenge is to reduce the power consumption associated with the large amount of parallel active circuitry, without sacrificing speed or memory density. In this paper, we review CAM-design techniques at the circuit level and at the architectural level. At the circuit level, we review low-power matchline sensing techniques and searchline driving approaches. At the architectural level, we review three methods for reducing power consumption.
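The single-cycle lookup function described in this survey can be modeled behaviorally. This is a software stand-in only: the list traversal below replaces the parallel comparison circuitry of real CAM hardware, and the `CAM` class is invented for the sketch.

```python
class CAM:
    """Behavioral model of a binary CAM. Hardware compares the search
    word against every stored entry in parallel within one clock cycle;
    here a single pass over a list stands in for that parallelism."""
    def __init__(self, entries):
        self.entries = list(entries)
    def search(self, word):
        # addresses of all matching entries (real hardware would usually
        # reduce these to one address with a priority encoder)
        return [addr for addr, stored in enumerate(self.entries)
                if stored == word]

cam = CAM(["1010", "1100", "1010"])
```

`cam.search("1010")` returns both matching addresses, [0, 2]; the power problem the survey targets comes from the fact that, in hardware, every one of those comparisons is active circuitry on every search.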
[2] V. Gripon and C. Berrou, "Nearly-optimal associative memories based on distributed constant weight codes."
A new family of sparse neural networks achieving nearly optimal performance has recently been introduced. In these networks, messages are stored as cliques in clustered graphs. In this paper, we interpret these networks using the formalism of error-correcting codes. To achieve this, we introduce two original codes, the thrifty code and the clique code, both sub-families of binary constant weight codes. We also provide the networks with an enhanced retrieving rule that enables a property of answer correctness and improves performance.
Index Terms: associative memory, classification, constant weight codes, clique code, thrifty code, sparse neural networks.
The family of memories can be split into two main branches. The first contains indexed memories. In an indexed memory, data messages are stored at specific indexes; messages do not overlap, and directly accessing a stored message requires knowing its address. This is a convenient paradigm when the data itself is not useful a priori. For example, a postman only needs your address to deliver your mail, and cares neither about the content of the mail nor the color of your front door. The second branch is that of associative memories. An associative memory is one in which a previously learned message can be retrieved from part of its content. It is tricky to define how large the "part" of the content necessary to retrieve the data must be; a reasonable definition is to consider this "part" to be close to the minimum amount of data required to unambiguously address a unique previously learned message. Contrary to indexed memories, messages are likely to overlap one another in associative memories.
This paradigm is convenient when trying to find data from other data. For example, a detective might want to remember the name of the woman he questioned who owns a car of the same brand as the murderer's. Given unlimited computational power, it is obviously possible to simulate one memory using the other: to obtain an associative memory, one can read all the stored messages in an indexed memory, compare them with the partial message given as input, and select the one that best matches the input.
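The retrieve-from-partial-content idea can be made concrete with a small sketch. This is illustrative only; the `associate` helper and the detective-style example data are invented.

```python
def associate(memory, partial):
    """Retrieve every stored message consistent with a partial one.
    None marks the unknown fields, mirroring retrieval by content
    rather than by address."""
    def matches(message):
        return all(p is None or p == m for p, m in zip(partial, message))
    return [message for message in memory if matches(message)]

# the detective's query: name unknown, car brand known
memory = [("alice", "red_car"), ("bob", "blue_car"), ("carol", "red_car")]
```

`associate(memory, (None, "red_car"))` returns both red-car owners; an indexed memory could answer the same question only by scanning every address, which is exactly the brute-force simulation described above.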
[3] H. Jarollahi, N. Onizawa, V. Gripon, and W. J. Gross, "Architecture and implementation of an associative memory using sparse clustered networks."
Associative memories are alternatives to indexed memories that, when implemented in hardware, can benefit many applications such as data mining. The classical neural-network-based methodology is impractical to implement, since as the size of the memory increases, the number of information bits stored per memory bit (efficiency) approaches zero. In addition, the length of a message to be stored and retrieved must equal the number of nodes in the network, which limits the total number of messages the network is capable of storing (diversity). Recently, a novel algorithm based on sparse clustered neural networks has been proposed that achieves nearly optimal efficiency and large diversity. In this paper, a proof-of-concept hardware implementation of these networks is presented, and the limitations and possible future research areas are discussed.
[4] K. Pagiamtzis and A. Sheikholeslami, "A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme."
This paper presents two techniques to reduce power consumption in content-addressable memories (CAMs). The first technique is to pipeline the search operation by breaking the matchlines into several segments. Since most stored words fail to match in their first segments, the search operation is discontinued for the subsequent segments, reducing power. The second technique is to broadcast small-swing search data on less capacitive global searchlines, and to amplify this signal to full swing only on a shorter local searchline. Since few matchline segments are active, few local searchlines are enabled, again saving power. We have employed the proposed schemes in a 1024 × 144-bit ternary CAM in 1.8-V 0.18-µm CMOS, demonstrating an overall power reduction of 60% compared to a nonpipelined, nonhierarchical architecture. The ternary CAM achieves a 7-ns search cycle time at 2.89 fJ/bit/search.
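The first technique, discontinuing the search after a mismatching segment, can be imitated in software. This is a counting model only: `segmented_search` is an invented helper, and the comparison count stands in for matchline energy.

```python
def segmented_search(entries, word, seg_len):
    """Compare each stored word against the search word segment by
    segment, stopping at the first mismatching segment as a pipelined
    matchline would. Returns the match addresses and how many segment
    comparisons were actually performed."""
    comparisons = 0
    matches = []
    for addr, stored in enumerate(entries):
        for start in range(0, len(word), seg_len):
            comparisons += 1
            if stored[start:start + seg_len] != word[start:start + seg_len]:
                break               # remaining segments stay idle: power saved
        else:
            matches.append(addr)
    return matches, comparisons
```

For the three 8-bit entries in the test below, searching in 4-bit segments costs 4 segment comparisons instead of the 6 an unsegmented search would perform, because the two mismatching words are rejected on their first segment.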
[5] C. Zukowski and S.-Y. Wang, "Use of selective precharge for low-power match lines in content-addressable memories."
With current architectures, CAMs typically require more area and power, and sometimes more delay, than location-addressed memories of the same capacity. If these penalties can be traded against each other, many new applications will open up for CAMs that are not feasible or practical today. Our work aims to combine various CAM design methods from the literature, each of which addresses only a single aspect of the problem, with our further improvements, so as to tackle multiple problems simultaneously and meet the requirements of today's new applications. In this report, we overview the most relevant methods found in our survey at the circuit level and the architectural level. By combining the current-race scheme with precomputation and selective precharge, we can achieve considerable power savings.
CHAPTER 3
EXISTING AND PROPOSED SYSTEM
3.1 EXISTING SYSTEM
A content-addressable memory (CAM) is a type of memory that can be accessed using its contents rather than an explicit address. In order to access a particular entry in such memories, a search data word is compared against previously stored entries in parallel to find a match. Each stored entry is associated with a tag that is used in the comparison process. Once a search data word is applied to the input of a CAM, the matching data word is retrieved within a single clock cycle if it exists. This prominent feature makes CAM a promising candidate for applications where frequent and fast look-up operations are required, such as in translation look-aside buffers (TLBs), network routers, database accelerators, image processing, parametric curve extraction, Hough transformation, Huffman coding/decoding, virus detection, Lempel–Ziv compression, and image coding.
A new family of associative memories based on sparse clustered networks (SCNs) has recently been introduced and implemented using field-programmable gate arrays (FPGAs). Such memories make it possible to store many short messages instead of a few long ones, as in the conventional Hopfield network, with a significantly lower level of computational complexity. Furthermore, a significant improvement is achieved in the number of information bits stored per memory bit (efficiency). In this paper, a variation of this approach and a corresponding architecture are introduced to construct a classifier that can be trained on the association between a small portion of the input tags and the corresponding addresses of the output data. The term CAM refers to binary CAM (BCAM) throughout this paper. Preliminary results were originally presented for an architecture with particular parameters, conditioned on a uniform distribution of the input patterns. In this paper, an extended version is presented that elaborates the effect of the design's degrees of freedom, and the effect of non-uniformity of the input patterns, on energy consumption and performance.
The architecture of this paper (SCN-CAM) consists of an SCN-based classifier coupled to a CAM array. The CAM array is divided into several equally sized sub-blocks, which can be activated independently. For a previously trained network and a given input tag, the classifier uses only a small portion of the tag and predicts very few sub-blocks of the CAM to be activated. Once the sub-blocks are activated, the tag is compared against the few entries in them while the rest remain deactivated, thus lowering the dynamic energy dissipation.
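The activation scheme can be sketched as follows. This is a loose software analogy: the real design trains an SCN-based classifier, whereas the `_predict` method below is a made-up deterministic stand-in, and `SubBlockCAM` is an invented name.

```python
class SubBlockCAM:
    """Sketch of the SCN-CAM idea: only the sub-block predicted from a
    small leading portion of the tag is activated, so only a few entries
    are compared per search."""
    def __init__(self, num_blocks, portion):
        self.blocks = [dict() for _ in range(num_blocks)]
        self.portion = portion      # leading tag bits the classifier sees
        self.num_blocks = num_blocks
    def _predict(self, tag):
        # hypothetical classifier: map the tag portion to one sub-block
        return int(tag[:self.portion], 2) % self.num_blocks
    def train(self, tag, data):
        self.blocks[self._predict(tag)][tag] = data
    def search(self, tag):
        # only the predicted sub-block is activated; the rest stay idle
        block = self.blocks[self._predict(tag)]
        return block.get(tag), len(block)   # result, entries compared

cam = SubBlockCAM(num_blocks=4, portion=2)
cam.train("0011", "A")
cam.train("0110", "B")
cam.train("1100", "C")
```

Searching for the tag "0011" activates a single sub-block and compares only 1 of the 3 stored entries; the other sub-blocks contribute no dynamic energy in the model, mirroring the energy argument above.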