01-11-2016, 11:15 AM
1462850901-EfficientParallelArchitectureforLinearFeedbackShiftRegisters2.docx (Size: 4.46 MB / Downloads: 5)
ABSTRACT:
This project proposes a new parallel architecture for linear feedback shift registers (LFSRs), which can be used to accomplish high-throughput BCH or CRC encoders for storage and communication systems. While previous parallel LFSR architectures have computed values by using the past input messages and the register outputs, the proposed parallel architecture based on the transposed serial LFSR calculates the output by using only the past feedback values. As a result, the proposed architecture reduces the area-time product compared to the recent architecture.
INTRODUCTION
1.1 PREVIOUS WORK
polynomial by dividing a message polynomial by a generator polynomial, and widely employed in implementing BCH and CRC encoders. A new parallel architecture based on the unaccustomed serial LFSR is proposed in the section to effectively reduce both the hardware complexity and the critical path delay. The output of the systematic encoder, y(n), is composed of the past feedback values, f(n). Therefore, it is inefficient to store non-feedback values in the parallel architecture. Since the registers of the conventional serial LFSR are used to store incomplete outputs instead of the feedback values, the straightforward parallelization based on the conventional serial LFSR may lead to redundant elements and calculations. For instance, the parallel architecture in generates the outputs by using the past inputs and outputs stored in registers.
PROPOSED WORK:
In the proposed system, by using the proposed LFSR structure shown above figure, we are implementing the encoder design, in which the LFSR (Linear Feedback Shift Register) is very much useful. By using our proposed parallel LFSR, we can implement the data encryption and also can decrypt the encrypted data. The data encryption is very much useful in data security system, where the confidentiality matters.
1.3 Literature survey:
LINEAR feedback shift registers (LFSRs) are conventional-ly used to compute a remainder polynomial by dividing a message polynomial by a generator polynomial, and widely employed in implementing BCH and CRC encoders [1], [2]. As the technology advances, the encoding throughput required in digital communication and storage systems increases accord-ingly. Since a serial LFSR dealing with a message bit at a time is not sufficient for such a high throughput, parallel architec-tures have been adopted in many BCH and CRC encoders.
The conventional method [3] to derive a parallel LFSR from a serial LFSR is shown in Fig. 1, where a simple LFSR corresponding to a generator polynomial g(x) = 1 + x2 + x3 is exemplified. To make a 3-parallel architecture, the combi-national logic part of the serial LFSR is concatenated three times as shown in Fig. 1©. The cascaded structure makes the combinational logic network long, and incurs significant performance degradation due to the long critical path resulting from the serial concatenation. Therefore, the conventional approach is limited to low-performance applications.
To alleviate the increase of critical path delay, many meth-ods have been presented for LFSR parallelization [4]–[9]. The cascaded combinational logic part was optimized by adopting tree-structured computation [5] and by sharing sub-expressions [6]. Though the optimization is quite effective in reducing the hardware complexity, it is not sufficient for reducing the critical path delay. A partially pipelined architecture was proposed in [7] to overcome the limitation, which is to conduct a state-space transformation to the conventional parallel LFSR circuit. As a result, the calculation outside the feedback loop can be pipelined. However, this approach increases the overall hardware complexity, as additional hardware components are necessary for the transformation.
Based on the IIR filter topology, other approaches [8], [9] have been presented to derive a new architecture equivalent to the serial LFSR. As the input part is not associated with the feedback loop in the new architecture, that part can be pipelined with ease, enabling a high-speed decoder without much increasing the hardware complexity. All the previous architec-tures resort to the conventional LFSR architecture calculating the redundant intermediate values and thus have a limitation in reducing the critical path delay and the hardware cost.
To eliminate redundant calculations and registers, this brief proposes a novel parallel LFSR architecture based on the transposed LFSR. The proposed architecture stores the feedback values only, whereas the previous architectures [7]–[9] need to store both the past inputs and outputs. Compared to the previous works [7]–[9], the proposed architecture reduces the number of registers by half if the pipelining registers are not counted, and requires a less number of XOR gates, leading to a shortened critical path as well.
Although the proposed technique is explained for the encoding of cyclic codes, it can be applied to similarapplications such as LFSR-based pseudo-random number generators widely used for VLSI testing [10], [11] and stream cryptography
State-Space Transformed Architecture
To reduce the critical path delay of the conventional parallelization, a state-space transformed architecture was proposed in [7]. The general LFSR circuit that computes the remainder polynomial by dividing the message polynomial by the r-th order generator polynomial g(x) = g0 + g1x + g2x2 + + gr-1xr-1 + grxris depicted in Fig. 2, where u(n) and xr-1(n) are the input message bit and the output of the LFSR at time n, respectively. In the state-space vector representation, the serial LFSR is denoted as
and the r * 1 matrix b is
The state-space representation is depicted in Fig. 3(a). For a p-parallel system, (1) can be rewritten as
where m 0, 1, , K / p1 if the message length is K, andBpis a r p matrix specified as
The p-parallel architecture shown in Fig. 3(b) induces a large hardware overhead caused by Ap. Applying a proper linear
Fig. 3. (a) Serial LFSR architecture and (b) p-parallel LFSR architecture.
Fig. 1.4. State-space transformed p-parallel architecture.
transformation can alleviate the hardware overhead. Theexplicit output y is defined as y(mp) = Cpx(mp), where Cpis a r r identity matrix, and the linear transformation is defined as
The Bluetooth Encryption/Decryption algorithm demands Linear Feedback Shift Registers (LFSRs) in order to reduce the length of the encryption key. A typical implementation needs sixteen different LFSRs. In this paper a low power 128-bit LFSR for efficient use in portable Bluetooth telecommunication systems is proposed. The new LFSR design techniques can be also useful in any reconfigurableLFSR. Two methods to reduce the conventional LFSR switching activity are introduced. Up to 110%LFSR power consumption reduction was achieved by using the clock-gating technique and the Gray code representation. The whole LFSR design was captured by using VHDL language and for synthesis a 0.7 μm CMOS standard cell library was used[15].
A test pattern generator (TPG) for built-in self-test (BIST), which can reduce heat dissipation during test application, is proposed. The proposed TPG, called dual-speed LFSR (DS-LFSR), consists of two linear feedback shift registers (LFSRs), a slow LFSR and a normal-speed LFSR. The slow LFSR is driven by a slow clock whose speed is width that of the normal clock which drives the normal-speed LFSR, The use of DS-LFSR lowers the transition density at the circuit inputs driven by the slow LFSR, leading to a reduction in heat dissipation during test application. A procedure is presented to design a DS-LFSR so as to achieve high fault coverage by ensuring that patterns generated by it are unique and uniformly distributed. A new gain function, and a method to compute its value for each circuit input, is proposed to select inputs to be driven by the slow LFSR. Also, a procedure to increase the number of inputs driven by the slow LFSR by combining compatible inputs is presented to further decrease the heat dissipation, Finally, DS-LFSRs are designed for the ISCAS85 and ISCAS89 benchmark circuits and shown to provide 13% to 70% reduction in the numbers of transitions with no loss of fault coverage and at very slight area overheads[16].
Because of their efficient encoding and decoding algorithms, cyclic codes-an interesting class of linearcodes- are widely used in communication systems, storage devices, and consumer electronics. BCHcodes form a special class of cyclic codes, and are usually among the best cyclic codes. A subclass of good BCH codes is the narrow-sense primitive BCH codes. However, the dimension and minimum distance of these codes are not known in general. The main objective of this paper is to study the dimension and minimum distances of a subclass of the narrow-sense primitive BCH codes with design distance δ = (q - ℓ0)qm-ℓ1-1 -1 for certain pairs (ℓ0, ℓ1), where 0 ≤ ℓ0 q - 2 and 0 ≤ ℓ1 m - 1. The parameters of other related classes of BCH codes are also investigated, and some open problems are proposed in this paper[17]
According the universal serial cyclic redundancy check (CRC) technology, one of the new CRCalgorithm based on matrix is referred, which describe an new parallel CRC coding circuit structure with r matrix transformation and pipeline technology. According to the method of parallel CRC coding in high-speed data transmitting, it requires a lot of artificial calculation. Due to the large amount of calculation, it is easy to produce some calculation error. According to the traditional thought of the serial CRC, the algorithm of parallel CRC based on the thought of matrix transformation and iterative has been deduced and expressed. The improved algorithm by pipeline technology has been applied in other systems which require high timing requirements of problem, The design has been implemented through Verilog hardware description language in FPGA device, which has achieved a good validation. It has become a very good method for high-speed CRC coding and decoding
INTRODUCTION
A linear feedback shift register (LFSR) is a shift register whose input bit is a linear function of its previous state.The only linear functions of single bits are xor and inversexor; thus it is a shift register whose input bit is driven by the exclusive-or (xor) of some bits of the overall shift register value.The initial value of the LFSR is called the seed, and because the operation of the register is deterministic, the sequence of values produced by the register is completely determined by its current (or previous) state. Likewise, because the register has a finite number of possible states, it must eventually enter a repeating cycle. However, a LFSR with a well-chosen feedback function can produce a sequence of bits which appears random and which has a very long cycle.An LFSR is a shift register that, when clocked, advances the signal through the register from one bit to the next most-significant bit (see Figure 2.1). Some of the outputs are combined in exclusive-OR configuration to form a feedback mechanism. A linear feedback shift register can be formed by performing exclusive-OR on the outputs of two or more of the flip-flops together and feeding those outputs back into the input of one of the flip-flops
PSEUDORANDOM PATTERN GENERATION
Linear feedback shift registers make extremely good pseudorandom pattern generators. When the outputs of the flip-flops are loaded with a seed value (anything except all 0s, which would cause the LFSR to produce all 0 patterns) and when the LFSR is clocked, it will generate a pseudorandom pattern of 1s and 0s. Note that the only signal necessary to generate the test patterns is the clock [11].
2.3 WORKING OF LFSR
Linear Feedback Shift Registers sequence through (2n -1) states, where n is the number of registers in the LFSR. At each clock edge, the contents of the registers are shifted right by one position. There is feedback from predefined registers or taps to the left most register through an exclusive-NOR (XNOR) or an exclusive-OR (XOR) gate. A value of all "1"s is illegal in the case of a XNOR feedback. A count of all "0"s is illegal for an XOR feedback. This state is illegal because the counter would remain locked-up in this state. The LFSR shown below is implemented with XNOR feedback. A 4-bit LFSR sequences through (24 - 1) = 15 states (the state 1111 is in the lock-up/illegal state). From (Table 1) the feedback taps are 4, 3. On the other hand, a 4-bit binary up-counter would sequence through 24 = 16 states with no illegal states. LFSR counters are very fast since they use no carry signals. However, the dedicated carry in Virtex devices is rarely a speed limiting factor because it is intrinsically fast. LFSRs can replace conventional binary counters in performance critical applications where the count sequence is not important (e.g., FIFO). LFSRs are also used as pseudo-random bit stream generators. They are important building blocks in the implementation of encryption and decryption algorithms.The list of the bits positions that affect the next state is called the tap sequence. In the diagram below, the sequence is (16,14,13,11) • The outputs that influence the input are called taps. • A maximal LFSR produces an n-sequence(i.e. cycles through all possible states within the shift register), unless it contains all zeros, in which case it will never change. The sequence of numbers generated by a LFSR can be considered a binary numeral system just as valid as Gray code or the natural binary code.The tap sequence of an LFSR can be represented as a polynomial mod 2. This means that the coefficients of the polynomial must be 1's or 0's. This is called the feedback polynomial or characteristic polynomial. For example, if the taps are at the 16th, 14th, 13th and 11th bits (as below), the resulting LFSR polynomial is
x 11 + x 13 + x 14 + x 16 + 1
The 'one' in the polynomial does not correspond to a tap. The powers of the terms represent the tapped bits, counting from the left.
• If (and only if) this polynomial is a primitive, then the LFSR is maximal
• The LFSR will only be maximal if the number of taps is even.
• The tap values in a maximal LFSR will be relatively prime.
• There can be more than one maximal tap sequence for a given LFSR length
2.4 LINEAR FEEDBACK SHIFT REGISTER IN WIRELESS COMMUNICATION
Linear Feedback Shift Registers (LFSRs) are commonly used in wireless applications where pseudorandom bit streams are required. LFSRs are the functional building blocks of circuits like the pseudo-random noise (PN) code generator and Gold code generators commonly used in Code Division Multiple Access (CDMA) systems. LFSRs can be used for performance-critical binary counters used to generate sequences of random numbers. LFSRs will often satisfy this requirement, although the generated sequence is pseudorandom in nature. Pseudo-random patterns repeat over time; the longer the LFSR, however, the longer the sequence of random numbers before pattern repetition occurs. When longer sequences are desired, the physical size of the hardware is increased. Conventionally, in the older FPGA architectures, flip-flops would be used. With two flipflops in each of the older-architecture CLBs, an n-bit LFSR will take up at least n/2 CLBs. LFSRs sequence through 2N –1 states, where N is the number of flip-flops in the LFSR. At each clock edge, the contents of the flip-flops are shifted right by one position. There is a feedback path from predefined flip-flops to the leftmost flip-flop through an exclusive-NOR (XNOR) or an exclusive-OR (XOR) gate. A value of all "1’s" is illegal in the case of an XNOR feedback, and a value of all "0's" is illegal for XOR feedback. The illegal state causes the counter to remain in its present state, locking out any further new values from being registered
2.5 LFSR TERMINOLOGY
LFSRs sequence through 2N –1 states, where N is the number of registers in the LFSR. The contents of the registers are shifted right by one position at each clock cycle. The feedbacks from predefined registers or taps to the leftmost register are XORed together. LFSRs have several variables:
• The number of stages in the shift registers
• The number of taps in the feedback path
• The position of each tap in the shift registers stage
• The initial starting condition of the shift register, often referred to as the FILL state In the case of LFSRs with an XOR feedback, the FILL value must be non-zero to avoid the LFSR locking up in the next state.
2.5.1 Shift Register Length (N)
The shift register length is often referred to as the degree, and the longer the shift register, the longer the duration of the PN sequence before it repeats. For a shift register of fixed length N, the number and duration of the sequences it can generate are determined by the number and position of taps used to generate the parity feedback bit.
2.5.2 Shift Register Taps
The combination of taps and their location is often referred to as a polynomial, and expressed as
P(x) = X 7 +X 3 + 1
Various conventions are used to map the polynomial terms to register stages in the shift register implementation. In the polynomial P(x) = X 7 +X 3 + 1, the trailing "1" represents X 0 , which is the output of the last stage of the shift register. X 3 is the output of register stage 3 and X 7 the output of the XOR. A few points to note about LFSRs and the polynomial used to describe them: • The last tap of the shift register is the leading "1" and is always used in the shift register feedback path.
• The length of the shift register can be deduced from the exponent of the highest order term in the polynomial.
• The highest order term of the polynomial is the signal connecting the final XOR output to the shift register input. It does not feed back into the parity calculation along with the other taps identified in the polynomial.
2.6 LFSR IMPLEMENTATION
There are two implementation styles of LFSRs, Galois implementation and Fibonacci implementation.
2.6.1 Galois Implementation
As shown in (Figure 2.3), the data flow is from left to right and the feedback path is from right to left. The polynomial increments from left to right with X 0 term (the "1" in the polynomial) as the first term. This is referred to as a Tap polynomial, as it indicates which taps are to be fed back from the shift register. Since the XOR gate is in the shift register path, the Galois implementation is also known as an in-line or modular type (Mtype) LFSR.
Fibonacci Implementation
In (Figure 3.4), the data flow is from left to right and the feedback path is from right to left, similar to the Galois implementation. However, the Fibonacci implementation polynomial decrements from left to right with X 0 as the last term in the polynomial. This polynomial is referred to as a Reciprocal Tap polynomial and the feedback taps are incrementally annotated from right to left along the shift register. Since the XOR gate is in the feedback path, the Fibonacci implementation is also known as an out-of-line or simple type (S-type) LFSR
SHIFT REGISTER LOOK-UP TABLE (LUT) MODE FOR AREA- EFFICIENT LFSRS
With the SRL16 primitive, it is possible to implement an n-bit LFSR in a fraction of the space used by a flip-flop design. A 16-bit LFSR would take up at least eight slices using flip-flops, since there are just two flip-flops per slice. The same 16-bit LFSR can be implemented in just four slices when using the SRL16s. Virtex-II devices have a new macro, SRLC16 in addition to the SRL16/E. Two outputs of the SRLC16 can be accessed simultaneously. One output is determined by the value of the 4-bit address line (i.e., A [3] A [0]) and the other output is the cascadable output (the 16th bit of the shift register.) In− Virtex devices, only a single tap or output of the SRL16s can be accessed at a time. The SRL16 output to be accessed is determined by the value of the 4-bit address line A [3] – A [0]. The SRL16 primitive will shift data on every clock cycle. A second primitive (SRL16E) provides the same shift register functionality, but adds a shift register Clock Enable. Both the SRL16 and SRL16E implement area-efficient shift registers in one LUT. It is worth noting, however, that parallel access to multiple taps is not possible, as the primitives have only one data output pin. Thus, for every single output that must be accessed, another SRL16 must be created if that output is in the same 16-bit set as the other. Because the number of taps rarely exceeds four, creating multiple instances of the SRL16 primitive is not a concern. It should be noted that because flip-flop based LFSRs will only consume as many flip-flops as there are stages in the shift register, the size at which it becomes more area efficient to use flip-flops is less than or equal to eight. To overcome the loss of parallel access, the following two approaches are reviewed [16]. 3.4.1
2.8MULTIPLE SHIFT REGISTERS WITH PARALLEL TAP ACCESS
(Figure2.5) demonstrates a 16-stage LFSR with four selectable tap points designed with four SRL16 primitives. An additional 4-input LUT is used to implement a parallel XOR parity calculation (Figure 2.6) that is fed back into the shift register as the new bit in the sequence. Tap D is the last stage in the shift register and so represents the LFSR output. This circuit is clocked at a frequency known as the chip rate
SINGLE SHIFT REGISTER WITH MULTICYCLE TAP ACCESS
(Figure 2.7) demonstrates how a single SRL16E primitive, with some additional logic, implements a 16-stage shift register that is clocked at a frequency called the chip rate. The SRL16 primitive address lines are multiplexed at four times the chip rate allowing four of the 16 shift register taps to be accessed during one chip rate period. The status of each accessed tap is input to a single XOR gate whose output is registered by a flip-flop also clocked at four times the chip rate. During one chip rate period, four taps are read and sequentially XORed to create the parity calculation. The final XOR state is available to the input of the shift register at the chip rate. This circuit enables any four of the sixteen shift register taps to be read, XORed together, and presented to the input of the shift register at the chip rate. Accessing the shift register taps over multiple cycles enables parallel access to four of the shift register taps at the chip rate. The multicycle clock rate should be twice the chip rate if only two taps are required. To access an odd number of taps during one chip rate period, the multicycle clock can be a binary-powerof-two multiple of the chip rate.