28-06-2014, 11:49 AM
The National Institute of Standards and Technology
The National Institute of Standards and Technology.docx (Size: 733 KB / Downloads: 9)
INTRODUCTION
The National Institute of Standards and Technology (NIST) have initiated a process to develop a Federal Information Processing Standard (FIPS) for the Advanced Encryption Standard (AES). In October of 2000, NIST announced Rijndael as the winner of Advanced Encryption Standard (AES) contest in effort to address threatened key size of Data Encryption Standard (DES). Currently, most AES algorithms are implemented in software; thereby, the secret key is vulnerable to attacks and relies on underlying programs. The input data is processed using the key and the resultant data is available at the output terminal named CIPHER/PLAIN TEXT. The major advantage of AES is the possibility of efficient implementation on various platforms. AES is suitable for small 8-bit microprocessor platforms and common 32-bit processors, and it is appropriate for dedicated hardware implementations. Hardware implementations can reach throughput rates in the gigabit range
1 Data Encryption Standard
It is based on a Symmetric-key algorithm that uses a 56-bit key .DES is now considered insecure because a brute force attack is possible. Brute force attack is a method of defeating a cryptographic scheme by systematically trying a large number of possibilities.
Cipher Detail: Key sizes : 56 bits
Block sizes : 64 bits
Structure : Feistel network
Feistel cipher is a symmetric structure used in the construction of block ciphers
AES
The Advanced Encryption Standard (AES) is a block cipher adopted as an encryption standard. AES began immediately to replace the Data Encryption Standard (DES). AES outperforms DES in improved long-term security because of, among other things, larger key sizes (128, 192, or 256 bits). The AES algorithm is an iterative algorithm. Each iteration can be called a round, and the total number of rounds, Nr, is 10, 12, or 14, when the key length is 128, 192, or 256 bits, respectively. The Rijndael Algorithm was chosen since it had the best overall scores in security, performance, efficiency, implementation ability and flexibility
MATHEMATICAL MODEL
Every byte in the AES algorithm is interpreted as a finite field element. All Finite field elements can be added and multiplied. However, these operations differ from those used for numbers and their use requires investigation
RIJNDAEL ALGORITHM
Rijndael was selected as the AES in Oct-2000 and issue d as FIPS PUB 197 standard in Nov-2001 .Its designed by Joan Rijimen and Vincent Daemen in Belgium .It has 128/192/256 bit keys, 128 bit data an iterative rather than Feistel cipher processes data as block of 4 columns of 4 bytes operates on entire data block in every round .The algorithm is designed to be resistant against known attacks, speed and code compactness on many CPUs, design simplicity.
There are also other algorithms like MARS, RC6, RIJNDAEL, SERPENT and TWOFISH.
Bytes
The basic unit of processing in the AES algorithm is a byte, which is a sequence of eight bits treated as a single entity. The input, output and Cipher Key bit sequences are processed as arrays of bytes that are formed by dividing these sequences into groups of eight contiguous bits to form arrays of bytes. For an input, output or Cipher Key denoted by a, the bytes in the resulting array are referenced using one of the two forms, an or a[n], where n will be in a range that depends on the key length. For a key length of 128 bits, n lies in the range 0 ≤ n <16. For a key length of 192 bits, n lies in the range 0 ≤ n < 24. For a key length of 256 bits, n lies in the range 0≤ n < 32.All byte values in the AES algorithm are presented as the concatenation of the individual bit values, (0 or 1), between braces in the order {b7b6b5b4b3b2b1b0 }. These bytes are interpreted as finite field elements using a polynomial representation
Arrays of Bytes
Arrays of bytes are represented in the form a0 a1 a2•••a15. The bytes and the bit ordering within bytes are derived from the 128-bit input sequence, input0 input1 input2 ••• input126 input127 as a0 = {input0, input1, •••, input7}, a1 = {input8, input9, •••, input15} with the pattern continuing up to a15 = {input 120, input 121, •••, input127}. The pattern can be extended to longer sequences associated with 192 and 256 bit keys. In general, an = {input8n, input8n+1, •••, input8n+7}.
The State as an Array of Columns
The four bytes in each column of the State form 32-bit words, where the row number “r” provides an index for the four bytes within each word. Therefore, the state can be interpreted as a one-dimensional array of 32 bit words, which is symbolized by w0...w3. The column number c provides an index into this linear State array. The State can be considered as an array of four words where
Mixing of Columns Transformation
The Mix Columns transformation operates on the State column-by-column, treating each column as a four-term polynomial .The columns are considered as polynomials over GF (28) and multiplied modulo x4 + 1 with a fixed polynomial a(x), given by the following equation
Key Schedule Generation
Each round key is a 4-word (128-bit) array generated as a product of the previous round key, a constant that changes each round, and a series of S-Box lookups for each 32-bit word of the key. The first round key is the same as the original user input. Each byte (w0 – w3) of initial key is XOR’d with a constant that depends on the current round, and the result of the S-Box lookup for wi, to form the next round key. The number of rounds required for three different key lengths is presented in figure
Inverse Shift Rows Transformation
Inverse Shift Rows Transformation Inverse Shift Rows is the inverse of the Shift Rows transformation presented. The bytes in the last three rows of the State are cyclically shifted over different numbers of bytes. The first row, r = 0, is not shifted. The bottom three rows are cyclically shifted by Nb-shift(r, Nb) bytes, where the shift value shift(r, Nb) depends on the row number. Specifically, the Inverse Shift Rows ( ) transformation proceeds as follows
Inverse Mixing of Columns Transformation
Inverse Mixing of Columns Transformation Inverse Mix Columns is the inverse of the Mix Columns transformation presented.
Inverse Mix Columns operates on the State column-by-column, treating each column as a four term polynomial. The columns are considered as polynomials over GF (2^8) and multiplied modulox4 + 1 with a fixed polynomial a-1(x), given
AES IMPLEMENTATION
Encryption Implementation
The encryption process of Rijndael Algorithm is written in VHDL hardware description language here and is simulated by using Modelsim software. The implementation is done in FPGA using Xilinx software
Decryption Implementation
In the similar way the decryption process is written in VHDL hardware description language and is simulated by using Modelsim software. The implementation is done in FPGA using Xilinx software. The key schedule generation module is modified in the reverse order in which last round key is treated as the first round and decreasing order follows.
By using an efficient inter-round and intra-round pipeline design, it achieves a low power, reduced clock cycle and high throughput. The pipelining between each of the rounds will achieve a high performance encryption and Decryption implementation
WHY PIPELINING
An efficient design increasing the throughput for the AES algorithm using a pipelined architecture. By using an efficient inter-round and intra-round pipeline design, it achieves a low power, reduced clock cycle and high throughput. The pipelining between each of the rounds will achieve a high performance encryption implementation. Although implementing an iterative pipelining based approach is one option.
The pipelined method is one of the techniques often used for computer architecture. It takes advantage of parallelism, and in this case used the method to place registers between each round to form a pipelined stage so that several blocks of data are processed by the circuit at the same time. Also, the concept of pipelined architecture is generally used to increase the amount of data processed by the digital circuit. During the process, the currently processing data is moved to the next stage which the next block of data takes its place. When the processed data meets the end of its depth, it is fed back to the beginning of the loop process
LOOP UNROLLING
Iterative looping is a subset of loop unrolling in that only one round is unrolled whereas a loop unrolling architecture allows for the unrolling of multiple rounds, up to the total number of rounds required by the cipher. As opposed to an iterative looping architecture, a loop unrolling architecture where all n rounds are unrolled and implemented as a single combinatorial logic block maximizes the hardware required for round function implementation while the hardware required for round key and S-Box multiplexing is completely eliminated. However, while
PARTIAL PIPELINING
It is possible to cut the pipeline process to return to the user asking for more data and resume the process later. To accomplish this, add the entry to the pipeline setting before any entry that returns an instance. When it’s time to resume the process just redirect the user to view. By default the pipeline will be resumed in the next entry after, but this can be modified by setting the following setting to the import path of the pipeline entry to resume processing
SUB PIPELINING
A large delay of clock cycles is created when the rounds of operation of the Pipelined design is complicated. Sub pipelined architecture would be the perfect Solution for solving a situation like this. The theorem of the pipelined architecture is also applied to the sub pipelined algorithm. Sub pipelining not only places registers between each round, but also puts them inside each round unit while applying the idea of the loop-unrolled structure. Compared with the other two approaches, the sub pipelined architecture accomplishes the maximum speed up
PROPOSED SYSTEM
PIPELINED ARCHITECTURE
By using pipelining and sub pipelining we can process multiple blocks of data simultaneously. Among these architectural optimizations sub pipelining gives maximum speed and better throughput/area. Inter round pipelining is implemented using a D-flip flop. It pipelines the output of one round as an input to next round for all 22 rounds.
Normally there is to transmit 128 bits between each round. It needs 128 clock cycles to send every 128 bit to next round for transformation. This decreases the throughput of the system. To overcome this, we use four 32 bit D-flip flop between each round so as to transmit 128 bit to reduce the clock cycle easily.
By using this response of the system is increased (i.e) the throughput of the system is made speeder. The area may increase but the delay decreases. The clock is common to all flip-flops given synchronously. We use four stages pipelining between the rounds
FLIPFLOP
The D flip-flop is widely used. It is also known as a data or delay flip-flop. The D flip-flop captures the value of the D-input at a definite portion of the clock cycle (such as the rising edge of the clock). That captured value becomes the Q output. At other times, the output Q does not change. The D flip-flop can be viewed as a memory cell or a delay line
Implementation of pipeline using DFF
Pipelining is a standard way of decomposing an operation into concurrently operating stages to increase throughput at a moderate increase in area. A wide variety of applications such as digital filters, video compression and general purpose microprocessors, can all be decomposed into pipeline structures.
Pipelines can be implemented both synchronously and asynchronously. In a synchronous pipeline the communication of data between stages is regulated by the global clock. It is assumed that each stage takes no longer than the period of the clock and data is transferred between consecutive stages simultaneously. In asynchronous pipeline the communication of data between the stages is regulated by local communication between stages
CONCLUSION
The paper presented a VLSI architecture for the Rijndael, AES algorithm. The proposed architecture uses sub pipelining. We perform both the encryption and decryption modules, with data block and key equal to 128 bits .The data path of the architecture comprises of rounds of rijndael basic block which consists of four sequential operations and the final processing element which implements the
output transformation. S-boxes are used for the implementation of the multiplicative inverses and are shared between encryption and decryption. There is a trade-off between speed and the use of resources . In the proposed architecture, the standard round is duplicated for 9 times followed by the final round. These rounds are cascaded by using the pipelining registers. Using this architecture 1 blocks of data can be transformed at the same time, which results in high throughput. The limitation for this architecture is that the high throughput is achieved at the cost of area. The Rijndael algorithm supports a key length of 128, 192 or 256 bits. The implementation of the key unit in the proposed architecture, can be extended for the keys of length 192 and 256 bits at the cost of throughput