27-06-2013, 02:49 PM
High-Speed VLSI Architectures for the AES Algorithm
Abstract
This paper presents novel high-speed architectures
for the hardware implementation of the Advanced Encryption
Standard (AES) algorithm. Unlike previous works which rely on
look-up tables to implement the SubBytes and InvSubBytes transformations
of the AES algorithm, the proposed design employs
combinational logic only. As a direct consequence, the unbreakable
delay incurred by look-up tables in the conventional approaches
is eliminated, and the advantage of subpipelining can be further
explored. Furthermore, composite field arithmetic is employed to
reduce the area requirements, and different implementations for
the inversion in subfield GF(2^4) are compared. In addition, an
efficient key expansion architecture suitable for the subpipelined
round units is also presented. Using the proposed architecture, a
fully subpipelined encryptor with 7 substages in each round unit
can achieve a throughput of 21.56 Gbps on a Xilinx XCV1000 e-8
bg560 device in non-feedback modes, which is faster and is 79%
more efficient in terms of equivalent throughput/slice than the
fastest previous FPGA implementation known to date.
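To illustrate why SubBytes need not be a look-up table, the following minimal sketch computes the S-box arithmetically, as a multiplicative inversion in GF(2^8) followed by the affine transformation defined in FIPS-197. It is a software model only and is not the composite field circuit proposed in the paper; the function names gf256_mul, gf256_inv, and sub_byte are chosen here for illustration.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= 0x11B           # reduce modulo the AES polynomial
        b >>= 1
    return result


def gf256_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0, as in AES."""
    if a == 0:
        return 0
    # For nonzero a, a^255 = 1, so a^254 is the inverse (square-and-multiply).
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf256_mul(result, base)
        base = gf256_mul(base, base)
        exp >>= 1
    return result


def sub_byte(a: int) -> int:
    """AES S-box value: inversion, then the FIPS-197 affine transformation."""
    x = gf256_inv(a)
    y = 0
    for i in range(8):
        bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
               ^ (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        y |= bit << i
    return y
```

In hardware, the inversion is the expensive step; the paper's approach is to carry it out with composite field arithmetic, which this sketch does not attempt to reproduce.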
INTRODUCTION
Cryptography plays an important role in the security
of data transmission. In January 1997, the National Institute
of Standards and Technology (NIST) invited proposals for
new algorithms for the Advanced Encryption Standard (AES)
to replace the old Data Encryption Standard (DES). After two
rounds of evaluation on the 15 candidate algorithms, NIST selected
Rijndael as the AES algorithm [1] in October 2000.
The AES algorithm has broad applications, including smart
cards and cellular phones, WWW servers and automated teller
machines (ATMs), and digital video recorders. Compared to
software implementations, hardware implementations of the
AES algorithm provide more physical security as well as higher
speed. Three architectural optimization approaches can be employed
to speed up the hardware implementations: pipelining,
subpipelining, and loop-unrolling. Among these approaches,
the subpipelined architecture can achieve maximum speedup
and optimum speed–area ratio in non-feedback modes. In order
to explore the advantage of subpipelining further, each round
unit needs to be divided into more substages with equal delay.
THE SUBPIPELINED ARCHITECTURE OF THE AES AND COMPOSITE FIELD ARITHMETIC
The AES Algorithm
The AES algorithm is a symmetric-key cipher, in which both
the sender and the receiver use a single key for encryption and
decryption. The data block length is fixed at 128 bits, while
the key length can be 128, 192, or 256 bits. In addition,
the AES algorithm is an iterative algorithm. Each iteration
can be called a round, and the total number of rounds, N_r, is 10,
12, or 14, when the key length is 128, 192, or 256 bits, respectively.
The 128-bit data block is divided into 16 bytes. These
bytes are mapped to a 4 × 4 array called the State, and all the
internal operations of the AES algorithm are performed on the
State.
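The parameters above can be sketched in a few lines. The key-length-to-rounds table comes directly from the text; the column-by-column byte order used to fill the State is taken from FIPS-197 rather than from this excerpt, and the helper names rounds_for_key, block_to_state, and state_to_block are chosen here for illustration.

```python
from typing import List

# Number of rounds N_r for each supported key length (in bits).
ROUNDS_BY_KEY_BITS = {128: 10, 192: 12, 256: 14}


def rounds_for_key(key_bits: int) -> int:
    """Return N_r for a 128-, 192-, or 256-bit key."""
    if key_bits not in ROUNDS_BY_KEY_BITS:
        raise ValueError("AES key length must be 128, 192, or 256 bits")
    return ROUNDS_BY_KEY_BITS[key_bits]


def block_to_state(block: bytes) -> List[List[int]]:
    """Map a 16-byte block to the 4 x 4 State, filled column by column."""
    if len(block) != 16:
        raise ValueError("AES data block must be exactly 16 bytes")
    # FIPS-197 ordering: state[row][col] = block[row + 4 * col]
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def state_to_block(state: List[List[int]]) -> bytes:
    """Inverse of block_to_state: read the State out column by column."""
    return bytes(state[row][col] for col in range(4) for row in range(4))
```

For example, block_to_state(bytes(range(16))) places bytes 0, 4, 8, and 12 in the first row of the State.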
The Subpipelined Architecture
Three architectural optimization approaches can be used to
speed up the AES algorithm in non-feedback modes by duplicating
the hardware that implements each round, which is also
called a round unit. These architectures are based on pipelining,
subpipelining, and loop-unrolling. The pipelined architecture is
realized by inserting rows of registers between round units.
Similar to pipelining, subpipelining also inserts rows of
registers among the combinational logic, but registers are inserted
both between and inside each round unit. In pipelining and
subpipelining, multiple blocks of data are processed simultaneously.
By comparison, loop-unrolled (unfolded) architectures
can process only one block of data at a time, but multiple rounds
are processed in each clock cycle. Among these architectural
optimization approaches, subpipelining can achieve maximum
speedup and optimum speed/area ratio in non-feedback modes.
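The trade-off between these approaches can be sketched with a toy timing model. It assumes that every round unit has the same combinational delay, that substages split that delay evenly, and that each row of registers adds a fixed overhead; these assumptions, the function names, and any numbers passed in are illustrative and are not taken from the paper.

```python
BLOCK_BITS = 128  # AES data block length in bits


def subpipelined_throughput_gbps(round_delay_ns: float,
                                 substages: int = 1,
                                 register_overhead_ns: float = 0.0) -> float:
    """Throughput of a fully (sub)pipelined design in non-feedback modes.

    One round unit is instantiated per round; substages=1 models plain
    pipelining. Once the pipeline is full, one block completes per clock.
    """
    if substages < 1:
        raise ValueError("substages must be at least 1")
    clock_period_ns = round_delay_ns / substages + register_overhead_ns
    return BLOCK_BITS / clock_period_ns  # bits per ns equals Gbps


def loop_throughput_gbps(round_delay_ns: float,
                         rounds: int,
                         unroll_factor: int = 1,
                         register_overhead_ns: float = 0.0) -> float:
    """Throughput of a loop architecture that handles one block at a time.

    unroll_factor rounds are evaluated in each (longer) clock cycle.
    """
    if unroll_factor < 1 or rounds % unroll_factor != 0:
        raise ValueError("unroll_factor must divide the number of rounds")
    clock_period_ns = unroll_factor * round_delay_ns + register_overhead_ns
    cycles_per_block = rounds // unroll_factor
    return BLOCK_BITS / (cycles_per_block * clock_period_ns)
```

In this model, unrolling only removes register overhead, whereas each additional substage shortens the clock period while still delivering one block per cycle, which is why equal-delay substages matter. Under the same one-block-per-cycle assumption, the 21.56 Gbps quoted in the abstract would correspond to a clock of roughly 21.56 / 128 ≈ 0.168 GHz.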
CONCLUSION
In this paper, efficient subpipelined architectures of the AES
algorithm are presented. In order to explore the advantage
of subpipelining further, the SubBytes/InvSubBytes transformation is implemented
in combinational logic to avoid the unbreakable delay
of LUTs in the traditional designs. Additionally, composite
field arithmetic is used to reduce the hardware complexity,
and different approaches for the implementation of inversion
in subfield GF(2^4) are compared. As an example of our
proposed architecture, fully subpipelined encryptors using
a 128-bit key are implemented on FPGA devices. Decryptors
can be easily incorporated by using the subpipelined joint
encryptor/decryptor round unit architecture presented in this
paper, and we expect the throughput will be slightly lower
than the encryptor-only implementations. Meanwhile, fully
subpipelined encryptors/decryptors using other key lengths
can be implemented by adding more copies of round units
and modifying the key expansion unit slightly.