02-11-2012, 05:38 PM
Gigabits per Second Implementation of the IDEA Cryptographic Algorithm
Gigabits per Second Implementation.pdf (Size: 189.43 KB / Downloads: 48)
Abstract.
IDEA (International Data Encryption Algorithm) is one of
the strongest secret-key block ciphers. The algorithm processes data in
16-bit subblocks and can be fully pipelined. The implementation of a
fully pipelined IDEA algorithm achieves a clock rate of 105.9 MHz on
Xilinx’ XCV1000E-6BG560 FPGA of the Virtex-E device family. The
implementation uses 18105 logic cells and achieves a throughput of 6.78
Gbps with a latency of 132clo ck cycles.
Introduction
Cryptography is the science of keeping communication secure, so that eavesdroppers
cannot decipher the transmitted messages. The transmission speeds of core
networks require hardware-based cryptographic modules, since software-based
cryptography cannot meet the required throughput requirements.
Field programmable gate arrays (FPGAs) are ideal components for fast cryptographic
algorithms. The large capacities of FPGAs enable the fitting of fully
pipelined algorithms on a single chip. The reprogrammability of FPGAs enables
using the same hardware platform as a cryptographic engine for a multitude of
communications protocols.
Cryptographic algorithms are divided into public-key and secret-key algorithms.
In public-key algorithms both public and private keys are used, with the
private key computed from the public key. Secret-key algorithms rely on secure
distribution and management of the session key, which is used for encrypting and
decrypting all messages. When it comes to both software- and hardware-based
implementations, secret-key algorithms are 100 to 1000 times faster than publickey
algorithms. For this reason, dual-key sessions use a secret-key algorithm for
the bulk of communication, whereas the session specific secret keys are agreed
on and distributed with a public key algorithm.
The International Data Encryption Algorithm (IDEA) was introduced by
Lai and Massay in 1990 [1], and modified the following year [2]. IDEA has been
patented in the U.S. and several European countries, but the non-commercial
use of IDEA is free everywhere. The patent holder was originally Ascom AG,
M. Glesner, P. Zipf, and M. Renovell (Eds.): FPL 2002, LNCS 2438, pp. 760–769, 2002.
Springer-Verlag Berlin Heidelberg 2002
6.78 Gbps Implementation of the IDEA Cryptographic Algorithm 761
but in 1999 the intellectual property rights were transferred to Mediacrypt AG.
Part of the fame of IDEA is due to its usage in Pretty Good Privacy (PGP).
IDEA is considered highly secure, and it has resisted all forms of attack tried
by the academic community. No published attack (with the exception of attacks
on weak keys) is better than exhaustive search on the 128-bit key space, which
is computationally infeasible. The security of IDEA appears bounded only by
the weaknesses arising from the relatively small (compared to its keylength)
blocklength of 64 bits. [3]
Unlike many other cryptographic algorithms, IDEA can easily be implemented
on 16-bit microcontrollers, since the algorithm operates on 16-bit subblocks.
In 1999, a software-based implementation of four parallel IDEA algorithms
(4-way IDEA) achieved a throughput of the order of 72 megabits per
second (Mbps) on a 166 MHz MMX Pentium processor [4]. If this result is scaled
to modern 2.533 GHz Pentium 4 processors, a software-based implementation
of a 4-way IDEA achieves a throughput of 1.1 gigabits per second (1.1 Gbps).
This sets a reference point for hardware-based implementations.
There have been several reported hardware implementations of IDEA in the
published literature. Mediacrypt AG sells two hardware-based IDEA solutions,
the IDEACrypt Coprocessor and the IDEACrypt Kernel. The IDEACrypt kernel
is faster of these two and implements the IDEA algorithm with a three-stage
pipeline. A 0.25 micron implementation has a throughput of 720 Mbps at a
clock rate of 100 MHz. [5]
The Improved IDEA chip by Salom˜ao et al. [6] achieved a throughput of 809
Mbps at a 100 MHz clock rate. If eight of these devices are connected in series,
an estimated throughput of 6.5 Gbps can be achieved.
Cheung et al. [7] have investigated the tradeoffs in parallel and serial implementations
of the IDEA algorithm. The parallel implementation achieved a
throughput of 1.17 Gbps on a Xilinx Virtex XCV300-6 at a clock rate of 82
MHz, whereas the serial implementation achieved a throughput of 600 Mbps on
the same device at a clock rate of 150 MHz. When two rounds of the IDEA algorithm
were implemented, the parallel implementation required 79.56 per cent of
the logic resources of an XCV300-6. The serial implementation of a single round
required 93.68 per cent of the logic resources of the same device. It was estimated,
that by utilizing linear scaling of the area requirements, a fully pipelined
parallel implementation of the IDEA algorithm would fit into a XCV1000 Virtex
device with a device utilization of 94.42 per cent. This would correspond to
throughput of 5.25 Gbps, if the clock rate remained unchanged.
2 Description of the IDEA Algorithm
IDEA encrypts 64-bit plaintext blocks into 64-bit ciphertext blocks using a 128-
bit input key K. The algorithm consists of eight identical rounds followed by
an output transformation. Each round uses six 16-bit subkeys K®
i , 1 ≤ i ≤ 6,
to transform a 64-bit input X into an output of four 16-bit blocks, which are
then input to the next round. All subkeys are derived from the 128-bit input
762An tti H¨am¨al¨ainen, Matti Tommiska, and Jorma Skytt¨a
key K. The subkey derivation process is different in decryption mode from the
encryption mode, but otherwise encryption and decryption are performed with
identical hardware.
IDEA uses only three operations on 16-bit sub-blocks a and b: bitwise XOR
denoted by ⊕, unsigned addition mod (216) denoted by and modulo (216 +
1) multiplication, denoted by . All these three operations are derived from
different algebraic groups of 216 elements, which is crucial to the algorithmic
strength of IDEA. Of the three arithmetic operations, bitwise XOR and unsigned
addition mod (216) are trivial to implement, whereas a both area-efficient and
fast implementation of modulo (216 + 1) multiplication requires careful design
and bit-level optimisation.