12-07-2014, 04:53 PM
Low Latency Systolic Montgomery Multiplier for Finite
Field GF(2^m) Based on Pentanomials
Abstract—
In this paper, we present a low latency systolic Montgomery
multiplier over GF(2^m) based on irreducible pentanomials. An efficient
algorithm is presented to decompose the multiplication into a number of
independent units to facilitate parallel processing. In addition, a novel
“pre-computed addition” technique is introduced to further reduce the
latency. The proposed design involves significantly less area-delay and
power-delay complexities compared with the best of the existing designs.
It has the same or a shorter critical path and nearly one-fourth of the
latency of the best existing design for the National Institute of Standards
and Technology recommended irreducible pentanomials.
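
As a point of reference for the operation the abstract describes, the following is a minimal software sketch of bit-serial Montgomery multiplication over GF(2^m). It is not the systolic design of the paper. It assumes the commonly used Montgomery factor x^m (the excerpt does not state which factor the paper uses), represents polynomials as Python integers with bit i holding the coefficient of x^i, and uses the NIST-recommended pentanomial x^163 + x^7 + x^6 + x^3 + 1. The names mont_mul and ref_mul_mod are hypothetical helpers introduced here for illustration.

```python
# Illustrative sketch only, not the paper's architecture.
# Assumption: Montgomery factor R = x^m; bit i of an int is the coefficient of x^i.

M = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # x^163 + x^7 + x^6 + x^3 + 1


def mont_mul(a, b, m=M, f=F):
    """Return a * b * x^(-m) mod f for operands of degree < m."""
    c = 0
    for i in range(m):
        if (b >> i) & 1:   # add the partial product b_i * a
            c ^= a
        if c & 1:          # make c divisible by x by adding f (f has constant term 1)
            c ^= f
        c >>= 1            # exact division by x
    return c


def ref_mul_mod(a, b, f=F):
    """Schoolbook carry-less multiply followed by reduction; used only as a check."""
    m = f.bit_length() - 1
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    while prod.bit_length() > m:   # cancel the leading term until degree < m
        prod ^= f << (prod.bit_length() - 1 - m)
    return prod
```

Multiplying the result by x^m modulo the pentanomial recovers the ordinary product, so ref_mul_mod(mont_mul(a, b), 1 << M) should equal ref_mul_mod(a, b) for any operands of degree below m.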
INTRODUCTION
Finite field multipliers over GF(2^m) have wide applications in
elliptic curve cryptography (ECC) and error control coding systems
[1], [2]. Polynomial basis multipliers are popularly used because they
are relatively simple to design, and offer scalability for the fields
of higher orders. Efficient hardware design for polynomial-based
multiplication is therefore important for real-time applications [3], [4].
The pentanomial-based Galois field is widely used, since the National
Institute of Standards and Technology (NIST) has recommended
three pentanomials for ECC application [5]. There are a few systolic
realizations of pentanomial-based multipliers for high-throughput
implementation [6]–[8]. In a recent paper [9], Meher has presented
an efficient systolic design of a multiplier based on irreducible pentanomials.
It is found that the systolic structure for field multiplication
in [9] has a latency of cycles. In this paper, we have extended the
structure of [9] further to obtain a lower latency systolic structure. First,
an efficient Montgomery algorithm for pentanomials is proposed
where the multiplication is decomposed into a number of independent
components, which could be processed in parallel. Furthermore, we
have introduced a novel “pre-computed addition” (PCA) technique
such that the latency of a multiplier can be reduced further. The
proposed structure achieves significantly less time complexity than
the corresponding existing structures.
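
The idea of decomposing the multiplication into independent components can be sketched in software as follows. This is one possible decomposition under stated assumptions, not necessarily the algorithm of the paper: the Montgomery factor is taken as x^h with h = floor(m/2), operand B is split as B = B_L + x^h B_H, and A·B·x^(-h) is formed as the sum of A·B_L·x^(-h) (an LSB-first Montgomery loop) and A·B_H (an MSB-first conventional loop). The two loops share no state, so each needs only about m/2 iterations and they could run concurrently. All function names are hypothetical.

```python
# Illustrative sketch only; the split point h and the factor x^h are assumptions.

M = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # x^163 + x^7 + x^6 + x^3 + 1
H = M // 2


def low_block(a, b_low, h=H, f=F):
    """Return a * b_low * x^(-h) mod f using h right-shift (Montgomery) steps."""
    c = 0
    for i in range(h):
        if (b_low >> i) & 1:
            c ^= a
        if c & 1:
            c ^= f
        c >>= 1
    return c


def high_block(a, b_high, n=M - H, m=M, f=F):
    """Return a * b_high mod f using n left-shift (Horner) steps."""
    c = 0
    for i in reversed(range(n)):
        c <<= 1
        if (c >> m) & 1:   # degree reached m, subtract f once
            c ^= f
        if (b_high >> i) & 1:
            c ^= a
    return c


def split_mont_mul(a, b, m=M, h=H, f=F):
    """Return a * b * x^(-h) mod f as the XOR of two independent blocks."""
    b_low = b & ((1 << h) - 1)
    b_high = b >> h
    return low_block(a, b_low, h, f) ^ high_block(a, b_high, m - h, m, f)


def ref_mul_mod(a, b, f=F):
    """Schoolbook carry-less multiply followed by reduction; used only as a check."""
    m = f.bit_length() - 1
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    while prod.bit_length() > m:
        prod ^= f << (prod.bit_length() - 1 - m)
    return prod
```

In this sketch the final XOR is the only point where the two blocks meet, which is what allows the iteration count of each block to be roughly halved relative to a single m-step loop.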
The remainder of the paper is organized
CONCLUSION
In this paper, we have presented a novel PCA technique and modular
reduction scheme for Montgomery multiplication over GF(2^m)
based on irreducible pentanomials. To illustrate the efficiency of the
proposed approach, we have designed the multiplier for the irreducible
pentanomial , for simplicity of discussion.
We have decomposed the Montgomery multiplication into two
concurrent blocks and derived a lower-latency multiplier using
the proposed modular reduction scheme with PCA. The proposed design
involves significantly less area-delay and power-delay complexities
than the recently reported multiplier for irreducible pentanomials,
with nearly one-fourth of its latency for the NIST recommended
pentanomials.