02-04-2012, 09:58 AM
Implementation and Evaluation of Image Processing Algorithms on Reconfigurable Architecture using C-based Hardware Descriptive Languages
Abstract
With the advent of mobile embedded multimedia devices that must perform a range
of multimedia tasks, especially image processing tasks, there is a need to design
efficient, high-performance image processing systems on a short time-to-market
schedule. Image processing algorithms implemented in hardware have emerged as
the most viable solution for improving the performance of image processing systems.
The introduction of reconfigurable devices and system-level hardware programming
languages has further accelerated the design of image processing in hardware. Most
of the system-level hardware programming languages introduced and commonly used
in industry are highly hardware-specific and require intermediate to advanced
hardware knowledge to design and implement the system. To overcome this
bottleneck, various C-based hardware descriptive languages have been proposed
over the past decade [25]. These languages have greatly simplified the task of
designing and verifying the hardware implementation of the system. However, the
synthesis of the system to hardware was not completely addressed and was
conducted using manual methods, resulting in duplication of the implementation
process. Handel-C is a recently proposed C-based language that provides direct
implementation of hardware from the C-based description of the system. The
Handel-C language and the IDE tool introduced by Celoxica Ltd. provide both
simulation and synthesis capabilities. This work evaluates the performance and
efficiency of the Handel-C language on image processing algorithms, comparing it
at the simulation level with another popular C-based system-level language,
SystemC, and at the synthesis level with the industry-standard hardware description
language,
10 Daggu Venkateshwar Rao et al
Verilog. The image processing algorithms considered include image filtering,
image smoothing and edge detection. Comparison parameters at the simulation
level include man-hours for implementation, compile time and lines of code.
Comparison parameters at the synthesis level include logic resources required,
maximum frequency of operation and execution time. Our evaluations show
that the Handel-C implementation performs better at the system and synthesis
levels than the other languages considered in this work. The work also proposes
a novel hardware architecture to implement Canny's edge detection algorithm.
The proposed architecture is capable of producing one edge pixel every clock
cycle. A comparison of our architecture for Canny's edge detection with other
architectures is also discussed.
1. Introduction
Digital image processing is an ever-expanding and dynamic area with applications
reaching into everyday life, such as medicine, space exploration, surveillance,
authentication, automated industrial inspection and many more. Applications such
as these involve different processes, like image enhancement and object detection [1].
Implementing such applications on a general-purpose computer is easier, but not
very time-efficient, owing to additional constraints on memory and other peripheral
devices. Application-specific hardware implementation offers much greater speed than
a software implementation. With advances in VLSI (Very Large Scale Integration)
technology, hardware implementation has become an attractive alternative.
Implementing complex computation tasks in hardware and exploiting parallelism
and pipelining in algorithms yields a significant reduction in execution time [2].
There are two types of technology available for hardware design: full-custom
hardware design, also called Application Specific Integrated Circuits (ASICs), and
semi-custom hardware devices, which are programmable devices such as Digital Signal
Processors (DSPs) and Field Programmable Gate Arrays (FPGAs). Full-custom ASIC
design offers the highest performance, but the complexity and cost associated with the
design are very high. An ASIC design cannot be changed once fabricated, and the
design time is also very long. ASIC designs are used in high-volume commercial
applications. In addition, a single error introduced during fabrication renders the chip
useless. DSPs are a class of hardware devices that fall somewhere between an ASIC
and a PC in terms of performance and design complexity. DSPs are specialized
microprocessors, typically programmed in C, or in assembly code for improved
performance. They are well suited to extremely complex, math-intensive tasks such
as image processing. Knowledge of hardware design is still required, but the learning
curve is much gentler than for other design choices [3]. Field Programmable Gate
Arrays are reconfigurable devices. Hardware design techniques such as parallelism
and pipelining can be exploited on an FPGA, which is not possible in dedicated DSP
designs. Implementing image processing algorithms on reconfigurable hardware
minimizes time-to-market cost, enables rapid prototyping of complex
algorithms and simplifies debugging and verification. Therefore, FPGAs are an ideal
choice for implementing real-time image processing algorithms [4].
FPGAs have traditionally been configured by hardware engineers using a
Hardware Description Language (HDL). The two principal languages used are Verilog
HDL (Verilog) and Very High Speed Integrated Circuits (VHSIC) HDL (VHDL),
which allow designers to design at various levels of abstraction. Verilog and VHDL
are specialized design languages that are not immediately accessible to software
engineers, who have often been trained using imperative programming languages.
Consequently, over the last few years there have been several attempts at translating
algorithm-oriented programming languages directly into hardware descriptions.
C-based hardware descriptive languages have been proposed and developed since
the late 1980s. Some of the C-based hardware descriptive languages include Cones
[33], HardwareC [30], Transmogrifier C [27], SystemC [28], OCAPI [31], C2Verilog
[32], Cyber [34], SpecC [26], Nach C [29] and CASH [24]. Handel-C, a newer C-like
hardware description language introduced by Celoxica [2, 5], allows the designer to
focus on the specification of an algorithm rather than on a structural approach to
coding.
Given the importance of digital image processing and the performance benefits of
implementing it in hardware, this work addresses the implementation of image
processing algorithms such as median filtering, morphological operations, convolution
and smoothing, and edge detection on an FPGA using the Handel-C language. Novel
architectures for these image processing algorithms are also proposed. The
implementation of Canny's edge detection algorithm is compared with other hardware
descriptive languages at the simulation and synthesis levels: the Handel-C
implementation is compared with SystemC at the simulation level and with Verilog at
the synthesis level. The Canny's edge detection implementation is also compared with
standard C implementations and with a design implemented in the SA-C language [35].
2. Prior Work
The last few years have seen an unprecedented research effort in the field of
image processing in hardware. Prior research can be categorized by the type of
hardware and the image processing algorithm implemented. The types of hardware
considered for image processing acceleration include Application Specific Integrated
Circuits (ASICs), Digital Signal Processors (DSPs) and reconfigurable logic devices
(FPGAs). The image processing algorithms considered for hardware implementation
include convolution, image filtering and edge detection (Sobel's, Prewitt's and
Canny's edge detection). Some researchers have also considered hardware
implementations specific to FPGA vendors such as Xilinx, Atmel and Altera.
Convolution and Image Filtering
G.S. Richard [6] discusses the idea of parameterized program generation of
convolution filters in an FPGA. A 2-D filter is assembled from a set of multipliers and
adders, which are in turn generated from a canonical serial-parallel multiplier stage.
An Atmel application note [7] discusses a 3x3 convolution filter with a run-time
reconfigurable vector multiplier on an Atmel FPGA. Lorca, Kessal and Demigny [8]
proposed a new filter organization at the 2D and 1D levels, which reduces memory
size and computation cost by a factor of two for both software and hardware
implementations. A. Nelson [11] implemented the Rank Order Filter, Erosion,
Dilation, Opening, Closing and Convolution algorithms using VHDL and MATLAB
on an Altera FLEX 10K FPGA and a Xilinx Virtex FPGA. S. Hirai, M. Zakouji and
T. Tsuboi [12] implemented three image processing algorithms: computation of the
image gravity center, detection of object orientation using a radial projection, and
computation of the Hough transform. They developed an FPGA-based vision system
that used a Xilinx Virtex2E mounted on the FPGA board. The algorithms were coded
in C/C++ and compiled into the CycleC HDL using SystemCompiler; CycleC can in
turn be converted into VHDL and Verilog.
Edge Detection
Fahad Alzahrani and Tom Chen [9] present a high-performance, pipelined edge
detection VLSI architecture for real-time image processing applications. It is capable
of producing one edge pixel every clock cycle at a clock rate of 10 MHz, and the
architecture can process 30 frames per second. Peter Baran et al. [13] implemented the
edge detection convolution algorithm. They wrote the code in ImpulseC, compiled it
using Impulse Co-developer and synthesized it for the Altera Nios processor using the
Altera Quartus tools. Various IP core vendors offer VHDL code for image processing
algorithms such as Sobel and Prewitt edge detection, Laplacian filters, low-pass filters
and convolution [14]. This code can be synthesized into FPGAs using appropriate
synthesis software. Altera has implemented Prewitt edge detection using the SOPC
Builder and DSP Builder tool flow on the Altera Nios II processor [15]. H. Neoh and
A. Hazanchuk present an implementation of the Canny edge detection algorithm and a
Prewitt filter for Altera FPGAs using DSP Builder, a development tool that interfaces
between the Altera Quartus II design software and MATLAB/Simulink tools [16].
Draper et al. have implemented many image processing algorithms on FPGAs [17]. In
one of their implementations, they measured the performance of SA-C routines for
the first three steps of the Canny edge detection algorithm, compiled with the SA-C
compiler and executed on an Annapolis Microsystems WildStar board using a single
Xilinx XV-2000E FPGA.
P. Mc Curry, F. Morgan and L. Kilmartin compared the performance of an FPGA
and distributed RAM architecture with current programmable DSP-based
implementations, using the RC 1000-PP Virtex FPGA-based development platform
and the Handel-C HDL. The algorithms they implemented include Laplacian edge
detection, morphological algorithms and several other image processing algorithms
[18]. W. Luk, T. Wu and I. Page devised an approach to partition algorithmic
procedures between ASICs and general-purpose processors, and tested it on the Canny
edge detection algorithm on an FPGA [19]. Trost et al. designed a reconfigurable
system using FPGA modules based on Xilinx XC4008E, Spartan XCS40 and Virtex
XCV100 FPGA devices for the implementation of real-time image processing circuits.
They used VHDL to test the system on image processing algorithms such as image
filtering (low-pass, high-pass, non-linear), edge detection (Sobel, Canny, non-linear
Laplace) and image rotation [20,21].
This work presents novel architectures for four image processing algorithms
(median filtering, morphological operations, convolution and edge detection) on a
reconfigurable architecture using Handel-C. For performance comparison of our
implementation, Canny's edge detection algorithm was also implemented using the
SystemC and Verilog hardware descriptive languages. For the Handel-C
implementation, the algorithms were developed using the DK2 development
environment and were implemented on a Xilinx Virtex FPGA-based PCI board.
Handel-C was selected for its C-based syntax and for the co-design framework of
the DK2 IDE, which allows seamless integration of software and hardware image
processing modules. The front-end graphical user interface (GUI) was developed
in the Visual C++ environment. The SystemC design was implemented only at the
simulation level and was compiled using the OSCI SystemC compiler. The Verilog
implementation was simulated using ALDEC's Active-HDL simulator and synthesized
using the Xilinx Synthesis Tool (XST). The implementation environment for Verilog
was similar to that of the Handel-C implementation.
3. Image Processing Algorithms
This section discusses the theory of the most commonly used image processing
algorithms: 1) filtering, 2) morphological operations, 3) convolution and 4)
edge detection.