General Purpose Computation on Graphics Processing Units (GPGPU) using CUDA

**project girl** · 12-12-2012, 11:44 AM


Introduction

Graphics processing units (GPUs) are special processors that were traditionally used to accelerate computer graphics by offloading work from the CPU. Today, GPUs are highly parallel many-core processors that enable general-purpose computation on graphics processing units (GPGPU). GPGPU has been a topic since 2002, but broad interest did not develop until Nvidia released the CUDA platform in 2007. Developers and researchers then started to use CUDA for parallel programming. The current high visibility of GPGPU in science and practice is one reason for this paper, especially in parallel, scientific, and high-performance computing (HPC). Further motivation arises from interest in computer graphics and parallel computing.
Nvidia's CUDA architecture was chosen for this paper because it was the dominant platform for GPGPU at the time of writing. Nvidia was the first vendor to offer a comprehensive architecture combining high programmability, performance, and ease of use. However, CUDA is challenged by AMD's alternative, ATI Stream [AMD09a]. It also faces two standardization approaches, OpenCL [Kh09b] and DirectX 11 DirectCompute [MS09d]. Nvidia also faces competition in other markets. Intel and AMD both follow a platform strategy that combines x86 CPUs, graphics, and chipsets, and both are trying to push Nvidia out of the chipset market [Wi09]. Nvidia has stated that it has no intention of building x86 processors [Cr09], so GPGPU, HPC, and parallel computing have become a major strategic pathway for the company. This has led to extensive research, marketing, and collaboration efforts, e.g. lectures, tutorials, student scholarships, and partnerships with professors, universities, and software development companies. These efforts have produced a large number of scientific publications and parallel applications [NVI09m].

GPGPU Basics

The following chapter introduces the basics of computer graphics and graphics hardware that form the background of GPGPU. This requires some basic terms for computer graphics objects. Geometric primitives are simple atomic geometric objects such as points, lines, triangles, or other polygons. The corner points of these objects are called vertices. Another basic object is the fragment, which is the basis for a pixel. In addition to the color value, a fragment contains other information that is needed before the pixel is drawn, e.g. its position, its depth, or its alpha value (for transparency) [Ha06, Ih09 pp. 9-10].
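The following Python sketch makes these terms concrete by modeling a vertex, a triangle primitive, and a fragment as plain data classes. It is only an illustration under stated assumptions. The selection of fields is ours, and real fragments can carry further data such as texture coordinates. The convention that a smaller depth value means "closer to the viewer" corresponds to a common less-than depth test and is assumed here. The two methods show why depth and alpha must travel with the fragment: both are consumed before the final pixel color is written.

```python
from dataclasses import dataclass
from typing import Tuple

RGB = Tuple[float, float, float]


@dataclass(frozen=True)
class Vertex:
    """Corner point of a geometric primitive (only the position is modeled here)."""
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Triangle:
    """A geometric primitive given by its three vertices."""
    v0: Vertex
    v1: Vertex
    v2: Vertex


@dataclass(frozen=True)
class Fragment:
    """Candidate for a pixel: a color plus the data needed before it is drawn."""
    x: int        # screen position (column)
    y: int        # screen position (row)
    depth: float  # assumed convention: smaller value = closer to the viewer
    r: float
    g: float
    b: float
    alpha: float  # opacity in [0, 1]; 1.0 means fully opaque

    def passes_depth_test(self, stored_depth: float) -> bool:
        """Keep the fragment only if it is closer than what the pixel already holds."""
        return self.depth < stored_depth

    def blend_over(self, dst: RGB) -> RGB:
        """Alpha-blend this fragment's color over the color already in the pixel."""
        src = (self.r, self.g, self.b)
        return tuple(self.alpha * s + (1.0 - self.alpha) * d for s, d in zip(src, dst))
```

For example, a half-transparent red fragment blended over a blue pixel yields `(0.5, 0.0, 0.5)`, and it is only drawn at all if its depth passes the test.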
2.1 Graphics Pipeline
A graphics pipeline (also called rendering pipeline) is a model that describes the steps performed to render a scene. The concept is comparable to the CPU instruction pipeline. The individual stages work in parallel on different data, but data can only advance to the next stage once the slowest stage has finished its current step. One simple model of a (fixed-function) graphics pipeline is depicted in Fig. 1.
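The effect of the slowest stage can be illustrated with an idealized lockstep model, sketched below in Python. The model assumes that all stages advance simultaneously once per cycle and that the slowest stage sets the cycle length. The three stage names and their times are hypothetical and do not describe real hardware.

```python
from typing import Sequence


def bottleneck(stage_times: Sequence[float]) -> float:
    """Cycle length of a lockstep pipeline: the slowest stage sets the pace."""
    assert len(stage_times) > 0, "a pipeline needs at least one stage"
    return max(stage_times)


def pipelined_time(stage_times: Sequence[float], n_items: int) -> float:
    """Time to push n_items through a lockstep pipeline.

    The first item needs one cycle per stage to fill the pipeline; after that,
    one finished item leaves the pipeline every cycle.
    """
    assert n_items >= 1
    return (len(stage_times) + n_items - 1) * bottleneck(stage_times)


def sequential_time(stage_times: Sequence[float], n_items: int) -> float:
    """Time without pipelining: each item passes through all stages on its own."""
    return n_items * sum(stage_times)


# Hypothetical per-item stage times in arbitrary units, not measured values.
STAGES = {"vertex processing": 2, "rasterization": 3, "fragment processing": 1}
```

With these hypothetical numbers, 1000 items take (3 + 999) × 3 = 3006 time units pipelined, compared with 1000 × 6 = 6000 sequentially. Speeding up any stage other than rasterization would not change the pipelined result.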

Graphics APIs

Graphics APIs provide programmers with a high level of abstraction. They simplify software development by hiding the complexity and specific capabilities of the graphics hardware and device drivers. The two most important graphics APIs are briefly introduced below.
Direct3D is an API for drawing 3D graphics. It is the most prominent component of DirectX, Microsoft's comprehensive API collection for multimedia applications on its platforms (Windows and Xbox). An advantage for programmers using DirectX is its large market penetration. This enables Microsoft to define minimum hardware specifications for graphics components in collaboration with the graphics vendors. Commonly criticized disadvantages are that it is proprietary, that it offers low backward compatibility, and that it has short release cycles [BB03 p. 4]. However, the last two also provide the basis for innovation: until Direct3D 10, the most interesting development for GPGPU was the introduction of different shader models (cf. Section 2.3). The current version, Direct3D 11, was released in October 2009 and adds three main features [Be09a]:
- hardware support for tessellation, which increases the number of polygons by subdividing them at runtime within the GPU pipeline
- improved multi-threading support (for multi-core CPUs)
- DirectCompute, Microsoft's new approach to GPGPU
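Tessellation can be illustrated with the simplest subdivision scheme: each triangle is split into four by connecting its edge midpoints. The Python sketch below runs on the CPU and performs only this uniform midpoint subdivision. It is not how Direct3D 11 tessellates. There, the hull shader, a fixed-function tessellator, and the domain shader work together on the GPU with adjustable tessellation factors. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Tri = Tuple[Point, Point, Point]


def midpoint(p: Point, q: Point) -> Point:
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def subdivide(tri: Tri) -> List[Tri]:
    """Split one triangle into four by connecting its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    # Three corner triangles plus the inner (inverted) triangle.
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(mesh: List[Tri], levels: int) -> List[Tri]:
    """Apply uniform subdivision `levels` times; the triangle count grows by 4**levels."""
    for _ in range(levels):
        mesh = [small for tri in mesh for small in subdivide(tri)]
    return mesh


def area(tri: Tri) -> float:
    """Area of a 2D triangle (half the absolute cross product of two edges)."""
    (ax, ay), (bx, by), (cx, cy) = tri
    return abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2
```

Three levels turn one triangle into 4³ = 64 triangles while the covered area stays the same. This shows how quickly the polygon count grows with each level.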

Graphics Hardware

In modern PCs, GPUs are located either on a dedicated graphics card or on the motherboard as an integrated graphics solution. Integrated solutions usually have little or no graphics memory of their own. They compete with the CPU for main memory and sit at the lower end of the price and performance spectrum. However, their computing power is generally sufficient for simple 2D and 3D graphics tasks. Problems arise, e.g., with complex 3D video games at high resolutions, CAD software, or GPGPU. High-performance GPUs are typically only available as dedicated graphics cards. These cards are connected to the system via an expansion slot, currently PCI Express (PCIe) v2.0, which uses point-to-point serial links. A link is composed of one to 32 lanes, each carrying 500 MB/s in each direction. Most contemporary cards use 16 lanes, which allows for a data transfer rate of 16 × 500 MB/s = 8 GB/s in each direction (full duplex) [PCI09]. It will later become clear that this is a major bottleneck for GPGPU applications.
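The bandwidth figures above translate directly into lower bounds on transfer time. The following Python sketch assumes the theoretical 500 MB/s per lane and direction stated in the text, and decimal megabytes (10^6 bytes). The function names are hypothetical. Real transfers are slower because of protocol overhead, so these numbers are best-case estimates.

```python
LANE_MB_PER_S = 500  # PCIe 2.0, per lane and per direction, as given in the text


def link_mb_per_s(lanes: int = 16) -> int:
    """Theoretical one-direction bandwidth of a PCIe 2.0 link in MB/s."""
    assert 1 <= lanes <= 32, "PCIe links have 1 to 32 lanes"
    return lanes * LANE_MB_PER_S


def transfer_ms(num_bytes: int, lanes: int = 16) -> float:
    """Lower bound on the time (ms) to copy num_bytes in one direction."""
    megabytes = num_bytes / 1e6
    return megabytes * 1000.0 / link_mb_per_s(lanes)
```

For example, copying 1 GB to a card with 16 lanes takes at least 1000 MB / 8000 MB/s = 125 ms. A kernel that finishes in a few milliseconds on that data would therefore spend most of its total runtime waiting for the bus.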

Traditional GPGPU

Traditional GPGPU was already possible in 2002. It required increasing performance and programmability. Programmability was realized through graphics shaders and the introduction of more complex and precise data types. Early GPUs operated on eight-bit integers (pixels with 256 colors). Floating-point data types with different levels of precision were added later [Ha06 ch. 2]. The first GPGPU programs used the graphics APIs directly and were hence written in HLSL, GLSL, or Cg. These programs had to use the computational units of the graphics card in restricted, specialized roles [Ha06]:
- the texture unit served as read-only memory
- the framebuffer served as write-only memory
- the vertex and pixel shaders executed the kernels
- the rasterizer was used for address calculation
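The mapping of a computation onto graphics units can be sketched as follows. The Python code is only a sequential CPU stand-in for what a pixel shader written in HLSL, GLSL, or Cg did in parallel on the GPU:
- inputs are stored as read-only "textures"
- a stand-in rasterizer generates one fragment per output pixel and thus supplies the address (x, y) of each result
- the kernel's return value is written to that pixel of the framebuffer

All names are hypothetical, and element-wise addition is just an example kernel.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Texture = Sequence[Sequence[float]]  # read-only input data
Kernel = Callable[[Tuple[Texture, ...], int, int], float]


def rasterize_quad(width: int, height: int) -> Iterator[Tuple[int, int]]:
    """Stand-in rasterizer: one fragment per output pixel of a screen-filling quad."""
    for y in range(height):
        for x in range(width):
            yield x, y


def render_pass(kernel: Kernel, textures: Tuple[Texture, ...],
                width: int, height: int) -> List[List[float]]:
    """Run the 'pixel shader' kernel once per fragment and collect the framebuffer."""
    framebuffer = [[0.0] * width for _ in range(height)]  # write-only from the kernel's view
    for x, y in rasterize_quad(width, height):
        # The kernel may read any texel, but its result only goes to its own pixel.
        framebuffer[y][x] = kernel(textures, x, y)
    return framebuffer


def add_kernel(textures: Tuple[Texture, ...], x: int, y: int) -> float:
    """Element-wise addition of two inputs stored as textures."""
    a, b = textures
    return a[y][x] + b[y][x]
```

Under the same model, a computation needing several passes would feed the framebuffer of one pass back in as a texture for the next.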
