02-08-2013, 12:15 PM
GPU Computing: The Democratization of Parallel Computing
Parallel Computing’s Golden Age
1980s, early '90s: a golden age for parallel computing
Particularly data-parallel computing
Architectures
Connection Machine, MasPar, Cray
True supercomputers: incredibly exotic, powerful, expensive
Algorithms, languages, & programming models
Solved a wide variety of problems
Various parallel algorithmic models developed
Enter the GPU
GPUs are massively multithreaded manycore chips
NVIDIA Tesla products have up to 128 scalar processors
Over 12,000 concurrent threads in flight
Over 470 GFLOPS sustained performance
Users across science & engineering disciplines are achieving 100x or better speedups on GPUs
CS researchers can use GPUs as a research platform for manycore computing: arch, PL, numeric, ...
Enter CUDA
CUDA is a scalable parallel programming model and a software environment for parallel computing
Minimal extensions to familiar C/C++ environment
Heterogeneous serial-parallel programming model
NVIDIA’s TESLA GPU architecture accelerates CUDA
Expose the computational horsepower of NVIDIA GPUs
Enable general-purpose GPU computing
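As a sketch of those minimal C extensions, a hypothetical element-wise add kernel might look like the following (the kernel name, sizes, and launch configuration are illustrative, not from the slides; `cudaThreadSynchronize` is the synchronization call of this CUDA generation):

```cuda
#include <cuda_runtime.h>

// __global__ marks a function as a kernel that runs on the device.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Each thread computes one element from its block and thread indices.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;

    // Allocate device memory, then launch with the <<<blocks, threads>>> syntax.
    cudaMalloc((void **)&a, bytes);
    cudaMalloc((void **)&b, bytes);
    cudaMalloc((void **)&c, bytes);

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaThreadSynchronize();  // wait for the kernel to finish

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Everything outside the kernel and the launch syntax is ordinary C, which is the sense in which CUDA is a minimal extension of the familiar environment.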
Device Emulation Mode
An executable compiled in device emulation mode (nvcc -deviceemu) runs completely on the host using the CUDA runtime
No device or CUDA driver is needed
Each device thread is emulated with a host thread
When running in device emulation mode, one can:
Use host native debug support (breakpoints, inspection, etc.)
Access any device-specific data from host code and vice versa
Call any host function from device code (e.g. printf) and vice versa
Detect deadlock situations caused by improper usage of __syncthreads
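A sketch of what emulation mode permits, assuming the -deviceemu toolchain of this CUDA generation (the kernel and data are made up for illustration): the kernel below calls the host's printf, which only works because each device thread is really a host thread under emulation, not on real hardware of this era.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void debugKernel(int *data)
{
    // Legal only under nvcc -deviceemu: calling a host function
    // (printf) from device code, since this "device" thread is
    // actually a host thread.
    printf("thread %d sees %d\n", threadIdx.x, data[threadIdx.x]);
}

int main(void)
{
    int host[4] = {10, 20, 30, 40};
    int *dev;
    cudaMalloc((void **)&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    debugKernel<<<1, 4>>>(dev);   // build with: nvcc -deviceemu emu.cu
    cudaThreadSynchronize();

    cudaFree(dev);
    return 0;
}
```

The same build is also where a host debugger can set breakpoints inside the kernel body.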
Host Synchronization
All kernel launches are asynchronous
control returns to CPU immediately
kernel executes after all previous CUDA calls have completed
cudaMemcpy is synchronous
control returns to CPU after copy completes
copy starts after all previous CUDA calls have completed
cudaThreadSynchronize()
blocks until all previous CUDA calls complete
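The three behaviors above can be sketched together in one host program (the kernel and sizes are illustrative assumptions):

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

__global__ void scale(float *x, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= s;
}

int main(void)
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));

    // Kernel launches are asynchronous: both calls return to the CPU
    // immediately; the second kernel runs only after the first completes.
    scale<<<n / 256, 256>>>(d, 2.0f, n);
    scale<<<n / 256, 256>>>(d, 0.5f, n);

    // Blocks the host until all previous CUDA calls complete.
    cudaThreadSynchronize();

    float *h = (float *)malloc(n * sizeof(float));
    // cudaMemcpy is synchronous: the copy starts after all previous CUDA
    // calls have completed and control returns only after it finishes.
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    free(h);
    cudaFree(d);
    return 0;
}
```

Because the memcpy already waits for preceding CUDA calls, the explicit cudaThreadSynchronize here is redundant for correctness; it is shown to make the blocking point visible.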