Getting Started with CUDA
What is CUDA?
CUDA is a scalable parallel programming model and a software environment for parallel computing
Minimal extensions to the familiar C/C++ environment (see the sketch after this section)
Heterogeneous serial-parallel programming model
NVIDIA’s TESLA architecture accelerates CUDA
Expose the computational horsepower of NVIDIA GPUs
Enable GPU computing
CUDA also maps well to multicore CPUs
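As a rough sketch of how small those extensions are, the fragment below declares a kernel with the __global__ qualifier and launches it with the <<<...>>> execution-configuration syntax; everything else is ordinary C. The kernel name and launch configuration are illustrative only:

#include <cuda_runtime.h>

// __global__ marks a function that runs on the GPU (the "device")
// but is called from the CPU (the "host").
__global__ void my_kernel(void)
{
    // body runs once in every launched thread
}

int main(void)
{
    // <<<blocks, threadsPerBlock>>> is the only new call syntax:
    // here, 2 blocks of 64 threads each.
    my_kernel<<<2, 64>>>();
    cudaDeviceSynchronize();   // wait for the GPU to finish
    return 0;
}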
Some Design Goals
Scale to 100s of cores and 1000s of parallel threads
Let programmers focus on parallel algorithms
Not on the mechanics of a parallel programming language
Enable heterogeneous systems (i.e. CPU + GPU)
CPU and GPU are separate devices with separate DRAMs
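Because the two memories are distinct, data must be moved between them by explicit copies. A minimal sketch of that round trip with the CUDA runtime API (buffer names are illustrative):

#include <cuda_runtime.h>

int main(void)
{
    const int N = 1024;
    size_t bytes = N * sizeof(float);

    float host_buf[N];   // lives in CPU DRAM
    float *dev_buf;      // will live in GPU DRAM

    cudaMalloc((void **)&dev_buf, bytes);                          // allocate on the device
    cudaMemcpy(dev_buf, host_buf, bytes, cudaMemcpyHostToDevice);  // CPU -> GPU
    /* ... launch kernels that operate on dev_buf ... */
    cudaMemcpy(host_buf, dev_buf, bytes, cudaMemcpyDeviceToHost);  // GPU -> CPU
    cudaFree(dev_buf);
    return 0;
}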
CUDA Kernels and Threads
Parallel portions of an application are executed on the device as kernels
One kernel is executed at a time
Many threads execute each kernel
Differences between CUDA and CPU threads
CUDA threads are extremely lightweight
Very little creation overhead
Instant switching
CUDA uses 1000s of threads to achieve efficiency
Multi-core CPUs can efficiently use only a few (see the sketch below)
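A sketch of this model: every thread derives its own global index from its block and thread coordinates and handles one element, so launching thousands of threads is the normal case. Kernel and variable names here are illustrative:

__global__ void scale(float *data, float alpha, int n)
{
    // Each thread computes a unique global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard: thread count may exceed n
        data[i] = alpha * data[i];
}

// Host-side launch: enough 256-thread blocks to cover n elements,
// i.e. thousands of lightweight threads for large n.
// int blocks = (n + 255) / 256;
// scale<<<blocks, 256>>>(dev_data, 2.0f, n);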
Thread Cooperation
The Missing Piece: threads may need to cooperate
Thread cooperation is valuable
Share results to avoid redundant computation
Share memory accesses
Drastically reduces the demand on off-chip memory bandwidth
Thread cooperation is a powerful feature of CUDA
Cooperation between a monolithic array of threads is not scalable
Cooperation within smaller batches of threads is scalable (see the sketch below)
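Within a batch (a thread block), threads cooperate through on-chip shared memory and a barrier. A common sketch is a block-level sum: each thread loads one element, then the block reduces the tile without touching DRAM again. This assumes a power-of-two block size of 256; names are illustrative:

__global__ void block_sum(const float *in, float *block_out, int n)
{
    __shared__ float tile[256];                  // on-chip, shared by the block

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // one global read per thread
    __syncthreads();                             // barrier: tile fully loaded

    // Tree reduction inside the block: cooperation stays scalable
    // because it never crosses block boundaries.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        block_out[blockIdx.x] = tile[0];         // one result per block
}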
CUBLAS
Implementation of BLAS (Basic Linear Algebra Subprograms) on top of the CUDA driver
Self-contained at the API level: applications need no direct interaction with the CUDA driver
Basic model for use
Create matrix and vector objects in GPU memory space
Fill objects with data
Call CUBLAS functions
Retrieve data
CUBLAS library helper functions (a usage sketch follows)
Creating and destroying objects in GPU memory space
Writing data to and retrieving data from those objects
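Putting the basic model together with the legacy CUBLAS helper API (cublas.h, the interface of this deck's era). The SAXPY call stands in for any CUBLAS function, and the buffer names are illustrative of the pattern:

#include <cublas.h>   /* legacy CUBLAS API; link with -lcublas */

int main(void)
{
    const int N = 1024;
    float x[N], y[N];
    float *d_x, *d_y;

    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    cublasInit();                                    /* start the library */

    /* 1. Create vector objects in GPU memory space */
    cublasAlloc(N, sizeof(float), (void **)&d_x);
    cublasAlloc(N, sizeof(float), (void **)&d_y);

    /* 2. Fill objects with data */
    cublasSetVector(N, sizeof(float), x, 1, d_x, 1);
    cublasSetVector(N, sizeof(float), y, 1, d_y, 1);

    /* 3. Call a CUBLAS function: y = 3*x + y */
    cublasSaxpy(N, 3.0f, d_x, 1, d_y, 1);

    /* 4. Retrieve data */
    cublasGetVector(N, sizeof(float), d_y, 1, y, 1);

    cublasFree(d_x);
    cublasFree(d_y);
    cublasShutdown();
    return 0;
}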