7-20-2010-CUDA lib summary

Report
CUDA Library and Demo
Yafeng Yin, Lei Zhou, Hong Man
07/21/2010
Outline
• Basic CUDA computation library
– GPULib, CUBLAS, CUFFT
• Advanced CUDA computation library
– CULA/MAGMA, VSIPL
• CUDA FIR Demo (UMD)
• Discussion and future work
Basic lib - GPULib
• GPULib provides a library of mathematical
functions
– addition, subtraction, multiplication, and division,
as well as unary functions, including sin(), cos(),
gamma(), and exp()
– interpolation, array reshaping, array slicing, and
reduction operations
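The two operation families listed above can be illustrated with a small CPU reference in plain Python. This is only a sketch of the semantics, not GPULib's API: the helper names `elementwise` and `reduce_sum` are hypothetical and chosen for this example.

```python
import math

def elementwise(op, xs, ys=None):
    """Apply a unary op to xs, or a binary op to xs and ys, element by element."""
    if ys is None:
        return [op(x) for x in xs]
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same length")
    return [op(x, y) for x, y in zip(xs, ys)]

def reduce_sum(xs):
    """Reduction: collapse an array to a single value."""
    return math.fsum(xs)

a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.5, 0.5, 0.5]

added = elementwise(lambda x, y: x + y, a, b)  # binary elementwise operation
expd = elementwise(math.exp, a)                # unary elementwise operation
total = reduce_sum(added)                      # reduction over the result
```

On a GPU, each element of an elementwise operation can be handled by its own thread, while a reduction needs threads to cooperate, which is why libraries provide it as a separate primitive.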
Basic lib - CUBLAS
• BLAS-- Basic Linear Algebra Subprograms
• CUBLAS
Provides a set of functions for basic vector and
matrix operations, such as vector and matrix copy,
swap, dot product, and Euclidean norm
– Real data
• Level 1 (vector-vector, O(N))
• Level 2 (matrix-vector, O(N^2))
• Level 3 (matrix-matrix, O(N^3))
– Complex data
• Level 1
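The three BLAS levels differ in how much work they do per call. A minimal CPU sketch in plain Python follows, assuming square N x N matrices stored as lists of rows; the function names are illustrative and are not CUBLAS entry points.

```python
def dot(x, y):
    """Level 1: vector-vector, N multiply-adds."""
    return sum(a * b for a, b in zip(x, y))

def matvec(A, x):
    """Level 2: matrix-vector, N*N multiply-adds for an N x N matrix."""
    return [dot(row, x) for row in A]

def matmul(A, B):
    """Level 3: matrix-matrix, N*N*N multiply-adds for N x N matrices."""
    cols = list(zip(*B))  # columns of B, so each entry is one dot product
    return [[dot(row, col) for col in cols] for row in A]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
x = [1.0, 1.0]

d = dot(x, x)
v = matvec(A, x)
M = matmul(A, B)
```

Level 3 routines do O(N^3) arithmetic on O(N^2) data, so they tend to benefit most from a GPU once the matrices are already in device memory.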
CUBLAS - Level 2 functions
cublasSgbmv()
y = alpha * op(A) * x + beta * y
cublasSgemv()
y = alpha * op(A) * x + beta * y
cublasSger()
A = alpha * x * y^T + A
cublasSsbmv()
y = alpha * A * x + beta * y
cublasSspmv()
y = alpha * A * x + beta * y
cublasSspr()
A = alpha * x * x^T + A
cublasSspr2()
A = alpha * x * y^T + alpha * y * x^T + A
cublasSsymv()
y = alpha * A * x + beta * y
cublasSsyr()
A = alpha * x * x^T + A
cublasSsyr2()
A = alpha * x * y^T + alpha * y * x^T + A
cublasStbmv()
x = op(A) * x
cublasStbsv()
op(A) * x = b, output x
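To make the formulas concrete, here is a CPU reference sketch in plain Python of the general matrix-vector product and the rank-1 update. It mirrors only the mathematics of the gemv and ger rows above. The signatures are hypothetical, matrices are lists of rows, and the storage layout, banded, and packed formats that the real routines deal with are ignored.

```python
def gemv(trans, alpha, A, x, beta, y):
    """Return alpha * op(A) * x + beta * y, where op(A) is A or its transpose."""
    opA = [list(col) for col in zip(*A)] if trans else A
    if len(opA[0]) != len(x) or len(opA) != len(y):
        raise ValueError("dimension mismatch")
    return [alpha * sum(a * xi for a, xi in zip(row, x)) + beta * yi
            for row, yi in zip(opA, y)]

def ger(alpha, x, y, A):
    """Return the rank-1 update alpha * x * y^T + A."""
    return [[alpha * xi * yj + aij for yj, aij in zip(y, row)]
            for xi, row in zip(x, A)]

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]  # 2 x 3 matrix

y1 = gemv(False, 2.0, A, [1.0, 0.0, 1.0], 1.0, [1.0, 1.0])      # op(A) = A
y2 = gemv(True, 1.0, A, [1.0, 1.0], 0.0, [0.0, 0.0, 0.0])       # op(A) = A^T
A2 = ger(1.0, [1.0, 2.0], [1.0, 0.0, 1.0], A)
```

The symmetric and triangular variants in the list compute the same kind of expression but exploit the structure of A to store and touch only part of the matrix.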
Basic lib - CUFFT
• CUFFT is the CUDA FFT library
– Provides a simple interface for computing
parallel FFT on an NVIDIA GPU
– Allows users to leverage the floating-point power
and parallelism of the GPU without having to
develop a GPU-based FFT implementation
– cufftPlan1d(), cufftPlan2d(), cufftPlan3d()
Create a 1D, 2D, or 3D FFT plan configuration for a
specified signal size
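For readers who want to see what a 1D transform computes, the following plain-Python sketch implements a recursive radix-2 FFT and checks it against the direct DFT definition. It is a CPU illustration only and says nothing about how CUFFT is implemented. The function names are hypothetical, and the power-of-two length restriction belongs to this sketch.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dft(x):
    """Direct O(N^2) definition, used here only as a cross-check."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
spectrum = fft(signal)
```

The independent butterflies at each recursion level are what make the FFT a good fit for parallel hardware.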
Advanced lib – CULA and MAGMA
• CULA: GPU Accelerated Linear Algebra
– provides LAPACK (Linear Algebra PACKage) functions
on CUDA GPUs
• MAGMA: Matrix Algebra on GPU and
Multicore Architectures
– develops a dense linear algebra library similar to
LAPACK but for heterogeneous/hybrid
architectures such as "Multicore+GPU" systems
Advanced lib - CULA functions
• Linear Equation Routines
– Solve a general system of linear equations AX=B
• Orthogonal Factorizations
– LQ, RQ factorization
• Least Squares Routines
• Symmetric and Non-symmetric Eigenvalue
Routines
• Singular Value Decomposition (SVD) Routines
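As an illustration of the first routine family, here is a plain-Python sketch that solves A x = b for a single right-hand side by Gaussian elimination with partial pivoting. It is a CPU reference for the mathematics, not CULA's interface; the name `solve` is hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

A = [[2.0, 1.0, -1.0],
     [-3.0, -1.0, 2.0],
     [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]
x = solve(A, b)
```

The elimination costs O(N^3) operations, which is the part a GPU library accelerates; the back substitution is only O(N^2).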
Advanced lib - MAGMA
• LAPACK on CUDA GPUs
– LU, QR, and Cholesky factorizations in both real and
complex arithmetic (single and double)
– Linear solvers based on LU, QR, and Cholesky in real
arithmetic (single and double)
– Mixed-precision iterative refinement solvers based on
LU, QR, and Cholesky in real arithmetic
– Reduction to upper Hessenberg form in real
arithmetic (single and double)
– MAGMA BLAS in real arithmetic (single and double)
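Of the factorizations listed, Cholesky is the shortest to write down. The plain-Python sketch below computes it on the CPU for a symmetric positive definite matrix. It shows what the factorization produces, not how MAGMA splits the work between CPU and GPU; the function name is hypothetical.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L * L^T for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = cholesky(A)
```

Once L is known, A x = b reduces to two triangular solves, which is how the Cholesky-based linear solvers in the list work.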
Advanced lib - VSIPL
• VSIPL: Vector Signal Image Processing Library
– Generalized matrix product
– Fast FIR filtering
– Correlation
– Fast Fourier Transform
– QR decomposition
– Random number generation
– Elementwise arithmetic, logical, and comparison
operators, linear algebra procedures
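FIR filtering is the operation the later demo measures, so a reference definition is useful. The sketch below is a direct-form FIR filter in plain Python, assuming zero input before the first sample. It is a baseline for the mathematics only; "fast" FIR implementations typically use FFT-based convolution instead, and the name `fir_filter` is hypothetical.

```python
def fir_filter(taps, x):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y

taps = [0.25, 0.25, 0.25, 0.25]  # 4-tap moving average
step = [1.0] * 6                 # unit step input
y = fir_filter(taps, step)
```

Each output sample depends only on the input and the taps, never on other outputs, so all samples can be computed in parallel.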
CUDA library Summary
• Basic vector or matrix computation
– GPULib, CUBLAS, CUFFT
– vector or matrix addition, subtraction, multiplication, and
division; sin(), cos(), sort, dot product
• Libraries that can be used for signal processing
– CULA/MAGMA, VSIPL
– LU, QR, and Cholesky factorizations
– Singular value decomposition (SVD)
CUDA Demo (FIR)
GPU: NVIDIA GeForce 8600 GT
CPU: Intel Duo CPU, 2.33 GHz
Software: Visual Studio 2005
CUDA Demo (FIR)
Output No.   GPU Run Time (msec)   Memory Time (msec)
1000           0.312121              0.166641
10000          0.667264              0.284254
100000         4.210870              1.489784
1000000       39.460812              5.597150
10000000     391.816345             48.080204
Other columns: Total Time CPU+GPU (msec), CPU Only Time (msec)
CUDA Demo (FIR)
[Figure: FIR Performance. Time in msec (axis 0 to 5000) for CPU and CPU+GPU at 1000, 10000, 100000, 1000000, and 10000000 outputs]
Discussion and future work
• How to connect CUDA to the SSP re-hosting
demo
• How to convert the sequentially executed code
in the signal processing system to CUDA code
• How to translate the XML code to CUDA code
to generate the CUDA input
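One possible way to approach the second question: a loop whose iterations are independent can become a kernel in which each thread computes one output index. The plain-Python sketch below emulates that mapping for an FIR filter, using the usual CUDA index arithmetic (block index times block size plus thread index). All function names and the block size of 4 are assumptions for illustration, and the "launch" runs the threads one after another on the CPU.

```python
def fir_sequential(taps, x):
    """Original form: one loop walks over every output sample."""
    out = [0.0] * len(x)
    for n in range(len(x)):
        out[n] = sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
    return out

def fir_kernel(block_idx, block_dim, thread_idx, taps, x, out):
    """Body that one CUDA thread would run: compute a single output sample."""
    n = block_idx * block_dim + thread_idx
    if n < len(x):  # guard: the last block may be only partly used
        out[n] = sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)

def launch(taps, x, block_dim=4):
    """Emulate a 1D grid launch by visiting every (block, thread) pair in turn."""
    out = [0.0] * len(x)
    grid_dim = (len(x) + block_dim - 1) // block_dim  # ceil(len(x) / block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            fir_kernel(block_idx, block_dim, thread_idx, taps, x, out)
    return out

taps = [0.5, 0.3, 0.2]
x = [float(i) for i in range(10)]
seq = fir_sequential(taps, x)
par = launch(taps, x)
```

The loop body moves into the kernel unchanged; the loop counter is replaced by the computed index and a bounds check. Loops with dependencies between iterations need restructuring before this mapping applies.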
Reference
• CUDA Zone
http://www.nvidia.com/object/cuda_home_new.html
• http://en.wikipedia.org/wiki/CUDA
