Linear Algebra Libraries - svmoore

```
Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA
Shirley Moore
CPS5401 Fall 2014
svmoore.pbworks.com
November 24, 2014
Learning Objectives
• After completing this lesson, you should be able to:
– List and describe advantages of using linear algebra
libraries
– List types of computations performed by linear algebra
libraries
– Describe functionality of the BLAS
– Locate and use documentation on linear algebra libraries
– Insert calls to linear algebra library routines into your
program and compile and run the resulting program
– Describe current research on numerical linear algebra for
multicore and heterogeneous architectures
Numerical Linear Algebra
• Algorithms for performing matrix operations on
computers
• Widely used in scientific, engineering, and
financial applications
• Fundamental algorithms
– Basic matrix and vector operations
– LU decomposition
– QR decomposition
– Singular value decomposition
– Eigenvalues
BLAS
• Basic Linear Algebra Subprograms
• De facto standard (all implementations use the same
calling interface)
• First published in 1979
• http://www.netlib.org/blas/
• BLAS Quick Reference Guide:
http://www.netlib.org/lapack/lug/node145.html
• Tuned versions implemented by vendors (Intel MKL,
AMD ACML, Cray LibSci, IBM ESSL)
• Routines to perform basic operations such as vector
and matrix multiplication
BLAS Functionality and Levels
• Level 1
This level contains vector operations of the form y ← αx + y,
as well as scalar dot products and vector norms, among other things.
• Level 2
This level contains matrix-vector operations of the form y ← αAx + βy,
as well as solving Tx = y for x with T being triangular, among other things.
• Level 3
This level contains matrix-matrix operations of the form C ← αAB + βC,
as well as solving B ← αT⁻¹B for triangular matrices T, among other
things. This level contains the widely used General Matrix Multiply (GEMM)
operation.
General Matrix Multiply (GEMM)
• xGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
computes C ← α op(A) op(B) + β C, where TRANSA and TRANSB determine
whether the matrices A and B are to be transposed
• M is the number of rows in matrix C and, depending on TRANSA, the
number of rows in the original matrix A or its transpose.
• N is the number of columns in matrix C and, depending on TRANSB, the
number of columns in the matrix B or its transpose.
• K is the number of columns in matrix A (or its transpose) and rows in
matrix B (or its transpose).
• LDA, LDB and LDC specify the size of the first dimension of the matrices
as laid out in memory, that is, the memory distance between the starts
of consecutive rows or columns, depending on the storage order.
• Precision (x): S for single, D for double, C for complex single, Z for
complex double
LAPACK
• Linear Algebra PACKage
• www.netlib.org/lapack/
• De facto standard
• Successor to the linear equations and linear least-squares routines of
LINPACK and the eigenvalue routines of EISPACK
• Routines for solving systems of linear equations, linear least squares,
eigenvalue problems, and singular value decomposition
• Routines to implement the associated matrix factorizations such as LU,
QR, Cholesky and Schur decomposition
• Handles real and complex matrices in both single and double precision
• Depends on the BLAS to effectively exploit caches on modern cache-based
architectures
• Tuned versions implemented in vendor libraries (e.g., AMD ACML, Intel
MKL, Cray LibSci, IBM ESSL)
LAPACK Naming Scheme
• A LAPACK subroutine name is in the form pmmaaa, where:
– p is a one-letter code denoting the type of numerical constants used.
S, D stand for real floating point arithmetic respectively in single and
double precision, while C and Z stand for complex arithmetic with
respectively single and double precision.
– mm is a two-letter code denoting the kind of matrix expected by the
algorithm. The actual data are stored in a different format depending
on the specific kind; e.g., when the code DI is given, the subroutine
expects a vector of length n containing the elements on the diagonal,
while when the code GE is given, the subroutine expects an n×n array
containing the entries of the matrix.
– aaa is a one- to three-letter code describing the actual algorithm
implemented in the subroutine; e.g., SV denotes a subroutine to solve
a linear system.
• For example, the subroutine to solve a linear system with a general
(non-structured) matrix using real double-precision arithmetic is
called DGESV.
• For details, see the LAPACK Users' Guide at
www.netlib.org/lapack/lug/
Intel MKL
• Stands for Math Kernel Library
• https://software.intel.com/en-us/intel-mkl
• Vectorized and threaded linear algebra, FFTs, and
statistics functions
• Uses standard BLAS and LAPACK APIs
• MIT’s FFTW C interface
• Direct sparse solver (not standardized)
• Support for AVX-512 Advanced Vector Extensions
ACML
• AMD Core Math Library
• http://developer.amd.com/tools/cpu-development/amdcore-math-library-acml/
• ACML consists of the following main components:
– A full implementation of Level 1, 2 and 3 Basic Linear Algebra
Subprograms (BLAS), with optimizations for AMD Opteron
processors.
– A full suite of Linear Algebra (LAPACK) routines.
– A comprehensive suite of Fast Fourier Transforms (FFTs) in single-,
double-, single-complex and double-complex data types.
– Fast scalar, vector, and array math transcendental library
routines
– Random Number Generators in both single- and double-precision
ScaLAPACK
• Scalable Linear Algebra PACKage
• www.netlib.org/scalapack/
• Library of high-performance linear algebra routines for parallel
distributed memory machines
• Solves dense and banded linear systems, least squares problems,
eigenvalue problems, and singular value problems
• Key ideas
– block cyclic data distribution for dense matrices and a block data
distribution for banded matrices, parameterizable at runtime
– block-partitioned algorithms to ensure high levels of data reuse
• Efficient low-level communication implemented by BLACS (Basic
Linear Algebra Communication Subprograms)
• Will run on any machine with BLAS, LAPACK, and BLACS
Current Efforts
• Parallel Linear Algebra Software for Multicore
Architectures (PLASMA)
– www.netlib.org/plasma/
– http://icl.eecs.utk.edu/plasma/
• Matrix Algebra on GPU and Multicore
Architectures (MAGMA)
– http://icl.eecs.utk.edu/magma/
• OpenBLAS
– http://c2.com/cgi/wiki?OpenBlas
MKL on hpc.utep.edu
• Libraries in /shared/intel
• Examples in
/shared/intel_11/Compiler/11.1/080/mkl/examples
• See documentation at
https://software.intel.com/en-us/articles/intel-math-kernel-library-documentation
ScaLAPACK
• www.citutor.org
– Introduction to MPI, Chapter 10
```
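The GEMM slide describes the arguments M, N, K, TRANSA/TRANSB and the leading dimensions LDA, LDB, LDC. The sketch below is a minimal pure-Python illustration of those semantics, assuming column-major (Fortran-style) storage in flat lists. The function name `gemm_reference` and the helper `elem` are hypothetical names chosen for this example. This is not a BLAS implementation and it makes no attempt at performance; it only shows how a leading dimension larger than the row count skips padding in memory.

```python
def gemm_reference(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Illustrative C := alpha*op(A)*op(B) + beta*C on column-major flat lists.

    transa/transb are "N" (use the matrix as stored) or "T" (use its transpose).
    lda, ldb, ldc are the distances in memory between consecutive columns.
    """
    def elem(x, ldx, trans, i, j):
        # Element (i, j) of op(X), where X is stored column by column.
        if trans == "N":
            return x[i + j * ldx]
        return x[j + i * ldx]

    for j in range(n):          # columns of C
        for i in range(m):      # rows of C
            acc = 0.0
            for p in range(k):  # inner dimension
                acc += elem(a, lda, transa, i, p) * elem(b, ldb, transb, p, j)
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c


# A = [[1, 2, 3], [4, 5, 6]] (2x3), B = [[7, 8], [9, 10], [11, 12]] (3x2)
b = [7.0, 9.0, 11.0, 8.0, 10.0, 12.0]            # column-major, ldb = 3

# A stored tightly: lda equals the number of rows (2).
a_tight = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
c_tight = gemm_reference("N", "N", 2, 2, 3, 1.0, a_tight, 2, b, 3, 0.0, [0.0] * 4, 2)

# Same A inside a 3-row buffer: lda = 3, the third entry of each column is padding.
a_padded = [1.0, 4.0, 0.0, 2.0, 5.0, 0.0, 3.0, 6.0, 0.0]
c_padded = gemm_reference("N", "N", 2, 2, 3, 1.0, a_padded, 3, b, 3, 0.0, [0.0] * 4, 2)

# A supplied as its transpose (3x2, lda = 3) together with TRANSA = "T".
a_t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
c_trans = gemm_reference("T", "N", 2, 2, 3, 1.0, a_t, 3, b, 3, 0.0, [0.0] * 4, 2)

# All three calls give C = [[58, 64], [139, 154]], stored column-major.
print(c_tight, c_padded, c_trans)
```

In a real program the triple loop would be replaced by a call to a tuned xGEMM from one of the libraries listed in the slides; the argument meanings stay the same.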
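The LAPACK naming scheme slide (pmmaaa) can be illustrated with a small decoder. The sketch below is an assumption-laden example, not part of LAPACK. The function names `split_lapack_name` and `describe` are hypothetical, and the matrix-kind table is deliberately partial, covering only the two codes from the slide (GE, DI) plus a few other common ones. Some LAPACK routines, such as auxiliary routines, do not follow this pattern, so the decoder should be read as a study aid only.

```python
# Precision codes from the naming-scheme slide.
PRECISIONS = {
    "S": "real, single precision",
    "D": "real, double precision",
    "C": "complex, single precision",
    "Z": "complex, double precision",
}

# Partial table of matrix-kind codes (illustrative, not exhaustive).
MATRIX_KINDS = {
    "GE": "general",
    "DI": "diagonal",
    "SY": "symmetric",
    "PO": "symmetric or Hermitian positive definite",
    "TR": "triangular",
}


def split_lapack_name(name):
    """Split a routine name of the form pmmaaa into (p, mm, aaa)."""
    name = name.upper()
    if not 4 <= len(name) <= 6:
        raise ValueError("expected 4 to 6 characters: " + name)
    if name[0] not in PRECISIONS:
        raise ValueError("unknown precision code: " + name[0])
    return name[0], name[1:3], name[3:]


def describe(name):
    """Return a one-line description of a pmmaaa-style routine name."""
    p, mm, aaa = split_lapack_name(name)
    kind = MATRIX_KINDS.get(mm, "unlisted kind " + mm)
    return f"{name.upper()}: {PRECISIONS[p]}; {kind} matrix; algorithm code {aaa}"


print(describe("dgesv"))
```

For DGESV, the example from the slide, the decoder separates D (real double precision), GE (general matrix) and SV (solve a linear system).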