Part I – Interacting with Matlab

MATLAB
Center for Integrated Research Computing
http://www.circ.rochester.edu/wiki/index.php/MatlabWorkshop
Outline
Part I – Interacting with Matlab
 Running Matlab interactively
 Accessing the GUI
 Using the terminal for command entry
 Using just the terminal
 Running Matlab in batch mode
 Using PBS job arrays to do embarrassingly parallel computations
Part II – Speeding up Matlab Computations
 Symmetric Multi-Processing with Matlab
 Accelerating Matlab computations with GPUs
 Running Matlab in distributed memory environments
 Using the Parallel Computing Toolbox
 Using the Matlab Distributed Compute Engine Server
 Using pMatlab
Part III – Mixing Matlab and Fortran/C code
 Compiling MEX code from C/Fortran
 Turning Matlab routines into C code
Running Matlab Interactively
 To use Matlab's GUI you must connect through a suitable
graphical environment, such as NX
 Why NX?

Faster than using X11 forwarding (compresses data)

Has clients for all major operating systems

Saves your session when you are disconnected

You don’t have to restart Matlab if your network connection drops.
 Instructions for obtaining/installing/connecting through NX
can be found at:
http://www.circ.rochester.edu/wiki/index.php/NX_Cluster
 The link to Matlab on the NX desktop menu bar actually
launches a script that submits a job to the BlueHive cluster. It
does not run Matlab on the NX server itself; instead it uses X11 forwarding
between the compute nodes and the NX server.
Running Matlab Interactively
 We could also launch a terminal on the NX desktop and submit
an interactive job from there.
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
module load matlab
matlab -singleCompThread
 Occasionally the Matlab Desktop responds slowly to commands,
which can be very frustrating. One workaround is to use the
terminal window for command entry while still retaining the
ability to open plot windows, access help, etc.
matlab -nodesktop -nosplash
 Finally, if you do not need to plot anything on the screen or use
any of the GUI features, you can run Matlab without a display.
matlab -nodisplay
Running Matlab Interactively
 If you are running Matlab without a connected display you can still
make plots directly to a file in Matlab
H=hilb(1000);
Z=fft2(H);
f=figure('Visible','off'); imagesc(log(abs(Z))); print('-dpdf','-r300', 'fig1.pdf')
 You may also find it useful to enter many commands into a script file
and then execute the script – so you can do something else while
Matlab creates several figures etc... This is also a good way to develop a
script for batch jobs.
Running Matlab Interactively
 If you are running a machine that has an X-server – you can
bypass NX and just use X11 forwarding. However, if your
connection drops, your Matlab session (and your interactive job)
will terminate.
ssh -X [email protected]
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
 Also if you do use NX and you finish using Matlab – please
terminate your session instead of just disconnecting. This will
clean up any jobs you have running and free up resources for other
users.
Running Matlab in Batch Mode
 To submit a job in batch mode we need to create a batch script
(sample_script.pbs)
#PBS -N Matlab
#PBS -q standard
#PBS -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -nodisplay -r "sample_script"
 And a Matlab script containing the commands to run
(sample_script.m)
H=hilb(1000);
Z=fft2(H);
imagesc(log(abs(Z)));
print('-dpdf','-r300','fig1-batch.pdf');
 And we should place both files in a folder on /scratch where we
will submit the job from.
qsub sample_script.pbs
Using Job Arrays in Batch Mode
 To use a job array we need to add the “-t” PBS option to the batch
script (sample_script.pbs)
#PBS -t 0-3
#PBS -N Matlab
#PBS -q standard
#PBS -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -nodisplay -r "sample_function($PBS_ARRAYID)"
 And turn our Matlab script into a function that takes arguments
(sample_function.m)
function sample_function(n)
H=hilb(n);
Z=fft2(H);
imagesc(log(abs(Z)));
print('-dpdf','-r300', sprintf('%s%03d%s','fig1-batch_',n,'.pdf'));
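As a rough sketch of what the job array does, the loop below emulates the four tasks created by "-t 0-3" and prints the Matlab command and output file name for each one. In a real job PBS sets PBS_ARRAYID itself; the loop and the helper names array_command and array_output are made up purely for illustration.

```shell
#!/bin/bash
# Hypothetical helpers (not part of PBS or Matlab) that mirror the
# batch script and sample_function.m shown above.

# Command line that one array task runs for a given array id.
array_command() {
    printf 'matlab -nodisplay -r "sample_function(%d)"\n' "$1"
}

# File that sample_function(n) prints to: fig1-batch_NNN.pdf
array_output() {
    printf 'fig1-batch_%03d.pdf\n' "$1"
}

# PBS would start these as four independent jobs; here we just loop.
for PBS_ARRAYID in 0 1 2 3; do
    echo "task $PBS_ARRAYID: $(array_command "$PBS_ARRAYID") -> $(array_output "$PBS_ARRAYID")"
done
```

Each task sees a different PBS_ARRAYID, so the tasks write to different files and never need to communicate, which is what makes this approach suitable for embarrassingly parallel work.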
Symmetric Multi-Processing
 By default Matlab uses all cores on a given node for operations
that can be threaded, regardless of the submission script.
 Threaded operations include, among others:
Arrays and matrices: basic information (ISFINITE, ISINF, ISNAN, MAX, MIN) and operators (+, -, .*, ./, .\, .^, *, ^, \ (MLDIVIDE), /)
 To be sure you only use the resources you request, you should
either request an entire node and all of its CPUs...
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=8,vmem=16gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab
 Or request a single cpu and specify that Matlab should only use a
single thread
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=1,vmem=4gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -singleCompThread
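The rule above can be sketched as a small shell helper that picks the Matlab start-up flag from the number of cores requested. The function name matlab_thread_flag and the value CORES_PER_NODE=8 are assumptions for illustration (8 is taken from the ppn=8 example); adjust them to the nodes you actually use.

```shell
#!/bin/bash
# Assumption: a full node has 8 cores, as in the ppn=8 request above.
CORES_PER_NODE=8

# Hypothetical helper: print the thread-related flag for a given ppn.
matlab_thread_flag() {
    local ppn="$1"
    if [ "$ppn" -ge "$CORES_PER_NODE" ]; then
        echo ""                     # whole node requested: let Matlab thread freely
    else
        echo "-singleCompThread"    # partial node: keep Matlab on one thread
    fi
}

echo "ppn=8 -> matlab $(matlab_thread_flag 8)"
echo "ppn=1 -> matlab $(matlab_thread_flag 1)"
```

Treating every partial-node request as single-threaded is the conservative reading of the advice above; it avoids using cores that belong to other jobs on the same node.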
Using GPUs with Matlab
 Matlab can use GPUs to do calculations, provided a GPU is
available on the node Matlab is running on.
qsub -I -X -q blugpu -l walltime=1:00:00,nodes=1:ppn=1:gpus=1,vmem=16gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
module load cuda
matlab
 We can query the connected GPUs from within Matlab using
gpuDeviceCount()
gpuDevice()
 And obtain a list of GPU supported functions using
methods('gpuArray')
Using GPUs with Matlab
 The list includes a 2D FFT (fft2) but no hilb function, so we build
the matrix on the CPU and move it to the GPU.
H=hilb(1000);
H_=gpuArray(H);          % Distribute data to GPU
Z_=fft2(H_);             % FFT performed on GPU
Z=gather(Z_);            % Gather data from GPU onto CPU
imagesc(log(abs(Z)));
 We could do the log and abs functions on the GPU as well.
H=hilb(1000);
H_=gpuArray(H);
Z_=fft2(H_);
imagesc(gather(log(abs(Z_))));
Using GPUs with Matlab
 For our example, doing the FFT on the GPU is roughly 6.5x faster
than on the CPU: 1.21 s versus 0.348 s - 0.161 s = 0.187 s (about 3.5x
if you include moving the data to the GPU and back).
>> H=hilb(5000);
>> tic; A=gather(gpuArray(H)); toc
Elapsed time is 0.161166 seconds.
>> tic; A=gather(fft2(gpuArray(H))); toc
Elapsed time is 0.348159 seconds.
>> tic; A=fft2(H); toc
Elapsed time is 1.210464 seconds.
Using GPUs with Matlab
 Matlab has no built-in hilb() function that can run on the GPU,
but we can write our own function (kernel) in CUDA and save it to
hilbert.cu
__global__ void HilbertKernel(double * const out, size_t const numRows, size_t const numCols)
{
    // Global row and column index handled by this thread
    const int rowIdx = blockIdx.x * blockDim.x + threadIdx.x;
    const int colIdx = blockIdx.y * blockDim.y + threadIdx.y;
    // Threads that fall outside the matrix do nothing
    if ( rowIdx >= numRows ) return;
    if ( colIdx >= numCols ) return;
    // Column-major linear index, matching Matlab's storage order
    size_t linearIdx = rowIdx + colIdx*numRows;
    // Indices are 0-based here, so 1/(i+j-1) becomes 1/(1+rowIdx+colIdx)
    out[linearIdx] = 1.0 / (double)(1+rowIdx+colIdx);
}
 And compile it with nvcc to generate a Parallel Thread eXecution
(PTX) file
nvcc -ptx hilbert.cu
Using GPUs with Matlab
 We have to initialize the kernel and specify the grid size before
executing the kernel.
H_=gpuArray.zeros(1000);
hilbert_kernel=parallel.gpu.CUDAKernel('hilbert.ptx','hilbert.cu');
hilbert_kernel.GridSize=size(H_);
H_=feval(hilbert_kernel, H_, 1000,1000);
Z_=fft2(H_);
imagesc(gather(log(abs(Z_))));
 By default a Matlab kernel uses 1 thread per block, but we could
instead create fewer blocks of 10 x 10 threads each.
hilbert_kernel.ThreadBlockSize=[10,10,1];
hilbert_kernel.GridSize=[100,100];
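The [100,100] grid follows from dividing the 1000 rows (and columns) by 10 threads per block. A minimal sketch of that arithmetic, using a made-up helper name blocks_needed, shows what happens when the block size does not divide the matrix size evenly:

```shell
#!/bin/bash
# Hypothetical helper: number of blocks along one dimension so that
# blocks * threads_per_block >= n (ceiling division).
blocks_needed() {
    local n="$1" threads="$2"
    echo $(( (n + threads - 1) / threads ))
}

echo "n=1000, 10 threads per block: $(blocks_needed 1000 10) blocks"
echo "n=1000, 16 threads per block: $(blocks_needed 1000 16) blocks"
```

With 16 threads per block the grid covers 1008 rows, 8 more than the matrix has. The two early-return checks on rowIdx and colIdx in the kernel are what keep those extra threads from writing out of bounds.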
 In any event, our speedup is a factor of 50 compared to 1 CPU.
Parallel Computing Toolbox
 As an alternative you can also use the Parallel Computing
Toolbox, which supports parallelism via MPI.
qsub -I -X -q interactive -l walltime=1:00:00,nodes=1:ppn=8,vmem=16gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -singleCompThread
 You can enable a pool of Matlab workers using matlabpool
matlabpool(8)
 You should create a pool that is the same size as the number of
processors you requested in your job submission. MathWorks also sells
licenses for a Distributed Computing Server, which allows
matlabpools that use more than just the local node.
Parallel Computing Toolbox
 You can achieve parallelism in several ways:

parfor loops – execute for loops in parallel

spmd – execute instructions in parallel (using ‘labindex’ or ‘drange’)

pmode – interactive version of spmd

distributed arrays – very similar to gpuArrays.
 Example using parfor
matlabpool(4)
parfor n=1:100
H=hilb(n);
Z=fft2(H);
f=figure('Visible','off'); imagesc(log(abs(Z)));
print('-dpdf','-r300', sprintf('%s%03d%s','fig1-batch_',n,'.pdf'));
end
matlabpool close
 Examples using spmd (first with drange, then with labindex)
matlabpool(4)
spmd
for n=drange(1:100)
H=hilb(n);
Z=fft2(H);
f=figure('Visible','off');
imagesc(log(abs(Z)));
end
end
matlabpool close
matlabpool(4)
spmd
for n=labindex:numlabs:100
H=hilb(n);
Z=fft2(H);
f=figure('Visible','off');
imagesc(log(abs(Z)));
end
end
matlabpool close
 Example using pmode
pmode start 4
n=labindex;
H=hilb(n);
Z=fft2(H);
f=figure('Visible','off'); imagesc(log(abs(Z)));
print('-dpdf','-r300', sprintf('%s%03d%s','fig1-batch_',n,'.pdf'));
pmode lab2client H 3 H3
H3
pmode close
Parallel Computing Toolbox
 Distributed arrays are very similar to gpuArrays.
 Example using gpuArray
H=hilb(1000);
H_=gpuArray(H);
Z_=fft2(H_);
Z=gather(Z_);
imagesc(log(abs(Z)));
 Example using distributed arrays
matlabpool(8)
H=hilb(1000);
H_=distributed(H);
Z_=fft(fft(H_,[],1),[],2);
Z=gather(Z_);
imagesc(log(abs(Z)));
matlabpool close
Parallel Computing Toolbox
 What about building the Hilbert matrix in parallel?
matlabpool(4)
spmd
    % Define partition
    codist=codistributor1d(1,[250,250,250,250],[1000,1000]);
    % Get local indices in x-direction
    [i_lo, i_hi]=codist.globalIndices(1);
    % Allocate space for local part
    H_local=zeros(250,1000);
    % Initialize local array with Hilbert values
    for i=i_lo:i_hi
        for j=1:1000
            H_local(i-i_lo+1,j)=1/(i+j-1);
        end
    end
    % Assemble codistributed array
    H_ = codistributed.build(H_local, codist);
end
% Now it's distributed like before!
Z_=fft(fft(H_,[],1),[],2);
Z=gather(Z_);
imagesc(log(abs(Z)));
matlabpool close
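The partition [250,250,250,250] gives each of the 4 workers a contiguous block of 250 rows. As a sketch of the index arithmetic only (not of how Matlab computes it internally), the made-up helper block_range below prints the first and last global row owned by a worker, assuming the row count divides evenly among the workers:

```shell
#!/bin/bash
# Hypothetical helper: global row range owned by a worker (1-based),
# assuming nrows is an exact multiple of nworkers.
block_range() {
    local worker="$1" nworkers="$2" nrows="$3"
    local size=$(( nrows / nworkers ))
    local lo=$(( (worker - 1) * size + 1 ))
    local hi=$(( worker * size ))
    echo "$lo $hi"
}

for w in 1 2 3 4; do
    echo "worker $w owns rows $(block_range "$w" 4 1000)"
done
```

These are the values that i_lo and i_hi take on each worker in the loop above, which is why the local row index is written as i-i_lo+1.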
Using the Matlab Distributed Compute Engine
 To get started, first cd into an empty directory
and run
mdce_init
 This will generate 4 files:

mdce_job.pbs – PBS submission script

mdce_script.m – sample Matlab script that uses the Parallel Computing Toolbox

mdce_profile.m – Matlab function that uses your environment variables to locate the
Matlab compute cluster for your job

mdce_cleanup – epilogue script that cleans up the Matlab distributed compute
server when your job terminates
 Then you can submit the sample job with
qsub mdce_job.pbs
Using the Matlab Distributed Compute Engine
 Here is the job submission script (mdce_job.pbs)
#!/bin/bash
#PBS -N Matlab_mdce
#PBS -j oe
#PBS -l nodes=2:ppn=8,pvmem=2000mb
#PBS -l walltime=1:00:00
#PBS -l epilogue=mdce_cleanup
#PBS -q standard
#PBS -o matlab.log
. /usr/local/modules/init/bash
module load matlab-R2013a-local
cd $PBS_O_WORKDIR
pbs_mdce_start
matlab -nodisplay -r "mdce_script"
 The epilogue script (mdce_cleanup) is important to ensure that the
cluster is taken down when your job terminates.
 Note that other versions of Matlab (anything other than
matlab-R2013a-local) could take hours to start the Matlab cluster!
 This script loads the Matlab module, starts the cluster with
pbs_mdce_start, and runs the Matlab script "mdce_script.m"
Using the Matlab Distributed Compute Engine
 And here is the sample Matlab script (mdce_script.m)
profile=mdce_profile()
matlabpool('open', profile)
parfor n=1:matlabpool('size')
    H=hilb(n);
    Z=fft2(H);
    imagesc(log(abs(Z)));
    print('-dpdf','-r300',sprintf('%s%03d%s','fig1-batch',n,'.pdf'));
end
matlabpool('close')
 The mdce_profile() function returns a profile that can be used to
connect to the mdce cluster for your job. You can then use
matlabpool, pmode, spmd, etc. to start up parallel
computations across the Matlab cluster.
Using the Matlab Distributed Compute Engine
 For interactive mode, you can use the qMatlab_mdce script. This
script will inherit your Matlab path from your environment, so be
sure to load the matlab-R2013a-local module to speed up the
initialization of the cluster.
mkdir /scratch/jcarrol5/matlab_mdce
cd /scratch/jcarrol5/matlab_mdce
module load matlab-R2013a-local
qMatlab_mdce 4 8 16
 This will create a Matlab cluster, which in this case consists of 4
nodes, each with 8 workers and 16 GB of memory. To use the
Matlab cluster, load the profile using the mdce_profile() function
and then open the pool of workers with matlabpool, pmode, etc.
profile=mdce_profile()
matlabpool('open', profile)
Using pMatlab
 pMatlab is an alternative method to get distributed Matlab
functionality without relying on Matlab’s Distributed Computing
Server.
 It is built on top of MatlabMPI (an MPI implementation for Matlab,
written in Matlab, that uses file I/O for communication)
 It supports various operations on distributed arrays (up to 4D)

Remapping, aggregating, finding non-zero entries, transposing, ghosting

Elementary math functions (trig, exponential, complex, remainder/rounding)

2D convolutions, FFTs, Discrete Cosine Transform

FFTs need to be properly mapped (cannot be distributed along the transform dimension).
 It does not have as much functionality as the Parallel
Computing Toolbox, but it does support ghosting and more
flexible partitioning!
Using pMatlab
 Since pMatlab works by launching other Matlab instances, we
need them to start up with pMatlab functionality. To do so we need
to add a few lines to the startup.m file in our Matlab path.
addpath('/usr/local/pMatlab/MatlabMPI/src');
addpath('/usr/local/pMatlab/src');
rehash;
pMatlabGlobalsInit;
Running pMatlab in Batch Mode
 To submit a job in batch mode we need to create a batch script
(sample_script.pbs)
#PBS -N Matlab
#PBS -q standard
#PBS -l walltime=1:00:00,nodes=2:ppn=8,vmem=32gb,pvmem=-1
. /usr/local/modules/init/bash
module load matlab
matlab -nodisplay -r "pmatlab_launcher"
 And a Matlab script to launch the pMatlab script
(pmatlab_launcher.m)
[sreturn, nProcs]=system('cat $PBS_NODEFILE | wc -l');
nProcs=str2num(nProcs);
[sreturn, machines]=system('cat $PBS_NODEFILE | uniq');
machines=regexp(machines, '\n', 'split');
machines=machines(1:size(machines,2)-1);
eval(pRUN('pmatlab_script',nProcs,machines));
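The two system() calls in the launcher rely on the PBS node file listing each host once per requested core. A small sketch with a fabricated node file shows what they return for nodes=2:ppn=8; the host names node01 and node02 and the helper names count_procs and list_machines are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helpers mirroring the two system() calls in pmatlab_launcher.m.
count_procs()   { wc -l < "$1" | tr -d ' '; }       # one line per requested core
list_machines() { uniq "$1" | paste -sd ' ' -; }    # one entry per node

# Build a stand-in for $PBS_NODEFILE: 2 made-up hosts, 8 entries each.
nodefile=$(mktemp)
for host in node01 node02; do
    for slot in 1 2 3 4 5 6 7 8; do
        echo "$host"
    done
done > "$nodefile"

echo "nProcs   = $(count_procs "$nodefile")"
echo "machines = $(list_machines "$nodefile")"
rm -f "$nodefile"
```

Note that uniq only collapses adjacent duplicates, so this works because the node file groups the entries for each host together.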
Running pMatlab in Batch Mode
 And finally we have our pMatlab script (pmatlab_script.m)
Xmap=map([Np 1],{},0:Np-1);          % Map for distributing array
H_=zeros(1000,1000,Xmap);            % Distributed matrix constructor
[I1,I2]=global_block_range(H_);      % Indices for local portion of array
% Allocate and populate local portion of array with Hilbert matrix values
H_local=zeros(I1(2)-I1(1)+1,I2(2)-I2(1)+1);
for i=I1(1):I1(2)
    for j=I2(1):I2(2)
        H_local(i-I1(1)+1,j-I2(1)+1)=1/(i+j-1);
    end
end
H_=put_local(H_,H_local);            % Copy local values into distributed array
% Do y-fft and do x-fft. Z_ has a different map.
% The slide annotates this step with:
%   X = put_local(X, fft(local(X),[],2));
%   Z = transpose_grid(X);
%   Z = put_local(Z, fft(local(Z),[],1));
Z_=fft(fft(H_,[],2),[],1);
Z=agg(Z_);                           % Collect resulting matrix onto 'leader'
% Plot result from 'leader' Matlab process
if (pMATLAB.my_rank == pMATLAB.leader)
    f=figure('Visible','off');
    imagesc(log(abs(Z)));
    print('-dpdf','-r300', 'fig1.pdf');
end
Using pMatlab
 PBS is unaware of Matlab sessions launched from 'pRUN' and
therefore cannot properly clean up if something goes wrong (for example,
the job runs out of walltime). To avoid leaving orphaned Matlab processes on
other machines, add this line to your PBS script
#PBS -l epilogue=epilogue_script.sh
 so that it runs the following epilogue script (epilogue_script.sh),
which must have user-execute permissions
#!/bin/bash
cd $PBS_O_WORKDIR/MatMPIa
echo "running epilogue"
pwd;
# Each pid file is named pid.<machine>.<pid>
for i in `ls pid.*`; do
    machine=`echo $i | awk -F '.' '{print $2}'`;
    pid=`echo $i | awk -F '.' '{print $3}'`;
    # Kill the remote Matlab process, then remove its pid file
    ssh $machine "(kill -9 $pid)" && rm -rf $i;
done
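The epilogue depends on the pid files being named pid.<machine>.<pid>, which is inferred from the awk field numbers in the script. A minimal sketch of that parsing step on its own, with a made-up file name and a hypothetical helper parse_pid_file:

```shell
#!/bin/bash
# Hypothetical helper: split a pid file name of the form
# pid.<machine>.<pid> into its machine and pid fields.
parse_pid_file() {
    local name="$1" machine pid
    machine=$(echo "$name" | awk -F '.' '{print $2}')
    pid=$(echo "$name" | awk -F '.' '{print $3}')
    echo "$machine $pid"
}

# Made-up example file name
parse_pid_file "pid.node01.12345"
```

One caveat of splitting on '.': if the machine part were a fully qualified host name containing dots, field 3 would no longer be the process id, so this approach assumes short host names.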
Compiling Mex Code
 There is a configuration file for mex that you can place in your
~/.matlab/R2012b/ folder – or whatever version of matlab you are
using. The file can be downloaded from the CIRC wiki
http://www.circ.rochester.edu/wiki/index.php/Mexopts.sh
Compiling Mex Code
 C, C++, or Fortran routines can be called from within Matlab.
#include "fintrf.h"
subroutine mexfunction(nlhs, plhs, nrhs, prhs)
mwPointer :: plhs(*), prhs(*)
integer :: nlhs, nrhs
mwPointer :: mxGetPr
mwPointer :: mxCreateDoubleMatrix
real(8) :: mxGetScalar
mwPointer :: pr_out
integer :: n
n = nint(mxGetScalar(prhs(1)))
plhs(1) = mxCreateDoubleMatrix(n,n, 0)
pr_out = mxGetPr(plhs(1))
call compute(%VAL(pr_out),n)
end subroutine mexfunction
subroutine compute(h, n)
integer :: n
real(8) :: h(n,n)
integer :: i,j
do i=1,n
do j=1,n
h(i,j)=1d0/(i+j-1d0)
end do
end do
end subroutine compute
mex hilbert.f90
>> H=hilbert(10)
Turning Matlab code into C
 First we create a log_abs_fft_hilb.m function
function result = log_abs_fft_hilb(n)
result=log(abs(fft2(hilb(n))));
 And then we run
>> codegen log_abs_fft_hilb.m -args {uint32(0)}
 This will produce a mex file that we can test.
>> A=log_abs_fft_hilb_mex(uint32(16));
>> B=log_abs_fft_hilb(16);
>> max(max(abs(A-B)))
ans = 8.8818e-16
 We could have specified the type of 'n' in our matlab function
function result = log_abs_fft_hilb(n)
assert(isa(n,'uint32'));
result=log(abs(fft2(hilb(n))));
Turning Matlab code into C
 Now we can also export a static library that we can link to:
>> codegen log_abs_fft_hilb.m -config coder.config('lib') -args {uint32(0)}
 This will create a subdirectory codegen/lib/log_abs_fft_hilb that
will have the source files ('.c' and '.h') as well as compiled object files
('.o') and a library 'log_abs_fft_hilb.a'
 The source files are portable to any platform with a C compiler
(e.g. BlueStreak). We can rebuild the library on BlueStreak by
running
mpixlc -c *.c
ar rcs log_abs_fft_hilb.a *.o
Turning Matlab code into C
 To use the function, we still need to write a main routine that
calls it. This requires working with Matlab's variable types
(which include dynamically resizable arrays)
#include "stdio.h"
#include "rtwtypes.h"                    /* Matlab type definitions */
#include "log_abs_fft_hilb_types.h"
int main(void) {
    uint32_T n=64;                       /* Argument to Matlab function */
    emxArray_real_T *result;             /* Return value of Matlab function */
    int32_T i,j;
    emxInit_real_T(&result, 2);          /* Initialize Matlab array to have rank 2 */
    log_abs_fft_hilb(n, result);         /* Call Matlab function */
    /* Output result; the data is stored in column major order */
    for(i=0;i<result->size[0];i++) {
        for(j=0;j<result->size[1];j++) {
            printf("%f ",result->data[i+result->size[0]*j]);
        }
        printf("\n");
    }
    emxFree_real_T(&result);             /* Free up memory associated with return array */
    return 0;
}
 Exported code was 2x slower.
Turning Matlab code into C
 And here is another example that calls a 2D FFT on real data
int main(void) {
    int32_T q0;
    int32_T i, j;
    int32_T n=8;
    emxArray_creal_T *result;
    emxArray_real_T *input;
    emxInit_creal_T(&result, 2);
    emxInit_real_T(&input, 2);
    /* Record the old element count, then resize the input to n x n */
    q0 = input->size[0] * input->size[1];
    input->size[0]=n;
    input->size[1]=n;
    emxEnsureCapacity((emxArray__common *)input,
                      q0, (int32_T)sizeof(real_T));
    /* Fill the input with Hilbert matrix values (column major order) */
    for(j=0;j<input->size[1];j++) {
        for(i=0;i<input->size[0];i++) {
            input->data[i+input->size[0]*j]=1.0 / (real_T)(i+j+1);
        }
    }
    my_fft(input, result);
    /* Print real and imaginary parts of each element */
    for(i=0;i<result->size[0];i++) {
        for(j=0;j<result->size[1];j++) {
            printf("[% 10.4f,% 10.4f] ",
                   result->data[i+result->size[0]*j].re,
                   result->data[i+result->size[0]*j].im);
        }
        printf("\n");
    }
    emxFree_creal_T(&result);
    emxFree_real_T(&input);
    return 0;
}
Turning Matlab code into C
 Exported FFTs only work on vectors of length 2^N
 Error checking is disabled in exported C code
 Mex code will have the same functionality as exported C code, but
will also have error checking. It will warn about doing FFTs on
arbitrary length vectors, etc.
 Always test your mex code!
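Matlab code is not that different from C code
 Or you can write your own C code that uses open source
mathematical libraries (e.g. FFTW).
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex.h>
#include <fftw3.h>
int main(void) {
    const int n=4096;
    int i,j;
    /* Allocate on the heap: n x n arrays of this size would overflow the stack */
    fftw_complex *input = fftw_malloc(sizeof(fftw_complex)*n*n);
    fftw_complex *temp = fftw_malloc(sizeof(fftw_complex)*n*n);
    double *result = malloc(sizeof(double)*n*n);
    fftw_plan p;
    if (!input || !temp || !result) return 1;
    p=fftw_plan_dft_2d(n, n, input, temp,
                       FFTW_FORWARD, FFTW_ESTIMATE);
    /* Fill the input with Hilbert matrix values */
    for (i=0;i<n;i++){
        for(j=0;j<n;j++) {
            input[i*n+j]=1.0/(double)(i+j+1);
        }
    }
    fftw_execute(p);
    /* log(abs(fft2(H))) */
    for (i=0;i<n;i++){
        for(j=0;j<n;j++) {
            result[i*n+j]=log(cabs(temp[i*n+j]));
        }
    }
    for (i=0;i<n;i++){
        for(j=0;j<n;j++) {
            printf("%f ",result[i*n+j]);
        }
        printf("\n");
    }
    fftw_destroy_plan(p);
    fftw_free(input);
    fftw_free(temp);
    free(result);
    return 0;
}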
