CS2403 Programming Languages
Chung-Ta King
Department of Computer Science
National Tsing Hua University
(Slides are adapted from Concepts of Programming Languages, R.W. Sebesta)
Parallel architecture and programming
 Language support for concurrency
 Controlling concurrent tasks
 Sharing data
 Synchronizing tasks
Sequential Computing
von Neumann arch. with Program Counter (PC)
dictates sequential execution
 Traditional programming thus follows a single
thread of control
 The sequence of program
points reached as control
flows through the program
(Introduction to Parallel Computing, Blaise Barney)
Sequential Programming Dominates
Sequential programming has dominated
throughout computing history
 Why is there no need to change programming style?
2 Factors Help to Maintain Perf.
IC technology: ever shrinking feature size
 Moore’s law, faster switching, more functionalities
Architectural innovations to remove bottlenecks
in von Neumann architecture
 Memory hierarchy for reducing memory latency:
registers, caches, scratchpad memory
 Hide or tolerate memory latency: multithreading,
prefetching, predication, speculation
 Executing multiple instructions in parallel: pipelining,
multiple issue (in-/out-of-order, VLIW), SIMD
multimedia extensions (inst.-level parallelism, ILP)
(Prof. Mary Hall, Univ. of Utah)
End of Sequential Programming?
Infeasible to keep improving the performance
of uniprocessors
 Power, clocking, ...
Multicore architecture prevails (homogeneous or
heterogeneous)
 Achieve performance gains with simpler processors
Sequential programming still alive!
 Why?
 Throughput versus execution time
Can we live with sequential prog. forever?
Parallel Programming
A programming style that specifies concurrency
(control structure) & interaction (communication
structure) between concurrent subtasks
 Still in imperative language style
Concurrency can be expressed at various levels
of granularity
 Machine instruction level, high-level language
statement level, unit level, program level
Different models assume different architectural
 Look at parallel architectures first
(Ananth Grama, Purdue Univ.)
An Abstract Parallel Architecture
How is parallelism managed?
 Where is the memory physically located?
 What is the connectivity of the network?
(Prof. Mary Hall, Univ. of Utah)
Flynn’s Taxonomy of Parallel Arch.
Distinguishes parallel architecture by instruction
and data streams
 SISD: classical uniprocessor architecture
 SISD: Single Instruction, Single Data
 SIMD: Single Instruction, Multiple Data
 MISD: Multiple Instruction, Single Data
 MIMD: Multiple Instruction, Multiple Data
(Introduction to Parallel Computing, Blaise Barney)
Parallel Control Mechanisms
(Prof. Mary Hall, Univ. of Utah)
2 Classes of Parallel Architecture
Shared memory multiprocessor architectures
 Multiple processors can operate independently but
share the same memory system
 Share a global address space where each processor
can access every memory location
 Changes in a memory location
effected by one processor are
visible to all other processors
 like a bulletin board
(Introduction to Parallel Computing, Blaise Barney;
Prof. Mary Hall, Univ. of Utah)
2 Classes of Parallel Architecture
Distributed memory architectures
 Processing units (PEs) connected by an interconnect
 Each PE has its own distinct address space without a
global address space, and they explicitly
communicate to exchange data
 Ex.: PC clusters connected by commodity Ethernet
(Introduction to Parallel Computing,
Blaise Barney; Prof. Mary Hall, Univ.
of Utah)
Shared Memory Programming
Often organized as a collection of threads of control
 Each thread has private data, e.g., local stack, and a
set of shared variables, e.g., global heap
 Threads communicate implicitly by writing and
reading shared variables
 Threads coordinate through locks and barriers
implemented using shared variables
(Prof. Mary Hall,
Univ. of Utah)
Distributed Memory Programming
Organized as named processes
 A process is a thread of control plus local address
space -- NO shared data
 A process cannot see the memory contents of other
processes, nor can it address and access them
 Logically shared data is partitioned over processes
 Processes communicate by explicit send/receive, i.e.,
asking the destination process to access its local data
on behalf of the requesting process
 Coordination is implicit in communication events
 blocking/non-blocking send and receive
(Prof. Mary Hall, Univ. of Utah)
Distributed Memory Programming
Private memory looks like mailbox
(Prof. Mary Hall, Univ. of Utah)
Specifying Concurrency
What language support is needed for parallel
programming?
 Specifying (parallel) control flows
 How to create, start, suspend, resume, stop
processes/threads? How to let one process/thread
explicitly wait for events or another process/thread?
Specifying data flows among parallel flows
 How to pass data generated by one process/thread
to another process/thread?
 How to let multiple processes/threads access common
resources, e.g., a counter, without conflicts?
Specifying Concurrency
Many parallel programming systems provide
libraries and perhaps compiler pre-processors to
extend a traditional imperative language, such
as C, for parallel programming
 Examples: Pthreads, OpenMP, MPI, ...
Some languages have parallel constructs built
directly into the language, e.g., Java, C#
 So far, the library approach works fine
Shared Memory Prog. with Threads
Several thread libraries:
 PThreads: the POSIX threading interface
 POSIX: Portable Operating System Interface for UNIX
 Interface to OS utilities
 System calls to create and synchronize threads
OpenMP is a newer standard
 Allow a programmer to separate a program into serial
regions and parallel regions
 Provide synchronization constructs
 Compiler generates thread program & synch.
 Extensions to Fortran, C, C++ mainly by directives
(Prof. Mary Hall, Univ. of Utah)
Thread Basics
A thread is a program unit that can be in
concurrent execution with other program units
 Threads differ from ordinary subprograms:
 When a program unit starts the execution of a
thread, it is not necessarily suspended
 When a thread’s execution is completed, control may
not return to the caller
 All threads run in the same address space but each
has its own runtime stack
Message Passing Prog. with MPI
MPI defines a standard library for message passing
that can be used to develop portable
message-passing programs using C or Fortran
 Based on Single Program, Multiple Data (SPMD)
 All communication and synchronization require subroutine
calls -> no shared variables
 Program runs on a single processor just like any
uniprocessor program, except for calls to message
passing library
 It is possible to write fully functional message-passing
programs by using only six routines
(Prof. Mary Hall, Univ. of Utah; Prof. Ananth Grama, Purdue Univ. )
Message Passing Basics
The computing system consists of p processes,
each with its own exclusive address space
 Each data element must belong to one of the
partitions of the space; hence, data must be explicitly
partitioned and placed
 All interactions (read-only or read/write) require
cooperation of two processes - the process that has
the data and one that wants to access the data
 All processes execute asynchronously unless they
interact through send/receive synchronizations
(Prof. Ananth Grama, Purdue Univ. )
Controlling Concurrent Tasks
 Program starts with a single master thread, from
which other threads are created
errcode = pthread_create(&thread_id, NULL,
thread_fun, &fun_arg); /* NULL: default thread attributes */
 Each thread executes a specific function,
thread_fun(), representing thread’s computation
 All threads execute in parallel
 Function pthread_join() suspends execution of
calling thread until the target thread terminates
(Prof. Mary Hall, Univ. of Utah)
Pthreads “Hello World!”
#include <pthread.h>
#include <stdio.h>
void *thread(void *vargp);
int main() {
    pthread_t tid;
    pthread_create(&tid, NULL, thread, NULL);  /* start the new thread */
    pthread_join(tid, NULL);                   /* wait for it to finish */
    pthread_exit((void *)NULL);
}
void *thread(void *vargp){
    printf("Hello World from thread!\n");
    pthread_exit((void *)NULL);
}
Controlling Concurrent Tasks (cont.)
 Begin execution as a single process and fork multiple
threads to work on parallel blocks of code
 single program multiple data
 Parallel constructs are
specified using pragmas
(Prof. Mary Hall, Univ. of Utah)
OpenMP Pragma
All OpenMP pragmas begin with: #pragma omp
 Compiler calculates loop bounds for each thread and
manages data partitioning
 Synchronization also automatic (barrier)
(Prof. Mary Hall, Univ. of Utah)
OpenMP “Hello World!”
#include <omp.h>
#include <stdio.h>
int main (int argc, char *argv[]) {
    int th_id, nthreads;
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World: %d\n", th_id);
        #pragma omp barrier
        if ( th_id == 0 ) {   /* only the master thread reports the count */
            nthreads = omp_get_num_threads();
            printf("%d threads\n",nthreads);
        }
    }
    return 0;
}
Controlling Concurrent Tasks (cont.)
 The concurrent units in Java are methods named run
 A run method code can be in concurrent execution
with other such methods
 The process in which the run methods execute is
called a thread
class MyThread extends Thread {
    public void run () {...}
}
Thread myTh = new MyThread ();
myTh.start(); // begins concurrent execution of run
Controlling Concurrent Tasks (cont.)
Java Thread class has several methods to
control the execution of threads
 The yield method is a request from the running thread to
voluntarily surrender the processor
 The sleep method can be used by the caller of the
method to block the thread
 The join method is used to force a method to delay
its execution until the run method of another thread
has completed its execution
Controlling Concurrent Tasks (cont.)
Java thread priority:
 A thread’s default priority is the same as that of the
thread that creates it
 If main creates a thread, its default priority is
NORM_PRIORITY
 Thread defines two other priority constants,
MAX_PRIORITY and MIN_PRIORITY
 The priority of a thread can be changed with the
method setPriority
Controlling Concurrent Tasks (cont.)
 Programmer writes the code for a single process and
the compiler includes necessary libraries
mpicc -g -Wall -o mpi_hello mpi_hello.c
 The execution environment starts parallel processes
mpiexec -n 4 ./mpi_hello
(Prof. Mary Hall, Univ. of Utah)
MPI “Hello World!”
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    printf("Hello World from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
(Prof. Mary Hall, Univ. of Utah)
Sharing Data
 Variables declared outside of main are shared
 Object allocated on the heap may be shared (if
pointer is passed)
 Variables on the stack are private: passing pointer to
these around to other threads can cause problems
 Shared variables can be read and written directly by
all threads -> need synchronization to prevent races
 Synchronization primitives, e.g., semaphores, locks,
mutexes, barriers, are used to sequence the executions
of the threads to indirectly sequence the data passed
through shared variables
(Prof. Mary Hall, Univ. of Utah)
Sharing Data (cont.)
 Variables listed in a shared clause are shared; the default is shared
 Variables listed in a private clause are private
 The loop index is private
#include <stddef.h>
int bigdata[1024];
void* foo(void* bar) {
    int tid;
    #pragma omp parallel \
        shared (bigdata) private (tid)
    {
        /* Calc. here */
    }
    return NULL;
}
(Prof. Mary Hall, Univ. of Utah)
Sharing Data (cont.)
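#include "mpi.h"
int main( int argc, char *argv[]) {
    int rank, buf;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        buf = 123456;
        /* send one int to process 1 with tag 0 */
        MPI_Send(&buf, 1, MPI_INT, 1, 0,
                 MPI_COMM_WORLD);
    }
    else if (rank == 1) {
        /* receive one int from process 0 with tag 0 */
        MPI_Recv(&buf, 1, MPI_INT, 0, 0,
                 MPI_COMM_WORLD, &status);
    }
    MPI_Finalize();
    return 0;
}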
(Prof. Mary Hall, Univ. of Utah)
Synchronizing Tasks
A mechanism that controls the order in which
tasks execute
 Two kinds of synchronization
 Cooperation: one task waits for another, e.g., for
passing data
task 1: a = ...
task 2: ... = ... a ...
 Competition: tasks compete for exclusive use of
resource without specific order
task 1: sum += local_sum
task 2: sum += local_sum
Synchronizing Tasks (cont.)
 Provide various synchronization primitives, e.g.,
mutex, semaphore, barrier
 Mutex: protects critical sections -- segments of code
that must be executed by only one thread at a time
 Protect code to indirectly protect shared data
 Semaphore: synchronizes between two threads using
sem_post() and sem_wait()
 Barrier: synchronizes threads to reach the same point
in code before going any further
Pthreads Mutex Example
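#include <pthread.h>
pthread_mutex_t sum_lock;
int sum;
int main() {
    pthread_mutex_init(&sum_lock, NULL);
    /* create and join the worker threads here */
}
void *find_min(void *list_ptr) {
    int my_sum = 0;                   /* partial result computed from list_ptr */
    pthread_mutex_lock(&sum_lock);    /* enter critical section */
    sum += my_sum;
    pthread_mutex_unlock(&sum_lock);  /* leave critical section */
    return NULL;
}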
Synchronizing Tasks (cont.)
 OpenMP has a reduction operation
sum = 0;
#pragma omp parallel for reduction(+:sum)
for (i=0; i < 100; i++) {
sum += array[i]; }
 OpenMP also has critical directive that is executed by
all threads, but restricted to only one thread at a time
#pragma omp critical [( name )] new-line
sum = sum + 1;
(Prof. Mary Hall, Univ. of Utah)
Synchronizing Tasks (cont.)
 A method that includes the synchronized modifier
disallows any other synchronized method from running on the
object while it is in execution
public synchronized void deposit(int i) {…}
public synchronized int fetch() {…}
 The above two methods are synchronized which
prevents them from interfering with each other
Synchronizing Tasks (cont.)
 Cooperation synchronization is achieved via wait,
notify, and notifyAll methods
 All methods are defined in Object, which is the root
class in Java, so all objects inherit them
 The wait method must be called in a loop
 The notify method is called to tell one waiting
thread that the event it was waiting for has happened
 The notifyAll method awakens all of the threads
on the object’s wait list
Synchronizing Tasks (cont.)
 Use send/receive to complete task synchronizations,
but semantics of send/receive have to be specialized
 Non-blocking send/receive:
 send() and receive() calls return immediately,
whether or not the data has arrived
 Blocking send/receive:
 Unbuffered blocking send() does not return until
matching receive() is encountered at receiving process
 Buffered blocking send() will return after the sender
has copied the data into the designated buffer
 Blocking receive() forces the receiving process to wait
(Prof. Ananth Grama, Purdue Univ. )
Unbuffered Blocking
(Prof. Ananth Grama, Purdue Univ. )
Buffered Blocking
(Prof. Ananth Grama, Purdue Univ. )
Concurrent execution can be at the instruction,
statement, subprogram, or program level
 Two fundamental programming styles: shared
variables and message passing
 Programming languages must provide support
for specifying control and data flows
