C++11 in Parallel

Report
Joe Hummel, PhD
UC-Irvine
[email protected]
http://www.joehummel.net/downloads.html

New standard of C++ has been ratified
◦ “C++0x” ==> “C++11”

Lots of new features

Workshop will focus on concurrency features
Going Parallel with C++11
Async programming: better responsiveness…
◦ GUIs (desktop, web, mobile)
◦ Cloud
◦ Windows 8

Parallel programming: better performance…
◦ Financials
◦ Scientific
◦ Big data
#include <thread>
#include <iostream>

// A simple function for thread to do…
void func()
{
    std::cout << "**Inside thread "
              << std::this_thread::get_id() << "!" << std::endl;
}

int main()
{
    std::thread t;
    t = std::thread( func );   // create and schedule thread…
    t.join();                  // wait for thread to finish…
    return 0;
}

Hello world…
#include <thread>
#include <iostream>

void func()
{
    std::cout << "**Hello world...\n";
}

int main()
{
    std::thread t;
    t = std::thread( func );
    t.join();
    return 0;
}

(1) Thread function must do exception handling; unhandled exceptions ==> termination…

void func()
{
    try
    {
        // computation:
    }
    catch(...)
    {
        // do something:
    }
}

(2) Must join, otherwise termination… (avoid use of detach( ), difficult to use safely)
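Points (1) and (2) can be combined in one small sketch. This is an illustration only, not the presenter's code: the helper name `run_guarded` and the choice to hand the error back to the joining thread through `std::exception_ptr` are assumptions made for the example.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: runs `work` on a new thread, catches anything it
// throws inside that thread, joins, and returns the error message
// (an empty string if the work finished normally).
std::string run_guarded(std::function<void()> work)
{
    std::exception_ptr error;   // written by the worker, read only after join()

    std::thread t([&]()
    {
        try
        {
            work();                              // computation
        }
        catch (...)
        {
            error = std::current_exception();    // never let it escape the thread
        }
    });

    t.join();                   // always join before t is destroyed

    if (!error)
        return "";
    try
    {
        std::rethrow_exception(error);
    }
    catch (const std::exception& e)
    {
        return e.what();
    }
    catch (...)
    {
        return "unknown exception";
    }
}
```

Calling `run_guarded` with a lambda that throws `std::runtime_error("boom")` should return "boom" instead of terminating the process.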

Old school:
◦ distinct thread functions (what we just saw)

New school:
◦ lambda expressions (aka anonymous functions)

A thread that loops until we tell it to stop…
When user presses ENTER, we’ll tell thread to stop…
#include <thread>
#include <chrono>
#include <cstdio>
#include <iostream>
#include <string>
using namespace std;

// NOTE: a plain bool shared between threads is formally a data race;
// atomic<bool> (see the atomics slides) is the safe choice.
void loopUntil(bool *stop)
{
    auto duration = chrono::seconds(2);
    while (!(*stop))
    {
        cout << "**Inside thread...\n";
        this_thread::sleep_for(duration);
    }
}

int main()
{
    bool stop(false);
    thread t(loopUntil, &stop);

    getchar();      // wait for user to press enter:
    stop = true;    // stop thread:

    t.join();
    return 0;
}
Closure semantics:
[ ]: none, [&]: by ref, [=]: by val, …

int main()
{
    bool stop(false);

    thread t( [&]()     // [&]: closure, (): lambda arguments
    {                   // lambda expression
        auto duration = chrono::seconds(2);
        while (!stop)
        {
            cout << "**Inside thread...\n";
            this_thread::sleep_for(duration);
        }
    }
    );

    getchar();      // wait for user to press enter:
    stop = true;    // stop thread:

    t.join();
    return 0;
}
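A race-free variant of the stop-flag idea might look like the sketch below. It is an assumption-laden illustration rather than the workshop's code: the helper name `loop_until_stopped` is hypothetical, the flag is a `std::atomic<bool>`, and waiting for a few loop iterations stands in for the `getchar()` call so the example needs no user input.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper: starts a looping thread, waits until it has looped
// at least `min_iterations` times, then tells it to stop.
// Returns the total number of iterations the worker performed.
int loop_until_stopped(int min_iterations)
{
    std::atomic<bool> stop(false);
    std::atomic<int>  iterations(0);

    std::thread t([&]()
    {
        while (!stop)
        {
            ++iterations;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });

    // Stands in for getchar(): wait until the worker has looped a few times.
    while (iterations < min_iterations)
        std::this_thread::yield();

    stop = true;    // stop thread
    t.join();
    return iterations;
}
```

Because both shared variables are atomics, the worker is guaranteed to observe the write to `stop`, and the program contains no data race.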

Lambdas:
◦ Easier and more readable -- code remains inline
◦ Potentially more dangerous ([&] captures everything by ref)

Functions:
◦ Claimed to be more efficient -- a lambda is compiled into a class (function object), though compilers usually inline the call
◦ Potentially safer -- requires explicit variable scoping
◦ More cumbersome and harder to read
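The capture trade-off can be made concrete with a short sketch. The function name `capture_by_value` is hypothetical; the point is that the loop variable is captured by value (each thread gets its own snapshot), while the shared result vector is captured by reference and each thread writes only its own slot.

```cpp
#include <thread>
#include <vector>

// Hypothetical demo: each worker records a value derived from the `i`
// it captured. Capturing `i` by reference instead would race with the
// loop, which keeps modifying `i` while the threads run.
std::vector<int> capture_by_value(int n)
{
    std::vector<int> seen(n, -1);
    std::vector<std::thread> workers;

    for (int i = 0; i < n; i++)
    {
        // i: by value (snapshot); seen: by reference (one slot per thread)
        workers.push_back(std::thread([i, &seen]() { seen[i] = i * 10; }));
    }

    for (std::thread& t : workers)
        t.join();

    return seen;
}
```

Listing the captures explicitly, as in `[i, &seen]`, keeps the convenience of a lambda while avoiding the blanket `[&]`.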

Multiple threads looping…
When user presses ENTER, all threads stop…
#include <thread>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

int main()
{
    cout << "** Main Starting **\n\n";
    bool stop = false;

    vector<thread> workers;

    for (int i = 1; i <= 3; i++)
    {
        workers.push_back(
            thread([i, &stop]()
            {
                while (!stop)
                {
                    cout << "**Inside thread " << i << "...\n";
                    this_thread::sleep_for(chrono::seconds(i));
                }
            })
        );
    }

    getchar();

    stop = true;    // stop threads:

    // wait for threads to complete:
    for ( thread& t : workers )
        t.join();

    cout << "** Main Done **\n\n";
    return 0;
}

Matrix multiply…
// 1 thread per core:
numthreads = thread::hardware_concurrency();

int rows  = N / numthreads;
int extra = N % numthreads;
int start = 0;      // each thread does [start..end)
int end   = rows;

vector<thread> workers;

for (int t = 1; t <= numthreads; t++)
{
    if (t == numthreads) // last thread does extra rows:
        end += extra;

    workers.push_back( thread([start, end, N, &C, &A, &B]()
    {
        for (int i = start; i < end; i++)
            for (int j = 0; j < N; j++)
            {
                C[i][j] = 0.0;
                for (int k = 0; k < N; k++)
                    C[i][j] += (A[i][k] * B[k][j]);
            }
    }));

    start = end;
    end   = start + rows;
}

for (thread& t : workers)
    t.join();
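The row-partitioning scheme can be wrapped into a self-contained function for experimentation. The sketch below makes several assumptions that are not in the slides: matrices are stored as `std::vector<std::vector<double>>`, the function name `parallel_multiply` is hypothetical, and the thread count is passed in and clamped to at least 1, since `thread::hardware_concurrency()` is allowed to return 0 when the value is unknown.

```cpp
#include <thread>
#include <vector>

typedef std::vector<std::vector<double>> Matrix;

// Hypothetical wrapper: returns C = A * B for N x N matrices, with the
// rows of C split across `numthreads` threads as [start..end) ranges.
Matrix parallel_multiply(const Matrix& A, const Matrix& B, int numthreads)
{
    const int N = static_cast<int>(A.size());
    Matrix C(N, std::vector<double>(N, 0.0));

    if (numthreads < 1)     // e.g. hardware_concurrency() returned 0
        numthreads = 1;

    int rows  = N / numthreads;
    int extra = N % numthreads;
    int start = 0;
    int end   = rows;

    std::vector<std::thread> workers;

    for (int t = 1; t <= numthreads; t++)
    {
        if (t == numthreads)    // last thread does extra rows
            end += extra;

        workers.push_back(std::thread([start, end, N, &C, &A, &B]()
        {
            for (int i = start; i < end; i++)
                for (int j = 0; j < N; j++)
                    for (int k = 0; k < N; k++)
                        C[i][j] += A[i][k] * B[k][j];
        }));

        start = end;
        end   = start + rows;
    }

    for (std::thread& t : workers)
        t.join();

    return C;
}
```

Each thread writes a disjoint set of rows of C and only reads A and B, so no locking is needed.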

Parallelism alone is not enough…
HPC == Parallelism + Memory Hierarchy - Contention

Expose parallelism

Maximize data locality:
• network
• disk
• RAM
• cache
• core

Minimize interaction:
• false sharing
• locking
• synchronization

Loop interchange is first step…
workers.push_back( thread([start, end, N, &C, &A, &B]()
{
    for (int i = start; i < end; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0.0;

    for (int i = start; i < end; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i][j] += (A[i][k] * B[k][j]);
}));
Next step is to block multiply…
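One possible shape for a blocked (tiled) multiply over a thread's row range is sketched here. The slides do not show this step, so everything in the sketch is an assumption: the function name `block_multiply_rows`, the matrix type, and the tile size parameter `BS`, which would need tuning against the cache of the target machine.

```cpp
#include <algorithm>
#include <vector>

typedef std::vector<std::vector<double>> Matrix;

// Hypothetical sketch: blocked C += A * B restricted to rows [start..end).
// C must be zeroed by the caller. BS is an assumed tile size.
void block_multiply_rows(Matrix& C, const Matrix& A, const Matrix& B,
                         int start, int end, int BS)
{
    const int N = static_cast<int>(A.size());

    for (int ii = start; ii < end; ii += BS)        // tile of rows
        for (int kk = 0; kk < N; kk += BS)          // tile of the inner dimension
            for (int jj = 0; jj < N; jj += BS)      // tile of columns
                // multiply one tile, using the interchanged i-k-j order
                for (int i = ii; i < std::min(ii + BS, end); i++)
                    for (int k = kk; k < std::min(kk + BS, N); k++)
                        for (int j = jj; j < std::min(jj + BS, N); j++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

The intent is that each tile of A, B and C is reused while it is still in cache; a thread lambda like the one above could call this function with its own `start` and `end`.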

No compiler as yet fully implements C++11

Visual C++ 2012 has best concurrency support
◦ Part of Visual Studio 2012

gcc 4.7 has best overall support
◦ http://gcc.gnu.org/projects/cxx0x.html

clang 3.1 appears very good as well
◦ I did not test
◦ http://clang.llvm.org/cxx_status.html
# makefile

# threading library: one of these should work
# tlib=thread
tlib=pthread

# gcc 4.6:
ver=c++0x
# gcc 4.7:
# ver=c++11

build:
	g++ -std=$(ver) -Wall main.cpp -l$(tlib)
Concept          Header                 Summary
Threads          <thread>               Standard, low-level, type-safe; good basis for building HL systems (futures, tasks, …)
Futures          <future>               Via async function; hides threading, better harvesting of return value & exception handling
Locking          <mutex>                Standard, low-level locking primitives
Condition Vars   <condition_variable>   Low-level synchronization primitives
Atomics          <atomic>               Predictable, concurrent access without data race
Memory Model     -                      “Catch Fire” semantics; if program contains a data race, behavior of memory is undefined
Thread Local     -                      Thread-local variables [ problematic => avoid ]

Use mutex to protect against concurrent access…
#include <mutex>

mutex m;
int   sum;

thread t1([&]()
{
    m.lock();
    sum += compute();
    m.unlock();
}
);

thread t2([&]()
{
    m.lock();
    sum += compute();
    m.unlock();
}
);

“Resource Acquisition Is Initialization”
◦ Advocated by B. Stroustrup for resource management
◦ Uses constructor & destructor to properly manage resources (files, threads, locks, …) in presence of exceptions, etc.

thread t([&]()
{
    m.lock();
    sum += compute();
    m.unlock();
});

should be written as…

thread t([&]()
{
    lock_guard<mutex> lg(m);    // locks m in constructor
    sum += compute();
}                               // unlocks m in destructor
);
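A complete, runnable version of the lock_guard pattern might look like the following. The function `locked_sum` and the trivial body of `compute()` are hypothetical stand-ins, since the slides do not define them.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the slides' compute().
int compute() { return 1; }

// Hypothetical helper: `numthreads` threads each add compute() to a
// shared sum `adds_per_thread` times, protected by a mutex.
int locked_sum(int numthreads, int adds_per_thread)
{
    std::mutex m;
    int sum = 0;
    std::vector<std::thread> workers;

    for (int t = 0; t < numthreads; t++)
    {
        workers.push_back(std::thread([&]()
        {
            for (int k = 0; k < adds_per_thread; k++)
            {
                std::lock_guard<std::mutex> lg(m);  // locks m in constructor
                sum += compute();
            }                                       // unlocks m in destructor
        }));
    }

    for (std::thread& t : workers)
        t.join();

    return sum;
}
```

If `compute()` were expensive, it would usually be better to call it before taking the lock and hold the lock only for the addition.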

Use atomic to protect shared variables…
◦ Lighter-weight than locking, but much more limited in applicability
#include <atomic>

atomic<int> count;
count = 0;

thread t1([&]()
{
    count++;
});

thread t2([&]()
{
    count++;
});

thread t3([&]()
{
    // X: not safe… the load and the store are each atomic, but another
    // thread's increment can land between them and be lost
    count = count + 1;
});
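The difference between a single atomic read-modify-write and a separate load plus store can be exercised with a small sketch. The helper name `atomic_count` is hypothetical; `operator++` and `fetch_add` are standard members of `std::atomic<int>`.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: every thread performs 2 * incs atomic increments.
int atomic_count(int numthreads, int incs)
{
    std::atomic<int> count(0);
    std::vector<std::thread> workers;

    for (int t = 0; t < numthreads; t++)
    {
        workers.push_back(std::thread([&]()
        {
            for (int k = 0; k < incs; k++)
            {
                count++;                // one atomic read-modify-write
                count.fetch_add(1);     // the same operation, spelled out
                // count = count + 1;   // NOT one operation: load, then store
            }
        }));
    }

    for (std::thread& t : workers)
        t.join();

    return count.load();
}
```

With the commented-out form, updates from other threads could be overwritten and the final total could come out too low.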

Atomics enable safe, lock-free programming
◦ “Safe” is a relative word…
Done flag:

int x;
atomic<bool> done;
done = false;

thread t1([&]()
{
    x = 42;
    done = true;
});

thread t2([&]()
{
    while (!done)
        ;
    assert(x==42);
});

Lazy init:

int x;
atomic<bool> initd;
initd = false;

thread t1([&]()
{
    if (!initd) {
        lock_guard<mutex> _(m);
        x = 42;
        initd = true;
    }
    << consume x, … >>
});

thread t2([&]()
{
    if (!initd) {
        lock_guard<mutex> _(m);
        x = 42;
        initd = true;
    }
    << consume x, … >>
});
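In the lazy-init pattern, two threads can both see `initd == false` and both run the initialization, one after the other. That is harmless for `x = 42` but not for initialization that must happen exactly once. One alternative, which the slides do not show, is `std::call_once` from `<mutex>`. The sketch below is an illustration under that assumption, and `lazy_init_calls` is a hypothetical name.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical demo: many threads race to initialize x; std::call_once
// guarantees the initializer runs exactly once and that every thread
// sees its effects afterwards.
// Returns how many times the initializer ran (-1 if x was not set).
int lazy_init_calls(int numthreads)
{
    std::once_flag once;
    int x = 0;
    int init_calls = 0;     // modified only inside the call_once initializer

    std::vector<std::thread> workers;
    for (int t = 0; t < numthreads; t++)
    {
        workers.push_back(std::thread([&]()
        {
            std::call_once(once, [&]() { x = 42; ++init_calls; });
            // << consume x, ... >>
        }));
    }

    for (std::thread& t : workers)
        t.join();

    return (x == 42) ? init_calls : -1;
}
```

For function-local data, a `static` local variable is another option, since C++11 requires its initialization to be thread-safe.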

Prime numbers…

Futures provide a higher level of abstraction
◦ Start an asynchronous operation on some thread, then await the result…
#include <future>
.
.
future<int> fut = async( []() -> int    // return type…
{
    int result = PerformLongRunningOperation();
    return result;
}
);
.
.
try
{
    int x = fut.get();  // join, harvest result:
    cout << x << endl;
}
catch(exception &e)
{
    cout << "**Exception: " << e.what() << endl;
}
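How get() harvests either the return value or the exception can be shown in a runnable sketch. The functions `perform_operation` and `harvest` are hypothetical stand-ins, since PerformLongRunningOperation() is not defined in the slides.

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for PerformLongRunningOperation().
int perform_operation(int input)
{
    if (input < 0)
        throw std::invalid_argument("negative input");
    return input * 2;
}

// Runs the operation asynchronously and returns either the result as
// text or the message of the exception thrown inside the task.
std::string harvest(int input)
{
    std::future<int> fut = std::async(std::launch::async, perform_operation, input);

    try
    {
        int x = fut.get();      // join, harvest result (or rethrow)
        return std::to_string(x);
    }
    catch (const std::exception& e)
    {
        return std::string("**Exception: ") + e.what();
    }
}
```

The exception is thrown on the worker thread, stored in the future's shared state, and rethrown on the thread that calls get(), so no try/catch is needed inside the task itself.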



May run on the current thread
May run on a new thread
Often better to let system decide…

// run on current thread when someone asks for value (“lazy”):
future<T> fut1 = async( launch::deferred, []() -> ... );

// run on a new thread:
future<T> fut2 = async( launch::async, []() -> ... );

// let system decide (the default policy):
future<T> fut3 = async( launch::async | launch::deferred, []() -> ... );
future<T> fut4 = async( []() ... );

// Note: launch::sync and launch::any come from pre-standard drafts and some
// early library implementations; they are not part of ratified C++11.
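The two standard launch policies, `launch::deferred` and `launch::async`, can be told apart by comparing thread ids. This is a small sketch with hypothetical function names, not material from the slides.

```cpp
#include <future>
#include <thread>

// True if a launch::deferred task ran on the thread that called get().
bool deferred_runs_on_caller()
{
    const std::thread::id caller = std::this_thread::get_id();

    std::future<std::thread::id> fut =
        std::async(std::launch::deferred,
                   []() { return std::this_thread::get_id(); });

    return fut.get() == caller;     // the task body executes here, inside get()
}

// True if a launch::async task ran on a thread other than the caller's.
bool async_runs_on_other_thread()
{
    const std::thread::id caller = std::this_thread::get_id();

    std::future<std::thread::id> fut =
        std::async(std::launch::async,
                   []() { return std::this_thread::get_id(); });

    return fut.get() != caller;
}
```

With the default policy (`launch::async | launch::deferred`) either outcome is permitted, which is why code that depends on where the task runs should name the policy explicitly.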

Netflix data-mining…
[Figure: Netflix Movie Reviews (.txt) → Netflix Data Mining App, which computes average review for a movie…]

C++ committee thought long and hard on
memory model semantics…
◦ “You Don’t Know Jack About Shared Variables or Memory Models”,
Boehm and Adve, CACM, Feb 2012

Conclusion:
◦ No suitable definition in presence of race conditions

Solution:
◦ Predictable memory model *only* in data-race-free codes
◦ Computer may “catch fire” in presence of data races
int x, y, r1, r2;
x = y = r1 = r2 = 0;

thread t1([&]()
{
    x = 1;
    r1 = y;
}
);

thread t2([&]()
{
    y = 1;
    r2 = x;
}
);

t1.join();
t2.join();

What can we say about r1 and r2?
If we think in terms of all possible thread interleavings (aka “sequential consistency”), then we know r1 = 1, r2 = 1, or both.

In C++11? Not only are the values of x, y, r1 and r2 undefined, but the program may crash!
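One way to restore the sequentially consistent reasoning is to make x and y atomics. Operations on `std::atomic` default to `memory_order_seq_cst`, so at least one of r1 and r2 must then be 1. The sketch below is an illustration with hypothetical names (`run_once`, `always_one_is_set`), not code from the slides.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Runs the two-thread experiment once with atomic x and y.
std::pair<int, int> run_once()
{
    std::atomic<int> x(0), y(0);
    int r1 = 0, r2 = 0;     // each written by one thread, read after join()

    std::thread t1([&]() { x = 1; r1 = y; });
    std::thread t2([&]() { y = 1; r2 = x; });

    t1.join();
    t2.join();
    return std::make_pair(r1, r2);
}

// Repeats the experiment; returns false if r1 == 0 and r2 == 0 is ever seen.
bool always_one_is_set(int trials)
{
    for (int i = 0; i < trials; i++)
    {
        std::pair<int, int> r = run_once();
        if (r.first != 1 && r.second != 1)
            return false;
    }
    return true;
}
```

With plain `int` variables the same program has a data race, and the outcome r1 = r2 = 0 can be observed on real hardware.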
Def: two memory accesses conflict if they
1. access the same scalar object or contiguous sequence of bit fields, and
2. at least one access is a store.
Def: two memory accesses participate in a data race if they
1. conflict, and
2. can occur simultaneously.

A program is data-race-free (DRF) if no sequentially-consistent execution results in a data race. Avoid anything else.
◦ Achieve this via independent threads, locks, atomics, …

Tasks are a higher-level abstraction
Task: a unit of work; an object denoting an ongoing operation or computation.
◦ Idea:
  ▪ developers identify work
  ▪ run-time system deals with execution details

Microsoft PPL: Parallel Patterns Library
Matrix multiply:

#include <ppl.h>

for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        C[i][j] = 0.0;

// for (int i = 0; i < N; i++)
Concurrency::parallel_for(0, N, [&](int i)
{
    for (int k = 0; k < N; k++)
        for (int j = 0; j < N; j++)
            C[i][j] += (A[i][k] * B[k][j]);
});
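PPL is specific to Visual C++. For readers on other compilers, the idea behind parallel_for can be roughly approximated with standard C++11 only. The sketch below is a deliberately simplified, hypothetical stand-in: `simple_parallel_for` is not a PPL or standard function, and it uses one `std::async` task per chunk instead of a work-stealing scheduler.

```cpp
#include <algorithm>
#include <functional>
#include <future>
#include <vector>

// Hypothetical, simplified stand-in for Concurrency::parallel_for:
// calls body(i) for every i in [first..last), splitting the range into
// at most `chunks` contiguous pieces, each run as one std::async task.
void simple_parallel_for(int first, int last, int chunks,
                         const std::function<void(int)>& body)
{
    chunks = std::max(1, chunks);
    const int total = std::max(0, last - first);
    const int size  = (total + chunks - 1) / chunks;    // ceiling division

    std::vector<std::future<void>> tasks;

    for (int lo = first; lo < last; lo += size)
    {
        const int hi = std::min(lo + size, last);
        tasks.push_back(std::async(std::launch::async, [lo, hi, &body]()
        {
            for (int i = lo; i < hi; i++)
                body(i);
        }));
    }

    for (std::future<void>& f : tasks)
        f.get();    // join, and rethrow any exception thrown by body
}
```

As with parallel_for, the body must be safe to run concurrently for different values of i; in the matrix multiply above, each i touches a different row of C.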
[Figure: parallel_for( ... ) creates tasks within a Windows process. Layers, top to bottom: Parallel Patterns Library; Task Scheduler (thread pool of worker threads, global work queue); Resource Manager; Windows.]

Presenter: Joe Hummel
◦ Email: [email protected]
◦ Materials: http://www.joehummel.net/downloads.html

References:
◦ Book: “C++ Concurrency in Action”, by Anthony Williams
◦ Talks: Bjarne and friends at MSFT’s “Going Native 2012”
  ▪ http://channel9.msdn.com/Events/GoingNative/GoingNative-2012
◦ Tutorials: really nice series by Bartosz Milewski
  ▪ http://bartoszmilewski.com/2011/08/29/c11-concurrency-tutorial/
◦ FAQ: Bjarne Stroustrup’s extensive FAQ
  ▪ http://www.stroustrup.com/C++11FAQ.html
