### Verification for Concurrency

```Verification for Concurrency
Part 2: Incomplete techniques and bug finding
Contents
 Race detection
 Context bounding and Sequentialization
 Odds and ends
Race detection
Data Races
 “Two threads simultaneously access the same memory
location, with at least one access being a write”
 Most concurrent software is written to avoid data-races
 Extremely tricky to write racy code that is correct
 For racy code, correctness on one architecture and
complier does not imply correctness for all
 Race-detection is a very efficient light-weight bug
detection technique
 Dynamic techniques: Work on actual executions
The Lockset algorithm
 Lockset assumption: Whenever two different threads
access a location (with one of them being a write), they
will both hold a common lock.
 Possible to write non-racy code violating this assumption
 False positives from code that violates this assumption
Vanilla Lockset algorithm
for each tid: LocksHeld[tid] = ∅
for each v : Cands[v] = set of all locks
for each instruction in trace:
// Maintain lockset
if (instruction = lock(l))
LocksHeld[tid] = LocksHeld[tid] ∪ { l }
if (instruction = unlock(l))
LocksHeld[tid] = LocksHeld[tid] \ { l }
// Update candidate locks
Cands[v] := Cands[v] ∩ LocksHeld[tid]
if (Cands[v] = ∅)
Output “Potential race on v”
 Based on a simple locking discipline
 Each variable is protected by some lock
Vanilla Lockset algorithm: Example
lock (l1)
x := x + 4
unlock (l1)
lock(l1)
lock(l2)
x := x – y
y := y + 4
unlock(l2)
unlock(l1)
lock (l2)
y = y + 5
x := x - 3
unlock (l2)
x := x - 3
Vanilla LockSet: False positive
// Initialize
g := 1729
// Share
lock(l)
published := true
unlock(l)
lock(l)
assume(published)
unlock(l)
y := g
lock(l)
assume(published)
unlock(l)
z := g
y := f(l, y);
z := g(l, z)
output(y)
output(z)
 Simple locking discipline leads to false positives in many
common cases
 Lazy initialization
 Initialize followed by read-only access
Virgin
Shared
Modified
write
Exclusive
write
Shared
update State[v]
if State[v] = Exclusive
// Do nothing
else if State[v] = Shared
// Update lockset, don’t report any races
Cands[v] = Cands[v] ∩ LocksHeld[tid]
else if State[v] = Shared-Modified
// Update lockset, report races
Cands[v] := Cands[v] ∩ LocksHeld[tid]
if (Cands[v] = ∅)
Output “Potential race on v”
 Works well for initialize-and-publish, lazy initialization, etc
 Doesn’t work with ownership-transfer, etc
// Initialize
g := 1729
// Share
lock(l)
published := true
unlock(l)
lock(l)
assume(published)
unlock(l)
y := g
lock(l)
assume(published)
unlock(l)
z := g
y := f(l, y);
z := g(l, z)
output(y)
output(z)
…
…
await (Obj.owner == 1);
Obj.foo()
Obj.bar := baz();
…
Obj.owner := 2
…
await (Obj.owner = 2);
Obj.bar()
Obj.foo = baz();
…
Happens-Before relation
 Proposed by Lamport in 1978 for distributed systems
 Many race detection algorithms try and approximate the
happens-before relation
 Basic idea: two events are related if and only if
communication allows information-flow between them
 We write ei  ej for event-i happens-before event-j
 Informally, if ei  ej, ei happens-before ej in all variations of the
trace
 A race if two events ei and ej accesses the same location,
and neither ei  ej holds, nor ej  ei holds
Happens-Before Relation: Defintion
 If two events are from the same thread, the earlier one
happens-before the later
 thread(ei) = thread(ej) ∧ i < j  ei  ej
 Happens-before is transitive
 (ei  ej) ∧ (ej  ek ) (ei  ek)
 Every synchronization gives some happens-befores
 LOCK: if ei is a unlock and ej is a lock later, ei  ej
 WAIT/NOTIFY: if ei is a notify and ej is a corresponding wait, ei
 ej
 …
Happens-Before: Examples
obj := new Foo()
Notify(obj)
Wait(obj)
obj.data = data
Notify(obj)
Wait(obj)
lock(l)
obj.data = obj.data + 4
unlock(l)
…
lock(l)
obj.data = obj.data - 4
unlock(l)
…
Computing the Happens-Before:
Vector Clocks
 The happens-before relation is usually very expensive to
compute
 Few dynamic techniques actually compute the full relation
 Classical method proposed by Lamport himself
 Vector clocks: with each event, associate a “vector
clock” storing the last event from the other threads that
affects it
 ei  ej if and only if VC[ej][thread(i)] >= ei
Vector clocks
VC[e] = [ e1, e2, e3, … , en ]
Last relevant event from thread 1
// regular events
VC[e][tid] := VC[prev(e)][tid]
Last relevant event from thread 2
Can be extended to
other synchronization
primitives!
// Acquire locks
VC[e][tid] := max(VC[prev(e)][tid], LVC[tid])
ei  ej if and only if VC[ej][thread(i)] >= ei
// Release locks
VC[e][tid] := VC[prev(e)][tid]
LVC[tid] := max(VC[e][tid], LVC[tid])
Vector clocks: Examples
T1_0: obj := new Foo()
T1_1: Notify(obj)
T2_1: Wait(obj)
T2_2: obj.data = data
T2_3: Notify(obj)
T1_2:
T1_3:
T1_4:
T1_5:
…
Wait(obj)
lock(l)
obj.data = obj.data + 4
unlock(l)
T2_4: lock(l)
T2_5: obj.data = obj.data - 4
T2_6: unlock(l)
…
What does HB miss?
 Every race or false race reported by happens-before
based methods are also reported by LockSet based
methods
 Fewer false positives, Potential false negatives
 Why? Every happens-before relation is not really a true
synchronization
y = y + 1
lock(l)
x = x + 1
unlock(l)
lock(l)
x = x + 1
unlock(l)
y = y + 1
Race detection: summary
 First line of defence for most concurrent programs
 Many bugs just show up as race conditions
 Lockset is fast
 Lots of false positives
 Happens-before is slow
 Reports only true data races
 Potential false negatives
 There are hybrid techniques
 Compute approximations of LockSet and HB
Context bounding and
Sequentialization
Context bounding
 Folk knowledge: Most concurrency bugs are shallow in
terms of required context-switches
 Most bugs require very few bug fixes
 Most concurrency bugs are atomicity violations or order
violations
 For an empirical study, see Shan Lu et al. 2006…2008
 Why not check concurrent programs only up to a few
context switches?
 Much more efficient
CHESS: Systematic exploration
 Culmination of techniques proposed by Qadeer et al in
2004
 Correctness primarily given by assertions in the code
 Can also use monitors
 Can detect data-races, deadlocks, etc
 Main idea: Use a scheduler that explores traces of the
program deterministically, prioritizing traces having few
context-switches
CHESS: Controlling scheduler
 Non-determinism source:
 Input
 Scheduling
 Timing and library
 Input non-determinism controlled by specifying fixed inputs
 Scheduling non-determinism controlled by writing
deterministic scheduler
 Library non-determinism: model library code
State-space explosion
x=1
…
…
…
…
y=k
x=1
…
…
…
…
y=k
…
polynomial amount of space
• Remember c spots for context
switches
• Permutations of the n+c atomic
blocks
 Exploring k steps in each
 Number of executions is
O(nnk)
 Exploring k steps in each
context-switches
 Number of executions is
O((n2k)c.n!)
 Not exponential in k
Scheduling: Picking pre-emption
points
void Deposit100() {
ChessSchedule();
EnterCriticalSection(&cs);
balance += 100;
ChessSchedule();
LeaveCriticalSection(&cs);
}
void Withdraw100() {
int t;
ChessSchedule();
EnterCriticalSection(&cs);
t = balance;
ChessSchedule();
LeaveCriticalSection(&cs);
ChessSchedule();
EnterCriticalSection(&cs);
balance = t -100;
ChessSchedule();
LeaveCriticalSection(&cs);
}
 Heuristics: More pre-emption points in critical code, etc
 Coverage guarantee: When 2 context-switches are explored,
every remaining bug requires at least 3 context-switches
CHESS: Summary
 Build a deterministic scheduler
 Complications: Fairness and Live locks, weak memory
models
 Runs real code on real systems
 Only scheduler has been replaced
 Is mostly program agnostic
 Exhaustive testing
Sequentialization
 CHESS approach: Concurrent program + bound on context
switches  explore all interleavings
 General sequentialization approach: Concurrent program +
bound on context switches  Sequential program
 Then, verify sequential program using your favourite verification
technique
 Many flavours of context-bounded analysis:
 PDS based (Qadeer et al.)
 Transformation based sequentialization: Eager, Lazy (Lal et al.)
 BMC based (Parlato et al.)
Sequentialization: Basic idea
 What is hard about sequentialization?
 Have to remember local variables across phases (though
they don’t change)
 If exploring T1  T2  T1, have to remember locals of T1
across phase of T2
 Lal-Reps 2008: Instead, do a source to source
transformation
 Copy each statement and global variable c times
 Now, we can explore T1  T1  T2 instead of T1  T2  T1
 Only one threads local variables relevant at each stage
Sequentialization: Basic idea
 Replace each global variable X by X[tid][0..K]
 X[tid][i] represents the value of the global variable X the ith
X := X + 1
if (phase = 0)
X[tid][0] :=
else if (phase =
X[tid][1] :=
…
else if (phase =
X[tid][K] =
X[tid][0] + 1
1)
X[tid][1] + 1
K)
X[tid][K] + 1
if (phase < K && *)
phase++;
if phase == K + 1
phase = 1
Sequentialization: Basic idea
 A program (T1||T2) is rewritten into Seq(T1); Seq(T2);
check()
for phase = 0 to K
if (phase > 0)
assume (X[0][phase] == X[N][phase – 1]
for tid = 1 to N
assume (X[tid][phase] == X[tid-1][phase])
 Roughly,
 But, at random points, guess new values for global variables
 In the end, check the guessed new values are consistent
Sequentialization
Each green arrow is one part of the check!
…
…
X[0][0] := X[0][0] + 1
…
…
…
…
X[1][0] := X[1][0] + 1
…
…
…
X[0][1] := X[0][1] + 1
…
…
X[1][1] := X[1][1] + 1
…
…
X[0][2] := X[0][2] + 1
…
…
X[1][2] := X[1][2] + 1
…
Sequentialization
 The original Lal/Reps technique uses summarization for
verification of the sequential program
 Compute summaries for the relation of initial and final values
of global variables
 Extremely powerful idea
 No need to reason explicitly about interleavings
 Interleavings encoded into data (variables)
 Scales linearly with number of threads
Sequentialization and BMC
 Currently, the best tools in the concurrency verification
competitions use “sequentialization + BMC”
 The previous sequentialization technique is better suited
for analysis techniques, not model checking
checking for consistency
 Instead, just explicitly use non-determinism
BMC for concurrency
 First, rewrite threads by unrolling loops and inlining
function calls
 No loops
 No function calls
 Forward only control flow
 Write a driver “main” function to schedule the threads
one by one
Naïve sequentialization for BMC
Main driver:
pc0 = 0, … , pcn = 0
switch(pci) {
case 0: goto 0;
case 1: goto 1;
…
}
0: CS(0); stmt0;
1: CS(1); stmt1;
…
M: CS(M); stmtm;
main() {
for (r = 0; r < K, r++)
for (i = 0; i < n; i++)
}
• The resume mechanism jumps
into “right” spot in the thread
• There is a potential CS before
each statement
• What’s the problem? Lots of
jumps in the control flow
CS(j) := if(*) {
pci = j;
return
}
Better sequentialization for BMC
Main driver:
pc0 = 0, … , pcn = 0
main() {
for (r = 0; r < K, r++)
for (i = 0; i < n; i++)
nextCS = *
assume (nextCS >= pci)
pci = nextCS
}
• Avoid the multiple control flow
breaking jumps
• Restricted non-determinism to
one spot
0: CS(0); stmt0;
1: CS(1); stmt1;
…
M: CS(M); stmtm;
CS(j) :=
if(j < pci || j >= nextCS)
{
goto j+1;
}
Context bounding and
Sequentialization: Summary
 Host of related techniques
 Can be adapted for analysis, model checking, testing, etc
 Different techniques need different kinds of tuning
 Basic idea: Most bugs require few context switches to turn
up
 Can leverage standard sequential program analysis
techniques
Odds and Ends
Things we didn’t cover
Specification-free correctness
 In many cases we don’t want to write assertions
 Just want concurrent program to do the same thing as a
sequential program is doing
 Standard correctness conditions
 Linearizability [Herlihy/Wing 91]
 Serializability [Papadimitrou and others 70s]
C
o
n
c
.
E
x
e
c
Seq. Exec
Method 0
Method 2
Method 1
Method 3
Method 0
Method 3
Method 1
Method 2
Testing for concurrency
 Root cause of bugs
 Ordering violations
 Atomicity violations
 Data races
 Coverage metrics and coverage guided search
 Define use pairs [Tasirin et al]
 Find ordering violations based on define use orderings
 HaPSet [ Wang et al]
 Find interesting interleavings by trying to cover all
“immediate histories” of events
 Cute/JCute [Sen et al]
 Concolic testing: Accumulate constraints along test run to
guide future test runs
(Symbolic) Predictive Analysis
 Analyze variations of the given concurrent trace
 Run a test and record information
 Build a predictive model by relaxing scheduling constraints
 Analyze predictive model for alternate interleavings
 Can flag false bugs
 Symbolic predictive analysis
 From a trace, build precise predictive model (as SMT
formula)
 No false bugs
This is the End
 Brief overview of concurrent verification techniques
 Lecture 1: Full proof techniques
 Lecture 2: Incomplete techniques / Bug finding
 What did we learn?
 Full verification is hard, not many techniques for weakmemory architectures
 Use light-weight and incomplete techniques to detect
shallow bugs
 Code using a strict concurrency discipline is more likely to be
correct, easier to verify
```