Chapter 4 - Iowa State University

Report
CprE 381 Computer Organization and Assembly
Level Programming, Fall 2013
Chapter 4
The Processor
Zhao Zhang
Iowa State University
Revised from original slides provided
by MKP
Week 10 Overview




Expected project progress: Complete MiniProject B, part 1
ALU data hazard and forwarding
MEM data hazard, forwarding, and pipeline
stall
Control hazard and branch execution
Chapter 1 — Computer Abstractions and Technology — 2
Data Hazards from ALU Instructions

An instruction depends on completion of
data access by a previous instruction


add
sub
$s0, $t0, $t1
$t2, $s0, $t3
Consider this sequence:
sub
and
or
add
sw
$2, $1,$3
$12,$2,$5
$13,$6,$2
$14,$2,$2
$15,100($2)
Chapter 4 — The Processor — 3
Data Hazards from ALU Instructions

A naïve approach is to insert nops to wait
out the dependence

add
sub
$s0, $t0, $t1
$t2, $s0, $t3
Change to

add $s0, $t0, $t1
noop
noop
sub $t2, $s0, $t3
Chapter 4 — The Processor — 4
Data Hazards in ALU Instructions

Another naïve approach is to stall the 2nd
instruction in the dependence

add
sub
$s0, $t0, $t1
$t2, $s0, $t3
Chapter 4 — The Processor — 5
Data Hazards in ALU Instructions

Observations on this scenario
The first, ALU instruction produces a register
value
 The following instruction(s) consumes the
register value
sub $2, $1,$3
and $12,$2,$5
or $13,$6,$2
add $14,$2,$2
sw $15,100($2)

Chapter 1 — Computer Abstractions and Technology — 6
Data Hazards in ALU Instructions

What is exactly the problem?


A register value is written to the register file in the WB
stage, two cycles after the EX stage
The following instructions read the register value in
the beginning of the ID stage
IF ID EX MEM WB
and
sub
…
…
…
sub reads $1, $3
or
and
sub
…
…
AND reads old $2
add
or
and
sub
…
OR reads old $2
sw
add
or
and
sub
sub writes to $2
add reads new $2
Chapter 1 — Computer Abstractions and Technology — 7
Data Hazards in ALU Instructions
Chapter 1 — Computer Abstractions and Technology — 8
Forwarding (aka Bypassing)

Use result when it is computed



The result is already in the pipeline
Don’t wait for it to be stored in a register
Requires extra connections in the datapath
Chapter 4 — The Processor — 9
Dependencies & Forwarding
Chapter 4 — The Processor — 10
Data Forwarding

To what place: The two ALU inputs in
the EX stage datapath


Forwarded register value may replace the
values from ID
From where: The destination register
value in pipeline registers


Source 1: EX/MEM register
Source 2: MEM/WB register
Chapter 1 — Computer Abstractions and Technology — 11
Data Forwarding

When to forward: Data dependence
detected between



Instructions at the EX and MEM stage
Instructions at the EX and WB stage
How to detect: Compare source and
destination register numbers
Chapter 1 — Computer Abstractions and Technology — 12
Data Forwarding

Example
sub
and
or
add
sw
$2, $1,$3 # MEM=>EX forwarding
$12,$2,$5 # WB =>EX forwarding
$13,$6,$2
$14,$2,$2
$15,100($2)
IF
ID
EX
MEM
WB
or
and
sub
…
…
add
or
and
sub
…
AND gets forwarded
new $2 value
sw
add
or
and
sub
SUB gets forwarded
new $2 value
Chapter 1 — Computer Abstractions and Technology — 13
Data Forwarding Logic Design
sub $2, $1,$3 #
and $12,$2,$5 # comp $2 with $2, $5
or $13,$6,$2 # comp $2 with $6, $2


Detection: Compare rs and rt at EX,
with rd at MEM and rd at WB
Those register numbers are in the IE/EX,
EX/MEM, and MEM/WB registers

rs was not in IE/EX register, we can add it
Chapter 1 — Computer Abstractions and Technology — 14
Data Forwarding Logic Design

Register numbers in pipeline




Source registers of the instruction at the EX stage
ID/EX.RegisterRs, ID/EX.RegisterRt
Destination register of the instruction at the MEM stage
EX/MEM.RegisterRd
Destination register of the instruction at WB stage
MEM/WB.RegisterRd
Potential data hazards when
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
Fwd from
EX/MEM
pipeline reg
Fwd from
MEM/WB
pipeline reg
Chapter 4 — The Processor — 15
Data Forwarding Logic Design

But only if forwarding instruction will write
to a register!



EX/MEM.RegWrite=1, MEM/WB.RegWrite=1
It’s possible an instruction has a matching rd
but doesn’t write to register
And only if Rd for that instruction is not
$zero


EX/MEM.RegisterRd ≠ 0,
MEM/WB.RegisterRd ≠ 0
It’s allowed for an instruction to write to $0
Chapter 4 — The Processor — 16
Forwarding Paths
The forwarding unit accesses three pipeline registers
Note rs is added to IE/EX pipeline register
Chapter 4 — The Processor — 17
Forwarding Conditions

EX hazard: Data forwarding from EX/MEM register



if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10
MEM hazard: Data forwarding from MEM/WB register


if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
This is not the final version (see slides 23)
Chapter 4 — The Processor — 18
Datapath with Forwarding
Chapter 4 — The Processor — 19
Caveats

Data forwarding happens in the beginning of
the cycle



The forwarding unit is in the EX stage, with its
inputs from three pipeline stages
A small overhead added to the critical path
latency of the EX stage
For EX hazard, data forwarding is from MEM
to EX

Precisely, the register value of the instruction
being executed at the MEM stage is forwarded to
the instruction being executed at the EX stage
Chapter 1 — Computer Abstractions and Technology — 20
Caveats

For MEM hazard, the forwarding is from the
WB to EX


Data forwarding is to EX not to ID




From the instruction at WB to the instruction at EX
An instruction may read obsolete register values
at ID, with the values latched at ID/EX register
The correct values may be at EX (EX Hazard) or
MEM (MEM Hazard)
Any obsolete values get replaced at EX
There is no WB hazard

Register write at WB and register read at ID, for
the same register, may complete within one cycle
Chapter 1 — Computer Abstractions and Technology — 21
Double Data Hazard

Consider the sequence:
add $1,$1,$2
add $1,$1,$3
add $1,$1,$4

Both hazards occur


Want to use the most recent
Revise MEM hazard condition

Only fwd if EX hazard condition isn’t true
Chapter 4 — The Processor — 22
Revised Forwarding Condition

MEM hazard (revision from slide 18)


if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
Chapter 4 — The Processor — 23
Load-Use Data Hazard


Load-use data hazard: A load instruction is
followed immediately by an instruction
using the value of load
How is a load instruction different from an
ALU instruction?



ALU inst: destination register value available
at the end of the EX stage
Load inst: destination register value available
at the end of the MEM stage
Note the next instruction may need the value
in the beginning of its EX stage
Chapter 1 — Computer Abstractions and Technology — 24
Load-Use Data Hazard

Can’t always avoid stalls by forwarding


If value not computed when needed
Can’t forward backward in time!
Chapter 4 — The Processor — 25
Load-Use Data Hazard
Need to stall
for one cycle
Chapter 4 — The Processor — 26
Load-Use Hazard
How to insert a pipeline bubble (lost cycle)?
lw $2, 20($1)
sub $4, $2, $5
or $8, $2, $6
When the load instruction is at the EX stage
 Hold the instruction at the IF stage


Hold the instruction at the ID stage


Do not change the IF/ID register
Insert a nop at the EX stage



Do not update the PC
Make all control signals in ID/EX register to zero
Particularly, RegWrite = 0 and MemWrite = 0
Move forward MEM and WB
Chapter 1 — Computer Abstractions and Technology — 27
Load-Use Hazard Detection
To detect, check if
 A load instruction is at the EX stage


ID/EX.MemRead = 1
The instruction at the ID stage reads the
register value of load

ID/EX.RegisterRt = IF/ID.RegisterRs, or
ID/EX.RegisterRt = IF/ID.RegisterRt (for Rtype)
If detected, stall IF and ID, insert bubble at
EX, move forward MEM and MB
Chapter 4 — The Processor — 28
Pipeline Stall

The nop has all control signals set to zero


It does nothing at EX, MEM and WB
Prevent update of PC and IF/ID register



Using instruction is decoded again (OK)
Following instruction is fetched again (OK)
1-cycle stall allows MEM to read data for lw


Can subsequently forward from WB to EX
Need to add new control lines


PCWrite for holding or updating PC
IF/IDWrite for holding or update IF/ID register
Chapter 4 — The Processor — 29
Stall/Bubble in the Pipeline
Stall inserted
here
Chapter 4 — The Processor — 30
Stall/Bubble in the Pipeline
Or, more
accurately…
Chapter 4 — The Processor — 31
Datapath with Hazard Detection
Chapter 4 — The Processor — 32
Stalls and Performance
The BIG Picture

Stalls reduce performance


But are required to get correct results
Compiler can arrange code to avoid
hazards and stalls

Requires knowledge of the pipeline structure
Chapter 4 — The Processor — 33
Code Scheduling to Avoid Stalls


Reorder code to avoid use of load result in
the next instruction
C code for A = B + E; C = B + F;
stall
stall
lw
lw
add
sw
lw
add
sw
$t1,
$t2,
$t3,
$t3,
$t4,
$t5,
$t5,
0($t0)
4($t0)
$t1, $t2
12($t0)
8($t0)
$t1, $t4
16($t0)
13 cycles
lw
lw
lw
add
sw
add
sw
$t1,
$t2,
$t4,
$t3,
$t3,
$t5,
$t5,
0($t0)
4($t0)
8($t0)
$t1, $t2
12($t0)
$t1, $t4
16($t0)
11 cycles
Chapter 4 — The Processor — 34
Control Hazards

Branch determines flow of control



Two branch outcomes: Taken or Not-Taken
Fetching next instruction depends on branch
outcome
Pipeline can’t always fetch correct instruction


Still working on ID stage of branch
In MIPS pipeline


Need to compare registers and compute
target early in the pipeline
Add hardware to do it in ID stage
Chapter 4 — The Processor — 35
Control Hazards
Several caveats
 The CPU doesn’t recognize a branch until it
reaches the end of the ID stage
 Every cycle, the CPU has to fetch one
instruction




Cannot afford to wait and see
Must predict the next PC every cycle
The CPU may predict “always not-taken”
(MIPS 5-stage pipeline)
Alternatively, the CPU may predict branch
outcome dynamically (advanced CPU design)
Chapter 1 — Computer Abstractions and Technology — 36
Control Hazards

This MIPS pipeline always predicts NotTaken




What happens if the branch is wrong?



Easy prediction: The next PC is current PC plus 4
No need to design complex branch prediction unit
More Taken than Not-Taken in most programs
Will have mis-fetched instructions
Flush those instructions before they take effect
i.e. Before they write to memory or register
A Taken branch incurs a performance
penalty
Chapter 1 — Computer Abstractions and Technology — 37

If branch outcome determined in MEM
§4.8 Control Hazards
Performance Impact
Flush these
instructions
(Set control
values to 0)
PC
Three cycles wasted on a taken branch
Chapter 4 — The Processor — 38
Performance Impact

The performance loss is 3 cycles per taken
branch


If branch outcome determined in MEM
Move execution of branch to the ID stage!

Only beq and bne are supported in original MIPS


Branch target can be calculate in ID



Testing equality and inequality is very fast, do it at the
end of ID
Branch target = PC + extended offset
PC and offset are known in the beginning of ID
At the of ID, the CPU knows the branch outcome
and branch target
Chapter 1 — Computer Abstractions and Technology — 39
Reducing Branch Delay

Move hardware to determine outcome to ID
stage



Target address adder
Register comparator
Example code with branch taken
36:
40:
44:
48:
52:
56:
72:
sub
beq
and
or
add
slt
...
lw
$10,
$1,
$12,
$13,
$14,
$15,
$4,
$3,
$2,
$2,
$4,
$6,
$8
7
$5
$6
$2
$7
$4, 50($7)
Chapter 4 — The Processor — 40
Example: Branch Taken
Chapter 4 — The Processor — 41
Early Branch Outcome

Pipeline changes for early branch outcome




2nd PC adder and the shifter moved to ID
Comparator added to ID
No zero any more from ALU
CPU flushes one instruction for every
taken branch



CPU detects taken branch at ID
The instruction at the IF will be flushed
1 lost cycles instead of 3 lost cycles per taken
branch
Chapter 1 — Computer Abstractions and Technology — 42
Pipeline Flushing
When CPU detects a taken branch at ID
 Update PC with branch target (already
have)
 Flush the instruction IF stage



Add flush signal to IF/ID pipeline
When flushing, convert the instruction in IF/ID
register to 32-bit zeros
0x00000000 is “and $0, $0, $0”, effectively a
nop
Chapter 1 — Computer Abstractions and Technology — 43
Example: Branch Taken
Note: Branch does nothing in EX, MEM and WB
Chapter 4 — The Processor — 44
Pipeline Bubble on Branch

Taken branch incurs a pipeline bubble
because of instruction flushing
Chapter 4 — The Processor — 45
Data Hazards for Branches
Moving branch execution to ID is not so easy
 May need another forwarding unit



The forwarding unit has to be in the ID stage
The current forwarding unit, in the EX stage,
obviously doesn't work
Need extensions to the hazard detection
unit, and more pipeline stalls

Branch uses register values at ID, ALU and
load produce register values at EX and MEM
Chapter 1 — Computer Abstractions and Technology — 46
Data Hazards for Branches

If a comparison register is a destination of
2nd or 3rd preceding ALU instruction
add $1, $2, $3
IF
add $4, $5, $6
…
beq $1, $4, target


ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
Can resolve using forwarding
From MEM to ID, and from WB to ID
Chapter 4 — The Processor — 47
Data Hazards for Branches

If a comparison register is a destination of
preceding ALU instruction or 2nd preceding
load instruction


lw
May need 1 stall cycle
However, beq needs the value at the end of ID
$1, addr
IF
add $4, $5, $6
beq stalled
beq $1, $4, target
ID
EX
MEM
WB
IF
ID
EX
MEM
WB
IF
ID
ID
EX
MEM
WB
Chapter 4 — The Processor — 48
Data Hazards for Branches

If a comparison register is a destination of
immediately preceding load instruction


lw
May need 2 stall cycles
Again, beq needs the value at the end of ID,
so it’s possible to reduce stall to one cycle
$1, addr
IF
beq stalled
beq stalled
beq $1, $0, target
ID
EX
IF
ID
MEM
WB
ID
ID
EX
MEM
WB
Chapter 4 — The Processor — 49
Mini-Project C

In Mini-Project C, implement



The simple MIPS pipeline
Data forwarding and hazard detection
Not-taken branch prediction with pipeline
flushing
Chapter 1 — Computer Abstractions and Technology — 50
Delayed Branch
Delayed branch may remove the one-cycle stall
The instruction right after the beq is executed no
matter the branch is taken or not (sub instruction
in the example)
 Alternatingly saying, the execution of beq is
delayed by one cycle
sub $10, $4, $8
beq $1, $3, 7
beq $1, $3, 7 => sub $10, $4, $8
and $12, $2, $5
and $12, $2, $5
Must find an independent instruction, otherwise



May have to fill in a nop instruction, or
Need two variants of beq, delayed and not delayed
Chapter 1 — Computer Abstractions and Technology — 51
Branch Prediction
We’ve actually studied one form of branch
prediction: always not-taken
 Longer pipelines can’t readily determine
branch outcome early


Predict outcome of branch


Stall penalty becomes unacceptable
Only stall if prediction is wrong
In MIPS pipeline


Can predict branches not taken
Fetch instruction after branch, with no delay
Chapter 4 — The Processor — 52

similar documents