### Chapter One Introduction to Pipelined Processors

```Chapter One
Principles of Designing Pipeline Processors
(Design Problems of Pipeline Processors)
Data Buffering and Busing Structures
Speeding up of pipeline segments
• The processing speeds of pipeline segments are usually unequal.
• Consider the example given below:
[Figure: three-segment pipeline S1 → S2 → S3 with segment delays T1, T2, T3]
• If T1 = T3 = T and T2 = 3T, S2 becomes the
bottleneck and we need to remove it
• How?
• One method is to subdivide the bottleneck
– Two possible subdivisions are:
• First Method:
[Figure: S2 subdivided into two segments with delays 2T and T, placed between S1 (T) and S3 (T)]
• Second Method:
[Figure: S2 subdivided into three segments of delay T each, placed between S1 (T) and S3 (T)]
• If the bottleneck is not subdivisible, we can duplicate S2 in parallel.
[Figure: S1 (T) feeding three parallel copies of S2 (3T each), followed by S3 (T)]
• Control and synchronization are more complex with parallel segments.
Data Buffering
• Instruction and data buffering provides a
continuous flow to pipeline units
• Example: 4X TI ASC
• This system uses a memory buffer unit (MBU) which:
– Supplies the arithmetic unit with a continuous stream of operands
– Stores results in memory
• The MBU has three double buffers X, Y and Z (one octet per buffer)
– X and Y for input, Z for output
• This sustains a high pipeline processing rate and alleviates the bandwidth mismatch between memory and the arithmetic pipeline.
Busing Structures
• Problem: Ideally, the subfunctions in a pipeline should be independent; otherwise the pipeline must be halted until the dependency is removed.
• Solution: An efficient internal busing structure.
• Example: TI ASC
• In the TI ASC, once an instruction dependency is recognized, update capability is incorporated by transferring the contents of the Z buffer to the X or Y buffer.
Internal Data Forwarding and Register Tagging
• Internal Forwarding: replacing unnecessary memory accesses with register-to-register transfers.
• Register Tagging: using tagged registers to exploit concurrent activities among multiple ALUs.
Internal Forwarding
• Memory access is slower than register-to-register operations.
• Performance can be enhanced by eliminating
unnecessary memory accesses
• This concept can be explored in 3 directions:
1. Store – Load Forwarding
2. Load – Load Forwarding
3. Store – Store Forwarding
Register Tagging
Example: IBM Model 91 Floating Point Execution Unit (FPU)
• The floating point execution unit consists of :
– Data registers
– Transfer paths
– Floating Point Adder Unit
– Multiply-Divide Unit
– Reservation stations
– Common Data Bus
• There are 3 reservation stations for the adder, named A1, A2 and A3, and 2 for the multiply-divide unit, named M1 and M2.
• Each station has source and sink registers with their tag and control fields.
• The stations hold operands for the next execution.
• The 3 store data buffers (SDBs) and 4 floating point registers (FLRs) are tagged.
• The busy bit of an FLR indicates that instructions in subsequent execution depend on it.
• The Common Data Bus (CDB) transfers operands.
• There are 11 units that supply information to the CDB: 6 FLBs, 3 adder stations and 2 multiply/divide stations
• Tags for these stations are:
Unit   Tag     Unit   Tag
FLB1   0001    A1     1010
FLB2   0010    A2     1011
FLB3   0011    A3     1100
FLB4   0100    M1     1000
FLB5   0101    M2     1001
FLB6   0110
• Internal forwarding can be achieved with a tagging scheme on the CDB.
• Example:
• Let F denote an FLR and FLBi the ith FLB, with contents (F) and (FLBi)
• Consider the instruction sequence
ADD F,FLB1    F ← (F) + (FLB1)
MPY F,FLB2    F ← (F) × (FLB2)
• During addition:
– The busy bit of F is set to 1
– The contents of F and FLB1 are sent to adder station A1
– The tag of F is set to 1010 (tag of the adder)
[Figure: Model 91 FPU block diagram showing the storage bus, instruction unit, floating point buffers (FLB) 1 to 6, floating point operand stack (FLOS), decoder, reservation stations with Tag/Sink/Tag/Source/CTRL fields, multiplier, store data buffers (SDB) 1 to 3, and the common data bus. State shown: F has busy bit = 1 and tag = 1010; the adder station entry is labeled 1010, F, 0001, FLB1.]
• Meanwhile, decoding MPY reveals that F is busy, so:
– M1 records the current tag of F, 1010 (tag of the adder), so that it waits for the adder result
– F changes its tag to 1000 (tag of the multiplier)
– The content of FLB2 is sent to M1
[Figure: the same Model 91 FPU block diagram. State shown: F has busy bit = 1 and tag = 1000; the multiplier station entry is labeled 1000, F, 0010, FLB2.]
• When the addition is done, the CDB finds that the result should be sent to M1.
• Multiplication proceeds once both operands are available.
Hazard Detection and Resolution
• Hazards are caused by resource usage
conflicts among various instructions
• They are triggered by inter-instruction
dependencies
Terminology:
• Resource Objects: the set of working registers, memory locations and special flags
• Data Objects: the contents of resource objects
• Each instruction can be considered as a mapping from a set of data objects to a set of data objects.
• Domain D(I): the set of resource objects whose data objects may affect the execution of instruction I.
• Range R(I): the set of resource objects whose data objects may be modified by the execution of instruction I.
• An instruction reads from its domain and writes to its range.
• Consider the execution of instructions I and J, where J appears immediately after I.
• There are 3 types of data-dependent hazards:
1. RAW (Read After Write)
2. WAW (Write After Write)
3. WAR (Write After Read)
RAW (Read After Write)
• The necessary condition for this hazard is
R(I) ∩ D(J) ≠ ∅
• Example:
I1 : LOAD r1,a
I2 : ADD r2,r1
• I2 cannot be correctly executed until r1 is loaded.
• Thus I2 is RAW dependent on I1.
WAW (Write After Write)
• The necessary condition is
R(I) ∩ R(J) ≠ ∅
• Example:
I1 : MUL r1,r2
I2 : ADD r1,r4
• Here I1 and I2 write to the same destination (r1), hence they are said to be WAW dependent.
WAR (Write After Read)
• The necessary condition is
D(I) ∩ R(J) ≠ ∅
• Example:
I1 : MUL r1,r2
I2 : ADD r2,r3
• Here I2 has r2 as its destination while I1 uses it as a source, hence they are WAR dependent.
• Hazards can be detected in the fetch stage by comparing the domain and range of the incoming instruction with those of the instructions already in the pipe.
• Once a hazard is detected, there are two methods:
1. Generate a warning signal to prevent the hazard
2. Allow the incoming instruction through the pipe and distribute detection to all pipeline stages.
```
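The three necessary conditions for RAW, WAW and WAR hazards reduce to set intersections on the domain and range of two instructions. The following is a minimal Python sketch of that check, not a model of any real pipeline. The `Instr` class, the `hazards` function and the domain/range sets chosen for each example instruction are illustrative assumptions; in particular, the sets for `LOAD r1,a` and `ADD r2,r1` assume that the first operand is the destination and that a two-operand ADD also reads its destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: name plus D(I) and R(I)."""
    name: str
    domain: frozenset   # D(I): resource objects the instruction may read
    range_: frozenset   # R(I): resource objects the instruction may write


def hazards(i: Instr, j: Instr) -> set:
    """Return the hazard types whose necessary condition holds when J follows I."""
    found = set()
    if i.range_ & j.domain:   # R(I) ∩ D(J) ≠ ∅
        found.add("RAW")
    if i.range_ & j.range_:   # R(I) ∩ R(J) ≠ ∅
        found.add("WAW")
    if i.domain & j.range_:   # D(I) ∩ R(J) ≠ ∅
        found.add("WAR")
    return found


# RAW example: LOAD r1,a followed by ADD r2,r1 (assumed sets, see lead-in)
load = Instr("LOAD r1,a", frozenset({"a"}), frozenset({"r1"}))
add = Instr("ADD r2,r1", frozenset({"r1", "r2"}), frozenset({"r2"}))

# WAR example: MUL r1,r2 followed by ADD r2,r3 (assumed sets, see lead-in)
mul = Instr("MUL r1,r2", frozenset({"r1", "r2"}), frozenset({"r1"}))
add2 = Instr("ADD r2,r3", frozenset({"r2", "r3"}), frozenset({"r2"}))

print(hazards(load, add))   # RAW only
print(hazards(mul, add2))   # WAR only
```

These conditions are necessary, not sufficient: a non-empty intersection flags a possible hazard, and whether it materializes depends on the timing of the instructions in the pipe.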