ARM organization

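[Figure: ARM organization. The address register and incrementer drive the address bus A[31:0]. The other blocks are the PC and register bank, instruction decode and control, the multiply register, the barrel shifter, the ALU, and the data out and data in registers on the data bus D[31:0]. The ALU bus, A bus, and B bus connect the register bank, shifter, and ALU.]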
©2000 Addison Wesley
ARM single-cycle instruction pipeline operation
[Figure: instructions 1, 2, and 3 each pass through fetch, decode, and execute over time. Each instruction starts one cycle after the previous one, so up to three instructions are in the pipeline at once.]
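A few lines of Python can tabulate the overlap shown above. This sketch assumes an ideal three-stage pipeline with one cycle per stage and no stalls. `pipeline_chart` and `STAGES` are illustrative names, not part of any ARM tool.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_chart(n_instructions: int) -> dict[int, list[tuple[int, str]]]:
    """Map each clock cycle (from 1) to the (instruction, stage) pairs active in it.

    Assumes an ideal 3-stage pipeline: instruction i (from 1) is fetched in
    cycle i and advances one stage per cycle with no stalls.
    """
    chart: dict[int, list[tuple[int, str]]] = {}
    for instr in range(1, n_instructions + 1):
        for offset, stage in enumerate(STAGES):
            chart.setdefault(instr + offset, []).append((instr, stage))
    return chart


if __name__ == "__main__":
    for cycle, work in sorted(pipeline_chart(3).items()):
        print(cycle, ", ".join(f"{i}:{s}" for i, s in work))
```

Three instructions complete in five cycles. Once the pipeline is full, one instruction finishes every cycle, even though each one takes three cycles from fetch to execute.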
ARM multi-cycle instruction pipeline operation
[Figure: the instruction sequence ADD, STR, ADD, ADD, ADD over time. The STR's execute stage takes two cycles (calc. addr., then data xfer), so the ADD instructions behind it are held back by one cycle.]
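A small scheduling model can reproduce the stall. It rests on simplifying assumptions made for illustration. Each stage holds one instruction, and an instruction advances only when the next stage is free. The execute stage stays occupied for a per-instruction number of cycles: two for the STR and one for each ADD. `schedule` is a hypothetical helper.

```python
def schedule(exec_cycles: list[int]) -> list[tuple[int, int, int]]:
    """Return (fetch, decode, first_execute) cycle numbers for each instruction.

    Simplified model (an assumption): each of the three stages holds one
    instruction, an instruction moves on only when the next stage is free,
    and instruction i occupies the execute stage for exec_cycles[i] cycles.
    """
    result: list[tuple[int, int, int]] = []
    for i in range(len(exec_cycles)):
        if i == 0:
            f, d, x = 1, 2, 3
        else:
            pf, pd, px = result[-1]
            plen = exec_cycles[i - 1]
            f = max(pf + 1, pd)          # fetch frees when predecessor moves to decode
            d = max(f + 1, px)           # decode frees when predecessor starts executing
            x = max(d + 1, px + plen)    # execute frees when predecessor finishes
        result.append((f, d, x))
    return result


if __name__ == "__main__":
    program = ["ADD", "STR", "ADD", "ADD", "ADD"]
    for name, (f, d, x) in zip(program, schedule([1, 2, 1, 1, 1])):
        print(f"{name}: fetch {f}, decode {d}, execute from {x}")
```

For ADD, STR, ADD, ADD, ADD, the execute stage starts in cycles 3, 4, 6, 7, and 8. The sequence therefore finishes one cycle later than it would if every instruction executed in a single cycle.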
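ARM9TDMI 5-stage pipeline organization
[Figure: the pipeline has five stages.
- Fetch: I-cache, next pc = pc + 4.
- Instruction decode and register read: r15 reads as pc + 8; immediate fields.
- Execute: shifter, ALU, and multiplier; address calculation with pre-index, post-index, and the LDM/STM +4 increment.
- Buffer/data: D-cache access, byte replication for stores, and rotate/sign-extend for loads.
- Write-back: register write.
Forwarding paths feed results back to the operand multiplexers. B, BL, MOV pc, SUBS pc, and LDR pc supply a new fetch address.]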
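The forwarding paths let most instructions use a result from the instruction immediately before them. The sketch below counts stall cycles under a deliberately simplified model. ALU results forward with no penalty. Data from a word load is only available after the buffer/data stage, so using it in the next instruction costs one cycle. The `Instr` record and `interlock_cycles` are hypothetical names, and the real ARM9TDMI interlock rules cover more cases than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    text: str                  # assembly text, for display only
    dest: Optional[str]        # register the instruction writes, if any
    srcs: tuple[str, ...]      # registers the instruction reads
    is_load: bool = False      # True for a word load such as LDR


def interlock_cycles(program: list[Instr]) -> int:
    """Count stall cycles for straight-line code under a simplified model.

    Assumptions (illustrative, not the full ARM9TDMI rules): an ALU result is
    forwarded to the next instruction with no stall, while loaded data only
    arrives after the buffer/data stage, so an instruction that uses it
    immediately waits one cycle.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.is_load and prev.dest is not None and prev.dest in cur.srcs:
            stalls += 1
    return stalls


if __name__ == "__main__":
    prog = [
        Instr("LDR r0, [r1]", "r0", ("r1",), is_load=True),
        Instr("ADD r2, r0, r3", "r2", ("r0", "r3")),  # uses r0 at once: one-cycle stall
        Instr("SUB r4, r2, #1", "r4", ("r2",)),       # ALU result forwarded: no stall
    ]
    print(interlock_cycles(prog))
```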
Data processing instruction datapath activity
[Figure: (a) register-register operations. Rn and Rm are read from the register bank, Rm passes through the barrel shifter, the ALU combines the operands, and the result is written to Rd. (b) register-immediate operations. The second operand comes from the instruction's immediate field [7:0] in the instruction pipe. In both cases the PC is incremented through the address register and incrementer.]
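In the register-immediate case, the ARM instruction set encodes the immediate as an 8-bit value in bits [7:0]. That value is rotated right by twice the 4-bit rotate field in bits [11:8]. A minimal decoder for this encoding follows; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def decode_dp_immediate(operand2: int) -> int:
    """Decode the 12-bit operand2 field of a data-processing immediate instruction.

    Bits [7:0] hold the immediate; bits [11:8] hold a rotate count that is
    doubled, so the rotation is always an even number of bits.
    """
    imm8 = operand2 & 0xFF
    rotate = (operand2 >> 8) & 0xF
    return ror32(imm8, 2 * rotate)


if __name__ == "__main__":
    print(hex(decode_dp_immediate(0x4FF)))
```

`decode_dp_immediate(0x4FF)` returns 0xFF000000, which is the byte 0xFF rotated right by 8 bits.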
STR (store register) datapath activity
[Figure: (a) 1st cycle, compute address. The ALU forms the transfer address from Rn and the 12-bit immediate offset [11:0] (shifter set to lsl #0) as A, A + B, or A - B. (b) 2nd cycle, store data and auto-index. Rd goes to the data out register, replicated across the bus for a byte store, while the ALU computes A + B or A - B for the base write-back.]
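The two STR cycles can be mirrored in a short address calculation. This sketch assumes a 12-bit immediate offset. It models only the arithmetic (pre-indexing, post-indexing, write-back, and byte replication on the data out path), not the bus timing. `str_address` and `replicate_byte` are illustrative names.

```python
MASK32 = 0xFFFFFFFF


def str_address(rn: int, offset: int, *, up: bool, pre_index: bool,
                write_back: bool) -> tuple[int, int]:
    """Return (transfer address, new base value) for an STR-style access.

    Cycle 1 forms the address: Rn +/- offset when pre-indexed, plain Rn when
    post-indexed. Cycle 2 computes Rn +/- offset for auto-indexing; in the
    ARM instruction set, post-indexed forms always write the base back.
    """
    offset &= 0xFFF                                   # 12-bit offset, bits [11:0]
    updated = (rn + offset if up else rn - offset) & MASK32
    address = updated if pre_index else rn & MASK32
    new_base = updated if (write_back or not pre_index) else rn & MASK32
    return address, new_base


def replicate_byte(value: int) -> int:
    """Copy the low byte of `value` into all four byte lanes of the data bus."""
    return (value & 0xFF) * 0x01010101


if __name__ == "__main__":
    print([hex(v) for v in str_address(0x1000, 4, up=True, pre_index=True, write_back=True)])
    print(hex(replicate_byte(0xAB)))
```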
The first two (of three) cycles of a branch instruction
[Figure: (a) 1st cycle, compute branch target. The 24-bit offset [23:0] from the instruction is shifted left two places (lsl #2) and added to the PC (= A + B). (b) 2nd cycle, save return address. The PC value passes through the ALU (= A) into R14.]
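As an architectural reference point for the first cycle, the target of an ARM B or BL instruction is computed as follows: take the branch's own address, add 8 (the PC value the pipeline supplies), then add the sign-extended 24-bit offset shifted left by two. The sketch also gives the value BL finally leaves in R14, which is the address of the following instruction. It does not model how the three cycles arrive at that value. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(instr_address: int, encoding: int) -> int:
    """Target of an ARM B/BL instruction located at `instr_address`.

    The PC reads as instr_address + 8, and the 24-bit offset field [23:0]
    is sign-extended and shifted left two bits (lsl #2).
    """
    offset = sign_extend(encoding & 0xFFFFFF, 24) << 2
    return (instr_address + 8 + offset) & MASK32


def bl_return_address(instr_address: int) -> int:
    """Value BL leaves in R14: the address of the next instruction."""
    return (instr_address + 4) & MASK32


if __name__ == "__main__":
    print(hex(branch_target(0x8000, 0xEAFFFFFE)))   # "B ." branches to itself
```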
2-phase non-overlapping clock scheme
[Figure: phase 1 and phase 2 pulses within one clock cycle. The two phases are never high at the same time.]
ARM datapath timing
[Figure: datapath timing over one clock cycle. Phase 1 covers register read time (read bus valid) and shift time (shift out valid), and the ALU operands are latched at its end. Phase 2 covers ALU time (ALU out) and register write time, while precharge invalidates the read buses.]
The original ARM1 ripple-carry adder circuit
[Figure: a one-bit full-adder cell with inputs A, B, and Cin and outputs sum and Cout.]
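A bit-level model shows why this adder is slow. The carry must pass through every cell in turn, so in the worst case (for example 0xFFFFFFFF + 1) it ripples through all 32 stages. The Python below is a behavioral sketch of that structure, not a circuit description, and the names are illustrative.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(a: int, b: int, cin: int = 0, width: int = 32) -> tuple[int, int]:
    """Add two width-bit numbers one bit at a time, passing the carry upward."""
    result, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_carry_add(0xFFFFFFFF, 1))   # carry ripples through every bit
```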
The ARM2 4-bit carry look-ahead scheme
[Figure: a 4-bit adder takes A[3:0], B[3:0], and Cin[0] and produces sum[3:0]. Look-ahead logic forms the group generate (G) and propagate (P) signals that give Cout[3] directly.]
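The look-ahead idea can be sketched by computing each 4-bit group's generate and propagate terms. The group carry-out is then Cout = G + P·Cin, so the carry crosses a group in one step instead of four. In this sketch the sum bits inside a group still use a short ripple, and bit propagate is taken as A xor B. Both are modeling choices, and the names are illustrative.

```python
def cla_group(a4: int, b4: int, cin: int) -> tuple[int, int, int, int]:
    """4-bit adder block: returns (sum[3:0], group G, group P, carry out)."""
    g = [((a4 >> i) & (b4 >> i)) & 1 for i in range(4)]   # bit generate
    p = [((a4 >> i) ^ (b4 >> i)) & 1 for i in range(4)]   # bit propagate
    s, carry = 0, cin
    for i in range(4):                                    # sum bits (short ripple)
        s |= (p[i] ^ carry) << i
        carry = g[i] | (p[i] & carry)
    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    group_p = p[0] & p[1] & p[2] & p[3]
    return s, group_g, group_p, group_g | (group_p & cin)  # look-ahead carry out


def cla_add32(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Chain eight 4-bit blocks; each block's carry out comes from its G and P."""
    result, carry = 0, cin
    for k in range(8):
        s, _g, _p, carry = cla_group((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF, carry)
        result |= s << (4 * k)
    return result, carry


if __name__ == "__main__":
    print(cla_add32(0xFFFFFFFF, 1))
```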
The ARM2 ALU logic for one result bit
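[Figure: the logic for one ALU result bit. Under the control of the function-select lines fs[5:0], the bit slice combines the NA bus and NB bus inputs into generate (G) and propagate (P) signals for the carry logic and drives the result onto the ALU bus.]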
ARM2 ALU function codes
fs5 fs4 fs3 fs2 fs1 fs0   ALU output
 0   0   0   1   0   0    A and B
 0   0   1   0   0   0    A and not B
 0   0   1   0   0   1    A xor B
 0   1   1   0   0   1    A plus not B plus carry
 0   1   0   1   1   0    A plus B plus carry
 1   1   0   1   1   0    not A plus B plus carry
 0   0   0   0   0   0    A
 0   0   0   0   0   1    A or B
 0   0   0   1   0   1    B
 0   0   1   0   1   0    not B
 0   0   1   1   0   0    zero
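The function-code table can be turned into a behavioral model that maps each 6-bit code to its operation on 32-bit operands. This sketch transcribes the table only and does not reproduce the gate-level bit slice. `ALU_FUNCTIONS` and `arm2_alu` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

# fs[5:0] code -> operation on (A, B, carry), transcribed from the table.
ALU_FUNCTIONS = {
    0b000100: lambda a, b, c: a & b,                  # A and B
    0b001000: lambda a, b, c: a & ~b,                 # A and not B
    0b001001: lambda a, b, c: a ^ b,                  # A xor B
    0b011001: lambda a, b, c: a + (~b & MASK32) + c,  # A plus not B plus carry
    0b010110: lambda a, b, c: a + b + c,              # A plus B plus carry
    0b110110: lambda a, b, c: (~a & MASK32) + b + c,  # not A plus B plus carry
    0b000000: lambda a, b, c: a,                      # A
    0b000001: lambda a, b, c: a | b,                  # A or B
    0b000101: lambda a, b, c: b,                      # B
    0b001010: lambda a, b, c: ~b,                     # not B
    0b001100: lambda a, b, c: 0,                      # zero
}


def arm2_alu(fs: int, a: int, b: int, carry: int = 0) -> int:
    """Apply the operation selected by fs to 32-bit A and B (result truncated to 32 bits)."""
    try:
        op = ALU_FUNCTIONS[fs]
    except KeyError:
        raise ValueError(f"fs={fs:06b} is not in the function-code table") from None
    return op(a & MASK32, b & MASK32, carry & 1) & MASK32


if __name__ == "__main__":
    print(arm2_alu(0b011001, 5, 3, 1))   # A - B as A plus not B plus carry(1)
```

Subtraction uses the "A plus not B plus carry" code with the carry set to 1, so `arm2_alu(0b011001, 5, 3, 1)` returns 2.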
The ARM6 carry-select adder scheme
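[Figure: sum[3:0] comes from an ordinary adder (+) on a,b[3:0] with carry-in c. Each higher group, up to a,b[31:28], computes both s and s+1 in parallel (+, +1). Multiplexers driven by the carry from below then select sum[7:4], sum[15:8], and sum[31:16].]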
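The scheme can be modeled group by group. The lowest group adds normally, and every higher group prepares both possible sums, so only a multiplexer choice waits for the incoming carry. The group widths (4, 4, 8, and 16 bits) follow the sum[3:0], sum[7:4], sum[15:8], and sum[31:16] labels. The code is a behavioral sketch with illustrative names.

```python
MASK32 = 0xFFFFFFFF

# (low bit, width) of each group, following sum[3:0], sum[7:4], sum[15:8], sum[31:16].
GROUPS = ((0, 4), (4, 4), (8, 8), (16, 16))


def carry_select_add(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """32-bit carry-select addition; returns (sum, carry_out)."""
    result, carry = 0, cin
    for low, width in GROUPS:
        mask = (1 << width) - 1
        x, y = (a >> low) & mask, (b >> low) & mask
        if low == 0:
            chosen = x + y + carry               # lowest group: ordinary adder with carry-in
        else:
            s, s_plus_1 = x + y, x + y + 1       # both candidates ready before the carry arrives
            chosen = s_plus_1 if carry else s    # the incoming carry only drives the mux
        result |= (chosen & mask) << low
        carry = chosen >> width                  # carry out of this group
    return result & MASK32, carry


if __name__ == "__main__":
    print(carry_select_add(0x0000FFFF, 1))
```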
The ARM6 ALU organization
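[Figure: the A and B operand latches feed XOR gates that can invert either operand (invert A, invert B). The adder (with C in) and the logic functions block feed a result mux under function and logic/arithmetic control. The flags come from the adder (C, V), the result (N), and zero detect (Z).]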
ARM9 carry arbitration encoding
A  B  C        u  v
0  0  0        0  0
0  1  unknown  1  0
1  0  unknown  1  0
1  1  1        1  1
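The encoding table implies u = A or B and v = A and B for each bit. The state 00 means the carry is 0, 11 means it is 1, and 10 means "unknown, take the carry from below". When a higher span is merged with a lower one, a known state is kept and an unknown state defers to the lower span. This merge is associative, so the carries can be resolved with a tree. The sketch uses a Kogge-Stone-style parallel prefix chosen here for illustration. It is not a description of the actual ARM9 carry tree, and the names are hypothetical.

```python
def encode(a_bit: int, b_bit: int) -> tuple[int, int]:
    """Carry-arbitration state (u, v) = (A or B, A and B) for one bit position."""
    return a_bit | b_bit, a_bit & b_bit


def combine(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Merge a higher span with the span below it: a known state wins, unknown defers."""
    (uh, vh), (ul, vl) = hi, lo
    return vh | (uh & ul), vh | (uh & vl)


def arbitration_add(a: int, b: int, cin: int = 0, width: int = 32) -> tuple[int, int]:
    """Add by resolving carry-arbitration states with a log-depth parallel prefix."""
    # Index 0 is the carry-in (a known state); index i + 1 is bit i.
    states = [(cin, cin)] + [encode((a >> i) & 1, (b >> i) & 1) for i in range(width)]
    step = 1
    while step < len(states):
        states = [states[k] if k < step else combine(states[k], states[k - step])
                  for k in range(len(states))]
        step *= 2
    # states[i] now covers the carry-in and bits 0..i-1, so its v is the carry into bit i.
    result = 0
    for i in range(width):
        result |= (((a >> i) ^ (b >> i) ^ states[i][1]) & 1) << i
    return result, states[width][1]


if __name__ == "__main__":
    print(arbitration_add(0xFFFFFFFF, 1))
```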
The cross-bar switch barrel shifter principle
[Figure: a 4-by-4 cross-bar with inputs in[0] to in[3] and outputs out[0] to out[3]. Each diagonal of switches selects one shift: no shift, left 1 to left 3, or right 1 to right 3.]
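The following is a small model of the cross-bar. A shift closes one diagonal of switches, and a rotate closes two (right by k and left by n - k). In this sketch, outputs with no closed switch read 0, which is what a logical shift needs. That behavior, the `kind` strings, and the function names are assumptions made for illustration.

```python
def crossbar_switches(n: int, kind: str, amount: int) -> list[list[bool]]:
    """Closed switches of an n x n cross-bar: sw[i][j] connects in[i] to out[j].

    kind is "lsl" (shift left), "lsr" (shift right) or "ror" (rotate right).
    """
    sw = [[False] * n for _ in range(n)]
    for i in range(n):
        if kind == "lsl" and i + amount < n:
            sw[i][i + amount] = True          # one diagonal
        elif kind == "lsr" and i - amount >= 0:
            sw[i][i - amount] = True          # one diagonal
        elif kind == "ror":
            sw[i][(i - amount) % n] = True    # right-k and left-(n-k) diagonals together
    return sw


def crossbar_shift(bits: list[int], kind: str, amount: int) -> list[int]:
    """bits[i] is in[i]; out[j] follows its connected input, or reads 0 if none."""
    n = len(bits)
    sw = crossbar_switches(n, kind, amount)
    return [int(any(sw[i][j] and bits[i] for i in range(n))) for j in range(n)]


if __name__ == "__main__":
    print(crossbar_shift([1, 0, 0, 0], "lsl", 1))
```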
The 2-bit multiplication algorithm, Nth cycle
Carry-in  Multiplier  Shift          ALU    Carry-out
0         x0          LSL #2N        A + 0  0
0         x1          LSL #2N        A + B  0
0         x2          LSL #(2N + 1)  A - B  1
0         x3          LSL #2N        A - B  1
1         x0          LSL #2N        A + B  0
1         x1          LSL #(2N + 1)  A + B  0
1         x2          LSL #2N        A - B  1
1         x3          LSL #2N        A + 0  1
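Running the table is a direct way to check it. Each cycle consumes two multiplier bits plus the carry from the previous cycle. It then adds, subtracts, or skips Rm shifted by 2N or 2N + 1 and passes a carry on. Because the contributions telescope, the 32-bit result equals Rm * Rs mod 2^32. The sketch below is a behavioral model with illustrative names (`TWO_BIT_TABLE`, `multiply_2bit`), and it always runs all 16 cycles.

```python
MASK32 = 0xFFFFFFFF

# (carry_in, multiplier bits) -> (extra shift for 2N + 1?, ALU operation, carry_out)
TWO_BIT_TABLE = {
    (0, 0): (False, "+0", 0),
    (0, 1): (False, "+B", 0),
    (0, 2): (True,  "-B", 1),
    (0, 3): (False, "-B", 1),
    (1, 0): (False, "+B", 0),
    (1, 1): (True,  "+B", 0),
    (1, 2): (False, "-B", 1),
    (1, 3): (False, "+0", 1),
}


def multiply_2bit(rm: int, rs: int, acc: int = 0) -> int:
    """Compute (rm * rs + acc) mod 2**32, consuming two bits of rs per cycle.

    In cycle N, A is the running total and B is Rm shifted left by 2N or 2N + 1.
    """
    a, carry = acc & MASK32, 0
    for n in range(16):                                   # 32 multiplier bits, 2 per cycle
        digit = (rs >> (2 * n)) & 0b11
        extra, op, carry = TWO_BIT_TABLE[(carry, digit)]  # uses the previous carry
        b = (rm << (2 * n + (1 if extra else 0))) & MASK32
        if op == "+B":
            a = (a + b) & MASK32
        elif op == "-B":
            a = (a - b) & MASK32
    return a


if __name__ == "__main__":
    print(multiply_2bit(3, 2))
```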
Carry-propagate (a) and carry-save (b) adder structures
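[Figure: (a) carry-propagate. Four full adders (inputs A, B, Cin; outputs Cout, S) are chained, and each Cout drives the next stage's Cin. (b) carry-save. The same four full adders each take an independent third input on Cin and keep their Cout as a separate output instead of passing it along.]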
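In the carry-save form each full adder works independently. Three numbers therefore reduce to a sum word and a carry word with no carry propagation, and only the final addition propagates carries. The following is a minimal sketch with illustrative names.

```python
MASK32 = 0xFFFFFFFF


def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three operands to a (sum, carry) pair with no carry propagation.

    Each bit position is an independent full adder; its carry out is kept in a
    separate word, shifted one place left to its proper weight.
    """
    s = (x ^ y ^ z) & MASK32
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK32
    return s, c


def add_many(values: list[int]) -> int:
    """Sum 32-bit values with carry-save adders and one final carry-propagate add."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return (s + c) & MASK32            # the only step where carries cross bit positions


if __name__ == "__main__":
    print(carry_save_add(5, 6, 7), add_many([1, 2, 3, 4]))
```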
ARM high-speed multiplier organization
[Figure: Rs from the registers is shifted right 8 bits per cycle to control the carry-save adders, which add in Rm. The partial sum and partial carry (with initialization for MLA) are rotated 8 bits per cycle, and the ALU adds the partials to form the result.]
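The following is a behavioral sketch of the same flow. Each cycle takes the next 8 bits of Rs and folds the selected partial products of Rm into a partial sum and partial carry with carry-save adders. A final add then combines the two. For MLA, the partial sum starts as the accumulator. As simplifying assumptions, the sketch shifts each partial product into place instead of rotating the partial sum and carry, and it uses plain (not Booth-encoded) partial products. The names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on 32-bit words."""
    s = (x ^ y ^ z) & MASK32
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK32
    return s, c


def fast_multiply(rm: int, rs: int, acc: int = 0) -> int:
    """Return (rm * rs + acc) mod 2**32, handling 8 bits of rs per cycle."""
    psum, pcarry = acc & MASK32, 0                 # MLA: partial sum starts as the accumulator
    for cycle in range(4):                         # 32 bits of Rs, 8 per cycle
        byte = (rs >> (8 * cycle)) & 0xFF          # the bits Rs >> 8 exposes this cycle
        for bit in range(8):
            if (byte >> bit) & 1:
                pp = (rm << (8 * cycle + bit)) & MASK32   # partial product, pre-shifted
                psum, pcarry = csa(psum, pcarry, pp)
    return (psum + pcarry) & MASK32                # final add of the partials in the ALU


if __name__ == "__main__":
    print(hex(fast_multiply(0x1234, 0x5678, 1)))
```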
ARM2 register cell circuit
[Figure: a register cell written from the ALU bus, with two read enables (read A, read B) that drive the A bus and B bus.]
ARM register bank floorplan
[Figure: register cells, including the PC, sit between the A bus and B bus read decoders and the write decoders, with Vdd and Vss supply rails. The ALU bus, PC bus, INC bus, A bus, and B bus run across the bank.]
ARM core datapath buses
[Figure: the datapath buses (PC, A, B, inc, W, and Ad) connect the address register and incrementer, the register bank, the multiplier, the shifter (shift out), and the ALU. Data in (Din) feeds the instruction pipe, and results leave through the data out register.]
ARM control logic structure
[Figure: the instruction and the coprocessor interface feed the decode PLA, which together with a cycle count drives the address control, register control, ALU control, shifter control, multiply control, and load/store multiple control blocks.]
