### Computer Architecture, Part 3 - University of California, Santa Barbara

```Part III
The Arithmetic/Logic Unit
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 1
This presentation is intended to support the use of the textbook
Computer Architecture: From Microprocessors to Supercomputers,
Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated
regularly by the author as part of his teaching of the upper-division
course ECE 154, Introduction to Computer Architecture, at the
University of California, Santa Barbara. Instructors can use these
slides freely in classroom teaching and for other educational
purposes. Any other use is strictly prohibited. © Behrooz Parhami
Edition
Released
Revised
Revised
Revised
Revised
First
July 2003
July 2004
July 2005
Mar. 2006
Jan. 2007
Jan. 2008
Jan. 2009
Jan. 2011
Oct. 2014
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 2
III The Arithmetic/Logic Unit
Overview of computer arithmetic and ALU design:
• Review representation methods for signed integers
• Discuss algorithms & hardware for arithmetic ops
• Consider floating-point representation & arithmetic
Topics in This Part
Chapter 9
Number Representation
Chapter 10 Adders and Simple ALUs
Chapter 11 Multipliers and Dividers
Chapter 12 Floating-Point Arithmetic
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 3
Preview of Arithmetic Unit in the Data Path
Incr PC
jta
Next PC
ALUOvfl
(PC)
PC
(rs)
rs
rt
Instr
cache
inst
0
1
2
rd
31
imm
op
Br&Jump
Instruction fetch
Fig. 13.3
Oct. 2014
Register
writeback
Ovfl
Reg
file
ALU
(rt)
/
16
ALU
out
Data
cache
Data
out
Data
in
Func
0
32
SE / 1
Data
0
1
2
Register input
fn
RegDst
RegWrite
Reg access / decode
ALUSrc
ALUFunc
ALU operation
RegInSrc
DataWrite
Data access
Key elements of the single-cycle MicroMIPS data path.
Computer Architecture, The Arithmetic/Logic Unit
Slide 4
Computer Arithmetic as a Topic of Study
Brief overview article –
Encyclopedia of Info Systems,
Vol. 3, pp. 317-333
Our textbook’s treatment
of the topic falls between
the extremes (4 chaps.)
ECE 252B – Text:
Computer Arithmetic,
Oxford U Press, 2000
(2nd ed., 2010)
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 5
9 Number Representation
Arguably the most important topic in computer arithmetic:
• Affects system compatibility and ease of arithmetic
• Two’s complement, flp, and unconventional methods
Topics in This Chapter
9.1 Positional Number Systems
9.2 Digit Sets and Encodings
9.4 Signed Integers
9.5 Fixed-Point Numbers
9.6 Floating-Point Numbers
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 6
9.1 Positional Number Systems
Representations of natural numbers {0, 1, 2, 3, …}
||||| ||||| ||||| ||||| ||||| ||
27
11011
XXVII
sticks or unary code
Roman numerals
Fixed-radix positional representation with k digits
k–1
Value of a number: x = (xk–1xk–2 . . . x1x0)r = S xi r i
i=0
For example:
27 = (11011)two = (124) + (123) + (022) + (121) + (120)
Number of digits for [0, P]: k = logr (P + 1) = logr P + 1
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 7
Unsigned Binary Integers
0000
1111
1110
15
0
0001
1
14
0010
2
1101
0011
13
1100
12
1011
Turn x notches
counterclockwise
3
Inside: Natural number
Outside: 4-bit encoding
11
4
5
10
1010
0100
0101
12
11
10
15
1
2
0
3
4
5
9 8 7
6
6
9
1001
14
13
8
1000
7
0110
0111
Turn y notches
clockwise
to subtract y
Figure 9.1 Schematic representation of 4-bit code for
integers in [0, 15].
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 8
Representation Range and Overflow

Overflow region max
max  Overflow region
Numbers smaller
than max 
Numbers larger
than max 
Finite set of representable numbers
Figure 9.2 Overflow regions in finite number representation systems.
For unsigned representations covered in this section, max – = 0.
Example 9.2, Part d
Discuss if overflow will occur when computing 317 – 316 in a number
system with k = 8 digits in radix r = 10.
Solution
The result 86 093 442 is representable in the number system which
has a range [0, 99 999 999]; however, if 317 is computed en route to
the final result, overflow will occur.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 9
9.2 Digit Sets and Encodings
Conventional and unconventional digit sets
 Decimal digits in [0, 9]; 4-bit BCD, 8-bit ASCII
 Hexadecimal, or hex for short: digits 0-9 & a-f
 Conventional ternary digit set in [0, 2]
Conventional digit set for radix r is [0, r – 1]
Symmetric ternary digit set in [–1, 1]
 Conventional binary digit set in [0, 1]
Redundant digit set [0, 2], encoded in 2 bits
( 0 2 1 1 0 )two and ( 1 0 1 0 2 )two represent 22
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 10
Carry-Save Numbers
Radix-2 numbers using the digits 0, 1, and 2
Example: (1 0 2 1)two = (123) + (022) + (221) + (120) = 13
Possible encodings
(a) Binary
(b) Unary
0
1
2
0
1
1
2
MSB
LSB
Oct. 2014
00
01
10
11 (Unused)
1 0 2 1
0 0 1 0 = 2
1 0 0 1 = 9
00
01 (First alternate)
10 (Second alternate)
11
First bit
Second bit
Computer Architecture, The Arithmetic/Logic Unit
1 0 2 1
0 0 1 1 = 3
1 0 1 0 = 10
Slide 11
Digit-set combination: {0, 1, 2} + {0, 1} = {0, 1, 2, 3} = {0, 2} + {0, 1}
This bit
being 1
represents
overflow
(ignore it)
Carry-save
input
Carry-save
Two
carry-save
inputs
Binary input
Carry-save
0
output
0
Carry-save
0
Figure 9.3 Adding a binary number or another
carry-save number to a carry-save number.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 12
Two ways to convert numbers from an old radix r to a new radix R
 Perform arithmetic in the new radix R
Horner’s rule:
(xk–1xk–2 . . . x1x0)r = (…((0 + xk–1)r + xk–2)r + . . . + x1)r + x0
(1 0 1 1 0 1 0 1)two = 0 + 1  1  2 + 0  2  2 + 1  5  2 + 1 
11  2 + 0  22  2 + 1  45  2 + 0  90  2 + 1  181
 Perform arithmetic in the old radix r
Divide the number by R, use the remainder as the LSD
and the quotient to repeat the process
19 / 3  rem 1, quo 6 / 3  rem 0, quo 2 / 3  rem 2, quo 0
Thus, 19 = (2 0 1)three
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 13
( xk 1xk 2
x0 )r  xk 1r
k 1
 xk 2r
k 2

 x1r  x0
 x0  r ( x1  r ( x2  r ( )))
Justifying Horner’s rule.
x
Binary representation of x/2
Figure 9.4
Oct. 2014
0
x mod 2
Justifying one step of the conversion of x to radix 2.
Computer Architecture, The Arithmetic/Logic Unit
Slide 14
9.4 Signed Integers
 We dealt with representing the natural numbers
 Signed or directed whole numbers = integers
{ . . . , 3, 2, 1, 0, 1, 2, 3, . . . }
 Signed-magnitude representation
+27 in 8-bit signed-magnitude binary code 0 0011011
–27 in 8-bit signed-magnitude binary code 1 0011011
–27 in 2-digit decimal code with BCD digits 1 0010 0111
 Biased representation
Represent the interval of numbers [N, P] by the unsigned
interval [0, P + N]; i.e., by adding N to every number
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 15
Two’s-Complement Representation
With k bits, numbers in the range [–2k–1, 2k–1 – 1] represented.
Negation is performed by inverting all bits and adding 1.
0000
1111
–1
1110
+0
0001
+1
0010
–2
+2
1101
0011
–3
1011
+
_
–4
1100
Turn x notches
counterclockwise
–5
+3
+4
+5
–6
1001
0100
–4
–5
–6
0101
–1
1
2
0
3
4
5
–7 –8 7
6
+6
–7
1010
–2
–3
–8
1000
+7
0110
0111
Turn 16 – y notches
counterclockwise to
Figure 9.5 Schematic representation of 4-bit 2’s-complement
code for integers in [–8, +7].
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 16
Conversion from 2’s-Complement to Decimal
Example 9.7
Convert x = (1 0 1 1 0 1 0 1)2’s-compl to decimal.
Solution
Given that x is negative, one could change its sign and evaluate –x.
Shortcut: Use Horner’s rule, but take the MSB as negative
–1  2 + 0  –2  2 + 1  –3  2 + 1  –5  2 + 0  –10  2 + 1
 –19  2 + 0  –38  2 + 1  –75
Sign Change for a 2’s-Complement Number
Example 9.8
Given y = (1 0 1 1 0 1 0 1)2’s-compl, find the representation of –y.
Solution
–y = (0 1 0 0 1 0 1 0) + 1 = (0 1 0 0 1 0 1 1)2’s-compl
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
(i.e., 75)
Slide 17
x
k
/
c in
y
k
/
k
/
k
/
xy
c out
y or
y
Figure 9.6
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 18
9.5 Fixed-Point Numbers
Positional representation: k whole and l fractional digits
Value of a number: x = (xk–1xk–2 . . . x1x0 . x–1x–2 . . . x–l )r = S xi r i
For example:
2.375 = (10.011)two = (121) + (020) + (021) + (122) + (123)
Numbers in the range [0, rk – ulp] representable, where ulp = r –l
Fixed-point arithmetic same as integer arithmetic
Two’s complement properties (including sign change) hold here as well:
(01.011)2’s-compl = (–021) + (120) + (02–1) + (12–2) + (12–3) = +1.375
(11.011)2’s-compl = (–121) + (120) + (02–1) + (12–2) + (12–3) = –0.625
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 19
Fixed-Point 2’s-Complement Numbers
1.111
–.125
1.110
0.000
+0
–.25
0.001
+.125
0.010
+.25
1.101
0.011
–.375
1.011
+
_
–.5
1.100
+.5
0.100
+.625
–.625
0.101
+.75
–.75
1.010
+.375
–.875
1.001
–1
1.000
+.875
0.110
0.111
Figure 9.7 Schematic representation of 4-bit 2’s-complement
encoding for (1 + 3)-bit fixed-point numbers in the range [–1, +7/8].
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 20
Convert the whole and fractional parts separately.
To convert the fractional part from an old radix r to a new radix R:
 Perform arithmetic in the new radix R
Evaluate a polynomial in r –1: (.011)two = 0  2–1 + 1  2–2 + 1  2–3
Simpler: View the fractional part as integer, convert, divide by r l
(.011)two = (?)ten
Multiply by 8 to make the number an integer: (011)two = (3)ten
Thus, (.011)two = (3 / 8)ten = (.375)ten
 Perform arithmetic in the old radix r
Multiply the given fraction by R, use the whole part as the MSD
and the fractional part to repeat the process
(.72)ten = (?)two
0.72  2 = 1.44, so the answer begins with 0.1
0.44  2 = 0.88, so the answer begins with 0.10
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 21
9.6 Floating-Point Numbers
Useful for applications where very large and very small
numbers are needed simultaneously
 Fixed-point representation must sacrifice precision
for small values to represent large values
x = (0000 0000 . 0000 1001)two
Small number
y = (1001 0000 . 0000 0000)two
Large number
 Neither y2 nor y / x is representable in the format above
 Floating-point representation is like scientific notation:
20 000 000 = 2  10 7
+0.000 000 007 = +7  10–9
Sign
Oct. 2014
Significand
Exponent base
Exponent
Computer Architecture, The Arithmetic/Logic Unit
Also, 7E9
Slide 22
ANSI/IEEE Standard Floating-Point Format (IEEE 754)
Revision (IEEE 754R) was completed in 2008: The revised version includes
16-bit and 128-bit binary formats, as well as 64- and 128-bit decimal formats
Short (32-bit) format
8 bits,
bias = 127,
–126 to 127
23 bits for fractional part
(plus hidden 1 in integer part)
Sign Exponent
11 bits,
bias = 1023,
–1022 to 1023
Short exponent range is –127 to 128
but the two extreme values
are reserved for special operands
(similarly for the long format)
Significand
52 bits for fractional part
(plus hidden 1 in integer part)
Long (64-bit) format
Figure 9.8
Oct. 2014
The two ANSI/IEEE standard floating-point formats.
Computer Architecture, The Arithmetic/Logic Unit
Slide 23
Short and Long IEEE 754 Formats: Features
Table 9.1 Some features of ANSI/IEEE standard floating-point formats
Feature
Word width in bits
Significand in bits
Significand range
Exponent bits
Exponent bias
Zero (±0)
Denormal
Single/Short
32
23 + 1 hidden
[1, 2 – 2–23]
8
127
e + bias = 0, f = 0
e + bias = 0, f ≠ 0
represents ±0.f  2–126
e + bias = 255, f = 0
e + bias = 255, f ≠ 0
e + bias  [1, 254]
e  [–126, 127]
represents 1.f  2e
Double/Long
64
52 + 1 hidden
[1, 2 – 2–52]
11
1023
e + bias = 0, f = 0
e + bias = 0, f ≠ 0
represents ±0.f  2–1022
e + bias = 2047, f = 0
e + bias = 2047, f ≠ 0
e + bias  [1, 2046]
e  [–1022, 1023]
represents 1.f  2e
min
2–126  1.2  10–38
2–1022  2.2  10–308
max
 2128  3.4  1038
 21024  1.8  10308
Subnormal
Infinity (∞)
Not-a-number (NaN)
Ordinary number
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 24
Addition is the most important arith operation in computers:
• Even the simplest computers must have an adder
• An adder, plus a little extra logic, forms a simple ALU
Topics in This Chapter
10.2 Carry Propagation Networks
10.3 Counting and Incrementation
10.5 Logic and Shift Operations
10.6 Multifunction ALUs
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 25
Inputs
x
Outputs
y
0
0
1
1
c
0
1
0
1
c
0
1
1
0
Digit-set interpretation:
{0, 1} + {0, 1}
= {0, 2} + {0, 1}
HA
s
Outputs
x
y
cin
cout
s
0
0
0
0
1
1
1
1
0
0
1
1
0
0
1
1
0
1
0
1
0
1
0
1
0
0
0
1
0
1
1
1
0
1
1
0
1
0
0
1
Figures 10.1/10.2
Oct. 2014
y
s
0
0
0
1
Inputs
x
x
cout
y
FA
cin
Digit-set interpretation:
{0, 1} + {0, 1} + {0, 1}
= {0, 2} + {0, 1}
s
Computer Architecture, The Arithmetic/Logic Unit
Slide 26
x
y
HA
c out
HA
c in
x
y
c out
s
(a) FA built of two HAs
x
y
c out
0
1
2
3
0
0
1
2
3
1
c in
s
(b) CMOS mux-based FA
c in
s
(c) Two-level AND-OR FA
of two 4-input multiplexers, and as two-level gate network.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 27
x31
y31
c32
cout
c31
FA
Oct. 2014
. . .
y1
c2
x0
y0
c1
c0
FA
FA
s1
s0
Critical path
s31
Figure 10.4
x1
cin
Ripple-carry binary adder with 32-bit inputs and output.
Computer Architecture, The Arithmetic/Logic Unit
Slide 28
Carry Chains and Auxiliary Signals
15 14 13 12
----------1 0 1 1
Bit positions
11 10 9 8
----------0 1 1 0
7 6 5 4
----------0 1 1 0
cout 0 1 0 1
1 0 0 1
1 1
\__________/\__________________/
4
6
g = xy
Oct. 2014
0
3 2 1 0
----------1 1 1 0
0
0 0 1 1 cin
\________/\____/
3
2
Carry chains and their lengths
Computer Architecture, The Arithmetic/Logic Unit
p=xy
Slide 29
Carry Chains Illustrated with Dominoes
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 30
10.2 Carry Propagation Networks
gi pi
Carry is:
0
0
1
1
annihilated or killed
propagated
generated
(impossible)
0
1
0
1
g k1 p k1
xi
g k2 p k2
yi
gi = xi yi
pi = xi  yi
g i+1 p i+1
gi
pi
...
...
g1 p1
g0 p0
c0
Carry network
ck
c k1
...
c k2
ci
c i+1
...
c1
c0
si
Figure 10.5 The main part of an adder is the carry network. The rest
is just a set of gates to produce the g and p signals and the sum bits.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 31
The carry recurrence: ci+1 = gi  pi ci
Latency of k-bit adder is roughly 2k gate delays:
1 gate delay for production of p and g signals, plus
2(k – 1) gate delays for carry propagation, plus
1 XOR gate delay for generation of the sum bits
gk1 pk1
gk2 pk2
g1
p1
...
ck
ck1
Figure 10.6
Oct. 2014
ck2
c2
c1
g0
p0
c0
The carry propagation network of a ripple-carry adder.
Computer Architecture, The Arithmetic/Logic Unit
Slide 32
The Complete Design of a Ripple-Carry Adder
gi pi
Carry is:
0
0
1
1
annihilated or killed
propagated
generated
(impossible)
0
1
0
1
g k1 p k1
gk1 pk1
ck
ck
xi
g k2 p k2
c k1
gi = xi yi
pi = xi  yi
g i+1 p i+1
gi
pi
...
...
gk2 pk2
g1
...
ck1
yi
ck2
c2
p1
g0
gk2 pk2
c
Carry network
c
c
c 1 ck
...
c k2
p0 pk1
gk1
ci
c i+1
g1
p1
...
0
k1
g1 p1
k2
c2
c1
...
c1
g0 p0
g0
c0
p0
c0
c0
si
Figure 10.6 (ripple-carry network) superimposed on
Figure 10.5 (general structure of an adder).
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 33
First Carry Speed-Up Method: Carry Skip
g4j+3 p4j+3
c4j+4
c4j+3
g4j+2 p4j+2
c4j+2
g4j+1
p4j+1
c4j+1
g4j
p4j
c4j
One-way street
Freeway
Figures 10.7/10.8
A 4-bit section of a ripple-carry network
with skip paths and the driving analogy.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 34
Mux-Based Skip Carry Logic
g4j+3 p4j+3
g4j+2 p4j+2
g4j+1
p4j+1
g4j
p4j
Fig. 10.7
c4j+4
c4j+3
g4j+3 p4j+3
c4j+4
0
1c4j+4
p[4j, 4j+3]
c4j+3
c4j+2
g4j+2 p4j+2
c4j+1
g4j+1
c4j+2
p4j+1
c4j
g4j
p4j
c4j+1
c4j
The carry-skip adder of Fig. 10.7 works fine if we begin with a
clean slate, where all signals are 0s; otherwise, it will run into
problems, which do not exist in this mux-based implementation
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 35
10.3 Counting and Incrementation
k
/
k
/
Data in
IncrInit
0
k
/
Count
register
D
Q
1
k
/
c in
_
C
Q
k
Update
a
(Increment
amount)
Figure 10.9
Oct. 2014
c out
Schematic diagram of an initializable synchronous counter.
Computer Architecture, The Arithmetic/Logic Unit
Slide 36
Circuit for Incrementation by 1
Figure 10.6
Substantially
simpler than
gk1 pk1
ck
xk1
ck
0g1
gk2 pk2
sk1
x1
0g0
p0
x0
...
ck1
ck2
c2
xk2
ck1
p1
x1
ck2
sk2
...
c2
s2
c0
c1
1
x0
c1
s1
s0
Figure 10.10 Carry propagation network
and sum logic for an incrementer.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 37
 Carries can be computed directly without propagation
 For example, by unrolling the equation for c3, we get:
c 3 = g2  p2 c 2 = g2  p2 g1  p2 p1 g0  p2 p1 p0 c 0
 We define “generate” and “propagate” signals for a block
extending from bit position a to bit position b as follows:
g[a,b] = gb  pb gb–1  pb pb–1 gb–2  . . .  pb pb–1 … pa+1 ga
p[a,b] = pb pb–1 . . . pa+1 pa
 Combining g and p signals for adjacent blocks:
g[h,j] = g[i+1,j]  p[i+1,j] g[h,i]
p[h,j] = p[i+1,j] p[h,i]
j
i+1 i
h
[h, j] = [i + 1, j] ¢ [h, i]
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 38
Carries as Generate Signals for Blocks [ 0, i ]
gi pi
Carry is:
0
0
1
1
annihilated or killed
propagated
generated
(impossible)
0
1
0
1
g k1 p k1
xi
g k2 p k2
yi
Assuming c0 = 0,
we have ci = g [0,i –1]
g i+1 p i+1
gi
pi
...
...
g1 p1
g0 p0
c0
Carry network
ck
c k1
Figure 10.5
Oct. 2014
...
c k2
ci
c i+1
...
c1
c0
si
Computer Architecture, The Arithmetic/Logic Unit
Slide 39
Second Carry Speed-Up Method: Carry Lookahead
[7, 7 ]
[6, 6 ]
¢
[5, 5 ]
[4, 4 ] [3, 3 ]
¢
[2, 2 ]
[1, 1 ]
¢
[6, 7 ]
[0, 0 ]
g [1, 1] p [1, 1]
g [0, 0]
p [0, 0]
¢
[2, 3 ]
[4, 5 ]
[0, 1 ]
¢
¢
[4, 7 ]
[0, 3 ]
¢
¢
¢
[0, 7 ]
[0, 6 ]
¢
[0, 5 ]
[0, 4 ] [0, 3 ]
¢
[0, 2 ]
g [0, 1] p [0, 1]
[0, 1 ]
[0, 0 ]
along with details of one of the carry operator blocks.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 40
Recursive Structure of Brent-Kung Carry Network
[7, 7]
[6, 6]
[5, 5]
[4, 4]
[3, 3]
[2, 2]
[1, 1]
[0, 0]
[7, 7 ]
¢
¢
¢
¢
[6, 6 ]
¢
[5, 5 ]
[4, 4 ] [3, 3 ]
¢
[2, 2 ]
¢
[6, 7 ]
[2, 3 ]
[0, 1 ]
¢
¢
[4, 7 ]
[0, 3 ]
¢
¢
¢
¢
¢
[0, 6]
[0, 5]
[0, 4]
¢
¢
¢
[0, 7 ]
[0, 7]
[0, 0 ]
¢
[4, 5 ]
4-input Brent-Kung
carry network
[1, 1 ]
[0, 3]
[0, 2]
[0, 1]
[0, 6 ]
[0, 5 ]
[0, 4 ] [0, 3 ]
[0, 2 ]
[0, 1 ]
[0, 0 ]
[0, 0]
with only its top and bottom rows of carry-operators shown.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 41
An Alternate Design: Kogge-Stone Network
¢
¢
¢
¢
¢
¢
¢
¢
¢
¢
¢
¢
¢
¢
¢
¢
c7 = g [0,6]
c8 = g [0,7]
c5 = g [0,4]
c6 = g [0,5]
¢
c3 = g [0,2]
c4 = g [0,3]
c1 = g [0,0]
c2 = g [0,1]
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 42
Brent-Kung vs. Kogge-Stone Carry Network
[7, 7 ]
[6, 6 ]
[5, 5 ]
¢
[4, 4 ]
¢
[3, 3 ]
[2, 2 ]
[1, 1 ]
¢
[6, 7 ]
[0, 0 ]
g [1, 1] p [1, 1]
g [0, 0]
p [0, 0]
¢
[2, 3 ]
[4, 5 ]
[0, 1 ]
¢
¢
[4, 7 ]
[0, 3 ]
¢
¢
¢
[0, 7 ]
[0, 6 ]
¢
[0, 5 ]
[0, 4 ]
¢
[0, 3 ]
[0, 2 ]
g [0, 1] p [0, 1]
[0, 1 ]
11 carry operators
4 levels
Oct. 2014
[0, 0 ]
17 carry operators
3 levels
Computer Architecture, The Arithmetic/Logic Unit
Slide 43
p i+3
g i+3
p i+2
g i+2
p i+1 g i+1 p i
gi
g [i, i+3]
Intermeidte carries
p [i, i+3]
Block signal generation
ci
c i+3
c i+2
c i+1
Figure 10.13
with four-way grouping of bits.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 44
Third Carry Speed-Up Method: Carry Select
x
c out
c in
Version 0
of sum bits
c out
0
0
1
s
Figure 10.14
Oct. 2014
c
y
[a, b]
[a, b]
c in
Version 1
of sum bits
a
[a, b]
1
The lower
a positions,
(0 to a – 1)
as usual
Computer Architecture, The Arithmetic/Logic Unit
Slide 45
10.5 Logic and Shift Operations
Conceptually, shifts can be implemented by multiplexing
Right-shifted
values
Left-shifted
values
00...0, x
Right’Left
Shift amount
5
00, x[30, 2]
0, x[31, 1]
x[31, 0]
32
32 32
0
1
6-bit code specifying
shift direction & amount
Oct. 2014
x[30, 0], 0
x[31, 0]
. . .
32
32
31
32
32
33
. . .
32
62
32
63
Multiplexer
6
Figure 10.15
2
x, 00...0
x[1, 0], 00...0
32
Multiplexer-based logical shifting unit.
Computer Architecture, The Arithmetic/Logic Unit
Slide 46
Arithmetic Shifts
Purpose: Multiplication and division by powers of 2
sra \$t0,\$s1,2
srav \$t0,\$s1,\$s0
op
31
R
25
20
rt
15
rd
10
sh
5
fn
0
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1
ALU
instruction
op
31
R
rs
# \$t0  (\$s1) right-shifted by 2
# \$t0  (\$s1) right-shifted by (\$s0)
Unused
25
rs
Source
register
20
rt
Destination
register
15
rd
Shift
amount
10
sh
sra = 3
5
fn
0
0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1
ALU
instruction
Figure 10.16
Oct. 2014
Amount
register
Source
register
Destination
register
Unused
srav = 7
The two arithmetic shift instructions of MiniMIPS.
Computer Architecture, The Arithmetic/Logic Unit
Slide 47
Practical Shifting in Multiple Stages
00
01
10
11
No shift
Logical left
Logical right
Arith right
0
x, x[31, 1]
0, x[31, 1]
x[30, 0], 0
x[31, 0]
32
32
32
32
0
2
1
2
2
2
3
y[31, 0]
0
1
2
(0 or 2)-bit shift
3
2
z[31, 0]
3
Multiplexer
0
32
2
(a) Single-bit shifter
Figure 10.17
Oct. 2014
1
(0 or 4)-bit shift
1
2
(0 or 1)-bit shift
3
(b) Shifting by up to 7 bits
Multistage shifting in a barrel shifter.
Computer Architecture, The Arithmetic/Logic Unit
Slide 48
Bit Manipulation via Shifts and Logical Operations
Bits 10-15
AND with mask to isolate a field: 0000 0000 0000 0000 1111 1100 0000 0000
Right-shift by 10 positions to move field to the right end of word
The result word ranges from 0 to 63, depending on the field pattern
32-pixel (4  8) block of
black-and-white image:
Row 0
Row 1
Row 2
Row 3
Representation
as 32-bit word:
1010 0000 0101 1000 0000 0110 0001 0111
Hex equivalent:
0xa0a80617
Figure 10.18 A 4  8 block of a black-and-white
image represented as a 32-bit word.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 49
10.6 Multifunction ALUs
Arith fn (add, sub, . . .)
Operand 1
Arith
unit
0
Result
Operand 2
1
Logic
unit
Select fn type
(logic or arith)
Logic fn (AND, OR, . . .)
General structure of a simple arithmetic/logic unit.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 50
ConstVar
Shift function
Constant
5
amount
0
Amount
5
1
5
Variable
amount
2
00
01
10
11
No shift
Logical left
Logical right
Arith right
Shifter
Function
class
32
5 LSBs
00
01
10
11
An ALU for
MiniMIPS
Shift
Set less
Arithmetic
Logic
2
Shifted y
0
x
y
0 or 1
c0
32
32
k
/
c 31
xy
s
MSB
32
2
32
Shorthand
symbol
for ALU
1
Control
c 32
3
x
Func
s
ALU
Logic
unit
AND
OR
XOR
NOR
00
01
10
11
Ovfl
y
32input
NOR
Zero
2
Logic function
Zero
Ovfl
Figure 10.19 A multifunction ALU with 8 control signals (2 for
function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 51
11 Multipliers and Dividers
Modern processors perform many multiplications & divisions:
• Encryption, image compression, graphic rendering
• Hardware vs programmed shift-add/sub algorithms
Topics in This Chapter
11.2 Hardware Multipliers
11.3 Programmed Multiplication
11.4 Shift-Subtract Division
11.5 Hardware Dividers
11.6 Programmed Division
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 52
Multiplicand
Multiplier
x
y
Partial
products
bit-matrix
y0
y
1
y2
y3
Product
z
Figure 11.1
x
x
x
x
20
21
22
23
Multiplication of 4-bit numbers in dot notation.
z(j+1) = (z(j) + yj x 2k) 2–1 with z(0) = 0 and z(k) = z
|–– shift right ––|
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 53
Binary and Decimal Multiplication
Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
0 0 1 1
=========================
z (0)
0 0 0 0
4
+y0x2
1 0 1 0
––––––––––––––––––––––––––
2z (1)
0 1 0 1 0
(1)
z
0 1 0 1 0
4
+y1x2
1 0 1 0
––––––––––––––––––––––––––
2z (2)
0 1 1 1 1 0
(2)
z
0 1 1 1 1 0
4
+y2x2
0 0 0 0
––––––––––––––––––––––––––
2z (3)
0 0 1 1 1 1 0
(3)
z
0 0 1 1 1 1 0
4
+y3x2
0 0 0 0
––––––––––––––––––––––––––
2z (4)
0 0 0 1 1 1 1 0
(4)
z
0 0 0 1 1 1 1 0
=========================
Figure 11.2
Oct. 2014
Example 11.1
Position
7 6 5 4 3 2 1 0
=========================
x104
3 5 2 8
y
4 0 6 7
=========================
z (0)
0 0 0 0
4
+y0x10 2 4 6 9 6
––––––––––––––––––––––––––
10z (1)
2 4 6 9 6
(1)
z
0 2 4 6 9 6
4
+y1x10 2 1 1 6 8
––––––––––––––––––––––––––
10z (2)
2 3 6 3 7 6
(2)
z
2 3 6 3 7 6
4
+y2x10 0 0 0 0 0
––––––––––––––––––––––––––
10z (3)
0 2 3 6 3 7 6
(3)
z
0 2 3 6 3 7 6
4
+y3x10 1 4 1 1 2
––––––––––––––––––––––––––
10z (4)
1 4 3 4 8 3 7 6
(4)
z
1 4 3 4 8 3 7 6
=========================
Step-by-step multiplication examples for 4-digit unsigned numbers.
Computer Architecture, The Arithmetic/Logic Unit
Slide 54
Two’s-Complement Multiplication
Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
0 0 1 1
=========================
z (0)
0 0 0 0 0
4
+y0x2
1 1 0 1 0
––––––––––––––––––––––––––
2z (1)
1 1 0 1 0
(1)
z
1 1 1 0 1 0
4
+y1x2
1 1 0 1 0
––––––––––––––––––––––––––
2z (2)
1 0 1 1 1 0
(2)
z
1 1 0 1 1 1 0
4
+y2x2
0 0 0 0 0
––––––––––––––––––––––––––
2z (3)
1 1 0 1 1 1 0
(3)
z
1 1 1 0 1 1 1 0
4
+(–y3x2 ) 0 0 0 0 0
––––––––––––––––––––––––––
2z (4)
1 1 1 0 1 1 1 0
(4)
z
1 1 1 0 1 1 1 0
=========================
Figure 11.3
Oct. 2014
Example 11.2
Position
7 6 5 4 3 2 1 0
=========================
x24
1 0 1 0
y
1 0 1 1
=========================
z (0)
0 0 0 0 0
4
+y0x2
1 1 0 1 0
––––––––––––––––––––––––––
2z (1)
1 1 0 1 0
(1)
z
1 1 1 0 1 0
4
+y1x2
1 1 0 1 0
––––––––––––––––––––––––––
2z (2)
1 0 1 1 1 0
(2)
z
1 1 0 1 1 1 0
4
+y2x2
0 0 0 0 0
––––––––––––––––––––––––––
2z (3)
1 1 0 1 1 1 0
(3)
z
1 1 1 0 1 1 1 0
4
+(–y3x2 ) 0 0 1 1 0
––––––––––––––––––––––––––
2z (4)
0 0 0 1 1 1 1 0
(4)
z
0 0 0 1 1 1 1 0
=========================
Step-by-step multiplication examples for 2’s-complement numbers.
Computer Architecture, The Arithmetic/Logic Unit
Slide 55
11.2 Hardware Multipliers
Shift
Multiplier y
Doublewidth partial product z (j)
Hi
Shift
Lo
Multiplicand x
0
Mux
1
yj
Enable
Select
c out
Figure 11.4
Oct. 2014
c in
Hardware multiplier based on the shift-add algorithm.
Computer Architecture, The Arithmetic/Logic Unit
Slide 56
Sum
cout
/k
Partial product
/k–1
Multiplier
/k–1
/k
yj
Figure11.5
Shifting incorporated in the connections to
the partial product register rather than as a separate phase.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 57
Multiplicand
Multiplier
x
y
0, x, 2x, or 3x
Product
z
z(j+1) = (z(j) + yj x 2k) 4–1 with z(0) = 0 and z(k/2) = z
Assume k even
|–– shift right ––|
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 58
Tree Multipliers
All partial products
Several partial products
...
...
Large tree of
carry-save
Logdepth
Logdepth
Small tree of
carry-save
Product
Product
(a) Full-tree multiplier
Figure 11.6
Oct. 2014
(b) Partial-tree multiplier
Schematic diagram for full/partial-tree multipliers.
Computer Architecture, The Arithmetic/Logic Unit
Slide 59
Array Multipliers
x3 0
x2 0
x1 0
0
0 Carry-save
0
0
Carry-save
input
bit
g1
ents
ow
e it)
MA
Binary
0 input
MA
s
Carry-save
MA
0
output
0
Figure 9.3a
(Recalling
carry-save
MA
MA
Two
carry-save
inputs
y0
c
MA
MA
0
MA
MA
x0 0
MA
MA
MA
MA
MA
y1
y2
z0
0
Carry-save Our original
z1
0
y3 z2
MA
MA
0
z3
FA
Figure 11.7
Oct. 2014
representing
multiplication
FA
FA
HA
z7
z6
z5
Straightened
dots to depict
array multiplier
to the left
z4
Array multiplier for 4-bit unsigned operands.
Computer Architecture, The Arithmetic/Logic Unit
Slide 60
11.3 Programmed Multiplication
MiniMIPS instructions related to multiplication
mult
multu
mfhi
mflo
\$s0,\$s1
\$s2,\$s3
\$t0
\$t1
#
#
#
#
set
set
set
set
Hi,Lo to (\$s0)(\$s1); signed
Hi,Lo to (\$s2)(\$s3); unsigned
\$t0 to (Hi)
\$t1 to (Lo)
Example 11.3
Finding the 32-bit product of 32-bit integers in MiniMIPS
Multiply; result will be obtained in Hi,Lo
For unsigned multiplication:
Hi should be all-0s and Lo holds the 32-bit result
For signed multiplication:
Hi should be all-0s or all-1s, depending on the sign bit of Lo
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 61
Emulating a Hardware Multiplier in Software
Example 11.4 (MiniMIPS shift-add program for multiplication)
Shift
\$a1
(multiplier
Multiplier
y y)
\$v0Doublewidth
(Hi part of z)partial
\$v1product
(Lo part zof(j)z)
\$t0 (carry-out)
\$a0Multiplicand
(multiplicandx x)
0
Also, holds
LSB of Hi
during shift
Shift
Mux
1
\$t1 (bit j of y)
yj
Enable
Select
\$t2 (counter)
c out
c in
Part of the
control in
hardware
Figure 11.8 Register usage for programmed multiplication
superimposed on the block diagram for a hardware multiplier.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 62
Multiplication When There Is No Multiply Instruction
Example 11.4 (MiniMIPS shift-add program for multiplication)
shamu: move
move
mloop: move
move
srl
subu
subu
beqz
sltu
srl
subu
subu
sll
srl
sll
bne
jr
Oct. 2014
\$v0,\$zero
\$vl,\$zero
\$t2,\$zero,32
\$t0,\$zero
\$t1,\$a1
\$a1,1
\$t1,\$t1,\$a1
\$t1,\$t1,\$a1
\$v0,\$v0,\$a0
\$t0,\$v0,\$a0
\$t1,\$v0
\$v0,1
\$t1,\$t1,\$v0
\$t1,\$t1,\$v0
\$t0,\$t0,31
\$v0,\$v0,\$t0
\$v1,1
\$t1,\$t1,31
\$v1,\$v1,\$t1
\$t2,\$t2,-1
\$t2,\$zero,mloop
\$ra
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
initialize Hi to 0
initialize Lo to 0
init repetition counter to 32
set c-out to 0 in case of no add
copy (\$a1) into \$t1
halve the unsigned value in \$a1
subtract (\$a1) from (\$t1) twice to
obtain LSB of (\$a1), or y[j], in \$t1
no addition needed if y[j] = 0
add x to upper part of z
form carry-out of addition in \$t0
copy (\$v0) into \$t1
halve the unsigned value in \$v0
subtract (\$v0) from (\$t1) twice to
obtain LSB of Hi in \$t1
carry-out converted to 1 in MSB of \$t0
right-shifted \$v0 corrected
halve the unsigned value in \$v1
LSB of Hi converted to 1 in MSB of \$t1
right-shifted \$v1 corrected
decrement repetition counter by 1
if counter > 0, repeat multiply loop
Computer Architecture, The Arithmetic/Logic Unit
Slide 63
11.4 Shift-Subtract Division
x
Divisor
Subtracted
bit-matrix
y
z
Dividend
y 3 x 23
y 2 x 22
y 1 x 21
y 0 x 20
s
Figure11.9
Quotient
Remainder
Division of an 8-bit number by a 4-bit number in dot notation.
z(j) = 2z(j1)  ykj x 2k with z(0) = z and z(k) = 2k s
|shift|
|–– subtract ––|
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 64
Integer and Fractional Unsigned Division
Position
7 6 5 4 3 2 1 0
=========================
z
0 1 1 1 0 1 0 1
4
x2
1 0 1 0
=========================
z (0)
0 1 1 1 0 1 0 1
(0)
2z
0 1 1 1 0 1 0 1
4
–y3x2
1 0 1 0
y3=1
––––––––––––––––––––––––––
z (1)
0 1 0 0 1 0 1
(1)
2z
0 1 0 0 1 0 1
4
–y2x2
0 0 0 0
y2=0
––––––––––––––––––––––––––
z (2)
1 0 0 1 0 1
(2)
2z
1 0 0 1 0 1
4
–y1x2
1 0 1 0
y1=1
––––––––––––––––––––––––––
z (3)
1 0 0 0 1
(3)
2z
1 0 0 0 1
4
–y0x2
1 0 1 0
y0=1
––––––––––––––––––––––––––
z (4)
0 1 1 1
s
0 1 1 1
y
1 0 1 1
=========================
Figure 11.10
Oct. 2014
Example 11.5
Position
–1 –2 –3 –4 –5 –6 –7 –8
==========================
z
.1 4 3 5 1 5 0 2
x
.4 0 6 7
==========================
z (0)
.1 4 3 5 1 5 0 2
(0)
10z
1.4 3 5 1 5 0 2
–y–1x
1.2 2 0 1
y–1=3
–––––––––––––––––––––––––––
z (1)
.2 1 5 0 5 0 2
(1)
10z
2.1 5 0 5 0 2
–y–2x
2.0 3 3 5
y–2=5
–––––––––––––––––––––––––––
z (2)
.1 1 7 0 0 2
(2)
10z
1.1 7 0 0 2
–y–3x
0.8 1 3 4
y–3=2
–––––––––––––––––––––––––––
z (3)
.3 5 6 6 2
(3)
10z
3.5 6 6 2
–y–4x
3.2 5 3 6
y–4=8
–––––––––––––––––––––––––––
z (4)
.3 1 2 6
s
.0 0 0 0 3 1 2 6
y
.3 5 2 8
==========================
Division examples for binary integers and decimal fractions.
Computer Architecture, The Arithmetic/Logic Unit
Slide 65
Division with Same-Width Operands
Position
7 6 5 4 3 2 1 0
=========================
z
0 0 0 0 1 1 0 1
4
x2
0 1 0 1
=========================
z (0)
0 0 0 0 1 1 0 1
(0)
2z
0 0 0 1 1 0 1
4
–y3x2
0 0 0 0
y3=0
––––––––––––––––––––––––––
z (1)
0 0 0 1 1 0 1
(1)
2z
0 0 1 1 0 1
4
–y2x2
0 0 0 0
y2=0
––––––––––––––––––––––––––
z (2)
0 0 1 1 0 1
(2)
2z
0 1 1 0 1
4
–y1x2
0 1 0 1
y1=1
––––––––––––––––––––––––––
z (3)
0 0 0 1 1
(3)
2z
0 0 1 1
4
–y0x2
1 0 1 0
y0=0
––––––––––––––––––––––––––
z (4)
0 0 1 1
s
0 0 1 1
y
0 0 1 0
=========================
Figure 11.11
Oct. 2014
Example 11.6
Position
–1 –2 –3 –4 –5 –6 –7 –8
==========================
z
.0 1 0 1
x
.1 1 0 1
==========================
z (0)
.0 1 0 1
(0)
2z
0.1 0 1 0
–y–1x
0.0 0 0 0
y–1=0
–––––––––––––––––––––––––––
z (1)
.1 0 1 0
(1)
2z
1.0 1 0 0
–y–2x
0.1 1 0 1
y–2=1
–––––––––––––––––––––––––––
z (2)
.0 1 1 1
(2)
2z
0.1 1 1 0
–y–3x
0.1 1 0 1
y–3=1
–––––––––––––––––––––––––––
z (3)
.0 0 0 1
(3)
2z
0.0 0 1 0
–y–4x
0.0 0 0 0
y–4=0
–––––––––––––––––––––––––––
z (4)
.0 0 1 0
s
.0 0 0 0 0 0 1 0
y
.0 1 1 0
==========================
Division examples for 4/4-digit binary integers and fractions.
Computer Architecture, The Arithmetic/Logic Unit
Slide 66
Signed Division
Method 1 (indirect): strip operand signs, divide, set result signs
Dividend
z=5
z=5
z = –5
z = –5
Divisor
x=3
x = –3
x=3
x = –3




Quotient
y=1
y = –1
y = –1
y=1
Remainder
s=2
s=2
s = –2
s = –2
Method 2 (direct 2’s complement): develop quotient with digits
–1 and 1, chosen based on signs, convert to digits 0 and 1
Restoring division: perform trial subtraction, choose 0 for q digit
if partial remainder negative
Nonrestoring division: if sign of partial remainder is correct,
then subtract (choose 1 for q digit) else add (choose –1)
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 67
11.5 Hardware Dividers
Shift
Quotient y
Partial remainder z (j) (initially z)
Hi
Shift
Lo
Quotient
digit
selector
Divisor x
0 Mux 1
yk – j
Enable
1
Select
c
out
Trial difference
Figure 11.12
Oct. 2014
c
in
1
(Always
subtract)
Hardware divider based on the shift-subtract algorithm.
Computer Architecture, The Arithmetic/Logic Unit
Slide 68
The Shift Part of Shift-Subtract
qk–j
/k
Partial remainder
/k
Quotient
/k
MSB
/k
Figure 11.13 Shifting incorporated in the connections to the
partial remainder register rather than as a separate phase.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 69
x
Divisor
y
Quotient
z
Dividend
0, x, 2x, or 3x
s
Remainder
z(j) = 4z(j1)  (yk2j+1 yk2j)two x 2k with z(0) = z and z(k/2) = 2ks
|shift|
Assume k even
|––––––– subtract –––––––|
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 70
Array Dividers
z7
y3
x3
z6
MS
x1
MS
x0
z5
MS
b
y2
x2
z4
MS
MS
z3
0
z2
d
MS
MS
MS
0
z1
y1
MS
MS
MS
MS
Our original
dot-notation
for division
0
z0
y0
MS
s3
MS
s2
Figure 11.14
Oct. 2014
MS
s1
MS
0
s0
Straightened
dots to depict
an array divider
Array divider for 8/4-bit unsigned integers.
Computer Architecture, The Arithmetic/Logic Unit
Slide 71
11.6 Programmed Division
MiniMIPS instructions related to division
div
divu
mfhi
mflo
\$s0,\$s1
\$s2,\$s3
\$t0
\$t1
#
#
#
#
Lo = quotient, Hi = remainder
unsigned version of division
set \$t0 to (Hi)
set \$t1 to (Lo)
Example 11.7
Compute z mod x, where z (singed) and x > 0 are integers
Divide; remainder will be obtained in Hi
if remainder is negative,
then add |x| to (Hi) to obtain z mod x
else Hi holds z mod x
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 72
Emulating a Hardware Divider in Software
Example 11.8 (MiniMIPS shift-add program for division)
Shift
\$a1
(quotient
Quotient
y y)
\$v0
Partial
(Hi part
remainder
of z)
\$v1
z (j) (Lo
(initially
part of
z)z)
Shift
\$t0 (MSB of Hi)
yk – j
\$t1 (bit k j of y)
\$a0
Divisor
(divisorx x)
0
1
Mux
Enable
1
Quotient
digit
selector
Select
\$t2 (counter)
c out
Trial difference
c in
1
(Always
subtract)
Part of the
control in
hardware
Figure 11.15 Register usage for programmed division superimposed
on the block diagram for a hardware divider.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 73
Division When There Is No Divide Instruction
Example 11.7 (MiniMIPS shift-subtract program for division)
shsdi: move
move
dloop: slt
sll
slt
or
sll
sge
or
sll
or
beq
subu
bne
move
jr
Oct. 2014
\$v0,\$a2
\$vl,\$a3
\$t2,\$zero,32
\$t0,\$v0,\$zero
\$v0,\$v0,1
\$t1,\$v1,\$zero
\$v0,\$v0,\$t1
\$v1,\$v1,1
\$t1,\$v0,\$a0
\$t1,\$t1,\$t0
\$a1,\$a1,1
\$a1,\$a1,\$t1
\$t1,\$zero,nosub
\$v0,\$v0,\$a0
\$t2,\$t2,-1
\$t2,\$zero,dloop
\$v1,\$a1
\$ra
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
#
initialize Hi to (\$a2)
initialize Lo to (\$a3)
initialize repetition counter to 32
copy MSB of Hi into \$t0
left-shift the Hi part of z
copy MSB of Lo into \$t1
move MSB of Lo into LSB of Hi
left-shift the Lo part of z
quotient digit is 1 if (Hi)  x,
or if MSB of Hi was 1 before shifting
shift y to make room for new digit
copy y[k-j] into LSB of \$a1
if y[k-j] = 0, do not subtract
subtract divisor x from Hi part of z
decrement repetition counter by 1
if counter > 0, repeat divide loop
copy the quotient y into \$v1
Computer Architecture, The Arithmetic/Logic Unit
Slide 74
Divider vs Multiplier: Hardware Similarities
Shift
Quotient y
Partial remainder z (j) (initially z)
Shift
yk – j
Multiplier y
Doublewidth partial product z (j)
Shift
Shift
Quotient
digit
selector
Divisor x
Enable
0 Mux 1
Multiplicand x
0
1
1
Mux
yj
Enable
Select
Select
Figure 11.12
c
out
c
in
Trial difference
0
x3 0
0
MA
x2 0
0
MA
x1 0
0
MA
Figure 11.4
x0 0
z7
y3
MA
c out
1
(Always
subtract)
y0
x3
z6
MS
x2
x1
z5
MS
c in
x0
z4
MS
MS
0
MA
MA
MA
MA
y1
z0
y2
z1
y3
z2
0
MA
MA
MA
MA
MS
MS
MS
MS
MA
MA
MA
MS
MS
MS
MS
MS
z3
FA
FA
HA
z7
z6
z5
Straightened
dots to depict
array multiplier
to the left
Figure 11.14
Oct. 2014
z4
0
Our original
dot-notation
for division
0
z0
y0
MS
MS
MS
0
FA
0
z1
y1
0
MA
z3
z2
y2
Our original
dot-notation
representing
multiplication
s3
s2
s1
s0
Turn upside-down
Computer Architecture, The Arithmetic/Logic Unit
0
Straightened
dots to depict
an array divider
Figure 11.7
Slide 75
12 Floating-Point Arithmetic
Floating-point is no longer reserved for high-end machines
• Multimedia and signal processing require flp arithmetic
• Details of standard flp format and arithmetic operations
Topics in This Chapter
12.1 Rounding Modes
12.2 Special Values and Exceptions
12.4 Other Floating-Point Operations
12.5 Floating-Point Instructions
12.6 Result Precision and Errors
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 76
12.1 Rounding Modes
Short (32-bit) format
8 bits,
bias = 127,
–126 to 127
23 bits for fractional part
(plus hidden 1 in integer part)
Sign Exponent
0, , NaN
IEEE 754
Format
Denormals:
0.f  2emin
1.f  2e
Significand
11 bits,
bias = 1023,
–1022 to 1023
52 bits for fractional part
(plus hidden 1 in integer part)
Denormals allow
graceful underflow
Long (64-bit) format
–
Negative numbers
max –
FLP –
min –
Sparser
Overflow
region
0
Denser
Positive numbers
min +
FLP +
Denser
Figure 12.1
Oct. 2014
Underflow
example
+
Sparser
Underflow
regions
Midway
example
max +
Overflow
region
Typical
example
Overflow
example
Distribution of floating-point numbers on the real line.
Computer Architecture, The Arithmetic/Logic Unit
Slide 77
Round-to-Nearest (Even)
rtnei(x)
rtni(x)
4
4
3
3
2
2
1
1
x
–4
–3
–2
–1
1
2
3
4
Oct. 2014
–4
–3
–2
–1
1
–1
–1
–2
–2
–3
–3
–4
–4
(a) Round to nearest even integer
Figure 12.2
x
2
3
(b) Round to nearest integer
Two round-to-nearest-integer functions for x in [–4, 4].
Computer Architecture, The Arithmetic/Logic Unit
Slide 78
4
Directed Rounding
ritni(x)
rutni(x)
4
4
3
3
2
2
1
1
x
–4
–3
–2
–1
1
2
3
4
x
–4
–3
–2
–1
1
–1
–1
–2
–2
–3
–3
–4
–4
(a) Round inward to nearest integer
2
3
4
(b) Round upward to nearest integer
Figure 12.3 Two directed round-to-nearest-integer functions for x in [–4, 4].
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 79
12.2 Special Values and Exceptions
Zeros, infinities, and NaNs (not a number)
 0 Biased exponent = 0, significand = 0 (no hidden 1)
  Biased exponent = 255 (short) or 2047 (long), significand = 0
NaN Biased exponent = 255 (short) or 2047 (long), significand  0
Arithmetic operations with special operands
(+0) + (+0) = (+0) – (–0) = +0
(+0)  (+5) = +0
(+0) / (–5) = –0
(+) + (+) = +
x – (+) = –
(+)  x = , depending on the sign of x
x / (+) = 0, depending on the sign of x
(+) = +
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 80
Exceptions
Undefined results lead to NaN (not a number)
(0) / (0) = NaN
(+) + (–) = NaN
(0)  () = NaN
() / () = NaN
Arithmetic operations and comparisons with NaNs
NaN + x = NaN
NaN + NaN = NaN
NaN  0 = NaN
NaN  NaN = NaN
NaN < 2  false
NaN = Nan  false
NaN  (+)  true
NaN  NaN  true
Examples of invalid-operation exceptions
Multiplication:
Division:
Square-root:
Oct. 2014
(+) + (–)
0
0 / 0 or  / 
Operand < 0
Computer Architecture, The Arithmetic/Logic Unit
Slide 81
(2e1s1) + (2e1(s2 / 2e1–e2)) = 2e1(s1  s2 / 2e1–e2)
(2e2s2)
x = 25  1.00101101
y = 21  1.11101101
Operand with
smaller exponent
to be preshifted
Operands after alignment shift:
x = 25  1.00101101
y = 25  0.000111101101
s = 25  1.010010111101
s = 25  1.01001100
Figure 12.4
Oct. 2014
Extra bits to be
rounded off
Rounded sum
Alignment shift and rounding in floating-point addition.
Computer Architecture, The Arithmetic/Logic Unit
Slide 82
Input 1
Hardware for
Floating-Point
Input 2
Unpack
Signs Exponents
Significands
Mu x
Sub
Possible swap
& complement
Align
significands
Control
& sign
logic
Normalize
& round
Figure 12.5
Simplified schematic of

Sign
Exponent
Significand
Pack
Output
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 83
12.4 Other Floating-Point Operations
Overflow
(underflow)
possible
Floating-point multiplication
(2e1s1)  (2e2s2) = 2e1+ e2(s1  s2)
Product of significands in [1, 4)
If product is in [2, 4), halve to normalize (increment exponent)
Overflow
(underflow)
possible
Floating-point division
(2e1s1) / (2e2s2) = 2e1– e2(s1 / s2)
Ratio of significands in (1/2, 2)
If ratio is in (1/2, 1), double to normalize (decrement exponent)
Floating-point square-rooting
(2es)1/2 = 2e/2(s)1/2
= 2(e–1)2(2s)1/2
Normalization not needed
Oct. 2014
when e is even
when e is odd
Computer Architecture, The Arithmetic/Logic Unit
Slide 84
Hardware for
Floating-Point
Multiplication
and Division
Input 1
Input 2
Unpack
Signs Exponents
Significands
MulDiv

Multiply
or divide
Control
& sign
logic
Normalize
& round

Figure 12.6 Simplified
schematic of a floatingpoint multiply/divide unit.
Sign
Exponent
Significand
Pack
Output
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 85
12.5 Floating-Point Instructions
Floating-point arithmetic instructions for MiniMIPS:
sub.d
mul.d
div.s
neg.s
31
F
op
\$f0,\$f8,\$f10
\$f0,\$f8,\$f10
\$f0,\$f8,\$f10
\$f0,\$f8,\$f10
\$f0,\$f8
25
ex
20
#
#
#
#
#
ft
set
set
set
set
set
15
\$f0
\$f0
\$f0
\$f0
\$f0
fs
to
to
to
to
to
10
(\$f8) +fp
(\$f8) –fp
(\$f8) fp
(\$f8) /fp
–(\$f8)
(\$f10)
(\$f10)
(\$f10)
(\$f10)
fd
fn
5
0
0 1 0 0 0 1 0 0 0 0 x 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 x x x
Floating-point
instruction
s=0
d=1
Source
register 2
Source
register 1
Destination
register
sub.* = 1
mul.* = 2
div.* = 3
neg.* = 7
Figure 12.7 The common floating-point instruction format for
MiniMIPS and components for arithmetic instructions. The extension
(ex) field distinguishes single (* = s) from double (* = d) operands.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 86
The Floating-Point Unit in MiniMIPS
...
m  2 32
Loc 0 Loc 4 Loc 8
4 B / location
Memory
up to 2 30 words
Loc
Loc
m 8 m 4
Coprocessor 1
...
EIU
Execution
& integer
unit
\$0
\$1
\$2
(Main proc.)
\$31
Integer
mul/div
ALU
Hi
FPU
(Coproc. 1)
FP
arith
\$0
\$1
\$2
Floatingpoint unit
\$31
Pairs of registers,
beginning with an
even-numbered
one, are used for
double operands
Lo
TMU
Chapter
10
Chapter
11
Figure 5.1
Oct. 2014
Chapter
12
(Coproc. 0) Status memory
Cause unit
EPC
Memory and processing subsystems for MiniMIPS.
Computer Architecture, The Arithmetic/Logic Unit
Slide 87
Floating-Point Format Conversions
MiniMIPS instructions for number format conversion:
cvt.s.w
cvt.d.w
cvt.d.s
cvt.s.d
cvt.w.s
cvt.w.d
31
F
op
\$f0,\$f8
\$f0,\$f8
\$f0,\$f8
\$f0,\$f8
\$f0,\$f8
\$f0,\$f8
25
ex
#
#
#
#
#
#
20
set
set
set
set
set
set
ft
\$f0
\$f0
\$f0
\$f0
\$f0
\$f0
15
to
to
to
to
to
to
fs
single(integer \$f8)
double(integer \$f8)
double(\$f8)
single(\$f8,\$f9)
integer(\$f8)
integer(\$f8,\$f9)
10
fd
5
fn
0
0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 x x x
Floating-point
instruction
Figure 12.8
Oct. 2014
*.w = 0
w.s = 0
w.d = 1
*.* = 1
Unused
Source
register
Destination
register
To format:
s = 32
d = 33
w = 36
Floating-point instructions for format conversion in MiniMIPS.
Computer Architecture, The Arithmetic/Logic Unit
Slide 88
Floating-Point Data Transfers
MiniMIPS instructions for floating-point load, store, and move:
lwc1
swc1
mov.s
mov.d
mfc1
mtc1
31
F
\$f8,40(\$s3)
\$f8,A(\$s3)
\$f0,\$f8
\$f0,\$f8
\$t0,\$f12
\$f8,\$t4
op
25
20
ft
store (\$f8) into mem[A+(\$s3)]
15
fs
10
fd
5
fn
0
0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Floating-point
instruction
31
R
ex
#
#
#
#
#
#
op
s=0
d=1
25
rs
Unused
20
rt
Source
register
15
rd
Destination
register
10
sh
mov.* = 6
5
fn
0
0 1 0 0 0 1 0 0 x 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Floating-point
instruction
Figure 12.9
Oct. 2014
mfc1 = 0
mtc1 = 4
Source
register
Destination
register
Unused
Unused
Instructions for floating-point data movement in MiniMIPS.
Computer Architecture, The Arithmetic/Logic Unit
Slide 89
Floating-Point Branches and Comparisons
MiniMIPS instructions for floating-point load, store, and move:
bc1t
bc1f
c.eq.*
c.lt.*
c.le.*
31
I
L
L
\$f0,\$f8
\$f0,\$f8
\$f0,\$f8
op
25
20
branch on fp flag true
branch on fp flag false
if (\$f0)=(\$f8), set flag to “true”
if (\$f0)<(\$f8), set flag to “true”
if (\$f0)(\$f8), set flag to “true”
rt
operand / offset
15
0
0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 x 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
Floating-point
instruction
31
F
rs
#
#
#
#
#
op
bc1? = 8
25
ex
true = 1
false = 0
20
ft
Offset
Correction: 1 1 x x x 0
15
fs
10
fd
5
fn
0
0 1 0 0 0 1 0 0 0 0 x 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0
Floating-point
instruction
Figure 12.10
Oct. 2014
s=0
d=1
Source
register 2
Source
register 1
Unused
c.eq.* = 50
c.lt.* = 60
c.le.* = 62
Floating-point branch and comparison instructions in MiniMIPS.
Computer Architecture, The Arithmetic/Logic Unit
Slide 90
Floating-Point
Instructions of
MiniMIPS
Copy
Table 12.1
Arithmetic
*
s/d for single/double
# 0/1 for single/double
Conversions
Memory access
Control transfer
Oct. 2014
Instruction
Usage
Move s/d registers
Move fm coprocessor 1
Move to coprocessor 1
Subtract single/double
Multiply single/double
Divide single/double
Negate single/double
Compare equal s/d
Compare less s/d
Compare less or eq s/d
Convert integer to single
Convert integer to double
Convert single to double
Convert double to single
Convert single to integer
Convert double to integer
Store word coprocessor 1
Branch coproc 1 true
Branch coproc 1 false
mov.* fd,fs
mfc1 rt,rd
mtc1 rd,rt
sub.* fd,fs,ft
mul.* fd,fs,ft
div.* fd,fs,ft
neg.* fd,fs
c.eq.* fs,ft
c.lt.* fs,ft
c.le.* fs,ft
cvt.s.w fd,fs
cvt.d.w fd,fs
cvt.d.s fd,fs
cvt.s.d fd,fs
cvt.w.s fd,fs
cvt.w.d fd,fs
lwc1 ft,imm(rs)
swc1 ft,imm(rs)
bc1t L
bc1f L
Computer Architecture, The Arithmetic/Logic Unit
ex fn
#
0
4
#
#
#
#
#
#
#
#
0
0
1
1
0
1
rs
rs
8
8
Slide 91
6
0
1
2
3
7
50
60
62
32
33
33
32
36
36
12.6 Result Precision and Errors
Example 12.4
Laws of algebra may not hold in floating-point arithmetic. For example,
the following computations show that the associative law of addition,
(a + b) + c = a + (b + c), is violated for the three numbers shown.
a =-25  1.10101011
b = 25  1.10101110
b = 25  1.10101110
c =-22  1.01100101
Compute a + b
25  0.00000011
a+b = 22  1.10000000
c =-22  1.01100101
Compute b + c (after preshifting c)
25  1.101010110011011
b+c = 25  1.10101011 (Round)
a =-25  1.10101011
Compute (a + b) + c
22  0.00011011
Sum = 26  1.10110000
Compute a + (b + c)
25  0.00000000
Sum = 0 (Normalize to special code for 0)
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 92
Error Control and Certifiable Arithmetic
Catastrophic cancellation in subtracting almost equal numbers:
Area of a needlelike triangle
A = [s(s – a)(s – b)(s – c)]1/2
c
b
a
Possible remedies
Carry extra precision in intermediate results (guard digits):
commonly used in calculators
Use alternate formula that does not produce cancellation errors
Certifiable arithmetic with intervals
A number is represented by its lower and upper bounds [xl, xu]
Example of arithmetic: [xl, xu] +interval [yl, yu] = [xl +fp yl, xu +fp yu]
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 93
Evaluation of Elementary Functions
Approximating polynomials
ln x = 2(z + z3/3 + z5/5 + z7/7 + . . . ) where z = (x – 1)/(x + 1)
ex = 1 + x/1! + x2/2! + x3/3! + x4/4! + . . .
cos x = 1 – x2/2! + x4/4! – x6/6! + x8/8! – . . .
tan–1 x = x – x3/3 + x5/5 – x7/7 + x9/9 – . . .
Iterative (convergence) schemes
For example, beginning with an estimate for x1/2, the following
iterative formula provides a more accurate estimate in each step
q(i+1) = 0.5(q(i) + x/q(i))
Table lookup (with interpolation)
A pure table lookup scheme results in huge tables (impractical);
hence, often a hybrid approach, involving interpolation, is used.
Oct. 2014
Computer Architecture, The Arithmetic/Logic Unit
Slide 94
Function Evaluation by Table Lookup
h bits
xH
Input x
k - h bits
xL
xL
Table
for a
f(x)
Table
for b
Best linear
approximation
in subinterval
Multiply
x
xH
Output
Figure 12.12
Oct. 2014
f(x)
The linear approximation above is
characterized by the line equation
a + b x L , where a and b are
read out from tables based on x H
Function evaluation by table lookup and linear interpolation.
Computer Architecture, The Arithmetic/Logic Unit
Slide 95
```