Report

Part III The Arithmetic/Logic Unit Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 1 About This Presentation This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. © Behrooz Parhami Edition Released Revised Revised Revised Revised First July 2003 July 2004 July 2005 Mar. 2006 Jan. 2007 Jan. 2008 Jan. 2009 Jan. 2011 Oct. 2014 Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 2 III The Arithmetic/Logic Unit Overview of computer arithmetic and ALU design: • Review representation methods for signed integers • Discuss algorithms & hardware for arithmetic ops • Consider floating-point representation & arithmetic Topics in This Part Chapter 9 Number Representation Chapter 10 Adders and Simple ALUs Chapter 11 Multipliers and Dividers Chapter 12 Floating-Point Arithmetic Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 3 Preview of Arithmetic Unit in the Data Path Incr PC Next addr jta Next PC ALUOvfl (PC) PC (rs) rs rt Instr cache inst 0 1 2 rd 31 imm op Br&Jump Instruction fetch Fig. 13.3 Oct. 2014 Register writeback Ovfl Reg file ALU (rt) / 16 ALU out Data cache Data out Data in Func 0 32 SE / 1 Data addr 0 1 2 Register input fn RegDst RegWrite Reg access / decode ALUSrc ALUFunc ALU operation DataRead RegInSrc DataWrite Data access Key elements of the single-cycle MicroMIPS data path. Computer Architecture, The Arithmetic/Logic Unit Slide 4 Computer Arithmetic as a Topic of Study Brief overview article – Encyclopedia of Info Systems, Academic Press, 2002, Vol. 3, pp. 317-333 Our textbook’s treatment of the topic falls between the extremes (4 chaps.) Graduate course ECE 252B – Text: Computer Arithmetic, Oxford U Press, 2000 (2nd ed., 2010) Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 5 9 Number Representation Arguably the most important topic in computer arithmetic: • Affects system compatibility and ease of arithmetic • Two’s complement, flp, and unconventional methods Topics in This Chapter 9.1 Positional Number Systems 9.2 Digit Sets and Encodings 9.3 Number-Radix Conversion 9.4 Signed Integers 9.5 Fixed-Point Numbers 9.6 Floating-Point Numbers Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 6 9.1 Positional Number Systems Representations of natural numbers {0, 1, 2, 3, …} ||||| ||||| ||||| ||||| ||||| || 27 11011 XXVII sticks or unary code radix-10 or decimal code radix-2 or binary code Roman numerals Fixed-radix positional representation with k digits k–1 Value of a number: x = (xk–1xk–2 . . . x1x0)r = S xi r i i=0 For example: 27 = (11011)two = (124) + (123) + (022) + (121) + (120) Number of digits for [0, P]: k = logr (P + 1) = logr P + 1 Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 7 Unsigned Binary Integers 0000 1111 1110 15 0 0001 1 14 0010 2 1101 0011 13 1100 12 1011 Turn x notches counterclockwise to add x 3 Inside: Natural number Outside: 4-bit encoding 11 4 5 10 1010 0100 0101 12 11 10 15 1 2 0 3 4 5 9 8 7 6 6 9 1001 14 13 8 1000 7 0110 0111 Turn y notches clockwise to subtract y Figure 9.1 Schematic representation of 4-bit code for integers in [0, 15]. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 8 Representation Range and Overflow Overflow region max max Overflow region Numbers smaller than max Numbers larger than max Finite set of representable numbers Figure 9.2 Overflow regions in finite number representation systems. For unsigned representations covered in this section, max – = 0. Example 9.2, Part d Discuss if overflow will occur when computing 317 – 316 in a number system with k = 8 digits in radix r = 10. Solution The result 86 093 442 is representable in the number system which has a range [0, 99 999 999]; however, if 317 is computed en route to the final result, overflow will occur. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 9 9.2 Digit Sets and Encodings Conventional and unconventional digit sets Decimal digits in [0, 9]; 4-bit BCD, 8-bit ASCII Hexadecimal, or hex for short: digits 0-9 & a-f Conventional ternary digit set in [0, 2] Conventional digit set for radix r is [0, r – 1] Symmetric ternary digit set in [–1, 1] Conventional binary digit set in [0, 1] Redundant digit set [0, 2], encoded in 2 bits ( 0 2 1 1 0 )two and ( 1 0 1 0 2 )two represent 22 Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 10 Carry-Save Numbers Radix-2 numbers using the digits 0, 1, and 2 Example: (1 0 2 1)two = (123) + (022) + (221) + (120) = 13 Possible encodings (a) Binary (b) Unary 0 1 2 0 1 1 2 MSB LSB Oct. 2014 00 01 10 11 (Unused) 1 0 2 1 0 0 1 0 = 2 1 0 0 1 = 9 00 01 (First alternate) 10 (Second alternate) 11 First bit Second bit Computer Architecture, The Arithmetic/Logic Unit 1 0 2 1 0 0 1 1 = 3 1 0 1 0 = 10 Slide 11 The Notion of Carry-Save Addition Digit-set combination: {0, 1, 2} + {0, 1} = {0, 1, 2, 3} = {0, 2} + {0, 1} This bit being 1 represents overflow (ignore it) Carry-save input Carry-save addition Two carry-save inputs Binary input Carry-save 0 output 0 Carry-save addition 0 a. Carry-save addition. b. Adding two carry-save numbers. Figure 9.3 Adding a binary number or another carry-save number to a carry-save number. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 12 9.3 Number Radix Conversion Two ways to convert numbers from an old radix r to a new radix R Perform arithmetic in the new radix R Suitable for conversion from radix r to radix 10 Horner’s rule: (xk–1xk–2 . . . x1x0)r = (…((0 + xk–1)r + xk–2)r + . . . + x1)r + x0 (1 0 1 1 0 1 0 1)two = 0 + 1 1 2 + 0 2 2 + 1 5 2 + 1 11 2 + 0 22 2 + 1 45 2 + 0 90 2 + 1 181 Perform arithmetic in the old radix r Suitable for conversion from radix 10 to radix R Divide the number by R, use the remainder as the LSD and the quotient to repeat the process 19 / 3 rem 1, quo 6 / 3 rem 0, quo 2 / 3 rem 2, quo 0 Thus, 19 = (2 0 1)three Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 13 Justifications for Radix Conversion Rules ( xk 1xk 2 x0 )r xk 1r k 1 xk 2r k 2 x1r x0 x0 r ( x1 r ( x2 r ( ))) Justifying Horner’s rule. x Binary representation of x/2 Figure 9.4 Oct. 2014 0 x mod 2 Justifying one step of the conversion of x to radix 2. Computer Architecture, The Arithmetic/Logic Unit Slide 14 9.4 Signed Integers We dealt with representing the natural numbers Signed or directed whole numbers = integers { . . . , 3, 2, 1, 0, 1, 2, 3, . . . } Signed-magnitude representation +27 in 8-bit signed-magnitude binary code 0 0011011 –27 in 8-bit signed-magnitude binary code 1 0011011 –27 in 2-digit decimal code with BCD digits 1 0010 0111 Biased representation Represent the interval of numbers [N, P] by the unsigned interval [0, P + N]; i.e., by adding N to every number Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 15 Two’s-Complement Representation With k bits, numbers in the range [–2k–1, 2k–1 – 1] represented. Negation is performed by inverting all bits and adding 1. 0000 1111 –1 1110 +0 0001 +1 0010 –2 +2 1101 0011 –3 1011 + _ –4 1100 Turn x notches counterclockwise to add x –5 +3 +4 +5 –6 1001 0100 –4 –5 –6 0101 –1 1 2 0 3 4 5 –7 –8 7 6 +6 –7 1010 –2 –3 –8 1000 +7 0110 0111 Turn 16 – y notches counterclockwise to add –y (subtract y) Figure 9.5 Schematic representation of 4-bit 2’s-complement code for integers in [–8, +7]. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 16 Conversion from 2’s-Complement to Decimal Example 9.7 Convert x = (1 0 1 1 0 1 0 1)2’s-compl to decimal. Solution Given that x is negative, one could change its sign and evaluate –x. Shortcut: Use Horner’s rule, but take the MSB as negative –1 2 + 0 –2 2 + 1 –3 2 + 1 –5 2 + 0 –10 2 + 1 –19 2 + 0 –38 2 + 1 –75 Sign Change for a 2’s-Complement Number Example 9.8 Given y = (1 0 1 1 0 1 0 1)2’s-compl, find the representation of –y. Solution –y = (0 1 0 0 1 0 1 0) + 1 = (0 1 0 0 1 0 1 1)2’s-compl Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit (i.e., 75) Slide 17 Two’s-Complement Addition and Subtraction x k / c in Adder y k / k / k / xy c out y or y AddSub Figure 9.6 Oct. 2014 Binary adder used as 2’s-complement adder/subtractor. Computer Architecture, The Arithmetic/Logic Unit Slide 18 9.5 Fixed-Point Numbers Positional representation: k whole and l fractional digits Value of a number: x = (xk–1xk–2 . . . x1x0 . x–1x–2 . . . x–l )r = S xi r i For example: 2.375 = (10.011)two = (121) + (020) + (021) + (122) + (123) Numbers in the range [0, rk – ulp] representable, where ulp = r –l Fixed-point arithmetic same as integer arithmetic (radix point implied, not explicit) Two’s complement properties (including sign change) hold here as well: (01.011)2’s-compl = (–021) + (120) + (02–1) + (12–2) + (12–3) = +1.375 (11.011)2’s-compl = (–121) + (120) + (02–1) + (12–2) + (12–3) = –0.625 Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 19 Fixed-Point 2’s-Complement Numbers 1.111 –.125 1.110 0.000 +0 –.25 0.001 +.125 0.010 +.25 1.101 0.011 –.375 1.011 + _ –.5 1.100 +.5 0.100 +.625 –.625 0.101 +.75 –.75 1.010 +.375 –.875 1.001 –1 1.000 +.875 0.110 0.111 Figure 9.7 Schematic representation of 4-bit 2’s-complement encoding for (1 + 3)-bit fixed-point numbers in the range [–1, +7/8]. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 20 Radix Conversion for Fixed-Point Numbers Convert the whole and fractional parts separately. To convert the fractional part from an old radix r to a new radix R: Perform arithmetic in the new radix R Evaluate a polynomial in r –1: (.011)two = 0 2–1 + 1 2–2 + 1 2–3 Simpler: View the fractional part as integer, convert, divide by r l (.011)two = (?)ten Multiply by 8 to make the number an integer: (011)two = (3)ten Thus, (.011)two = (3 / 8)ten = (.375)ten Perform arithmetic in the old radix r Multiply the given fraction by R, use the whole part as the MSD and the fractional part to repeat the process (.72)ten = (?)two 0.72 2 = 1.44, so the answer begins with 0.1 0.44 2 = 0.88, so the answer begins with 0.10 Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 21 9.6 Floating-Point Numbers Useful for applications where very large and very small numbers are needed simultaneously Fixed-point representation must sacrifice precision for small values to represent large values x = (0000 0000 . 0000 1001)two Small number y = (1001 0000 . 0000 0000)two Large number Neither y2 nor y / x is representable in the format above Floating-point representation is like scientific notation: 20 000 000 = 2 10 7 +0.000 000 007 = +7 10–9 Sign Oct. 2014 Significand Exponent base Exponent Computer Architecture, The Arithmetic/Logic Unit Also, 7E9 Slide 22 ANSI/IEEE Standard Floating-Point Format (IEEE 754) Revision (IEEE 754R) was completed in 2008: The revised version includes 16-bit and 128-bit binary formats, as well as 64- and 128-bit decimal formats Short (32-bit) format 8 bits, bias = 127, –126 to 127 23 bits for fractional part (plus hidden 1 in integer part) Sign Exponent 11 bits, bias = 1023, –1022 to 1023 Short exponent range is –127 to 128 but the two extreme values are reserved for special operands (similarly for the long format) Significand 52 bits for fractional part (plus hidden 1 in integer part) Long (64-bit) format Figure 9.8 Oct. 2014 The two ANSI/IEEE standard floating-point formats. Computer Architecture, The Arithmetic/Logic Unit Slide 23 Short and Long IEEE 754 Formats: Features Table 9.1 Some features of ANSI/IEEE standard floating-point formats Feature Word width in bits Significand in bits Significand range Exponent bits Exponent bias Zero (±0) Denormal Single/Short 32 23 + 1 hidden [1, 2 – 2–23] 8 127 e + bias = 0, f = 0 e + bias = 0, f ≠ 0 represents ±0.f 2–126 e + bias = 255, f = 0 e + bias = 255, f ≠ 0 e + bias [1, 254] e [–126, 127] represents 1.f 2e Double/Long 64 52 + 1 hidden [1, 2 – 2–52] 11 1023 e + bias = 0, f = 0 e + bias = 0, f ≠ 0 represents ±0.f 2–1022 e + bias = 2047, f = 0 e + bias = 2047, f ≠ 0 e + bias [1, 2046] e [–1022, 1023] represents 1.f 2e min 2–126 1.2 10–38 2–1022 2.2 10–308 max 2128 3.4 1038 21024 1.8 10308 Subnormal Infinity (∞) Not-a-number (NaN) Ordinary number Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 24 10 Adders and Simple ALUs Addition is the most important arith operation in computers: • Even the simplest computers must have an adder • An adder, plus a little extra logic, forms a simple ALU Topics in This Chapter 10.1 Simple Adders 10.2 Carry Propagation Networks 10.3 Counting and Incrementation 10.4 Design of Fast Adders 10.5 Logic and Shift Operations 10.6 Multifunction ALUs Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 25 10.1 Simple Adders Inputs x Outputs y 0 0 1 1 c 0 1 0 1 c 0 1 1 0 Digit-set interpretation: {0, 1} + {0, 1} = {0, 2} + {0, 1} HA s Outputs x y cin cout s 0 0 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0 0 0 1 0 1 1 1 0 1 1 0 1 0 0 1 Figures 10.1/10.2 Oct. 2014 y s 0 0 0 1 Inputs x x cout y FA cin Digit-set interpretation: {0, 1} + {0, 1} + {0, 1} = {0, 2} + {0, 1} s Binary half-adder (HA) and full-adder (FA). Computer Architecture, The Arithmetic/Logic Unit Slide 26 Full-Adder Implementations x y HA c out HA c in x y c out s (a) FA built of two HAs x y c out 0 1 2 3 0 0 1 2 3 1 c in s (b) CMOS mux-based FA c in s (c) Two-level AND-OR FA Figure10.3 Full adder implemented with two half-adders, by means of two 4-input multiplexers, and as two-level gate network. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 27 Ripple-Carry Adder: Slow But Simple x31 y31 c32 cout c31 FA Oct. 2014 . . . y1 c2 x0 y0 c1 c0 FA FA s1 s0 Critical path s31 Figure 10.4 x1 cin Ripple-carry binary adder with 32-bit inputs and output. Computer Architecture, The Arithmetic/Logic Unit Slide 28 Carry Chains and Auxiliary Signals 15 14 13 12 ----------1 0 1 1 Bit positions 11 10 9 8 ----------0 1 1 0 7 6 5 4 ----------0 1 1 0 cout 0 1 0 1 1 0 0 1 1 1 \__________/\__________________/ 4 6 g = xy Oct. 2014 0 3 2 1 0 ----------1 1 1 0 0 0 0 1 1 cin \________/\____/ 3 2 Carry chains and their lengths Computer Architecture, The Arithmetic/Logic Unit p=xy Slide 29 Carry Chains Illustrated with Dominoes Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 30 10.2 Carry Propagation Networks gi pi Carry is: 0 0 1 1 annihilated or killed propagated generated (impossible) 0 1 0 1 g k1 p k1 xi g k2 p k2 yi gi = xi yi pi = xi yi g i+1 p i+1 gi pi ... ... g1 p1 g0 p0 c0 Carry network ck c k1 ... c k2 ci c i+1 ... c1 c0 si Figure 10.5 The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 31 Ripple-Carry Adder Revisited The carry recurrence: ci+1 = gi pi ci Latency of k-bit adder is roughly 2k gate delays: 1 gate delay for production of p and g signals, plus 2(k – 1) gate delays for carry propagation, plus 1 XOR gate delay for generation of the sum bits gk1 pk1 gk2 pk2 g1 p1 ... ck ck1 Figure 10.6 Oct. 2014 ck2 c2 c1 g0 p0 c0 The carry propagation network of a ripple-carry adder. Computer Architecture, The Arithmetic/Logic Unit Slide 32 The Complete Design of a Ripple-Carry Adder gi pi Carry is: 0 0 1 1 annihilated or killed propagated generated (impossible) 0 1 0 1 g k1 p k1 gk1 pk1 ck ck xi g k2 p k2 c k1 gi = xi yi pi = xi yi g i+1 p i+1 gi pi ... ... gk2 pk2 g1 ... ck1 yi ck2 c2 p1 g0 gk2 pk2 c Carry network c c c 1 ck ... c k2 p0 pk1 gk1 ci c i+1 g1 p1 ... 0 k1 g1 p1 k2 c2 c1 ... c1 g0 p0 g0 c0 p0 c0 c0 si Figure 10.6 (ripple-carry network) superimposed on Figure 10.5 (general structure of an adder). Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 33 First Carry Speed-Up Method: Carry Skip g4j+3 p4j+3 c4j+4 c4j+3 g4j+2 p4j+2 c4j+2 g4j+1 p4j+1 c4j+1 g4j p4j c4j One-way street Freeway Figures 10.7/10.8 A 4-bit section of a ripple-carry network with skip paths and the driving analogy. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 34 Mux-Based Skip Carry Logic g4j+3 p4j+3 g4j+2 p4j+2 g4j+1 p4j+1 g4j p4j Fig. 10.7 c4j+4 c4j+3 g4j+3 p4j+3 c4j+4 0 1c4j+4 p[4j, 4j+3] c4j+3 c4j+2 g4j+2 p4j+2 c4j+1 g4j+1 c4j+2 p4j+1 c4j g4j p4j c4j+1 c4j The carry-skip adder of Fig. 10.7 works fine if we begin with a clean slate, where all signals are 0s; otherwise, it will run into problems, which do not exist in this mux-based implementation Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 35 10.3 Counting and Incrementation k / k / Data in IncrInit 0 k / Count register D Q 1 k / c in _ C Q Adder / k Update a (Increment amount) Figure 10.9 Oct. 2014 c out Schematic diagram of an initializable synchronous counter. Computer Architecture, The Arithmetic/Logic Unit Slide 36 Circuit for Incrementation by 1 Figure 10.6 Substantially simpler than an adder gk1 pk1 ck xk1 ck 0g1 gk2 pk2 sk1 x1 0g0 p0 x0 ... ck1 ck2 c2 xk2 ck1 p1 x1 ck2 sk2 ... c2 s2 c0 c1 1 x0 c1 s1 s0 Figure 10.10 Carry propagation network and sum logic for an incrementer. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 37 10.4 Design of Fast Adders Carries can be computed directly without propagation For example, by unrolling the equation for c3, we get: c 3 = g2 p2 c 2 = g2 p2 g1 p2 p1 g0 p2 p1 p0 c 0 We define “generate” and “propagate” signals for a block extending from bit position a to bit position b as follows: g[a,b] = gb pb gb–1 pb pb–1 gb–2 . . . pb pb–1 … pa+1 ga p[a,b] = pb pb–1 . . . pa+1 pa Combining g and p signals for adjacent blocks: g[h,j] = g[i+1,j] p[i+1,j] g[h,i] p[h,j] = p[i+1,j] p[h,i] j i+1 i h [h, j] = [i + 1, j] ¢ [h, i] Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 38 Carries as Generate Signals for Blocks [ 0, i ] gi pi Carry is: 0 0 1 1 annihilated or killed propagated generated (impossible) 0 1 0 1 g k1 p k1 xi g k2 p k2 yi Assuming c0 = 0, we have ci = g [0,i –1] g i+1 p i+1 gi pi ... ... g1 p1 g0 p0 c0 Carry network ck c k1 Figure 10.5 Oct. 2014 ... c k2 ci c i+1 ... c1 c0 si Computer Architecture, The Arithmetic/Logic Unit Slide 39 Second Carry Speed-Up Method: Carry Lookahead [7, 7 ] [6, 6 ] ¢ [5, 5 ] [4, 4 ] [3, 3 ] ¢ [2, 2 ] [1, 1 ] ¢ [6, 7 ] [0, 0 ] g [1, 1] p [1, 1] g [0, 0] p [0, 0] ¢ [2, 3 ] [4, 5 ] [0, 1 ] ¢ ¢ [4, 7 ] [0, 3 ] ¢ ¢ ¢ [0, 7 ] [0, 6 ] ¢ [0, 5 ] [0, 4 ] [0, 3 ] ¢ [0, 2 ] g [0, 1] p [0, 1] [0, 1 ] [0, 0 ] Figure 10.11 Brent-Kung lookahead carry network for an 8-digit adder, along with details of one of the carry operator blocks. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 40 Recursive Structure of Brent-Kung Carry Network [7, 7] [6, 6] [5, 5] [4, 4] [3, 3] [2, 2] [1, 1] [0, 0] [7, 7 ] ¢ ¢ ¢ ¢ [6, 6 ] ¢ [5, 5 ] [4, 4 ] [3, 3 ] ¢ [2, 2 ] ¢ [6, 7 ] [2, 3 ] [0, 1 ] ¢ ¢ [4, 7 ] [0, 3 ] ¢ ¢ ¢ ¢ ¢ [0, 6] [0, 5] [0, 4] ¢ ¢ ¢ [0, 7 ] [0, 7] [0, 0 ] ¢ [4, 5 ] 4-input Brent-Kung carry network [1, 1 ] [0, 3] [0, 2] [0, 1] [0, 6 ] [0, 5 ] [0, 4 ] [0, 3 ] [0, 2 ] [0, 1 ] [0, 0 ] [0, 0] Figure 10.12 Brent-Kung lookahead carry network for an 8-digit adder, with only its top and bottom rows of carry-operators shown. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 41 An Alternate Design: Kogge-Stone Network ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ ¢ c7 = g [0,6] c8 = g [0,7] c5 = g [0,4] c6 = g [0,5] ¢ c3 = g [0,2] c4 = g [0,3] c1 = g [0,0] c2 = g [0,1] Kogge-Stone lookahead carry network for an 8-digit adder. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 42 Brent-Kung vs. Kogge-Stone Carry Network [7, 7 ] [6, 6 ] [5, 5 ] ¢ [4, 4 ] ¢ [3, 3 ] [2, 2 ] [1, 1 ] ¢ [6, 7 ] [0, 0 ] g [1, 1] p [1, 1] g [0, 0] p [0, 0] ¢ [2, 3 ] [4, 5 ] [0, 1 ] ¢ ¢ [4, 7 ] [0, 3 ] ¢ ¢ ¢ [0, 7 ] [0, 6 ] ¢ [0, 5 ] [0, 4 ] ¢ [0, 3 ] [0, 2 ] g [0, 1] p [0, 1] [0, 1 ] 11 carry operators 4 levels Oct. 2014 [0, 0 ] 17 carry operators 3 levels Computer Architecture, The Arithmetic/Logic Unit Slide 43 Carry-Lookahead Logic with 4-Bit Block p i+3 g i+3 p i+2 g i+2 p i+1 g i+1 p i gi g [i, i+3] Intermeidte carries p [i, i+3] Block signal generation ci c i+3 c i+2 c i+1 Figure 10.13 Blocks needed in the design of carry-lookahead adders with four-way grouping of bits. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 44 Third Carry Speed-Up Method: Carry Select Allows doubling of adder width with a single-mux additional delay x c out Adder c in Version 0 of sum bits c out 0 0 1 s Figure 10.14 Oct. 2014 c y [a, b] Adder [a, b] c in Version 1 of sum bits a [a, b] 1 The lower a positions, (0 to a – 1) are added as usual Carry-select addition principle. Computer Architecture, The Arithmetic/Logic Unit Slide 45 10.5 Logic and Shift Operations Conceptually, shifts can be implemented by multiplexing Right-shifted values Left-shifted values 00...0, x[31] Right’Left Shift amount 5 00, x[30, 2] 0, x[31, 1] x[31, 0] 32 32 32 0 1 6-bit code specifying shift direction & amount Oct. 2014 x[30, 0], 0 x[31, 0] . . . 32 32 31 32 32 33 . . . 32 62 32 63 Multiplexer 6 Figure 10.15 2 x[0], 00...0 x[1, 0], 00...0 32 Multiplexer-based logical shifting unit. Computer Architecture, The Arithmetic/Logic Unit Slide 46 Arithmetic Shifts Purpose: Multiplication and division by powers of 2 sra $t0,$s1,2 srav $t0,$s1,$s0 op 31 R 25 20 rt 15 rd 10 sh 5 fn 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 ALU instruction op 31 R rs # $t0 ($s1) right-shifted by 2 # $t0 ($s1) right-shifted by ($s0) Unused 25 rs Source register 20 rt Destination register 15 rd Shift amount 10 sh sra = 3 5 fn 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 ALU instruction Figure 10.16 Oct. 2014 Amount register Source register Destination register Unused srav = 7 The two arithmetic shift instructions of MiniMIPS. Computer Architecture, The Arithmetic/Logic Unit Slide 47 Practical Shifting in Multiple Stages 00 01 10 11 No shift Logical left Logical right Arith right 0 x[31], x[31, 1] 0, x[31, 1] x[30, 0], 0 x[31, 0] 32 32 32 32 0 2 1 2 2 2 3 y[31, 0] 0 1 2 (0 or 2)-bit shift 3 2 z[31, 0] 3 Multiplexer 0 32 2 (a) Single-bit shifter Figure 10.17 Oct. 2014 1 (0 or 4)-bit shift 1 2 (0 or 1)-bit shift 3 (b) Shifting by up to 7 bits Multistage shifting in a barrel shifter. Computer Architecture, The Arithmetic/Logic Unit Slide 48 Bit Manipulation via Shifts and Logical Operations Bits 10-15 AND with mask to isolate a field: 0000 0000 0000 0000 1111 1100 0000 0000 Right-shift by 10 positions to move field to the right end of word The result word ranges from 0 to 63, depending on the field pattern 32-pixel (4 8) block of black-and-white image: Row 0 Row 1 Row 2 Row 3 Representation as 32-bit word: 1010 0000 0101 1000 0000 0110 0001 0111 Hex equivalent: 0xa0a80617 Figure 10.18 A 4 8 block of a black-and-white image represented as a 32-bit word. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 49 10.6 Multifunction ALUs Arith fn (add, sub, . . .) Operand 1 Arith unit 0 Result Operand 2 1 Logic unit Select fn type (logic or arith) Logic fn (AND, OR, . . .) General structure of a simple arithmetic/logic unit. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 50 ConstVar Shift function Constant 5 amount 0 Amount 5 1 5 Variable amount 2 00 01 10 11 No shift Logical left Logical right Arith right Shifter Function class 32 5 LSBs 00 01 10 11 An ALU for MiniMIPS Shift Set less Arithmetic Logic 2 Shifted y 0 x Adder y 0 or 1 c0 32 32 k / c 31 xy s MSB 32 2 32 Shorthand symbol for ALU 1 Control c 32 3 x Func AddSub s ALU Logic unit AND OR XOR NOR 00 01 10 11 Ovfl y 32input NOR Zero 2 Logic function Zero Ovfl Figure 10.19 A multifunction ALU with 8 control signals (2 for function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 51 11 Multipliers and Dividers Modern processors perform many multiplications & divisions: • Encryption, image compression, graphic rendering • Hardware vs programmed shift-add/sub algorithms Topics in This Chapter 11.1 Shift-Add Multiplication 11.2 Hardware Multipliers 11.3 Programmed Multiplication 11.4 Shift-Subtract Division 11.5 Hardware Dividers 11.6 Programmed Division Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 52 11.1 Shift-Add Multiplication Multiplicand Multiplier x y Partial products bit-matrix y0 y 1 y2 y3 Product z Figure 11.1 x x x x 20 21 22 23 Multiplication of 4-bit numbers in dot notation. z(j+1) = (z(j) + yj x 2k) 2–1 with z(0) = 0 and z(k) = z |––– add –––| |–– shift right ––| Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 53 Binary and Decimal Multiplication Position 7 6 5 4 3 2 1 0 ========================= x24 1 0 1 0 y 0 0 1 1 ========================= z (0) 0 0 0 0 4 +y0x2 1 0 1 0 –––––––––––––––––––––––––– 2z (1) 0 1 0 1 0 (1) z 0 1 0 1 0 4 +y1x2 1 0 1 0 –––––––––––––––––––––––––– 2z (2) 0 1 1 1 1 0 (2) z 0 1 1 1 1 0 4 +y2x2 0 0 0 0 –––––––––––––––––––––––––– 2z (3) 0 0 1 1 1 1 0 (3) z 0 0 1 1 1 1 0 4 +y3x2 0 0 0 0 –––––––––––––––––––––––––– 2z (4) 0 0 0 1 1 1 1 0 (4) z 0 0 0 1 1 1 1 0 ========================= Figure 11.2 Oct. 2014 Example 11.1 Position 7 6 5 4 3 2 1 0 ========================= x104 3 5 2 8 y 4 0 6 7 ========================= z (0) 0 0 0 0 4 +y0x10 2 4 6 9 6 –––––––––––––––––––––––––– 10z (1) 2 4 6 9 6 (1) z 0 2 4 6 9 6 4 +y1x10 2 1 1 6 8 –––––––––––––––––––––––––– 10z (2) 2 3 6 3 7 6 (2) z 2 3 6 3 7 6 4 +y2x10 0 0 0 0 0 –––––––––––––––––––––––––– 10z (3) 0 2 3 6 3 7 6 (3) z 0 2 3 6 3 7 6 4 +y3x10 1 4 1 1 2 –––––––––––––––––––––––––– 10z (4) 1 4 3 4 8 3 7 6 (4) z 1 4 3 4 8 3 7 6 ========================= Step-by-step multiplication examples for 4-digit unsigned numbers. Computer Architecture, The Arithmetic/Logic Unit Slide 54 Two’s-Complement Multiplication Position 7 6 5 4 3 2 1 0 ========================= x24 1 0 1 0 y 0 0 1 1 ========================= z (0) 0 0 0 0 0 4 +y0x2 1 1 0 1 0 –––––––––––––––––––––––––– 2z (1) 1 1 0 1 0 (1) z 1 1 1 0 1 0 4 +y1x2 1 1 0 1 0 –––––––––––––––––––––––––– 2z (2) 1 0 1 1 1 0 (2) z 1 1 0 1 1 1 0 4 +y2x2 0 0 0 0 0 –––––––––––––––––––––––––– 2z (3) 1 1 0 1 1 1 0 (3) z 1 1 1 0 1 1 1 0 4 +(–y3x2 ) 0 0 0 0 0 –––––––––––––––––––––––––– 2z (4) 1 1 1 0 1 1 1 0 (4) z 1 1 1 0 1 1 1 0 ========================= Figure 11.3 Oct. 2014 Example 11.2 Position 7 6 5 4 3 2 1 0 ========================= x24 1 0 1 0 y 1 0 1 1 ========================= z (0) 0 0 0 0 0 4 +y0x2 1 1 0 1 0 –––––––––––––––––––––––––– 2z (1) 1 1 0 1 0 (1) z 1 1 1 0 1 0 4 +y1x2 1 1 0 1 0 –––––––––––––––––––––––––– 2z (2) 1 0 1 1 1 0 (2) z 1 1 0 1 1 1 0 4 +y2x2 0 0 0 0 0 –––––––––––––––––––––––––– 2z (3) 1 1 0 1 1 1 0 (3) z 1 1 1 0 1 1 1 0 4 +(–y3x2 ) 0 0 1 1 0 –––––––––––––––––––––––––– 2z (4) 0 0 0 1 1 1 1 0 (4) z 0 0 0 1 1 1 1 0 ========================= Step-by-step multiplication examples for 2’s-complement numbers. Computer Architecture, The Arithmetic/Logic Unit Slide 55 11.2 Hardware Multipliers Shift Multiplier y Doublewidth partial product z (j) Hi Shift Lo Multiplicand x 0 Mux 1 yj Enable Select c out Figure 11.4 Oct. 2014 Adder c in Add’Sub Hardware multiplier based on the shift-add algorithm. Computer Architecture, The Arithmetic/Logic Unit Slide 56 The Shift Part of Shift-Add From adder Sum cout /k Partial product /k–1 Multiplier /k–1 /k To adder yj Figure11.5 Shifting incorporated in the connections to the partial product register rather than as a separate phase. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 57 High-Radix Multipliers Multiplicand Multiplier x y 0, x, 2x, or 3x Product z Radix-4 multiplication in dot notation. z(j+1) = (z(j) + yj x 2k) 4–1 with z(0) = 0 and z(k/2) = z |––– add –––| Assume k even |–– shift right ––| Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 58 Tree Multipliers All partial products Several partial products ... ... Large tree of carry-save adders Logdepth Adder Logdepth Small tree of carry-save adders Adder Product Product (a) Full-tree multiplier Figure 11.6 Oct. 2014 (b) Partial-tree multiplier Schematic diagram for full/partial-tree multipliers. Computer Architecture, The Arithmetic/Logic Unit Slide 59 Array Multipliers x3 0 x2 0 x1 0 0 0 Carry-save 0 0 Carry-save input addition bit g1 ents ow e it) MA Binary 0 input MA s Carry-save MA 0 output 0 Figure 9.3a (Recalling a. Carry-save addition. carry-save addition) MA MA Two carry-save inputs y0 c MA MA 0 MA MA x0 0 MA MA MA MA MA y1 y2 z0 0 Carry-save Our original addition dot-notation z1 0 b. Adding two carry-save numbers. y3 z2 MA MA 0 z3 FA Figure 11.7 Oct. 2014 representing multiplication FA FA HA z7 z6 z5 Straightened dots to depict array multiplier to the left z4 Array multiplier for 4-bit unsigned operands. Computer Architecture, The Arithmetic/Logic Unit Slide 60 11.3 Programmed Multiplication MiniMIPS instructions related to multiplication mult multu mfhi mflo $s0,$s1 $s2,$s3 $t0 $t1 # # # # set set set set Hi,Lo to ($s0)($s1); signed Hi,Lo to ($s2)($s3); unsigned $t0 to (Hi) $t1 to (Lo) Example 11.3 Finding the 32-bit product of 32-bit integers in MiniMIPS Multiply; result will be obtained in Hi,Lo For unsigned multiplication: Hi should be all-0s and Lo holds the 32-bit result For signed multiplication: Hi should be all-0s or all-1s, depending on the sign bit of Lo Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 61 Emulating a Hardware Multiplier in Software Example 11.4 (MiniMIPS shift-add program for multiplication) Shift $a1 (multiplier Multiplier y y) $v0Doublewidth (Hi part of z)partial $v1product (Lo part zof(j)z) $t0 (carry-out) $a0Multiplicand (multiplicandx x) 0 Also, holds LSB of Hi during shift Shift Mux 1 $t1 (bit j of y) yj Enable Select $t2 (counter) c out Adder c in Add’Sub Part of the control in hardware Figure 11.8 Register usage for programmed multiplication superimposed on the block diagram for a hardware multiplier. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 62 Multiplication When There Is No Multiply Instruction Example 11.4 (MiniMIPS shift-add program for multiplication) shamu: move move addi mloop: move move srl subu subu beqz addu sltu noadd: move srl subu subu sll addu srl sll addu addi bne jr Oct. 2014 $v0,$zero $vl,$zero $t2,$zero,32 $t0,$zero $t1,$a1 $a1,1 $t1,$t1,$a1 $t1,$t1,$a1 $t1,noadd $v0,$v0,$a0 $t0,$v0,$a0 $t1,$v0 $v0,1 $t1,$t1,$v0 $t1,$t1,$v0 $t0,$t0,31 $v0,$v0,$t0 $v1,1 $t1,$t1,31 $v1,$v1,$t1 $t2,$t2,-1 $t2,$zero,mloop $ra # # # # # # # # # # # # # # # # # # # # # # # initialize Hi to 0 initialize Lo to 0 init repetition counter to 32 set c-out to 0 in case of no add copy ($a1) into $t1 halve the unsigned value in $a1 subtract ($a1) from ($t1) twice to obtain LSB of ($a1), or y[j], in $t1 no addition needed if y[j] = 0 add x to upper part of z form carry-out of addition in $t0 copy ($v0) into $t1 halve the unsigned value in $v0 subtract ($v0) from ($t1) twice to obtain LSB of Hi in $t1 carry-out converted to 1 in MSB of $t0 right-shifted $v0 corrected halve the unsigned value in $v1 LSB of Hi converted to 1 in MSB of $t1 right-shifted $v1 corrected decrement repetition counter by 1 if counter > 0, repeat multiply loop return to the calling program Computer Architecture, The Arithmetic/Logic Unit Slide 63 11.4 Shift-Subtract Division x Divisor Subtracted bit-matrix y z Dividend y 3 x 23 y 2 x 22 y 1 x 21 y 0 x 20 s Figure11.9 Quotient Remainder Division of an 8-bit number by a 4-bit number in dot notation. z(j) = 2z(j1) ykj x 2k with z(0) = z and z(k) = 2k s |shift| |–– subtract ––| Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 64 Integer and Fractional Unsigned Division Position 7 6 5 4 3 2 1 0 ========================= z 0 1 1 1 0 1 0 1 4 x2 1 0 1 0 ========================= z (0) 0 1 1 1 0 1 0 1 (0) 2z 0 1 1 1 0 1 0 1 4 –y3x2 1 0 1 0 y3=1 –––––––––––––––––––––––––– z (1) 0 1 0 0 1 0 1 (1) 2z 0 1 0 0 1 0 1 4 –y2x2 0 0 0 0 y2=0 –––––––––––––––––––––––––– z (2) 1 0 0 1 0 1 (2) 2z 1 0 0 1 0 1 4 –y1x2 1 0 1 0 y1=1 –––––––––––––––––––––––––– z (3) 1 0 0 0 1 (3) 2z 1 0 0 0 1 4 –y0x2 1 0 1 0 y0=1 –––––––––––––––––––––––––– z (4) 0 1 1 1 s 0 1 1 1 y 1 0 1 1 ========================= Figure 11.10 Oct. 2014 Example 11.5 Position –1 –2 –3 –4 –5 –6 –7 –8 ========================== z .1 4 3 5 1 5 0 2 x .4 0 6 7 ========================== z (0) .1 4 3 5 1 5 0 2 (0) 10z 1.4 3 5 1 5 0 2 –y–1x 1.2 2 0 1 y–1=3 ––––––––––––––––––––––––––– z (1) .2 1 5 0 5 0 2 (1) 10z 2.1 5 0 5 0 2 –y–2x 2.0 3 3 5 y–2=5 ––––––––––––––––––––––––––– z (2) .1 1 7 0 0 2 (2) 10z 1.1 7 0 0 2 –y–3x 0.8 1 3 4 y–3=2 ––––––––––––––––––––––––––– z (3) .3 5 6 6 2 (3) 10z 3.5 6 6 2 –y–4x 3.2 5 3 6 y–4=8 ––––––––––––––––––––––––––– z (4) .3 1 2 6 s .0 0 0 0 3 1 2 6 y .3 5 2 8 ========================== Division examples for binary integers and decimal fractions. Computer Architecture, The Arithmetic/Logic Unit Slide 65 Division with Same-Width Operands Position 7 6 5 4 3 2 1 0 ========================= z 0 0 0 0 1 1 0 1 4 x2 0 1 0 1 ========================= z (0) 0 0 0 0 1 1 0 1 (0) 2z 0 0 0 1 1 0 1 4 –y3x2 0 0 0 0 y3=0 –––––––––––––––––––––––––– z (1) 0 0 0 1 1 0 1 (1) 2z 0 0 1 1 0 1 4 –y2x2 0 0 0 0 y2=0 –––––––––––––––––––––––––– z (2) 0 0 1 1 0 1 (2) 2z 0 1 1 0 1 4 –y1x2 0 1 0 1 y1=1 –––––––––––––––––––––––––– z (3) 0 0 0 1 1 (3) 2z 0 0 1 1 4 –y0x2 1 0 1 0 y0=0 –––––––––––––––––––––––––– z (4) 0 0 1 1 s 0 0 1 1 y 0 0 1 0 ========================= Figure 11.11 Oct. 2014 Example 11.6 Position –1 –2 –3 –4 –5 –6 –7 –8 ========================== z .0 1 0 1 x .1 1 0 1 ========================== z (0) .0 1 0 1 (0) 2z 0.1 0 1 0 –y–1x 0.0 0 0 0 y–1=0 ––––––––––––––––––––––––––– z (1) .1 0 1 0 (1) 2z 1.0 1 0 0 –y–2x 0.1 1 0 1 y–2=1 ––––––––––––––––––––––––––– z (2) .0 1 1 1 (2) 2z 0.1 1 1 0 –y–3x 0.1 1 0 1 y–3=1 ––––––––––––––––––––––––––– z (3) .0 0 0 1 (3) 2z 0.0 0 1 0 –y–4x 0.0 0 0 0 y–4=0 ––––––––––––––––––––––––––– z (4) .0 0 1 0 s .0 0 0 0 0 0 1 0 y .0 1 1 0 ========================== Division examples for 4/4-digit binary integers and fractions. Computer Architecture, The Arithmetic/Logic Unit Slide 66 Signed Division Method 1 (indirect): strip operand signs, divide, set result signs Dividend z=5 z=5 z = –5 z = –5 Divisor x=3 x = –3 x=3 x = –3 Quotient y=1 y = –1 y = –1 y=1 Remainder s=2 s=2 s = –2 s = –2 Method 2 (direct 2’s complement): develop quotient with digits –1 and 1, chosen based on signs, convert to digits 0 and 1 Restoring division: perform trial subtraction, choose 0 for q digit if partial remainder negative Nonrestoring division: if sign of partial remainder is correct, then subtract (choose 1 for q digit) else add (choose –1) Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 67 11.5 Hardware Dividers Shift Quotient y Partial remainder z (j) (initially z) Hi Shift Lo Load Quotient digit selector Divisor x 0 Mux 1 yk – j Enable 1 Select c out Adder Trial difference Figure 11.12 Oct. 2014 c in 1 (Always subtract) Hardware divider based on the shift-subtract algorithm. Computer Architecture, The Arithmetic/Logic Unit Slide 68 The Shift Part of Shift-Subtract qk–j From adder /k Partial remainder /k Quotient /k MSB /k To adder Figure 11.13 Shifting incorporated in the connections to the partial remainder register rather than as a separate phase. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 69 High-Radix Dividers x Divisor y Quotient z Dividend 0, x, 2x, or 3x s Remainder Radix-4 division in dot notation. z(j) = 4z(j1) (yk2j+1 yk2j)two x 2k with z(0) = z and z(k/2) = 2ks |shift| Assume k even |––––––– subtract –––––––| Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 70 Array Dividers z7 y3 x3 z6 MS x1 MS x0 z5 MS b y2 x2 z4 MS MS z3 0 z2 d MS MS MS 0 z1 y1 MS MS MS MS Our original dot-notation for division 0 z0 y0 MS s3 MS s2 Figure 11.14 Oct. 2014 MS s1 MS 0 s0 Straightened dots to depict an array divider Array divider for 8/4-bit unsigned integers. Computer Architecture, The Arithmetic/Logic Unit Slide 71 11.6 Programmed Division MiniMIPS instructions related to division div divu mfhi mflo $s0,$s1 $s2,$s3 $t0 $t1 # # # # Lo = quotient, Hi = remainder unsigned version of division set $t0 to (Hi) set $t1 to (Lo) Example 11.7 Compute z mod x, where z (singed) and x > 0 are integers Divide; remainder will be obtained in Hi if remainder is negative, then add |x| to (Hi) to obtain z mod x else Hi holds z mod x Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 72 Emulating a Hardware Divider in Software Example 11.8 (MiniMIPS shift-add program for division) Shift $a1 (quotient Quotient y y) $v0 Partial (Hi part remainder of z) $v1 z (j) (Lo (initially part of z)z) Shift $t0 (MSB of Hi) yk – j Load $t1 (bit k j of y) $a0 Divisor (divisorx x) 0 1 Mux Enable 1 Quotient digit selector Select $t2 (counter) c out Adder Trial difference c in 1 (Always subtract) Part of the control in hardware Figure 11.15 Register usage for programmed division superimposed on the block diagram for a hardware divider. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 73 Division When There Is No Divide Instruction Example 11.7 (MiniMIPS shift-subtract program for division) shsdi: move move addi dloop: slt sll slt or sll sge or sll or beq subu nosub: addi bne move jr Oct. 2014 $v0,$a2 $vl,$a3 $t2,$zero,32 $t0,$v0,$zero $v0,$v0,1 $t1,$v1,$zero $v0,$v0,$t1 $v1,$v1,1 $t1,$v0,$a0 $t1,$t1,$t0 $a1,$a1,1 $a1,$a1,$t1 $t1,$zero,nosub $v0,$v0,$a0 $t2,$t2,-1 $t2,$zero,dloop $v1,$a1 $ra # # # # # # # # # # # # # # # # # # initialize Hi to ($a2) initialize Lo to ($a3) initialize repetition counter to 32 copy MSB of Hi into $t0 left-shift the Hi part of z copy MSB of Lo into $t1 move MSB of Lo into LSB of Hi left-shift the Lo part of z quotient digit is 1 if (Hi) x, or if MSB of Hi was 1 before shifting shift y to make room for new digit copy y[k-j] into LSB of $a1 if y[k-j] = 0, do not subtract subtract divisor x from Hi part of z decrement repetition counter by 1 if counter > 0, repeat divide loop copy the quotient y into $v1 return to the calling program Computer Architecture, The Arithmetic/Logic Unit Slide 74 Divider vs Multiplier: Hardware Similarities Shift Quotient y Partial remainder z (j) (initially z) Shift yk – j Multiplier y Doublewidth partial product z (j) Load Shift Shift Quotient digit selector Divisor x Enable 0 Mux 1 Multiplicand x 0 1 1 Mux yj Enable Select Select Figure 11.12 c out c Adder in Trial difference 0 x3 0 0 MA x2 0 0 MA x1 0 0 MA Figure 11.4 x0 0 z7 y3 MA Adder c out 1 (Always subtract) y0 x3 z6 MS x2 x1 z5 MS c in x0 z4 MS MS 0 MA MA MA MA y1 z0 y2 z1 y3 z2 0 MA MA MA MA MS MS MS MS MA MA MA MS MS MS MS MS z3 FA FA HA z7 z6 z5 Straightened dots to depict array multiplier to the left Figure 11.14 Oct. 2014 z4 0 Our original dot-notation for division 0 z0 y0 MS MS MS 0 FA 0 z1 y1 0 MA z3 z2 y2 Our original dot-notation representing multiplication Add’Sub s3 s2 s1 s0 Turn upside-down Computer Architecture, The Arithmetic/Logic Unit 0 Straightened dots to depict an array divider Figure 11.7 Slide 75 12 Floating-Point Arithmetic Floating-point is no longer reserved for high-end machines • Multimedia and signal processing require flp arithmetic • Details of standard flp format and arithmetic operations Topics in This Chapter 12.1 Rounding Modes 12.2 Special Values and Exceptions 12.3 Floating-Point Addition 12.4 Other Floating-Point Operations 12.5 Floating-Point Instructions 12.6 Result Precision and Errors Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 76 12.1 Rounding Modes Short (32-bit) format 8 bits, bias = 127, –126 to 127 23 bits for fractional part (plus hidden 1 in integer part) Sign Exponent 0, , NaN IEEE 754 Format Denormals: 0.f 2emin 1.f 2e Significand 11 bits, bias = 1023, –1022 to 1023 52 bits for fractional part (plus hidden 1 in integer part) Denormals allow graceful underflow Long (64-bit) format – Negative numbers max – FLP – min – Sparser Overflow region 0 Denser Positive numbers min + FLP + Denser Figure 12.1 Oct. 2014 Underflow example + Sparser Underflow regions Midway example max + Overflow region Typical example Overflow example Distribution of floating-point numbers on the real line. Computer Architecture, The Arithmetic/Logic Unit Slide 77 Round-to-Nearest (Even) rtnei(x) rtni(x) 4 4 3 3 2 2 1 1 x –4 –3 –2 –1 1 2 3 4 Oct. 2014 –4 –3 –2 –1 1 –1 –1 –2 –2 –3 –3 –4 –4 (a) Round to nearest even integer Figure 12.2 x 2 3 (b) Round to nearest integer Two round-to-nearest-integer functions for x in [–4, 4]. Computer Architecture, The Arithmetic/Logic Unit Slide 78 4 Directed Rounding ritni(x) rutni(x) 4 4 3 3 2 2 1 1 x –4 –3 –2 –1 1 2 3 4 x –4 –3 –2 –1 1 –1 –1 –2 –2 –3 –3 –4 –4 (a) Round inward to nearest integer 2 3 4 (b) Round upward to nearest integer Figure 12.3 Two directed round-to-nearest-integer functions for x in [–4, 4]. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 79 12.2 Special Values and Exceptions Zeros, infinities, and NaNs (not a number) 0 Biased exponent = 0, significand = 0 (no hidden 1) Biased exponent = 255 (short) or 2047 (long), significand = 0 NaN Biased exponent = 255 (short) or 2047 (long), significand 0 Arithmetic operations with special operands (+0) + (+0) = (+0) – (–0) = +0 (+0) (+5) = +0 (+0) / (–5) = –0 (+) + (+) = + x – (+) = – (+) x = , depending on the sign of x x / (+) = 0, depending on the sign of x (+) = + Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 80 Exceptions Undefined results lead to NaN (not a number) (0) / (0) = NaN (+) + (–) = NaN (0) () = NaN () / () = NaN Arithmetic operations and comparisons with NaNs NaN + x = NaN NaN + NaN = NaN NaN 0 = NaN NaN NaN = NaN NaN < 2 false NaN = Nan false NaN (+) true NaN NaN true Examples of invalid-operation exceptions Addition: Multiplication: Division: Square-root: Oct. 2014 (+) + (–) 0 0 / 0 or / Operand < 0 Computer Architecture, The Arithmetic/Logic Unit Slide 81 12.3 Floating-Point Addition (2e1s1) + (2e1(s2 / 2e1–e2)) = 2e1(s1 s2 / 2e1–e2) (2e2s2) Numbers to be added: x = 25 1.00101101 y = 21 1.11101101 Operand with smaller exponent to be preshifted Operands after alignment shift: x = 25 1.00101101 y = 25 0.000111101101 Result of addition: s = 25 1.010010111101 s = 25 1.01001100 Figure 12.4 Oct. 2014 Extra bits to be rounded off Rounded sum Alignment shift and rounding in floating-point addition. Computer Architecture, The Arithmetic/Logic Unit Slide 82 Input 1 Hardware for Floating-Point Addition Input 2 Unpack Signs Exponents Significands AddSub Mu x Sub Possible swap & complement Align significands Control & sign logic Add Normalize & round Figure 12.5 Simplified schematic of a floating-point adder. Sign Exponent Significand Pack Output Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 83 12.4 Other Floating-Point Operations Overflow (underflow) possible Floating-point multiplication (2e1s1) (2e2s2) = 2e1+ e2(s1 s2) Product of significands in [1, 4) If product is in [2, 4), halve to normalize (increment exponent) Overflow (underflow) possible Floating-point division (2e1s1) / (2e2s2) = 2e1– e2(s1 / s2) Ratio of significands in (1/2, 2) If ratio is in (1/2, 1), double to normalize (decrement exponent) Floating-point square-rooting (2es)1/2 = 2e/2(s)1/2 = 2(e–1)2(2s)1/2 Normalization not needed Oct. 2014 when e is even when e is odd Computer Architecture, The Arithmetic/Logic Unit Slide 84 Hardware for Floating-Point Multiplication and Division Input 1 Input 2 Unpack Signs Exponents Significands MulDiv Multiply or divide Control & sign logic Normalize & round Figure 12.6 Simplified schematic of a floatingpoint multiply/divide unit. Sign Exponent Significand Pack Output Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 85 12.5 Floating-Point Instructions Floating-point arithmetic instructions for MiniMIPS: add.s sub.d mul.d div.s neg.s 31 F op $f0,$f8,$f10 $f0,$f8,$f10 $f0,$f8,$f10 $f0,$f8,$f10 $f0,$f8 25 ex 20 # # # # # ft set set set set set 15 $f0 $f0 $f0 $f0 $f0 fs to to to to to 10 ($f8) +fp ($f8) –fp ($f8) fp ($f8) /fp –($f8) ($f10) ($f10) ($f10) ($f10) fd fn 5 0 0 1 0 0 0 1 0 0 0 0 x 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 x x x Floating-point instruction s=0 d=1 Source register 2 Source register 1 Destination register add.* = 0 sub.* = 1 mul.* = 2 div.* = 3 neg.* = 7 Figure 12.7 The common floating-point instruction format for MiniMIPS and components for arithmetic instructions. The extension (ex) field distinguishes single (* = s) from double (* = d) operands. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 86 The Floating-Point Unit in MiniMIPS ... m 2 32 Loc 0 Loc 4 Loc 8 4 B / location Memory up to 2 30 words Loc Loc m 8 m 4 Coprocessor 1 ... EIU Execution & integer unit $0 $1 $2 (Main proc.) $31 Integer mul/div ALU Hi FPU (Coproc. 1) FP arith $0 $1 $2 Floatingpoint unit $31 Pairs of registers, beginning with an even-numbered one, are used for double operands Lo TMU Chapter 10 Chapter 11 Figure 5.1 Oct. 2014 Chapter 12 BadVaddr Trap & (Coproc. 0) Status memory Cause unit EPC Memory and processing subsystems for MiniMIPS. Computer Architecture, The Arithmetic/Logic Unit Slide 87 Floating-Point Format Conversions MiniMIPS instructions for number format conversion: cvt.s.w cvt.d.w cvt.d.s cvt.s.d cvt.w.s cvt.w.d 31 F op $f0,$f8 $f0,$f8 $f0,$f8 $f0,$f8 $f0,$f8 $f0,$f8 25 ex # # # # # # 20 set set set set set set ft $f0 $f0 $f0 $f0 $f0 $f0 15 to to to to to to fs single(integer $f8) double(integer $f8) double($f8) single($f8,$f9) integer($f8) integer($f8,$f9) 10 fd 5 fn 0 0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 x x x Floating-point instruction Figure 12.8 Oct. 2014 *.w = 0 w.s = 0 w.d = 1 *.* = 1 Unused Source register Destination register To format: s = 32 d = 33 w = 36 Floating-point instructions for format conversion in MiniMIPS. Computer Architecture, The Arithmetic/Logic Unit Slide 88 Floating-Point Data Transfers MiniMIPS instructions for floating-point load, store, and move: lwc1 swc1 mov.s mov.d mfc1 mtc1 31 F $f8,40($s3) $f8,A($s3) $f0,$f8 $f0,$f8 $t0,$f12 $f8,$t4 op 25 20 ft load mem[40+($s3)] into $f8 store ($f8) into mem[A+($s3)] load $f0 with ($f8) load $f0,$f1 with ($f8,$f9) load $t0 with ($f12) load $f8 with ($t4) 15 fs 10 fd 5 fn 0 0 1 0 0 0 1 0 0 0 0 x 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 Floating-point instruction 31 R ex # # # # # # op s=0 d=1 25 rs Unused 20 rt Source register 15 rd Destination register 10 sh mov.* = 6 5 fn 0 0 1 0 0 0 1 0 0 x 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Floating-point instruction Figure 12.9 Oct. 2014 mfc1 = 0 mtc1 = 4 Source register Destination register Unused Unused Instructions for floating-point data movement in MiniMIPS. Computer Architecture, The Arithmetic/Logic Unit Slide 89 Floating-Point Branches and Comparisons MiniMIPS instructions for floating-point load, store, and move: bc1t bc1f c.eq.* c.lt.* c.le.* 31 I L L $f0,$f8 $f0,$f8 $f0,$f8 op 25 20 branch on fp flag true branch on fp flag false if ($f0)=($f8), set flag to “true” if ($f0)<($f8), set flag to “true” if ($f0)($f8), set flag to “true” rt operand / offset 15 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 x 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 Floating-point instruction 31 F rs # # # # # op bc1? = 8 25 ex true = 1 false = 0 20 ft Offset Correction: 1 1 x x x 0 15 fs 10 fd 5 fn 0 0 1 0 0 0 1 0 0 0 0 x 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 Floating-point instruction Figure 12.10 Oct. 2014 s=0 d=1 Source register 2 Source register 1 Unused c.eq.* = 50 c.lt.* = 60 c.le.* = 62 Floating-point branch and comparison instructions in MiniMIPS. Computer Architecture, The Arithmetic/Logic Unit Slide 90 Floating-Point Instructions of MiniMIPS Copy Table 12.1 Arithmetic * s/d for single/double # 0/1 for single/double Conversions Memory access Control transfer Oct. 2014 Instruction Usage Move s/d registers Move fm coprocessor 1 Move to coprocessor 1 Add single/double Subtract single/double Multiply single/double Divide single/double Negate single/double Compare equal s/d Compare less s/d Compare less or eq s/d Convert integer to single Convert integer to double Convert single to double Convert double to single Convert single to integer Convert double to integer Load word coprocessor 1 Store word coprocessor 1 Branch coproc 1 true Branch coproc 1 false mov.* fd,fs mfc1 rt,rd mtc1 rd,rt add.* fd,fs,ft sub.* fd,fs,ft mul.* fd,fs,ft div.* fd,fs,ft neg.* fd,fs c.eq.* fs,ft c.lt.* fs,ft c.le.* fs,ft cvt.s.w fd,fs cvt.d.w fd,fs cvt.d.s fd,fs cvt.s.d fd,fs cvt.w.s fd,fs cvt.w.d fd,fs lwc1 ft,imm(rs) swc1 ft,imm(rs) bc1t L bc1f L Computer Architecture, The Arithmetic/Logic Unit ex fn # 0 4 # # # # # # # # 0 0 1 1 0 1 rs rs 8 8 Slide 91 6 0 1 2 3 7 50 60 62 32 33 33 32 36 36 12.6 Result Precision and Errors Example 12.4 Laws of algebra may not hold in floating-point arithmetic. For example, the following computations show that the associative law of addition, (a + b) + c = a + (b + c), is violated for the three numbers shown. Numbers to be added first a =-25 1.10101011 b = 25 1.10101110 Numbers to be added first b = 25 1.10101110 c =-22 1.01100101 Compute a + b 25 0.00000011 a+b = 22 1.10000000 c =-22 1.01100101 Compute b + c (after preshifting c) 25 1.101010110011011 b+c = 25 1.10101011 (Round) a =-25 1.10101011 Compute (a + b) + c 22 0.00011011 Sum = 26 1.10110000 Compute a + (b + c) 25 0.00000000 Sum = 0 (Normalize to special code for 0) Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 92 Error Control and Certifiable Arithmetic Catastrophic cancellation in subtracting almost equal numbers: Area of a needlelike triangle A = [s(s – a)(s – b)(s – c)]1/2 c b a Possible remedies Carry extra precision in intermediate results (guard digits): commonly used in calculators Use alternate formula that does not produce cancellation errors Certifiable arithmetic with intervals A number is represented by its lower and upper bounds [xl, xu] Example of arithmetic: [xl, xu] +interval [yl, yu] = [xl +fp yl, xu +fp yu] Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 93 Evaluation of Elementary Functions Approximating polynomials ln x = 2(z + z3/3 + z5/5 + z7/7 + . . . ) where z = (x – 1)/(x + 1) ex = 1 + x/1! + x2/2! + x3/3! + x4/4! + . . . cos x = 1 – x2/2! + x4/4! – x6/6! + x8/8! – . . . tan–1 x = x – x3/3 + x5/5 – x7/7 + x9/9 – . . . Iterative (convergence) schemes For example, beginning with an estimate for x1/2, the following iterative formula provides a more accurate estimate in each step q(i+1) = 0.5(q(i) + x/q(i)) Table lookup (with interpolation) A pure table lookup scheme results in huge tables (impractical); hence, often a hybrid approach, involving interpolation, is used. Oct. 2014 Computer Architecture, The Arithmetic/Logic Unit Slide 94 Function Evaluation by Table Lookup h bits xH Input x k - h bits xL xL Table for a f(x) Table for b Best linear approximation in subinterval Multiply x xH Add Output Figure 12.12 Oct. 2014 f(x) The linear approximation above is characterized by the line equation a + b x L , where a and b are read out from tables based on x H Function evaluation by table lookup and linear interpolation. Computer Architecture, The Arithmetic/Logic Unit Slide 95