### Solving the Greatest Common Divisor Problem in Parallelx

```Solving the Greatest Common
Divisor Problem in Parallel
Derrick Coetzee
University of California, Berkeley
CS 273, Fall 2010, Prof. Satish Rao
The problem and serial solutions
• Given two positive integers (represented in binary),
find the large positive integer that divides both
• Let n will be the total number of bits in the inputs
• Solvable in O(n) divisions and O(n2) bit operations with
Euclidean algorithm:
– while b ≠ 0: (a, b) ← (b, a mod b)
• Matrix formulation:
• Schönhage uses this to solve in O(n log2n log log n)
• Brent & Kung (1982): reduce time of each step to
O(1) (O(n) bit complexity)
– α ← n; β ← n (where a,b ≤ 2n and a odd)
while b ≠ 0:
while b ≡ 0 (mod 2) { b ← b/2; β ← β − 1 }
if α ≥ β { swap(a,b); swap(α, β) }
if a+b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋
else: b ← ⌊(a−b)/2⌋
– Test “a+b ≡ 0 (mod 4)” uses only 2 lowest bits of sum,
enables effective pipelining of the steps
– Number of steps is still linear, requires n processors
Algorithm PM: Example
•
•
•
•
•
•
GCD(105, 30) (a = 11010012, b = 111102)
α ← 7; β ← 7
b ← 11112; β ← 6
swap (a ← 11112, b ← 11010012, α ← 6, β ← 7)
a + b ≡ 112 + 012 ≡ 002 (mod 4)
Compute trailing zeroes of new b and 2 more
bits:
– b ← ⌊(a+b)/2⌋ = (011112 + …010012)/2 = …1100
• b ← …112; β ← 5
• swap (a ← …112, b ← 11112, α ← 5, β ← 6)
Algorithm PM: Example
• a = …112, b = 11112, α = 5, β = 6
• a + b ≡ 112 + 112 ≡ 102 (mod 4)
• Compute LSB of new b while previous
iteration computes next bit of a in parallel:
– b' = ⌊(a−b)/2⌋ = …0
– a = …1112
• With another bit of a, can compute next bit of
b’
• Final result: a= 11112 , b’ = 0
Sublinear time GCD
• Kannan et al (1987): Break the problem into n/log
n steps, each of which takes log log n time and
eliminates log n bits of each input.
– CRCW, O(n log log n/log n) time, O(n2 log2n)
processors
– Idea: instead of a mod b, compute pa mod b for all
0≤p≤n
• If gcd(a,b) = d and 0 ≤ p ≤ n, then gcd(pa,b)/d ≤ n
• Pigeonhole principle gives first log n bits same for at least
two p
– gcd(a,b) = gcd(b,(p1a mod b)-(p2a mod b)) (ignoring factors ≤ n)
– Only need to know first log n bits to identify p1, p2 – can be
computed in log log n time
Sublinear time GCD: Example
• GCD(143, 221), n = 8
–
–
–
–
–
–
–
–
–
(0×143) mod 221 = 000…2
(1×143) mod 221 = 100…2
(2×143) mod 221 = 010…2
(3×143) mod 221 = 110…2
(4×143) mod 221 = 100…2
(5×143) mod 221 = 001…2
(6×143) mod 221 = 110…2
(7×143) mod 221 = 011…2
(8×143) mod 221 = 001…2
• (4×143) mod 221 − (1×143) mod 221 = 13
Sublinear time GCD: Using matrix form
• Problem: (p1a mod b) − (p2a mod b) takes log n
time
numbers
– Idea: Delay updates to (a, b) for log n steps, collecting
updates in a 2 × 2 matrix T, then compute T(a,b) in log
n time
– Results in n/log2n phases each taking log n time
– Each step performs constant number of log2n-bit
operations
• Ensures that each step still takes log log n time
– Time dominated by cost of the O(n/log n) steps
Getting rid of the log log n factor
• Chor and Goldreich (1990): Parallelize Brent & Kung’s
PM algorithm instead of Euclidean algorithm
– α ← n; β ← n
while b ≠ 0:
while b = 0 (mod 2) { b ← b/2; β ← β − 1 }
if α ≥ β { swap(a,b); swap(α, β) }
if a+b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋
– Idea: pack log n iterations of PM algorithm into each phase
• Each phase can be done in constant time
– Requires O(n/log n) time on O(n1+ε) processors
– Best-known algorithm
log n iterations in constant time
– α ← n; β ← n δ ← 0
while b ≠ 0:
while b = 0 (mod 2) { b ← b/2; β ← β − 1 δ ← δ + 1}
if α ≥ β δ ≥ 0 { swap(a,b); swap(α, β) δ ← −δ }
if a+b ≡ 0 (mod 4): b ← ⌊(a+b)/2⌋ else: b ← ⌊(a−b)/2⌋
– δ together with the last log n+1 bits of a and b determine
the transformation matrix of the next log n iterations
– Precompute a table – lookup, do the matrix multiplication
• Technicality: Need to treat |δ| ≤ log n and |δ| > log n differently
Multiplication in constant time
• How to multiply an n-bit number by a log n bit number
in constant time?
– Represent both numbers in base 2log n
– Precompute a multiplication table for this base
– Separate the larger number into a sum of two numbers,
each having zero for every other digit, e.g. 1234 = 1030 +
204
– Multiply both by the smaller number; no carries are
required
• Example: 1234 × 7 = (1030+204)×7 = 7210+1428 = 8638
– Finally, Chandra et al (1983) shows we can add two n-bit
numbers in O(1) time with O(n2) processors
• But this requires concurrent writes
Complexity perspective: an analogy
• P = NP?
– Open problem, but most practical problems have
either been shown to be in P or else NP-hard.
– Exceptions: integer factorization, graph isomorphism
• If integer factorization is NP-complete, NP = coNP
• If graph isomorphism is NP-complete, PH collapses to 2nd
level
• NC = P?
– Open problem, but most practical problems have
either been shown to be in NC or else P-hard.
– Exceptions: GCD
• What would GCD being P-complete imply?
Questions?
```