
Pearls of Functional Algorithm Design
Chapter 2
Roger L. Costello
July 2011

The Problem We Will Solve

Recurring Problem
• Stock Market: each day I record the closing value of the DOW. Occasionally, I pick a date and ask, “How many days after this date has the stock market closed at a higher value?”
• A more challenging question is, “Which day has the greatest number of following days where the stock market closed at a higher value?”

    Date       DOW       Number of days that surpassed this day
    6/1/11     12,324    6
    6/2/11     12,214    7
    6/3/11     12,390    1
    6/6/11     12,400    0
    6/7/11     12,367    1
    6/8/11     12,380    0
    6/9/11     12,310    2
    6/10/11    12,330    1
    6/13/11    12,340    0

Recurring Problem (cont.)
• People’s Height: line up a bunch of people. Pick one person and ask, “How many of the following people are taller than this person?”
• A more challenging question is, “Which person has the greatest number of following people that are taller?”

    Person    Height (inches)    Number of persons that surpass this person’s height
    Tom       72                 1
    John      68                 4
    George    69                 2
    Jim       73                 0
    Pete      65                 3
    Sam       68                 2
    Bill      69                 1
    Mike      64                 1
    Shaun     71                 0

Recurring Problem (cont.)
• Word Analysis: take a letter in a word and ask, “How many of the following letters are bigger (occur later in the alphabet) than this letter?”
• A more challenging question is, “Which letter has the greatest number of following letters that are bigger?”

Word: GENERATING

    Letter    Number of letters that surpass this letter
    G         5
    E         6
    N         2
    E         5
    R         1
    A         4
    T         0
    I         1
    N         0
    G         0

Problem Statement
• Create a list of values.
  – Example: create a list of stock market values, or a list of people’s heights, or a list of letters.
• Simple Problem: select one value in the list, and count the number of following values that surpass it.
• Harder Problem: for every value in the list solve the simple problem; this produces a list of numbers; return the maximum number.
  – This is called the “surpasser problem”.

Solve the Simple Problem
• Let’s create a function that counts the number of surpassers of a value.
• The function takes two arguments:
  1. The value, x
  2. A list, xs, that consists of all the values that follow x

Select the list items that are greater than 'G'

    Input:   E N E R A T I N G
    Apply:   filter (>'G') ____
    Output:  [N, R, T, I, N]

Count the selected list items

    Input:   [N, R, T, I, N]
    Apply:   length ____
    Output:  5

Five items surpass 'G'. That’s the answer!

scount
• “scount” (surpasser count) is a user-defined function; it combines the functions shown on the previous two slides.

    scount :: Ord a => a -> [a] -> Int
    scount x xs = length (filter (>x) xs)

Solve the Harder Problem
• We need to apply “scount” to each item in the list, producing a list of numbers; then take the maximum of the numbers.
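Before moving on to the harder problem, the two steps of the simple problem (filter, then length) can be tried out with a small sketch like the one below. It is only an illustration: `selected` is a hypothetical helper, not part of the slides, introduced so that the intermediate filter result can be inspected on its own.

```haskell
-- A minimal sketch of the simple problem.
-- `selected` is a hypothetical helper (an assumption of this sketch) that
-- exposes the intermediate result of the filter step.

-- The values in xs that are strictly greater than x.
selected :: Ord a => a -> [a] -> [a]
selected x xs = filter (> x) xs

-- Surpasser count: how many of the values that follow x are greater than x.
scount :: Ord a => a -> [a] -> Int
scount x xs = length (selected x xs)
```

Loaded into GHCi, `selected 'G' "ENERATING"` evaluates to "NRTIN" and `scount 'G' "ENERATING"` evaluates to 5, matching the two slides above.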
Invoke “scount” multiple times

Input: "GENERATING"

    scount 'G' "ENERATING"    5
    scount 'E' "NERATING"     6
    scount 'N' "ERATING"      2
    scount 'E' "RATING"       5
    scount 'R' "ATING"        1
    scount 'A' "TING"         4
    scount 'T' "ING"          0
    scount 'I' "NG"           1
    scount 'N' "G"            0
    scount 'G' ""             0

    maximum:                  6

tails
• “tails” is a standard function.
• It takes one argument, a list.
• It returns a list of lists, i.e., a list of all items, then a list of all items but the first, then a list of all items but the first and second, etc.

    tails "GENERATING"
    ["GENERATING","ENERATING","NERATING",…,"G",""]

List Comprehension
• Recall that “scount” takes as arguments a value, x, and a list consisting of its following items.
• A list comprehension will be used to provide the arguments to “scount”:

    [scount z zs | z : zs <- tails xs]

“For each list produced by the tails function, take its first item and the remaining items, and use them as arguments to the scount function.” (The final empty list produced by tails does not match the pattern z : zs, so it is skipped.)

Set of surpasser counts

    Input:   "GENERATING"
    Apply:   [scount z zs | z : zs <- tails ____]
    Output:  [5,6,2,5,1,4,0,1,0,0]

maximum surpasser count (msc)

    Input:   [5,6,2,5,1,4,0,1,0,0]
    Apply:   maximum ____
    Output:  6

That’s the answer!

msc
• “msc” (maximum surpasser count) is a user-defined function; it combines the functions shown on the previous two slides.

    msc :: Ord a => [a] -> Int
    msc xs = maximum [scount z zs | z : zs <- tails xs]

Here’s the Solution

    import Data.List (tails)

    -- msc = maximum surpasser count
    msc :: Ord a => [a] -> Int
    msc xs = maximum [scount z zs | z : zs <- tails xs]

    -- scount = number of values in xs that are greater than x
    scount :: Ord a => a -> [a] -> Int
    scount x xs = length (filter (>x) xs)

Time Requirements
• With a list of length “n” the msc function shown on the previous slide takes on the order of n² steps.
• Here’s why: recall that n surpasser counts are generated (see the “Set of surpasser counts” slide). To generate the first surpasser count, we take the first list item and compare it against the remaining n-1 items. To generate the second surpasser count, we take the second list item and compare it against the remaining n-2 items.
And so forth. So, the total number of comparisons is (n-1) + (n-2) + … + 1 = n(n-1)/2, i.e., T(n) = O(n²).

Divide and Conquer Solution

The Key Concepts
1. Determine the maximum surpasser count (msc) of list ws.
2. Divide ws into two lists: ws = xs ++ ys
3. Determine the scount of each value in xs and the scount of each value in ys.
4. Assume that xs and ys are sorted in increasing order and ys is of length n.
5. x is the first value in xs and it has an scount (within xs) of c. y is the first value in ys and it has an scount (within ys) of d. There are two cases to consider:
   a) x < y: then the scount of x equals c + n (remember, ys is sorted, so if x < y then x is less than all n values in ys).
   b) x ≥ y: then the scount of y equals d (remember, xs and ys are sorted, so if x ≥ y then y is no greater than any remaining value; and since every value of xs comes before y in ws, none of them can be a surpasser of y).

The Simplest Example

    GE

Split into xs and ys:

    xs: G          ys: E
    xs: ('G',0)    ys: ('E',0)

The scount of 'G' in xs is zero and the scount of 'E' in ys is zero.
xs is sorted in increasing order and so is ys. Obviously.
Compare 'G' with 'E'. 'G' ≥ 'E' so 'E' must be the smallest value. Output 'E' then 'G'.

    ('E',0) : ('G',0)

These are the correct surpasser counts for GE. Furthermore, the resulting list is sorted!

Another Simple Example

    NE

Split into xs and ys:

    xs: N          ys: E
    xs: ('N',0)    ys: ('E',0)

The scount of 'N' in xs is zero and the scount of 'E' in ys is zero.
xs is sorted in increasing order and so is ys. Obviously.
Compare 'N' with 'E'. 'N' ≥ 'E' so 'E' must be the smallest value. Output 'E' then 'N'.

    ('E',0) : ('N',0)

These are the correct surpasser counts for NE. Furthermore, the resulting list is sorted!
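The claim that these are the correct surpasser counts can be checked against the brute-force definition. The sketch below is one possible way to do that, not something taken from the slides: `bruteTable` is a hypothetical helper that pairs each value with its surpasser count (computed with the earlier scount) and then sorts the pairs by value.

```haskell
import Data.List (sort, tails)

-- Brute-force surpasser count, as defined for the first solution.
scount :: Ord a => a -> [a] -> Int
scount x xs = length (filter (> x) xs)

-- Hypothetical reference helper (an assumption of this sketch): pair every
-- value with its surpasser count, then sort the pairs in increasing order.
bruteTable :: Ord a => [a] -> [(a, Int)]
bruteTable ws = sort [(z, scount z zs) | z : zs <- tails ws]
```

In GHCi, `bruteTable "GE"` evaluates to [('E',0),('G',0)] and `bruteTable "NE"` evaluates to [('E',0),('N',0)], the same pairs produced by the two worked examples.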
A larger example

    GENE

Split into xs and ys:

    xs: GE                     ys: NE
    xs: ('E',0) : ('G',0)      ys: ('E',0) : ('N',0)

The previous slides showed how to process the two sublists.

Compare 'E' with 'E'. 'E' ≥ 'E' so the right 'E' must be the smallest value. Output 'E' and process the remaining sub-lists.

    xs: ('E',0) : ('G',0)      ys: ('N',0)
    Output: ('E', 0)

Compare 'E' with 'N'. 'E' < 'N' so all the values in ys must be surpassers of 'E'. Output 'E', but first increment its surpasser count by length ys.

    xs: ('G',0)                ys: ('N',0)
    Output: ('E', 0) : ('E', 1)

Compare 'G' with 'N'. 'G' < 'N' so all the values in ys must be surpassers of 'G'. Output 'G', but first increment its surpasser count by length ys.

    xs: ""                     ys: ('N',0)
    Output: ('E', 0) : ('E', 1) : ('G', 1)

Output 'N'.

    xs: ""                     ys: ""
    Output: ('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)

Surpasser Counts

    GENE
    Output: ('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)

let zs = the list of second values in each pair
msc = the maximum of zs

Terminology: table

    GENE
    ('E', 0) : ('E', 1) : ('G', 1) : ('N', 0)

The result of processing is a list of pairs. The second value is the scount of the first value. This list of pairs is called a "table". The "table function" takes as its argument a list and returns a table.

Terminology: join

    NE
    ('N',0)    ('E',0)
    ('E',0) : ('N',0)

Processing two sub-lists to create one list is called "join". The "join function" takes as its arguments two tables (the tables of xs and ys) and returns a table.

Here's how to implement the table function

    table :: Ord a => [a] -> [(a,Int)]
    table (w:[]) = [(w, 0)]
    table ws     = join (table xs) (table ys)
      where m       = length ws
            n       = m `div` 2
            (xs,ys) = splitAt n ws

"Process a list. If there is just one value in the list then its surpasser count is zero and return a list containing one pair, where the second value is zero.
If there's more than one value in the list then divide the list in half, into xs and ys; get the table of xs and the table of ys (i.e., recurse) and then join those two tables."

Here's how to implement the join function

    join :: Ord a => [(a,Int)] -> [(a,Int)] -> [(a,Int)]
    join [] tys = tys
    join txs [] = txs
    join xs@((x,c):txxs) ys@((y,d):tyys)
      | x < y  = (x, c + length ys) : join txxs ys
      | x >= y = (y, d) : join xs tyys

"Join two tables, txs and tys. If txs is empty then return tys. If tys is empty then return txs. Compare the first value of txs with the first value of tys. Specifically, compare the first value of each pair, (x,c) and (y,d). If x < y then x's surpasser count is c plus the length of ys (ys is an alias for the whole second table). If x >= y then y's surpasser count is d. Join the remaining tables."

Efficiency improvement
• Each time the join function outputs a value from the first table, it computes the length of the second table (length ys), which walks that whole table. In the worst case a single join can then take on the order of n² steps.
• To avoid this repeated work, invoke join with an additional argument: a value, n, corresponding to the length of tys. Each join then takes time linear in the size of its two tables.

Here's how to implement msc

    msc :: Ord a => [a] -> Int
    msc ws = maximum (map snd (table ws))

"Invoke the table function with the list, ws. It returns a list of pairs, (value, surpasser count). Create a list containing all the surpasser counts. Use map snd to accomplish this. Now get the largest surpasser count."

Here's the complete implementation

    -- msc = maximum surpasser count
    msc :: Ord a => [a] -> Int
    msc ws = maximum (map snd (table ws))

    -- table = sorted list of (value, surpasser count) pairs
    table :: Ord a => [a] -> [(a,Int)]
    table (w:[]) = [(w, 0)]
    table ws     = join (table xs) (table ys)
      where m       = length ws
            n       = m `div` 2
            (xs,ys) = splitAt n ws

    -- join = merge two tables, updating the counts of the first
    join :: Ord a => [(a,Int)] -> [(a,Int)] -> [(a,Int)]
    join [] tys = tys
    join txs [] = txs
    join xs@((x,c):txxs) ys@((y,d):tyys)
      | x < y  = (x, c + length ys) : join txxs ys
      | x >= y = (y, d) : join xs tyys

Time Requirements
• With a list of length “n”, and with join passed the length of tys (see “Efficiency improvement”), the msc function shown on the previous slide takes on the order of n log n steps.
That's a lot faster than the first solution, especially for a large list.
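The efficiency improvement is described on the slides but not written out. One way it might look is sketched below, under a few assumptions: join receives an extra Int argument n holding the number of pairs still remaining in the second table, table passes the length of the second half as the initial n, and a `table []` case is added so that an empty list terminates. None of these details are taken from the slides.

```haskell
-- Sketch of the "efficiency improvement": join carries n, the number of
-- pairs still remaining in the second table, instead of calling length.

-- msc = maximum surpasser count (still undefined for the empty list,
-- because maximum [] is an error).
msc :: Ord a => [a] -> Int
msc ws = maximum (map snd (table ws))

table :: Ord a => [a] -> [(a, Int)]
table []  = []          -- assumption: added here so the empty list terminates
table [w] = [(w, 0)]
table ws  = join (m - n) (table xs) (table ys)
  where
    m        = length ws
    n        = m `div` 2
    (xs, ys) = splitAt n ws   -- ys holds the remaining m - n values

-- The first argument is the current length of the second table.
join :: Ord a => Int -> [(a, Int)] -> [(a, Int)] -> [(a, Int)]
join _ txs [] = txs
join _ [] tys = tys
join n xs@((x, c) : txxs) ys@((y, d) : tyys)
  | x < y     = (x, c + n) : join n txxs ys        -- all n values in ys surpass x
  | otherwise = (y, d) : join (n - 1) xs tyys      -- y leaves ys, so n shrinks
```

With these definitions, `table "GENE"` evaluates to [('E',0),('E',1),('G',1),('N',0)], the table built step by step in the larger example, and `msc "GENERATING"` evaluates to 6, the same answer as the first solution.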