Report

Simple Algorithm for Sorting the Fibonacci String Rotations Manolis Christodoulakis King’s College London Joint work with Costas S. Iliopoulos Yoan José Pinzón Ardila Our Goal What makes Fibonacci strings a best case input for the Burrows-Wheeler Transform (BWT)? Relationship between different rotations of a Fibonacci string What is their lexicographic order? Side effect: we can deduce the symbol stored at any position of any Fibonacci string in constant time (without using , provided that the fn values are known) SOFSEM 2006 2 Fibonacci Strings & Numbers The n-th Fibonacci string Fn = Fn-1Fn-2 n≥2 F0=b, F1=a The n-th Fibonacci number fn = fn-1+fn-2 n≥2 f0=1, f1=1 F0 = b f0 = 1 F1 = a F2 = a b f1 = 1 F3 = a b a f3 = 3 F4 = a b a a b f4 = 5 f2 = 2 SOFSEM 2006 3 Notation The i-th rotation of a string x = 0 1 … i-1 i … n-1 Ri(x) = 0 1 … i-1 i … n-1 where i is taken modulo n. rank(i,x) = the rank of Ri(x) rot(ρ,x) = the rotation whose rank is ρ SOFSEM 2006 4 Burrows-Wheeler Transform (BWT) M.Burrows and D.J.Wheeler. 1994 Purpose: to make a string more compressible BWT Algorithm: 1. 2. 3. 4. Create list of all rotations Sort them Output last symbol of every rotation Output the rank of the 0-th rotation SOFSEM 2006 5 BWT on Fibonacci Strings F5 = abaababa, f5 = 8 R70(F5) R21(F55) R25(F55) R R30(F (F55)) R43(F55) R65(F5) R16(F5) R74(F5) = = = = = = = = = a b a a a a b a a b b a b a b a a b b b a b a b a a a b a b b a a a b a a a b SOFSEM 2006 a b a a b a a b a b b a a b a b b a b a a b a a a b b a a a a b a a a b b a b a a b b a b a b b a a a b a b b a a a a b a b a 6 Properties of Fibonacci Strings The number of ‘b’ in Fn is fn-2 Proof: By induction. C.S.Iliopoulos, D.W.Moore and W.F.Smyth. 1997 Fn = Fn-2Fn-3…F1un, un = ba (n odd) un = ab (n even) Let’s call this the IMS formula. SOFSEM 2006 7 Similarities in Rotations R0(Fn) differs from Rfn-2(Fn) in 2 symbols Proof: R0(Fn) Rfn-2(Fn) R0(Fn) = = = = Fn-2Fn-3…F1un Fn-3…F1unFn-2 Fn-1Fn-2 Fn-3…F1un-1Fn-2 (1) (2) Ri(Fn) differs from Ri+fn-2(Fn) in 2 symbols Proof: Ri(Fn) = Ri(R0(Fn)) Ri+fn-2(Fn) = Ri(Rfn-2(Fn)) SOFSEM 2006 8 Relative Order of Rotations Ri(Fn) < Ri+fn-2(Fn) for n odd, i fn-1-1 Proof: R0(Fn) = Fn-3…F1un-1Fn-2 = Fn-3 … F1 ab Fn-2 Rfn-2(Fn) = Fn-3…F1un Fn-2 = Fn-3 … F1 ba Fn-2 For i=fn-1-1: Ri(Fn) = bFn-2Fn-3…F1a Ri+fn-2(Fn)= aFn-2Fn-3…F1b Similarly, Ri(Fn) > Ri+fn-2(Fn) for n even, i fn-1-1 SOFSEM 2006 9 Sorted List of Rotations We proved (n odd): Ri(Fn) < Ri+fn-2(Fn) i fn-1-1 (3) We will now prove that there is no j s.t. Ri(Fn) < Rj(Fn) < Ri+fn-2(Fn) Proof: (constructive) Start at i=fn-1 and construct the partial list Ri Ri+fn-2 Ri+2fn-2 Ri+3fn-2 … Ri+kfn-2 … for as long as i+kfn-2 fn-1-1 (mod fn) kfn-1 I.e. the list is complete! SOFSEM 2006 10 Identify Rotation (i) by Rank (ρ) Therefore, for n odd: rot(ρ,Fn) = ( fn-1+ρfn-2) mod fn = (ρfn-2-1) mod fn Similarly, for n even, the sorted list is constructed bottom-up giving rot(ρ,Fn) = (-(ρ+1)fn-2-1) mod fn SOFSEM 2006 11 Identify Rank (ρ) of a Rotation (i) This is simply the inverse of the previous function n odd rank(i,Fn) = ((i+1)fn-2) mod fn n even rank(i,Fn) = ((i+1)fn-2-1) mod fn SOFSEM 2006 12 Symbols of Fibonacci Strings Fn[i] = ? Observe that Fn[i] = Ri(Fn)[0] In the sorted list of rotations, the first fn-1 rotations start with ‘a’, the rest with ‘b’ Thus Fn[i] can be deduced from rank(i,Fn) If rank(i,Fn) ≤ fn-1 then Fn[i]=a else b. SOFSEM 2006 13 BWT & Fibonacci ― The Quick Way The first fn-2 symbols of BWT are ‘b’ Proof: (n odd) We proved the first fn-2 rotations have index (ρ·fn-2-1)modfn for 0 ≤ ρ < fn-2 The last symbol of these rotations is Fn[ (ρ·fn-2-1 +fn-1)modfn ] Which for 0 ≤ ρ < fn-2 is ‘b’ The next fn-1 symbols of BWT are ‘a’ Proof: Consequence of previous lemma SOFSEM 2006 14