Report

Using Time Series Modeling to Forecast Enrollments Robert J. Marsh, Ph.D. Associate Dean of Research and Assessment North Central Michigan College MI/AIR, November 8, 2013 Why forecast enrollment? Declining enrollment since 2010 Declining property tax base Declining state appropriation Uncertain budgeting process due to lack of data ACCURATE FORECASTS ARE MORE IMPORTANT THAN EVER Previous forecasts Intuitive, sense of potential population Trends in HS graduations Conversations with HS counselors Conversations with workforce organizations and advisory boards Assume flat enrollment for budget; be pleased with an increase Assume a 2-3% year-over-year increase Held steady for many years Extrapolate from recent terms’ trend Monitor statewide trends among CCs No data-informed methodology An enrollment forecast model Possible factors Area population Tuition Area unemployment High school graduate census Past enrollments Competition, especially online Tuition differential, four-year vs. community college Attitude towards education in general Community colleges in particular Regression approach Identify factors (xi), find coefficients (bi) F = b1x1 + b2 x2 +... + bn xn Identifying relevant or interesting (or pertinent) factors Enrollments can be more dependent on past enrollments than outside factors Area population Sometimes highly correlated, but not always. Time series forecasting Relies on past demand to forecast future Stores only one past period’s demand (originally necessary) Typically forecasts one period ahead Time series forecasting Simple: forecast = immediate past period Ft = At-1 Moving average: average of n past periods Ft = A t-1 +At-2 + At-3 3 n åA t-i Ft = i=1 n ( n = 3) Exponential Smoothing Originated in the production/operations management field, 50+ years ago (management science) Primary formula Ft = a At-1 + (1- a ) Ft-1 0.0 £ a £1.0 where a is the “smoothing coefficient” Relies on the past period’s actual (At-1) and forecast (Ft-1) values. An a close to 1.0 results in a very reactive model, one more responsive to very recent actual data. All that needed to be stored were the immediate past actual and forecast values. (Holt, 1957) Exponential Smoothing Each period’s forecast builds on all past periods, although only immediate past period’s needs to be stored F0 = A0 F1 = a A0 + (1- a )F0 F2 = a A1 + (1- a )F1 = a A1 + (1- a ) [a A0 + (1- a )F0 ] F3 = a A2 + (1- a )F2 = a A2 + (1- a )éëa A1 + (1- a ) (a A0 + (1- a )F0 )ùû “Exponential” Smoothing Reliance on progressively older data drops off exponentially F0 = A0 F1 = a A0 + (1- a )F0 F2 = a A1 + (1- a )F1 = a A1 + (1- a ) [a A0 + (1- a )F0 ] F2 = a A1 + a (1- a ) A0 + (1- a ) F0 2 F3 = a A2 + (1- a )F2 = a A2 + (1- a )éëa A1 + (1- a ) (a A0 + (1- a )F0 )ùû F3 = a A2 + a (1- a ) A1 + a (1- a ) A0 + (1- a ) F0 2 3 Exponential Smoothing with trend Trends within our actual enrollment patterns. Add a trend component Ft = a At-1 + (1- a ) ( Ft-1 + Tt-1 ) where Tt = b ( Ft - Ft-1 ) + (1- b ) Tt-1 ft = Ft + Tt 0.0 £ b £1.0 where b is the “trend factor” (Holt & Winters, 1960; Hopp & Spearman, 2011) Seasonality Enrollment is seasonal, with fall typically being higher than winter Fall Winter Add a seasonality factor by superimposing a sinusoidal function Exclude summers (much lower) Seasonality st = H sin éën (t -1) p + f ùû where n = frequency H = amplitude f = phase t = numberof period f £p Sine function is periodic over 2p, every two periods (every fall) the forecast tends to be higher. ft = Ft +Tt + st (Middleton, 2010) Methodology Obtained 30 years of enrollment data (credit hours) 60 periods (no summers) Built model in Excel® 2011 Used Solver® to “optimize” the model Minimized sum of squared differences (the objective function, Z) n SSD = Z = å( Fi - Ai ) i=3 2 by varying a, b, H, n and f subject to 0.0 £ a £ 1.0 0.0 £ b £ 1.0 f £p Excel®-spreadsheet Year 1982 1982 1983 1983 1984 YrTm 198230 198250 198330 198350 198430 Actual 14649 14778 14028 12113 13419 Smooth F 14649.00 14778.00 14028.00 12113.50 13419.00 Smooth T Seasonality FORECAST 129.00 129.00 129.00 129.00 1303.54 -1295.51 1287.47 -1279.44 16210.54 12861.49 13529.97 Z= Parameters a b H n f 1.0 0.0 11084.685 1.000233 3.02226 Squared Diff 4763482.0 559493.0 12314.0 5335291.26 Excel®-Solver® Z a, b, H, n, f a b f Results Initially used actual data through Winter 2011-12 to forecast Fall 2012-13 Discovered strong “starting point dependency” within the non-linear model in Solver Not optimal; best described as heuristic Results through Winter 2011-12, showing “best”: a start 0.5 0.0 1.0 b start 0.5 0.0 1.0 a 0.856 1.0 0.736 b 0.654 0.0 0.987 H n 10422.91 1.0002481 10067.420 1.0001813 10290.092 1.0002618 f 2.997 3.0150 2.9990 SSD 3.17 E+07 * 3.23 E+07 3.22 E+07 MAPE = 3.3%. Using model would have produced a 1.7% error for W12 Results Used “best” from W12 to forecast Fall 2012 F12 forecast = F12 actual = Difference = Model MAPE = 21,980 21,878 0.5% 3.3% Incorporated actual F12 data into model to forecast Winter 2013 a start 0.5 0.0 1.0 b start 0.5 0.0 1.0 a 0.848 1.0 0.742 b 0.670 0.0 0.987 H SSD n F 10186.25 1.0002578 2.993902 3.17 E+07 * 10186.26 1.0002604 3.011174 3.44 E+07 10392.14 1.000247 3.000360 3.22 E+07 Results Used “best” from F12 to forecast Winter 2013 W13 forecast = W13 actual = Difference = Model MAPE = 19,783 21,081 – 6.2% 3.4% Resulted in an under-forecast Assumes past practices continue More focused recruitment effort made 0 201330 201230 201130 201030 200930 200830 200730 200630 200530 200430 200330 200230 200130 200030 199930 199830 199730 199630 199530 199430 199330 199230 199130 199030 198930 198830 198730 198630 198530 198430 198330 198230 Model fit Credit hours by semester 30000 25000 20000 15000 Actual Forecast 10000 5000 Discussion Model is not optimal Non linear objective function; heuristic methodology Does not include any outside variables (as written) Solver® is blunt instrument Can be built on simple software platform Good visualization with graphs Relies on known data Very quick to update and do what-if analysis Pretty accurate I’m happy to share Bob Marsh North Central Michigan College 231.439.6353 [email protected] Discussion ?