Grid Based School Enrollment Forecasting

Richard Lycan – Institute on Aging
Charles Rynerson – Population Research Center
Portland State University
Portland, Oregon
ESRI Education Conference
San Diego, July 2014
You can download the latest PowerPoint file for this presentation at:
http://www.pdx.edu/prc/news-and-presentations-from-the-population-research-center
What this paper is about
• The authors have been involved in school enrollment forecasting for a number of years and have experimented with various ways to improve the forecasting process.
• In this paper we show how a simple model that is normally based on data for school attendance areas (elementary, middle, and high school) or perhaps planning areas can be implemented for small grid cells roughly the size of a city block.
• We use data for the Portland Public Schools (PPS) area because:
– We have geocoded student record data for a long time period, 1996 to the present.
– We are familiar with the social demography of Portland.
– However, the geographic pattern of changes in the 2000-2010 period was complex.
• Evaluating the results of our model:
– We start our forecast in 2003 and forecast enrollment by grade level in 2006 and 2009.
– We compare the results of the grid based model with those of a model for the 37 middle school attendance areas.
Common Forecast Methods
• Cohort component
– Informed by age-specific rates for deaths, births, and migration
– Most often used for large geographic areas such as counties and school districts
– Often relied upon for long range forecasts
• Housing based
– Uses estimates of students per household for different housing types
– Requires knowledge of local housing markets
– Informed by GIS analysis or local knowledge such as a student census
• Grade progression model
– Informed by recent enrollment history
– Can be useful for short term forecasts
– A simple model; we will explore a grid based implementation of it
The grade progression model
• Tracks a cohort of students over time, e.g. the students in grades KG-02 in 2000.
• The grade progression ratio (GPR) is the transition ratio from one cohort to the next, e.g. 0.91 = 724/795.
• The forecast begins in 2003 and extends to 2009. The forecast for grades 06-08 in 2006 is 659.3 = 0.91 * 724.
• Forecast error is shown by subtracting the actual value from the forecast.
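The arithmetic on this slide can be checked in a few lines. This is a minimal sketch: the function name is illustrative, and which grades and years the counts 795 and 724 belong to is an assumption, since the slide's table did not survive. It also shows that the 659.3 figure comes from the unrounded ratio rather than from 0.91 itself.

```python
def grade_progression_ratio(earlier_count, later_count):
    """Ratio of a cohort's size after a transition to its size before it."""
    return later_count / earlier_count

# Counts taken from the slide; their grade/year labels are assumed.
gpr = grade_progression_ratio(795, 724)

# Carry a cohort of 724 students forward one transition with the same ratio.
forecast = gpr * 724

print(round(gpr, 2))       # 0.91
print(round(forecast, 1))  # 659.3
```

Using the rounded 0.91 instead would give 658.8, so carrying the full ratio through the calculation matters at this level of precision.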
• The forecast we have produced is a by-residing forecast. To get a by-attending forecast we need to distribute the residing students to the schools they attend.
• One way to do this is to use a table, such as the one below, showing the relationship between where students live and which school they attend.
• The Portland district has many programs that are not geographically based. It also frequently allows parents to choose schools outside of their neighborhood.
[Table: middle school attending by high school cluster residing (clusters named: Cleveland, Franklin, Grant, Jefferson, Lincoln, Madison, Marshall, Roosevelt, Wilson, and outside the district), with total in regular middle schools, in other schools, total residing, and percent in regular schools. District totals: 8,329 in regular middle schools, 1,496 in other schools, 9,825 total residing, 84.8 percent in regular schools.]
[Maps: percent of residing population attending each middle school, by high school cluster residing (Cleveland, Franklin, Grant, Jefferson); classes 12.5-24%, 25-49%, 50-100%.]
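The distribution of residing students to the schools they attend can be sketched as a share-based allocation. This is a minimal illustration rather than the authors' procedure: it assumes the observed residing-by-attending counts are converted to row shares and applied to the by-residing forecast, and all names and numbers below are hypothetical.

```python
from collections import defaultdict

def attending_forecast(residing_forecast, attend_counts):
    """Spread a by-residing forecast over schools using observed attendance shares.

    residing_forecast: {residence_area: forecast number of students}
    attend_counts: {residence_area: {school: students observed attending}}
    """
    attending = defaultdict(float)
    for area, forecast in residing_forecast.items():
        counts = attend_counts[area]
        total = sum(counts.values())
        for school, n in counts.items():
            # Share of this area's residents who attend this school.
            attending[school] += forecast * n / total
    return dict(attending)

# Hypothetical numbers for illustration only; not taken from the PPS table.
residing = {"Area A": 100.0, "Area B": 200.0}
observed = {
    "Area A": {"School X": 80, "School Y": 20},
    "Area B": {"School X": 50, "School Y": 150},
}
by_school = attending_forecast(residing, observed)
print(by_school)
```

The allocation preserves the district total, so the by-attending forecasts sum to the by-residing forecasts.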
Caveat: 2000 to 2010 was a turbulent time for PPS
• Recession and slump in housing markets
• Gentrification
– Affluent thirty-somethings move into close-in housing
– Enrollment turnaround in some central area schools
– Many black families moved to suburbs
• School choice has resulted in race and class size imbalance
• The PPS District closed schools and consolidated programs
• Thus in evaluating the forecast we consider areas where enrollment change was:
– Constant (10)
– Turnaround (9)
– Confused (12)
Examples of enrollment trends
How did the forecast perform?
• The 2009 grade 06-08 forecast was 9,005 students compared to an actual 9,825. Early downward trends did not predict a turnaround in enrollment.
• The MAPE (mean absolute percent error):
– 12.0% overall for middle school attendance areas
– By enrollment trend of the middle school attendance area:
  • 11.9% with constant trend
  • 13.4% turnaround
  • 10.9% confused trend
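For reference, the MAPE can be computed as below. This is a sketch that assumes an unweighted mean over attendance areas, which the slides do not state explicitly; the per-area values are hypothetical, while the district totals are the ones quoted above.

```python
def mape(forecasts, actuals):
    """Mean absolute percent error across areas, in percent (unweighted)."""
    errors = [abs(f - a) / a * 100 for f, a in zip(forecasts, actuals)]
    return sum(errors) / len(errors)

# District totals from the slide: forecast 9,005 vs. actual 9,825.
district_error = 9005 - 9825               # forecast minus actual
district_pct = district_error / 9825 * 100

# Hypothetical per-area values, only to exercise the function.
example = mape([90, 220, 300], [100, 200, 300])

print(district_error, round(district_pct, 1), round(example, 2))
```

The district-wide percent error and the MAPE answer different questions: over- and under-forecasts in individual areas cancel in the district total but not in the MAPE.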
How is this done with a grid based model?
• This map shows the calculation of grade progression ratios: grades 03-05 in 2003 divided by grades KG-02 in 2000.
• The map shows the ratio between the density of students for the two cohorts.
• The orange areas show an increase in the cohort trend, the green areas a decrease.
• Density is calculated within a bandwidth surrounding each grid cell center, for 660’ x 660’ cells.
Example of Grade Progression Ratios for 03-05 / KG-02
• The grade progression ratios shown were calculated using the CrimeStat IV crime mapping and statistical package. The student data were from geocoded student records for Portland Public Schools from 1999 to 2010.
• Data were averaged over time by using three-year periods. For example, the data shown for 2000 are in fact an average of 1999, 2000, and 2001. The data are also smoothed by using three-year age groups, KG-02 and 03-05.
• The data were averaged over space using grid density mapping. An adaptive bandwidth of 200 students was used (compared to an average middle school size of 400 students), with a quartic distance decay function and a grid size of 600 feet.
• The interesting reversal of trend in the Clarendon attendance area was due to the demolition and subsequent redevelopment of a large public housing area.
[Map: grade progression ratio, 03-05 / KG-02, with New Columbia labeled. Legend classes: 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00, 1.11, 1.25, 1.40, 1.70, 2.00]
We replicate the earlier forecast using the grid method
• We use the grid map grade progression ratios for:
– GPR.1 = 03-05 / KG-02 for 2000-2003
– GPR.2 = 06-08 / 03-05 for 2000-2003
• We multiply the GPR.1 grid map by the GPR.2 grid map to get the product map GPR.12.
• Using a point for each KG-02 student in 2003, we add the GPR.12 value of the cell in which the student falls to the student attribute file.
• The student point file contains the geography within which the student resides and the GPR.12 weighting.
• We sum the GPR.12 weights by geography, here the code for each middle school area.
• Voila! The resulting table contains the enrollment forecast for grades 06-08 in 2009.
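The steps above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the authors' GIS workflow: the grids are dictionaries keyed by a hypothetical cell id, each student has already been assigned to a cell and an area code, and all values are made up.

```python
from collections import defaultdict

def forecast_by_area(gpr1, gpr2, students):
    """Sum GPR.12 weights of student points by residence geography.

    gpr1, gpr2: {cell_id: ratio} grids defined on the same cells
    students: list of (cell_id, area_code), one per KG-02 student in the base year
    """
    # Product map GPR.12 = GPR.1 * GPR.2, cell by cell.
    gpr12 = {cell: gpr1[cell] * gpr2[cell] for cell in gpr1}
    totals = defaultdict(float)
    for cell, area in students:
        # Each student contributes the weight of the cell it falls in.
        totals[area] += gpr12[cell]
    return dict(totals)

# Hypothetical grids and student points for illustration.
gpr1 = {"c1": 0.9, "c2": 1.1}
gpr2 = {"c1": 1.0, "c2": 0.8}
students = [("c1", "MS-A"), ("c1", "MS-A"), ("c2", "MS-A"), ("c2", "MS-B")]

result = forecast_by_area(gpr1, gpr2, students)
print(result)
```

Because the weights travel with the student points, the same summary can be run for any geography coded on the point file, not only middle school areas.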
[Map legend: Grade Progression Ratio, classes 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.00, 1.11, 1.25, 1.40, 1.70, 2.00]
Choices
• There are a variety of implementations of the grid density model, for example:
– ESRI Spatial Analyst (SA)
– The CrimeStat Spatial Statistics program (CS)
• All provide some common options:
– Cell size
– Distance weighting (quartic used here)
– Band width:
  • Fixed: distance known, sample varies
  • Adaptive: sample known, distance varies (not available in SA)
• Common advice on these options is that they don’t matter too much for applications like finding crime hot spots.
• However, in using them for forecasting the metric may be more important.
[Figure: distance weighting functions: Quartic (Spherical), Uniform, Triangular, Normal]
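The four distance weighting shapes named above can be written as simple functions of distance d and bandwidth h. These are the textbook unnormalized forms, given as an assumption for illustration; the scaling constants used by CrimeStat or Spatial Analyst are not reproduced here.

```python
import math

def quartic(d, h):
    """Quartic weight: 1 at the center, falling smoothly to 0 at distance h."""
    return (1 - (d / h) ** 2) ** 2 if d < h else 0.0

def uniform(d, h):
    """Uniform weight: every point within h counts equally."""
    return 1.0 if d < h else 0.0

def triangular(d, h):
    """Triangular weight: linear decline from 1 at the center to 0 at h."""
    return 1 - d / h if d < h else 0.0

def normal(d, h):
    """Gaussian weight with h treated as the standard deviation; never reaches 0."""
    return math.exp(-0.5 * (d / h) ** 2)

# Compare the weights given to a point halfway out to the bandwidth.
for kernel in (quartic, uniform, triangular, normal):
    print(kernel.__name__, round(kernel(50.0, 100.0), 4))
```

The quartic, uniform, and triangular weights ignore points beyond h, while the normal weight only discounts them.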
Adaptive band width
• The adaptive band width averages a constant number of points, but the range over which it averages the points varies.
• A set number of points, say 300, can be found in a smaller region on the denser east side of Portland than on the west side.
• A fixed band width (as in SA) would summarize fewer points on the west side than on the east.
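The contrast between the two kinds of band width can be sketched with two hypothetical point sets, one dense and one sparse. This is a plain Python illustration of the idea, not the CrimeStat or Spatial Analyst implementation; the function names and coordinates are assumptions.

```python
import math

def adaptive_bandwidth(center, points, k):
    """Adaptive: the search radius is the distance to the k-th nearest point."""
    dists = sorted(math.dist(center, p) for p in points)
    return dists[k - 1]

def count_within(center, points, radius):
    """Fixed: the radius is given and the number of points found varies."""
    return sum(1 for p in points if math.dist(center, p) < radius)

# Hypothetical point sets: the same 5 x 5 layout at 10-unit and 100-unit spacing.
dense = [(x * 10, y * 10) for x in range(5) for y in range(5)]
sparse = [(x * 100, y * 100) for x in range(5) for y in range(5)]
center = (0, 0)

# Same sample size (k = 4), different distances.
print(adaptive_bandwidth(center, dense, 4), adaptive_bandwidth(center, sparse, 4))

# Same distance (45 units), different sample sizes.
print(count_within(center, dense, 45), count_within(center, sparse, 45))
```

With the adaptive rule the sparse set simply gets a wider radius, while the fixed radius finds far fewer points there, which makes the resulting density ratio less stable.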
Increasing bandwidth generalizes the data and map
• The following series of maps shows how the grade progression ratio is generalized as the bandwidth in the density mapping ratio is varied.
• The bandwidth of 100, 200, 300, etc. is the number of student points that are included in the computation of density for the two cohorts.
• Is there an optimal bandwidth to use in the grid based forecasting model?
Results of the grid based forecast
• Evaluate the grid based model versus actual enrollment.
• Explore the effects of varying the bandwidth in the grid based model.
• Compare the results for the standard and grid based forecasts.
• Evaluate the performance of the grid based model for middle school attendance areas (MSAAs) where the enrollment trend was constant, turnaround, or confused.
• Evaluate the use of the grid based model to create forecasts for special geographies, here gentrifying zones in the District.
Results of Grid Based Forecast
Compare grid forecast to actual by bandwidth
• The results of the grid based and standard forecasts are quite similar.
• Hosford, George, and Lane are anomalies. George is impacted by enrollment shifts at the New Columbia housing development.
• For some bandwidths the locally high grid values push the value for Sylvan high.
[Scatter plots: Grades 06-08, 2009 forecast (Y) vs. actual (X) by attendance area, one panel per band width from 100 to 500 students. Points are classed as constant, reverse, or confused, with a linear fit through all points. Labeled points: Sylvan, Lane, Hosford, George. Reported MAPE values range from 10.4 to 11.6 and fitted slopes from 0.97X to 1.06X.]
Compare grid and standard forecasts by bandwidth
• Except for Sylvan, the results of the grid and standard forecasts are quite similar at all bandwidths, as shown by the MAPE and R2 values.
[Scatter plots: Grades 06-08, grid forecast vs. standard forecast by attendance area, one panel per band width from 100 to 500 students. Points are classed as constant, reverse, or confused, with a linear fit through all points. Labeled point: Sylvan. Reported RSQ values range from 0.982 to 0.994.]
Mean absolute and algebraic error
• For an increase in bandwidth from 100 to 200 students, the MAPE:
– Rises for reversal MSAAs. It may seem counterintuitive, but where the trend reversed we should expect a model that captures the early trend more efficiently to increase the error level.
– Drops for confused (other) MSAAs. The forecast for areas which lack a clear trend is improved.
– Drops only slightly for constant MSAAs.
• For bandwidths greater than 200 students the MAPE does not vary greatly.
• The average number of KG-02 students in an MSAA was about 275. A bandwidth roughly the size of the areas to which the point data are re-aggregated appears to produce reasonable results.
• The MAPEs for the grid and standard models appear to order the three growth trend classes in the same way, but the grid model results in more contrast.
[Charts: mean absolute and algebraic error; one chart label reads "MAPE for standard model"]
Forecast for custom geography
• Gentrified tracts are the top 10% of tracts by 1990-2000 change in percent baccalaureate + education and MTP occupation (after David Ley).
• Gentrified / not gentrified added to student point file.
• Number and percent of students in gentrified areas summarized for actual 2003 and forecast 2009 enrollment.
• Conclusion: Number and percent of grade 06-08 students living in gentrifying areas declined from 2003 to 2009.

Number                        Actual 2003   Forecast 2009
  Gentrified? No                    9,877           8,075
  Gentrified? Yes                   1,851           1,275
  Total                            11,728           9,349

Percent of total enrolled     Actual 2003   Forecast 2009
  Gentrified? No                     84.2            86.4
  Gentrified? Yes                    15.8            13.6
  Total                             100.0           100.0
Deconstructing the Grade Progression Ratio
• Other common measures such as the capture rate can be calculated as well.
• Capture rate is the number enrolled in the district’s schools compared to the number age eligible, for example kindergarteners divided by the age 5 population.
• Here is the capture rate for grades KG-02 in 2000, 2010, and a map of the change in the rate.
• And, again, the grade progression ratio using the same classes and colors.
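The capture rate defined above is a simple ratio. A minimal sketch with hypothetical counts (the function name and all numbers are assumptions for illustration):

```python
def capture_rate(enrolled, age_eligible):
    """Students enrolled in the district's schools per age-eligible resident."""
    return enrolled / age_eligible

# Hypothetical counts for one area in two years.
rate_2000 = capture_rate(850, 1000)
rate_2010 = capture_rate(800, 1000)
change = rate_2010 - rate_2000

print(round(rate_2000, 2), round(rate_2010, 2), round(change, 2))
```

On a grid, the same ratio would be taken cell by cell between an enrollment density surface and a population density surface, in the same way as the grade progression ratio.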
Conclusions
• Neither the standard nor the grid based model produced a good enrollment forecast for the 2003-2009 period. During this time period there were major demographic shifts in the District that confounded forecasts based on early trends.
• The grid based forecast was best for the MSAAs that changed in a confused way. It was worst for turnaround MSAAs. Bandwidth had little effect on the forecast for MSAAs that grew or declined in a constant trend.
• The smallest bandwidth of 100 students produced erratic results. Bandwidths of 200 students and over produced reasonable results, with minor variations in MAPE for bandwidths between 200 and 500 students.
• The effort involved in building the model was considerable, but the final workflow is simple and could easily be scripted.
• We think that the adaptive bandwidth approach is better than a fixed distance bandwidth for this type of application. It would facilitate analysis and scripting if ESRI Spatial Analyst provided an adaptive bandwidth option for its kernel density tool.
• The grid based GPR model may be less useful as a primary forecasting model than as an allocation tool to create forecasts for special areas, such as the example for gentrifying areas.
Richard Lycan - [email protected]
Charles Rynerson – [email protected]
