Statistical Inference
L3, Bootstrap, L2, L1 progression
Formal and informal
Continuing implementation of new NCEA standards
Team Solutions
Steel • Hogan • Cathcart • McNaughton • Barks • Johnson

What may happen in this session
• Data
• Questions and PPDAC
• Bootstrap idea
• Bootstrap confidence interval (CLT, normal and CI)
• Contextual information
• Census at School resources
• Level 2, Level 1, Year 9 and 10

What are we learning?
• The bootstrap idea and how it is used to make an inference (now called formal inference).
• The importance of contextual information for this standard, 3.10.
• A possible lesson sequence.
• How to justify √n.
• Where to find the many resources available.

Data
• The big idea of this standard is to make a useful inference from a sample about a population.

The Population
• The only definite thing we know about the population is that we never know anything definite about the population!
• If we did, there would be no need for statistics.
• But if we did, we could check whether our statistical methods work. Let's do that!

The Vineyard Problem
Before the harvest I want to estimate the total harvest weight reasonably accurately so that I can plan processing. The population is 10,000 bunches of grapes: there are 500 vines, each vine is managed to have 10 shoots, and each shoot grows two bunches of grapes.

The Plan
The plan is to pick a random sample of bunches and weigh each bunch. There are 13 rows, each 50 m long, so 6 random numbers between 1 and 650 were chosen. The bunches of grapes in a metre-long section were picked from the vines at each of these places and weighed.

The Sample Data
I have the weights of 212 bunches of grapes.

Analysis
The median weight is 87.5 grams.
The IQR is 109 – 70.5 = 38.5 grams.
So the Y12 median estimate for the vineyard is 87.5 ± 1.5 × 38.5/√212 = [83.5, 91.5].
I am pretty confident that this interval will contain the population median.

Conclusion
The population median is very likely to be contained in the interval [83.5, 91.5].
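The arithmetic behind the informal interval can be checked with a few lines of Python. This is only a sketch: the function name `informal_median_interval` is made up for illustration, and scaling the per-bunch interval by the 10,000 bunches in the vineyard is an assumption about how the per-bunch figures translate into a harvest weight.

```python
import math

def informal_median_interval(median, iqr, n):
    """Informal interval for the population median: median ± 1.5 × IQR / √n."""
    margin = 1.5 * iqr / math.sqrt(n)
    return median - margin, median + margin

# Values quoted on the slides (grams per bunch).
median_g = 87.5
iqr_g = 109 - 70.5   # upper quartile minus lower quartile = 38.5
n_bunches = 212      # sample size

low_g, high_g = informal_median_interval(median_g, iqr_g, n_bunches)
print(f"Interval for the median: [{low_g:.1f}, {high_g:.1f}] g")

# Assumption: scale the per-bunch interval by the 10,000 bunches
# in the vineyard and convert grams to kilograms.
population = 10_000
low_kg = low_g * population / 1000
high_kg = high_g * population / 1000
print(f"Harvest estimate: {low_kg:.0f} kg to {high_kg:.0f} kg")
```

The margin works out to about 4 grams either side of the median, so the width of the interval is driven by the IQR and shrinks as the sample size grows.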
I am pretty sure the vineyard harvest will be between 835 kg and 915 kg. I have stainless vats for 800 kg, so there will be a small surplus, which I can ferment in a 200 L plastic drum.

Bootstrap for a better estimate
I have the weights of 212 bunches of grapes.

Bootstrap
• The idea behind a bootstrap is to mimic the sample many times (by resampling from it with replacement).
• This is best simulated.
• A computer is needed.
• The distribution of the means/medians of all the mimic samples reflects the population.

The distribution of 200 resamples of size 212.
Bootstrap: the harvest is likely to be between 860 kg and 945 kg, with 95% confidence.

What controls the spread of the means?
• From the left, sample sizes 3, 5, 8, 12 and 40.
• Here I am varying the sample size.

√n width
[Figure: spread plotted against sample size, with fitted curve y = 125.41x^(-0.512), R² = 0.96453.]
Spread ∝ 1/√(sample size)

Key Ideas
• List all the ideas of statistics, Years 9 to 13.
• Order the development of these.

Resources!
• http://3rs.ccac.ca/en/research/reduction/experimentaldesign.html
• http://www.corwin.com/upmdata/29173_Millsap___Chapter_2.pdf
• http://curiouscat.com/bill/101doe.cfm
• http://stattrek.com/experiments/what-is-an-experiment.aspx
• http://en.wikipedia.org/wiki/Design_of_experiments
• http://webspace.ship.edu/cgboer/experiments.html
• http://www.stat.auckland.ac.nz/~iase/publications/icots8/ICOTS8_4B2_ENGEL..pdf
• http://statistics.about.com/od/Applications/a/Example-OfBootstrapping.htm
• http://www.stat.rutgers.edu/home/mxie/rcpapers/bootstrap.pdf

Resources! And …
http://mediasite.temanuaute.org.nz/mediasite/Catalog/pages/catalog.aspx?catalogId=22dcecd5-8520-4ab7-b6c3-d96115a5dacf

This is not the end, it is not even the beginning of the end, it is only the end of the beginning!

Puzzle time
Who on Earth said that and when?
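For anyone who wants to try the bootstrap and √n ideas from the earlier slides on a computer, here is a minimal sketch using only the Python standard library. Several things are assumptions: the weights are made-up stand-in values (not the vineyard data), the interval uses the percentile method, "spread" is measured as a standard deviation, and the helper names `resample_stats` and `bootstrap_interval` are hypothetical.

```python
import random
import statistics

def resample_stats(sample, size, stat, n_resamples, rng):
    """Draw n_resamples resamples (with replacement) and return the statistic of each."""
    return [stat(rng.choices(sample, k=size)) for _ in range(n_resamples)]

def bootstrap_interval(sample, stat=statistics.mean, n_resamples=200, level=0.95, seed=0):
    """Percentile bootstrap interval, using resamples the same size as the sample."""
    rng = random.Random(seed)
    stats = sorted(resample_stats(sample, len(sample), stat, n_resamples, rng))
    cut = round(n_resamples * (1 - level) / 2)  # resamples trimmed from each tail
    return stats[cut], stats[-cut - 1]

# Stand-in data: NOT the vineyard weights, just 212 made-up values in grams.
data_rng = random.Random(1)
weights_g = [max(10.0, data_rng.gauss(90, 28)) for _ in range(212)]

low_g, high_g = bootstrap_interval(weights_g)
print(f"95% bootstrap interval for the mean: [{low_g:.1f}, {high_g:.1f}] g")

# Spread of the resample means for the sample sizes shown on the slide.
# Assumption: "spread" is taken to be the standard deviation of the means.
spread_rng = random.Random(2)
spreads = {}
for size in (3, 5, 8, 12, 40):
    means = resample_stats(weights_g, size, statistics.mean, 200, spread_rng)
    spreads[size] = statistics.stdev(means)
    scaled = spreads[size] * size ** 0.5
    print(f"n = {size:2d}: spread = {spreads[size]:5.1f}, spread * sqrt(n) = {scaled:5.1f}")
```

If the spread really is proportional to 1/√(sample size), the last column should stay roughly constant while the spread itself falls. Passing `stat=statistics.median` to `bootstrap_interval` gives the equivalent interval for the median.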