Report

Daily Scratch Rating (DSR)

PLAYING CONDITIONS

Objective: The system should reflect variations in playing conditions (Principle 6).

Why is it important to assess variations in playing conditions?

- It is axiomatic amongst the designers of all handicap systems that a gross score must first be standardised before it can be used for handicapping.
- Standardisation enables us to meaningfully assess the value of a score, and to meaningfully compare it with all other scores. For example, is 78 a good score? In order to answer the question we need to know the difficulty of the course.
- The objective of a course rating system is to enable us to standardise scores.
- Course ratings are intended to precisely measure the difficulty a course presents to a golfer in the playing of their round.
- If the rating of a course is not a true reflection of the difficulty it presented to a golfer in the playing of a round, the player’s standardised score for that round will be inaccurate.
- If the standardised score is inaccurate, the player’s handicap will be distorted (ie if inputs are inaccurate, so must the output also be inaccurate).
- Does the difficulty of a course vary from day to day?
- We know this can happen.
- Daily fluctuation can be caused by changed hole placements, varying green speeds & green firmness, and changed weather.
- So how often will a rating be accurate if it is static and unable to vary in line with changes in course difficulty from day to day?
- GA’s statisticians have conducted extensive analysis of the comprehensive repository of competition data in the GOLF Link database.
- The statisticians have looked for competitions where the weighted average net score varies by 1 stroke or more from the average net score that would be expected from the given composition of players.
- The weighting is for field size – for small fields the deviation from the expected average net score will need to be greater than 1 stroke. [More detail on weighting for field size is contained later in the presentation.]
- Could it be that whole fields are just having a good or bad day?
- Our statisticians have factored into their analysis the standard deviation of player scores.
- They have assessed that the likelihood of large fields simply playing well, or playing poorly, is very low (and the weighting for field size accommodates the potential for this to happen with small fields).
- Our statisticians have assessed that actual course difficulty aligns with the static Scratch Rating on only 33% of occasions.
- Are we suggesting that a daily rating system can achieve 100% accuracy?
- No, but 90-95% is a substantial improvement on 33%.
- But if a rating is sometimes a little high and sometimes a little low, does it all not just ‘even out’ under a static Scratch Rating System?
- IMPACT ON NATIONAL TRENDS & AVERAGES. DSR will make handicaps in general less volatile.
- Why? What is currently happening is that good scores on easy days are resulting in larger downward movements than will happen under DSR (because the course rating stays artificially high under the static Scratch Rating system). This causes a player’s handicap to dip temporarily: it becomes lower than it should be and the player struggles to play to the new value, before drifting out again. This is accentuated by scores on hard days being evaluated against ratings that are artificially low.
A further impact of decreased volatility is that across Australia, the average handicap will increase very slightly.
- INDIVIDUAL IMPACT. Whilst the impact on national averages is important, in this case it only paints a small part of the picture. Examples of the impact that not having a DSR has on individual golfers are as follows:
• Bill always plays in the morning when it’s calm, and Tony always plays in the afternoon when it’s windy. If there’s no DSR, both players will be having their scores evaluated against incorrect course ratings (one will be too high and the other too low). The two cases net off against each other to give an average that looks perfect. However, this doesn’t help Bill whose handicap will always be too low, and Tony won’t be happy either because he’s trying to improve his handicap and yet it’s always kept artificially high!
• Jenny’s underlying ability doesn’t change from summer to winter but because of the heavier winter conditions, she doesn’t score as well. Without DSR, Jenny’s handicap increases in winter and then decreases again in summer. Jenny doesn’t play much at the start of summer, so when she does start to play regularly at the height of summer her handicap is artificially high compared to those players whose handicaps have already adjusted to the easier conditions.
• John’s handicap is only based on an average of 8 scores. As a result, it doesn’t take too many inaccurate ratings to distort his handicap at any given point in time. Distortion can be caused either by ‘top 8’ scores that were actually returned against artificially high ratings, or by scores in the worst 12 that would have been in the top 8 if they had been assessed against ratings more reflective of the true course difficulty than the Scratch Rating.
Whilst the ups and downs may even out over the course of a substantial number of rounds, it seems hopeful to expect they will even out at any given point in time.
• Fiona has been playing in the Melbourne winter. Without DSR, her handicap will increase because of the harder conditions. She visits her friend in tropical Cairns and plays golf. Coming out of the Melbourne winter with her artificially high handicap, Fiona wins the Cairns competition.
- But if a golfer plays on a very difficult day, won’t a poor score fall into the worst 12 of their most recent 20 scores? And if so, isn’t the need to adjust the course rating negated? If this is correct, why do we need DSR?
• Firstly, handicap golfers are just as capable of playing well on difficult days as they are on easy days. And whilst the score may be worse on a hard day, it is not necessarily because the quality of the round is worse. In a proportionate number of cases it is instead because the difficulty of the course was increased. As a result, a round should not be dismissed just because the conditions were difficult and the score itself is commensurately worse.
• Secondly, a handicap system should be flexible enough to accurately assess the standard of rounds played by players under varying conditions.
• Thirdly, a score may be assessed as poor when compared against the static Scratch Rating, but it may be in a player’s top 8 if it is compared against the true rating.
• Fourthly, if a static Scratch Rating is being used, a score from a difficult day that is in a player’s top 8 will be assessed as being worse than it should be as it is being compared against an artificially low rating. This will make the handicap higher than it should be.
• Fifthly, if a static Scratch Rating is being used, a score from an easier day that is in a player’s top 8 will be assessed as being better than it should be as it is being compared against an artificially high rating. This will make the handicap lower than it should be.
• Sixthly, it is possible that the swings and roundabouts of the above may cancel each other out. However, a sample of 8 is not large and whilst the ups and downs may even out over the course of a substantial number of rounds, it seems hopeful to expect they will even out at any given point in time.
• But how much does course difficulty really vary from day to day and with the seasons? Below is a graph of DSRs that have been calculated from scores at a typical Australian golf club.
• The graph shows clear fluctuations in course difficulty from day to day.
• The graph also shows clear seasonal fluctuations in course difficulty.
• But how much of an impact does all of this have on the calculation of handicaps?
• One measure is to compare the proportion of Anchored players before and after the introduction of DSR by using the data from our typical Australian club.
- So we can see that providing ratings that align with the actual difficulty presented to the golfer in the playing of their round does make a material difference to handicaps.
- (Note: Approx 70% of the reduced Anchorage impact is caused by DSR (the remainder is caused by a mix of Slope and the Stableford Handicapping Adjustment).)

Exactly what is being taken into consideration when making the DSR calculation?
- DSR inputs are:
• Gender
• Type of competition (Stroke, Stableford, Par)
• Field size
• Average handicap of field
• Average net score of field
- The suite of DSR algorithms is provided at Appendix A.

The use of the Mean Net Score for a given Competition

• The Mean Net Score expected from a given competition (under normal conditions) can be determined well from the average handicap of the field, the type of competition (Par, Stableford, or Stroke), and whether the competition was for Men or Women.
• This expected Mean Net Score will be less than the net score players would need to achieve in order to play to their handicaps. Let’s call this difference the Normal Deduction.
• It is related to handicaps as follows: [see next slide]
• The difference between the Actual Mean Net Score and the Expected Mean Net Score is a first approximation to an adjustment that should be made for handicap purposes to account for Course Conditions.
• It will be positive on an easy day, and negative on a difficult day.

The Weighting Factor and how the system can work even for very small fields.

• However, it is necessary to give the value above a weight, which accounts for field size, and which is also dependent on the average handicap of the players in the field. This weighting factor is close to 100% for large field sizes, but as low as 25% for a field size of one.
• The following table gives some examples: [see next slide]
• Only Modest Field Sizes are needed to get a good result.
• A DSR can be calculated from a dataset of 1 player.
                                 Men’s Stableford    Women’s Stroke
Average Handicap of the field    15                  25

Weight                           Field Size          Field Size
20%                              2                   4
50%                              10                  16
80%                              40                  60
90%                              90                  140

• So we ascribe a greater weight to the average net score of a large field than we do to the average net score of a small field.
• But how have we related field size to confidence? Our statisticians have used Bayes’ Theorem.
• From Bayes’ Theorem is derived the formula for weighting the Daily Estimate compared to the measured Scratch Rating.
• Bayes proved that it is optimal. There can be no better estimate of the Daily Condition than the Bayes estimate.
• More information on Bayes’ Theorem is provided at Appendix B.

When is the cut-off for scores to be submitted in order for the DSR to be calculated for a given day?

• There is no hard and fast rule.
• GA’s direction to its clubs is that scores for a given day are to be processed through GOLF Link as soon as is practicable (irrespective of whether a daily rating system is in operation).
• For some clubs this will be on the day of play (preferable).
• For other clubs this will be within a week of the day of play (acceptable).

How will the system work in practice in terms of updating players’ handicaps on GOLF Link?

- GOLF Link will calculate all Differentials against the DSR, NOT the Scratch Rating.
- DSR will not provide any administrative impost on clubs.
- The club will enter the scores of players into GOLF Link (in the same way it would if DSR was not in operation), press the button, and GOLF Link will perform all the necessary calculations.

What is the implementation protocol for competitive play and for extra day scores?

- DSRs will be calculated for competition scores AND extra day scores.
- All extra day scores returned at a course on a given day will be processed through GOLF Link as a single Batch (in the same way that competition scores will be processed).
- If an extra day score is not processed in the appropriate Batch, it will still be eligible to be used for handicapping.

Comparison with other methods used worldwide.

- EGA and CONGU both operate a daily rating component.
- EGA and CONGU both treat as indicative the proportion of players in each handicap grade to return a good score.
- DSR uses all scores (ie good AND poor).
- The analysis performed by GA’s statisticians led them to the conclusion that there is material value in using good scores AND poor scores.
- The CONGU component operates by adjusting the scratch rating against which a player’s score is compared (as will DSR).
- The EGA component operates ostensibly by expanding or contracting the buffer zones (however it effectively operates by adding or subtracting a value from a player’s net score so as to account for a variation in conditions).
- CONGU and EGA prefer a greater degree of statistical certainty than GA in order for the daily rating to vary from the Scratch Rating.
- After consideration, GA took the view that a more dynamic approach is more likely to yield outputs that will align with the golfer’s view of the difficulty of the course on the day.

How easy is DSR to understand?

- The concepts underpinning the EGA & CONGU daily rating components are all readily understood by golfers.
- GA firmly expects that golfers will readily understand the concepts underpinning DSR.
- The mathematics underpinning these methods are more esoteric.
- GA believes there are two key determinants of whether a daily rating system is accepted by golfers. Firstly, the degree to which it is conceptually understood.
Secondly, its ability to produce intuitive outputs.
- Golfers want ratings that align with their view of the difficulty of the course on the day.

Is there a simpler daily rating solution?

• CCR
• DSR
- In Australia we previously operated a simple mathematical model (CCR) where the 12½% net score became the daily rating.
- CCR was necessarily simplistic (as it operated prior to the age of universal computer use) and it was statistically inefficient.
- The mathematics of CCR were readily understood by golfers and administrators.
- However the commonly-held view of golfers (and administrators) was that CCRs were often more a reflection of the quality of the field than they were of the difficulty of the course (eg veterans’ fields would produce higher CCRs than regular handicap fields).
- CCR did not work so well for clubs in regional areas or for women’s fields, ie in cases where fields were frequently small.
- A further problem with women’s fields was that they typically exhibited materially higher average handicaps than do men’s fields.
- Our statisticians are supremely confident that DSR will deliver materially better outcomes for our golfers than did CCR.

Results of pilot study.

- DSR was developed by the Daily Rating Statistical Review Group – comprised of experienced administrators and highly accomplished statisticians.
- It was laboratory tested against millions of rounds.
- It has also been trialled in a live environment in a diverse selection of Australian clubs.
- There have been three phases to the live trial.
- The first two phases were held across different seasons.
- We are currently engaged in the 3rd and final phase.
- The live trial has involved a diverse selection of 20 clubs (regional and metropolitan, large and small).
- DSR values have been calculated each day and emailed to officials at each trial club.
- These officials were asked to comment on the degree to which the calculated DSRs have aligned with their intuitive view of the difficulty of the course on the day.
- The feedback from the live trial clubs has been very positive and extremely encouraging.

APPENDIX A – The DSR Algorithms

Normal Deduction:

\[ \mathrm{ND} = mH + b \]

Where H is the average handicap of the field, and m and b are taken from the table below, representing the slope and intercept of the straight line of best fit. R is the correlation of this fit.

               Par (P)    Stableford (S)    Stroke (K)
Men      m     (0.052)    (0.111)           (0.124)
         b     (2.777)    (3.498)           (4.372)
         R     0.943      0.972             0.973
Women    m     (0.062)    (0.117)           (0.146)
         b     (2.514)    (3.338)           (3.939)
         R     0.959      0.953             0.984

Weighting Factor:

\[ \text{Weighting Factor} = \frac{n}{\dfrac{(m'H + b')^2}{\mathrm{CSD}^2} + n} \]

Where n is the field size, H is the average handicap of the field, CSD (the Course Standard Deviation) is estimated at 1.5, and m' and b' are taken from the table below representing the slope and intercept of the straight line of best fit for the empirically derived standard deviation of the Normal Deduction.

               Par (P)    Stableford (S)    Stroke (K)
Men      m'    0.023      0.057             0.083
         b'    3.194      3.841             3.993
         R     0.856      0.829             0.956
Women    m'    0.027      0.057             0.081
         b'    3.040      3.775             3.889
         R     0.760      0.917             0.938

CPA (Course Parameter Adjustment) = Prior CPA + WCA × 0.02 × (0.7 for Men, 0.5 for Women)

Where WCA (Weighted Condition Adjustment) is SR (Scratch Rating) minus DSR.

Putting DSR into a single formula:

\[ \mathrm{DSR} = \mathrm{SR} - \bigl(S - (36 + \mathrm{Par} - \mathrm{SR} + \mathrm{CPA} - mH - b)\bigr) \cdot \frac{n}{\dfrac{(m'H + b')^2}{\mathrm{CSD}^2} + n} \]

Where S is the actual average Stableford points scored in the competition, and the other symbols have their meaning as above.
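
As an illustration only, the Appendix A formulas can be sketched in a few lines of Python. This is not GOLF Link's implementation. The function and dictionary names are hypothetical, and one reading of the tables is assumed: the parenthesised m and b values are treated as positive magnitudes, so that the Normal Deduction is a positive amount subtracted from 36 points, which is how the "- mH - b" term in the single DSR formula reads. Only the Stableford form of the DSR is sketched, because that is the form the single formula states.

```python
# Coefficients transcribed from the Appendix A tables.
# ASSUMPTION: parenthesised m and b values are used as positive magnitudes.
ND_COEFFS = {  # (gender, competition) -> (m, b) for the Normal Deduction
    ("men", "par"): (0.052, 2.777),
    ("men", "stableford"): (0.111, 3.498),
    ("men", "stroke"): (0.124, 4.372),
    ("women", "par"): (0.062, 2.514),
    ("women", "stableford"): (0.117, 3.338),
    ("women", "stroke"): (0.146, 3.939),
}
SD_COEFFS = {  # (gender, competition) -> (m', b') for the standard deviation
    ("men", "par"): (0.023, 3.194),
    ("men", "stableford"): (0.057, 3.841),
    ("men", "stroke"): (0.083, 3.993),
    ("women", "par"): (0.027, 3.040),
    ("women", "stableford"): (0.057, 3.775),
    ("women", "stroke"): (0.081, 3.889),
}
CSD = 1.5  # Course Standard Deviation, as estimated in Appendix A


def normal_deduction(avg_handicap, gender, competition):
    """ND = mH + b for the given field."""
    m, b = ND_COEFFS[(gender, competition)]
    return m * avg_handicap + b


def weighting_factor(field_size, avg_handicap, gender, competition):
    """n / ((m'H + b')^2 / CSD^2 + n)."""
    m_sd, b_sd = SD_COEFFS[(gender, competition)]
    sd = m_sd * avg_handicap + b_sd
    return field_size / (sd ** 2 / CSD ** 2 + field_size)


def stableford_dsr(scratch_rating, par, cpa, avg_points,
                   field_size, avg_handicap, gender):
    """DSR from the actual average Stableford points of the field."""
    expected_points = (36 + par - scratch_rating + cpa
                       - normal_deduction(avg_handicap, gender, "stableford"))
    weight = weighting_factor(field_size, avg_handicap, gender, "stableford")
    return scratch_rating - (avg_points - expected_points) * weight


def updated_cpa(prior_cpa, scratch_rating, dsr, gender):
    """CPA = Prior CPA + WCA x 0.02 x (0.7 for Men, 0.5 for Women)."""
    wca = scratch_rating - dsr  # Weighted Condition Adjustment
    return prior_cpa + wca * 0.02 * (0.7 if gender == "men" else 0.5)


# Hypothetical field: 40 men, average handicap 15, averaging 2 points
# more than expected on a par-72 course with Scratch Rating 72.
expected = 36 - normal_deduction(15, "men", "stableford")
print(stableford_dsr(72, 72, 0.0, expected + 2, 40, 15, "men"))
```

Under these assumptions the sketch agrees with the weighting table in the presentation: a men's Stableford field of 10 with an average handicap of 15 receives a weight of about 50%, and a women's Stroke field of 140 with an average handicap of 25 receives about 90%. A field that scores exactly the expected average leaves the DSR equal to the Scratch Rating, and a field that scores more points than expected pulls the DSR below it.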
APPENDIX B – Bayes’ Theorem

BAYES’ THEOREM

Bayes’ Theorem starts with an estimate of some value you’d like to ascertain; then it assumes that you can take a sample which gives a further but limited estimate of the value; from these pieces of data the formula gives a final estimate which combines the initial estimate with the sample result. It effectively gives the amount of weight that can be given to the sample compared to the initial estimate, this in turn being based on the sample size and the inherent variability of the estimate and the sample.

EXPLANATORY NOTES ON BAYES’ THEOREM

Suppose there is a true answer to a question. Let’s take two examples to demonstrate:
1. What is the proportion of voters that will vote for the Republicans in the next election?
2. What is the true scratch rating of a golf course on which I will play tomorrow? Due to conditions, the course may play differently to the static scratch rating which may be considered as an average or normal rating.

In the case of the voters, every four years the question is answered. But along the way, there are many polls of voters’ intentions. Eventually we will know the truth.

In the case of the course, we will never know the true answer. We could only know if all the golfers in the land actually played on it tomorrow, clearly impossible.

When we take a poll of voters, the smaller the sample, the less accurate the poll result. The greater the sample, the greater the accuracy, until on voting day, we get complete accuracy.

When we take a sample of golfers’ scores tomorrow, we will get a “poll”, and by comparing their average score to what would be expected for a field of that composition, we will get an estimate of whether the course played easier, harder, or no different to normal. The more golfers in the sample, the better the poll result.
If the average net score is higher than expected, then this is a poll result suggesting that the course played harder.

Bayes, however, was able to build in the concept of having an idea of what might be expected as an answer before you take the sample. In the case of voters, it might be voting patterns at the last election. And you can measure how variable this has been over the years. In the case of a daily scratch rating it might be the static scratch rating ascribed to the course.

He then said we will only depart from the first idea if there is enough evidence to prove that there is indeed a better answer. So, in the case of voters in an area which in the past has voted 55% Republican, finding that only 30% of a sample of 100 voters intends to vote Republican at the next election may not be convincing. The same result with a sample of 10,000 voters may be very significant. Bayes’ theorem allows the optimal weight to be ascribed to the poll result, and to determine if this result is significant.

In the case of the daily scratch rating, application of the theorem allows us to come up with an estimate of the daily rating, but only lets it deviate from the static rating if, firstly, the average scores indicate a deviation and, secondly, the number of players making up the sample is sufficient to make a different result the most likely on the balance of probabilities.

Of course the poll of golfer scores may indicate that there is no case for deviation at all, either because the sample is too small, or because the average of their scores was quite close to what would be expected. But if there is a deviation, after the Bayes weighting is applied, we can be certain that this estimate of the daily scratch rating is the optimal value available based on the evidence. It is more likely to be a reflection of the true daily rating on that day than the static rating.
One might say that Bayes’ theorem allows us to anchor the new answer (the daily scratch rating) firmly to the original idea (the static scratch rating), and, conservatively, only allows deviation when the evidence is overwhelming.

QUESTIONS