11 Comments
Alastair's avatar

There is, of course, another possible explanation for the handicap drift. The handicap system has ongoing fundamental flaws (please read: https://www.facebook.com/share/p/1AaiydYTUC).

These will have consequences that should be unsurprising to anyone who understands the problem. Just to recap, the main flaws in the system are miscalculating the probabilities of winning & losing matches, and then assigning the wrong outcomes & handicap movements to these results.

The top players play most, if not all, of their matches off level, and, as these matches are deemed to be "more important", the assigned handicap movements are larger. This will have a predictable effect: handicap drift. Why? The explanation is simple:

When two players meet, one expects the genuinely better player to win more of their matches when they play off level. However, as the current system rewards Wins (and, even worse, Big Wins) with large handicap changes, this means that their handicaps move more than they should. There will therefore be a tendency for the very best players to drift away from the other top ten/twenty players. Also, these other players will now have worse handicaps than they should do, and when they play someone lower down the handicaps, and genuinely not as good as them, they Win, or even more likely, given they now have a faulty handicap, gain a Big Win. This roughly corrects their own handicap, but now the next lesser player down the chain has a wrong handicap. They now play someone worse than themselves and the process repeats.

What happens is that over time, on average, these handicap points are passed down the handicap system to lesser players.

Obviously, the better tranche of players will be passing on these handicap points, so they will not experience quite the same extreme handicap drift as the lesser players. This is precisely what we see in the data: the biggest drifts have occurred for the worst quantiles, while the best quantile has seen the least drift.

It is an inherent process that will continue while the current system is in place.

The idea to reduce all handicaps is an excellent one, with which I heartily agree. But it is only a sticking plaster that does not address the underlying problem that causes the issue.

Ben Geytenbeek's avatar

Hi Alastair

Rest assured that the IHSC, on behalf of the ITC, is currently undertaking a review of the handicap methodology which will be data-driven based on now having almost 1 million match results recorded across almost 24 years on RealTennisOnline, especially given that the last wholesale review of handicaps was at the turn of the century.

On your point regarding the best players, the preliminary analysis suggests that yes, the calculation as to the expected scorelines when playing off level is not working as expected, at least at the extremes of handicaps. A 0 handicap playing a 10 handicap off level is much more likely to win 6/0 6/0 than a 70 handicap playing an 80 handicap off level - which is likely down to the fact that at the beginner level much more of the final score is due to chance whereas at the elite level much more of it is due to skill. The calibration is more correct for middle-of-the-curve players. This has resulted in a situation where, at least at the top level, handicap tournament organisers have recently started playing off half-handicaps to give the better players more of a chance (e.g. the Seacourt Silver Racquet, Moore Family Office Summer Challenge).

On your "chain" theory, as I understand it, I do agree that this affects the very extreme end of handicaps (the top 10 or so), but by the time the chain reaches even the scratch handicaps the effect will have dissipated so much as to be not meaningfully noticeable, especially given that the top players have so few matches each year with a non-null result. By contrast, the effect at the other end of the distribution - beginners improving quickly - has the weight of numbers to have a much more sizeable effect over the whole population.

But yes, this is far from the last news you will hear on handicaps in the near future!

Ben

Ben Geytenbeek's avatar

Also I'll add that it isn't necessarily true that all games played in a match are statistically indistinguishable - if you win one game you are more likely to win the next as you are more likely to have finished the last at the service end, which is why the review on handicaps is taking a data-led statistical approach rather than a probability theory approach. That's not to say they won't end up in the same place, just that we want to be very thorough and aware of all the nuances - for many of the top players their livelihoods depend on their handicap so we want to get it right!

Alastair's avatar

As to using a data-led rather than probability approach, there should be a big caveat attached to this: the data set is the results from realtennisonline, which of course are based on faulty handicaps in the first place! The situation is analogous to the current worries over AI large language models, i.e. so much content on the internet is now AI-created that there are no longer clean data sets to train AI; it is now feeding on itself. Unfortunately, the data set of real tennis matches is not "clean", in that the handicaps that are being used to calculate match results are not correct, so any inferences drawn from the data should be treated with some caution.

It would also be great if the underlying maths could at least be correct...

Alastair's avatar

I’ve emailed you my discussion paper on handicaps. I’d love to hear your thoughts on it, as you seem to be thinking about all of this in depth.

The current algorithm inherently introduces instability into the system and causes drift; this isn't just because of a "chain" effect from the best players upwards; the effect is throughout the handicap distribution. At the moment, if two players of identical standard play each other, the current system changes their handicaps in over 3/4 of their matches. (If two players are genuinely identical, a score unusual enough to justify changing their handicaps should appear by chance in only about 5% of their matches.)
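To make the "unusual score by chance" idea concrete, here is a minimal sketch of how set scores would be distributed between two identical players. It rests on assumptions that are not from the thread: every game is an independent coin flip (which Ben questions elsewhere in this discussion), a set goes to the first player to reach 6 games with a single deciding game at 5-all, and the function name `set_score_distribution` is purely illustrative. It says nothing about how the actual algorithm assigns handicap movements.

```python
from math import comb


def set_score_distribution(p=0.5, games_to_win=6):
    """Probability that the loser of a set finishes on k games (k = 0..5).

    Assumes independent games, each won by player A with probability p,
    and a set that ends as soon as either player reaches `games_to_win`.
    """
    q = 1.0 - p
    dist = {}
    for k in range(games_to_win):
        # Number of orderings in which the loser takes k games before
        # the winner takes the final, set-clinching game.
        ways = comb(games_to_win - 1 + k, k)
        a_wins = ways * p**games_to_win * q**k
        b_wins = ways * q**games_to_win * p**k
        dist[k] = a_wins + b_wins
    return dist


# Print the distribution for two identical players.
for loser_games, prob in set_score_distribution(0.5).items():
    print(f"6/{loser_games}: {prob:.3f}")
```

Under these assumptions a 6/0 set between equal players turns up about 3% of the time (1 in 32), while 6/4 and 6/5 each account for roughly a quarter of sets. A rule that only moves handicaps on results in the rare tail would therefore fire on a small fraction of sets, whereas a rule that moves handicaps on most scorelines would fire most of the time.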

In reality, players are never of identical standard, and one will be better than the other one. The better player will most likely win, and the current system changes their handicaps, even though in most cases, probabilistically speaking, it shouldn’t. This means that handicaps are changing more than they "ought" to.

The general tendency will be for better players to push worse players onto higher handicaps (and, yes, for lower players to push better players onto lower handicaps, but see below....). This effect will be true for all quantiles.

One might hope that the players who have then lowered their handicaps by these matches would be corrected by losing subsequent matches. But there are strong barriers to this: there is the artificial barrier of a 6 handicap point difference making a match Null; the number of better players becomes fewer & fewer the lower the handicap, so there are fewer opportunities for this correction to occur; the current system gives the better players much larger handicap movements for a Win or a Big Win, which means that they move out of range of worse players far more quickly; and then the number one player in the world is effectively a singularity, where there are no players better to correct their handicap. All of these effects mean that the corrective mechanism doesn't happen, and handicap points are passed up through the system - as indeed the data shows.

[for many years, Rob Fahey was untouchable, in handicap terms, as he was more than 6 points away from the next best player]

I don’t disagree with the idea that beginners coming into the game will probably have an effect on the handicap structure, but there is a more fundamental force driving handicap drift that is built into the algorithm.

Ben Geytenbeek's avatar

I can't agree that handicaps are changing more than they "ought" to, for the following reason:

Handicaps are a "best estimate" of a player's ability relative to other players. But what it hides is a probability distribution: a player can play above or below their standard because of the weather, how much sleep they had last night, luck, how much their opponent matches their style etc. So underneath each handicap is a range that reflects the different possible outcomes from the match - I think we agree on this point. But where my interpretation differs from yours is how a handicap should be changed given new data. You take a frequentist approach, and argue that handicap changes should occur only when a result falls outside the bounds of statistical significance. I take a Bayesian approach, and argue that we should update our prior of what a player's ability is given the new result available. That isn't to say I necessarily agree with how the approach is implemented currently - it makes certain simplifying assumptions that I don't think hold up under scrutiny. In either case, one's handicap should follow a mean-reverting random walk. In the Bayesian case, steps are small and frequent, whereas in the frequentist case, steps are rare and large. Both are statistically valid approaches, and the implementation comes down to other values one ascribes to one's handicap. At some point, one has to leave the world of statistics and data and enter the realms of human psychology and game theory - what system best encourages a rational player to improve and an irrational player to want to keep playing (and paying)? This is a much tougher question...
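The contrast between small, frequent steps and rare, large steps can be sketched as a toy simulation. Everything here is a stated assumption rather than either commenter's actual proposal: a player's true level is fixed, each match yields a noisy "performance" reading in handicap points, the small-step rule is a constant-gain update (a simple stand-in for a Bayesian posterior update), and the rare-step rule only moves when a reading falls outside a roughly 5% two-sided band. The parameters `gain`, `jump` and `noise_sd` are hypothetical.

```python
import random


def simulate(n_matches=2000, true_level=40.0, noise_sd=5.0, seed=1):
    """Compare two toy update rules on the same stream of noisy results."""
    rng = random.Random(seed)
    small = big = true_level          # both estimates start at the true level
    small_moves = big_moves = 0
    gain = 0.05                       # hypothetical share of each surprise absorbed
    threshold = 1.96 * noise_sd       # roughly a two-sided 5% significance band
    jump = 2.0                        # hypothetical size of the rare, larger step

    for _ in range(n_matches):
        performance = rng.gauss(true_level, noise_sd)

        # Small-and-frequent rule: nudge the estimate towards every result.
        step = gain * (performance - small)
        if step != 0.0:
            small += step
            small_moves += 1

        # Rare-and-large rule: move only when the result is "significant".
        surprise = performance - big
        if abs(surprise) > threshold:
            big += jump if surprise > 0 else -jump
            big_moves += 1

    return small_moves, big_moves, small, big


small_moves, big_moves, small, big = simulate()
print(f"small-step rule moved {small_moves} times, final estimate {small:.1f}")
print(f"rare-step rule moved {big_moves} times, final estimate {big:.1f}")
```

In this sketch both estimates hover around the true level, which is the mean-reverting behaviour described above; the difference lies in how often and how far each one moves. Which pattern is preferable is the value judgement the comment goes on to raise, and the simulation does not settle it.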

I also don't agree that there are barriers to the restoring force that you highlight. The threshold that a 6 handicap point difference results in a null result only applies to matches where the difference between the handicap difference played and the true handicap difference is more than 6; for most players above a 10 handicap this will almost always be true. That the system has larger movements for better players is not true - what is true is that the system has larger movements for longer-format competitive matches, which only get played at Opens. Otherwise they move the same as everybody else. That the movement is larger for longer matches is a rough approximation for the statistical certainty in the result - no matter if you are Bayesian or frequentist - but its current implementation is arguably poor in practice.

The restoring force also becomes a significant drag even after moving 1 or 2 points. Playing 2 points away from the "true" handicap in the current system has a 55% bias on games won, which multiplies up when you consider the likelihood to win a set, let alone a match.
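How a per-game edge "multiplies up" can be illustrated with a short calculation. This is a sketch under assumptions that are not from the thread: the "55% bias" is read as a 0.55 probability of winning each game, games are independent, a set goes to the first player to 6 games (with a deciding game at 5-all), and a match is best of three sets. The function names are illustrative only.

```python
from math import comb


def first_to_n_prob(p, n):
    """Probability of reaching n wins before the opponent does,
    given independent trials each won with probability p."""
    q = 1.0 - p
    return sum(comb(n - 1 + k, k) * p**n * q**k for k in range(n))


def set_win_prob(p_game, games_to_win=6):
    """Chance of winning a set from a per-game win probability."""
    return first_to_n_prob(p_game, games_to_win)


def match_win_prob(p_game, sets_to_win=2):
    """Chance of winning a best-of-three match (first to 2 sets)."""
    return first_to_n_prob(set_win_prob(p_game), sets_to_win)


p = 0.55  # assumed reading of the "55% bias on games won"
print(f"game:  {p:.3f}")
print(f"set:   {set_win_prob(p):.3f}")
print(f"match: {match_win_prob(p):.3f}")
```

Under these assumptions the edge grows at each level: the chance of winning a set exceeds the per-game figure, and the chance of winning the match exceeds the per-set figure. The exact sizes depend on the independence assumption, which may not hold in practice.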

On the data, the first step in the analysis is to confirm whether or not the system is working as intended, which we can show that it is for handicaps between 20 and 65 and for odds between level and owe 15-rec 15 (though I acknowledge you have to take me on faith for that assertion as I can't show my working at this stage as it is not yet ready for publication). Outside those bands, that is where we start to see issues and any changes that will be proposed will be about addressing those.

Data is a journey, and it is possible to pick apart the story it tells in spite of the vessel it travels in.

Alastair's avatar

Interesting!

I don't disagree with a Bayesian approach, but at the moment, that isn't what is happening. My understanding of the current model is that it is frequentist, but it alters handicaps often & by a large amount. My model would alter handicaps less frequently & by a smaller amount.

I often played matches that were Null when I was playing off level, which was a common occurrence for single figure/teen handicap matches in National League or tournaments. And this is precisely what I meant by better players' handicaps moving by larger amounts: most, if not all, of their matches are tournament or "important" matches, so their handicaps are altered by a larger amount. If one IS taking a frequentist approach, then the ONLY thing one can say is that the handicaps are wrong, and so the only legitimate movement in the respective handicaps is 0.5 for both players; the longer match simply gives a better estimate for the probability.

I don't have access to the data, so I cannot judge what you say about the system for handicaps from 20 to 65. It would be VERY odd if it were to work correctly, given that the underlying maths is so entirely wrong, but maybe you are referring to whatever new system is being proposed? I'd love to be allowed to read that whenever permitted.

Ben Geytenbeek's avatar

I would describe the current system as spiritually Bayesian, insofar as the goal is to update the prior assumption based on the new data - but agree that it isn’t in practice, since the movements are fixed regardless of how much data is available. The current system doesn’t do this at all well, especially when you have rapid improvers who have finished their provisional period.

When I describe the handicaps between 20-65 as “working well” what I mean is that it is successfully proposing odds that are giving the players a 50-50 chance of winning, and that players with better handicaps are more likely to beat players with worse handicaps. But again, I can’t make that more than an assertion at this stage in this forum.

I agree that the way that the current system deals with “wrong odds” matches is flawed, and that where level matches outweigh handicap matches the results are skewed such that the games aren’t fair. It’s something we need to fix, but I’m not going to preempt the final findings here.

Phil Dunn's avatar

Nice to see I'm not the only one sending Ben a bunch of things to consider about handicaps! (Admittedly mine weren't so mathsy)

Joshua Greene's avatar

This is very interesting. I am curious how this compares with other activities with handicapping systems: golf, chess, and go.

Ben Geytenbeek's avatar

Handicap inflation doesn’t occur in golf because you’re measuring yourself against a fixed point - the course. A scratch golfer is expected to score par regardless of the era.

Inflation and deflation do occur in Elo-based and zero-sum rating systems like those of chess and go, though the size of those player bases shrinks and grows much more substantially over time. Real tennis is more constrained in this sense: the number of players over time changes little because most clubs maintain an ideal number of members - they realistically cannot take on board a 50% increase in membership because then nobody would be able to play, and new courts are a rare and celebrated occurrence - but they do churn through members, which is what is causing the effect here.
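For readers unfamiliar with why a zero-sum system can still inflate or deflate, here is a minimal sketch of the standard Elo update. The expected-score formula is the usual logistic one on a 400-point scale; the value `k=32` is just a common convention, and nothing here describes the real tennis handicap algorithm.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return both new ratings after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    Whatever A gains, B loses, so the pair's total is unchanged.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated players: the winner takes 16 points from the loser.
new_a, new_b = elo_update(1500.0, 1500.0, 1.0)
print(new_a, new_b, new_a + new_b)
```

Because each game only moves points from one player to the other, the total held by the active pool changes only when players join or leave. A newcomer who enters, gives up points while learning and then quits leaves those points behind, which is the churn mechanism described above.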

Another system where inflation occurs is the new FIFA world rankings, although there the list of participants (the national FAs) is fixed, unlike real tennis where participants come and go, and the implementation is not quite zero-sum. They also don’t care about the absolute value of their ranking points, whereas in real tennis we do in a large number of circumstances.
