Do we overestimate the impact of recent form?
Form is one of the most frequently discussed concepts in football. Players, coaches, journalists and pundits routinely refer to the momentum generated by a succession of victories. Form guides are ubiquitous in pre-match reviews, the idea being that recent results are indicative of a team's chances of winning their next match. But is this idea founded in fact?
Looking for the impact of form, or momentum, in football is tough. There are a multitude of confounding variables, most notably variations in the quality of opponents, the effects of home advantage, changing team selections and small sample sizes. Situations are not replicable, making it difficult to study whether win probability has been enhanced by a recent run of victories. Nevertheless, there are some simple statistical tests that we can perform to try to detect whether form has any impact on the outcome of subsequent matches.
One test is to look at the frequency with which winning streaks occur. Streaks are a natural feature of most random processes: if you flip a fair coin 38 times in a row, there is a 45% chance that you'll get five consecutive heads at some point in the sequence. If form truly has an impact on future results, you would expect streaks of a given length to occur more frequently in practice than they would do by chance. In this post I'll investigate whether there is any evidence for this.
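For anyone who wants to check the coin-flip figure for themselves, here is a minimal sketch of an exact calculation. It is not code from the original analysis; the function name and the state-tracking approach are my own choices for illustration.

```python
def prob_run_at_least(n_flips: int, run_length: int, p_heads: float = 0.5) -> float:
    """Exact probability of at least one run of `run_length` heads in `n_flips` flips."""
    # state[j] = probability that no qualifying run has occurred so far and the
    # sequence currently ends in exactly j consecutive heads (j < run_length).
    state = [0.0] * run_length
    state[0] = 1.0
    for _ in range(n_flips):
        new_state = [0.0] * run_length
        for j, prob in enumerate(state):
            new_state[0] += prob * (1.0 - p_heads)  # tails resets the current run
            if j + 1 < run_length:
                new_state[j + 1] += prob * p_heads  # heads extends the current run
            # otherwise the run has reached run_length: that probability mass
            # leaves the "no run yet" states and is never added back
        state = new_state
    return 1.0 - sum(state)


# Probability of five consecutive heads somewhere in 38 fair coin flips.
print(f"{prob_run_at_least(38, 5):.3f}")
```

Changing `p_heads` to a team's win rate gives a quick feel for how common streaks are for stronger or weaker sides, if every match is treated as an independent coin flip.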
The frequency of winning streaks
The experiment went as follows: for every EPL season since the 95/96 season I identified all teams that won between 14 and 29 matches in a season and divided them into four groups: those that won 14-17 matches, 18-21, 22-25 and 26-29 matches. The last group contained only 17 teams, of which 15 won the title that year (the exceptions are Liverpool in 13/14 and Spurs in 16/17). Only two clubs have managed 30 or more wins in a season: Chelsea in 16/17 and Man City in 17/18.
Within each group, I measured the proportion of teams that completed a winning streak that season, varying the length of the streaks from 2 to 15 matches[1]; the red diamonds in Figure 1 show the results. The top-left panel shows teams that won between 14 and 17 matches in a season. The curve drops quickly as the length of winning streaks is increased: nearly 90% of the teams in this group managed a 3-match winning 'streak', but less than 20% completed a 5-match streak. Of the teams that won between 18 and 21 matches (top-right panel), just under half managed a 5-match winning streak, but only 20% completed a 6-match streak. In comparison, more than half of teams in the 22-25 win group (bottom-left) completed a 6-match winning streak, while over 80% of those teams in the final group (bottom-right panel) achieved a 6-match streak.
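The measurement itself is straightforward to sketch. The snippet below is illustrative only: it assumes each team-season is stored as a sequence of "W", "D" and "L" results in match order, which is my own choice of format, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def longest_win_streak(results: Sequence[str]) -> int:
    """Length of the longest run of consecutive 'W' entries."""
    longest = current = 0
    for result in results:
        current = current + 1 if result == "W" else 0
        longest = max(longest, current)
    return longest


def streak_proportions(
    seasons: List[Sequence[str]], min_length: int = 2, max_length: int = 15
) -> Dict[int, float]:
    """Proportion of team-seasons containing a winning streak of at least each length."""
    longest = [longest_win_streak(season) for season in seasons]
    return {
        k: sum(streak >= k for streak in longest) / len(longest)
        for k in range(min_length, max_length + 1)
    }


# Two made-up four-match "seasons", purely to show the call.
print(streak_proportions([list("WWDL"), list("WDWL")], max_length=3))
```

Counting a team once if its longest streak reaches a given length is what avoids double-counting the shorter streaks contained in a longer one.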
What would we expect these results to look like if form has no impact on future results? To determine this, I shuffled around the ordering of results for the teams in each group, essentially destroying any information about winning streaks (but perhaps randomly creating some new ones)[2]. By repeatedly shuffling the order of matches, we can evaluate how frequently winning streaks occur purely by chance. As a reminder: if form has a significant impact on future results, you would expect streaks of a given length to occur more frequently in practice than they would do by chance.
The grey shaded regions in Figure 1 represent the range of results obtained from these repeated shuffles (specifically, the 5th to 95th percentiles). The red diamonds, the observed results, lie well within these regions in all four groups, indicating that there is no evidence that winning streaks occur more frequently than you'd expect by chance.
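A shuffle test of this kind can be sketched in a few lines. This is an illustration under my own assumptions rather than the code behind Figure 1: results are "W"/"D"/"L" strings, venues are "H"/"A" flags, home and away results are permuted separately, and the percentiles use a simple nearest-rank rule. All names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

TeamSeason = Tuple[Sequence[str], Sequence[str]]  # (results, venues) in match order


def longest_win_streak(results: Sequence[str]) -> int:
    longest = current = 0
    for result in results:
        current = current + 1 if result == "W" else 0
        longest = max(longest, current)
    return longest


def shuffle_within_venue(
    results: Sequence[str], venues: Sequence[str], rng: random.Random
) -> List[str]:
    """Permute home results among home fixtures and away results among away fixtures."""
    home = [r for r, v in zip(results, venues) if v == "H"]
    away = [r for r, v in zip(results, venues) if v == "A"]
    rng.shuffle(home)
    rng.shuffle(away)
    home_iter, away_iter = iter(home), iter(away)
    # The home/away fixture sequence is untouched; only the results move.
    return [next(home_iter) if v == "H" else next(away_iter) for v in venues]


def null_band(
    group: List[TeamSeason], streak_length: int, n_shuffles: int = 1000, seed: int = 0
) -> Tuple[float, float]:
    """Approximate 5th and 95th percentiles of the streak proportion under shuffling."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(n_shuffles):
        hits = sum(
            longest_win_streak(shuffle_within_venue(res, ven, rng)) >= streak_length
            for res, ven in group
        )
        proportions.append(hits / len(group))
    proportions.sort()
    low = proportions[int(0.05 * (n_shuffles - 1))]
    high = proportions[int(0.95 * (n_shuffles - 1))]
    return low, high
```

The observed proportion for a group is then compared against the band returned by `null_band`; an observed value above the upper bound would be the signature of form.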
I then took the analysis one step further. Rather than shuffling around the ordering of each team's results, I simulated the matches based on the difference in the Elo scores of either side (scroll down to the appendix if you're interested in the details). These simulations control for differences in team strength and home advantage but use no information about the recent form of each team. The results are shown as the blue dashed lines in Figure 2. The simulations almost exactly reproduce the observed rate of winning streaks in each of the four groups[3]. Again, there is no evidence that recent form has an impact on subsequent results.
Form vs home advantage
How big an effect would momentum need to have in order to detect it in this analysis? To answer this I made a simple modification to the simulations to mimic the effects of momentum, incorporating a temporary and cumulative boost to a team's Elo score (or 'strength') after every match won during a winning streak (more details are given in the appendix). It turns out that, to detect the impact of momentum, the boost gained from winning four consecutive games would need to exceed the benefits of home advantage. Given that home advantage is a very well-established factor in predicting match outcomes, that's quite a big effect.
The statistics of winning streaks is a simple but crude way to go about looking for the impact of form on upcoming matches. The problem is that a good run of form is a very relative concept. More sophisticated studies have attempted to control for the relative strength of opponents, home advantage and the importance of each match. However, it all gets very complicated very quickly and the results are inconclusive. Simple analyses based on linear regression with form as a predictor have also tended to find no evidence that form has any predictive power.
What about form/momentum in other sports? There have been quite a few studies of 'hot hands' in basketball - the hypothesis that a player is more likely to make (score) a shot if their previous attempts were also successful. A seminal paper on the topic studied the performance of university basketball players, concluding that there was no evidence for hot hands. The authors suggest that the hot hands 'fallacy' is an illusion: people underestimate how frequently sequences of successes or failures can occur completely at random and therefore try to come up with explanations for why they occur. However, more recent studies (e.g. here, here and here) have questioned the methodology used and argue that more accurate statistical tests demonstrate clear evidence for hot hands.
Given that basketball (free throws) and baseball provide more of a controlled environment for evaluating momentum than football does, it isn't much of a surprise that attempts to detect it at the player level in football have generally drawn a blank. That isn't to say that the feel-good factor associated with a run of wins (or goals) doesn't exist: after all, players and coaches frequently refer to it. But perhaps we tend to read too much into form as a guide to what might happen next.
Thanks to Bobby Gardiner, David Shaw and Roxanne Guenette for their comments.
-------
[1] I measure the proportion of teams that achieved a streak, rather than the rate at which streaks occur, to avoid issues related to counting the number of shorter streaks in a single longer streak.
[2] I randomised the home and away matches separately so that the sequence of home and away fixtures remained the same.
[3] The agreement isn't limited to the probability of streaks occurring: the average number of winning streaks of a given length in the simulations is consistent with the observed data, too.
Appendix: Simulation Methodology
The core of the Elo simulations is the method for simulating match outcomes. The number of goals scored by each team in a match is drawn from a Poisson distribution with the mean, μ, given by a simple log-linear model:
log μ = β0 + β1X1 + β2X2
There are two predictive variables in the model: X1 = ΔElo/400, where ΔElo is the difference between that team's Elo score and their opponents', and X2 is a binary home-advantage indicator equal to one if the team is playing at home and minus one otherwise. Note that Elo scores are explicitly designed to be predictive of match outcomes. The Elo score for each team at the beginning of each season is taken from ClubElo; the simulations are not run 'hot'.
The beta coefficients are determined via linear regression using EPL matches from the 09/10 to 16/17 seasons, obtaining values β0 = 0.25, β1 = 0.81 and β2 = 0.13. All are significant, as is the change in deviance relative to an intercept-only model.
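To make the model concrete, here is a minimal sketch of simulating a single match with the coefficients quoted above. It is not the original simulation code. I am assuming that the two teams' goal counts are drawn independently, and the Poisson sampler (Knuth's multiplication method) and all function names are my own choices.

```python
import math
import random

# Coefficients quoted in the text.
BETA_0, BETA_1, BETA_2 = 0.25, 0.81, 0.13


def expected_goals(elo_team: float, elo_opponent: float, at_home: bool) -> float:
    """Poisson mean from log(mu) = b0 + b1 * (dElo / 400) + b2 * home_indicator."""
    x1 = (elo_team - elo_opponent) / 400.0
    x2 = 1.0 if at_home else -1.0
    return math.exp(BETA_0 + BETA_1 * x1 + BETA_2 * x2)


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Knuth's method: adequate for the small means typical of football scores."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_match(elo_home: float, elo_away: float, rng: random.Random) -> str:
    """Return 'H', 'D' or 'A' (assumes independent goal counts for the two sides)."""
    home_goals = sample_poisson(expected_goals(elo_home, elo_away, True), rng)
    away_goals = sample_poisson(expected_goals(elo_away, elo_home, False), rng)
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


rng = random.Random(0)
print(simulate_match(1800.0, 1650.0, rng))  # Elo values here are made up
```

Simulating a full season is then a matter of looping over the fixture list and recording each team's sequence of results.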
To incorporate the impact of momentum I included a cumulative boost, b, to a team's Elo score after each consecutive victory. For example, the Elo score of a team that has won four successive matches would be boosted by 4b; however, as soon as the team fails to win a game the total boost is reset to zero.
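The reset-on-failure rule can be sketched as follows. This is illustrative only: the function name and the "W"/"D"/"L" encoding are assumptions of mine, and the boost returned is the one a team carries into each match, based on the results before it.

```python
from typing import List, Sequence


def momentum_boosts(results: Sequence[str], b: float) -> List[float]:
    """Elo boost a team carries into each match, given its earlier results."""
    boosts = []
    streak = 0  # current number of consecutive wins
    for result in results:
        boosts.append(streak * b)  # boost applied before this match is played
        streak = streak + 1 if result == "W" else 0  # any non-win resets the streak
    return boosts


# Four wins, a draw, then a win, with b = 20.
print(momentum_boosts(list("WWWWDW"), 20))
```

In a simulation the boost for each match would be added to the team's Elo score before computing ΔElo, so results have to be generated match by match rather than all at once.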
The figure below shows the impact this form model has on the frequency of winning streaks. The plot shows three values of b: 0 (blue lines), 10 (green) and 20 (red). For comparison, home advantage is worth a boost of 65 to the home side's Elo score. The grey shaded regions indicate the range expected for the b=0 case (i.e. momentum has no impact on future outcomes).
A momentum boost of 20 would produce winning streaks at a rate that clearly exceeds the frequency at which they occur when momentum is not included in the model. In this scenario, four consecutive victories produce a temporary boost (4 x 20 = 80) that exceeds the benefit of home advantage (65).
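The comparison with home advantage comes down to a little arithmetic, sketched below. One assumption to flag: I am treating the Elo equivalent of home advantage as the Elo difference that changes log μ by as much as the home term β2 does, which gives a value close to the 65 quoted above. The text does not say how that figure was obtained, so this is my reading rather than the author's stated derivation, and the function names are hypothetical.

```python
import math

BETA_1, BETA_2 = 0.81, 0.13  # coefficients quoted in the appendix
HOME_ADVANTAGE_ELO = 65.0    # figure quoted in the text


def elo_equivalent_of_home_term(beta_1: float, beta_2: float) -> float:
    """Solve beta_1 * (dElo / 400) = beta_2 for dElo."""
    return 400.0 * beta_2 / beta_1


def wins_needed_to_exceed(b: float, target: float) -> int:
    """Smallest number of consecutive wins n such that n * b > target (requires b > 0)."""
    if b <= 0:
        raise ValueError("b must be positive")
    return math.floor(target / b) + 1


print(round(elo_equivalent_of_home_term(BETA_1, BETA_2), 1))
for b in (10.0, 20.0):
    print(b, wins_needed_to_exceed(b, HOME_ADVANTAGE_ELO))
```

With b = 20 the cumulative boost first exceeds 65 after four straight wins, which matches the statement above; with b = 10 it would take seven.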
I'm not convinced that grouping teams by their number of wins at the end of a season is the correct method. Total wins is something we only know at the end of a season, whereas what we want to know is how likely a team is to win its next game given that it won its last 2, 3, ... games. If I tell you that a team won only 14/38 games, then no matter how much you shuffle you'll almost never get a 7+ streak. If I told you Man City won every game, then no matter how you shuffle the streak will be 38 games.
If we use your coin flip analogy, what you've looked at is "What is the probability of seeing X heads in a row given that I saw (X+N) heads after 38 flips", when what we really want to know is "What is the probability of seeing X heads given that on the last throw the coin landed on heads".
To answer this question, we need to ask ourselves: is a team's season like a Markov chain? If we let X_jt represent the outcome (win/lose/draw) of team j at time t, does it depend on X_j(t-1)? Is there a correlation? Does it depend on X_j(t-2), etc.? We can all agree that the outcomes of football games are a stochastic process; the question is "Does it have a memory, and if so, how long is it?".
As I discuss in the article, looking solely at winning streaks is a fairly crude test for the impact of form. A Markov-type process is the obvious model, but in practice it is very tough to implement. You need a large enough dataset to be able to control for (or marginalize over) variations in the relative strength of opponents and home advantage. A team's overall strength can vary significantly from one season to the next and so the coefficients of the process would also be time-varying. I'm not saying that it can't be done, but I wanted to write an intuitive blog post on the issue of form, rather than a lengthy and complicated piece on stochastic processes.
I would definitely be very interested in any attempt to build a proper predictive model that incorporates form, though. I have previously found that short-run estimates of form produce no improvement in predictive power over models that already incorporate longer-run estimates of team strength.
Why not just use the same methods used by the betting companies to get the odds?
They seem fairly accurate.
I didn't include betting odds because of the risk that bookmakers might be factoring recent form into their odds.
Thanks for the thought-provoking article!
I'm not quite sure about the assumption though: form, as I understand it, is a way to describe the most recent performances, while a streak only takes the results into account.
Wouldn't it make more sense to take xG/xGA into account?
Yes, I agree. Sadly I only have xG data for EPL matches for the last few seasons -- not nearly enough to do this type of analysis.