Do we overestimate the impact of recent form?

Form is one of the most frequently discussed concepts in football. Players, coaches, journalists and pundits routinely refer to the momentum generated by a succession of victories. Form guides are ubiquitous in pre-match reviews, the idea being that recent results are indicative of a team's chances of winning their next match. But is this idea founded in fact?

Looking for the impact of form, or momentum, in football is tough. There is a multitude of confounding variables, most notably variations in the quality of opponents, the effects of home advantage, changing team selections and small sample sizes. Situations are not replicable, making it difficult to study whether win probability has been enhanced by a recent run of victories. Nevertheless, there are some simple statistical tests that we can perform to try to detect whether form has any impact on the outcome of subsequent matches.

One test is to look at the frequency with which winning streaks occur. Streaks are a natural feature of most random processes: if you flip a fair coin 38 times in a row (the number of matches in an EPL season), there is a 45% chance that you'll get at least five consecutive heads at some point in the sequence. If form truly has an impact on future results, you would expect streaks of a given length to occur more frequently in practice than they would by chance. In this post I'll investigate whether there is any evidence for this.
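The 45% figure can be checked exactly with a short dynamic programme that tracks how long the current run of heads is. This is a minimal sketch; the function name `prob_run` is illustrative and not from the original analysis.

```python
def prob_run(n: int, k: int, p: float = 0.5) -> float:
    """Probability of at least one run of k consecutive successes in n
    independent trials, each succeeding with probability p."""
    # state[j] = probability that no k-run has occurred yet AND the sequence
    # currently ends in exactly j consecutive successes (0 <= j < k)
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new = [0.0] * k
        for j, prob in enumerate(state):
            new[0] += prob * (1 - p)        # a failure resets the current run
            if j + 1 < k:
                new[j + 1] += prob * p      # a success extends the run
            # if j + 1 == k the run is complete and that mass leaves the table
        state = new
    return 1.0 - sum(state)


print(f"{prob_run(38, 5):.3f}")  # 38 coin flips, run of at least 5 heads
```

Running it should print a value close to 0.45, in line with the figure quoted above.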

The frequency of winning streaks

The experiment went as follows: for every EPL season since 95/96, I identified all teams that won between 14 and 29 matches in a season and divided them into four groups: those that won 14-17, 18-21, 22-25 and 26-29 matches. The last group contained only 17 teams, of which 15 won the title that year (the exceptions are Liverpool in 13/14 and Spurs in 16/17). Only two clubs have managed 30 or more wins in a season: Chelsea in 16/17 and Man City in 17/18.

Within each group, I measured the proportion of teams that completed a winning streak that season, varying the length of the streaks from 2 to 15 matches^[1]; the red diamonds in Figure 1 show the results. The top-left panel shows teams that won between 14 and 17 matches in a season. The curve drops quickly as the streak length increases: nearly 90% of the teams in this group managed a 3-match winning 'streak', but fewer than 20% completed a 5-match streak. Of the teams that won between 18 and 21 matches (top-right panel), just under half managed a 5-match winning streak, but only 20% completed a 6-match streak. In comparison, more than half of the teams in the 22-25 win group (bottom-left) completed a 6-match winning streak, while over 80% of the teams in the final group (bottom-right panel) achieved a 6-match streak.
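The measurement behind the red diamonds reduces to the length of each team's longest winning run: a team 'completed' a k-match streak if its longest run is at least k, which also avoids the double-counting issue described in footnote [1]. A minimal sketch, assuming each team-season is encoded as a hypothetical string of 'W', 'D' and 'L' results in fixture order:

```python
from __future__ import annotations


def longest_win_streak(results: str) -> int:
    """Length of the longest run of consecutive 'W' results, e.g. 'WWDLWWW' -> 3."""
    best = current = 0
    for r in results:
        current = current + 1 if r == "W" else 0
        best = max(best, current)
    return best


def streak_proportions(seasons: list[str], lengths=range(2, 16)) -> dict[int, float]:
    """For each streak length k, the fraction of team-seasons whose longest
    winning run is at least k (each team either completed a k-match streak or not)."""
    longest = [longest_win_streak(s) for s in seasons]
    return {k: sum(n >= k for n in longest) / len(longest) for k in lengths}


# Hypothetical toy data: three made-up team-seasons, not real EPL results
toy = ["WWWLDWW", "WLWLWLW", "WWWWWDL"]
print(streak_proportions(toy, lengths=range(2, 6)))
```

For the toy data, the longest runs are 3, 1 and 5, so two of the three seasons contain a 2- or 3-match streak and one contains a 4- or 5-match streak.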

Figure 1: the proportion of EPL teams that have completed winning streaks of lengths varying from 2 to 15 matches in a season. Teams are grouped by the total number of matches they won each season: 14-17 (top-left), 18-21 (top-right), 22-25 (lower-left) and 26-29 (lower-right). The grey region shows the range produced by repeatedly shuffling around the order of results for each team (removing any impact that form might have on the results).

What would we expect these results to look like if form has no impact on future results? To determine this, I shuffled around the ordering of results for the teams in each group, essentially destroying any information about winning streaks (but perhaps randomly creating some new ones)^[2]. By repeatedly shuffling the order of matches, we can evaluate how frequently winning streaks occur purely by chance. As a reminder: if form has a significant impact on future results, you would expect streaks of a given length to occur more frequently in practice than they would do by chance.

The grey shaded regions in Figure 1 represent the range of results obtained from these repeated shuffles (specifically, the 5th to 95th percentiles). The red diamonds, the observed results, lie well within these regions in all four groups, indicating that there is no evidence that winning streaks occur more frequently than you'd expect by chance.
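The shuffling test can be sketched as follows. This is an illustrative reconstruction, not the author's code: each team-season is a hypothetical pair of strings (results, and venues as 'H'/'A'), results are permuted separately among home and away fixtures as described in footnote [2], and the envelope is taken as the 5th and 95th percentiles of the shuffled proportions.

```python
import random
import statistics


def longest_win_streak(results) -> int:
    """Length of the longest run of consecutive 'W' results."""
    best = current = 0
    for r in results:
        current = current + 1 if r == "W" else 0
        best = max(best, current)
    return best


def shuffle_home_away(results, venues, rng):
    """Randomly reorder a team's results among its home fixtures and,
    separately, among its away fixtures; the home/away sequence is unchanged."""
    shuffled = list(results)
    for venue in ("H", "A"):
        slots = [i for i, v in enumerate(venues) if v == venue]
        vals = [results[i] for i in slots]
        rng.shuffle(vals)
        for i, val in zip(slots, vals):
            shuffled[i] = val
    return shuffled


def null_envelope(seasons, k, n_shuffles=1000, seed=0):
    """5th and 95th percentiles of the fraction of team-seasons containing a
    k-match winning streak after every team's results have been shuffled."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_shuffles):
        hits = sum(longest_win_streak(shuffle_home_away(res, ven, rng)) >= k
                   for res, ven in seasons)
        fractions.append(hits / len(seasons))
    cuts = statistics.quantiles(fractions, n=20)  # cut points at 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]
```

An observed proportion that falls inside `null_envelope(...)` for every streak length is what the text describes as "no evidence" of form: the real sequences look like reshuffled ones.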

I then took the analysis one step further. Rather than shuffling around the ordering of each team's results, I simulated the matches based on the difference between the Elo scores of the two sides (scroll down to the appendix if you're interested in the details). These simulations control for differences in team strength and home advantage but use no information about the recent form of each team. The results are shown as the blue dashed lines in Figure 2. The simulations almost exactly reproduce the observed rate of winning streaks in each of the four groups^[3]. Again, there is no evidence that recent form has an impact on subsequent results.

Figure 2. As in Figure 1, but including the results of Elo-based simulations in which recent form has no impact on the outcome of upcoming matches (blue dashed lines). The simulations are entirely consistent with the observed frequency of winning streaks in the EPL (red diamonds).

Form vs home advantage

How big an effect would momentum need to have in order to detect it in this analysis? To answer this I made a simple modification to the simulations to mimic the effects of momentum, incorporating a temporary and cumulative boost to a team's Elo score (or 'strength') after every match won during a winning streak (more details are given in the appendix). It turns out that, to detect the impact of momentum, the boost gained from winning four consecutive games would need to exceed the benefits of home advantage. Given that home advantage is a very well-established factor in predicting match outcomes, that's quite a big effect.

Studying the statistics of winning streaks is a simple but crude way to look for the impact of form on upcoming matches. The problem is that a good run of form is a very relative concept. More sophisticated studies have attempted to control for the relative strength of opponents, home advantage and the importance of each match. However, it all gets very complicated very quickly and the results are inconclusive. Simple analyses based on linear regression with form as a predictor have also tended to find no evidence that form has any predictive power.

What about form/momentum in other sports? There have been quite a few studies of 'hot hands' in basketball: the hypothesis that a player is more likely to make (score) a shot if their previous attempts were also successful. A seminal paper on the topic studied the performance of university basketball players, concluding that there was no evidence for hot hands. The authors argued that hot hands are an illusion (hence the 'hot hand fallacy'): people underestimate how frequently sequences of successes or failures can occur completely at random and therefore try to come up with explanations for why they occur. However, more recent studies (e.g. here, here and here) have questioned the methodology used and argue that more accurate statistical tests demonstrate clear evidence for hot hands.

Given that basketball (free throws) and baseball provide more of a controlled environment for evaluating momentum than football does, it isn't much of a surprise that attempts to detect it at the player level in football have generally drawn a blank. That isn't to say that the feel-good factor associated with a run of wins (or goals) doesn't exist: after all, players and coaches frequently refer to it. But perhaps we tend to read too much into form as a guide to what might happen next.

Thanks to Bobby Gardiner, David Shaw and Roxanne Guenette for their comments.

-------

[1] I measure the proportion of teams that achieved a streak, rather than the rate at which streaks occur, to avoid issues related to counting the number of shorter streaks in a single longer streak.

[2] I randomised the home and away matches separately so that the sequence of home and away fixtures remained the same.

[3] The agreement isn't limited to the probability of streaks occurring: the average number of winning streaks of a given length in the simulations is also consistent with the observed data.

Appendix: Simulation Methodology

The core of the Elo simulations is the method for simulating match outcomes. The number of goals scored by each team in a match is drawn from a Poisson distribution whose mean, μ, is given by a simple log-linear model:

$$\log \mu = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

There are two predictive variables in the model: X₁ = ΔElo/400, where ΔElo is the difference between that team's Elo score and their opponent's, and X₂ is a binary home-advantage indicator equal to one if the team is playing at home and minus one otherwise. Note that Elo scores are explicitly designed to be predictive of match outcomes. The Elo score for each team at the beginning of each season is taken from ClubElo; the simulations are not run 'hot' (i.e. Elo scores are not updated in response to simulated results).
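The match model can be written as a few lines of code. This is a minimal sketch, not the author's implementation: it plugs in the fitted coefficients reported in the appendix (β₀ = 0.25, β₁ = 0.81, β₂ = 0.13), samples goals with a simple stdlib Poisson sampler (Knuth's method, chosen here to avoid dependencies), and uses hypothetical Elo scores in the usage line.

```python
from __future__ import annotations

import math
import random

# Coefficients as reported in the appendix (beta_0, beta_1, beta_2)
BETA0, BETA1, BETA2 = 0.25, 0.81, 0.13


def expected_goals(elo_team: float, elo_opp: float, at_home: bool) -> float:
    """Poisson mean mu from log(mu) = b0 + b1*X1 + b2*X2."""
    x1 = (elo_team - elo_opp) / 400.0   # X1 = ΔElo / 400
    x2 = 1.0 if at_home else -1.0       # X2 = +1 at home, -1 away
    return math.exp(BETA0 + BETA1 * x1 + BETA2 * x2)


def sample_poisson(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means typical of goals."""
    threshold = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_match(elo_home: float, elo_away: float, rng: random.Random) -> tuple[int, int]:
    """Draw (home goals, away goals) independently from the two Poisson means."""
    home_goals = sample_poisson(expected_goals(elo_home, elo_away, at_home=True), rng)
    away_goals = sample_poisson(expected_goals(elo_away, elo_home, at_home=False), rng)
    return home_goals, away_goals


rng = random.Random(2019)
print(simulate_match(1850, 1700, rng))  # hypothetical Elo scores
```

Between two equally rated sides, the model gives the home team a mean of e^0.38 ≈ 1.46 goals and the away team e^0.12 ≈ 1.13 goals.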

The beta coefficients are determined via Poisson regression (a generalised linear model with a log link) using EPL matches from the 09/10 to 16/17 seasons, giving β₀ = 0.25, β₁ = 0.81 and β₂ = 0.13. All three are statistically significant, as is the change in deviance relative to an intercept-only model.
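The EPL match data used for the fit aren't reproduced here, so the sketch below fits the same log-linear Poisson model by Newton's method (equivalent to iteratively reweighted least squares for the log link) on synthetic data generated from the quoted coefficients, just to show that such a fit recovers them. The data-generating choices (X₁ drawn from a normal distribution with standard deviation 0.5, 20,000 team-matches) are assumptions for illustration; in practice a GLM library such as statsmodels would be the usual tool.

```python
import numpy as np


def fit_poisson_glm(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Maximum-likelihood fit of log(mu) = X @ beta for Poisson counts y,
    using Newton's method (IRLS for the canonical log link)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)               # score vector
        hess = X.T @ (X * mu[:, None])      # Fisher information X^T diag(mu) X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Synthetic team-match data (hypothetical, NOT the EPL data used in the post)
rng = np.random.default_rng(42)
n = 20_000
x1 = rng.normal(0.0, 0.5, n)            # X1 = ΔElo / 400
x2 = rng.choice([-1.0, 1.0], size=n)    # X2 = +1 home, -1 away
X = np.column_stack([np.ones(n), x1, x2])
true_beta = np.array([0.25, 0.81, 0.13])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_glm(X, y)
print(np.round(beta_hat, 3))  # should be close to [0.25, 0.81, 0.13]
```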

To incorporate the impact of momentum I included a cumulative boost, b, to a team's Elo score after each consecutive victory. For example, the Elo score of a team that has won four successive matches would be boosted by 4b; however, as soon as the team fails to win a game the total boost is reset to zero.
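The boost rule can be sketched as a single-team season simulator. This is a minimal sketch under stated assumptions, not the author's code: fixtures are a hypothetical list of (opponent Elo, at-home) pairs, Elo scores stay fixed through the season (matching the 'not run hot' setup), only the simulated team's own streak is boosted (opponents' streaks are ignored, a simplification), and the Poisson model and coefficients are those given above. Setting `b=0` recovers the no-momentum model.

```python
import math
import random

BETA0, BETA1, BETA2 = 0.25, 0.81, 0.13  # coefficients quoted in the appendix


def expected_goals(elo_diff: float, at_home: bool) -> float:
    """Poisson mean for a team whose Elo exceeds its opponent's by elo_diff."""
    return math.exp(BETA0 + BETA1 * elo_diff / 400.0 + (BETA2 if at_home else -BETA2))


def sample_poisson(mu: float, rng: random.Random) -> int:
    """Knuth's multiplication method for small Poisson means."""
    threshold = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_season(team_elo: float, fixtures, b: float, rng: random.Random) -> str:
    """After n consecutive wins the team's Elo is boosted by n*b; any draw or
    defeat resets the boost to zero. Returns a 'W'/'D'/'L' string."""
    streak = 0
    results = []
    for opp_elo, at_home in fixtures:
        diff = team_elo + b * streak - opp_elo
        scored = sample_poisson(expected_goals(diff, at_home), rng)
        conceded = sample_poisson(expected_goals(-diff, not at_home), rng)
        if scored > conceded:
            results.append("W")
            streak += 1
        else:
            results.append("D" if scored == conceded else "L")
            streak = 0
    return "".join(results)


def longest_win_streak(results: str) -> int:
    best = current = 0
    for r in results:
        current = current + 1 if r == "W" else 0
        best = max(best, current)
    return best


# Hypothetical setup: a 1650-rated team, 38 games against 1500-rated opposition
rng = random.Random(0)
fixtures = [(1500.0, i % 2 == 0) for i in range(38)]
for b in (0, 10, 20):
    seasons = [simulate_season(1650.0, fixtures, b, rng) for _ in range(1000)]
    frac = sum(longest_win_streak(s) >= 6 for s in seasons) / len(seasons)
    print(b, round(frac, 3))
```

Any numbers this prints describe the toy setup only, not the EPL results in Figure A1; the point is that increasing `b` should push the frequency of long streaks upwards.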

The figure below shows the impact this form model has on the frequency of winning streaks. The plot shows three values of b: 0 (blue lines), 10 (green) and 20 (red). For comparison, home advantage is worth a boost of 65 points to the home side's Elo score. The grey shaded regions indicate the range expected for the b = 0 case (i.e. momentum has no impact on future outcomes).
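One way to see where a figure of this size comes from (a reconstruction from the model above, not stated explicitly in the original): relative to a neutral venue, the home indicator adds β₂ to the home side's log μ and subtracts β₂ from the away side's, while an Elo boost of b_home points to the home side shifts the two sides' log μ by +β₁ b_home/400 and −β₁ b_home/400. Equating the two effects:

$$\beta_1 \frac{b_{\text{home}}}{400} = \beta_2 \quad\Longrightarrow\quad b_{\text{home}} = \frac{400\,\beta_2}{\beta_1} = \frac{400 \times 0.13}{0.81} \approx 64$$

This agrees with the quoted 65 points to within the rounding of the coefficients. On the same scale, a per-win boost of b = 20 passes this threshold after four consecutive wins (80 points), whereas b = 10 would need seven (70 points).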

A momentum boost of 20 would produce winning streaks at a rate that clearly exceeds the frequency with which they occur when momentum is not included in the model. In this scenario, four consecutive victories produce a temporary boost (4 × 20 = 80 points) that exceeds the benefit of home advantage.

Figure A1: The proportion of teams that achieve a winning streak of a given length in a season, based on simulations in which form can have an impact on future results. The blue lines show the results when form has no impact on future results; the green and red lines show the results of simulations in which each victory temporarily boosts a team's Elo score by 10 and 20 points, respectively (until the team fails to win a match). The grey shaded region indicates the distribution under the null hypothesis, i.e. that form has no impact on future results. An Elo boost of at least 20 points per win would be required to reject the null hypothesis at the 5% level.

Comments

Anonymous, 22 March 2019 at 04:42
I'm not convinced this is the correct method, grouping teams by their number of wins at the end of a season. Total wins is something we only know at the end of a season, whereas we want to know how likely a team is to win the next game given it won its last 2, 3, ... games. If I tell you that a team only won 14/38 games, no matter how much shuffling you do you'll almost never get a 7+ streak. If I told you Man City won every game, then no matter how you shuffle, the streak will be 38 games.

If we use your coin flip analogy, what you've looked at is "What is the probability of seeing X heads in a row given that I saw (X+N) heads after 38 flips", when what we really want to know is "What is the probability of seeing X heads given that the last throw the coin landed on heads".

To answer this question, we need to ask ourselves: is a team's season like a Markov chain? If we let X_jt represent the outcome (win/lose/draw) of team j at time t, does it depend on X_j(t-1)? Is there a correlation? Does it depend on X_j(t-2), etc.? We can all agree that the outcomes of football games are a stochastic process; the question is "Does it have a memory? And if so, how long is it?"
xavier, 27 March 2019 at 06:23
Why not just use the same methods used by the betting companies to get the odds? They seem fairly accurate.
Unknown, 7 May 2019 at 04:53
Thanks for the thought-provoking article!

I'm not quite sure about the assumption though: form, as I understand it, is a way to describe the most recent performances, while a streak only takes the results into account.

Wouldn't it make more sense to take xG/xGA into account?

EightyFivePoints