Leagues within a League: How the EPL Table Evolves

We’re nearly a quarter of the way through the EPL season and the league already has a familiar feel to it. Manchester City are top, Arsenal are above Spurs, and Sunderland anchor the table having failed to win a single game so far. There is clearly a lot of football still to be played, but does the table already resemble how it’ll look come the end of May?

Conventional wisdom tells us that the turn of the year is a crucial period. By the beginning of January we are supposed to have a good idea of how things are shaping up. In 9 of the last 20 EPL seasons, the team that was top at the start of January went on to win the league, and 56% of teams in the bottom three on New Year’s Day went on to be relegated. However, you get pretty much the same results if you measure these stats at the beginning of December or the beginning of February, so perhaps we don’t learn that much over the Christmas period after all.

In this post I’m going to look back over the last 20 seasons to investigate how the league table actually evolves over a season and, in particular, when in the season we start to have a reasonable picture of where teams might finish.

Rank Correlations

A good starting point is to measure the correlation between the final league positions and those at some earlier point in the season. Essentially you’re measuring the degree to which the orderings of the teams are the same. If the team rankings were identical, we’d measure a correlation of 1; if they were unrelated, we’d expect the correlation to be close to zero (a completely reversed table would give -1).
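To make the idea concrete, here is a minimal sketch of one common rank correlation, Spearman's, for two tie-free rankings of the same teams. The choice of Spearman is an assumption (the figures in this post may use a different rank correlation), and the five-team tables are made up purely for illustration.

```python
def spearman(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of the same teams.

    ranks_a[i] and ranks_b[i] are the positions of team i in each table.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same teams")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two teams")
    # Sum of squared differences in position, team by team.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical five-team league: positions now, and three possible final tables.
current = [1, 2, 3, 4, 5]
final_same = [1, 2, 3, 4, 5]      # nothing changes
final_reversed = [5, 4, 3, 2, 1]  # table flipped upside down
final_swapped = [2, 1, 3, 5, 4]   # neighbours trade places

for label, final in [("same", final_same),
                     ("reversed", final_reversed),
                     ("swapped", final_swapped)]:
    print(label, spearman(current, final))
```

The identical table gives 1, the reversed table gives -1, and the table where two pairs of neighbours swap places works out to 0.8, which shows how little reshuffling it takes to fall away from a perfect score.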

Figure 1 shows the correlations between the league rankings after each gameweek and the rankings at the end of the season, for the last 20 EPL seasons. The grey lines show the correlations for the individual seasons; the red line shows the average.

Figure 1: The correlation between the league rankings after each gameweek and the final rankings at the end of the season. Grey lines show results for each of the last 20 EPL seasons; the red line shows the average correlation for each gameweek.

The most striking thing about this plot is that the correlation rises so quickly at the beginning of the season. You get to an average correlation of 0.8, which is very high[1], by the 12th round of games. There’s some variation from season to season, of course, but the general picture is always the same: we learn rapidly in the first 12 or so games, and then at a slower, even pace over the rest of the season.

This implies that we know quite a lot about how the final league rankings will look after just a third of the season. But there’s no mantra that states ‘top at Halloween, champions in May’, so why is the correlation so high so soon, and what does it actually mean?

Leagues in leagues

I think that the explanation is provided by what are sometimes referred to as the ‘mini-leagues’. The idea is that the EPL can be broken down into three sub-leagues: those teams competing to finish in the top four (the Champions League places), those struggling at the bottom, and those left in the middle fighting for neither the riches of the Champions League nor for their survival.

Figure 2 demonstrates that these mini-leagues are already established early in the season. It shows the probability of a team finishing in the top 4 (red line) or bottom 3 (blue line), based on its ranking after its 12th game. The results were calculated from the last 20 EPL seasons.

Figure 2: The probability of finishing in the top four (red line) or bottom three (blue line) based on league position after 12 games. The red, white and blue shaded regions indicate the three ‘mini-leagues’ within the EPL.
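Probabilities like those in Figure 2 can be estimated by simple counting: for each league position after 12 games, take the fraction of teams in that position that went on to finish in the top four or the bottom three. The sketch below shows the idea. The function name, the data layout and the tiny six-team seasons are all hypothetical, invented for illustration; they are not the data behind the figure.

```python
from collections import defaultdict


def finish_probabilities(seasons, top_n=4, bottom_n=3):
    """Estimate P(top-n finish) and P(bottom-n finish) by early-season position.

    `seasons` is a list of seasons; each season is a list of
    (position_after_12_games, final_position) pairs, one pair per team.
    Returns {early_position: (p_top, p_bottom)}.
    """
    # early position -> [teams seen, top-n finishes, bottom-n finishes]
    counts = defaultdict(lambda: [0, 0, 0])
    for season in seasons:
        league_size = len(season)
        for early_pos, final_pos in season:
            tally = counts[early_pos]
            tally[0] += 1
            tally[1] += final_pos <= top_n
            tally[2] += final_pos > league_size - bottom_n
    return {
        pos: (top / teams, bottom / teams)
        for pos, (teams, top, bottom) in sorted(counts.items())
    }


# Made-up six-team league over two seasons (not real EPL data).
season_a = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 5)]
season_b = [(1, 2), (2, 1), (3, 3), (4, 5), (5, 4), (6, 6)]

# With only six teams, use "top two" and "bottom one" as the cut-offs.
probs = finish_probabilities([season_a, season_b], top_n=2, bottom_n=1)
for pos, (p_top, p_bottom) in probs.items():
    print(pos, p_top, p_bottom)
```

With 20 real seasons, each position has only 20 observations behind it, so the estimated curves will be fairly noisy.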

The red-shaded region shows the ‘top’ mini-league: the teams with a high chance of finishing in the Champions League places. Teams below 7th place are unlikely to break into this elite group. Similarly, teams placed 14th or above are probably not going to be relegated; therefore, those between 7th and 14th position are in the middle ‘mini-league’. Teams in the last third seem doomed to be fighting relegation at the end of the season: they make up the final mini-league.

The high correlation we observed after twelve games in Figure 1 is a consequence of the mini-leagues. It’s entirely what you’d expect to measure from a table that is already clustered into three groups – top, middle and bottom – but where the final ordering within each group is still to be determined.
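A quick simulation illustrates the point. The sketch below assumes an idealised league in which teams never leave their mini-league but finish in a completely random order within it, and measures the Spearman rank correlation between the early and final tables. The group sizes of 7, 7 and 6 are an assumption loosely based on the boundaries in Figure 2, and the use of Spearman's correlation is also an assumption.

```python
import random


def spearman(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of the same teams."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def shuffled_within_groups(group_sizes, rng):
    """Final positions when teams only change places inside their own mini-league."""
    final = []
    start = 1
    for size in group_sizes:
        block = list(range(start, start + size))
        rng.shuffle(block)  # random finishing order inside this group
        final.extend(block)
        start += size
    return final


rng = random.Random(0)       # fixed seed so the run is repeatable
group_sizes = [7, 7, 6]      # assumed top, middle and bottom mini-leagues
early = list(range(1, 21))   # positions 1..20 after 12 games

correlations = [
    spearman(early, shuffled_within_groups(group_sizes, rng))
    for _ in range(10_000)
]
mean_corr = sum(correlations) / len(correlations)
print(round(mean_corr, 2), round(min(correlations), 2))
```

Under these assumptions the average correlation comes out at roughly 0.89, and even if every group finished in exactly reversed order it would still be about 0.78. A table that has merely sorted itself into the right thirds is therefore enough to produce a correlation of around 0.8.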

I’m not suggesting that membership of the mini-leagues is set in stone – there’s clearly promotion and relegation between them throughout the season (and yo-yo teams) – but by November there is a hierarchy in place. Even at this relatively early stage of the season, most teams will have a reasonable idea of which third of the table they are likely to end up in.

Finally, awareness of this may also explain the recent increase in the number of managers getting sacked early in the season. Last year three EPL managers lost their jobs before the end of October and we've already lost one this season. If Mourinho doesn't find himself in the top eight after the next few games, the pressure may ramp up several notches.


Thanks to David Shaw for comments.

[1] And certainly significant.

