Final Table Predictions for the EPL

In a previous post I looked at how the EPL league table evolves over a season, showing that we already have a decent idea of how the final league table will look after just a third of the season.

I’ve now taken that analysis a step further and built a simple model for predicting the total number of points each team will accumulate over the season (and therefore their final rankings). What follows is a short summary of how the model works; I've provided more technical detail at the end.

Season simulations

Each team starts with their current points total. I then work my way through the remaining fixture schedule (currently 260 matches), simulating the outcome of each game. Results are generated based on the Elo scores of each team – which I update after each simulated match – and the benefits of home advantage (scroll down to the last section for more details). At the end of the ‘season’, I tally up the final points totals for each team.

This process is repeated 10,000 times to evaluate the range of points that each team ends up on; I then make a final league table based on each team's average points total. The probability of each team finishing the season as champions, in the top four or in the bottom three is calculated from the frequency with which each outcome occurs across the 10,000 runs.
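
The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the table: the three-team league, the fixture list and the fixed-odds `fixed_odds_match` placeholder are all made up for the example, ties on points are broken at random, and the Elo updating between matches is left out.

```python
import random

def simulate_season(points, fixtures, play_match, rng):
    """Play the remaining fixtures once; return each team's final points."""
    totals = dict(points)
    for home, away in fixtures:
        result = play_match(home, away, rng)  # "H", "D" or "A"
        if result == "H":
            totals[home] += 3
        elif result == "A":
            totals[away] += 3
        else:
            totals[home] += 1
            totals[away] += 1
    return totals

def run_simulations(points, fixtures, play_match, n_runs=10_000, seed=1):
    """Repeat the season n_runs times; return mean points and position frequencies."""
    rng = random.Random(seed)
    teams = list(points)
    sum_points = {t: 0 for t in teams}
    position_counts = {t: [0] * len(teams) for t in teams}
    for _ in range(n_runs):
        totals = simulate_season(points, fixtures, play_match, rng)
        # Rank by points; ties are broken at random (a simplification).
        ranking = sorted(teams, key=lambda t: (totals[t], rng.random()), reverse=True)
        for position, team in enumerate(ranking):
            position_counts[team][position] += 1
            sum_points[team] += totals[team]
    mean_points = {t: sum_points[t] / n_runs for t in teams}
    position_freq = {t: [c / n_runs for c in position_counts[t]] for t in teams}
    return mean_points, position_freq

def fixed_odds_match(home, away, rng):
    """Hypothetical placeholder: identical H/D/A probabilities for every match."""
    return rng.choices(["H", "D", "A"], weights=[0.45, 0.25, 0.30])[0]

# Toy league (made-up numbers): current points and the fixtures still to play.
current_points = {"Reds": 10, "Blues": 9, "Greens": 4}
remaining = [("Reds", "Blues"), ("Blues", "Greens"), ("Greens", "Reds"),
             ("Blues", "Reds"), ("Greens", "Blues"), ("Reds", "Greens")]

mean_points, position_freq = run_simulations(current_points, remaining, fixed_odds_match)
for team in sorted(mean_points, key=mean_points.get, reverse=True):
    title = position_freq[team][0]    # frequency of finishing first
    bottom = position_freq[team][-1]  # frequency of finishing last
    print(f"{team:7s} mean points {mean_points[team]:5.1f}  "
          f"P(title) {title:.2f}  P(last) {bottom:.2f}")
```

In a 20-team league the same `position_freq` lists would give the top-four probability as `sum(position_freq[team][:4])` and the relegation probability as `sum(position_freq[team][-3:])`.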

Final table predictions 

Using all the results to date, the projected EPL table looks like this.

The box plots indicate the distribution of each team's points totals over the 10,000 simulated seasons. The green bars indicate the 25th to 75th percentiles and the dashed lines (‘whiskers’) the 5th to 95th percentiles. For example, in 50% of the simulations Man City finish on between 71 and 81 points and in 90% of the simulations they accumulate between 63 and 89 points. The vertical line in the middle of the green bars shows the median[1]. The numbers to the right of the plot show the probability of each team: 
a) winning the title (Ti);
b) finishing in the Champions League spots (CL);
c) being relegated (rel).
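
As a rough sketch of how such box-plot summaries might be read off the simulation output, the snippet below computes the 5th, 25th, 50th, 75th and 95th percentiles of one team's simulated points totals using Python's standard library. The `simulated_points` list is randomly generated stand-in data, not output from the model.

```python
import random
import statistics

rng = random.Random(0)
# Stand-in data: 10,000 made-up season totals for one team (not model output).
simulated_points = [round(rng.gauss(76, 8)) for _ in range(10_000)]

# quantiles(n=20) returns the 19 cut points at 5%, 10%, ..., 95%.
cuts = statistics.quantiles(simulated_points, n=20)
p5, p25, median, p75, p95 = cuts[0], cuts[4], cuts[9], cuts[14], cuts[18]

print(f"box (25th-75th): {p25:.0f} to {p75:.0f}")
print(f"whiskers (5th-95th): {p5:.0f} to {p95:.0f}")
print(f"median: {median:.0f}")
```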

You can see that the table is bunched into three groups: those with a decent chance of making it into the Champions League, the solidly mid-table teams and the remainder at the bottom. Let’s look at each group in turn.

Top Group: This group contains Man City, Chelsea, Liverpool, Arsenal, Spurs and, if we’re being generous, Man United. These are the teams with a fighting chance of finishing in the top four. City, Chelsea, Liverpool and Arsenal are so tightly bunched that they are basically indistinguishable: you can’t really predict which of them will win the league. However, there is a 93% probability that it’ll be one of those four. Spurs go on to be champions in only 6% of the simulations and United in less than 1%. Indeed, United finish in the top four only 17% of the time – roughly a 1 in 6 chance.

Middle Group: This group includes Southampton, Leicester, Everton, Watford and West Brom. The distributions of their points totals indicate that they are likely to collect more than 40 points, but fewer than 60. That makes them reasonably safe from relegation but unlikely to finish in the top four (last season, the 4th placed team – Man City – finished with 66 points). They can afford to really focus on the cup competitions (and for Leicester, the Champions League).

Bottom Group: Finally, we have the remaining nine teams, from Stoke down to Hull. According to my simulations, each of these teams has at least a 10% chance of being relegated. The bottom five in particular collect fewer than 40 points on average and are relegated in at least a third of the simulations, with Sunderland and Hull going down more often than not.

Next Steps

My plan is to update this table (which you can find here) after each round of EPL games. Hopefully, we should see the table beginning to crystallize as the season progresses, with the range of points totals narrowing and thus the final league positions becoming easier to predict.

There is also plenty of information that could be added. The simulations know nothing about injuries and suspensions, future transfers, managerial changes and grudge matches. They also do not take into account fixture congestion and cup participation. I’m going to investigate some of these issues and incorporate anything that reliably adds new predictive information.


Specific Model Details

This section takes a look at what is going on under the hood in a bit more detail.

The core of the calculation is the method for simulating match outcomes. For each match, the number of goals scored by a team is drawn from a Poisson distribution with the mean, μ, given by a simple linear model for log(μ):

log(μ) = β1 + β2·X1 + β3·X2

There are two predictors in the model: X1 = ΔElo/400, where ΔElo is the difference between the team's Elo score and their opponent's, and X2, a binary home/away indicator equal to 1 for the home team and -1 for the away team. Note that Elo scores are explicitly designed to be predictive of match outcomes. The initial Elo score for each team is taken from; after each simulated fixture the Elo scores are updated using the procedure described here.
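
For readers unfamiliar with Elo, the textbook update rule is sketched below. It is shown only to illustrate the idea and may differ from the procedure linked above, which could, for example, weight results by margin of victory or adjust for home advantage. The step size `k = 20` and the example scores are arbitrary values chosen for illustration.

```python
def expected_score(elo_team, elo_opponent):
    """Standard Elo expectation: 0.5 for equal scores, near 1 for a large lead."""
    return 1.0 / (1.0 + 10 ** (-(elo_team - elo_opponent) / 400))

def update_elo(elo_home, elo_away, home_result, k=20):
    """Return updated (home, away) scores.

    home_result is 1 for a home win, 0.5 for a draw and 0 for an away win.
    k is an assumed step size, not a value taken from the post.
    """
    delta = k * (home_result - expected_score(elo_home, elo_away))
    return elo_home + delta, elo_away - delta

# Example with made-up scores: the lower-rated side wins away from home.
new_home, new_away = update_elo(1800, 1700, home_result=0)
print(round(new_home, 1), round(new_away, 1))
```

The division by 400 is the same scaling that appears in X1 above.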

The beta coefficients are determined via linear regression using all matches for the seasons 2011/12 to 2015/16, obtaining values β1 = 0.26, β2 = 0.71, β3 = 0.13. All are highly significant, as is the change in deviance relative to an intercept-only model. Running the regression on earlier seasons obtains similar results. 
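
To make the numbers concrete, here is a small sketch of how a scoreline could be generated from these coefficients. It assumes the linear model is for log(μ), the usual form for Poisson regression, i.e. μ = exp(β1 + β2·X1 + β3·X2), and that the two teams' goal counts are drawn independently. Both are assumptions made for this sketch, as are the function names and the example Elo scores.

```python
import math
import random

BETA_1, BETA_2, BETA_3 = 0.26, 0.71, 0.13  # coefficients quoted above

def expected_goals(elo_team, elo_opponent, is_home):
    """Mean goals for one team, assuming log(mu) = b1 + b2*X1 + b3*X2."""
    x1 = (elo_team - elo_opponent) / 400
    x2 = 1 if is_home else -1
    return math.exp(BETA_1 + BETA_2 * x1 + BETA_3 * x2)

def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_match(elo_home, elo_away, rng):
    """Return (home_goals, away_goals), drawing each side's goals independently."""
    mu_home = expected_goals(elo_home, elo_away, is_home=True)
    mu_away = expected_goals(elo_away, elo_home, is_home=False)
    return sample_poisson(mu_home, rng), sample_poisson(mu_away, rng)

# Two evenly matched teams (made-up Elo scores): home advantage alone
# gives means of exp(0.39), about 1.48, and exp(0.13), about 1.14.
print(round(expected_goals(1700, 1700, True), 2),
      round(expected_goals(1700, 1700, False), 2))
print(simulate_match(1700, 1700, random.Random(42)))
```

A home win, draw or away win then follows from comparing the two goal counts, and the points are awarded accordingly.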

How good are the match predictions?

A good way of answering this question is to compare the match outcome forecasts generated by this model with the probabilities implied by bookmakers' betting odds. There are a number of different metrics you can use to compare forecast accuracy; I’ve chosen two: the Brier score and the geometric mean of the probabilities of the actual match outcomes. It turns out the Poisson model and the bookies do equally well: they have identical scores for both metrics (0.61 for the Brier score and 0.36 for the average probability, consistent with what this analysis found).
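
Both metrics are simple to compute. The sketch below uses the multi-class form of the Brier score (squared errors summed over the three outcomes, then averaged over matches), which is an assumption about the variant used here; the forecasts and results are invented for illustration. As a reference point, a forecaster who assigns 1/3 to every outcome scores 0.667 on this Brier score and 0.333 on the geometric mean.

```python
import math

OUTCOMES = ("H", "D", "A")  # home win, draw, away win

def brier_score(forecasts, results):
    """Mean over matches of the squared error summed across H/D/A (lower is better)."""
    total = 0.0
    for probs, actual in zip(forecasts, results):
        total += sum((probs[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES)
    return total / len(results)

def geometric_mean_probability(forecasts, results):
    """Geometric mean of the probability given to what actually happened (higher is better)."""
    log_sum = sum(math.log(probs[actual]) for probs, actual in zip(forecasts, results))
    return math.exp(log_sum / len(results))

# Invented forecasts for three matches, with the actual results.
forecasts = [
    {"H": 0.55, "D": 0.25, "A": 0.20},
    {"H": 0.30, "D": 0.28, "A": 0.42},
    {"H": 0.40, "D": 0.30, "A": 0.30},
]
results = ["H", "A", "D"]

print(round(brier_score(forecasts, results), 3))
print(round(geometric_mean_probability(forecasts, results), 3))
```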

The plot below shows that there is a strong relationship between the predicted probability of home wins, away wins and draws for the EightyFivePoints model and the bookmakers' forecasts (note that I've 'renormalised' the bookmakers' odds such that the outcome probabilities sum to 1 for any given match). This makes me think that they’re doing something quite similar, with a few extra bells and whistles.

Comparison of probabilities assigned to ‘home win’, ‘away win’ and ‘draw’ by the Poisson model and those implied by bookmakers' odds. All EPL matches from the 2011/12 to 2015/16 seasons are plotted.
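
The renormalisation of the bookmakers' odds mentioned above can be sketched as follows, assuming decimal odds and a simple proportional rescaling (other ways of removing the bookmaker's margin exist, and the post does not say which was used). The odds in the example are made up.

```python
def implied_probabilities(home_odds, draw_odds, away_odds):
    """Convert decimal odds to probabilities that sum to 1.

    The raw inverse odds sum to slightly more than 1 because of the
    bookmaker's margin; dividing by that sum removes it proportionally.
    """
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    overround = sum(raw)
    return [p / overround for p in raw]

# Made-up decimal odds for home win, draw and away win.
home, draw, away = implied_probabilities(2.10, 3.40, 3.60)
print(f"home {home:.3f}, draw {draw:.3f}, away {away:.3f}")
```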

One standout feature is that draws are never the favoured outcome. This suggests that one of the keys to improving the accuracy of match outcome predictions is to better identify when draws are the most likely outcome. After all, more than a quarter of games end in draws.
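
One way to see why a model of this kind never favours the draw: if home and away goals are independent Poisson variables, the outcome probabilities follow directly from the two means. The sketch below assumes that independence and the log-linear form μ = exp(β1 + β2·X1 + β3·X2) with the coefficients quoted earlier (neither is confirmed by the post), and scans a wide range of Elo differences. Under that form the product of the two teams' means is always exp(2·β1), so both means can never be small at once, which is when a draw would become the likeliest result.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def outcome_probabilities(mu_home, mu_away, max_goals=15):
    """P(home win), P(draw), P(away win) for independent Poisson goal counts.

    Scorelines above max_goals per side are ignored; their total
    probability is negligible for football-sized means.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, mu_home) * poisson_pmf(a, mu_away)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Scan Elo differences (home minus away) from -600 to +600 in steps of 25.
draw_ever_favoured = False
for elo_diff in range(-600, 601, 25):
    x1 = elo_diff / 400
    mu_home = math.exp(0.26 + 0.71 * x1 + 0.13)  # assumed log-linear form
    mu_away = math.exp(0.26 - 0.71 * x1 - 0.13)
    home_win, draw, away_win = outcome_probabilities(mu_home, mu_away)
    if draw > max(home_win, away_win):
        draw_ever_favoured = True

print("draw ever the most likely outcome:", draw_ever_favoured)
```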

[1] Which happens to be close to the mean, so there isn’t much skew.

