Exceeding Expected Goals

From blogs to the BBC, the concept of expected goals (or xG) has entered the mainstream media's lexicon. It has caught on because it's a useful concept; it's a useful concept because football is a low-scoring game. Chance (or luck) can be the difference between victory and defeat, a good day or an off-day. Expected goals, however, measures what would have happened on an average day.

It's a simple quantity to measure. Each shot is assigned a number between 0 and 1: the proportion of similar shots (from the same position, for example) that have resulted in a goal. This number, sometimes referred to as chance quality, is then added up over every shot taken by a team during a match (or a season) to calculate the number of goals that you would have 'expected' them to score.
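As a minimal sketch of that sum, assuming a hypothetical list of chance-quality values for one team's shots in a single match (the numbers and the function name `team_xg` are invented for illustration and do not come from any dataset):

```python
def team_xg(shot_qualities):
    """Sum per-shot chance quality to get a team's expected goals."""
    for q in shot_qualities:
        # Chance quality is a proportion, so it must lie between 0 and 1.
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"chance quality out of range: {q}")
    return sum(shot_qualities)


# Hypothetical chance qualities for five shots in one match.
shot_qualities = [0.05, 0.12, 0.33, 0.76, 0.08]
expected_goals = team_xg(shot_qualities)
print(f"xG = {expected_goals:.2f}")  # xG = 1.34
```

The same function applies unchanged to a season's worth of shots, or to the shots of a single player.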

Conventional xG models do not take into account the identity of the player taking the shot. You need a dataset comprising many thousands of shots to properly measure chance quality over all positions and situations, meaning that you must aggregate shots from many different players. Of course, you would expect attacking players of the quality of Harry Kane, Mo Salah or Eden Hazard to be more likely to convert a chance than less illustrious forwards, or the average defender or midfielder. Given that xG is calibrated based on the success rate of shots from mostly inferior players, we should therefore expect elite forwards, such as Kane, to outscore the xG total of their shots over a season. But to what extent?

In this post I describe a simple statistical method for measuring xG outperformance, the rate at which elite strikers outscore the expected goal total of their chances. The next section describes the methodology, then I'll dive into the results: who are the EPL's most efficient goalscorers?

Measuring outperformance

I define outperformance, $o_p$, for a player $p$ as:
$o_p = \frac{G_p}{xG_p}$ ,

where $G_p$ is the number of goals scored by the player, and $xG_p$ is the expected goals tally of his shots (irrespective of whether they resulted in a goal or not). For an average player, $o_p$ should be close to 1; for a defender it is likely to be less than 1 and for an elite striker it should be significantly greater than 1.

The easiest way of measuring a player's outperformance is simply to divide the number of goals scored by the xG total of his chances over some period of time. However, the number and quality of chances varies enormously from player to player -- Harry Kane has taken many more shots than, say, Josh King -- and so the uncertainty in our measure of outperformance will be larger for some players than others. To confidently identify the players that consistently outscore their xG tally we need to measure the distribution of their outperformance.
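To make the simple ratio concrete, here is a short sketch that applies it to the goal and xG tallies quoted later in this post for five players. The dictionary layout and variable names are my own choices for illustration.

```python
# (goals, xG) over 2016/17 and 2017/18, excluding free-kicks,
# using the tallies quoted in the results section of this post.
players = {
    "Hazard": (28, 18.4),
    "Salah": (32, 21.8),
    "Son": (26, 17.4),
    "Aguero": (40, 40.8),
    "Benteke": (18, 26.7),
}

# Raw outperformance: goals divided by expected goals.
raw = {name: goals / xg for name, (goals, xg) in players.items()}

for name, ratio in sorted(raw.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} {ratio:.2f}")
```

Hazard's raw ratio comes out at about 1.52, noticeably higher than the estimate of 1.34 reported for him in the results. The raw ratio carries no information about sample size, which is the gap the distribution-based estimate is meant to fill.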

I took an empirical Bayes approach to measuring outperformance, multiplying a prior (measured from the data) by a Poisson likelihood function to infer the posterior distribution of outperformance for each player. Specific details are given in the Appendix.

I measured chance quality using a model I described in an earlier post. This model, which was calibrated on over 15000 shots from Stratagem's event data, takes into account not only the distance and angle to goal, but also the number of intervening defenders and the defensive pressure exerted on the shot-taker.

The EPL's most efficient goal-scorers

Figure 1 below shows the estimated outperformance for all current EPL players that took at least 100 shots over the 2016/17 and 2017/18 seasons (excluding free-kicks, for which the xG model is not well calibrated). The red diamond shows the posterior mean; the blue error bars indicate $\pm1$ standard deviation around the mean.

Figure 1. The posterior distribution of outperformance -- the ratio of goals to expected goals -- for current EPL players that took at least 100 shots over the 16/17 and 17/18 seasons. The red diamond indicates the mean; the blue error bars $\pm1$ s.d. 

Eden Hazard tops the list with an estimated outperformance of $1.34\pm0.21$, indicating that, on average, he has scored around 34% more goals than an average player would have from the same chances. In the 2016/17 and 2017/18 seasons he scored 28 goals from 127 shots (excluding free-kicks), substantially exceeding his xG total of 18.4. Other xG models have also found that Hazard consistently outperforms his xG.

Mo Salah and Heung-Min Son have also significantly outperformed, Salah scoring 32 from an xG tally of 21.8, and Son 26 goals from an xG tally of 17.4. It is interesting that the top three players -- Son, Salah and Hazard -- are all pacey attacking midfielders, scoring the majority of their goals from open play. 

Only five players have an outperformance significantly greater than 1.0 at the 90% confidence level: the three players above, plus Romelu Lukaku and Harry Kane. While Josh King is ranked slightly higher than Kane or Lukaku, the posterior distribution of his outperformance is significantly broader as there is less data (i.e., fewer shots) to assess him by.
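One way to check a statement like this is to estimate the posterior probability that a player's outperformance exceeds 1. The sketch below does that by Monte Carlo, under two assumptions of mine: that the posterior is a gamma distribution whose shape and rate are the prior's shape and rate plus the player's goals and total xG respectively, and that "significant at the 90% level" means this probability is above 0.9. The function name, seed and number of draws are arbitrary, and the exact criterion used for the figure may differ.

```python
import random

PRIOR_MEAN = 1.06  # prior mean quoted in the Appendix
PRIOR_SD = 0.3     # prior standard deviation quoted in the Appendix


def prob_above_one(goals, total_xg, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(outperformance > 1) under a gamma posterior."""
    # Gamma shape and rate matching the prior mean and standard deviation.
    prior_shape = (PRIOR_MEAN / PRIOR_SD) ** 2
    prior_rate = PRIOR_MEAN / PRIOR_SD ** 2
    # Assumed conjugate update: add goals to the shape, total xG to the rate.
    shape = prior_shape + goals
    rate = prior_rate + total_xg
    rng = random.Random(seed)
    # random.gammavariate takes a scale parameter, i.e. 1 / rate.
    above = sum(rng.gammavariate(shape, 1.0 / rate) > 1.0 for _ in range(n_draws))
    return above / n_draws


p_hazard = prob_above_one(28, 18.4)  # Hazard: 28 goals from an xG of 18.4
p_aguero = prob_above_one(40, 40.8)  # Aguero: 40 goals from an xG of 40.8
print(f"Hazard: {p_hazard:.3f}, Aguero: {p_aguero:.3f}")
```

Under these assumptions Hazard's probability is comfortably above 0.9, while Aguero's sits close to one half, in line with his estimated outperformance of almost exactly 1.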

Christian Benteke is, by far, the biggest under-performer, and the only player with a mean outperformance significantly below 1. He has clearly had a difficult couple of seasons, having scored only 18 goals from an xG tally of 26.7. Benteke's problems appear to be associated with shots taken with his favoured right foot: while his goals are reasonably consistent with his xG totals for his left-footed shots and headers, he has scored only 7 goals from an xG total of 14.6 from his right-footed shots in the last two seasons. 

Perhaps one surprising result is the performance of Sergio Aguero. Despite scoring 40 goals over the 2016/17 and 2017/18 seasons (excluding free kicks), Aguero has an estimated outperformance of almost exactly 1, his 218 shots adding up to an xG total of 40.8. He's also somewhat one-footed, with his left-footed goals substantially trailing the xG tally of his left-footed shots. Expected goals data on understat.com indicates that, over the last 5 seasons, Aguero has scored 95 goals from an xG tally of 90.3, implying an outperformance of 1.05: significantly less than Kane or Hazard over the same period. 

Individual Expectations

Of course, outperformance is solely a measure of chance conversion rate: it has nothing to say about the ability of players to create chances, either for themselves or their teammates. Nevertheless, it provides a machinery for personalizing expected goals, enabling player-specific assessments of finishing skill in limited sample sizes. It would be interesting to measure outperformance over the course of a player's career, generating aging curves that are independent of the quality of the teams that he played in. It could also be used to improve xG-based predictions of match results -- a topic that I will return to in a future post.

This article was written with the aid of StrataData, which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform.


Appendix: A Bayesian estimate of outperformance.

The posterior distribution for the outperformance $o_p$ of a player $p$ is given by:

$p(o_p | G_p ; xG_p ) \propto {\displaystyle \prod_{i=1}^{n_p}p(G_{i,p} |  o ; xG_p )} p(o)$ ,

where $G_{i,p}$ is the number of goals scored by the player in match $i$ of the $n_p$ matches he played during the 16/17 and 17/18 seasons, and $xG_p$ is his average expected goals (per game) based on his shots in those matches, measured using the chance-quality model described in an earlier post. The prior, $p(o)$, is a gamma distribution with mean 1.06 and standard deviation of 0.3, measured from the distribution of $G/xG$ for all EPL players that took at least 50 shots over the 16/17 and 17/18 seasons. The mean is greater than 1.0 because the 50 shot threshold removes defensive players from the sample, which introduces a selection bias.

$p(G_{i,p} | o ; xG_p )$ is the Poisson likelihood of the player scoring $G_{i,p}$ goals in the match conditional on the product of $o$ and $xG_p$, where $xG_p$ is taken to be a parameter (rather than a variable) for each player. The gamma distribution is the conjugate prior for a Poisson likelihood function, so the posterior distribution of player outperformance is also a gamma distribution.
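It may help to write the conjugate update out explicitly. The following is a sketch, assuming the prior is parameterized by a shape $\alpha$ and a rate $\beta$ (symbols that do not appear above), and using the per-game definition of $xG_p$ so that $n_p \, xG_p$ is the player's total expected goals:

$p(o) \propto o^{\alpha - 1} e^{-\beta o}$ ,

$\prod_{i=1}^{n_p} p(G_{i,p} | o ; xG_p ) \propto o^{\sum_i G_{i,p}} \, e^{-n_p \, xG_p \, o}$ ,

$p(o_p | G_p ; xG_p ) \propto o^{\alpha + G_p - 1} \, e^{-(\beta + n_p \, xG_p) \, o}$ ,

where $G_p = \sum_i G_{i,p}$ is the player's goal total. Under these assumptions the posterior is a gamma distribution with shape $\alpha + G_p$ and rate $\beta + n_p \, xG_p$, so its mean is $(\alpha + G_p)/(\beta + n_p \, xG_p)$ and its standard deviation is $\sqrt{\alpha + G_p}/(\beta + n_p \, xG_p)$. Matching the prior mean of 1.06 and standard deviation of 0.3 gives $\alpha = (1.06/0.3)^2 \approx 12.5$ and $\beta = 1.06/0.3^2 \approx 11.8$.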

In practice, outperformance is estimated for each player by rescaling the prior to $p(xG_p o)$ -- treating $xG_p$ as a scaling factor -- with the posterior distribution rescaled to the distribution of $o_p$ by dividing $xG_p$ out again.
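The estimate can be sketched in a few lines of Python. This is my own illustration rather than the code used for the figure: it assumes the gamma prior is specified by the mean and standard deviation quoted above, and that the Poisson-gamma update amounts to adding the player's goals to the prior shape and his total xG to the prior rate. The function names are hypothetical.

```python
import math

PRIOR_MEAN = 1.06  # prior mean of outperformance
PRIOR_SD = 0.3     # prior standard deviation of outperformance


def gamma_shape_rate(mean, sd):
    """Shape and rate of the gamma distribution with the given mean and sd."""
    return (mean / sd) ** 2, mean / sd ** 2


def posterior_outperformance(goals, total_xg):
    """Posterior mean and sd of outperformance (assumed conjugate update)."""
    shape, rate = gamma_shape_rate(PRIOR_MEAN, PRIOR_SD)
    post_shape = shape + goals    # goals scored add to the shape
    post_rate = rate + total_xg   # total expected goals add to the rate
    mean = post_shape / post_rate
    sd = math.sqrt(post_shape) / post_rate
    return mean, sd


# Hazard: 28 goals from an xG total of 18.4 (figures from the main text).
mean, sd = posterior_outperformance(28, 18.4)
print(f"{mean:.2f} +/- {sd:.2f}")  # 1.34 +/- 0.21
```

With Hazard's tallies this reproduces the $1.34\pm0.21$ quoted in the results. A player with very few shots would stay close to the prior mean of 1.06, which is how the method guards against over-reading small samples.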


  1. Insightful once again - I always look forward to reading this blog! Thanks from Belgium :^)

  2. Hi and thank you for a great article.

    I wonder how Figure 1 would look if you also excluded headers from the data?

    I believe that it is harder to "beat" xG with headers, and therefore most attackers on the top of the list will either be really, really good at heading the ball (Lukaku, Kane, Alli), or they really seldom try headers at all (Hazard, Salah, Son, Mane, Mahrez).

    1. Thanks, Hans. The ordering of the players in Figure 1 doesn't change much when I exclude headers, but their overperformance does tend to increase slightly. I suspect that's because attacking players are not necessarily the best finishers when it comes to headers -- about 10% of all goals are scored by defenders, but they account for nearly a third of headed goals.

