Expected Goals: Foot Shots Versus Headers

Introduction

My last article on expected goals introduced the concept of using exponential decay to estimate the probability of scoring based on the shooter’s distance from the goal. The article received lots of feedback (thanks everyone!!), with a couple of common comments standing out that I wanted to address.

Simplifying The Model

One common theme was whether the model was at risk of overfitting, and this is certainly something I was concerned about myself. In fact, I have since simplified the model to the equation below to help minimise this risk:

$expg=\exp(-distance/a)$

Figure 1: Simplified Expected Goals Equation

As well as reducing the complexity of the model and making it easier to calculate the expected goals, the new equation has fewer parameters so the potential for overfitting is lower. The correlation between actual / expected goals has fallen slightly from 0.98 to 0.97 but the advantages of the simpler equation far outweigh such a minimal change.

Headers Versus Foot Shots

Another common question was whether headers and foot shots should be split out into separate models, as the previous articles ignored headers due to a lack of data.

To investigate this I have been busy all summer collecting more shot data. I’m up to 45,000 shots in total now, including around 7,500 headers, so I’m at the point where I’m happy to start some preliminary work comparing foot shots and headers. I certainly want more headers before drawing any definite conclusions though.

I’ve run through all the curve fitting again for both headers and foot shots and plotted the resulting probability curves in Figure 2 below.

Figure 2: Expected Goals: Foot Shots Versus Headers

As you can see, headers have a noticeably lower chance of leading to a goal. The gap between head and foot shots appears largest around the ten metre mark, where foot shots have pretty much twice the probability of scoring. By 22 metres the chance of scoring from a header is virtually zero, while foot shots don’t reach this level until around 40 metres out.
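As a rough check on those numbers, here is a minimal Python sketch that evaluates both curves at a few distances. It assumes the fitted values of a given in the appendix (4.4 for headers, 7.1 for foot shots); the function and constant names are just my own for illustration.

```python
import math

# Decay constants taken from the appendix (in metres)
A_HEADER = 4.4
A_FOOT = 7.1


def scoring_probability(distance, a):
    """Probability of scoring from `distance` metres out: exp(-distance / a)."""
    return math.exp(-distance / a)


# Compare the two curves at the distances discussed in the text
for d in (10, 22, 40):
    foot = scoring_probability(d, A_FOOT)
    head = scoring_probability(d, A_HEADER)
    print(f"{d:>2} m  foot={foot:.3f}  header={head:.3f}")
```

At ten metres this comes out at roughly 0.24 for a foot shot and 0.10 for a header, and both curves drop below 0.01 at the 22 metre (headers) and 40 metre (foot shots) marks.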

Conclusions

But is this difference significant and do we actually need to bother creating separate expected goals models for headers and foot shots?

Well, if we compare the two probability curves against each other then the p value comes out at 0.064. Typically we take p values of 0.05 or lower to signify significance, so by that count the difference between the two is not statistically significant.

However, p values should never be about some absolute cut-off where anything at or below 0.05 equals significance and everything else can just be ignored.

A value this close to significance suggests there may be a real difference, especially given the limited number of headers in the data so far, so it’s certainly possible that headers and foot shots will warrant separate models. Luckily, with the current equation this is really simple to do, as we just need to alter the value of a, as shown in the appendix below. This is an area I’ll be exploring in more detail as I add more headers to my database.

Appendix: Using the Expected Goals Model

To use the expected goals model you just need two numbers:

x = distance from goal in metres along x axis

y = distance from centre of goal in metres along y axis

These can then be used to calculate the total distance the shot is taken from:

$distance=\sqrt{x^2+y^2}$

The expected goals for the shot is then just:

$\text{expected goals}=\exp(-distance/a)$

where a = 4.4 for headers and 7.1 for foot shots

Example

Here’s an example for a player taking a header from the penalty spot.

x = 11 as penalty spots are roughly 11 metres from the goal (equal to 12 yards)

y = 0 as penalty spots should be level with the centre of the goal

$distance=\sqrt{11^2+0^2}=11$

$\text{expected goals}=\exp(-11/4.4)=0.08$

So on average, a header from the penalty spot would be worth around 0.08 goals.

Easy, just don’t forget you need to use negative distance inside the exponential!
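For anyone who would rather not do this by hand, the whole calculation can be sketched in a few lines of Python. This is just an illustration of the steps above rather than the code I actually use; the `expected_goals` function, the `DECAY` dictionary and the "header" / "foot" labels are names made up for this example.

```python
import math

# Values of a quoted above (in metres)
DECAY = {"header": 4.4, "foot": 7.1}


def expected_goals(x, y, shot_type="foot"):
    """Expected goals for a shot taken x metres out and y metres off-centre."""
    distance = math.hypot(x, y)      # sqrt(x^2 + y^2)
    a = DECAY[shot_type]
    return math.exp(-distance / a)   # note the negative sign


# Header from the penalty spot, as in the worked example
print(round(expected_goals(11, 0, "header"), 2))  # 0.08
```

Swapping "header" for "foot" in the last line gives the equivalent value for a foot shot from the same spot.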