Sharing xG Using Multi-touch Attribution Modelling

Introduction

A question that often comes up in data science is how you determine the performance of different marketing channels. For example, somebody might land on your website by clicking an advert on Google. They may find something they want to buy but leave the site, only to return a few minutes later via an affiliate link with a discount code. They then add something to their shopping basket but never complete the purchase, so you send them an email to remind them and finally they convert.

Which channel drove the conversion here? Was it the final channel, the one they converted from? Was it the first one, as it brought them onto the site in the first place? Or was it one of the other ones in-between that helped guide them to the conversion?

Okay, this is a blog about football analytics so why am I writing about marketing? Well, consider this next example - Aymeric Laporte tackles the opposition's striker to win the ball and passes it to Fernandinho. Fernandinho then plays a short ball to Bernardo Silva, who then knocks it out wide to Raheem Sterling. Sterling runs down the wing, beats the fullback and crosses the ball to Sergio Agüero, who then scores. Which player was responsible for the goal here? Was it Laporte for winning possession in the first place? Was it Agüero since he scored? Or was it one of the players in-between who helped move the ball down the pitch to Agüero so he could score the goal?

Hopefully you've spotted that both examples are effectively the same problem - how do you determine the value of all the events leading up to a conversion?

Heuristics

Traditionally, this has been 'solved' using simple heuristics. For example, the last player who touches the ball is awarded the goal, or the last marketing channel a customer interacts with gets the credit for their purchase. Depending on the metric being measured, people will occasionally give credit to the first event in the sequence instead, or perhaps share everything out equally because it sounds fairer. If they are feeling really adventurous they may even apply some sort of curve to it, but there's no real scientific rationale being used. It's typically just somebody's personal preference.
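To make those heuristics concrete, here is a minimal sketch of last-touch, first-touch and equal-share attribution applied to the possession from the introduction. The function names and the xG value of the shot are made up purely for illustration.

```python
def last_touch(sequence):
    """Give all of the credit to the final event in the sequence."""
    return {sequence[-1]: 1.0}


def first_touch(sequence):
    """Give all of the credit to the first event in the sequence."""
    return {sequence[0]: 1.0}


def equal_share(sequence):
    """Split the credit evenly across every event in the sequence."""
    share = 1.0 / len(sequence)
    credit = {}
    for event in sequence:
        # A player involved more than once collects a share per touch.
        credit[event] = credit.get(event, 0.0) + share
    return credit


move = ["Laporte", "Fernandinho", "Bernardo Silva", "Sterling", "Agüero"]
shot_xg = 0.30  # hypothetical xG for Agüero's shot

for rule in (last_touch, first_touch, equal_share):
    credit = {player: round(weight * shot_xg, 3)
              for player, weight in rule(move).items()}
    print(rule.__name__, credit)
```

Whichever rule you pick, the weights are fixed in advance rather than learnt from the data, which is exactly the weakness described above.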

Multi-touch Attribution Models

A more scientific approach taken from the world of marketing analytics is to use multi-touch attribution modelling to quantify the importance of each event in the sequence and assign a fractional amount of credit to it based on how much it drives the final outcome.

There are lots of different ways of doing this, including using Markov chains. These are mathematical systems that model the probability of a sequence transitioning from one event to the next. For example, a chain might capture the probability of a customer clicking through to a company's home page from a tweet, followed by the probability of that being their last interaction (and therefore failing to convert) or the probability they move on to some other interaction with the company.

We can apply this same principle to football. For example, if Sergio Agüero is in possession of the football, what is the probability he passes, what is the probability he scores, and what is the probability the sequence of possession ends with him?

Attributing xG Using Markov Chains

To apply multi-touch attribution modelling to football and Expected Goals (xG), I created a dataset of possession sequences from the Premier League. Each sequence contained the players involved, followed by a terminating True / False flag designating whether the possession ended with a shot or not. I then used these sequences to train a Markov chain.

[Ederson, Laporte, Stones, Sterling, True]

Figure 1: Example possession sequence used to train the Markov chain
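As a rough sketch of what training on sequences like this could look like, the snippet below counts how often each state is followed by each other state and normalises the counts into probabilities. It assumes a simple first-order chain, and the four sequences are invented for illustration rather than taken from the real dataset.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate first-order transition probabilities from sequences.

    Each sequence is a list of player names ending in True (shot)
    or False (no shot), matching the layout in Figure 1.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair every state with the state that immediately follows it.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Made-up possessions, for illustration only.
sequences = [
    ["Ederson", "Laporte", "Stones", "Sterling", True],
    ["Ederson", "Laporte", "Sterling", False],
    ["Laporte", "Stones", "Laporte", False],
    ["Stones", "Sterling", True],
]

transitions = fit_transitions(sequences)
for state, row in transitions.items():
    print(state, row)
```

Each row of the result is the probability distribution over what happens next when that player has the ball, including the two terminal outcomes.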

The trained Markov chain was then used to simulate possessions by picking a starting player and taking a random walk through the transition probabilities until it hit a True / False event. From here we can calculate the importance of each player in terms of shot generation - essentially, each possession sequence's propensity to lead to a shot changes as different players become involved. These differences in shot propensity can then be used to reattribute the xG from a given shot across all the players in the possession leading up to it - the more important a player is, the more xG is awarded to them even if they didn't take the shot.
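The sketch below shows one way the simulation and reattribution could fit together. It is not necessarily the exact calculation used for the results in this post: it assumes the "removal effect" commonly used in marketing attribution, where a player's importance is the relative drop in shot propensity when possessions reaching that player are forced to fail. The transition probabilities, the START state and the xG value are all hypothetical.

```python
import random

# Hypothetical transition probabilities; True = shot, False = no shot.
# "START" is an assumed extra state for choosing who begins a possession.
TRANSITIONS = {
    "START": {"Ederson": 0.5, "Laporte": 0.3, "Stones": 0.2},
    "Ederson": {"Laporte": 0.7, "Stones": 0.3},
    "Laporte": {"Stones": 0.3, "Sterling": 0.4, True: 0.05, False: 0.25},
    "Stones": {"Laporte": 0.3, "Sterling": 0.3, True: 0.05, False: 0.35},
    "Sterling": {True: 0.5, False: 0.5},
}


def simulate(transitions, start, rng, removed=None, max_steps=50):
    """Random walk from `start`; return True if it ends in a shot."""
    state = start
    for _ in range(max_steps):
        if state == removed:
            return False  # possession dies when it reaches the removed player
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        if isinstance(state, bool):
            return state  # reached a terminal True / False event
    return False  # give up on very long walks


def shot_propensity(transitions, start, removed=None, n=20000, seed=0):
    """Share of simulated possessions that end with a shot."""
    rng = random.Random(seed)
    shots = sum(simulate(transitions, start, rng, removed) for _ in range(n))
    return shots / n


def attribute_xg(transitions, start, players, shot_xg):
    """Split `shot_xg` across `players` in proportion to removal effects."""
    base = shot_propensity(transitions, start)  # assumed to be above zero
    effects = {}
    for player in players:
        without = shot_propensity(transitions, start, removed=player)
        effects[player] = max(0.0, 1.0 - without / base)
    total = sum(effects.values())
    return {player: shot_xg * e / total for player, e in effects.items()}


possession = ["Ederson", "Laporte", "Stones", "Sterling"]
credit = attribute_xg(TRANSITIONS, "START", possession, shot_xg=0.30)
for player, value in credit.items():
    print(f"{player}: {value:.3f}")
```

The attributed values always sum back to the original xG of the shot, so nothing is created or lost - it is only shared out differently.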

Results

The table below shows the attributed xG (axG) for Manchester City's 2018/2019 season. The first thing to note is that the most attacking players, such as Agüero and Jesus, have lower axG compared with xG. This is to be expected, as traditional xG models credit them with 100% of the value of the shot whereas axG takes some of that xG and reassigns it to the players involved in the build up play. They still come out with the highest axG scores overall though, as these are the players taking the majority of the shots generating the xG, so their presence in the possession sequences is important in terms of shot generation.

| Player | xG | axG | axG:xG |
| --- | --- | --- | --- |
| sergio agüero | 19.92 | 15.86 | 0.8 |
| raheem sterling | 13.14 | 12.2 | 0.93 |
| david silva | 8.07 | 8.79 | 1.09 |
| bernardo silva | 6.74 | 7.78 | 1.15 |
| gabriel jesus | 8.76 | 7.22 | 0.82 |
| leroy sané | 5.52 | 5.84 | 1.06 |
| riyad mahrez | 5.72 | 5.37 | 0.94 |
| ilkay gündogan | 4.52 | 4.74 | 1.05 |
| aymeric laporte | 3.36 | 4.23 | 1.26 |
| fernandinho | 1.74 | 2.58 | 1.48 |
| kevin de bruyne | 1.99 | 2.26 | 1.14 |
| kyle walker | 0.45 | 1.66 | 3.69 |
| nicolás otamendi | 1.39 | 1.4 | 1.01 |
| phil foden | 1.72 | 1.06 | 0.62 |
| john stones | 0.51 | 0.92 | 1.8 |
| oleksandr zinchenko | 0.19 | 0.9 | 4.74 |
| vincent kompany | 0.35 | 0.59 | 1.69 |
| danilo | 0.44 | 0.54 | 1.23 |
| benjamin mendy | 0.2 | 0.36 | 1.8 |
| fabian delph | 0.08 | 0.29 | 3.62 |
| ederson | 0 | 0.23 | n/a |

Table 1: Manchester City axG 2018/2019

Looking at the ratio of axG to xG shows that the biggest beneficiaries for Manchester City are their defenders, particularly the fullbacks. Kyle Walker has a 3.7-fold increase in xG credited to him and Oleksandr Zinchenko has a 4.7-fold increase. Laporte and Fernandinho also see noticeable increases, reflecting their importance in City's build up play.
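For clarity, the axg:xg column is simply axG divided by xG. The short check below reproduces a few of the ratios from Table 1. The helper name is my own, and returning None when a player has zero xG (as with Ederson) is an assumption about how to handle the undefined ratio.

```python
def axg_ratio(xg, axg):
    """Return axG / xG rounded to 2 decimal places, or None if xG is zero."""
    if xg == 0:
        return None  # ratio is undefined when the player took no shots
    return round(axg / xg, 2)


# (xG, axG) pairs copied from Table 1.
table_1 = {
    "kyle walker": (0.45, 1.66),
    "oleksandr zinchenko": (0.19, 0.90),
    "sergio agüero": (19.92, 15.86),
    "ederson": (0.0, 0.23),
}

ratios = {name: axg_ratio(xg, axg) for name, (xg, axg) in table_1.items()}
for name, ratio in ratios.items():
    print(name, ratio)
```

A ratio above 1 means a player gains credit once the build up play is accounted for, while a ratio below 1 means some of their shot value has been shared with their teammates.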

The Importance of Fullbacks

It's not just Manchester City's fullbacks who do well when we reattribute xG. It's pretty common across all other teams too, as shown in the table below. These players are typically out wide where they can't take many shots but are important for getting the ball into the danger zones for the attacking players. It's still small volumes of xG compared with attackers, but once we start accounting for fullbacks' involvement in the build up play their xG numbers increase noticeably.

| Team | Player | xG | axG | axG:xG |
| --- | --- | --- | --- | --- |
| Burnley | charlie taylor | 0.09 | 0.73 | 8.11 |
| Arsenal | stephan lichtsteiner | 0.02 | 0.15 | 7.5 |
| Bournemouth | simon francis | 0.03 | 0.22 | 7.33 |
| West Ham | arthur masuaku | 0.08 | 0.42 | 5.25 |
| Manchester City | oleksandr zinchenko | 0.19 | 0.9 | 4.74 |
| Everton | lucas digne | 0.46 | 2.16 | 4.7 |
| Manchester City | kyle walker | 0.45 | 1.66 | 3.69 |
| Arsenal | sead kolasinac | 0.45 | 1.63 | 3.62 |
| Tottenham | kieran trippier | 0.4 | 1.41 | 3.52 |
| Bournemouth | diego rico | 0.08 | 0.28 | 3.5 |
| Liverpool | trent alexander-arnold | 0.66 | 2.07 | 3.14 |
| Manchester United | ashley young | 0.47 | 1.42 | 3.02 |
| West Ham | pablo zabaleta | 0.17 | 0.51 | 3 |
| Watford | josé holebas | 0.6 | 1.79 | 2.98 |
| Liverpool | andrew robertson | 1.18 | 2.83 | 2.4 |
| Newcastle United | javier manquillo | 0.07 | 0.16 | 2.29 |
| Burnley | matthew lowton | 0.33 | 0.75 | 2.27 |
| Leicester | ricardo pereira | 1.35 | 2.81 | 2.08 |
| Leicester | ben chilwell | 1.05 | 2.14 | 2.04 |
| Bournemouth | nathaniel clyne | 0.07 | 0.14 | 2 |
| Manchester United | luke shaw | 1.1 | 2.15 | 1.95 |

Table 2: Fullback axG 2018/2019

Top 25 Players by axG

| Team | Player | xG | axG | axG:xG |
| --- | --- | --- | --- | --- |
| Liverpool | mohamed salah | 18.83 | 18.99 | 1.01 |
| Manchester United | paul pogba | 15.6 | 17.06 | 1.09 |
| Arsenal | pierre-emerick aubameyang | 20.84 | 16.92 | 0.81 |
| Manchester City | sergio agüero | 19.92 | 15.86 | 0.8 |
| Fulham | aleksandar mitrovic | 15.14 | 15.52 | 1.03 |
| Wolverhampton Wanderers | raúl jiménez | 15.46 | 14.48 | 0.94 |
| Leicester | jamie vardy | 15.59 | 13.93 | 0.89 |
| Chelsea | eden hazard | 10.5 | 13.57 | 1.29 |
| Brighton | glenn murray | 11.87 | 13.11 | 1.1 |
| Tottenham | harry kane | 13.64 | 12.66 | 0.93 |
| Everton | gylfi sigurdsson | 11.94 | 12.41 | 1.04 |
| Bournemouth | joshua king | 13.07 | 12.34 | 0.94 |
| Newcastle United | salomón rondón | 11.86 | 12.32 | 1.04 |
| Manchester City | raheem sterling | 13.14 | 12.2 | 0.93 |
| Liverpool | sadio mané | 14.54 | 11.7 | 0.8 |
| Arsenal | alexandre lacazette | 12.3 | 11.23 | 0.91 |
| Liverpool | roberto firmino | 11.96 | 11.04 | 0.92 |
| Burnley | ashley barnes | 11.25 | 10.83 | 0.96 |
| Bournemouth | callum wilson | 12.24 | 10.18 | 0.83 |
| Southampton | danny ings | 9.51 | 10 | 1.05 |
| Watford | troy deeney | 9.56 | 9.71 | 1.02 |
| Everton | richarlison | 9.37 | 8.91 | 0.95 |
| Manchester City | david silva | 8.07 | 8.79 | 1.09 |
| Manchester United | romelu lukaku | 10.18 | 8.57 | 0.84 |
| Manchester City | bernardo silva | 6.74 | 7.78 | 1.15 |

Table 3: Top 25 Players by axG 2018/2019

Conclusions

It's worth clarifying that this is not an expected possession value (EPV) model. It's taking the output from a shots-based xG model and redistributing it across the players involved in the build up play based on the propensity of a shot occurring from that particular group of players.

In many ways, the output of the model is closer to a Shapley Value in that it's looking at all the different combinations of players in the possession sequences to quantify how much each player contributed to the propensity of a shot occurring. In fact, this is something I want to play around with further to see what other uses it has.
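To illustrate the comparison, here is a small sketch of an exact Shapley value calculation for a toy three-player possession. The "value" of each group of players is a made-up shot propensity, and this is not the calculation used for the results above - it's just the textbook definition, averaging each player's marginal contribution over every order in which the group could be assembled.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions.

    `value` maps a frozenset of players to that group's worth.
    """
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            # Marginal contribution of adding this player to the group.
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {player: total / len(orderings) for player, total in totals.items()}


# Hypothetical shot propensity for every combination of three players.
PROPENSITY = {
    frozenset(): 0.0,
    frozenset({"Laporte"}): 0.05,
    frozenset({"Walker"}): 0.10,
    frozenset({"Agüero"}): 0.20,
    frozenset({"Laporte", "Walker"}): 0.15,
    frozenset({"Laporte", "Agüero"}): 0.30,
    frozenset({"Walker", "Agüero"}): 0.40,
    frozenset({"Laporte", "Walker", "Agüero"}): 0.50,
}

players = ["Laporte", "Walker", "Agüero"]
values = shapley_values(players, PROPENSITY.__getitem__)
for player, share in values.items():
    print(f"{player}: {share:.3f}")
```

The shares add up to the propensity of the full group, which is the same "share out the total fairly" property that makes multi-touch attribution appealing. The catch is that the number of orderings grows factorially, so an exact calculation like this only works for small groups.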

Whilst the approach described here is perhaps not as complex as some EPV models, it has a couple of advantages. First of all, it's quick to process the data, but most importantly it's easy to explain to stakeholders. This isn't some complicated and uninterpretable black box that senior management need to take a leap of faith to trust. Multi-touch attribution is just sharing out xG more fairly based on the probabilities of shots occurring during the sequence of play, and for me that's a big win. A simpler approach that people can relate to often has a bigger impact in a business than a bigger model that's much more complicated to get buy-in for.

Multi-touch attribution has pretty much achieved this in marketing analytics now. It's a significant improvement on the simple heuristics without being so complex it scares the C-Suite off. Perhaps it could also play a similar role in football as a step up from xG without alienating the more data-reluctant coaches?


About

Pena.lt/y is a site dedicated to football analytics. You'll find lots of research, tutorials and examples on the blog and on GitHub.
