I recently gave a presentation to the Manchester R Users' Group discussing how to predict football results using R. My presentation gave a brief overview of how to create a Poisson model in R and apply the Dixon and Coles adjustment to it to account for dependence in the scores.
The slides are below for anybody interested and contain enough example R code to get you started. Unfortunately, there are no slide notes, but hopefully the slides are descriptive enough to get you going!
Example code from the presentation can be found on my GitHub account.
Anonymous - November 3, 2014
Can you explain the +/- Dixon/Coles adjustment?
Martin Eastwood - November 3, 2014
Sure. If you are interested in the theory behind it, then I recommend reading Dixon and Coles' paper, where they propose their adjustment to account for dependency between the scores: http://www.math.ku.dk/~rolf/teaching/thesis/DixonColes.pdf
Peter - November 4, 2014
Is it possible for you to share the R code, how you implemented this adjustment in the model?
Martin Eastwood - November 4, 2014
Hi Peter - I’m not planning on adding the Dixon and Coles adjustment to the code, as it was intended just as a simple demonstration rather than a full model. The adjustment requires carrying out an optimisation to estimate rho, which in turn requires a cost function and so on, so it increases the complexity of the example considerably.
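For anyone wanting a starting point, here is a minimal sketch of what estimating rho on its own might look like. It is not the code behind the presentation: the goals and the expected-goals values (`lambda`, `mu`) are made up, the helper names `tau` and `neg_log_lik` are hypothetical, and it uses R's one-dimensional `optimize` because rho is the only free parameter. In a real model `lambda` and `mu` would come from the fitted Poisson model for each match.

```r
# Dixon and Coles correction factor for the four low-scoring results.
# x, y: home and away goals; lambda, mu: expected home and away goals.
tau <- function(x, y, lambda, mu, rho) {
  ifelse(x == 0 & y == 0, 1 - lambda * mu * rho,
  ifelse(x == 0 & y == 1, 1 + lambda * rho,
  ifelse(x == 1 & y == 0, 1 + mu * rho,
  ifelse(x == 1 & y == 1, 1 - rho, 1))))
}

# Cost function: negative log-likelihood as a function of rho only.
# The Poisson terms do not depend on rho, so they are left out.
neg_log_lik <- function(rho, home_goals, away_goals, lambda, mu) {
  adj <- tau(home_goals, away_goals, lambda, mu, rho)
  if (any(adj <= 0)) return(Inf)  # rho outside the valid range
  -sum(log(adj))
}

# Made-up results and expected goals, for illustration only
home_goals <- c(0, 1, 1, 0, 2, 1, 0, 3, 1, 2)
away_goals <- c(0, 1, 0, 1, 1, 1, 0, 0, 2, 2)
lambda <- rep(1.5, length(home_goals))
mu <- rep(1.1, length(home_goals))

# Search a range of rho that keeps every tau value positive for this data
fit <- optimize(neg_log_lik, interval = c(-0.4, 0.4),
                home_goals = home_goals, away_goals = away_goals,
                lambda = lambda, mu = mu)
rho_hat <- fit$minimum
```

Only 0-0, 1-0, 0-1 and 1-1 results contribute to this cost function, since tau is 1 for every other score.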
Jonas - November 3, 2014
Do you apply the Dixon & Coles adjustment to the probabilities you got from the independent goals model? Do you estimate the rho parameter independently of the other parameters then?
Martin Eastwood - November 3, 2014
Hi Jonas, yes that’s right. You’ll need to run an optimisation to get rho and then use that to modify the probabilities from the Poisson model.
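As a rough illustration of that second step, the sketch below builds the score matrix from two independent Poisson distributions and then rescales the four low-scoring cells. The values of `lambda`, `mu` and `rho` are assumptions chosen for the example, not estimates from any real data.

```r
lambda <- 1.6    # assumed expected home goals
mu <- 1.1        # assumed expected away goals
rho <- -0.1      # assumed dependence parameter
max_goals <- 6   # truncate the score matrix at 6 goals per side

# Independent Poisson probabilities: rows are home goals, columns are away goals
probs <- outer(dpois(0:max_goals, lambda), dpois(0:max_goals, mu))
dimnames(probs) <- list(home = 0:max_goals, away = 0:max_goals)

# Apply the Dixon and Coles factors to 0-0, 0-1, 1-0 and 1-1 only
adjusted <- probs
adjusted[1, 1] <- probs[1, 1] * (1 - lambda * mu * rho)
adjusted[1, 2] <- probs[1, 2] * (1 + lambda * rho)
adjusted[2, 1] <- probs[2, 1] * (1 + mu * rho)
adjusted[2, 2] <- probs[2, 2] * (1 - rho)

# Match outcome probabilities from the adjusted matrix
home_win <- sum(adjusted[lower.tri(adjusted)])  # home goals > away goals
draw <- sum(diag(adjusted))
away_win <- sum(adjusted[upper.tri(adjusted)])  # away goals > home goals
```

The four changes cancel out, so the adjusted matrix has the same total probability as the original one. With a negative rho, probability moves from 1-0 and 0-1 towards 0-0 and 1-1.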
Jonas - November 3, 2014
That’s a neat trick, probably much easier than fitting the complete Dixon & Coles model :)
Seth Dobson - November 4, 2014
Hi Martin! Thanks for posting this. Looking forward to trying it out on the SPFL.
Have you ever tried the fbRanks package in R?
Martin Eastwood - November 4, 2014
I didn’t even know it existed, will take a look!
Ian - March 12, 2014
After running the glm function to create the model, am I correct in saying that the values under "Estimate" are the attack/defence strengths? So "teamAston Villa" would have an attack strength of -0.53733 and "opponentAston Villa" would have a defence strength of 0.36923?
Is there just an overall attack/defence strength and not a home attack/defence strength and an away attack/defence strength?
Martin Eastwood - March 12, 2014
IIRC yes this example gives you an attack/defence strength per team plus an estimate for home field advantage too.
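To make the coefficients concrete, here is a small sketch on made-up results for four hypothetical teams (A to D). It assumes the model has the form `goals ~ home + team + opponent`, which matches the coefficient names quoted above. The estimates are on the log scale and are relative to the baseline team (the first one alphabetically), so they need exponentiating before they can be read as goals.

```r
# Made-up home-and-away round robin between four hypothetical teams
matches <- data.frame(
  HomeTeam = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"),
  AwayTeam = c("B", "C", "D", "A", "C", "D", "A", "B", "D", "A", "B", "C"),
  FTHG = c(2, 3, 2, 1, 2, 1, 0, 1, 2, 0, 1, 1),
  FTAG = c(1, 0, 0, 1, 1, 0, 2, 1, 1, 3, 2, 1)
)

# One row per team per match: goals scored, who against, and whether at home
long <- rbind(
  data.frame(team = matches$HomeTeam, opponent = matches$AwayTeam,
             goals = matches$FTHG, home = 1),
  data.frame(team = matches$AwayTeam, opponent = matches$HomeTeam,
             goals = matches$FTAG, home = 0)
)

model <- glm(goals ~ home + team + opponent,
             family = poisson(link = "log"), data = long)
coefs <- coef(model)

# Expected goals for B at home to C, built by hand from the estimates
by_hand <- exp(coefs[["(Intercept)"]] + coefs[["home"]] +
               coefs[["teamB"]] + coefs[["opponentC"]])

# The same figure from predict()
from_predict <- predict(model,
                        data.frame(home = 1, team = "B", opponent = "C"),
                        type = "response")
```

A positive `opponent` estimate means teams score more than usual against that side, so it points to a weaker defence rather than a stronger one. The single `home` coefficient is shared by every team, which is why there are no separate home and away strengths.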
William - August 09, 2016
Hi Martin,
I have written a Dixon-Coles model in Python, using scipy.optimize.minimize to minimize the negative log-likelihood as explained in http://www.math.ku.dk/~rolf/teaching/thesis/DixonColes.pdf and calculate the parameters. However, I appear to be getting defence parameters that are too high, so my model over-predicts the number of goals. I was wondering if you also used scipy.optimize.minimize, or if you minimized the likelihood function in a different way?
Martin Eastwood - August 13, 2016
Hi William - I use R's optim function but it should be the same thing. Sounds like there may be a bug in your code somewhere?
James - January 09, 2017
Thanks for your work. It's truly great. However, I've encountered this error in the presentation named above
df <- apply(df, 2, function(row){
  data.frame(team=c(row['HomeTeam'], row['AwayTeam']),
             opponent=c(row['AwayTeam'], row['HomeTeam']),
             goals=c(row['FTHG'], row['FTAG']),
             home=c(1, 0))
})
Error in apply(df, 2, function(row) { : dim(X) must have a positive length
Martin Eastwood - January 09, 2017
Hi James - that error typically appears in R when a data frame has been coerced into a flat vector. A vector has no dimensions, so apply has no rows or columns to work over. Check that df is still a data frame with rows and columns at the point where you call apply.
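The sketch below shows one common way to end up with that error, and a row-wise version of the reshaping step that runs on a tiny made-up data frame. It is only a guess at what went wrong in this case. Note that `apply` uses `MARGIN = 1` for rows and `MARGIN = 2` for columns, and that it converts a mixed data frame to a character matrix first, so the goals need converting back to numbers.

```r
# Tiny made-up data frame with the same column names as the football data
matches <- data.frame(HomeTeam = c("A", "B"), AwayTeam = c("B", "A"),
                      FTHG = c(2, 1), FTAG = c(0, 1),
                      stringsAsFactors = FALSE)

# Selecting a single column with [ , ] drops to a plain vector with no dim
flat <- matches[, "HomeTeam"]
has_no_dim <- is.null(dim(flat))

# Calling apply on that vector fails; capture the message instead of stopping
err <- tryCatch(apply(flat, 1, identity),
                error = function(e) conditionMessage(e))

# drop = FALSE keeps the result as a data frame with rows and columns
kept <- matches[, "HomeTeam", drop = FALSE]

# Row-wise reshape: MARGIN = 1 passes one match at a time to the function
rows <- apply(matches, 1, function(row) {
  data.frame(team = c(row[["HomeTeam"]], row[["AwayTeam"]]),
             opponent = c(row[["AwayTeam"]], row[["HomeTeam"]]),
             goals = as.numeric(c(row[["FTHG"]], row[["FTAG"]])),
             home = c(1, 0))
})
long <- do.call(rbind, rows)  # stack the per-match data frames
```

Running `str(df)` or `dim(df)` just before the failing line is usually the quickest way to see whether the object is still a data frame.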
Jordan - February 15, 2017
Hi Martin,
I was wondering if you have got around to implementing the Dixon and Coles method in R yet. I am attempting it myself at the moment but am struggling to understand what needs to go into the optim function. Does tau need to be included when calculating the alpha and beta values? Or are they just the same as in the Poisson/Maher model, with tau only coming into play when predicting the 0-0, 1-0, 0-1 and 1-1 scores?
Thanks
Martin - February 15, 2017
Hi Jordan
It all depends on how you create your model: you can combine everything together and calculate it all at once, or split it out. I tend to go with a two-stage approach and get alpha and beta first, then rho and lambda in the second stage of optimisation. I've not read the Maher paper for a while, but if I remember correctly alpha and beta are essentially the same as in Dixon and Coles, except that Maher has separate home / away parameters while Dixon and Coles use a home advantage parameter instead. And yes, tau is just used to modify the low-scoring matches.
Hope that helps!
Submit your comments below, and feel free to format them using Markdown if you want. Comments typically take up to 24 hours to appear on the site and be answered, so please be patient.
Thanks!