Ranking Football Teams Using Google's PageRank Algorithm

Introduction

I've discussed various techniques for ranking football teams on my blog before, such as using Massey Ratings to account for strength of schedule, but I've not covered the most famous ranking algorithm of them all yet, Google's PageRank.

Google PageRank

The PageRank algorithm (Figure One) was initially developed by Larry Page and Sergey Brin back in the mid-nineties while they were working on a research project at Stanford University. When Page and Brin later founded Google, PageRank became the cornerstone of how their search engine ranked web pages and determined the most relevant set of results for a user's query.

\(PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))\)

where:

PR(A) is the PageRank of page A

PR(Ti) is the PageRank of pages Ti which link to page A

C(Ti) is the number of outbound links on page Ti

d is a damping factor between 0 and 1

Figure One: Google's PageRank Algorithm
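
To make the formula concrete, here is a small worked example using numbers I've made up purely for illustration. Suppose page A is linked to by two pages: T1, with a PageRank of 1.0 and two outbound links, and T2, with a PageRank of 0.5 and one outbound link. Taking d = 0.85 (an assumed value, not one taken from the formula above):

\(PR(A) = (1 - 0.85) + 0.85(1.0/2 + 0.5/1)\)

\(PR(A) = 0.15 + 0.85(0.5 + 0.5) = 1.0\)

T1 splits its vote between its two links, so A only receives half of it, whereas T2 gives A its whole vote.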

Google's search algorithm has evolved considerably since then, with updates such as Panda, Hummingbird and RankBrain brought in to help deal with content farms and to better understand ambiguous queries. However, PageRank remains the central method for determining a web page's rank in the search results.

How Does It Work?

The PageRank algorithm essentially counts links on the web and treats them like votes of support. The more links there are leading to a specific web page, the more votes there are for that page being of high quality. Not all votes are counted equally though. Votes coming from pages that are themselves considered high quality count for much more than votes from pages with few links leading to them.

This puts us in a bit of a tricky situation though, as it means the rank of a web page is dependent on the ranks of all the pages linking to it, which are themselves dependent on the ranks of all the pages pointing to them, and so on. Plus, when you factor in that two web pages can link to each other, you end up with enough circularity to make this initially seem an impossible calculation.

It turns out that this problem can be solved fairly easily though, through brute-force iteration. We start off by giving each web page a default score we can use to start calculating its PageRank. We then iteratively move through the system, updating each page's rank based on the ranks of all the pages linking to it, until the whole system converges and the ranks settle down to their true values (or at least close enough that we don't mind the remaining error).
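
The iteration described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the rankings in this article: the function name `pagerank`, the `links` dictionary format, the starting score of 1.0, the damping factor of 0.85 and the stopping tolerance are all my own assumptions.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=1000):
    """Iteratively apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)).

    `links` maps each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Give every page the same default starting score.
    ranks = {page: 1.0 for page in pages}

    for _ in range(max_iter):
        # Every page starts each pass with the (1 - d) term.
        new_ranks = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            if not targets:
                continue  # a page with no outbound links casts no vote
            # The source's vote is split equally across its links.
            share = d * ranks[source] / len(targets)
            for target in targets:
                new_ranks[target] += share
        # Stop once no rank has moved by more than the tolerance.
        change = max(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if change < tol:
            break
    return ranks
```

For example, `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` ranks page `c` highest, since it collects the whole of `b`'s vote plus half of `a`'s.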

If you want to be more elegant though, you can actually skip this brute-force approach and solve the whole system using linear algebra, but I'm going to leave that for a future article.
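
As a brief taster only, the formula in Figure One can be rewritten in matrix form. The notation here is my own: \(r\) is the vector of all the PageRanks, \(\mathbf{1}\) is a vector of ones, and \(M\) is a matrix whose entry \(M_{ij}\) is \(1/C(T_j)\) if page \(j\) links to page \(i\) and 0 otherwise. Applying the formula to every page at once gives:

\(r = (1-d)\mathbf{1} + dMr\)

which rearranges to a single linear system:

\((I - dM)r = (1-d)\mathbf{1}\)

Provided d is less than 1, this system has exactly one solution, which is the set of ranks the iterative approach converges towards.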

How Does This Apply To Football?

Instead of using links as votes of support from one web page to another, we can use goals as votes of support from one team to another, where the more goals a team concedes, the stronger their vote of support for the opposition that scored against them.

A handy feature of the PageRank algorithm is that each web page only gets one vote, which is shared out equally between all the pages it links to. If we treat every goal conceded as a separate link, then the more goals you score against a team, the greater the share of their vote you receive.
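
Here is a rough sketch of how that vote sharing might look in Python. It is an illustration under my own assumptions rather than the exact code used for the results below: the function names `goal_votes` and `rank_teams`, the `(home, away, home_goals, away_goals)` tuple format and the damping factor of 0.85 are all hypothetical choices.

```python
from collections import defaultdict


def goal_votes(matches):
    """Return votes[conceder][scorer] = share of the conceder's single vote.

    `matches` is an iterable of (home, away, home_goals, away_goals) tuples.
    """
    conceded = defaultdict(lambda: defaultdict(float))
    for home, away, home_goals, away_goals in matches:
        conceded[away][home] += home_goals  # goals the away team let in
        conceded[home][away] += away_goals  # goals the home team let in

    votes = {}
    for team, by_opponent in conceded.items():
        total = sum(by_opponent.values())
        # A team that conceded nothing has no vote to hand out.
        votes[team] = (
            {opp: goals / total for opp, goals in by_opponent.items() if goals > 0}
            if total > 0
            else {}
        )
    return votes


def rank_teams(matches, d=0.85, tol=1e-9, max_iter=1000):
    """Rank teams from best to worst using goal-weighted PageRank."""
    votes = goal_votes(matches)
    ranks = {team: 1.0 for team in votes}
    for _ in range(max_iter):
        new_ranks = {team: 1.0 - d for team in votes}
        for conceder, shares in votes.items():
            for scorer, share in shares.items():
                # The scorer receives its share of the conceder's rank.
                new_ranks[scorer] += d * ranks[conceder] * share
        change = max(abs(new_ranks[t] - ranks[t]) for t in votes)
        ranks = new_ranks
        if change < tol:
            break
    return sorted(ranks, key=ranks.get, reverse=True)
```

Calling `rank_teams` with a list of results returns the team names ordered from highest to lowest rank. A team that keeps a clean sheet in every match never gives away any of its vote, which is one reason strong defences tend to do well under this scheme.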

Results

Table One below shows the rankings for the top 100 European teams over the past year based on domestic league and European fixtures (domestic cup competitions are not currently included) as calculated using Google's PageRank algorithm.

Rank Team
1 paris saint-germain
2 fc barcelona
3 bayern münchen
4 atlético madrid
5 borussia dortmund
6 real madrid
7 sevilla fc
8 manchester city
9 arsenal fc
10 olympique lyon
11 valencia cf
12 vfl wolfsburg
13 tottenham hotspur
14 juventus
15 sl benfica
16 villarreal cf
17 cska moskva
18 athletic bilbao
19 chelsea fc
20 shakhtar donetsk
21 as roma
22 celta vigo
23 inter
24 ssc napoli
25 bayer leverkusen
26 as monaco
27 bor. mönchengladbach
28 stade reims
29 as saint-étienne
30 lazio roma
31 fc lorient
32 zenit st. petersburg
33 fk krasnodar
34 1899 hoffenheim
35 toulouse fc
36 acf fiorentina
37 everton fc
38 hellas verona
39 malmö ff
40 sampdoria
41 olympique marseille
42 lille osc
43 rsc anderlecht
44 celtic fc
45 rb salzburg
46 dinamo moskva
47 olympiakos piräus
48 real sociedad
49 afc ajax
50 liverpool fc
51 montpellier hsc
52 manchester united
53 fc porto
54 fc augsburg
55 psv eindhoven
56 club brugge kv
57 leicester city
58 asteras tripolis
59 west ham united
60 sassuolo calcio
61 southampton fc
62 girondins bordeaux
63 crystal palace
64 lokomotiv moskva
65 brøndby if
66 molde fk
67 dinamo kiev
68 fc københavn
69 fenerbahçe
70 paok saloniki
71 ogc nice
72 fc schalke 04
73 werder bremen
74 stoke city
75 terek grozniy
76 sporting cp
77 espanyol barcelona
78 kaa gent
79 bsc young boys
80 torino fc
81 galatasaray
82 dnipro dnipropetrovsk
83 hamburger sv
84 stade rennes
85 levante ud
86 fc twente
87 rosenborg bk
88 gd estoril
89 ac milan
90 fc basel
91 fc nantes
92 hannover 96
93 panathinaikos
94 az alkmaar
95 sunderland afc
96 sm caen
97 rubin kazan
98 fk ural
99 1. fsv mainz 05
100 standard liège

Table One: Top 100 European Teams As Ranked By The Google PageRank Algorithm

The initial results seem pretty plausible, with the top five spots comprising Paris Saint-Germain, Barcelona, Bayern München, Atlético Madrid and Borussia Dortmund.

Olympique Lyon are something of a surprise in tenth place though, but they beat Paris Saint-Germain earlier in the season and have been playing in the Champions League, so perhaps they are better than my preconceptions? FK Krasnodar also appear higher than I was expecting, but looking back at their results they did quite well in the Europa League this season, including beating Borussia Dortmund, so it's perhaps not unreasonable for them to appear in the top half of the rankings too.

There are a number of ideas for improving this concept further, such as adding a decay into the data so more recent results carry greater importance in the rankings, or adding in home field advantage (which is currently missing), so no doubt there'll be an update to this blog in the future once I've refined things.
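
As one possible starting point for the decay idea, each match's goals could be multiplied by a weight that shrinks with the age of the result before the votes are shared out. The exponential form below, the function name `decay_weight` and the 180-day half-life are all assumptions of mine and haven't been tested against real data.

```python
def decay_weight(days_ago, half_life_days=180.0):
    """Weight for a result played `days_ago` days ago.

    The weight is 1.0 for a match played today and halves
    every `half_life_days` days.
    """
    return 0.5 ** (days_ago / half_life_days)
```

With these settings, a goal scored 180 days ago would count half as much as one scored today, and a goal scored 360 days ago a quarter as much.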
