With the transfer window well under way I thought I'd discuss my footballer recommendation engine for identifying potential transfer targets.
Recommendation engines have become increasingly popular over the past few years as a way for companies to personalize content. Whether it's Amazon recommending books, Twitter suggesting people to follow or Netflix suggesting what films to watch, they typically generate recommendations using versions of a technique called Collaborative Filtering.
Collaborative Filtering takes information about users' behaviors and uses it to calculate their personal preferences. We then assume that if users exhibit the same behaviors they'll likely agree about other things. For example, if a group of people like watching the same films as you, then you'll probably like watching the films they've seen that you've not.
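To make the film example concrete, here is a minimal sketch of one common flavor, user-based collaborative filtering. The viewing data, the names and the `similarity` and `recommend` helpers are all made up for illustration, and this is not the approach used for the footballer recommendations later in the post.

```python
from math import sqrt

# Hypothetical data: the set of films each person has watched and liked.
likes = {
    "you":   {"Alien", "Heat", "Jaws"},
    "alice": {"Alien", "Heat", "Jaws", "Ronin"},
    "bob":   {"Alien", "Heat", "Fargo"},
    "carol": {"Amelie"},
}

def similarity(a, b):
    """Cosine similarity between two sets of liked films (0 = no overlap, 1 = identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, likes):
    """Score films the user hasn't seen by the summed similarity of the people who liked them."""
    scores = {}
    for other, films in likes.items():
        if other == user:
            continue
        weight = similarity(likes[user], films)
        if weight == 0:
            continue  # people with nothing in common tell us nothing
        for film in films - likes[user]:
            scores[film] = scores.get(film, 0.0) + weight
    # Highest-scoring films first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

recommendations = recommend("you", likes)
```

In this toy data "alice" shares all three of your films, so the film she has seen that you haven't ranks above the one that comes from "bob", who only shares two.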
There are countless different approaches for carrying out collaborative filtering, each of which adds its own unique flavor to the recommendations it produces. Recommendation engines have been extensively studied over the past couple of decades and many of the algorithms are well documented, but the key to making your recommendations successful is often finding an approach that works for your particular domain. For example, quantifying the similarities across sports data and recommending footballers requires a different approach to how you would recommend books or films.
As an aside, Amazon have some interesting patents discussing recommendations that are well worth a read if you are interested in the topic. Patents are normally full of lexicon-mangling legalese but to their credit Amazon's are actually really accessible and easy to read.
Okay, let's take a look at some of the recommendations.
I'm a Manchester City fan so let's start there as there's likely to be plenty of transfers this summer - in fact, the next transfer looks like the departure of Pablo Zabaleta to Roma. With Zabaleta gone City's only other senior right fullback is 33-year-old Bacary Sagna so unless there's some crazy change in tactics from Pep Guardiola it seems reasonable to assume they'll be looking for a replacement.
In his prime Zabaleta was one of the greatest fullbacks to have graced the Premier League so let's look at recommendations for similar players to him. We don't want similar players to last season's Zabaleta though as his legs were clearly slowing down and he was struggling to keep up with the game. Instead, let's look at the Pablo Zabaleta who was Manchester City's player of the year back in 2012/2013 and see which modern day players compare with him.
Figure One: Players Similar To Pablo Zabaleta
Reassuringly, the top ten players recommended are all right fullbacks. At no point do I define players' positions in the algorithm; the recommendation engine learns this implicitly from the data and incorporates it into the recommendations. Also, to my eye they all look to be attacking fullbacks too, which matches Zabaleta's style of bursting down the wing on the overlap to support the attack.
The players are ranked by how similar they are to the real thing, with scores ranging from zero, where there is no similarity, to one, where they are identical to each other. The top recommendation here is Paris Saint-Germain's Gregory van der Wiel with a similarity of 0.87, making him a very close match. The recommendations rarely go above 0.8, so it's unusual to get such a close match.
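The post doesn't say how the similarity score is calculated, so purely as an illustration here is one common measure that behaves the same way: cosine similarity over non-negative feature vectors, which runs from zero (nothing in common) to one (identical profile). The feature values, candidate names and the `rank_similar` helper are hypothetical.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0 to 1 when all features are non-negative."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty profile is treated as having no similarity
    return dot / (norm_u * norm_v)

def rank_similar(target, candidates):
    """Return (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(target, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up per-game features, e.g. tackles, crosses, dribbles, key passes.
target = [3.0, 2.0, 1.0, 1.5]
candidates = {
    "candidate_a": [6.0, 4.0, 2.0, 3.0],  # same profile at twice the volume
    "candidate_b": [1.0, 0.5, 3.0, 2.5],  # different mix of actions
    "candidate_c": [0.0, 0.0, 0.0, 0.0],  # no recorded actions
}

ranking = rank_similar(target, candidates)
```

Cosine similarity compares the shape of a profile rather than its volume, so `candidate_a` scores as a perfect match despite doing twice as much of everything. Whether that is desirable depends on the domain.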
Another player who is (hopefully) on his way out is Yaya Touré. Yaya has been immense over the years for Manchester City, but like Zabaleta his legs are fading fast and he is struggling / can't be bothered (delete as appropriate) to keep up with the game so who's his closest like-for-like replacement?
Figure Two: Players Similar To Yaya Touré
The first thing to notice here is that the closest match only has a similarity of 0.65. This is low - Yaya really is a unique player and there is just nobody around who is similar to him. Interestingly though, the top recommendation is Ilkay Gündogan who Manchester City have already signed this summer. So while there is no like-for-like Yaya replacement out there, City have already managed to buy the closest match there is. Top marks to City's scouting department there!
Let's take a quick look at a few other interesting players too. Leicester City have done a good job of holding on to their title-winning team so far so who would you sign if you want Riyad Mahrez but can't tempt him away from the KP Stadium?
Figure Three: Players Similar To Riyad Mahrez
Paulo Dybala is the closest match, with a decent similarity of 0.81. Yeah, good luck trying to convince him to leave Juventus. Second in the list is Nathan Redmond, who's recently been snapped up by Southampton for a bargain £10 million. Third in the list is 19-year-old Milot Rashica who's been getting decent reviews playing in Vitesse's midfield and is rumored to be a target for Napoli. There's also Leroy Sané on there who's hopefully on his way to Manchester City this summer, and Ousmane Dembele who's recently moved to Borussia Dortmund for the crazy low fee of €15 million, so it's a pretty strong list.
How about if you want to sign Neymar for your team but can't afford the real thing?
Figure Four: Players Similar To Neymar
The closest match to Neymar is Liverpool's Philippe Coutinho. The similarity is below 0.8 so it's not super high but it's still a fairly reasonable match. Second on the list is Ajax's Amin Younes with a virtually identical score to Coutinho - by the way, it's pretty amazing how often Ajax players crop up in these recommendations. You can pretty much pick any elite player and guarantee Ajax have someone in their teens / early twenties who profiles like them!
And finally, because people are bound to ask - here's Lionel Messi.
Figure Five: Players Similar To Lionel Messi
*Insert usual disclaimer that I'm not advocating signing players based purely on data science / analytics / machine learning / statistics and that the goal should be to combine all of the above with the domain knowledge from professionals inside the game yada yada yada.*
Collaborative filtering is a useful technique for taking large amounts of data and filtering it down to a single value - in this case how similar different players are to each other. This information can then help refine shortlists for potential transfer targets and be used alongside more traditional scouting.
It can also be used to identify interesting youngsters. For example, there is currently a 20-year-old out there with a similarity to Gareth Bale of 0.81, a 21-year-old goalkeeper with a similarity to Manuel Neuer of 0.88 and two players in their early twenties with similarities of 0.85 to Sergio Agüero.
Much like Yaya Touré, many of the real elite players are pretty unique in what they do and there are very few of these youngsters out there who profile like them, at least outside of Ajax anyway!
Joshua - July 19, 2016
Quite an interesting piece with regards to the 'similarity' of the players between each other. However, when you say similarity, do you mean between the attributes that each player possesses skill-wise or the actual numbers that they produce stats-wise? And what about comparing players' physical attributes, as this would suit different leagues and the level of the team they play for? I'm guessing that Lionel Messi's numbers would decrease if he were to play for a League 2 side, where he wouldn't receive the service he does at Barca and would face opposition able to use, let's call it, a more 'physical' style of play.
Also is this based on scientific principles from Journal articles or .....?
Martin - July 19, 2016
Thanks for your message Joshua. You are correct it does not account for the league a player is in so numbers may well change if a player moves leagues. However, it's not just a player's raw stats that go into the recommendations so it should be fairly robust to being skewed by this. And yes it is based on scientific principles, namely collaborative filtering.