My last article looked at how well my PlayerRatings model predicted which young players would go on to have successful careers. The results looked really exciting, with the majority of players aged 21 or under who were flagged by the model as having high potential going on to become world class players.
Although it's fantastic that the model is correctly predicting these players' careers, there's something else we need to test: false negatives. How many talented players did the model miss and incorrectly flag as being unlikely to make the grade?
False negatives here refers to those players who were initially ranked lowly by their PlayerRating score but went on to become world class players anyway. One way to identify these players is to work backwards from the top players of today to see how they were rated early on in their careers compared with their peers.
The table below shows the top 25 players in the world today as judged by their PlayerRating score (incidentally, I'd be interested to hear any feedback on this list. Does it seem feasible? Are there any major names you think are missing? Anyone who doesn't deserve to be in the Top 25?).
|Ángel Di María||21|
Table 1: The world's top 25 players by PlayerRating
Okay, now we've identified the best players in the world, let's take a look at how they ranked on their 21st birthday compared with all the other players aged 21 or under at that time.
If a player's ranking at age 21 was considerably lower than that of their peers, yet they've still gone on to become world class, it suggests their initial PlayerRating score was too low, and so the player can be classified as a False Negative.
|Player||Rank At Age 21|
|Ángel Di María||33|
Table 2: Players' ranks at age 21 compared with all other players aged 21 and under at the time
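The ranking step behind Table 2 can be sketched roughly as follows. This is only an illustration of the idea, not the actual PlayerRatings code: the `Snapshot` record, the function names, the placeholder players and ratings, and especially the cut-off of 20 for calling something a False Negative are all assumptions made for the example. It also assumes every snapshot was taken on the same date, namely the player's 21st birthday.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """A player's age and rating on one date (hypothetical record layout)."""
    player: str
    age: int
    rating: float


def rank_among_peers(snapshots, player, max_age=21):
    """Return the player's 1-based rank among everyone aged max_age or under."""
    peers = [s for s in snapshots if s.age <= max_age]
    # Highest rating first, so position 1 is the best young player.
    ordered = sorted(peers, key=lambda s: s.rating, reverse=True)
    for position, snap in enumerate(ordered, start=1):
        if snap.player == player:
            return position
    raise ValueError(f"{player} is not in the peer group")


def is_false_negative(rank_at_21, cutoff=20):
    """Flag a current top player whose rank at 21 fell outside the cut-off.

    The cut-off is an assumed value, chosen purely for illustration.
    """
    return rank_at_21 > cutoff


# Placeholder data, not real ratings.
snapshots = [
    Snapshot("Player A", 20, 81.0),
    Snapshot("Player B", 21, 74.5),
    Snapshot("Player C", 19, 68.0),
    Snapshot("Player D", 24, 90.0),  # too old, so excluded from the peer group
]

print(rank_among_peers(snapshots, "Player C"))
print(is_false_negative(33))
```

With a cut-off like this, a rank of 33 at age 21 would be flagged, while a rank of one would not. Where exactly to draw that line is a judgment call.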
Again, the results look pretty impressive - the mode (or most frequently occurring rank) was position one, with a median rank of five suggesting that the overwhelming majority of today's world class players were correctly flagged as having high potential by the time they were 21.
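For anyone wanting to reproduce this kind of summary, the mode and median can be taken straight from Python's standard `statistics` module. The ranks in the sketch below are made-up placeholder values, not the contents of Table 2, and the variable names are my own.

```python
from statistics import median, mode

# Placeholder ranks at age 21, NOT the real figures from Table 2.
ranks_at_21 = [1, 1, 3, 5, 9, 14, 33]

most_common_rank = mode(ranks_at_21)   # the most frequently occurring rank
middle_rank = median(ranks_at_21)      # half the players ranked at or above this

print(f"mode: {most_common_rank}, median: {middle_rank}")
```

The median is the more useful of the two here, since a single outlier such as a rank of 33 barely moves it.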
The results aren't perfect though, especially for Maicon, Marcelo, Dani Alves and perhaps Ángel Di María, but there is an obvious connection between these players: they all started their careers playing for South American teams.
Unfortunately, since we are going back 8-10 years in many cases, there was less data publicly available documenting the earliest stages of these players' careers, hence their reduced ratings. As I've mentioned in previous articles, minutes played is an important factor in determining how confident the PlayerRating model is in its predictions. Without an adequate volume of data, particularly in the early years of a player's career, the model tends to be cautious in its recommendations.
Going forwards though, this becomes a non-issue as all the match data needed to calculate PlayerRatings is now freely available (albeit with a fair bit of hard work...) but it does help reinforce the point that any model is only ever as good as the data available to train it on.