Artificial Intelligence tools - and their growing use in selecting yearlings
Book 1 of the Tattersalls October Yearling Sale is traditionally where some of the finest horseflesh in the world is bought and sold. The 2022 record-busting auction saw 424 lots pass through its hallowed rotunda for a total of 126,671,000 guineas. One of the jewels in the crown was undoubtedly lot 379, a Frankel colt out of Blue Waltz, who was knocked down to Coolmore's M.V. Magnier, joined by Peter Brant, for 1,900,000 guineas.
It is easy to see why lot 379 made Coolmore open its purse strings. He has a stallion’s pedigree, being out of a Pivotal mare. His sire has enjoyed a banner year on the track, with eight individual Gp/Gr1 winners in 2022. He is a full brother to the winning Blue Boat, himself a 450,000 guineas purchase for Juddmonte Farms at Book 1 in 2020. Lot 379 is undeniably impressive on the page.
But it is not his impeccable pedigree that makes Tom Wilson believe lot 379 has the makings of a future champion. “The machine doesn’t have any biases. It doesn’t know whether it’s a Galileo or a Dubawi or a Havana Grey,” he says. “The machine just looks at the movement of the horse and scores it as it sees it. It has no preconceptions about who the elite sires in the market are. It’s completely neutral.”
The “machine” to which Wilson is referring is, in reality, a complex computational model that he claims can predict with 73 percent accuracy whether a horse will be elite (which he defines as an official rating of 90 or above, or the equivalent in its own jurisdiction) or non-elite (horses rated 60 or below) based on its walk alone. It’s a bold claim. So how does he do it?
First, Wilson taught an open source artificial intelligence tool, DeepLabCut, to track the movements of the horse at the walk. To do this, he fed it thousands of hours of footage. He then extracted around 100 frames from each video and manually labeled the body parts. “You teach it what a hock is, what a fetlock is, what a hip is,” he explains. “Eventually, when you feed new videos through, it automatically recognises them and plots the points. Then you can map the trajectories and the angles.”
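To make “map the trajectories and the angles” concrete, here is a minimal sketch of how a joint angle could be computed once the tracking tool has plotted the points for one frame. It is not Wilson’s code and it does not call DeepLabCut itself; the function name joint_angle and the pixel coordinates are made up for illustration, and only the body-part names (hip, hock, fetlock) come from his description.

```python
import math

def joint_angle(a, b, c):
    """Return the angle in degrees at point b, between segments b->a and b->c.

    Each point is an (x, y) pair, e.g. pixel coordinates of a tracked body part.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("two keypoints coincide, so the angle is undefined")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))

# Hypothetical keypoints for a single video frame (not real measurements).
hip = (0.0, 1.0)
hock = (0.0, 0.0)
fetlock = (1.0, 0.0)

hock_angle = joint_angle(hip, hock, fetlock)
print(f"hock angle: {hock_angle:.1f} degrees")
```

Repeating the calculation for every frame of a video would give the angle’s trajectory through the stride, which is the kind of measurement Wilson describes feeding into his model.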
He then feeds this information into a separate video classification algorithm that analyzes the video and compares it to historic data in order to generate a predicted rating for the horse. “Since 2018, I’ve taken about 5,000 videos of yearlings from sales all around the world with the same kind of biometric markers placed on them and then gone through the results and mapped what performance rating each yearling got,” he says. “So we’re marrying together the video input from the sale to the actual results achieved on track.”
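The “marrying together” step, and the elite/non-elite accuracy figure it leads to, can be sketched roughly as follows. This is an illustration under stated assumptions, not Wilson’s pipeline: the horse IDs, feature values and model predictions are invented, the function names are hypothetical, and the decision to leave out horses rated between 60 and 90 is an assumption, since the article does not say how they are handled. Only the two thresholds (90 and above, 60 and below) come from Wilson’s definitions.

```python
ELITE_MIN = 90      # "elite" threshold as Wilson defines it
NON_ELITE_MAX = 60  # "non-elite" threshold as Wilson defines it

def label_from_rating(rating):
    """Map an official rating to a class, or None for the middle band."""
    if rating >= ELITE_MIN:
        return "elite"
    if rating <= NON_ELITE_MAX:
        return "non-elite"
    return None  # assumption: middle-band horses are left out

def build_training_set(gait_features, track_ratings):
    """Join sale-video features to later ratings by horse ID."""
    rows = []
    for horse_id, features in gait_features.items():
        rating = track_ratings.get(horse_id)
        if rating is None:
            continue  # no rating recorded for this horse
        label = label_from_rating(rating)
        if label is None:
            continue
        rows.append((horse_id, features, label))
    return rows

def accuracy(predicted, actual):
    """Share of predictions that match the actual class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Invented example data: features measured at the sale, ratings earned later.
gait_features = {
    "A": [112.0, 0.84], "B": [98.5, 0.71], "C": [105.2, 0.90],
    "D": [101.0, 0.78], "E": [99.0, 0.70],
}
track_ratings = {"A": 104, "B": 52, "C": 75, "D": 93}  # "E" has no rating

training_set = build_training_set(gait_features, track_ratings)
actual = [label for _, _, label in training_set]
predicted = ["elite", "elite", "elite"]  # made-up model output
score = accuracy(predicted, actual)
print(f"{score:.0%} of predictions correct")
```

In this toy example two of the three predictions are right. Wilson’s 73 percent figure would be the same kind of ratio, calculated over his own data.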
Lot 379 has a projected official rating of 107 based on his biomechanics alone, the highest of all the Frankels on offer in Book 1 (yes, even higher than the 2,800,000 gns colt purchased by Godolphin). Wilson’s findings have been greeted with skepticism in some quarters. “There’s so many other factors that you can’t measure,” points out trainer Daniel Kübler. “There’s no way an external video can understand the internal organs of a horse, which you can find through vetting. If it’s had an issue with its lungs, for example, it doesn’t matter how good it looks. If it’s inefficient at getting oxygen into its system, it’s not going to be a good racehorse.”
“It’s not a silver bullet,” concedes Wilson. “There are multiple ways to find good horses. It’s just another metric, or set of metrics, that helps.” But is it really “just another metric,” or the opening salvo in a data revolution that has the potential to transform the way racehorses are bought and sold?
Big data. Analytics. Moneyball. It goes by many names, but the use of data in sports is, of course, nothing new. It was brought to popular attention by Michael Lewis in his 2003 book Moneyball and by the 2011 film of the same name starring Brad Pitt.
It charted the fortunes of the Oakland Athletics baseball team. You know the story: Because of their smaller budget compared to rivals such as the New York Yankees, Oakland had to find players who were undervalued by the market. To do this, they applied an analytical, evidence-based approach called sabermetrics. The term ‘sabermetrics’ was coined by legendary baseball statistician Bill James. It refers to the statistical analysis of baseball records to evaluate and compare the performance of individual players. Sabermetrics has subsequently been adopted by a slew of other Major League Baseball teams (in fact, you would be hard pressed to find an MLB team that doesn’t employ a full-time sabermetrics analyst), and ‘moneyball’ has well and truly entered the sporting lexicon on both sides of the Atlantic.
Take Brentford FC. As recently as 2014, the West London club was languishing in the third tier of English football. Today, Brentford is enjoying its second consecutive season in the top flight (Premier League), bucking the trend of teams that gain promotion only to slingshot back down to the lower leagues after one season.
What is their secret? Moneyball. Brentford’s backroom staff has access to vast streams of data that detail how their players rank across a number of key metrics. This information helps them make day-to-day training ground decisions. But crucially, it also shapes their activity in the transfer market by helping them to identify undervalued players to sell on for a profit. Players such as Ezri Konsa, purchased from Charlton for a rumored £2.5 million in 2018 before being sold, one year later, to Aston Villa for a £10 million profit. Think of it as the footballing equivalent of pinhooking.
The bottom line is that data analysis has already transformed the way athletes are recruited and trained across a range of sports. It stands to reason, therefore, that statistical modeling could help buyers who are spending, on average, 298,752 guineas for a yearling at Book 1 make informed purchasing decisions.
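The average quoted here can be checked against the sale figures given at the top of the article. The sketch below assumes the average is simply the total divided by all 424 lots that went through the ring.

```python
total_guineas = 126_671_000  # Book 1 aggregate, 2022
lots = 424                   # lots that passed through the ring

average_guineas = total_guineas / lots
print(f"average: {round(average_guineas):,} guineas")
```

Rounded to the nearest guinea, the result agrees with the 298,752 guineas cited.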
“I’ve always been interested in applying data and technology to an industry that doesn’t exactly embrace technology.” That’s according to star bloodstock agent Byron Rogers. Rogers is widely regarded as the godfather of the biometrics movement in racing. “The thoroughbred industry is one that moves slowly, rather than quickly,” he adds, with a dash of irony.
Having cut his teeth at Arrowfield Stud in his native Australia and Taylor Made Farm in Kentucky, in 2011 he started his own company, Performance Genetics. As its name implies, the company initially focused on DNA sequencing, attempting to identify markers that differentiated elite and non-elite horses.
From there, it branched out into cardiovascular and biomechanical research. Rogers quickly discovered that it was the biomechanical factors that were the most influential in terms of identifying future elite horses. “When you put all the variables in, the ones that surface to the top as the most important are actually the biomechanical features: the way the horse moves and the way the horse is constructed. They outweigh DNA markers and cardiovascular measurements,” he explains.
According to Rogers, roughly a fifth (19.5 percent, to be exact) of what makes a horse a good racehorse is explained by the way it moves. “That’s not to say that [those other factors] are not important. It’s just that if you’re ranking them by importance, the biomechanical features are more important than the cardiovascular ones.”
His flag bearer is Malavath. Purchased at the 2020 Goffs Premier Yearling Sale for £29,000, she was resold for €139,200 at the Arqana Breeze Up Sale the following year. “I know when I’ve found one,” recounts Rogers. “I walked up to her [at the sale], and there was nobody else there. At that time, [her sire] Mehmas wasn’t who he was. But her scores, for us, were an A plus. She shared a lot of the common things with the good sprinter-milers that we’ve got in the database. A lot of the dimensions were very similar, so she fit into that profile.” She has since proven herself as a Gp2 winner and most recently finished second behind Kinross in the Prix de La Forêt on Arc day.
In December 2022, Malavath was sold again, this time to Moyglare Stud for €3.2m, and is set to continue her racing career in North America under the tutelage of Christophe Clement.
A find like Malavath has only been made possible through the rapid development of deep learning and artificial intelligence in recent years. Rogers’s own models build on technology originally developed for driverless cars—essentially, how a car uses complex visual sensors and deep learning to figure out what’s happening around it in order to make a decision about what to do next.
But wait. What is deep learning? Here comes the science bit! Machine learning and deep learning are both types of artificial intelligence. “Classical” machine learning is A.I. that can adapt automatically with minimal human intervention. Deep learning is a form of machine learning that uses artificial neural networks, loosely modeled on the way the human nervous system (including structures like the retina) recognises patterns.
“My dad’s an eye surgeon in Australia and he was always of the opinion that what will be solved first in artificial intelligence will be anything to do with vision,” says Rogers knowingly. Deep learning is much more computationally complex than traditional machine learning. It is capable of modeling patterns in data as sophisticated, multi-layered networks and, as such, can produce more accurate models than other methods.
Chances are you’ve already encountered a deep neural network. In 2016, Google Translate transitioned from its old, phrase-based statistical machine translation algorithm to a deep neural network. The result was that its output improved dramatically, from churning out often comical non-sequiturs to producing sentences that are nearly indistinguishable from those of a professional human translator.
So does this mean that the received wisdom around how yearlings are selected is outdated, subjective and flawed? Not exactly. “There are so many different ways of being a good horse; I don’t think [selecting horses] will ever completely lose its appeal as an art form,” says Rogers. “But when we get all this data together and we start to look at all these data points, it does push you towards a most predictable horse.” In other words, following the data will not lead you to a diamond in the rough; rather, it’s about playing the percentages. And that’s all before the horse goes into training.
After that point, the data only gets you so far. “I would say [the use of biomechanical modeling] probably explains somewhere between 30 to 40 percent of outcome,” says Rogers. “It’s very hard to disentangle. The good racehorse trainer has got all the other things working with him: he’s got the good jockeys, the good vet, the good work riders. He’s got all of those things, and their effect on racetrack outcomes is very hard to model and very hard to disaggregate from what we do.”
Nevertheless, it does not look like big data is going away any time soon. “It might be a couple of years away,” says Rogers. “As bloodstock gets more and more expensive and as the cost of raising a horse gets more and more expensive, the use of science is going to rise.” He believes there’s already an analytics arms race happening behind the scenes.
“For me, it isn’t a case of if it’s valuable; it’s a case of when it will be recognised as being valuable.” That’s Wilson again. “What you see in every sport is a big drive towards using statistical analysis and machine learning to qualify and understand performance. Every other sporting sector tells us that these methods will be adopted, and the ones that adopt them first will gain a performance edge over the rest of the field.”
Comparisons to Deep Blue’s defeat of Garry Kasparov might be premature, but it is clear that the racing industry is fast approaching a tipping point. “I don’t think the machine on its own beats the human judge,” says Wilson. “But I think where you get the real benefit is when you use the information you've been given by machine learning and you combine that with deep human expertise. That’s where the application of these types of things are the most successful in any sport. It’s the combination of human and machine that is power. Humans and machines don’t have to compete with each other.”
So will more trainers be adopting the technology? “There’s lots of different data points that you can use to predict a horse’s potential, and it’s understanding all of the pieces together,” says Kübler. “I’d want a bit more proof of concept. Show me that your system is going to save me loads of time and add loads of value. We’ll see in three or four years’ time how good it was.”
In the meantime, all eyes will be on Lot 379.