How racing is making strides into Big Data

Words - Alysen Miller

Unless you’ve been living under a rock, you will have noticed many male footballers wearing what appears to be a sports bra during training and matches. This is neither a political statement (a show of solidarity with their female counterparts, perhaps) nor the latest fashion craze. Rather, the bras are, in reality, GPS tracker vests. Containing a small Global Positioning System gadget, they allow team managers and trainers to collect and analyze players’ individualized GPS data in order to make informed decisions about tactics and training.

Like all big-money sports, the top football clubs now employ legions of data nerds to crunch the numbers on all aspects of their players’ performances. Premier League club Arsenal uses the STATSports system to gather physical data on all its players, from the under-12s to the men’s and women’s first teams. Marketed as “the most advanced wearable tech on the market” (that’s the famous bra), it records some 250 separate metrics, including accelerations and decelerations, average heart rate, calories burned, distance per minute, high-speed running, high-intensity distance, max speed, sprints and strain. The statistics are available live during training sessions so coaches can make real-time adjustments where necessary.

And it goes beyond wearable tech. Players at last year’s World Cup in Qatar were able to get insights into their on-field performance through FIFA’s own player app. Physical performance metrics were collected through a highly accurate in-stadium tracking system, including multiple cameras located around the pitch. These included distance covered at various speed thresholds, number of actions above 25 kilometers per hour (about 16 miles per hour), and maximum speed – all displayed on positional heat maps. Thanks to this data, we know that Kylian Mbappé hit a top speed of 35.3 kilometers per hour (about 22 miles per hour) against Poland in the round of 16. Impressive for a two-legged athlete, even if he won’t be giving the likes of Flightline or Baaeed a run for their money.

Football is following in the footsteps of baseball and American football by embracing “Big Data”. Not only does this enhance teams’ abilities to play and train, it adds another dimension to the spectator experience. Who doesn’t want to know how far their favorite player ran? Horse racing, by contrast, still relies on a mathematical speed model, Timeform, developed in the 1950s.

“When you look at other professional sports, racing’s a fair way behind in terms of how we measure the athlete,” says David Hawke. “Basically, we don’t measure the athlete in a biometric sense at all, whereas most other professional sports measure their athletes in competition, when the athletes are at their highest output and highest exertion. And this is the crucial point.” Hawke is hoping to change all that. He is the managing director of StrideMaster, a system that combines GPS and motion capture technologies to produce detailed insights into the horse’s performance. 

“When we developed the technology, back in 2010, it was essentially technology for race day: tracking horses, getting all their times—all the normal race track performance information that punters might want to see,” he explains. In the course of gathering this information, Hawke accumulated a treasure trove of biometric data. In 2018, he joined up with Dr David Lambert. Kentucky-based Dr Lambert is an expert on equine physiology and the founder of a company called Equine Analysis Systems, which leverages this understanding of how the horse moves to select elite, high-performance thoroughbreds. 

Lambert is looking for the top one percent, the cream of the crop. Hawke’s idea was to take this hypothesis and turn it on its head; in other words, to find the one percent “who were in trouble.” In this way, by identifying the horses that are trying to cope with a problem, vets and trainers would have a crucial data point which could be used to help prevent injuries before they happen.

So how does it work? Here comes the science part. Essentially, every horse has a unique stride “fingerprint.” Thanks to Hawke’s data, we not only know what that fingerprint looks like, but also when the horse deviates substantially from that fingerprint.

 The first step is to collect high-resolution data of the horse at the gallop. This is because, as prey animals, horses are disinclined to show lameness at the walk or trot (the traditional way of assessing a horse’s soundness). “The forces that are at play when a horse is going at 40 miles per hour compared to when it’s being trotted up at five miles per hour are completely different,” says Hawke. “The price that the horse pays for going fast is that it gives up autonomy over a number of things,” he continues. “It gives up autonomy over its breathing, for example. It becomes a mechanical breather. It also gives up autonomy over its footfall. If it’s got a raging foot abscess at the walk or the trot, it will decide not to put its foot down. But at the gallop, it can’t do that. It has no choice over when it puts each foot down. So the only option it’s got left to manage an issue that’s impacting it is postural change: it’s going to hold itself differently; it’s going to use different muscles to try and take the pressure off.” 

To capture these changes, samples are taken from three axes: the vertical, the longitudinal and the medial. This data is captured by a device about the size of an iPhone that’s slipped into the saddle cloth. These samples are then broken down further: “We split the stride up into three parts,” Hawke explains. “We have the hind leg stance phase, which is the primary propulsion and power source for the horse. Hind leg spring function is absolutely critical to a good stride, so if anything’s wrong at the back end, that immediately gets transferred to the front end on the corresponding diagonal. Then we have the forelimb stance phase. And then we have the flight phase, or the collection phase, when the horse is off the ground. The flight phase is where the horse is making most of its postural adjustments in the air. So if it’s got a problem it’s managing, it’s trying desperately to accommodate that problem during the stride. And then when it goes into the air, it’s trying desperately to get itself ready for the next stride to do it all over again.”

The system is capable of detecting minute variations in the horse’s stride that are effectively invisible to the human eye. “From an observational point of view, humans can’t detect these sorts of changes that we’re picking up. It’s simply happening too fast,” he says. The sample rate in StrideMaster’s sensors is 800 hertz, or 800 samples per second. The human eye, by contrast, cannot directly perceive more than about 60 frames per second. “That enables us to look at the stride in a very high level of detail,” he says.

Hawke has accumulated so much data that it’s no longer necessary to have historic data on an individual horse in order to make a judgment about its soundness. Rather, there exists an “ideal” fingerprint for different categories of horse: “We have a Gp.1 fingerprint, we have a Gp.2 fingerprint, right down to a $10,000 claimer fingerprint, to use the American parlance,” he explains. In other words, soundness can be assessed against an ideal archetype. If a horse is more than two standard deviations outside of this ideal, that is considered an adverse change that the system then flags for the attention of the trainer.
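
The two-standard-deviation rule described above can be illustrated with a minimal sketch. It is not StrideMaster’s actual method, which the article does not detail; the function name, the reference values and the single generic "stride metric" are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_adverse_change(reference_values, observed_value, threshold_sd=2.0):
    """Return (z_score, flagged) for one stride metric against a reference cohort.

    reference_values: the metric as measured across the "ideal" cohort
    observed_value:   the same metric for the horse being assessed
    threshold_sd:     how many standard deviations count as an adverse change
    """
    mu = mean(reference_values)
    sigma = stdev(reference_values)
    if sigma == 0:
        raise ValueError("reference cohort has no spread")
    z_score = (observed_value - mu) / sigma
    return z_score, abs(z_score) > threshold_sd


# Hypothetical reference values for one metric in one class of horse.
reference = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00]

z_far, flagged_far = flag_adverse_change(reference, 1.08)
z_near, flagged_near = flag_adverse_change(reference, 1.03)
```

In this made-up cohort the mean is 1.00 and the standard deviation is 0.02, so a reading of 1.08 sits four standard deviations out and is flagged, while 1.03 sits inside the two-deviation band and is not.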

So how is this “deviation” measured? “We’re tracking two or three things that are important: we’re tracking the amount of power they produce, and we’re tracking the amount of vibration they produce,” Hawke explains. Vibration is, essentially, any rapid change in acceleration. That is what is most likely to cause injury. Think of the horse as a four-cylinder engine, with the legs as the pistons. Each piston—or leg—moves in a set rhythm. As long as this rhythm is maintained, vibration will be kept to a minimum. But changes in rhythm (for example, because the horse is managing a problem) generate vibration which, in turn, generates damage. The sounder the horse, in other words, the less vibration. But with great power comes the potential to generate huge amounts of vibration. This explains why most of the horses that get flagged are competitive horses in whatever cohort they’re in. “They’re not horses that are running 20 lengths down the track,” says Hawke. “Generally, those horses are not producing enough power or vibration to get themselves into trouble. [The good horses] will always find a way to go fast,” he says.
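
To make "any rapid change in acceleration" concrete, here is a rough sketch that scores a one-axis accelerometer trace by how quickly its readings change from sample to sample. Everything in it is an assumption for illustration: the synthetic signal, the 2.5 Hz stride rhythm, the 40 Hz wobble and the choice of a root-mean-square score are hypothetical, and only the 800 Hz sample rate comes from the article.

```python
import math

SAMPLE_RATE_HZ = 800  # sample rate quoted in the article


def rate_of_change(acceleration, sample_rate_hz=SAMPLE_RATE_HZ):
    """Finite-difference rate of change between consecutive acceleration samples."""
    dt = 1.0 / sample_rate_hz
    return [(b - a) / dt for a, b in zip(acceleration, acceleration[1:])]


def rms(values):
    """Root-mean-square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def synthetic_trace(n_samples, wobble_amplitude):
    """Hypothetical trace: a smooth 2.5 Hz rhythm plus an optional 40 Hz wobble."""
    trace = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE_HZ
        smooth = math.sin(2 * math.pi * 2.5 * t)
        wobble = wobble_amplitude * math.sin(2 * math.pi * 40 * t)
        trace.append(smooth + wobble)
    return trace


steady = synthetic_trace(800, wobble_amplitude=0.0)      # rhythm maintained
disturbed = synthetic_trace(800, wobble_amplitude=0.2)   # rhythm disturbed

steady_score = rms(rate_of_change(steady))
disturbed_score = rms(rate_of_change(disturbed))
```

The disturbed trace scores higher than the steady one even though the added wobble is small in amplitude, which mirrors the engine analogy: it is the break in rhythm, not the size of the movement, that drives the score up.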

While Hawke sees the technology primarily as an injury prevention tool, he acknowledges that its potential is broader than that: “From a social license point of view, that’s where the pressure is: to manage these injury rates and welfare outcomes better than we have been. So that’s the primary focus,” he says. But the same technology could, in theory, be used to identify future elite performers: 

“When you compare, say, a Gp.1 horse to a low-rating handicapper, what we see is increased deviation from optimum,” he explains. “To take a metric at random: gravity. The acceleration of an object toward the ground caused by gravity alone, near the surface of Earth, is called ‘normal gravity,’ or 1g. This acceleration is equal to 32.2 ft/s² (9.8 m/s²). If you drop an apple on Earth, it falls at 1g.”

“The Gp.1 horse will be much closer to that 1g than the lower rating handicapper,” he explains. “[The lower-rated horse] is not as efficient. They’re losing power in all directions. They’re going up and down more, they’re going side to side more. Whereas the elite horse actually generates surprisingly less power, but it’s all pointing down the road in the right direction.”

Hawke is keen to emphasize that he is not marketing a diagnostic tool. Rather, trainers should see this technology as another tool in their toolkit: “When the trainer gets the information, either they come and seek more information or talk to their vets about what’s going on. The vet can review the stride on a stride-by-stride basis. And when we get down to that level of detail, we can actually, on most occasions, give some indication of what quadrant the problem is emanating from.”

But what if you could identify such problems without even galloping the horse? 

Stephen O’Dwyer thinks he has a solution. O’Dwyer is the founder of Irish start-up TrojanTrack, which uses video cameras to record the horse at the walk and, from there, identify any variations in its movement. “We take video data of 52 different parts of the horse at 120 frames per second,” he explains. “We then convert those parts into biomechanical data: joint velocities, accelerations, angles. And then we can compare that to the horse’s healthy baseline movement to track any deteriorations or imbalances that might be creeping in.” But wait. Horses are prey animals. Won’t they naturally try to mask any injuries at the walk? “Horses are herd animals, so rather than show any sign of injury, they try to hide it as much as possible, and that means compensating on a different limb or something like that,” O’Dwyer acknowledges. “But because we’re tracking 52 points, we’re able to pick up any tiny deviations, tiny nuances that won’t be picked up by the human eye. 

“In talking to a few of the vets, they say that when the horse is in its walk, it’s at its most comfortable,” he continues. “And because they’re in their most comfortable state, they won’t be trying to hide their injury as much.” O’Dwyer plans to incorporate trot movements in the future.
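
As a rough illustration of turning tracked video into "joint velocities, accelerations, angles" and comparing them to a healthy baseline, the sketch below uses simple frame-to-frame differences. It is not TrojanTrack’s pipeline; the angle values, the function names and the choice of peak velocity as the summary figure are assumptions, and only the 120 frames per second figure comes from O’Dwyer.

```python
FPS = 120  # frame rate quoted by O'Dwyer


def finite_difference(series, fps=FPS):
    """Rate of change between consecutive frames, in units per second."""
    return [(b - a) * fps for a, b in zip(series, series[1:])]


def peak_abs(values):
    """Largest magnitude in a list of numbers."""
    return max(abs(v) for v in values)


def deviation_from_baseline(current, baseline):
    """Relative change of a summary value against the horse's own baseline."""
    return (current - baseline) / baseline


# Hypothetical angle of one joint (degrees) over six consecutive frames.
baseline_angles = [150.0, 151.0, 153.0, 156.0, 158.0, 159.0]
current_angles = [150.0, 150.5, 151.5, 153.0, 154.0, 154.5]

baseline_velocity = finite_difference(baseline_angles)   # degrees per second
current_velocity = finite_difference(current_angles)
baseline_acceleration = finite_difference(baseline_velocity)

change = deviation_from_baseline(
    peak_abs(current_velocity), peak_abs(baseline_velocity)
)
```

With these invented numbers the joint’s peak angular velocity falls from 360 to 180 degrees per second, a change of -0.5 (a 50 percent drop) relative to baseline. A real system would repeat this for every tracked point and decide which deviations matter.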

Like Hawke, O’Dwyer sees his technology primarily as another arrow in the trainer’s quiver, rather than a diagnostic tool. “It’s hard for the trainer to pick up on the whole horse at once,” he explains. “They might be staring at one limb while the hip isn’t moving, and they’d have to walk by again and check the hip, and then they’re not looking at another limb. We look at all four limbs landing, the hip movement as one of the limbs is landing. So it’s the whole package of the horse in one to really show the trainer exactly what is going on.”

O’Dwyer acknowledges that the technology is still in its nascency. He is currently running customer trials at a couple of yards in Ireland while he tries to drum up the next round of investment. StrideMaster, meanwhile, has been adopted by racing authorities in the United States and in Hawke’s native Australia. But any technologies that can help spot potentially catastrophic injuries before a horse hits the track must be taken extremely seriously by an industry that can, at times, feel like it is operating on the razor’s edge of public acceptability. As Hawke says, “The first priority is welfare because we have to look after the animal. If we’re not seen to be looking after the animal, the whole game’s in trouble.”

It seems like it is only a matter of time before racing joins the ranks of other sports in embracing Big Data. Says Hawke: “If I walked into a major football club and said, ‘Who here’s got expertise in biometric sensor analysis?’ half the football department would put their hand up because they’ve been doing it for 20 years. But the information can be used in so many different ways in terms of performance, breeding and training techniques. We’re just scratching the surface.”

Artificial Intelligence tools - and their growing use in selecting yearlings


Book 1 of the Tattersalls October Yearling Sale is traditionally where some of the finest horseflesh in the world is bought and sold. The 2022 record-busting auction saw 424 lots pass through its hallowed rotunda for a total of 126,671,000 guineas. One of the jewels in the crown was undoubtedly lot 379, a Frankel colt out of Blue Waltz, who was knocked down to Coolmore's M.V. Magnier, joined by Peter Brant, for 1,900,000 guineas.

 It is easy to see why lot 379 made Coolmore open its purse strings. He has a stallion’s pedigree, being out of a Pivotal mare. His sire has enjoyed a banner year on the track, with eight individual Gp/Gr1 winners in 2022. He is a full brother to the winning Blue Boat, himself a 450,000 guineas purchase for Juddmonte Farms at Book 1 in 2020. Lot 379 is undeniably impressive on the page. 


But it is not his impeccable pedigree that makes Tom Wilson believe lot 379 has the makings of a future champion. “The machine doesn’t have any biases. It doesn’t know whether it’s a Galileo or a Dubawi or a Havana Grey,” he says. “The machine just looks at the movement of the horse and scores it as it sees it. It has no preconceptions about who the elite sires in the market are. It’s completely neutral.”

The “machine” to which Wilson is referring is, in reality, a complex computational model that he claims can predict with 73 percent accuracy whether a horse will be elite (which he defines as an official rating of 90 or above, or the equivalent in its own jurisdiction) or non-elite (horses rated 60 or below) based on its walk alone. It’s a bold claim. So how does he do it?

First, Wilson taught an open source artificial intelligence tool, DeepLabCut, to track the movements of the horse at the walk. To do this, he fed it thousands of hours of footage. He then extracted around 100 frames from each video and manually labeled the body parts. “You teach it what a hock is, what a fetlock is, what a hip is,” he explains. “Eventually, when you feed new videos through, it automatically recognises them and plots the points. Then you can map the trajectories and the angles.”
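
Once a tracking tool has plotted the labeled points in each frame, "mapping the angles" comes down to basic geometry. The sketch below shows one way to compute the angle at a joint from three tracked points. The pixel coordinates are hypothetical and the function is an illustration, not part of DeepLabCut or of Wilson’s model.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c.

    Each point is an (x, y) pair of pixel coordinates from one video frame.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("two keypoints coincide; angle is undefined")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical pixel coordinates for three tracked points in one frame.
stifle, hock, fetlock = (100.0, 200.0), (100.0, 300.0), (200.0, 300.0)

hock_angle = joint_angle(stifle, hock, fetlock)
straight_angle = joint_angle((0.0, 0.0), (1.0, 0.0), (2.0, 0.0))
```

Repeating the calculation frame by frame gives the angle’s trajectory over the stride, which is the kind of series a downstream model could take as input.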

He then feeds this information into a separate video classification algorithm that analyzes the video and compares it to historic data in order to generate a predicted rating for the horse. “Since 2018, I’ve taken about 5,000 videos of yearlings from sales all around the world with the same kind of biometric markers placed on them and then gone through the results and mapped what performance rating each yearling got,” he says. “So we’re marrying together the video input from the sale to the actual results achieved on track.”

Lot 379 has a projected official rating of 107 based on his biomechanics alone, the highest of all the Frankels on offer in Book 1 (yes, even higher than the 2,800,000 gns colt purchased by Godolphin). Wilson’s findings have been greeted with skepticism in some quarters. “There’s so many other factors that you can’t measure,” points out trainer Daniel Kübler. “There’s no way an external video can understand the internal organs of a horse, which you can find through vetting. If it’s had an issue with its lungs, for example, it doesn’t matter how good it looks. If it’s inefficient at getting oxygen into its system, it’s not going to be a good racehorse.”

“It’s not a silver bullet,” concedes Wilson. “There are multiple ways to find good horses. It’s just another metric, or set of metrics, that helps.” But is it really “just another metric,” or the opening salvo in a data revolution that has the potential to transform the way racehorses are bought and sold?

Big data. Analytics. Moneyball. It goes by many names, but the use of data in sports is, of course, nothing new. It was brought to popular attention by Michael Lewis in his 2003 book Moneyball and by the 2011 film of the same name starring Brad Pitt. 

It charted the fortunes of the Oakland Athletics baseball team. You know the story: Because of their smaller budget compared to rivals such as the New York Yankees, Oakland had to find players who were undervalued by the market. To do this, they applied an analytical, evidence-based approach called sabermetrics. The term ‘sabermetrics’ was coined by legendary baseball statistician Bill James. It refers to the statistical analysis of baseball records to evaluate and compare the performance of individual players. Sabermetrics has subsequently been adopted by a slew of other Major League Baseball teams (in fact, you would be hard pressed to find an MLB team that doesn’t employ a full-time sabermetrics analyst), and ‘moneyball’ has well and truly entered the sporting lexicon on both sides of the Atlantic.

Take Brentford FC. As recently as 2014, the West London club was languishing in the third tier of English football. Today, Brentford is enjoying its second consecutive season in the top flight (Premier League), bucking the trend of teams that gain promotion only to slingshot back down to the lower leagues after one season. 

What is their secret? Moneyball. Brentford’s backroom staff has access to vast streams of data that detail how their players rank across a number of key metrics. This information helps them make day-to-day training ground decisions. But crucially, it also shapes their activity in the transfer market by helping them to identify undervalued players to sell on for a profit. Players such as Ezri Konsa, purchased from Charlton for a rumored £2.5 million in 2018 before being sold, one year later, to Aston Villa for a £10 million profit. Think of it as the footballing equivalent of pinhooking. 

Data analysis on yearlings

The bottom line is that data analysis has already transformed the way athletes are recruited and trained across a range of sports. It stands to reason, therefore, that statistical modeling could help buyers who are spending, on average, 298,752 guineas for a yearling at Book 1 make informed purchasing decisions.

“I’ve always been interested in applying data and technology to an industry that doesn’t exactly embrace technology.” That’s according to star bloodstock agent Byron Rogers. Rogers is widely regarded as the godfather of the biometrics movement in racing. “The thoroughbred industry is one that moves slowly, rather than quickly,” he adds, with a dash of irony.

Having cut his teeth at Arrowfield Stud in his native Australia and Taylor Made Farm in Kentucky, in 2011 he started his own company, Performance Genetics. As its name implies, the company initially focused on DNA sequencing, attempting to identify markers that differentiated elite and non-elite horses.

From there, it branched out into cardiovascular and biomechanical research. Rogers quickly discovered that it was the biomechanical factors that were the most influential in terms of identifying future elite horses. “When you put all the variables in, the ones that surface to the top as the most important are actually the biomechanical features: the way the horse moves and the way the horse is constructed. They outweigh DNA markers and cardiovascular measurements,” he explains. 

According to Rogers, roughly a fifth (19.5 percent, to be exact) of what makes a horse an elite performer is explained by the way it moves. “That’s not to say that [those other factors] are not important. It’s just that if you’re ranking them by importance, the biomechanical features are more important than the cardiovascular ones.”

His flag bearer is Malavath. Purchased at the 2020 Goffs Premier Yearling Sale for £29,000, she was then resold for €139,200 at the Arqana Breeze Up Sale the following year. “I know when I’ve found one,” recounts Rogers. “I walked up to her [at the sale], and there was nobody else there. At that time, [her sire] Mehmas wasn’t who he was. But her scores, for us, were an A plus. She shared a lot of the common things with the good sprinter-milers that we’ve got in the database. A lot of the dimensions were very similar, so she fit into that profile.” She has since proven herself as a Gp2 winner and most recently finished second behind Kinross in the Prix de La Forêt on Arc day.


In December 2022, Malavath sold again, this time for €3.2m to Moyglare Stud, and she is set to continue her racing career in North America under the tutelage of Christophe Clement.

A find like Malavath has only been made possible through the rapid development of deep learning and artificial intelligence in recent years. Rogers’s own models build on technology originally developed for driverless cars—essentially, how a car uses complex visual sensors and deep learning to figure out what’s happening around it in order to make a decision about what to do next.

But wait. What is deep learning? Here comes the science bit! Machine learning and deep learning are both types of artificial intelligence. “Classical” machine learning is A.I. that can automatically adapt with minimal human interference. Deep learning is a form of machine learning that uses artificial neural networks to mimic the learning process of the human brain by recognising patterns the same way that the human nervous system does, including structures like the retina. 

“My dad’s an eye surgeon in Australia and he was always of the opinion that what will be solved first in artificial intelligence will be anything to do with vision,” says Rogers knowingly. Deep learning is much more computationally complex than traditional machine learning. It is capable of modeling patterns in data as sophisticated, multi-layered networks and, as such, can produce more accurate models than other methods.

Chances are you’ve already encountered a deep neural network. In 2016, Google Translate transitioned from its old, phrase-based statistical machine translation algorithm to a deep neural network. The result was that its output improved dramatically, from churning out often comical non-sequiturs to producing sentences that are nearly indistinguishable from those of a professional human translator.

So does this mean that the received wisdom around how yearlings are selected is outdated, subjective and flawed? Not exactly. “There are so many different ways of being a good horse; I don’t think [selecting horses] will ever completely lose its appeal as an art form,” says Rogers. “But when we get all this data together and we start to look at all these data points, it does push you towards a most predictable horse.” In other words, following the data will not lead you to a diamond in the rough; rather, it’s about playing the percentages. And that’s all before the horse goes into training.

After that point, the data only gets you so far. “I would say [the use of biomechanical modeling] probably explains somewhere between 30 to 40 percent of outcome,” says Rogers. “It’s very hard to disentangle. The good racehorse trainer has got all the other things working with him: he’s got the good jockeys, the good vet, the good work riders. He’s got all of those things, and their effect on racetrack outcomes is very hard to model and very hard to disaggregate from what we do.”

Nevertheless, it does not look like big data is going away any time soon. “It might be a couple of years away,” says Rogers. “As bloodstock gets more and more expensive and as the cost of raising a horse gets more and more expensive, the use of science is going to rise.” He believes there’s already an analytics arms race happening behind the scenes.

“For me, it isn’t a case of if it’s valuable; it’s a case of when it will be recognised as being valuable.” That’s Wilson again. “What you see in every sport is a big drive towards using statistical analysis and machine learning to qualify and understand performance. Every other sporting sector tells us that these methods will be adopted, and the ones that adopt them first will gain a performance edge over the rest of the field.”

Comparisons to Deep Blue’s defeat of Garry Kasparov might be premature, but it is clear that the racing industry is fast approaching a tipping point. “I don’t think the machine on its own beats the human judge,” says Wilson. “But I think where you get the real benefit is when you use the information you've been given by machine learning and you combine that with deep human expertise. That’s where the application of these types of things are the most successful in any sport. It’s the combination of human and machine that is power. Humans and machines don’t have to compete with each other.”

So will more trainers be adopting the technology? “There’s lots of different data points that you can use to predict a horse’s potential, and it’s understanding all of the pieces together,” says Kübler. “I’d want a bit more proof of concept. Show me that your system is going to save me loads of time and add loads of value. We’ll see in three or four years’ time how good it was.”

In the meantime, all eyes will be on Lot 379.
