Prepping for the BC: A Statistical Analysis (updated!)

Postby Tessablue » Sun Oct 29, 2017 8:45 pm

EDIT: see page 2 for this year's results!

The Breeders' Cup is always difficult, but this year we have an additional challenge: we don't know how Del Mar is going to play. Will it favor the home team? How can we predict whether performances across the country will translate to success in California?

Now, we do have some precedent if we look at other BCs contested in California: 58% of dirt BC winners in California had a previous win at the host track, while 63% of them had a win in California. Meanwhile, only 27% had a previous win at Belmont, even though 45% had raced there at some point.

But this is a pretty reductive way of looking at things, right? A lot of people (rightly) point out that we don't know how many of these horses were favored and how many of them ran better (or worse) than expectations. So very long story short, I decided to address this issue by coming up with a formula that normalizes a horse's performance to their odds, for a value that I will refer to as a true differential. The formula is as follows:

True differential = (odds to $1) - ((finish position / field size) x 10)^2
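For anyone who wants to try this on their own data, here is a minimal Python sketch of the formula as written above. It assumes odds are entered as decimal X-to-1 values (so 9/2 becomes 4.5) and that finish positions run from 1 (winner) to the field size; the function and argument names are made up for illustration.

```python
def true_differential(odds_to_1: float, finish: int, field_size: int) -> float:
    """Odds-normalized performance score: positive = ran better than the odds suggested.

    odds_to_1  -- final win odds as X-to-1 (e.g. 9/2 -> 4.5); assumed input format
    finish     -- official finishing position, 1 = winner
    field_size -- number of starters
    """
    if not 1 <= finish <= field_size:
        raise ValueError("finish must be between 1 and field_size")
    # Scale the finish to a 0-10 range, then square it: last place always costs 100.
    penalty = ((finish / field_size) * 10) ** 2
    return odds_to_1 - penalty


# An even-money favorite who wins a 10-horse field scores exactly 0 ("as expected").
print(true_differential(1.0, 1, 10))   # 0.0
# A 9/2 shot finishing 5th of 10 scores 4.5 - 25 = -20.5.
print(true_differential(4.5, 5, 10))   # -20.5
```

Note that the penalty depends only on the finish as a fraction of the field, so 5th of 10 and 7th of 14 both cost the same 25 points.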

With this formula, a score of around 0 means that a horse ran pretty much exactly as expected. A negative score indicates they ran a disappointing race, whereas a positive one means they exceeded expectations. This formula is rough and has two huge weaknesses. First, last place always costs the maximum penalty of 100 points, so a horse at odds above 100-1 will "exceed expectations" even by running last (only one horse had this problem, Hot Number in 1993, and he was removed from the dataset). Second, it struggles badly with coupled entries (which are mercifully no longer a problem). But I think it passes the eye test, and I'd be interested in seeing if others agree.
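The first weakness can be screened for mechanically. Since last place always costs ((n / n) x 10)^2 = 100 points, any runner at odds above 100-1 scores positive no matter where it finishes. A small Python sketch of that check, under the same odds-format assumption as the formula; the Runner type, function names, and example rows are made up, not taken from the actual dataset.

```python
from typing import NamedTuple


class Runner(NamedTuple):
    name: str
    odds_to_1: float   # final win odds as X-to-1
    finish: int        # 1 = winner
    field_size: int


def true_differential(odds_to_1: float, finish: int, field_size: int) -> float:
    return odds_to_1 - ((finish / field_size) * 10) ** 2


def beats_expectations_when_last(runner: Runner) -> bool:
    """True if this runner's odds are so long that even finishing last scores above 0.

    Last place always costs 100 points, so in practice this flags odds above 100-1.
    """
    return true_differential(runner.odds_to_1, runner.field_size, runner.field_size) > 0


# Made-up example rows (not from the real dataset).
runners = [
    Runner("Horse A", 3.0, 2, 12),
    Runner("Horse B", 120.0, 12, 12),
    Runner("Horse C", 45.0, 14, 14),
]
kept = [r for r in runners if not beats_expectations_when_last(r)]
print([r.name for r in kept])  # ['Horse A', 'Horse C']
```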

For example, by this measure these are the most disappointing performances in California BCs since 1993 (unfortunately I don't have the charts for the 1980s races): Jewel Princess, Cuvee, Touch Gold, Rich Tapestry, Princess of Sylmar, Daredevil, Close Hatches.

These are the most expected performances: Flat Out, Tapiture, Beholder (2013), Iotapa, Shanghai Bobby

And these are the most unexpected: Arcangues, Ezzoud, Dawson's Legacy, Spelling Again, Take Charge Brandi, Pleasant Tango

This allowed me to generate graphs like this, to gauge the performances of horses who prepped at different tracks:
[Image: true differential by prep track (Belmont, Santa Anita, Keeneland) for the Distaff]

True differential values are on the left. Again, a 0 means a horse ran exactly according to their odds. If the middle line (the median) is higher, that indicates that those horses ran better than expected, whereas a lower line means they were disappointing. This graph therefore indicates that mares who prep for the Distaff at Santa Anita run significantly better, relative to their odds, than those who prep at Belmont (confirmed statistically but this post is verbose enough, happy to provide details if asked). Keeneland is included in these graphs because it is the third big prep location, but there weren't many interesting effects to report regarding it.
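The grouping behind these graphs boils down to a median true differential per prep track. A minimal Python sketch, assuming each BC starter is stored as a dict with hypothetical "prep_track", "turns", and "diff" (true differential) keys; the numbers are invented to show the mechanics, and this does not reproduce the statistical test mentioned above.

```python
from collections import defaultdict
from statistics import median


def median_diff_by(rows, key):
    """Group runners by key(row) and return the median true differential of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["diff"])
    return {group: median(values) for group, values in groups.items()}


# Invented example rows: one dict per BC starter.
rows = [
    {"prep_track": "Belmont", "turns": 2, "diff": -20.0},
    {"prep_track": "Belmont", "turns": 2, "diff": -5.0},
    {"prep_track": "Belmont", "turns": 1, "diff": 2.0},
    {"prep_track": "Santa Anita", "turns": 2, "diff": -1.0},
    {"prep_track": "Santa Anita", "turns": 2, "diff": 3.0},
]

print(median_diff_by(rows, key=lambda r: r["prep_track"]))
# {'Belmont': -5.0, 'Santa Anita': 1.0}

# Splitting by number of turns as well, for the one-turn vs. two-turn comparison.
print(median_diff_by(rows, key=lambda r: (r["prep_track"], r["turns"])))
```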

I generated these graphs for every BC dirt race, collected in an album here: https://imgur.com/a/yJWfp

In short, the Distaff shows the greatest effect. Belmont preppers as a whole run significantly worse than Santa Anita preppers in these races, and Santa Anita preppers on average run better than expected, but the effect is found only in two-turn races. In one-turn races, horses who prep at Keeneland actually run slightly better, and there is no difference whatsoever between Belmont and Santa Anita preppers!

So this was really interesting, but something kept bothering me- just because a horse preps at Belmont doesn't mean that they actually like or run well at that track. So what happens when we look at prep performance in addition to location?

Incredibly, horses who win their prep at Belmont run worse than those who lose their prep at Belmont. This is a robust, reliable effect. Just look at this incredible graph:
[Image: true differential by prep track and prep result (won vs. lost prep)]
The median odds of horses who win their prep at Belmont are 9/2 (actually lower than the median for Santa Anita prep winners, which is 6-1). Their median finish is 5th, and their median true differential is a dire -20.00. Meanwhile, Belmont prep losers (median odds 14-1) also have a median finish of 5th, with a median differential of -11.28. This effect is not seen at the other tracks: Santa Anita and Keeneland prep winners run better than, or equal to, their prep losers. Once again, this effect is only seen around two turns. Around one turn, Belmont prep winners run a full ten points better than Belmont prep losers.
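The post doesn't say which test was used to call this effect reliable, so the following is not the author's method. One common option for comparing two medians without assuming a distribution is a permutation test: shuffle the winner/loser labels many times and count how often a gap at least as large as the real one turns up by chance. A Python sketch, where the inputs would be true differentials for two groups (say, Belmont prep winners vs. Belmont prep losers); the function name and numbers below are invented.

```python
import random
from statistics import median


def median_gap_permutation_test(group_a, group_b, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference in medians.

    Returns (observed_gap, p_value), where observed_gap = median(a) - median(b).
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = median(group_a) - median(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign the "winner"/"loser" labels
        gap = median(pooled[:n_a]) - median(pooled[n_a:])
        if abs(gap) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed split itself and keep p above zero.
    p_value = (extreme + 1) / (n_iter + 1)
    return observed, p_value


# Invented true differentials for two groups of horses.
winners = [-25.0, -20.0, -18.0, -30.0, -22.0]
losers = [-10.0, -12.0, -8.0, -15.0, -11.0]
gap, p = median_gap_permutation_test(winners, losers)
print(gap)  # -11.0
print(p)
```

With only a handful of horses per group, the p-value from a test like this will swing a lot from sample to sample.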

I think there are many possible explanations for this:
-horses who dislike Belmont are more likely to appreciate Santa Anita, and vice versa.
-the crowd is not good at evaluating the chances of horses who prep at Belmont
-Belmont preps have greater depth than other prep races, so horses who run 2nd or worse are more likely to outrun their odds the next time out.
-or perhaps Santa Anita preps tend to have greater depth, so horses who are good enough to win are stronger contenders in the BC.
-the strength of Belmont preps may be continuously overrated by bettors, or a lack of speed from the east coast horses may put them at a disadvantage (credit to Diver52 for these hypotheses).

While I lean toward option #1, I would very much like to hear what people think of this because I am a lifelong east coaster. I am admittedly a very strong critic of the idea that the BC should be permanently located in California, and I think this provides evidence that horses from certain tracks are potentially disadvantaged by running on the west coast- but this is by no means a certainty. The next step is of course to examine the performances of BC horses on turf and at Belmont. Meanwhile, I'd like to stimulate discussion with the following questions:

1. Do you actually believe these results, or do you have questions or criticisms about the methods? If you believe them, why do you think we see these differences between Santa Anita and Belmont?
2. Why do you think two turn races are (possibly) more prep-dependent than one turn races?
3. Do you think Del Mar will follow this trend? Does it affect your handicapping approach at all?
4. Which horses are you most confident will perform well at Del Mar? Least confident?

Thanks for reading!
Last edited by Tessablue on Tue Nov 07, 2017 9:14 pm, edited 5 times in total.
Tessablue
 
Posts: 3021
Joined: Fri Sep 13, 2013 11:29 am
Location: Boston

Re: Prepping for the Breeders' Cup: A Statistical Analysis

Postby Diver52 » Sun Oct 29, 2017 9:13 pm

That is very impressive and thought-provoking in that it seems to confirm the anecdotal or "gut feeling" evidence that New York shippers don't do well at SA. I am not qualified to offer much of an opinion but I think that perhaps NY shippers have been historically overbet due to the famous "East Coast bias," by which I mean the perception that NY/East Coast racing is superior to that in California. That may be true top to bottom, but it is hard to continue to maintain that the top horses on the East Coast are better than those out West. I mean, Hoppertunity took down the JCGC!

I am also going to reference a thread on another forum entitled something like "The Chad Brown effect," the primary observation of which was that Brown's success in adapting typical turf race shapes to dirt (that is, a gallop followed by a wild charge to the finish) had led other East Coast trainers and jockeys to be speed-averse. This would, I think, tend to give a recent advantage to California horses who are usually either trained for speed or at least to stay in contact with a fast pace.

I will follow this with interest. Thank you.
I ran marathons. I saw the Taj Mahal by Moonlight. I drove Highway 1 in a convertible. I petted Zenyatta.
Diver52
 
Posts: 1407
Joined: Fri Sep 13, 2013 12:44 pm
Location: Redlands, CA

Postby Tessablue » Sun Oct 29, 2017 9:31 pm

Thank you and you are most certainly qualified! Those are some great points that I hadn't considered, and I wonder if the effect is different depending on whether a horse has won, raced, or performed well at both tracks. Hoppertunity is a great example, too- he earned a very solid -1.65 last year and shows how a horse can run well at Belmont without really being a "Belmont horse." It's interesting to note that there was no effect at last year's BC- there were large effects in 2013 and 2014, but Belmont preppers ran as expected and better than SA preppers last year. Perhaps bettors are catching on to it and changing their bets accordingly? I think the East Coast/West Coast debates have sort of evened out in the past few years, given how many quality horses have emerged from the West Coast, but this year will be another interesting test.

I've heard a bit about the Chad Brown effect as well, though to this point I've mostly thought about it while shaking my fist at Saratoga jockeys. That's a fascinating thought, and certainly worth following up on! A lot of the disappointing Belmont horses are sort of grindy types- like Tonalist and Palace Malice- so it certainly makes sense.

The post has been updated to include some more hypotheses as well as your own- thanks!

Postby Treve » Sun Oct 29, 2017 10:34 pm

It's interesting you mention last year having no effect because I seem to remember a lot of people commenting (myself included) that during last year's edition of the BC, SA was playing very differently than usual and seemed a lot less speed-biased than it is typically thought to be. Some people claimed the track seemed deeper than usual. Then again, there were also a few big upsets both on turf and dirt. Last year might've been a statistical anomaly but it was interesting for sure.
I don't know if there is a way to quantify that as well, in terms of performances not just in relation to the odds from preps but also to how the track was playing. I seem to recall one of the threads last year was discussing this and a pattern was emerging for BCs at Santa Anita but I cannot for the life of me remember now.
A filly named Ruffian...

Eine Stute namens Danedream... (A mare named Danedream...)

Une pouliche se nommant Trêve... (A filly named Trêve...)

Kincsem nevű kanca... (A mare named Kincsem...)


And a Queen named Beholder
Treve
 
Posts: 2931
Joined: Fri May 08, 2015 5:12 pm

Postby Tessablue » Sun Oct 29, 2017 11:19 pm

Another great point- looking at these trends across lots and lots of horses improves the statistical power of the analysis a great deal, but year-to-year details are completely lost in the process. I do vaguely remember that discussion (and a very spirited debate a few years back about the 2012 BC), and I'd like to find a way to quantify speed biases as well. I do have "lengths behind at the half" in my dataset, but unfortunately the true differential system is not suitable for correlations because different horses have different potential limits (an even-money favorite can only really earn a +1 at best, because they are expected to win and you can't do better than first). But perhaps a chunking method- grouping horses by position- would be a possible avenue. I'll look into it!
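A rough Python sketch of that chunking idea, assuming the dataset stores lengths behind at the half as a number per runner. The cut points, labels, and the "behind_at_half" / "diff" keys are all hypothetical choices, not anything from the actual dataset.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median

# Hypothetical cut points, in lengths behind the leader at the half.
CUTS = [1.0, 4.0, 8.0]
LABELS = ["on or near the lead (<1)", "stalking (1-4)", "midpack (4-8)", "closing (8+)"]


def position_chunk(lengths_behind: float) -> str:
    """Map lengths behind at the half to a running-position bucket."""
    # bisect_right counts how many cut points are <= lengths_behind.
    return LABELS[bisect_right(CUTS, lengths_behind)]


def median_diff_by_chunk(rows):
    """Median true differential within each running-position bucket."""
    groups = defaultdict(list)
    for row in rows:
        groups[position_chunk(row["behind_at_half"])].append(row["diff"])
    # Report buckets in running-position order, skipping empty ones.
    return {label: median(groups[label]) for label in LABELS if label in groups}


# Invented example rows.
rows = [
    {"behind_at_half": 0.0, "diff": -4.0},
    {"behind_at_half": 0.5, "diff": 2.0},
    {"behind_at_half": 10.0, "diff": -12.0},
]
print(median_diff_by_chunk(rows))
# {'on or near the lead (<1)': -1.0, 'closing (8+)': -12.0}
```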

And in the interest of clarity, here are the trends from each individual year (note, however, that it gets iffy with these smaller sample sizes, so caution is advised):

1993: Extreme underperformance by Belmont horses (winners and losers), SA preppers as expected
1997: Extreme underperformance by Belmont winners, roughly equivalent performance by Belmont losers and SA winners, underperformance by SA losers.
2003: Extreme underperformance by Belmont winners, moderate underperformance by Belmont losers, moderate overperformance by SA winners, SA losers as expected.
2012: Belmont winners and SA losers performed almost exactly as expected, strong underperformance by Belmont losers and SA winners
2013: Extreme underperformance by Belmont winners, moderate underperformance by Belmont losers, SA winners as expected, moderate underperformance by SA losers.
2014: Moderate underperformance by Belmont winners and losers, extreme underperformance by SA winners, SA losers as expected.
2016: Belmont horses (winners and losers) performed as expected, strong underperformance by all SA preppers.

Postby Kennedy » Mon Oct 30, 2017 10:35 am

Fantastic work Tessa. I love reading thoughtful analysis and wondering how it might play into results this year.

I do have a clarifying question about the data. Was this based on odds (as in a 10/1 has xx% chance of winning) or was it based on their odds relative to the other contestants (i.e., they were the 4th choice and ran 5th)?

I think the pre-identification of regional bias is a helpful tool to have in the belt and I think your numbers state it well. One of the things that I will generally lean towards is horses who have a preference for the host track or a similar track. The trick here is that since Del Mar is such a seasonal track, only a small subset of each field actually has any form over the track at all; for most of the field it is unknown. But I am also careful not to view this as some kind of exclusion factor. There are good reasons to go against regional bias.

Also Del Mar is not Santa Anita and I suspect more than one horse will turn in an "Arrogate in the San Diego" so it actually makes me more wary of Santa Anita horses than I would normally be.

One question I have is which track is most similar to Del Mar? It seems like there may not be a good correlation out there?
Kennedy
 
Posts: 953
Joined: Thu Sep 12, 2013 9:58 pm

Postby Tessablue » Mon Oct 30, 2017 11:32 am

Thank you, I was looking forward to hearing your thoughts on this! The odds are just win odds to $1- I thought about doing ranked odds, but I chose to do it this way because of two main considerations: I wanted to capture the big range in expectations between a low-priced favorite and a lukewarm one (so you could reasonably say that a 6/5 shot is "expected" to win but a 5-1 favorite isn't really), and the nature of parimutuel betting means that the incorporation of win odds + field size sort of does half of the math for me already. But I do think there needs to be a ranking aspect included, because otherwise it sort of presupposes that races all follow a similar distribution of odds. In its current form, I really wouldn't trust this formula unless it's across massive datasets like the ones here. As it is now, it actually produces a pretty lovely frequency distribution with about a third of the datapoints between -9.00 and +6.00. Certainly it's a work in progress- in addition to adding odds rankings, I'd also like to incorporate beaten lengths. I haven't quite figured out how to do that yet.
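One way a ranking aspect could be added, sketched in Python under assumptions that are not part of the current formula: rank each runner by win odds within its race (1 = favorite) and compare that rank with where it finished. The function names and the tie rule (tied odds share the better rank) are hypothetical choices.

```python
def odds_ranks(odds):
    """Rank runners in one race by win odds, 1 = favorite.

    Tied odds share the better rank (1, 2, 2, 4 style).
    """
    ordered = sorted(odds)
    # index() finds the first position of a value, so tied odds get the same rank.
    return [ordered.index(o) + 1 for o in odds]


def rank_differential(odds, finishes):
    """Odds rank minus finish position: positive = finished ahead of its betting rank."""
    return [rank - finish for rank, finish in zip(odds_ranks(odds), finishes)]


race_odds = [5.0, 1.2, 5.0, 20.0]   # invented four-horse race
finishes = [1, 3, 2, 4]
print(odds_ranks(race_odds))                    # [2, 1, 2, 4]
print(rank_differential(race_odds, finishes))   # [1, -2, 0, 0]
```

On its own this ignores how short or long the odds are, so it would sit alongside the true differential rather than replace it.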

And those are great questions- I'm hoping people who are more familiar with Del Mar can help us out here, because I don't know much about it. I don't really trust the Del Mar data I have for these purposes (n = 18, which isn't great compared to the big three), but Del Mar preppers have a median performance of -9.58 at SA/Hollywood (-13.10 for winners, -1.40 for losers). However, any horse who prepped at Del Mar is also coming into the BC off a layoff, which complicates things.

Upon seeing the Belmont results, I had an inclination to blame the shipping and the different climates- but we know that European horses have great success in California (or do they? I haven't looked at turf races yet), so that doesn't feel quite right either. I'm also fascinated by how substantial the effect is for the Distaff, but that could be a side effect of the numerous small Distaff fields we've had over the years. And if this exercise has taught me anything, it's that prep location and track preference don't matter so much in the Sprint. I was pretty glad to find that, because I think Takaful is one of the more intriguing longshots right now!

Postby Somnambulist » Mon Oct 30, 2017 12:43 pm

Belmont may be one turn for its preps, but that is one big turn...

We've had a very unseasonably warm fall this year. It hit 90 or close to it more than once (along with being humid), and only over the past week have I felt I needed a light jacket in the morning. If it is climate-based, I wonder if the Belmont shippers do better this year. Diversify's absence is going to leave a hole, but honestly, I felt he should have skipped anyway. Smart move.

Belmont has shuffled its BC prep schedule around more than I can recall any other track doing, but that might just be because I care more about when they run, in terms of trying to arrange to go to Belmont.
"Life's no piece of cake, mind you, but the recipe's my own to fool with."
Somnambulist
 
Posts: 6969
Joined: Thu Sep 12, 2013 5:59 pm

Postby Kennedy » Mon Oct 30, 2017 3:25 pm

I wonder if it might be easy to translate these findings into Impact Values? So you take a 10/1 shot and say that they are expected to win xx% of the time based strictly on the fact that they are 10/1. Then you could potentially isolate 10/1 entrants who last prepped at Belmont vs the overall 10/1 average. Or the same with Santa Anita. It may be interesting to draw out if a 10/1 from California is a "better" or more likely winner than a 10/1 from another locale (when the BC is hosted in Cal). Or is it even the other way around? Do the locals take more money, so that a 10/1 from another regional base is actually a better play?
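A minimal Python sketch of that comparison as described: convert each runner's odds into the win probability they imply, add those up for a group (say, roughly 10/1 shots who last prepped at Belmont), and divide the group's actual wins by that expected total. This follows the description in the post rather than any particular published impact-value formula, ignores takeout and breakage (so implied probabilities run a bit high), and uses invented data; the field names and the 8-12 odds band standing in for "a 10/1 shot" are hypothetical.

```python
import math


def implied_win_prob(odds_to_1: float) -> float:
    """Win probability implied by X-to-1 odds (ignores takeout and breakage)."""
    return 1.0 / (odds_to_1 + 1.0)


def actual_vs_expected(runners, predicate):
    """Actual wins divided by odds-implied expected wins for runners matching predicate.

    Above 1.0: the group won more often than its odds suggested.
    Below 1.0: it won less often.
    """
    group = [r for r in runners if predicate(r)]
    expected = sum(implied_win_prob(r["odds"]) for r in group)
    actual = sum(1 for r in group if r["finish"] == 1)
    return actual / expected if expected > 0 else math.nan


def is_ten_to_one_from(track):
    # Hypothetical band: anything from 8-1 to 12-1 counts as "a 10/1 shot".
    return lambda r: r["prep"] == track and 8.0 <= r["odds"] <= 12.0


# Invented data: eleven 10/1 shots from each prep track.
runners = (
    [{"odds": 10.0, "prep": "Belmont", "finish": 1}] * 2
    + [{"odds": 10.0, "prep": "Belmont", "finish": 6}] * 9
    + [{"odds": 10.0, "prep": "Santa Anita", "finish": 1}] * 1
    + [{"odds": 10.0, "prep": "Santa Anita", "finish": 4}] * 10
)

print(round(actual_vs_expected(runners, is_ten_to_one_from("Belmont")), 2))      # 2.0
print(round(actual_vs_expected(runners, is_ten_to_one_from("Santa Anita")), 2))  # 1.0
```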

This may be particularly interesting to note with Europeans. How many times have we seen well backed Europeans lose to other Europeans in the BC who are actually quite good but for whatever reason aren't as well received at the windows. A lot of the European winners are not the odds on choices.

Postby Somnambulist » Mon Oct 30, 2017 3:44 pm

Kennedy wrote:This may be particularly interesting to note with Europeans. How many times have we seen well backed Europeans lose to other Europeans in the BC who are actually quite good but for whatever reason aren't as well received at the windows. A lot of the European winners are not the odds on choices.


The betting public is interesting (Cuvee being listed as an example of an underperformer comes to mind), and there is seriously way more attention on the lead-up to these races than there is for most others, save the TC. People watch works and formulate opinions (which I do NOT agree with) based on weeks of trainer comments, works, dozens of articles, and worse now... social media. It garners way more casual attention from the nation as a whole.

Who the betting public decides to stand behind is an exercise in collective decision making I'd like to see expounded on more. For the 20 or so people globally who care.

TB, honestly I'd like to see your formula applied to claiming and allowance races, because I view the TC races and the BC as an anomaly in terms of how people come to their decisions.