Ranking the Derbys: A Quantitative Analysis

Re: Ranking the Derbys: A Quantitative Analysis

Postby Treve » Mon Apr 10, 2017 11:48 am

Tessablue wrote:
peeptoad wrote:How'd you get the information on the environmental parameters? I've never actually looked for that myself.
I guess my initial thought, assuming all this is accurate (no offense intended of course) is that some of the more touted or remembered Derby winners didn't exactly run the strongest of Derbies
I guess that's just further proof that the Derby is not the end-all be-all, and those horses were touted for different reasons.

Historical weather charts on Weather Underground. My life is deeply boring.

I did end up researching what other figure-makers do with wind speed, and the answer seems to be "just pick something and pray," because nobody really knows how to handle it. Wind speed is a fairly minor consideration here and I didn't take direction into account, in part because the horses make one full circuit of the track and in part because I can't do physics. But there were some interesting finds in this historical information: the strongest winds going back to 1970 were actually in 2008. If I had to pick a conclusion from here, it's that Big Brown was terribly underappreciated.
Forry Cow How wrote:Thanks for posting this. I find these scales fascinating and not something I could do myself. I'm NOT surprised Secretariat was a super horse. :lol: I was surprised that American Pharoah was so far down the list. And that Sunday Silence was last. Will be interesting to see how this year's Derby winner fits on the list.

Yeah, American Pharoah's race wasn't fast by really any figure standards that I can find. One aspect which may have affected this was ground (he ran about 29 feet further than Firing Line and 69 more than Dortmund), but that information only goes back to 2011, so unfortunately it doesn't really fit.

One funny thing about Sunday Silence: along the way, I actually found a couple articles from right after the Derby complaining about his performance! That race was very hyped and very disappointing to a lot of people. I suspect the Preakness made up for it, however ;)

(There is an adjustment for mud, but not all muddy tracks are the same, so it's a tough factor to work with.)


Ah I'm glad you answered this because on a related note I was wondering myself if you took post position into account when rating these KYDs. Maybe that could be a parameter to add if it isn't already, i.e. a horse winning from a post with a high win rate would be rated slightly down, while a horse would be rated up for breaking from a stall with a low win rate, or for being the sole winner from that gate #.
A filly named Ruffian...

Eine Stute namens Danedream...

Une pouliche se nommant Trêve...

Kincsem nevű kanca...


And a Queen named Beholder
Treve
 
Posts: 3954
Joined: Fri May 08, 2015 5:12 pm

Re: Ranking the Derbys: A Quantitative Analysis

Postby Kennedy » Mon Apr 10, 2017 11:56 am

Awesome work Tessa. This is right up my alley and I really wish my life had a little more boredom in it because this is just the kind of project I love to jump into!

The first question I have is what method do you use to produce a projection? I've been dabbling with it this year since chef-de-race went offline and took their projections with them. But because I'm not that smart I really just ended up with a straight projection based off the fractions and beaten lengths at any call. It feels inexact though and I'd be interested to hear how you actually accomplish the "math".

One thought I had in terms of the Derby decline is to actually do a correlation study between the number of starts each winner had prior to the Derby and the figure they earned.

I've been thinking about a theory in concept but what has a greater impact on a horse's ability to run really fast? Age, current fitness or experience?
Obviously all three matter and it's tough to make a blanket statement for all horses. But the general decline in Derby performances may have a correlation to the fact that the runners in the Derby are more often making their 5th lifetime start than their 10th.

How many horses run their best lifetime effort in their 5th start?

I personally don't think horses are getting worse but I do wonder if the Derby is now scheduled at a time of year that is "sooner" in terms of the entrants' development than in years past.
Kennedy
 
Posts: 1025
Joined: Thu Sep 12, 2013 9:58 pm

Re: Ranking the Derbys: A Quantitative Analysis

Postby Tessablue » Mon Apr 10, 2017 1:41 pm

Treve wrote:Ah I'm glad you answered this because on a related note I was wondering myself if you took post position into account when rating these KYDs. Maybe that could be a parameter to add if it isn't already, i.e. a horse winning from a post with a high win rate would be rated slightly down, while a horse would be rated up for breaking from a stall with a low win rate, or for being the sole winner from that gate #.

That's an interesting consideration, but it would be very difficult to add because it would involve an element of speculation. There are two main challenges here: 1) the difficulty of certain post positions has changed over time (for example, the rail is now an almost guaranteed loss with modern field sizes, but previously it produced the most winners) and 2) there is considerably less data for outside vs. inside posts. There's also the fact that certain outside posts, such as the 14 and 15, are considered more advantageous than others. It's certainly something worth thinking about, thanks!

One way to get around that might be to include a field size consideration, but I'm not sure how to go about quantifying the effects. There were some very small fields in the 70's that may have influenced these figures by virtue of stretching the field out, but I'm not sure it's a big enough problem to merit a change at this time.
Tessablue
 
Posts: 3379
Joined: Fri Sep 13, 2013 11:29 am
Location: Boston

Re: Ranking the Derbys: A Quantitative Analysis

Postby Tessablue » Mon Apr 10, 2017 2:16 pm

Kennedy wrote:Awesome work Tessa. This is right up my alley and I really wish my life had a little more boredom in it because this is just the kind of project I love to jump into!

The first question I have is what method do you use to produce a projection? I've been dabbling with it this year since chef-de-race went offline and took their projections with them. But because I'm not that smart I really just ended up with a straight projection based off the fractions and beaten lengths at any call. It feels inexact though and I'd be interested to hear how you actually accomplish the "math".

One thought I had in terms of the Derby decline is to actually do a correlation study between the number of starts each winner had prior to the Derby and the figure they earned.

I've been thinking about a theory in concept but what has a greater impact on a horse's ability to run really fast? Age, current fitness or experience?
Obviously all three matter and it's tough to make a blanket statement for all horses. But the general decline in Derby performances may have a correlation to the fact that the runners in the Derby are more often making their 5th lifetime start than their 10th.

How many horses run their best lifetime effort in their 5th start?

I personally don't think horses are getting worse but I do wonder if the Derby is now scheduled at a time of year that is "sooner" in terms of the entrants' development than in years past.

Ha, thanks, I have to say I thought of you while writing this!

Regarding the methodology, some of it honestly is pretty hazy because it was several years ago and my notes are poorly organized. I know that I started by looking for the best predictor of final time. I looked at pretty much every fraction and combination of fractions, but ended up finding that the 6f split had the best correlation with final time (and this relationship is surprisingly linear; all polynomial curves were basically the same). Assuming that exceptional and poor performances would distort this correlation, I took out the outliers and wet tracks to get a better line of fit and generated a starter linear equation. So as a disclaimer, I realize this is a recursive analysis, which is terrible! But there really is no dataset like the Derby, so I'd be hesitant to apply fractional correlations gathered from other races. I'm eager to keep collecting these over the years to refine the equation, and thus far it has proved a fairly reliable BSF predictor. For reference, I heard from a number of people in 2014 that their own self-generated figures equated to about a 103 Beyer.
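
For anyone who wants to try the fitting step described above, here is a minimal Python sketch. It is not the actual equation or data behind these figures: the race numbers, the 1.5-second outlier cutoff, and the names `fit_line`, `races`, and `predicted_final` are all invented for illustration. It fits a least-squares line of final time against the 6f split on dry tracks only, drops races that sit far from that first line, and refits.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (6f split in seconds, final time in seconds, wet track?) -- invented numbers
races = [
    (70.0, 121.0, False),
    (71.0, 122.0, False),
    (72.0, 123.0, False),
    (73.0, 124.0, False),
    (72.0, 120.0, False),  # freakishly fast finish, ends up dropped as an outlier
    (71.0, 125.5, True),   # wet track, excluded before fitting
]

OUTLIER_CUTOFF = 1.5  # seconds away from the first-pass line; hypothetical threshold

# First pass: dry tracks only.
dry = [(split, final) for split, final, wet in races if not wet]
slope, intercept = fit_line(*zip(*dry))

# Second pass: drop races far from the first line, then refit.
kept = [(s, f) for s, f in dry if abs(f - (slope * s + intercept)) <= OUTLIER_CUTOFF]
slope, intercept = fit_line(*zip(*kept))


def predicted_final(split):
    """Predicted final time (seconds) from a 6f split (seconds)."""
    return slope * split + intercept
```

Fitting once, removing the extremes, and fitting again is what makes the approach circular, since the line decides which races count as outliers. With real data the cutoff would need to be chosen with some care.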

The wind stuff is fairly uninteresting: I just looked at whether races with high winds fell above or below that line of fit, then looked for a correlation between wind speed and reduced final time. It isn't huge, but it's there and again it's pretty linear, so I added that on as a modifier to final time. Finally, I researched what other people do with track surface and came up with a series of modifications for wet tracks, topping off at +1.3 seconds for sloppy (which lines up pretty well with Beyers again, although slop is tough to work with because it's always a different surface). This allowed me to generate the predicted vs. actual times and work from there.
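
As a rough sketch of how those modifiers could be layered on top of a split-based prediction: only the +1.3 seconds for sloppy comes from this thread. The wind coefficient, the allowances for "good" and "muddy", the sign convention (positive means faster than expected), and the function names are assumptions made for illustration.

```python
# Seconds added to the expected time for each surface.
# Only the sloppy value is from the thread; the others are placeholders.
SURFACE_ALLOWANCE = {"fast": 0.0, "good": 0.4, "muddy": 0.9, "sloppy": 1.3}

# Hypothetical linear wind effect, in seconds per mph of wind speed.
WIND_SECONDS_PER_MPH = 0.02


def expected_time(base_prediction, wind_mph, surface):
    """Split-based prediction plus wind and surface allowances (seconds)."""
    return base_prediction + WIND_SECONDS_PER_MPH * wind_mph + SURFACE_ALLOWANCE[surface]


def time_delta(base_prediction, actual_time, wind_mph, surface):
    """Expected minus actual time; positive means faster than expected."""
    return expected_time(base_prediction, wind_mph, surface) - actual_time
```

For example, a race predicted at 122.0 seconds, run in the slop with a 10 mph wind, would be expected in about 123.5 seconds under these made-up coefficients, so a 123.0 final time would come out at roughly +0.5.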

So this is all basically a way to generate a Beyer without knowing track variant, and it worked pretty nicely for that, but I wanted another method for assessing quality, which is where the final margins came in. Under the assumption that fast races result in big margins, if not big win margins, I wanted to look at beaten lengths. After working with a lot of possibilities, I decided to average the beaten margin of 3rd and 6th place. I didn't want to punish horses for beating quality opponents by small margins, so I didn't do win margin. However, adding the 3rd place margin gives those big winners a slight bonus, while going to sixth gives a bonus to horses for stringing the field out late (incidentally, I found while going through the charts that 1st to 3rd is variable, but there's usually a bunching around 4th to 6th). My thought process here was that while field quality varies from race to race, it's likely about even by the time you get to that sixth tired horse. Unsurprisingly, most of the fastest races also had the biggest margin bonuses, but it helped smooth out some of the irregularities, such as Animal Kingdom's number (which was originally very high). This aspect has not been perfect because I think it punishes some horses too much, so I'd love to keep fiddling with it to improve it.
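
The margin piece is simple enough to show in a few lines of Python. This is a sketch only: the chart below is invented, and `margin_score` is a hypothetical name, not anything from the original worksheet.

```python
def margin_score(beaten_lengths):
    """Average the winner's margin over the 3rd and 6th finishers.

    beaten_lengths lists each finisher's distance behind the winner,
    in finishing order, so index 0 is the winner at 0.0.
    """
    if len(beaten_lengths) < 6:
        raise ValueError("need at least six finishers")
    third = beaten_lengths[2]
    sixth = beaten_lengths[5]
    return (third + sixth) / 2


# Invented chart: lengths behind the winner for the first seven finishers.
chart = [0.0, 0.5, 2.0, 3.25, 4.0, 5.0, 9.0]
score = margin_score(chart)  # (2.0 + 5.0) / 2
```

Note that the runner-up's margin never enters the calculation, which is what keeps a narrow win over a good horse from being penalized.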

Long story short for the rest of it, I weighted margin by 1.5x, normalized the predicted vs. actual figure, combined the two, then ranked the horses against known Beyers, looked for identical rankings and used that to generate a linear relationship between the two (and it was incredibly linear, like an r-squared of .99, which was interesting). So technically, these predicted Beyers are relative to the assumption that Monarchos got a 116 and Giacomo got a 100, plus a few others. Even if not entirely accurate, this was a fun way to look at old races, and I think I might extend it back to 1960 (when the first "modern" race times started to pop up in the Derby). Another nice feature is the fact that this method can be applied to all Derby horses, not just winners.
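
A rough Python sketch of that last stage, with several loud assumptions: "normalized" is read here as a z-score and applied to both inputs, the anchor scores of +0.5 and -0.5 are invented, and only the two Beyers named above (Monarchos 116, Giacomo 100) are used, so the line is drawn through two points rather than fitted through the full set of anchors. All function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def combined_scores(time_deltas, margin_scores, margin_weight=1.5):
    """Standardized time figure plus the margin figure weighted 1.5x."""
    pairs = zip(zscores(time_deltas), zscores(margin_scores))
    return [t + margin_weight * m for t, m in pairs]


def line_through(p1, p2):
    """Slope and intercept of the line through two (score, Beyer) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


# Anchor Beyers are from the thread; the combined scores paired with them are invented.
slope, intercept = line_through((0.5, 116.0), (-0.5, 100.0))


def to_beyer(score):
    """Map a combined score onto the Beyer scale using the anchor line."""
    return slope * score + intercept


scores = combined_scores([1.0, 0.0, -1.0], [6.0, 4.0, 2.0])
```

With more than two anchors, the two-point line would be replaced by a least-squares fit, and the r-squared of that fit is the number quoted above.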

As for career starts, that would be really interesting to look at! Perhaps I could see whether there is a relationship between how many starts they have made and what figure they receive. It's very hard to control for developmental stage, but it would make sense that horses start to run slower if they don't have the same foundation or aren't as far along developmentally.
Tessablue
 
Posts: 3379
Joined: Fri Sep 13, 2013 11:29 am
Location: Boston

Re: Ranking the Derbys: A Quantitative Analysis

Postby Treve » Mon Apr 10, 2017 2:34 pm

Tessablue wrote:
Treve wrote:Ah I'm glad you answered this because on a related note I was wondering myself if you took post position into account when rating these KYDs. Maybe that could be a parameter to add if it isn't already, i.e. a horse winning from a post with a high win rate would be rated slightly down, while a horse would be rated up for breaking from a stall with a low win rate, or for being the sole winner from that gate #.

That's an interesting consideration, but it would be very difficult to add because it would involve an element of speculation. There are two main challenges here: 1) the difficulty of certain post positions has changed over time (for example, the rail is now an almost guaranteed loss with modern field sizes, but previously it produced the most winners) and 2) there is considerably less data for outside vs. inside posts. There's also the fact that certain outside posts, such as the 14 and 15, are considered more advantageous than others. It's certainly something worth thinking about, thanks!

One way to get around that might be to include a field size consideration, but I'm not sure how to go about quantifying the effects. There were some very small fields in the 70's that may have influenced these figures by virtue of stretching the field out, but I'm not sure it's a big enough problem to merit a change at this time.


That's true, I guess one could calibrate/create a median per decade/field size but that definitely requires a lot of extra free time. Looking at a list dating from 2014, the winningest post at that point was #10 with an 11.5% win rate. When Chrome won, he broke from #5, which has the second highest win rate. This of course only goes back to 1930 when starting gates were first used for the KYDerby. I don't think it's overall a huge determining factor when assessing the overall quality of a Derby winner but it certainly can say something when a horse overcomes a supposedly bad position to go on and win anyway, I think.
I do like the other things brought up - career times, steroid use etc.
Treve
 
Posts: 3954
Joined: Fri May 08, 2015 5:12 pm

Re: Ranking the Derbys: A Quantitative Analysis

Postby Grade1 » Mon Apr 10, 2017 3:15 pm

Tessablue,

In May 1979, Andy Beyer wrote an article, “The Bid Belongs in Best -- Bid Ranks Right Behind Secretariat”, giving the adjusted times for each Derby winner from 1972-79, based on track variants he derived.

Secretariat 2:00
Affirmed 2:01
Spectacular Bid 2:01 1/5
Riva Ridge 2:01 4/5
Seattle Slew 2:02
Bold Forbes 2:02 1/5
Foolish Pleasure 2:02 2/5
Cannonade 2:04 1/5

You could compare the adjusted times, or the equivalent BSF differences, to your figures for a second opinion on those eight horses. You could also determine Beyer's track variants. For instance, I take Secretariat's adjusted time to mean that the track was 3/5 of a second faster than the average Derby track for the 1972-79 period.

I don't know whether you already have the original BSFs for Secretariat and Seattle Slew: 129 and 112. These figures can't be directly compared to today's figures, but the difference is close to what the adjusted times would imply.
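
To make the comparison concrete, here is a small Python sketch that converts the adjusted times listed above into seconds and measures each horse's gap to the fastest one. The helper name `to_seconds` is made up, and the variant check at the end assumes Secretariat's official winning time of 1:59 2/5.

```python
from fractions import Fraction

# Adjusted times exactly as listed above.
ADJUSTED = {
    "Secretariat": "2:00",
    "Affirmed": "2:01",
    "Spectacular Bid": "2:01 1/5",
    "Riva Ridge": "2:01 4/5",
    "Seattle Slew": "2:02",
    "Bold Forbes": "2:02 1/5",
    "Foolish Pleasure": "2:02 2/5",
    "Cannonade": "2:04 1/5",
}


def to_seconds(text):
    """Convert 'M:SS' or 'M:SS n/5' into an exact number of seconds."""
    clock, _, fifths = text.partition(" ")
    minutes, seconds = clock.split(":")
    total = Fraction(int(minutes) * 60 + int(seconds))
    if fifths:
        total += Fraction(fifths)
    return total


seconds = {horse: to_seconds(t) for horse, t in ADJUSTED.items()}
fastest = min(seconds.values())

# Gap to the fastest adjusted time, in seconds.
gaps = {horse: float(s - fastest) for horse, s in seconds.items()}

# Implied variant for 1973: adjusted time minus the assumed actual time.
variant_1973 = to_seconds("2:00") - to_seconds("1:59 2/5")
```

Under that assumption `variant_1973` comes out to 3/5 of a second, and Seattle Slew sits exactly 2 seconds behind Secretariat on the adjusted scale.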
The only place you find free cheese is in a mousetrap.
Grade1
 
Posts: 33
Joined: Sun Sep 15, 2013 1:51 pm
Location: USA

Re: Ranking the Derbys: A Quantitative Analysis

Postby luvsgeldings » Mon Apr 10, 2017 3:49 pm

thanks Tessa for this info! - makes me wonder what Beyer Sham would have gotten, given his final time running 2nd to Big Red that year - and I found the possible differences in Beyers interesting for some of the derby winners.
luvsgeldings
 
Posts: 801
Joined: Sat Mar 22, 2014 6:18 pm

Re: Ranking the Derbys: A Quantitative Analysis

Postby Tessablue » Mon Apr 10, 2017 5:04 pm

Grade1 wrote:Tessablue,

In May 1979, Andy Beyer wrote an article, “The Bid Belongs in Best -- Bid Ranks Right Behind Secretariat”, giving the adjusted times for each Derby winner from 1972-79, based on track variants he derived.

Secretariat 2:00
Affirmed 2:01
Spectacular Bid 2:01 1/5
Riva Ridge 2:01 4/5
Seattle Slew 2:02
Bold Forbes 2:02 1/5
Foolish Pleasure 2:02 2/5
Cannonade 2:04 1/5

You could compare the adjusted times, or the equivalent BSF differences, to your figures for a second opinion on those eight horses. You could also determine Beyer's track variants. For instance, I take Secretariat's adjusted time to mean that the track was 3/5 of a second faster than the average Derby track for the 1972-79 period.

I don't know whether you already have the original BSFs for Secretariat and Seattle Slew: 129 and 112. These figures can't be directly compared to today's figures, but the difference is close to what the adjusted times would imply.

Ooh this is a treasure trove, thank you! I was not aware that he had assigned a 129 to Secretariat, so I'm happy to see that they match up nicely. I'm a bit surprised to see Slew up that high because I read that his victory was not well-received contemporarily, but that's an especially interesting figure given what I know about Beyer's criticism of Slew back in that time! I know his figure suffered through my method because it wasn't super fast and his beaten lengths margins were among the lowest in the group (Charismatic rated last on this scale, as he was remarkably only 2.75 lengths ahead of sixth in that year). I don't currently have enough time to play around with those adjusted times, but I'll certainly do so for comparison's sake later. Thanks again for sharing them!

I do wonder, and have wondered in the past, just how reliable track variant is for Derbys. These days, by the time the race has come around there has been an almost two-hour gap since the last dirt race, in addition to changes in track maintenance, weather, and the current inevitability of few if any other two-turn dirt races on the card. Part of my motivation here was to look for alternative methods of evaluation because track variant is so difficult on Derby day, as said by the figure-makers themselves. I wonder if these calculations have become easier or more difficult over time?

luvsgeldings wrote:thanks Tessa for this info! - makes me wonder what Beyer Sham would have gotten, given his final time running 2nd to Big Red that year - and I found the possible differences in Beyers interesting for some of the derby winners.
Through this method, he gets a +0.79 which is about a 123 :)

Unrelated, but some efforts that I came to appreciate much more upon this exercise: Unbridled, Spend a Buck, and Genuine Risk.
Tessablue
 
Posts: 3379
Joined: Fri Sep 13, 2013 11:29 am
Location: Boston

Re: Ranking the Derbys: A Quantitative Analysis

Postby Apollo » Mon Apr 10, 2017 6:23 pm

The only aspect that interested me was the certainty that the fastest track conditions would be at the top and the slowest would be nearer the bottom but not absolute bottom.

Speed ratings never fully account for the extremes, on either end but especially at the top. The fastest horses will take advantage of freeway conditions to post numbers that overstate their ability since mediocre horses in earlier races basically can't take full advantage of any type of condition. Therefore the overall track variant is not slanted as far as the actual conditions were. I've bet enough races in other sports like swimming and speed skating to be fully aware of how that translates anywhere. When a notoriously fast speed skating oval like Calgary or Salt Lake City is on the docket the times will be projected as fast, but the final results will be even faster than that. Likewise when a slow heavy outdoor oval like Lake Placid is the venue the final times are assumed as slow and they'll be even slower.

From my memory the fastest track conditions were throughout 1973 and especially at Belmont Park. That was confirmed years later when I was in college and read Andy Beyer's book Picking Winners. I immediately went to the library and started compiling track variants using any reference that was available. At that point it was nearly a decade later but 7 of the 11 standing main track records at Belmont survived from a very short window from 1973. I did track variants for maybe a dozen tracks across the country. None of the other tracks had anything approaching that, so many records from a short time frame that had failed to be eclipsed for almost 10 years. It couldn't be coincidence or normalcy. Then during my Las Vegas decades I bet the over/under on final times almost every year, and I certainly was aware of those over/under times even if I didn't wager. In this thread I looked quickly for the Monarchos and Spend a Buck years because those were the two years in which the bettors pounded the under on the final time of the Derby prop. There were only two joints booking that prop in 1985 but by the Monarchos year the prop was all over the place and got slammed down. More often than not that Derby time prop is bet toward the over, and likewise with the Preakness and Belmont props. I think I've mentioned that I lost a ton when Bally's put up a seemingly low 2:27 on the 1988 Belmont over/under. It got bet up a full second to 2:28 but I didn't hedge at all. Then Risen Star ran away and ruined all of us with over tickets.

Sunday Silence wandered all over the place down the stretch of the 1989 Derby. Charlie Whittingham in post race interviews said he ran green but didn't seem particularly bothered by it. However, that serpentine stretch run became a big overblown topic leading to the Preakness.
Apollo
 
Posts: 289
Joined: Thu Sep 26, 2013 1:05 pm

Re: Ranking the Derbys: A Quantitative Analysis

Postby luvsgeldings » Mon Apr 10, 2017 11:33 pm

oh Tessa.... thanks for that fig for Sham - love it! that's so outstanding - he certainly deserved it - thanks again so much for the info!!
luvsgeldings
 
Posts: 801
Joined: Sat Mar 22, 2014 6:18 pm
