Predicting Horse Race Winners Using Advanced Statistical Methods

Conditional Logistic Regression with Frailty applied to predicting horse race winners in Hong Kong.

Since Bill Benter first proposed it in 1994, Conditional Logistic Regression (CLR) has been an extremely popular tool for estimating the probability that a horse wins a race.

I propose a new prediction process composed of two innovations to the common CLR model and a unique goal for parameter tuning. First, I modify the likelihood function to include a "frailty" parameter borrowed from the epidemiological use of the Cox Proportional Hazards model. Second, I add a LASSO penalty to the likelihood and tune it with profit as the target to be maximized, as opposed to the much more common goal of maximizing likelihood.
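To make the starting point concrete, here is a minimal sketch of the plain CLR model with a LASSO penalty, written in standard-library Python. It assumes each race is given as a list of per-horse feature rows plus the index of the winner; the function names, the data layout, and the penalty weight `lam` are illustrative assumptions, not the author's code. The frailty term is deliberately left out, because its exact form is not given here.

```python
import math

def win_probabilities(features, beta):
    """Conditional logit: softmax of linear scores over the horses in one race.

    features: list of per-horse feature rows (hypothetical layout).
    beta: list of coefficients, one per feature.
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in features]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def penalized_neg_log_likelihood(races, beta, lam):
    """L1-penalized negative log-likelihood over a list of races.

    races: list of (features, winner_index) pairs (hypothetical layout).
    lam: LASSO penalty weight (a tuning parameter).
    NOTE: no frailty term is modeled in this sketch.
    """
    nll = 0.0
    for features, winner in races:
        probs = win_probabilities(features, beta)
        nll -= math.log(probs[winner])
    return nll + lam * sum(abs(b) for b in beta)
```

Under the usual approach, `lam` would be chosen to maximize held-out likelihood; the process described above instead scores each setting by the betting profit it produces.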

Finally, I implemented a Cyclical Coordinate Descent algorithm to fit the model in high-speed parallelized code that runs on a Graphics Processing Unit (GPU), allowing me to rapidly test many tuning parameter settings.
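The fitting idea can be sketched serially in standard-library Python. The version below cycles through the coefficients one at a time and updates each with a one-dimensional Newton step followed by soft-thresholding, a common way to handle an L1 penalty. It is an assumption-laden illustration: it runs on the CPU, omits the frailty term, uses no step-size safeguard, and the names `coordinate_descent`, `race_probs`, and `soft_threshold` are hypothetical rather than taken from the author's GPU code.

```python
import math

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; the proximal step for an L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def race_probs(features, beta):
    """Softmax of linear scores over the horses in one race."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in features]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def coordinate_descent(races, n_features, lam, n_sweeps=50):
    """Cyclic coordinate descent for L1-penalized conditional logit.

    races: list of (features, winner_index) pairs (hypothetical layout).
    Each coordinate gets a Newton step on the negative log-likelihood,
    then soft-thresholding. No line search is used in this sketch.
    """
    beta = [0.0] * n_features
    for _ in range(n_sweeps):
        for j in range(n_features):
            grad = 0.0  # d(NLL)/d(beta_j)
            hess = 0.0  # d^2(NLL)/d(beta_j)^2
            for features, winner in races:
                probs = race_probs(features, beta)
                mean_x = sum(p * row[j] for p, row in zip(probs, features))
                mean_x2 = sum(p * row[j] ** 2 for p, row in zip(probs, features))
                grad += mean_x - features[winner][j]
                hess += mean_x2 - mean_x ** 2
            if hess <= 1e-12:
                continue  # feature does not vary within races; skip it
            beta[j] = soft_threshold(hess * beta[j] - grad, lam) / hess
    return beta
```

The inner loop over races is the expensive part, and the per-race sums are independent of one another, which is the kind of work that lends itself to parallel execution on a GPU.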

Historical data from 3,681 races in Hong Kong were collected, and 10-fold cross-validation was used to find the optimal tuning parameter settings. Simulated betting on a hold-out set of 20% of the races yielded a return on investment of 36.73%.
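For readers who want the evaluation arithmetic spelled out, the sketch below shows one way to set aside a 20% hold-out set, split the remainder into 10 folds, and compute return on investment from simulated bets. Several things here are assumptions: that the hold-out set is taken from the 3,681 races, that it is the final 20% in order, that folds are assigned round-robin, and that payouts are expressed as decimal odds (total return per unit staked). None of these details are stated above.

```python
def split_holdout(n_races, holdout_fraction_denominator=5):
    """Return (training indices, hold-out indices).

    Assumes the hold-out set is the last 1/5 of the races in order.
    """
    n_holdout = n_races // holdout_fraction_denominator
    cut = n_races - n_holdout
    return list(range(cut)), list(range(cut, n_races))

def k_fold_indices(indices, k=10):
    """Assign indices to k folds round-robin (an assumed scheme)."""
    return [indices[i::k] for i in range(k)]

def return_on_investment(bets):
    """bets: list of (stake, decimal_odds, won) tuples (hypothetical layout).

    ROI = (total returned - total staked) / total staked.
    """
    staked = sum(stake for stake, _, _ in bets)
    returned = sum(stake * odds for stake, odds, won in bets if won)
    return (returned - staked) / staked

# Split sizes for the 3,681 races mentioned above.
train, holdout = split_holdout(3681)
folds = k_fold_indices(train, k=10)
```

With this definition, an ROI of 36.73% means that every 100 units staked on the hold-out races came back as 136.73 units.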