Ever since this year’s March Madness I’ve become really fascinated with how sports betting lines are created, and how one would use existing data to create predictive models for outcomes.
Here’s a first attempt at a formula to generate two scores to compare. It omits many crucial data points right now, but just to throw something together I picked a few key variables, like total Goals For, Goals Against, PTS, Shots on Goal; also PTS in the last 2 games and in the last 5 games, to measure a team’s momentum.
The Score (which is super abstract right now, and I’m not really sure what to ground it around) is…
The Sum of:
- 5 times Average Goals Scored per Game minus Average Goals Against per Game
- 15% of Average Shots on Goal per Game
- Total Points earned divided by Games Played
- Points earned in last 5 games
- 3 times Points earned in last 2 games
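To make the arithmetic concrete, here’s a rough Python sketch of the formula as I’ve described it. The `TeamStats` class, its field names, and the `team_score` function are just names I made up for illustration. One assumption worth flagging: I’m reading the first term as 5 times the difference between goals scored and goals against per game, not 5 times goals scored with goals against subtracted afterwards.

```python
from dataclasses import dataclass


@dataclass
class TeamStats:
    """Season totals plus recent-form points for one team (hypothetical container)."""
    games_played: int
    goals_for: int
    goals_against: int
    shots_on_goal: int
    points: int
    points_last_5: int  # points earned in the last 5 games
    points_last_2: int  # points earned in the last 2 games


def team_score(t: TeamStats) -> float:
    """Sum the five terms of the formula for one team."""
    gp = t.games_played
    # Assumption: the factor of 5 applies to the per-game goal differential.
    goal_diff_per_game = (t.goals_for - t.goals_against) / gp
    return (
        5 * goal_diff_per_game
        + 0.15 * (t.shots_on_goal / gp)  # 15% of average shots on goal per game
        + t.points / gp                  # total points divided by games played
        + t.points_last_5                # momentum over the last 5 games
        + 3 * t.points_last_2            # extra weight on the last 2 games
    )


# Made-up numbers, not real team stats, just to show the call.
example = TeamStats(
    games_played=10, goals_for=30, goals_against=20,
    shots_on_goal=300, points=12, points_last_5=6, points_last_2=2,
)
print(team_score(example))
```

Writing it out this way makes the weighting easy to see: the two momentum terms are raw point counts, while the other three are per-game averages, so recent results can easily dominate the total.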
In the case of STL vs. LA, they’re pretty evenly matched. But looking at the last 5 games each team has played, STL has won only 2 games (and is currently on a 2-game losing streak), while LA has won 3 games (including their last one). I’m not sure if I’m putting too much weight on the momentum factor in the current formula, but you’d expect LA to have some advantage because of this difference. Again, there are many other variables to incorporate, including the differences in stats between home and away games.
I signed up for the Johns Hopkins School of Public Health’s “Mathematical Biostatistics Boot Camp” class on Coursera, too. Excited to see what I can learn from that.