A Theoretical Model for The Probability of Winning a Basketball Game – Part 1
This is the first in a 3-part series in which I will present a theoretical model for the probability of winning a basketball game. The 3 parts will break this model down at the team, unit, and player level.
Before diving into the data and building new models of the game, I feel it is worthwhile to present theoretical models of the game as best I see them.
I see this as being beneficial for two reasons:
- The first, and in my mind most important, reason for doing this is that it will clearly show that all models are wrong. By having a theoretical model as a guideline, we can better understand what a practical model does and (more importantly) does not capture. This way we can better understand the inferences and predictions we’re making with any specific model that’s created.
- The second reason has to do with the fact that I know what I don’t know. And what I don’t know is everything that goes into every possible theoretical model of the game of basketball. I am hoping obvious mistakes in these theoretical models will be pointed out by informed readers.
The purpose of this theoretical model is to show how the probability of winning a basketball game is derived. Instead of trying to start at the player level (the most basic components of a team), I will instead start with the proportion of wins and losses between two imaginary basketball teams. This will allow me to construct a top down view of the model. The end goal is to work on a player level to understand how a team’s probability of winning changes based on these most basic components.
It’s worth noting that you shouldn’t start at the player level when constructing this model. Well, you can, but you will probably miss the big picture if you do. Starting from the top will allow us to better understand how the components (players) interact.
The Long Run Proportion
In a perfect world we would know how often Team A beats Team B in an imaginary game. With our current level of technology, however, we can’t just conjure up games between Team A and Team B until the law of large numbers gives us an idea as to the true probability of Team A beating Team B.
Theoretically, however, we can say that we have this proportion. I know that Team A will beat Team B with probability p, and I know that Team B will beat Team A with probability q. Because a basketball game cannot end in a tie, q = 1 - p.
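To make the law of large numbers idea concrete, here is a minimal simulation sketch. It assumes something we never have in practice: a known true probability p (the value 0.6 below is purely hypothetical), and the function name is my own. It only illustrates how the observed proportion of wins settles toward p as the number of imaginary games grows.

```python
import random


def estimate_win_probability(true_p, n_games, seed=0):
    """Simulate n_games between Team A and Team B and return
    the proportion of games won by Team A."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = sum(1 for _ in range(n_games) if rng.random() < true_p)
    return wins / n_games


TRUE_P = 0.6  # hypothetical true probability that Team A beats Team B

for n_games in (10, 1_000, 100_000):
    estimate = estimate_win_probability(TRUE_P, n_games)
    print(n_games, estimate)
```

With only a handful of games the estimate can sit far from p; with many games it should land close to it.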
Now I will define a distribution that allows us to calculate these probabilities.
The Margin of Victory Distribution
One way to calculate these probabilities of winning would be with a discrete distribution I call the margin of victory distribution. This distribution represents the probability Team A wins by 1,2,3,…,n points (where n is Team A’s largest margin of victory). It also represents the probability Team B wins by 1,2,3,…,m points (where m is Team B’s largest margin of victory).
This is a good distribution to use as it fully represents the probabilities of winning, p and q.
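As a small sketch of how p and q fall out of this distribution, the example below uses a made-up margin of victory distribution. The probabilities are invented for illustration, and I assume one convention not stated above: a margin is Team A's points minus Team B's points, so Team B's wins appear as negative margins.

```python
from fractions import Fraction

# Hypothetical margin of victory distribution.
# Key: Team A points minus Team B points. Value: probability of that margin.
margin_pmf = {
    -7: Fraction(1, 10),
    -2: Fraction(3, 10),
    1: Fraction(2, 10),
    4: Fraction(3, 10),
    12: Fraction(1, 10),
}


def win_probabilities(pmf):
    """Return (p, q): the probability Team A wins and the probability Team B wins."""
    if 0 in pmf:
        raise ValueError("a margin of 0 is impossible: games cannot end in a tie")
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    p = sum(prob for margin, prob in pmf.items() if margin > 0)
    q = sum(prob for margin, prob in pmf.items() if margin < 0)
    return p, q


p, q = win_probabilities(margin_pmf)
print(p, q)  # 3/5 2/5
```

Exact fractions are used so the check that the probabilities sum to 1 is not affected by floating point rounding.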
These margins of victory come from the obvious: points scored by each team in every possible game.
Points Scored in Each Game
This should be familiar to everyone: each margin of victory is the difference between Team A’s points scored and Team B’s points scored. Because you can only score points by making free throws, 2pt shots, and 3pt shots, points scored is easy to calculate:
Points Scored = FTM x 1 + 2FGM x 2 + 3FGM x 3
Here FTM is free throws made, 2FGM is 2pt field goals made, and 3FGM is 3pt field goals made. This formula will always give you the number of points each team scored in the game.
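A quick worked example of the formula, with box score numbers invented for illustration (the function and argument names are my own):

```python
def points_scored(ftm, fgm2, fgm3):
    """Points Scored = FTM x 1 + 2FGM x 2 + 3FGM x 3."""
    return ftm * 1 + fgm2 * 2 + fgm3 * 3


# Hypothetical box score totals for one game.
team_a_points = points_scored(ftm=18, fgm2=30, fgm3=9)   # 18 + 60 + 27
team_b_points = points_scored(ftm=15, fgm2=28, fgm3=10)  # 15 + 56 + 30

margin_of_victory = team_a_points - team_b_points
print(team_a_points, team_b_points, margin_of_victory)  # 105 101 4
```

The difference between the two totals is one observation from the margin of victory distribution.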
We’re getting closer to identifying the next layer, the 5-player units, but the points scored formula must be broken down into its most basic component: points scored per play.
Points Scored per Play
Points scored per play gets to the heart of how teams score points. I would like to note that I keep points scored on technical free throws separate from the plays themselves. This is done to limit the theoretical maximum points per play to 5 (a made 3pt field goal followed by two made free throws that are the result of a flagrant foul; I don’t think this has ever happened, but it is theoretically possible). It is also done because technical fouls can be called on either the offensive or defensive team for a variety of reasons.
For the sake of clarity, the points scored formula becomes:
Points Scored = Σ (FTMi x 1 + 2FGMi x 2 + 3FGMi x 3) + (Technical FTM x 1)
Where the first part is summed over the index i that starts at 1 and ends at the total number of plays. With points scored broken down in this fashion, we can finally break down points scored into the 5-player unit level.
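The per-play version of the formula can be sketched as follows. The Play class and its field names are hypothetical, chosen only to mirror FTMi, 2FGMi, and 3FGMi, and the play log is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Play:
    """Makes recorded on a single play i (hypothetical structure)."""
    ftm: int = 0   # free throws made on this play
    fgm2: int = 0  # 2pt field goals made on this play
    fgm3: int = 0  # 3pt field goals made on this play

    def points(self):
        return self.ftm * 1 + self.fgm2 * 2 + self.fgm3 * 3


def points_scored_from_plays(plays, technical_ftm=0):
    """Sum points over every play, then add technical free throws separately."""
    return sum(play.points() for play in plays) + technical_ftm * 1


# Invented play log: a made 2, an empty play, a made 3 plus a free throw,
# and two made free throws.
plays = [Play(fgm2=1), Play(), Play(fgm3=1, ftm=1), Play(ftm=2)]
print(points_scored_from_plays(plays, technical_ftm=1))  # 2 + 0 + 4 + 2 + 1 = 9
```

Keeping technical free throws outside the per-play sum matches the choice described above to separate them from the plays themselves.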
Points Scored per 5-player Unit per Play
That’s a mouthful, but we can now look at the points scored per 5-player unit per play. This turns the points scored formula into:
Points Scored = Σi [ Σj (FTMij x 1 + 2FGMij x 2 + 3FGMij x 3) + (Technical FTMi x 1) ]
The points scored formula now has two summations that are performed as follows: i is the index corresponding to the 5-player unit and j is the index corresponding to each play of the 5-player unit.
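A minimal sketch of the double summation, using invented data. The dictionary layout is hypothetical, and I am assuming that technical free throws are credited to the unit on the floor and added once per unit inside the outer sum, which is my reading of the Technical FTMi term.

```python
def play_points(play):
    """Points on one play, given as a (ftm, fgm2, fgm3) tuple."""
    ftm, fgm2, fgm3 = play
    return ftm * 1 + fgm2 * 2 + fgm3 * 3


def unit_points(unit):
    """Inner sum over plays j of unit i, plus that unit's technical free throws."""
    return sum(play_points(play) for play in unit["plays"]) + unit["technical_ftm"] * 1


def points_scored_by_units(units):
    """Outer sum over 5-player units i."""
    return sum(unit_points(unit) for unit in units)


# Invented game: two 5-player units, each with its own plays.
units = [
    {"plays": [(0, 1, 0), (0, 0, 1), (0, 0, 0)], "technical_ftm": 0},  # 2 + 3 + 0 = 5
    {"plays": [(2, 0, 0), (1, 1, 0)], "technical_ftm": 1},             # 2 + 3 + 1 = 6
]
print(points_scored_by_units(units))  # 11
```

Summing over units and then plays gives the same team total as summing over all plays directly; the grouping only changes which 5-player unit each point is attributed to.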
It took a few steps to get here, but we’ve now reached a point where we can begin looking at how a 5-player unit scores points on a play. That will be the subject of part 2 of this 3-part series.
I have tried my best to be as technically accurate as possible with the derivation from the probability of winning down to the points scored per 5-player unit per play. I appreciate any comments and feedback that would help refine this theoretical model.