Theoretical Unit vs. Unit Efficiency Ratings
So let’s say you visit your local magic shop and find a device that will magically give you the true probabilities associated with a 5-player unit’s offensive and defensive performance. You know, the important probabilities associated with things like offensive and defensive field goal percentages, rebounding percentages, and turnover rates.
As cool as it is to have this information, you greedily want more. This is understandable, of course, because you want to know what these 5-player units look like in terms of offensive and defensive efficiency. The answer to your woes? Simulation, of course!
The Basic Model
The basic premise is that you somehow know the distribution of shots and turnovers for each 5-player unit. Also, you know the probabilities associated with making shots and free throws, grabbing boards, and turning the ball over for these 5-player units.
With this information in hand, the simulation of a basic model of a basketball game can be used to analyze offensive and defensive efficiency ratings for two competing units.
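To make the premise concrete, here is a minimal sketch of a single possession in Python. It is not the R script linked in this post, and every rate in it is a made-up placeholder rather than a Celtics or Lakers figure. It also leaves out free throws to stay short, so treat it as one possible reading of the basic model, not the model itself.

```python
import random

# Hypothetical rates for one unit's offense; none of these come from the post.
RATES = {
    "turnover": 0.14,     # chance a trip down the floor ends in a turnover
    "three_share": 0.25,  # share of shot attempts that are 3-pointers
    "fg2": 0.50,          # 2-pt make probability
    "fg3": 0.36,          # 3-pt make probability
    "off_reb": 0.27,      # chance the offense rebounds its own miss
}


def simulate_possession(rates, rng):
    """Return the points scored on one possession (0, 2, or 3)."""
    while True:
        # A turnover ends the possession with no points.
        if rng.random() < rates["turnover"]:
            return 0
        # Otherwise the unit shoots: pick the shot type, then see if it falls.
        if rng.random() < rates["three_share"]:
            points, make_prob = 3, rates["fg3"]
        else:
            points, make_prob = 2, rates["fg2"]
        if rng.random() < make_prob:
            return points
        # On a miss, the possession ends unless the offense gets the board.
        if rng.random() >= rates["off_reb"]:
            return 0
        # Offensive rebound: loop again within the same possession.


rng = random.Random(2008)
points = [simulate_possession(RATES, rng) for _ in range(100)]
print("Points over 100 possessions:", sum(points))
```

Because an offensive rebound extends the same possession, a unit that rebounds well scores more per possession without shooting any better, which is exactly the kind of interaction the simulation is meant to capture.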
An Example
To get an idea of how this works, I used some statistics for the most-used 5-player units from the Celtics and Lakers from the 2007-2008 season. The Celtics’ most-used unit consisted of Ray Allen, Kevin Garnett, Kendrick Perkins, Paul Pierce, and Rajon Rondo. The Lakers’ most-used unit consisted of Kobe Bryant, Derek Fisher, Pau Gasol, Lamar Odom, and Vladimir Radmanovic.
Using my data, I crudely approximated various offensive and defensive statistics for these units. These are what I assumed to be the true rates. (Clearly these are not the true rates of these statistics, but they are good enough to illustrate the idea.)
I then merged these statistics together to obtain the true rate for this matchup. As an example, suppose the Celtics make 50% of their 2-pt shots and the Lakers allow their opponents to make 55% of their 2-pt shots. Clearly some sort of probability distribution is best suited for estimating the true rate in this matchup, but for this simple theoretical case I simply assumed it is the mean of these two values. Hence the Celtics’ true 2-pt field goal percentage would be (50%+55%) / 2 = 52.5%.
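The merging step is just an average, sketched here in Python with a hypothetical helper name (matchup_rate). It only illustrates the arithmetic above and is not taken from the R script.

```python
def matchup_rate(offense_rate, defense_allowed_rate):
    """Naive matchup estimate: the plain mean of the offense's rate
    and the rate the defense allows."""
    return (offense_rate + defense_allowed_rate) / 2


# The example from the text: Celtics make 50%, Lakers allow 55%.
celtics_fg2 = matchup_rate(0.50, 0.55)
print(f"Celtics 2-pt FG% in this matchup: {celtics_fg2:.1%}")
```

The same function would be applied to every rate in the model (3-pt percentage, turnover rate, rebounding rate, and so on), once per unit on offense.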
Using these statistics, I simulated 10,000 games between these units. These games were theoretical in nature: each 5-player unit simply had 100 possessions.
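A rough Python sketch of that game loop follows. To keep it self-contained, each possession is drawn from an invented points-per-possession distribution instead of the full shot-by-shot model, and a tie is counted as half a win, which is my assumption for the sketch. The names simulate_game and simulate_series are hypothetical; nsims and nposs mirror the variables in the R script.

```python
import random
import statistics

# Hypothetical points-per-possession distributions (points -> probability).
# These numbers are invented for illustration; they are not the post's rates.
CELTICS = {0: 0.50, 1: 0.03, 2: 0.35, 3: 0.12}
LAKERS = {0: 0.50, 1: 0.03, 2: 0.36, 3: 0.11}


def simulate_game(dist, nposs, rng):
    """Total points over nposs possessions drawn from a points distribution."""
    outcomes = list(dist.keys())
    weights = list(dist.values())
    return sum(rng.choices(outcomes, weights=weights, k=nposs))


def simulate_series(dist_a, dist_b, nsims=10_000, nposs=100, seed=0):
    """Simulate nsims games and summarize efficiency and win percentage."""
    rng = random.Random(seed)
    eff_a, eff_b = [], []
    wins_a = 0.0
    for _ in range(nsims):
        pts_a = simulate_game(dist_a, nposs, rng)
        pts_b = simulate_game(dist_b, nposs, rng)
        # Efficiency is points per 100 possessions.
        eff_a.append(100 * pts_a / nposs)
        eff_b.append(100 * pts_b / nposs)
        # Assumption: a tie counts as half a win for each side.
        if pts_a > pts_b:
            wins_a += 1.0
        elif pts_a == pts_b:
            wins_a += 0.5
    return {
        "mean_a": statistics.mean(eff_a),
        "sd_a": statistics.stdev(eff_a),
        "mean_b": statistics.mean(eff_b),
        "sd_b": statistics.stdev(eff_b),
        "win_pct_a": 100 * wins_a / nsims,
    }


summary = simulate_series(CELTICS, LAKERS, nsims=1_000)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

With only 100 possessions per game, the game-to-game spread in efficiency is large compared with a one-point gap in the means, so two units with similar rates will split games close to evenly.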
The Results
Below are the results from the simulations:
Celtics Offensive Efficiency: Mean=115.5; SD=12.2
Lakers Offensive Efficiency: Mean=115.0; SD=12.5
Celtics Win%: 51.5%
Based on the simulations, this appears to be a fairly even matchup. One area I want to explore is how strategy could help separate one team from the other. This, however, will require a better model than this one.
Reproduce These Results
These results can be reproduced using this R code: theoretical_unit_vs_unit.R
To use this code, simply start R and run: source("theoretical_unit_vs_unit.R")
By default, 10,000 games of 100 possessions each are run. You can change these values by editing the nsims and nposs variables in the code.
The Next Step
With this basic model in place, the next step is to take into account the uncertainty associated with estimating the statistics used in the simulation. We can’t know for sure the values of the parameters, but we can estimate them with probability distributions. Simulating from these distributions first will allow us to take this uncertainty into account.
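As a hedged illustration of what “simulating from these distributions first” could look like, the Python sketch below draws a make probability from a Beta distribution before simulating the shots. The Beta choice, the uniform prior, and the 105-of-200 sample are all my assumptions for the example; nothing here is settled as the method for the follow-up model.

```python
import random
import statistics


def draw_make_prob(makes, attempts, rng):
    """Draw a plausible make probability from Beta(makes + 1, misses + 1),
    i.e. a uniform prior updated with the observed shots (an assumption)."""
    return rng.betavariate(makes + 1, attempts - makes + 1)


def simulate_makes(nshots, rng, makes, attempts, uncertain):
    """Simulate nshots attempts, optionally redrawing the make probability."""
    if uncertain:
        p = draw_make_prob(makes, attempts, rng)
    else:
        p = makes / attempts  # treat the observed rate as the true rate
    return sum(rng.random() < p for _ in range(nshots))


rng = random.Random(2008)
# Hypothetical sample: 105 makes in 200 attempts (52.5%).
fixed = [simulate_makes(100, rng, 105, 200, uncertain=False) for _ in range(2000)]
drawn = [simulate_makes(100, rng, 105, 200, uncertain=True) for _ in range(2000)]

print("SD with a fixed rate:", round(statistics.stdev(fixed), 2))
print("SD with a drawn rate:", round(statistics.stdev(drawn), 2))
```

The second spread should come out wider, because each simulated game now carries both the shot-to-shot randomness and our uncertainty about the rate itself. The smaller the observed sample, the bigger that extra spread becomes.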
5 Comments on this post
Trackbacks

JB H said:
“Clearly some sort of probability distribution is best suited for estimating the true rate in this matchup, but for this simple theoretical case I simply assumed it is the mean of these two values. Hence the Celtics’ true 2-pt field goal percentage would be (50%+55%) / 2 = 52.5%.”
Finding the mean is very wrong here. You need to know the league average.
If the league average is 50%, Team A shoots 60%, and Team B allows 50%, then Team A will shoot 60%, not 55%.
November 27th, 2008 at 1:28 am 
Ryan said:
You make a very good point JB.
In reality we need to know a lot of details, especially since you’re unlikely to have a 5-player unit face the league average over the course of a season. So strength of opponents is very important for both sides.
For this theoretical case I use the mean because, even in the perfect example you provide, we can’t be sure of the true proportion of shots this unit will make versus that unit.
At the end of the day we’re still trying to model basketball, so there is going to be some uncertainty around that proportion even if we knew for sure they were 60% against the league average of 50%, and the opponent allows 50% to the league average.
November 27th, 2008 at 1:48 am 
JB H said:
I’m sorry but I don’t understand your point. Because the right way to do it doesn’t predict the future with 100% accuracy you’re going to do it the wrong way?
If you just want a quick shortcut then A + adversary B – environmental mean will be close enough.
November 27th, 2008 at 7:11 pm 
Ryan said:
My point is that there is clearly error involved, and the best solution involves a probability distribution.
Does not taking into account opponent strength and ignoring the structure of the probability distribution create more error? Absolutely.
At this point I certainly would not feel this is the ideal model, but it’s a good start for what will come next. Is your shortcut better than my naive method? It sure sounds like it.
My ideal solution would involve some sort of algorithm similar to those used in things like college football rankings. So until I get to that component I’m going to stick with the quick and dirty method and gladly acknowledge that there are better and more correct methods out there.
November 27th, 2008 at 10:14 pm
[…] my last post I looked at theoretical unit vs. unit efficiency ratings. Now I will take a more practical look at calculating these […]