Practical Unit vs. Unit Efficiency Ratings
In my last post I looked at theoretical unit vs. unit efficiency ratings. Now I will take a more practical look at calculating these ratings.
In the theoretical case we made the assumption that we could know for sure a 5-player unit’s shot distribution, 2pt and 3pt shooting percentages, and turnover and rebounding rates. Clearly, however, this is not the case. In reality we can, at best, estimate these proportions.
Gathering The Data
To estimate these proportions we need data. With this in mind, I wrote an event parser that looks at each unit’s events on a per-play level, and it separates home and away data. This allowed me to pull out actual successes and failures for the shot distribution, shooting percentages, and rebounding rates.
With this data I am able to estimate the true rates for each 5-player unit.
I have chosen to use Bayesian statistics to model these proportions. For the play distribution (2pt shot vs. 3pt shot vs. turnover), which is a multivariate proportion, I have used the Dirichlet distribution. For every other proportion, I have used the Beta distribution.
I have also used a uniform prior distribution throughout. With a uniform prior, the estimates for 5-player units with small sample sizes rest almost entirely on a handful of plays and miss out on the basic nature of the game of NBA basketball, so it’s not a good idea to look at those units. This is something that could be improved upon in the future.
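To make the modeling step concrete, here is a minimal sketch in Python (not the R of the linked script) of drawing from these posteriors. It assumes the uniform prior is Beta(1, 1) for the two-outcome proportions and Dirichlet(1, 1, 1) for the play distribution. The function names and all of the counts are hypothetical, made up for illustration only.

```python
import random

def sample_beta_posterior(successes, failures, rng):
    """Draw one rate from Beta(1 + successes, 1 + failures),
    the posterior under a uniform Beta(1, 1) prior."""
    return rng.betavariate(1 + successes, 1 + failures)

def sample_dirichlet_posterior(counts, rng):
    """Draw one probability vector from Dirichlet(1 + counts),
    the posterior under a uniform Dirichlet prior.
    Uses the standard trick of normalizing independent Gamma draws."""
    gammas = [rng.gammavariate(1 + c, 1.0) for c in counts]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(42)

# Hypothetical counts for one unit (not real data).
play_counts = [600, 200, 150]   # 2pt shots, 3pt shots, turnovers
p_2pt, p_3pt, p_to = sample_dirichlet_posterior(play_counts, rng)

# Hypothetical 3pt shooting record: 75 makes, 125 misses.
fg3_pct = sample_beta_posterior(75, 125, rng)

print(p_2pt, p_3pt, p_to, fg3_pct)
```

Each call returns one plausible value of the true rate. With large counts the draws cluster tightly around the observed proportion, and with small counts they spread widely, which is the small-sample problem described above.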
An Example Simulation
Now that each proportion is modeled by a distribution instead of being assumed to be known, I have modified the theoretical code to simulate from these distributions so that the uncertainty in the actual proportions is taken into account.
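The idea can be sketched roughly as follows, in Python rather than R. This is not the linked script. It is a simplified illustration that assumes each unit's offense depends only on its own numbers, ignores free throws, and counts a tied game as half a win. The unit dictionaries, their counts, and every function name are hypothetical.

```python
import random

def draw_rates(unit, rng):
    """Draw one plausible set of true rates from the unit's posteriors
    (uniform priors, so 1 is added to every count)."""
    gammas = [rng.gammavariate(1 + c, 1.0) for c in unit["plays"]]
    total = sum(gammas)

    def beta(pair):
        return rng.betavariate(1 + pair[0], 1 + pair[1])

    return {
        "play": [g / total for g in gammas],  # P(2pt shot), P(3pt shot), P(turnover)
        "fg2": beta(unit["fg2"]),
        "fg3": beta(unit["fg3"]),
        "oreb": beta(unit["oreb"]),
    }

def simulate_possession(rates, rng):
    """Return the points scored on one possession. A missed shot that the
    offense rebounds restarts the play within the same possession."""
    while True:
        kind = rng.choices(["2pt", "3pt", "to"], weights=rates["play"])[0]
        if kind == "to":
            return 0
        pct, pts = (rates["fg2"], 2) if kind == "2pt" else (rates["fg3"], 3)
        if rng.random() < pct:
            return pts
        if rng.random() >= rates["oreb"]:
            return 0  # defensive rebound ends the possession

def simulate(unit_a, unit_b, nsims=10000, nposs=100, seed=1):
    """Return unit A's share of wins; ties count as half a win (an assumption)."""
    rng = random.Random(seed)
    win_share_a = 0.0
    for _ in range(nsims):
        # Fresh draw of the rates for every game, so that uncertainty in the
        # true proportions carries through to the results.
        rates_a = draw_rates(unit_a, rng)
        rates_b = draw_rates(unit_b, rng)
        pts_a = sum(simulate_possession(rates_a, rng) for _ in range(nposs))
        pts_b = sum(simulate_possession(rates_b, rng) for _ in range(nposs))
        if pts_a > pts_b:
            win_share_a += 1.0
        elif pts_a == pts_b:
            win_share_a += 0.5
    return win_share_a / nsims

# Hypothetical counts (not real data). "plays" is (2pt shots, 3pt shots, turnovers);
# "fg2" and "fg3" are (makes, misses); "oreb" is (offensive rebounds, misses not recovered).
unit_a = {"plays": (700, 250, 150), "fg2": (350, 350), "fg3": (95, 155), "oreb": (110, 290)}
unit_b = {"plays": (680, 270, 160), "fg2": (330, 350), "fg3": (100, 170), "oreb": (100, 300)}

win_share = simulate(unit_a, unit_b, nsims=2000)
print(win_share)
```

Drawing new rates at the start of every simulated game is what separates this from the theoretical case, where the same fixed proportions would be reused for all games.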
For this example I have used 2007-2008 regular season data from the same units used in the theoretical case: the Celtics unit consisting of Ray Allen, Kevin Garnett, Kendrick Perkins, Paul Pierce, and Rajon Rondo, and the Lakers unit consisting of Kobe Bryant, Derek Fisher, Pau Gasol, Lamar Odom, and Vladimir Radmanovic. Now, however, I have used only the Celtics’ home data and the Lakers’ away data.
The simulation results are:
Celtics Offensive Efficiency: Mean=110.9; SD=12.1
Lakers Offensive Efficiency: Mean=108.9; SD=12.6
Celtics Win%: 54.8%
As would be expected, the Celtics’ 5-player unit at home is a small favorite over the Lakers’ 5-player unit on the road.
One thing I take away from this result is the importance of the bench. The Celtics aren’t huge favorites with this unit, which suggests the bench has an important role in a game between these two teams. I know, not groundbreaking stuff here, but this result helps to support this classic basketball belief.
Reproduce These Results
These results can be reproduced using this R code: practical_unit_vs_unit.R
To use this code, simply start R and run: source("practical_unit_vs_unit.R")
By default 10,000 games of 100 possessions each are run. You can change these values by modifying the nsims and nposs variables in the code.
The Next Step
With this practical model in place, there are some areas for improvement.
The first step would be to use informative prior distributions for each proportion. Using historical data we should be able to narrow the range of plausible values for the true proportion of these events. For example, it’s highly unlikely that a 5-player unit’s true offensive rebounding rate is, say, 50%. Taking this sort of common sense into account should narrow the resulting distributions, and it also gives us a better handle on units with small samples.
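As a small worked example of what an informative prior does, here is a sketch in Python. It compares the posterior mean under the uniform Beta(1, 1) prior with the posterior mean under an informative Beta prior, for a unit with very little data. The Beta(27, 73) prior, centered on a 27% offensive rebounding rate, is an assumed value chosen only for illustration and is not estimated from historical data. The sample counts are hypothetical too.

```python
def posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Mean of the Beta(prior_a + successes, prior_b + failures) posterior."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

# Hypothetical small sample: 3 offensive rebounds in 4 chances.
uniform = posterior_mean(3, 1)                              # uniform Beta(1, 1) prior
informed = posterior_mean(3, 1, prior_a=27.0, prior_b=73.0)  # assumed informative prior

print(round(uniform, 3), round(informed, 3))  # prints: 0.667 0.288
```

Under the uniform prior, four plays are enough to pull the estimate up to about 67%, a rate no real unit sustains. The informative prior acts like 100 extra plays at a typical rate, so the estimate stays near 29% until the unit builds up enough data of its own to move it.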
A second area for improvement would be to take each unit’s opponent strength into account, similar to what you might find in a college rating system. Time will tell how large an issue this is for a full season’s worth of data, but a small sample would surely yield better results with opponent strength taken into account.
This leads into another important area for improvement: the unit vs. unit interaction effects. Using the strength-adjusted data, what is the best estimate of the proportion to expect when these two units face each other? My naive method assumes it’s the mean. Clearly this is a lazy approach. How can this be improved?
With all of these things in mind, my next step will be to make each team’s 5-player unit data available on this website so that it can be plugged into the practical code for comparison. I have all of the unit data for the 2007-2008 season, so once that is available the plan is to have the 2008-2009 data update on a daily basis.