Individual Offensive Efficiency Ratings Extracted from Play-by-Play Data
I’m unsatisfied with the usefulness of individual efficiency ratings that estimate a player’s offensive and defensive impact on a lineup’s efficiency simply by controlling for the strength of teammates and opponents. These ratings don’t give any real insight into what individual players are actually doing, because they are not parameterized in terms of the things players actually do on the court.
Therefore, I’m going to explore the methods for calculating individual ratings that Dean Oliver outlines in Basketball on Paper. The only difference from Dean’s approach is that I will use play-by-play data instead of box score data.
Dean’s Offensive Efficiency Rating
Before extracting individual offensive efficiency ratings from the play-by-play data, I had to first figure out how to translate Dean’s formulas that estimate ratings from box score data into something I could apply to individual possessions in the play-by-play data. I learned a lot during this process, so I think it is worthwhile to outline the general idea of the formulas here.
Figuring out how many possessions a player is responsible for and how many points that player produces go hand in hand. Once you know what percentage of a possession a player is responsible for, you can multiply that percentage by the number of points scored to estimate the number of points the player produced. Note: I actually calculate these in terms of scoring sequences within each possession. This handles the rare case where a made field goal plus a shooting foul is followed by a missed free throw, and an offensive rebound on that miss leads to more points. The sequences are normalized so that each team possession is counted exactly once.
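To make the bookkeeping concrete, here is a minimal Python sketch of splitting one team possession among players. It assumes a hypothetical input format: a list of scoring sequences, each a (points, shares) pair whose shares already sum to 1 within that sequence. Points are credited at each player’s share of the sequence, while possession usage is rescaled so the whole possession adds up to one. The function name and input format are illustrative assumptions.

```python
from collections import defaultdict


def split_possession(sequences):
    """Split one team possession into per-player usage and points produced.

    sequences: hypothetical list of (points, shares) tuples, one per scoring
    sequence, where shares maps player -> share of that sequence (summing to 1).
    Returns (possessions_used, points_produced) as dicts keyed by player.
    """
    used = defaultdict(float)
    produced = defaultdict(float)
    total_share = sum(sum(shares.values()) for _, shares in sequences)
    if total_share == 0:
        return {}, {}
    for points, shares in sequences:
        for player, share in shares.items():
            # Points are credited in proportion to the player's share of this sequence.
            produced[player] += share * points
            # Usage is rescaled so the whole possession sums to exactly one.
            used[player] += share / total_share
    return dict(used), dict(produced)
```

For example, a made 2-point field goal by player A (the and-one free throw missed), followed by a 2-point putback whose credit is split 0.6/0.4 between two other hypothetical players, leaves A with half a possession used and 2 points produced, and the possession’s usage still totals one.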
A player uses a possession, and may produce points, in one of the following ways:
- Making a field goal or free throw
- Assisting on a field goal
- Obtaining an offensive rebound that leads to a made field goal or free throw
- Missing a field goal or free throw that is rebounded by the defense
- Committing a turnover
Clearly a player produces zero points when they miss a field goal, miss a free throw, or commit a turnover. When a player makes an unassisted field goal or free throw they receive full credit for using the team possession and producing the number of points scored.
When a field goal is assisted, the player making the field goal receives the following portion of credit:
$$\text{Shooter credit} = 1 - \frac{\text{eFG\%}}{2}$$
where eFG% is the effective FG% of the shot attempt. The player assisting on the shot receives the following portion of credit:
$$\text{Assister credit} = \frac{\text{eFG\%}}{2}$$
Dean’s reasoning behind these formulas is that easy, high-eFG% shots are harder to create for a teammate. Thus a player who assists on an easy shot should get more credit than a player who assists on a difficult one.
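The assisted field goal split takes only a few lines of Python. This sketch assumes the split from Basketball on Paper, where the assister receives eFG%/2 of the credit and the shooter receives 1 - eFG%/2; `efg` is the expected eFG% of the shot attempt, and the function name and return format are hypothetical.

```python
def field_goal_credit(points, efg, shooter, assister=None):
    """Return {player: (share of the scoring sequence, points produced)}.

    efg: expected eFG% of the shot attempt as a fraction (e.g. 0.55).
    Assumes the assister gets efg / 2 and the shooter gets the remainder.
    """
    if assister is None:
        # Unassisted make: the shooter gets full credit for the sequence.
        return {shooter: (1.0, float(points))}
    assist_share = efg / 2
    shooter_share = 1 - assist_share
    return {
        shooter: (shooter_share, shooter_share * points),
        assister: (assist_share, assist_share * points),
    }
```

An assisted shot with an expected eFG% of 0.60 gives the passer 0.30 of the credit, while one with an expected eFG% of 0.40 gives the passer only 0.20, which matches the idea that setting up easy shots earns more credit.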
The last thing we have to take care of is offensive rebounds. I’ll refer you to Appendix 1 of Basketball on Paper for the full theory, but the idea is that when an offensive rebound leads to points, we want to give the offensive rebounder credit proportional to how important offensive rebounding is to the team. The formula is:
$$\text{ORB Weight} = \frac{(1 - \text{TeamOR\%}) \cdot \text{TeamPlay\%}}{(1 - \text{TeamOR\%}) \cdot \text{TeamPlay\%} + \text{TeamOR\%} \cdot (1 - \text{TeamPlay\%})}$$
where TeamOR% is the team’s probability of obtaining an offensive rebound, and TeamPlay% is the team’s probability of scoring at least one point on a play.
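Here is the offensive rebound weight as a small Python function, assuming the ORB weight formula from Basketball on Paper. The second function encodes an assumption about how the weight is applied to a single scoring sequence: the offensive rebounder takes `weight` of the credit and every other player’s share is scaled by `1 - weight`, so the sequence still sums to one. Both function names are hypothetical.

```python
def orb_weight(team_or_pct, team_play_pct):
    """ORB weight from TeamOR% and TeamPlay% (both as fractions)."""
    num = (1 - team_or_pct) * team_play_pct
    den = num + team_or_pct * (1 - team_play_pct)
    return num / den


def apply_orb(shares, rebounder, weight):
    """Assumed per-sequence split: rebounder gets `weight`, others are scaled by 1 - weight."""
    scaled = {player: share * (1 - weight) for player, share in shares.items()}
    scaled[rebounder] = scaled.get(rebounder, 0.0) + weight
    return scaled
```

With illustrative values of TeamOR% = 0.30 and TeamPlay% = 0.45, the weight is 0.315 / 0.48 = 0.65625, so an unassisted putback would be split roughly 0.66 to the rebounder and 0.34 to the scorer under this assumption.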
With these formulas we can now give credit to players while examining each individual possession in the play-by-play data. Credit is handed out as each possession is encountered, so we can ignore the rest of Dean’s box score formulas.
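Once every possession has been split, the individual rating is points produced per 100 possessions used. A minimal sketch, assuming Dean’s definition (ORtg = 100 × points produced / possessions used) and a hypothetical input of per-possession (used, produced) dictionaries like the ones a per-possession split would return:

```python
from collections import defaultdict


def offensive_ratings(possession_credits):
    """Accumulate credit over possessions and return {player: ORtg}.

    possession_credits: hypothetical iterable of (used, produced) dict pairs,
    one pair per team possession.
    """
    used = defaultdict(float)
    produced = defaultdict(float)
    for used_i, produced_i in possession_credits:
        for player, u in used_i.items():
            used[player] += u
        for player, pts in produced_i.items():
            produced[player] += pts
    # Players who used no possessions have no defined rating.
    return {p: 100 * produced[p] / used[p] for p in used if used[p] > 0}
```

A missed shot rebounded by the defense shows up as a possession used with no points produced, which pulls the shooter’s rating down exactly as a turnover would.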
Estimating eFG%, TeamOR%, and TeamPlay%
One thing that we do care about is estimating effective FG%, TeamOR%, and TeamPlay%.
Because the play-by-play gives us details such as the location of the shots players assist on, it is important to come up with a reasonable expectation for the eFG% of these shots so that we give credit appropriately. To do this, I fit multilevel models, with players grouped by position, to estimate every player’s eFG% at each of the following shot locations:
- Low Paint Shots: Shots inside the paint <= 6 feet from the hoop
- Short 2pt Shots: Shots <= ~14 feet from the hoop
- Long 2pt Shots: All other 2pt shots
- Corner 3pt Shots: Corner 3pt shots
- Other 3pt Shots: All non-corner 3pt shots
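A sketch of bucketing shots into these five zones. The 6-foot low-paint cutoff comes from the list above; the 14-foot short-2 cutoff is only approximate in the list, so `SHORT_TWO_MAX_FT` is an assumed value, and the boolean inputs (`is_three`, `in_paint`, `is_corner`) are hypothetical fields a shot-location parser would need to supply.

```python
SHORT_TWO_MAX_FT = 14.0  # assumed cutoff; the text only says "~14 feet"


def shot_zone(distance_ft, is_three, in_paint, is_corner):
    """Classify one shot into the five zones used for the eFG% models."""
    if is_three:
        return "Corner 3pt" if is_corner else "Other 3pt"
    if in_paint and distance_ft <= 6:
        return "Low Paint"
    if distance_ft <= SHORT_TWO_MAX_FT:
        return "Short 2pt"
    return "Long 2pt"
```

Checking three-pointers first keeps the distance thresholds from ever applying to 3pt attempts.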
These models give us reasonable expectations for the eFG% of every shot that is taken, including the shots that players assist on.
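The multilevel models themselves aren’t shown here. As a much simpler stand-in for the same idea (partial pooling), the sketch below shrinks a player’s observed eFG% in one zone toward his position’s average. This is a basic shrinkage estimator, not the multilevel model described above, and `prior_strength` is a hypothetical tuning value measured in pseudo-attempts.

```python
def shrunk_efg(shot_values, position_mean, prior_strength=50.0):
    """Partial-pooling estimate of a player's eFG% in one shot zone.

    shot_values: per-attempt eFG contributions (0 for a miss, 1.0 for a made 2,
    1.5 for a made 3), so their mean is the player's raw eFG% in the zone.
    position_mean: average eFG% in this zone for the player's position.
    prior_strength: hypothetical number of pseudo-attempts at the position mean.
    """
    n = len(shot_values)
    return (sum(shot_values) + prior_strength * position_mean) / (n + prior_strength)
```

A player with no attempts in a zone simply gets the position average, and a player who goes 10 for 10 on 2s in a zone where his position averages 0.50 is estimated at (10 + 25) / 60, about 0.58, rather than 1.00.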
To estimate TeamOR% and TeamPlay%, I fit two logistic regressions, one for each quantity, to the actual team-versus-team data. This lets me estimate TeamOR% and TeamPlay% for any pair of opponents, so a fair offensive rebounding weight can be calculated for each game.
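The exact covariates in those regressions aren’t spelled out, so the following is only a sketch under assumptions: a binomial logistic regression with an offensive-team effect and a defensive-team effect, a small ridge penalty so the team effects stay identifiable, and plain gradient ascent so the example needs nothing beyond the standard library. For TeamOR% the successes would be offensive rebounds out of rebound chances; for TeamPlay% they would be plays with at least one point out of all plays. All names and the example counts are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_matchup_logit(rows, lam=1.0, lr=0.5, iters=3000):
    """Fit logit(p) = b0 + off[offense] + dfn[defense] by penalized gradient ascent.

    rows: (offense_team, defense_team, successes, trials) tuples.
    lam: ridge penalty that shrinks team effects toward zero.
    """
    teams = {t for o, d, _, _ in rows for t in (o, d)}
    total = sum(row[3] for row in rows)
    b0 = 0.0
    off = dict.fromkeys(teams, 0.0)
    dfn = dict.fromkeys(teams, 0.0)
    for _ in range(iters):
        g0 = 0.0
        g_off = {t: -lam * off[t] for t in teams}
        g_dfn = {t: -lam * dfn[t] for t in teams}
        for o, d, y, n in rows:
            r = y - n * _sigmoid(b0 + off[o] + dfn[d])  # observed minus expected successes
            g0 += r
            g_off[o] += r
            g_dfn[d] += r
        # Dividing by total trials turns the summed gradient into a per-trial average.
        b0 += lr * g0 / total
        for t in teams:
            off[t] += lr * g_off[t] / total
            dfn[t] += lr * g_dfn[t] / total
    return b0, off, dfn


def predict(model, offense, defense):
    """Estimated probability for a given offense facing a given defense."""
    b0, off, dfn = model
    return _sigmoid(b0 + off[offense] + dfn[defense])
```

The payoff is that `predict` returns a matchup-specific TeamOR% or TeamPlay% even for pairings with few observed possessions, and those two estimates feed straight into the ORB weight for that game.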
The Offensive Ratings
The following spreadsheet lists the offensive ratings for each player from the 2008-2009 regular season (including other applicable statistics):
The data is grouped and sorted by teams and players, and it contains the following data:
- Ortg: the player’s offensive efficiency rating
- Usg%: the percentage of possessions used by this player while on the court
- Total Used: the total number of possessions this player used
- %Shots: percentage of possessions used that were shots
- %Free Throws: percentage of possessions used that were free throws
- %Assists: percentage of possessions used that were assists
- Assist eFG%: mean expected eFG% of assists
- %Oreb: percentage of possessions used that were offensive rebounds
- %Turnover: percentage of possessions used that were turnovers
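Most of these columns are simple ratios of possession counts. A sketch, assuming a hypothetical dictionary of (fractional) possessions used, tallied by how each was used, plus the number of team possessions while the player was on the court:

```python
def usage_profile(used_by_type, team_poss_on_court):
    """Compute Total Used, Usg%, and the %-by-type columns from possession tallies.

    used_by_type: hypothetical dict with keys "Shots", "Free Throws", "Assists",
    "Oreb", and "Turnover" mapping to possessions used that way.
    """
    total = sum(used_by_type.values())
    profile = {"Total Used": total, "Usg%": total / team_poss_on_court}
    for kind in ("Shots", "Free Throws", "Assists", "Oreb", "Turnover"):
        # Share of this player's used possessions that came from this category.
        profile["%" + kind] = used_by_type.get(kind, 0.0) / total if total else 0.0
    return profile
```

Because the five categories cover every way a possession can be used, the five percentage columns for a player should sum to one.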
Usage versus Efficiency
(If usage versus efficiency means nothing to you, it’s best you get acquainted with the topic by reading this post by Eli Witus.)
It would be really nice to find a way to model an individual’s usage versus efficiency as Dean outlines in Basketball on Paper, but it isn’t easy to do. In the post linked above, Eli illustrates a nifty way of estimating the effect of individual player usage on a lineup’s efficiency. Eli used data from roughly half of the 2007-2008 season, so I wanted to see what results we get using the ratings in the spreadsheet above, which cover most of the games from the 2008-2009 season.
See Eli’s post for all of the details, but essentially the response is actual efficiency minus expected efficiency, and the predictor is the sum of the players’ usg% values minus 1. For example, a lineup with an actual efficiency of 110, an expected efficiency of 100, and usg% values summing to 1.05 would have a response of 10 and a predictor of 0.05. The idea is that if usage has an effect on efficiency, we expect to find a positive coefficient for the predictor.
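The response and predictor can be built and fit with a short ordinary least squares routine. This sketch assumes an unweighted regression with an intercept; Eli’s exact specification (for example, weighting lineups by possessions) may differ, and the input format is hypothetical.

```python
import math


def lineup_point(actual_eff, expected_eff, usg):
    """Response and predictor for one lineup: (actual - expected, sum(usg) - 1)."""
    return actual_eff - expected_eff, sum(usg) - 1


def usage_regression(lineups):
    """OLS of response on predictor; returns (slope, intercept, slope standard error).

    lineups: hypothetical iterable of (actual_eff, expected_eff, usg_list) tuples.
    """
    points = [lineup_point(a, e, u) for a, e, u in lineups]
    ys = [y for y, _ in points]
    xs = [x for _, x in points]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
    # Residual variance uses n - 2 degrees of freedom (slope and intercept).
    se = math.sqrt(sum(r * r for r in residuals) / (n - 2) / sxx)
    return slope, intercept, se
```

Running `lineup_point(110, 100, [0.25, 0.2, 0.2, 0.2, 0.2])` reproduces the worked example: a response of 10 and a predictor of (approximately, in floating point) 0.05.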
Fitting a model to this data set estimates the coefficient at 10.9 with a standard error of 2.9, giving a 95% confidence interval of (5.2, 16.6) and a p-value < 0.01 for the test that the coefficient is zero. Eli’s estimate was approximately 25 with a standard error of around 9, giving a 95% confidence interval of (7.4, 42.6).
My estimate suggests that each 0.01 increase in total usg% raises expected efficiency by about 0.11 points per hundred possessions. This effect is smaller than Eli’s: his estimate implies that a 0.04 difference in total usg% changes expected efficiency by one point per hundred possessions, whereas mine implies that a 0.09 difference is needed for the same effect. The 95% confidence interval for the usg% difference needed to expect a one point per hundred possessions change in efficiency is (0.06, 0.19).
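The conversions in the last two paragraphs are simple arithmetic on the coefficient and its standard error: a normal-approximation 95% interval (±1.96 standard errors), the effect of a 0.01 change in total usg%, and the usg% difference needed for a one-point change, which is the reciprocal of the coefficient. The reciprocal flips the interval bounds, and the sketch assumes the lower bound is positive (otherwise the upper bound is treated as unbounded). The function name is hypothetical.

```python
import math


def usage_effect_summary(coef, se, z=1.96):
    """Translate a usage coefficient into the quantities quoted in the text."""
    lo, hi = coef - z * se, coef + z * se
    return {
        "ci": (lo, hi),
        "per_0.01_usg": coef * 0.01,
        "usg_for_1_point": 1 / coef,
        # Reciprocal reverses the order of the bounds; unbounded if lo <= 0.
        "usg_for_1_point_ci": (1 / hi, 1 / lo if lo > 0 else math.inf),
    }
```

Plugging in 10.9 and 2.9 gives an interval of about (5.2, 16.6), about 0.11 points per 0.01 of usg%, a 0.09 usg% difference per point, and a reciprocal interval of about (0.06, 0.19); plugging in Eli’s 25 and 9 gives (7.4, 42.6) and 0.04.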
The overall point is that I too find this effect, so hooray for reproducible research!
The Next Step
The next step is to extract individual defensive efficiency ratings from the play-by-play. Once I have defensive ratings, the goal is to adjust the ratings for strength of teammates and opponents. I then plan to examine how these ratings predict future team efficiency ratings, similar to how I examined the predictive capabilities of basic efficiency models.