Jan 7 2010

Adjusting Individual Defensive Efficiency Ratings

A couple of months ago I presented individual defensive efficiency ratings for the 2008-09 regular season that I extracted from play-by-play data. In this post I will present a method for adjusting these ratings in an attempt to get a clearer picture of a player’s defensive abilities.

Adjusting the Defensive Ratings

To adjust these defensive ratings I fit a multilevel model that allows us to measure the individual offensive, individual defensive, and team defensive impacts on individual efficiency ratings. I fit this model for each of the 2006-07 through 2009-10 regular seasons, and I also fit a single model using all data from those seasons. The results of these fits can be found in the following spreadsheet:

Adjusted Individual Defensive Efficiency Ratings

In this spreadsheet you will find tabs for each of these model fits. The ratings are in terms of the player’s difference from the average defender. Standard errors are listed along with color coded confidence levels. These color codes give us an idea as to how much confidence we have in the estimate. In other words, green means we’re confident the player is not average, red means we have little confidence the player is not average, and yellow is the middle ground between the two confidence levels.
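As a rough illustration of how color codes like these could be assigned, the sketch below classifies a rating by how many standard errors it sits from zero (the average defender). The function name, the cutoffs of 2 and 1 standard errors, and the example standard errors are all assumptions made for illustration; the spreadsheet's actual thresholds are not stated in this post.

```python
def confidence_color(estimate, std_error, green_cut=2.0, yellow_cut=1.0):
    """Color code for how confident we are that a player is not average.

    The cutoffs are hypothetical defaults, not the spreadsheet's real ones.
    """
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    # Distance of the estimate from average (zero), in standard errors.
    z = abs(estimate) / std_error
    if z >= green_cut:
        return "green"
    if z >= yellow_cut:
        return "yellow"
    return "red"


# Made-up standard errors, used only to exercise the three branches.
print(confidence_color(-14.7, 4.0))
print(confidence_color(6.0, 4.0))
print(confidence_color(-1.0, 4.0))
```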

Interpreting the Ratings

To interpret these ratings, you have to think in terms of knowing the defensive player used the possession. For example, Dwight Howard’s 2009-10 rating suggests that when he uses a defensive possession the individual offensive efficiency rating of the player that used the offensive possession is 14.7 points lower than what it would be against an average defender.

It is important to note that because this model shrinks estimates to the mean, bad defenders that get little playing time will be considered average.
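To see why this happens, here is a minimal sketch of shrinkage using the textbook precision-weighted formula for a single group mean. It is a simplification, not the model fit above, and the function name and both standard deviations are assumed values chosen only to make the effect visible.

```python
def shrunken_rating(raw_rating, possessions, noise_sd=100.0, prior_sd=5.0):
    """Partial-pooling estimate of a rating measured relative to average (0).

    noise_sd: assumed per-possession noise in the rating (hypothetical).
    prior_sd: assumed spread of true ratings across players (hypothetical).
    """
    data_precision = possessions / noise_sd ** 2
    prior_precision = 1.0 / prior_sd ** 2
    # The weight on the player's own data grows with possessions defended.
    weight = data_precision / (data_precision + prior_precision)
    return weight * raw_rating


# A defender whose raw rating is +20 (worse than average, using the same
# sign convention as Howard's -14.7) at two different sample sizes.
print(shrunken_rating(20.0, 50))
print(shrunken_rating(20.0, 5000))
```

With few possessions the estimate is pulled most of the way back to zero, so a poor defender with little playing time looks close to average; with many possessions the raw rating is largely preserved.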

These ratings also adjust for the team the player plays for, since Dean Oliver shows in Basketball on Paper that some good defensive players can play on poor defensive teams. The general idea was to try to account for “Dumars-like” players while at the same time controlling for the idea that one individual doesn’t have complete control over how well a team does defensively.

I haven’t done anything scientific to fully study the impact of this team adjustment, but it seems to make sense after eyeballing the impact of this on players like Pau Gasol and Chris Bosh. Eyeballing something doesn’t give us a ton of confidence, so this adjustment is worth a deeper look in the future.

Players Still Underrated After Adjustment

These adjusted ratings do little to account for the fact that we don’t have a great way of giving credit to defenders when opponents make or miss shots. Guys like Shane Battier that defend the opponent’s best offensive player aren’t going to stand out in these ratings.

What Makes Sense? What Doesn’t?

I’m still trying to learn what makes a good defender, so I’d like to hear your thoughts on what ratings make sense, and which don’t. What players have reputations for being good defenders that this model isn’t estimating well?

Dec 29 2009

College Basketball: Rating Individual 3FG%

My last post presented a model for predicting 3FG% based on a player’s ability, age, and role in the offense. A comment by DSMok1 inspired the model I will present in this post. He writes:

I was considering how best to create an “equalized” measure of 3pt and 2pt % for college players, based on the opposition played and the usage percentage. In other words, I would create a notional percentage for each player based on a usage rate of 20%, playing NCAA-average opposition.

Do you think that you could do a similar regression for 2Pt%, and post it?

Although he specifically requested 2FG%, in this post I will present a model of college basketball 3FG% that controls for player ability, opponent strength, experience, and role in the offense.

The Data

To build this model I collected each player’s made and attempted three point field goals for every season from 2002-03 to 2008-09, and I kept only those player seasons that attempted at least 50 three point field goals. I separated this data by opponent, and I kept track of how often this player was in the data set as a proxy for that player’s experience.

Also, I calculated every player’s usage% for each season. Usage% is the percentage of his team’s possessions that the player can be considered responsible for, as defined by Dean Oliver in Basketball on Paper. Thus this usage% includes assists, and it is constructed using Dean’s formulas for the NBA from his book.

The Model

With this data I fit the following model:

Pr({\tt 3FG make}) = {\tt logit}^{-1}(\alpha + \beta_{1}({\tt usage}) + \beta_{2}({\tt experience}) + \beta_{3}({\tt long}))

This logistic regression was fit as a multilevel model to allow the intercept to vary by player and opponent. This allows us to estimate player ability while controlling for opponent strength. In this model long indicates if the attempt is from the 2008-09 season in which the NCAA moved the three point line back to 20 feet 9 inches from 19 feet 9 inches.

The Results

The average player results are as follows:

  • Coefficients: \alpha = -0.490, \beta_{1} = -0.557, \beta_{2} = 0.028, \beta_{3} = -0.034. The p-values for testing if the true values of these parameters are equal to zero are all less than 0.01.
  • Usage: The coefficient for usage, \beta_{1} = -0.557, suggests that for each additional 1% in an individual’s usage% (with usage% on a 0-to-1 scale, which this interpretation implies) the odds the individual makes a 3FG attempt are decreased by about 0.56%. As we would expect, this suggests that a player that increases their usage from 20% to 21% would expect to see their odds of making a 3pt FG attempt decrease by about 0.56%.
  • Experience: The coefficient for experience, \beta_{2} = 0.028, suggests that for each one year increase in experience the odds the individual makes a 3FG attempt are increased by 2.8%.
  • Long: The coefficient for the longer 3pt distance, \beta_{3} = -0.034, suggests that the odds of making a 3pt shot from the longer distance are 3.3% lower than the odds of making a 3pt shot at the shorter distance.
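The odds interpretations above come from exponentiating the coefficients. The sketch below reproduces that arithmetic from the reported average-player values. Two things are assumptions here: usage is treated as a proportion (so 1% is 0.01), and experience is counted starting at 1 for a player's first season in the data. The function names are illustrative.

```python
import math

# Average-player coefficients reported in the list above.
ALPHA, B_USAGE, B_EXPERIENCE, B_LONG = -0.490, -0.557, 0.028, -0.034


def inv_logit(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def pct_change_in_odds(coef, step=1.0):
    """Percent change in the odds for a `step` increase in a predictor."""
    return (math.exp(coef * step) - 1.0) * 100.0


def make_probability(usage, experience, long_line):
    """Average-player make probability; usage is a proportion (assumed)."""
    linear = (ALPHA + B_USAGE * usage
              + B_EXPERIENCE * experience + B_LONG * long_line)
    return inv_logit(linear)


print(pct_change_in_odds(B_USAGE, 0.01))   # one point of usage%
print(pct_change_in_odds(B_EXPERIENCE))    # one more year of experience
print(pct_change_in_odds(B_LONG))          # longer 3pt line
print(make_probability(0.20, 1, 0))        # 20% usage, first year, old line
```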

Player Estimates

This model fit helps us cut through the noise and estimate a player’s ability against league average opponents. As the graphs below show, there is a lot of uncertainty in a player’s individual 3FG% in any one season. Further compounding these yearly results is the fact that players face different levels of competition, and they may take on a larger role in their team’s offense as they gain experience.

The first graph I will present is that of Davidson’s Stephen Curry:

Stephen Curry: Estimated 3FG% Ability in College

This graph shows Stephen’s estimated ability as a function of experience (the x-axis) and usage (blue=10% usage, black=20% usage, and red=30% usage). Below the x-axis you will see the actual usage% for each season to go along with the average percentile ranking of opponent 3FG% defense, where 50% represents average, >50% above average, and <50% below average opponents. The black dots and associated lines extending from these dots represent the sample 3FG% for the season and the 95% confidence interval for the player’s true 3FG% ability during the season.
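For readers who want to build a similar interval for a single season's sample 3FG%, one common choice is the Wilson score interval, sketched below. This is an assumption about method: the intervals in the graphs may have been computed differently, and the makes and attempts in the example are made up.

```python
import math


def wilson_interval(makes, attempts, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p_hat = makes / attempts
    denom = 1.0 + z * z / attempts
    center = (p_hat + z * z / (2.0 * attempts)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / attempts + z * z / (4.0 * attempts ** 2)
    ) / denom
    return center - half_width, center + half_width


# Hypothetical season: 40 makes on 100 attempts.
low, high = wilson_interval(40, 100)
print(low, high)
```

Even at 100 attempts the interval spans nearly twenty percentage points, which is the single-season uncertainty the graphs are meant to convey.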

While Stephen ranks 11th in this model of all players from 2002-03 to 2008-09, current star of the College of Charleston, Andrew Goudelock, ranks a surprising 41st. His graph is below:

Andrew Goudelock: Estimated 3FG% Ability in College

Another player graph that may be of interest is Duke’s J.J. Redick:

J.J. Redick: Estimated 3FG% Ability in College

Translating to the NBA

Although this model helps us estimate a player’s ability in college, we’re ultimately interested in translating this to the NBA. There are a lot of highly ranked players that never play in an NBA game, as simply being able to shoot 3pt shots well isn’t enough to succeed in the NBA.

That said, the next step is to examine players that actually make it to the NBA and determine what this model says about their ability to shoot 3pt shots against that level of competition.

Dec 11 2009

The Relationship Between Age, Usage, and 3FG%

In my last two posts I have taken a look at modeling a player’s 3FG% based on their age, and I’ve also estimated the relationship between usage and 3FG%. In this post I would like to bring these two topics together and estimate the relationship between age, usage, and 3FG%.

The Data

To put this data set together I again used Basketball-Reference.com’s Player Season Finder, but this time I collected the advanced statistics to go along with the player’s 3pt makes and attempts. Also, I used more years of data, as this data set is from the 1989-90 to 2008-09 seasons.

My original threshold for including the player season in the data set was to require at least two 3pt shot attempts during the season. I have, however, increased this threshold to eighty-two 3pt shot attempts in an effort to restrict the data set to only players that we expect to shoot 3pt shots. This means that I’m attempting to quantify those players that are “regular” or “semi-regular” 3pt shooters and disregard those that do not consider the 3pt shot a part of their game.

The Model

To estimate the relationship between age, usage, and 3FG%, I’ve fit the following model:

Pr({\tt 3FG make}) = {\tt logit}^{-1}(\alpha + \beta_{1}({\tt USG\%}) + \beta_{2}({\tt age}) + \beta_{3}({\tt age}^{2}))

I fit this logistic regression as a multilevel model to allow the intercept and coefficients for USG% and the age quadratic to all vary by player. This type of model allows us to estimate the player ability while allowing us to estimate individual USG% lines and individual player aging curves.

The Results

The average player results are as follows:

  • Coefficients: \alpha = -1.62, \beta_{1} = -0.0061, \beta_{2} = 0.081, \beta_{3} = -0.00136. The p-values for testing if the true values of these parameters are equal to zero are all less than 0.01.
  • USG%: The coefficient for usage, \beta_{1} = -0.0061, suggests that for each additional 1% in an individual’s USG% the odds the individual makes a 3FG attempt are decreased by 0.6%. As we would expect, this suggests that a player that increases their usage from 20% to 21% would expect to see their odds of making a 3pt FG attempt decrease by 0.6%.
  • Age: The coefficients for the aging curve, \beta_{2} = 0.081 and \beta_{3} = -0.00136, suggest that the average player’s peak in 3pt shooting ability occurs when they are 30 years old.
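The peak age follows from the vertex of the quadratic in age, and the same coefficients give an average-player make probability. The sketch below works both out from the reported values. It assumes USG% enters the model in percentage points (20 for 20%), which is what the size of \beta_{1} suggests; the function names are illustrative, and these are average-player numbers rather than any individual's estimates.

```python
import math

# Average-player coefficients reported in the list above.
ALPHA, B_USG, B_AGE, B_AGE_SQ = -1.62, -0.0061, 0.081, -0.00136


def peak_age(b_age, b_age_sq):
    """Vertex of b_age*age + b_age_sq*age^2 (a maximum when b_age_sq < 0)."""
    return -b_age / (2.0 * b_age_sq)


def make_probability(usg_pct, age):
    """Average-player probability of making a 3pt attempt."""
    linear = ALPHA + B_USG * usg_pct + B_AGE * age + B_AGE_SQ * age ** 2
    return 1.0 / (1.0 + math.exp(-linear))


print(peak_age(B_AGE, B_AGE_SQ))     # just under 30
print(make_probability(20.0, 30.0))  # average player, 20% usage, age 30
```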

Trevor Ariza

Trevor Ariza was the source of the original motivation for looking at the relationship between usage and 3FG%, so I thought it would be appropriate to present a graph of his estimated aging curve at usage levels of 10% (blue), 20% (black), and 30% (red). The dots represent the sample 3FG% for Trevor at the specified age:

Trevor Ariza: Estimated Usage% and Aging Curve

One thing you’ll notice is that we only have one data point on this graph. This is because Trevor did not shoot many 3pt shots until last season with the Lakers.

That said, using just last year’s data for Ariza we would predict him to shoot 34% this year with the Rockets at age 24 using 23.6% of his lineup’s possessions. Thus far this year he’s shooting 34.3%. Don’t read too much into the closeness of this predicted% to his actual%, as a 95% confidence interval for his 3FG% this year is (26.6%, 42.8%).

One thing to note is that this model suggests that last year’s 31.9% performance isn’t a fair representation of his true ability. This model estimates his fair ability of making a 3pt FG attempt to be 34.3% last year with the Lakers at age 23 using 16.7% of his lineup’s possessions.

Other Players

Here are some other player graphs that have more than a single season’s data, where lines for the estimated aging curve at usage levels of 10% (blue), 20% (black), and 30% (red) are shown. The dots represent the sample 3FG% for the players at the specified age:

Ray Allen: Estimated Usage% and Aging Curve

Steve Nash: Estimated Usage% and Aging Curve

Dirk Nowitzki: Estimated Usage% and Aging Curve

Robert Horry: Estimated Usage% and Aging Curve

More Work…

The next step is to try to validate these models using out of sample data. One thing I would like to do is to use cross-validation to measure the expected prediction error of this model. Also, I would like to quantify the uncertainty around these estimates. Current efforts to do this have left me unsatisfied, but there are certainly some confidence bounds we could generate for these estimates, and they should prove to be worthwhile to create.

I’ll have to wait to do this, as my final exams start tomorrow, and I’ve blown off studying for them about as long as I possibly can. 8)

Dec 10 2009

The Tradeoff Between Usage and 3FG%

In response to my last post on the relationship between age and 3FG%, Brian Tung brought up a topic that I’ve wanted to look at:

… I’m not even sure that history is necessarily a good guide here, as players shoot those treys under fairly different circumstances at different stages of their careers. Who’s to say that Ariza, for example, is going to get the same looks this year as one of the primary options as opposed to last year as one of the secondary options?

Thanks to the work of Eli Witus, I’d like to see what we can say about situations like Ariza’s.

The Goal

The goal of this study is to determine what relationship usage% has with 3FG%. For the uninitiated, usage% is the proportion of a lineup’s possessions an individual player is responsible for. See Basketball on Paper for the full details.

Collecting the Data

To estimate this relationship, I’ve collected the number of 3pt makes and misses for each lineup from the 2006-07 to 2008-09 seasons. I also kept track of which shots were from the corner, and I’ve captured each player’s individual usage% for the season from play-by-play data.

With this data I constructed a data set that consists of how many 3pt shots were made and missed, if the shots took place in the corner, and the sum of the individual usage%s for each lineup. The sum of the individual usage%s for each lineup was then centered around one. This is done to estimate how much “more” or “less” the lineup must do relative to what the individual usage%s would indicate. For example, if the usage%s sum to 1.05, relative to one this is +0.05. Hence we would expect the players in this lineup to take “less” of a load and increase their shooting percentages.

Lastly, I predict the effective FG% of 3pt shots of each lineup based on the likelihood that each player in the lineup attempts a 3pt shot and their probability of making these 3pt shots, where the difference between making a corner and non-corner 3pt shot is taken into account. (These predictions are made conditional on knowing some player in the lineup took a 3pt shot.)
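One way such a lineup prediction could be assembled is sketched below. The exact calculation is not spelled out in this post, so the structure, the function name, and the dictionary keys (`shot_share`, `corner_rate`, `corner_pct`, `other_pct`) are all hypothetical, as are the numbers in the example. The only fixed fact used is that a made 3pt shot counts 1.5 times in effective FG%.

```python
def predicted_lineup_3pt_efg(players):
    """Predicted effective FG% on 3pt shots, given that some player shoots.

    Each player is a dict with hypothetical keys:
      shot_share  - relative likelihood this player takes the lineup's 3pt shot
      corner_rate - fraction of the player's 3pt attempts from the corner
      corner_pct  - make probability on corner 3pt shots
      other_pct   - make probability on non-corner 3pt shots
    """
    total_share = sum(p["shot_share"] for p in players)
    expected_make = 0.0
    for p in players:
        share = p["shot_share"] / total_share  # normalize to sum to one
        make = (p["corner_rate"] * p["corner_pct"]
                + (1.0 - p["corner_rate"]) * p["other_pct"])
        expected_make += share * make
    return 1.5 * expected_make  # a made 3 is worth 1.5 made 2s


# Made-up two-shooter example.
lineup = [
    {"shot_share": 3.0, "corner_rate": 0.5, "corner_pct": 0.40, "other_pct": 0.30},
    {"shot_share": 1.0, "corner_rate": 0.0, "corner_pct": 0.00, "other_pct": 0.38},
]
print(predicted_lineup_3pt_efg(lineup))
```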

Fitting the Model

I’ve chosen to fit this model with a linear regression, so like Eli’s efficiency study, I’m using the difference between the predicted effective FG% of 3pt shots and the actual effective FG% of 3pt shots as the response variable and the relative difference from one of the summed usage%s as the predictor.

In 2008-09, the estimated coefficient for usage% is 0.131 with a p-value < 0.01. In 2007-08, the estimated coefficient for usage% is 0.087 with a p-value of 0.06. In 2006-07, the estimated coefficient for usage% is 0.046 with a p-value of 0.33. When fit to all three seasons of data, the estimated coefficient for usage% is 0.09 with a p-value < 0.01. A 95% confidence interval for this coefficient is (0.039, 0.142).
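The shape of this regression can be sketched with a plain least-squares fit. The data below is synthetic, generated with a true slope of 0.09 purely so the fit has something to recover; it is not the lineup data. The response is taken here as actual minus predicted effective FG%, which is an assumption about the sign convention, and the real fit may also have weighted lineups by attempts.

```python
import random


def ols_slope_intercept(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic lineups: summed usage%s near one, then centered around one.
usage_sums = [rng.gauss(1.0, 0.05) for _ in range(5000)]
centered = [u - 1.0 for u in usage_sums]

# Synthetic response: (actual - predicted) eFG% with a true slope of 0.09.
diffs = [0.09 * c + rng.gauss(0.0, 0.05) for c in centered]

slope, intercept = ols_slope_intercept(centered, diffs)
print(slope, intercept)
```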

Interpreting the Results

Using the model fit to all three seasons of data, the estimated coefficient suggests that for each 0.01 increase in the sum of the individual player’s usage%s we expect the lineup’s effective FG% on 3pt shots to increase by 0.09%.

At the player level we would estimate that each 0.01 decrease in a player’s usage% would increase the effective FG% of their 3pt shots by 0.45%.

In Ariza’s case, we would estimate that a change from using 16.3% of his lineup’s possessions to 21.8% (a net -5.5%) would decrease his effective FG% of 3pt shots by 2.5%. The actual estimated change from last year with the Lakers and this year with the Rockets is -1.7% on corner 3pt shots and -3.4% on other 3pt shots.
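The arithmetic behind the Ariza estimate can be laid out in a few lines. The player-level figure of 0.45 is consistent with multiplying the lineup-level 0.09 by the five players on the floor; that reading is an assumption, since the step is not spelled out above. The variable names are illustrative.

```python
LINEUP_EFFECT = 0.09  # eFG% points per 0.01 change in summed usage (from above)
PLAYER_EFFECT = 0.45  # eFG% points per 0.01 change in one player's usage

# Assumed relationship: five players share the lineup-level effect.
implied_player_effect = LINEUP_EFFECT * 5

# Ariza's usage change in percentage points, from the figures above.
usage_change = 21.8 - 16.3

# Higher usage lowers the estimated effective FG% on 3pt shots.
estimated_efg_change = -PLAYER_EFFECT * usage_change

print(implied_player_effect)
print(usage_change)
print(estimated_efg_change)  # close to the -2.5 quoted above
```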

Future Work

Even though we have some evidence to suggest an increase in usage% will decrease a player’s effective FG% on 3pt shots, we know that age and other factors affect 3FG%. Thus in the future I would like to determine how much confidence we have in out of sample predictions when taking into account usage%, age, etc. The goal of this study was to present the estimates from in-sample data, but more work needs to be done to determine how useful this is for making predictions about the future in a new season.

Dec 9 2009

Estimating the Impact of Aging on 3FG%

The impact of aging on player performance interests me, but I haven’t really done anything useful in this area. Baseball’s recent debate (really fantastic stuff) on the topic of player aging curves has motivated me to take a closer look at the effect of aging on 3pt shooting.

Ed Küpfer’s three year old post on aging is the best resource I know of on the topic, but I’m sure he’s got something more useful than what’s in that post. That said, I’d like to specifically estimate an aging curve for the 3pt shooting abilities of NBA players.

The Data

Using Basketball-Reference.com’s excellent Player Season Finder, I collected 3pt shooting data for players that attempted at least two 3pt shots during any season from 1999-00 to 2008-09. This resulted in a data set containing a little over 3300 player seasons.

The Model

Individual 3pt shooting performance is more than a function of player ability and age, but in this case I’m specifically interested in estimating the effect of age. Further, I would like to estimate an aging curve for each player under the assumption that each player has their own unique aging curve.

To do this, I’ve chosen to use a multilevel model that allows the intercept and slope to vary by player, as this model allows for the estimation of a unique aging curve for each player. The aging curve is assumed to be of the form \alpha + \beta_{0}(age) + \beta_{1}(age^{2}).
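For a curve of this form, the peak age can be read off the coefficients directly. The short derivation below assumes \beta_{1} < 0, so the curve opens downward; the symbol age_{peak} is introduced here for illustration. If the quadratic sits inside a link function such as the logit, the peak age is unchanged because the link is monotonic.

```latex
\frac{d}{d(age)}\left[\alpha + \beta_{0}(age) + \beta_{1}(age^{2})\right]
  = \beta_{0} + 2\beta_{1}(age) = 0
\quad\Longrightarrow\quad
age_{peak} = -\frac{\beta_{0}}{2\beta_{1}}
```

A player whose estimated age_{peak} falls before the age at which he entered the league would show a curve that declines across his whole career.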

The Results

The model fit estimates that the average player’s peak age in 3pt shooting ability is 29, and the p-values for the \beta_{0} and \beta_{1} coefficients in the model form listed above are both less than 0.01.

The estimates for each player are listed in the following spreadsheet: 3FG% Aging Estimates

In this spreadsheet you will find each player’s estimated peak age along with 3FG% estimates for each player from age 20 to 40.

With the exception of Andre Miller, most peak ages are plausible. For a player like Miller, the model suggests his 3pt shooting ability has been declining since starting his NBA career.

Individual Players

Below you will find graphs of estimated aging curves for some players’ 3FG%. In addition to the estimated aging curve, you will see points plotted that represent the actual 3FG% for the player at the specified age.

LeBron James Estimated 3FG% Aging Curve

Kobe Bryant Estimated 3FG% Aging Curve

Steve Nash Estimated 3FG% Aging Curve

Vince Carter Estimated 3FG% Aging Curve

There is Uncertainty…

These are just estimates, and it is important to point out that uncertainty exists around these estimates. I’ve tried to simulate confidence bounds for this model without success, but even though I’m not representing the uncertainty, keep in mind that it does exist.

Improving this Model

The debate in the baseball community suggests it’s incorrect to simply assume players have this type of aging curve. Although this type of curve appears to be a decent approximation, perhaps there are better ways to estimate the impact of a player’s age on 3FG%. Other than trying to control for other explanatory variables, trying to fit other model types might prove worthwhile.
