Dec 29 2009

# College Basketball: Rating Individual 3FG%

My last post presented a model for predicting 3FG% based on a player’s ability, age, and role in the offense. A comment by DSMok1 inspired the model I will present in this post. He writes:

> I was considering how best to create an “equalized” measure of 3pt and 2pt % for college players, based on the opposition played and the usage percentage. In other words, I would create a notional percentage for each player based on a usage rate of 20%, playing NCAA-average opposition.
>
> Do you think that you could do a similar regression for 2Pt%, and post it?

Although he specifically requested 2FG%, in this post I will present a model of college basketball 3FG% that controls for player ability, opponent strength, experience, and role in the offense.

### The Data

To build this model I collected each player’s made and attempted three point field goals for every season from 2002-03 to 2008-09, and I kept only those player seasons with at least 50 three point attempts. I separated this data by opponent, and I tracked how many seasons each player appeared in the data set as a proxy for his experience.
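A minimal sketch of this preparation step, assuming the raw data arrives as one record per player, season, and opponent. The tuple layout and the function name `build_model_rows` are hypothetical, and the sketch reads “experience” as the player’s nth qualifying season, which is one interpretation of counting how often a player appears in the data set.

```python
from collections import defaultdict

def build_model_rows(records, min_attempts=50):
    """Keep player seasons with at least `min_attempts` 3FGA and tag each row
    with an experience count (1 = the player's first qualifying season).

    Each record is (player, season, opponent, made, attempted); this layout
    is a hypothetical stand-in for the real data."""
    # Total attempts per player season, summed across all opponents.
    season_attempts = defaultdict(int)
    for player, season, _opp, _made, attempted in records:
        season_attempts[(player, season)] += attempted

    # Qualifying seasons per player. Labels like "2002-03" sort
    # chronologically as plain strings.
    qualifying = defaultdict(list)
    for (player, season), total in season_attempts.items():
        if total >= min_attempts:
            qualifying[player].append(season)

    experience = {}
    for player, seasons in qualifying.items():
        for rank, season in enumerate(sorted(seasons), start=1):
            experience[(player, season)] = rank

    # Keep only per-opponent rows from qualifying seasons, now with experience.
    return [
        (player, season, opp, made, attempted, experience[(player, season)])
        for player, season, opp, made, attempted in records
        if (player, season) in experience
    ]
```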

Also, I calculated every player’s usage% for each season. Usage% is the percentage of his team’s possessions that the player can be considered responsible for, as defined by Dean Oliver in *Basketball on Paper*. Because it follows Oliver’s definition, this usage% includes assists, and I computed it with the NBA formulas from his book.

### The Model

With this data I fit the following model:

$Pr({\tt 3FG make}) = {\tt logit}^{-1}(\alpha + \beta_{1}({\tt usage}) + \beta_{2}({\tt experience}) + \beta_{3}({\tt long}))$

This logistic regression was fit as a multilevel model in which the intercept varies by player and by opponent, which lets us estimate player ability while controlling for opponent strength. In this model, long is an indicator for whether the attempt came in the 2008-09 season, when the NCAA moved the three point line back from 19 feet 9 inches to 20 feet 9 inches.
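Written out, one way to express this varying-intercept model is below. The notation is an assumption, since the post only states that the intercept varies by player and by opponent: $j[i]$ is the shooter and $k[i]$ the opponent for attempt $i$, and the player and opponent effects are drawn from normal distributions centered at zero. Under this reading, setting $b_k = 0$ gives a player’s expected 3FG% against a league-average opponent.

$$
\begin{aligned}
Pr({\tt 3FG make}_i) &= {\tt logit}^{-1}\left(\alpha + a_{j[i]} + b_{k[i]} + \beta_{1}\,{\tt usage}_i + \beta_{2}\,{\tt experience}_i + \beta_{3}\,{\tt long}_i\right) \\
a_{j} &\sim N(0, \sigma_{a}^{2}), \qquad b_{k} \sim N(0, \sigma_{b}^{2})
\end{aligned}
$$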

### The Results

The average player results are as follows:

• Coefficients: $\alpha = -0.490$, $\beta_{1} = -0.557$, $\beta_{2} = 0.028$, $\beta_{3} = -0.034$. The p-values for testing if the true values of these parameters are equal to zero are all less than 0.01.
• Usage: The coefficient for usage, $\beta_{1} = -0.557$, suggests that each additional percentage point of usage% lowers the odds that the individual makes a 3FG attempt by about 0.55% (usage enters the model as a proportion, so a one-point increase shifts the log-odds by $0.01 \times \beta_{1}$). As we would expect, a player who increases his usage from 20% to 21% should see his odds of making a 3pt FG attempt fall by about 0.55%.
• Experience: The coefficient for experience, $\beta_{2} = 0.028$, suggests that each additional year of experience increases the odds that the individual makes a 3FG attempt by 2.8%.
• Long: The coefficient for the longer 3pt distance, $\beta_{3} = -0.034$, suggests that the odds of making a 3pt shot from the longer distance are 3.3% lower than the odds of making a 3pt shot at the shorter distance.
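The percentages in the list above come from exponentiating each coefficient, because a logistic regression coefficient is a change in log-odds. A minimal sketch that reproduces them up to rounding, assuming (as the usage interpretation implies) that usage enters the model as a proportion such as 0.20; the helper name `odds_change_pct` is hypothetical.

```python
import math

# Point estimates reported above. Usage is assumed to be a proportion,
# so one percentage point of usage is a change of 0.01.
B_USAGE, B_EXPERIENCE, B_LONG = -0.557, 0.028, -0.034

def odds_change_pct(coef, delta=1.0):
    """Percent change in the odds of a make when a predictor rises by `delta`."""
    return 100.0 * (math.exp(coef * delta) - 1.0)

print(f"usage +1 point: {odds_change_pct(B_USAGE, 0.01):+.2f}%")
print(f"experience +1:  {odds_change_pct(B_EXPERIENCE):+.2f}%")
print(f"longer line:    {odds_change_pct(B_LONG):+.2f}%")
```

Odds ratios stay the same no matter where a player starts, which is why the list can quote a single percentage for each predictor; the change in the make probability itself depends on the player’s baseline.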

### Player Estimates

This model fit helps us cut through the noise and estimate a player’s ability against league average opponents. As the graphs below show, there is a lot of uncertainty in a player’s individual 3FG% in any one season. Further compounding these yearly results is the fact that players face different levels of competition, and they may take on a larger role in their team’s offense as they gain experience.
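To make “ability against league average opponents” concrete, here is a minimal sketch that plugs the reported point estimates into the fitted equation with the opponent effect set to zero. The function names and the player effect value are hypothetical, and experience is counted from 1 for a player’s first season in the data, which is an assumption about how the variable was coded.

```python
import math

ALPHA, B_USAGE, B_EXPERIENCE, B_LONG = -0.490, -0.557, 0.028, -0.034

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def expected_3fg_pct(player_effect, usage, experience, long_line=False):
    """Estimated make probability against a league-average opponent.

    player_effect: the player's varying intercept on the logit scale
    (a hypothetical input here). The opponent effect is fixed at 0, which
    is what "league-average opponent" means in this model. Usage is a
    proportion (0.20 for 20%)."""
    eta = (ALPHA + player_effect + B_USAGE * usage
           + B_EXPERIENCE * experience + B_LONG * (1 if long_line else 0))
    return inv_logit(eta)

# One curve per usage level, like the blue/black/red lines in the graphs.
player_effect = 0.30  # hypothetical, not an estimate for any real player
for usage in (0.10, 0.20, 0.30):
    curve = [expected_3fg_pct(player_effect, usage, exp) for exp in range(1, 5)]
    print(f"usage {usage:.0%}: " + ", ".join(f"{p:.3f}" for p in curve))
```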

The first graph I will present is that of Davidson’s Stephen Curry:

This graph shows Stephen’s estimated ability as a function of experience (the x-axis) and usage (blue=10% usage, black=20% usage, and red=30% usage). Below the x-axis you will see the actual usage% for each season along with the average percentile ranking of opponent 3FG% defense, where 50% represents average opponents, >50% above-average opponents, and <50% below-average opponents. Each black dot marks the player’s sample 3FG% for a season, and the line extending from it spans the 95% confidence interval for his true 3FG% ability during that season.
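The post does not say how the season-level 95% intervals in these graphs were computed. As one plausible choice, the sketch below uses the Wilson score interval for a binomial proportion; the function name and the season line are hypothetical.

```python
import math

def season_3fg_interval(made, attempted, z=1.96):
    """Sample 3FG% and an approximate 95% Wilson score interval for one season."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    p_hat = made / attempted
    denom = 1.0 + z * z / attempted
    center = (p_hat + z * z / (2 * attempted)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / attempted
                                   + z * z / (4 * attempted * attempted))
    return p_hat, center - half, center + half

# Hypothetical season line: 80 makes on 200 attempts.
p_hat, low, high = season_3fg_interval(80, 200)
print(f"{p_hat:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Even with 200 attempts the interval is more than 13 percentage points wide, which is the single-season uncertainty these graphs are meant to highlight.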

While Stephen ranks 11th among all players in this model from 2002-03 to 2008-09, Andrew Goudelock, the current star of the College of Charleston, ranks a surprising 41st. His graph is below:

Another player graph that may be of interest is Duke’s J.J. Redick:

### Translating to the NBA

Although this model helps us estimate a player’s ability in college, we’re ultimately interested in translating this to the NBA. There are a lot of highly ranked players that never play in an NBA game, as simply being able to shoot 3pt shots well isn’t enough to succeed in the NBA.

That said, the next step is to examine players that actually make it to the NBA and determine what this model says about their ability to shoot 3pt shots against that level of competition.


### 4 Comments on this post

1. DSMok1 said:

Excellent work, Ryan!

A quick question: why exactly does Goudelock’s “true” range run so far below his actual performance? Because of his opponent strength? But the opponent strength doesn’t seem to be that low. Could you go into more depth about how that opponent adjustment works in a case like that? I’m a newbie at multilevel modeling!

December 29th, 2009 at 10:17 am
2. Ryan said:

The model shrinks a player’s estimate to the mean based on the amount of data we have. In this case we have two years of data and are estimating his ability based on his performance against slightly above average opponents in 2007-08 (~52%) and below average opponents in 2008-09 (~41%).

What I really hope to show with the plots of the actual results is that there is a large amount of uncertainty after any one season. This model shrinks that confidence interval to (36%, 43%) for 2007-08. I couldn’t find a great way to compare the two, but that’s the general idea and benefit of what this model is doing.

December 29th, 2009 at 11:39 am
3. DSMok1 said:

In other words, this directly provides regression to the mean based on the number of data points (3PAs)? That makes sense, I guess. Is that at all comparable to a Bayesian framework for regression to the mean? I would guess so.

December 29th, 2009 at 2:15 pm
4. Ryan said:

It isn’t fully Bayesian, but we do assume the player and team coefficients come from some normal distribution. This provides the shrinkage.

December 29th, 2009 at 3:04 pm
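The shrinkage discussed in this exchange can be illustrated with a normal approximation on the logit scale: the estimate is a precision-weighted compromise between a player’s own 3FG% and the population mean, and more attempts put more weight on his own data. This is a sketch, not what the multilevel fit does internally (a full fit estimates the spread of player effects from the data and adjusts for opponents at the same time); `mu`, `tau`, and the function names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def shrunk_3fg_pct(made, attempted, mu_logit, tau):
    """Normal-normal approximation to partial pooling on the logit scale.

    mu_logit, tau: hypothetical center and spread of player abilities.
    Returns (shrunken 3FG%, weight placed on the player's own data)."""
    if not 0 < made < attempted:
        raise ValueError("need 0 < made < attempted for the logit approximation")
    p_hat = made / attempted
    observed = logit(p_hat)
    # Delta-method sampling variance of the observed logit: 1 / (n p (1 - p)).
    sampling_var = 1.0 / (attempted * p_hat * (1.0 - p_hat))
    weight = tau ** 2 / (tau ** 2 + sampling_var)
    estimate = weight * observed + (1.0 - weight) * mu_logit
    return inv_logit(estimate), weight

# Hypothetical population: players centered at 35% with logit-scale spread 0.15.
mu, tau = logit(0.35), 0.15
for made, attempted in ((40, 100), (160, 400)):
    pct, w = shrunk_3fg_pct(made, attempted, mu, tau)
    print(f"{made}/{attempted}: shrunk to {pct:.3f} (weight on own data {w:.2f})")
```

With these made-up inputs, a 40% shooter on 100 attempts is pulled most of the way back toward 35%, while the same percentage on 400 attempts keeps more of its own signal.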