My Poster at ChaCha Days 2009
This weekend is ChaCha Days 2009 at the University of Central Florida, and I will be presenting a poster on some research I’ve done for my senior research and writing project at the College of Charleston.
The title for my poster is Modeling Basketball’s Points per Possession, and the motivation for this work comes from wanting to figure out how to best model the number of points teams score or allow on a possession. The first step in accomplishing this goal is to find a modeling technique that could have plausibly generated the actual data observed, and the poster presents the results of this process.
In the poster below you will find graphs that compare the actual frequency of points scored on a possession with the simulated frequency, where the simulated data comes from the fitted models. These graphs cover each of the models examined: linear, Poisson, negative binomial, zero-altered Poisson (ZAP), and multinomial logistic regressions.
According to a chi-square goodness-of-fit test, the multinomial logistic regression is the only model that could have plausibly generated the actual data we observed. An example of this model's fit is shown on the right-hand side of the poster, for the 2008-09 Orlando Magic offense. This model was fit with the following predictors:
 Reserves: The number of opponent reserve players in the game. This value ranges from zero to five.
 Penalty: An indicator taking the value one when the defensive team is in the penalty and a zero when not.
 Players: Indicators for each player that take the value of one when the player is on the court and zero when not.
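As a minimal sketch of how the three predictors above could be encoded for a single possession, the Python snippet below builds one row of model inputs. The function name `design_row`, its arguments, and the two-player roster are hypothetical and chosen only for illustration; this is not the code used for the poster.

```python
def design_row(opponent_reserves, defense_in_penalty, on_court, roster):
    """Encode one possession using the three predictor types described above."""
    if not 0 <= opponent_reserves <= 5:
        raise ValueError("opponent_reserves must be between 0 and 5")
    row = {
        "reserves": opponent_reserves,            # count of opponent reserves, 0..5
        "penalty": 1 if defense_in_penalty else 0,  # 1 when the defense is in the penalty
    }
    for player in roster:
        # One indicator per player: 1 if on the court for this possession.
        row[player] = 1 if player in on_court else 0
    return row


# Hypothetical possession: two opponent reserves, defense in the penalty,
# Howard on the court and Gortat on the bench.
roster = ["Howard", "Gortat"]
row = design_row(2, True, {"Howard"}, roster)
print(row)
```

A real roster would have one indicator column per player who appeared for the team.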
The p-values shown in the table of coefficients were calculated using a likelihood ratio test, and they allow us to test whether these predictors are useful.
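For readers unfamiliar with the likelihood ratio test, here is a rough Python sketch of the calculation. It assumes a baseline-category multinomial logit with four outcomes (0, 1, 2, 3), so that dropping one predictor removes three coefficients and the statistic is compared against a chi-square distribution with 3 degrees of freedom. The log-likelihood values are made up for illustration and are not results from the poster.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def likelihood_ratio_stat(loglik_full, loglik_reduced):
    """Twice the gain in log-likelihood from keeping the predictor in the model."""
    return 2.0 * (loglik_full - loglik_reduced)


# Hypothetical log-likelihoods for the model with and without one predictor.
stat = likelihood_ratio_stat(-5000.0, -5006.5)
p_value = chi2_sf_df3(stat)
print(stat, p_value)
```

A small p-value suggests that the model fits noticeably worse without the predictor.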
Using this fit for the Orlando Magic, the remaining graphs illustrate the estimated difference between Dwight Howard and his backup Marcin Gortat along with the estimated difference between being in the penalty versus not being in the penalty.
Please comment if you have any questions or feedback relating to the research and/or poster!
A direct link to the poster can be found here.
5 Comments on this post
Trackbacks

Jose A. Martinez said:
Well done, Ryan. Thanks for sharing this.
I think that your results are reasonable. The multinomial logistic model is the “a priori” best model because of the nature of the dependent variable (0, 1, 2, 3). But these things have to be proved empirically, as you have done.
Is ZAP the same as Zero-Inflated Poisson (ZIP)?
I think that these kinds of models (censored inflated regression, negative binomial regression, ZIP regression, etc.) should be used when modelling blocks, steals, or assists per player and possession, but they could be problematic from a theoretical viewpoint. For example, ZIP models assume structural zeros that come from a latent variable, which may be distinct from the variable behind the Poisson part of the model… But can player performance be modelled using this reasoning? This is an interesting topic to analyse.
And how did you obtain a chi-square for linear regression? Did you apply maximum likelihood instead of OLS estimation, using a program such as LISREL or similar?
Good job, Ryan.
November 7th, 2009 at 5:34 am 
Ryan said:
Thanks, Jose. ZAP is a little different from ZIP. The ZAP assumes that zeros do not come from the Poisson component, whereas the ZIP allows zeros from both the binomial and Poisson components. I fit a ZIP model to this data and obtained very similar results to the ZAP. In fact, the ZIP, ZAP, ZINB, and ZANB (NB for negative binomial) all produced very similar results.
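To make the ZIP/ZAP distinction concrete, here is a small Python sketch of the two probability mass functions in their textbook form. The parameter names `pi_zero` and `lam` are hypothetical, and the values used are arbitrary rather than estimates from the data.

```python
import math


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson: zeros come from both the binary and Poisson parts."""
    base = (1 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


def zap_pmf(k, pi_zero, lam):
    """Zero-altered Poisson: every zero comes from the binary part, and
    positive counts follow a Poisson truncated at zero."""
    if k == 0:
        return pi_zero
    return (1 - pi_zero) * poisson_pmf(k, lam) / (1 - math.exp(-lam))


# Arbitrary illustrative parameters.
for k in range(4):
    print(k, zip_pmf(k, 0.5, 1.0), zap_pmf(k, 0.5, 1.0))
```

With the same parameters, the ZIP puts more mass on zero than `pi_zero` alone, while the ZAP puts exactly `pi_zero` there.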
For the linear regression, I treat all points less than or equal to 0.5 as 0, points on (0.5, 1.5] as 1, points on (1.5, 2.5] as 2, and points greater than 2.5 as 3. This allows me to create the same 0, 1, 2, and 3 categories to run the goodness-of-fit test.
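A minimal Python sketch of this binning rule, together with the usual Pearson chi-square statistic, might look like the following. The function names and sample values are hypothetical. Which set of counts serves as the expected counts is left to the caller, since that detail is an assumption here.

```python
def to_category(points):
    """Map a continuous value to 0, 1, 2, or 3 using the cut points above."""
    if points <= 0.5:
        return 0
    if points <= 1.5:
        return 1
    if points <= 2.5:
        return 2
    return 3


def category_counts(values):
    """Count how many values fall in each of the four categories."""
    counts = [0, 0, 0, 0]
    for value in values:
        counts[to_category(value)] += 1
    return counts


def chi_square_stat(observed, expected):
    """Pearson chi-square statistic; every expected count must be positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up values that exercise each boundary of the binning rule.
sample = [0.2, 0.5, 0.9, 1.5, 1.6, 2.5, 2.7, 3.4]
print(category_counts(sample))
```

The resulting counts can then be compared with the counts from the other source using `chi_square_stat`.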
November 7th, 2009 at 10:45 am 
Jose A. Martinez said:
Thanks for clarifying my doubts. Nice site!
November 7th, 2009 at 12:20 pm
[…] The Lampshade: Ryan provides a model mapping points per possession given various situations in the game. I have a soft spot for charts. [Basketball Geek] […]
[…] have been doing some research to figure out how to best model the number of points teams score and allow on an individual […]