Player Statistics at Home vs Away
It has become apparent to me that we must study the relationship between game situations and the player statistics collected under them before we can fully understand those statistics. Players enter the game under varying conditions, so the distribution of game situations is not uniform across players. Because of this, I feel we can gain insight by studying how these game situations relate to individual player stats.
For example: I’d like to know what sort of relationship garbage time eFG% has to non-garbage time eFG%. This is merely one of many possible questions we can answer by studying game situations.
Home vs Away
The most basic game situation to study is home vs away. We’re all familiar with how much a team’s home court advantage is worth in terms of points or winning percentage, but what about the relationship between a player’s eFG% or turnovers per possession at home vs away?
The Method and Data
Using data collected for the 2006-2007, 2007-2008, and 2008-2009 regular seasons, I calculated the following statistics for each player: FT%, 2pt FG%, 3pt FG%, eFG%, OReb%, DReb%, turnovers per offensive possession, fouls drawn per offensive possession, personal fouls per defensive possession, and steals per defensive possession.
Using R, I calculated correlation coefficients and fit linear models to the data for all players that took part in at least 100 events at home and away in each category (such as 100 FTA, 100 2FGA, 100 3FGA, 100 offensive possessions, etc). See this file for the raw results.
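As a rough sketch of the procedure described above, the following R code keeps only players with at least 100 events both at home and away, then computes the correlation coefficient and fits a linear model of the away stat on the home stat. The data frame and its column names (`home_att`, `home_made`, `away_att`, `away_made`) are hypothetical and the values are simulated for illustration; the actual script may be organized differently.

```r
# Simulated free-throw data; all names and values here are illustrative assumptions.
set.seed(1)
n <- 300
skill <- rbeta(n, 30, 10)                      # each player's underlying FT skill
ft <- data.frame(
  player   = paste0("player_", seq_len(n)),
  home_att = rpois(n, 120),                    # FTA at home
  away_att = rpois(n, 120)                     # FTA on the road
)
ft$home_made <- rbinom(n, ft$home_att, skill)
ft$away_made <- rbinom(n, ft$away_att, skill)

# Keep only players with at least 100 attempts both at home and away.
min_events <- 100
qualified <- subset(ft, home_att >= min_events & away_att >= min_events)

qualified$home_pct <- qualified$home_made / qualified$home_att
qualified$away_pct <- qualified$away_made / qualified$away_att

# Pearson correlation between home and away percentages.
r <- cor(qualified$home_pct, qualified$away_pct)

# Linear model predicting the away stat from the home stat.
fit <- lm(away_pct ~ home_pct, data = qualified)
print(r)
print(summary(fit)$coefficients)
```

The same pattern would apply to the other categories by swapping in the relevant event counts (2FGA, 3FGA, offensive possessions, and so on).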
The Correlation Coefficients
Year    FT%    2FG%   3FG%   eFG%   OR%    DR%    TO%    Fouled%  Steal%  Foul%
06-07   0.865  0.653  0.465  0.598  0.908  0.913  0.661  0.870    0.599   0.847
07-08   0.835  0.655  0.151  0.612  0.896  0.933  0.675  0.855    0.586   0.857
08-09   0.816  0.614  0.346  0.540  0.905  0.899  0.670  0.856    0.561   0.798
The Relationships in Visual Form
As much fun as it may be to look at correlation coefficients, graphing the data with the fitted linear models helps paint a better picture. The graphs below illustrate these relationships from the 08-09 regular season:
[Graphs: FT%, 2FG%, 3FG%, eFG%, OR%, DR%, TO/Poss, Fouled/Poss, Steals/Poss, Fouls/Poss]
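For readers who want to draw a graph of this kind themselves, here is a minimal sketch in base R. It plots home against away values on equal axes, overlays the fitted linear model, and adds a dashed 45-degree line for reference. The helper name `plot_home_away` and the simulated data are assumptions made for illustration; this is not the plotting code from home_vs_away.R.

```r
# Hypothetical helper: scatter of home vs away with equal axes,
# the fitted line, and a 45-degree reference line.
plot_home_away <- function(home, away, label = "stat") {
  fit <- lm(away ~ home)
  lims <- range(c(home, away))          # same range on both axes
  plot(home, away, xlim = lims, ylim = lims, asp = 1,
       xlab = paste("Home", label), ylab = paste("Away", label))
  abline(fit, col = "blue")             # fitted linear model
  abline(a = 0, b = 1, lty = 2)         # y = x: no home/away difference
  invisible(fit)
}

# Simulated data for illustration only.
set.seed(3)
home <- rbeta(150, 40, 12)
away <- 0.1 + 0.85 * home + rnorm(150, sd = 0.03)
fit <- plot_home_away(home, away, label = "FT%")
```

With equal axes, the gap between the fitted line and the dashed line shows how far the away stat tends to drift from the home stat.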
Making Predictions
The whole point of this is to make some sort of prediction about a player’s stats given some information (such as how they’ve performed at home).
Based on the models fit to this data, knowing a player’s stats at home gives us information about that player’s road stats. (Except, of course, for the models fit to the 3FG% data from the 06-07 and 07-08 seasons).
These results, however, should not surprise anyone. Obviously there is a connection between home vs away stats. Hopefully, however, this helps quantify the magnitude of the relationship between a player’s stats at home vs away.
My goal is to use the framework outlined above to quantify the relationship between player stats in other game situations of interest (such as garbage time vs non-garbage time).
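To make the idea of prediction concrete, here is a small sketch of how a fitted model of this form could be used in R. It checks whether the home stat is a significant predictor and then predicts road eFG% for a few home values. The data are simulated and the column names `home_efg` and `away_efg` are assumptions, so the numbers it prints are not the post's results.

```r
# Simulated home/away eFG% for illustration only; not the post's data.
set.seed(2)
n <- 200
true_efg <- rnorm(n, mean = 0.49, sd = 0.04)
stats <- data.frame(
  home_efg = true_efg + rnorm(n, sd = 0.03),
  away_efg = true_efg - 0.005 + rnorm(n, sd = 0.03)
)

fit <- lm(away_efg ~ home_efg, data = stats)

# p-value for the home_efg slope: is the home stat a significant predictor?
slope_p <- summary(fit)$coefficients["home_efg", "Pr(>|t|)"]

# Predicted road eFG% (with a 95% prediction interval) for hypothetical home values.
new_players <- data.frame(home_efg = c(0.45, 0.50, 0.55))
pred <- predict(fit, newdata = new_players, interval = "prediction", level = 0.95)
print(slope_p)
print(cbind(new_players, pred))
```

The prediction interval is worth printing alongside the point estimate, since it shows how much uncertainty remains for any single player.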
Reproduce These Results
To reproduce these results, you’ll need to download the following files:
By running source("home_vs_away.R") in R, a file with the raw results will be created. Also, to plot the graphs, simply uncomment the plot() code in the home_vs_away() function.
Summary
This data allows us to quantify the relationship between various player stats at home vs away. Other than the aforementioned 3FG% models, all of the linear models showed the home stats to be statistically significant for predicting the away stats. To use these models, see the raw results file.
UPDATE: Per Nick’s suggestion, I’ve rescaled the graphics so that the x-axis and y-axis cover the same distance.
12 Comments on this post
Trackbacks

Gabe said:
Just curious: were these results weighted by the number of observations for each player? For instance, a guy taking 500 FGA would carry 5 times more weight than a guy taking 100 FGA.
I couldn’t tell from the TXT of the results, so I thought I’d mention it.
April 8th, 2009 at 12:37 pm 
Ryan said:
No, I didn’t apply any weighting to the observations. The only requirement was that the player take part in at least 100 events.
April 8th, 2009 at 12:41 pm 
nick said:
Just a thought, but if you evened up the axes on the plots so that they showed square intervals (i.e. the x axis should be the same as the y), the linear fit would give you visual information more clearly and directly. Specifically, the extent to which the linear fit is not a line (45 deg) from the bottom left to the top right would show a) how much it changes home v. road, and b) which players are most affected.
April 8th, 2009 at 4:11 pm 
Ryan said:
Great idea Nick. I’ve updated the graphics in the post above.
April 8th, 2009 at 7:39 pm 
Daniel said:
I love your work. I love the insight. But how in the world are you able to do it? I have an engineering degree from the Naval Academy and I can’t do ANYTHING approaching what you can do. I can make Excel plots… and that’s it. How does one learn these techniques?
ANY insight is greatly appreciated.
April 10th, 2009 at 12:59 am 
Ryan said:
Daniel, I use R for all of the analysis. The stuff I’ve done, in my mind at least, is just the tip of what you can program/do with R (http://www.r-project.org). There are more complex things and better graphics to create (if that’s the sort of thing you’re interested in).
I hope that answers your question.
April 11th, 2009 at 5:43 pm 
Andrew said:
It looks like these results should be correlated on a curve – for example, the linear eFG% model predicts that a player who shoots 30% at home will shoot 35% on the road. It would make more sense to do an exponential regression which has the road percentages always slightly lower than the home percentages. I believe this would fit the data better as well.
June 9th, 2009 at 5:32 pm 
Ryan said:
That’s an interesting idea Andrew. Just to be clear, are you referring to a generalized linear model with a log link function? I’m not familiar with running that type of regression, but I have a handy book that describes things well, so I want to make sure I have the terminology correct.
One aspect of these sorts of things has to do with underperformance. We’re not controlling for strength of opponent, but my sense from this data is that a player who shoots poorly at home may just have underperformed relative to what we might expect.
There is certainly much left to explore with this stuff.
June 9th, 2009 at 5:54 pm 
Marc said:
Hi Ryan,
Thanks for the great data set! Can you explain more what the abbreviations mean ( FT% 2FG% 3FG% eFG% OR% DR% TO% Fouled% Steal% Foul% etc.), maybe this is totally obvious to a basketball fan?
I am currently testing a model for prediction of German football games (dependent on home/away). It would be nice to test the model with your dataset as well.
Thanks in advance,
Marc
June 12th, 2009 at 5:32 am 
Ryan said:
Marc, BR’s glossary should help!
June 12th, 2009 at 10:59 am 
Marc said:
thank you!
June 19th, 2009 at 2:13 pm
[…] 2009 May 24 tags: chart, facet, ggplot2, linear fit, panel, plot, R by learnr Basketballgeek is exploring the relationship of several game statistics at home and away games. He uses R to illustrate these […]