Jul 2 2009

Measuring 3pt Shooting Ability With a Multilevel Model

Over the past couple of months I took an awesome class on categorical data analysis. Although it may not sound like it, this sort of data analysis has a lot of application to basketball, as it covers analyzing and building models for things like odds of events, probabilities of success, etc.

Although we didn’t cover it in class, the final chapter in our book by Alan Agresti covered multilevel models (otherwise known as mixed or random effects models). This finally allowed me to start to piece together the large treatment on this topic by Andrew Gelman and Jennifer Hill in their book on regression and multilevel models.

(Hopefully these references provide some reading for those interested in the details of these models.)

An example by Agresti on free throw shooting inspired me to see how we might apply a basic multilevel model to other NBA statistics. I chose 3pt shooting.

The Purpose of a Multilevel Model

The best way for me to explain the purpose of a multilevel model (with respect to sports, at least) is to liken it to a model-based regression to the mean.

By grouping similar players together, we combine what we know about the average player in the group with the actual data we collect for each individual player. Like regression to the mean, this allows us to make sense of small samples and intelligently pool the group and individual-specific data together.
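To make the pooling idea concrete, here is a minimal Python sketch of a simple weighted-average shrinkage. This is not what glmer() does internally; it is a simplified stand-in for the same intuition. The group rate and the prior_attempts weight are hypothetical values chosen for illustration.

```python
def shrunk_rate(makes, attempts, group_rate, prior_attempts):
    """Blend a player's observed rate with the group rate.

    prior_attempts acts like a number of phantom shots taken at the
    group rate; it is a hypothetical tuning value, not a fitted one.
    """
    return (makes + group_rate * prior_attempts) / (attempts + prior_attempts)


group_rate = 0.36      # hypothetical average 3pt FG% for the group
prior_attempts = 200   # hypothetical weight given to the group average

# Two players who both shoot 50%, on very different sample sizes.
small_sample = shrunk_rate(7, 14, group_rate, prior_attempts)
large_sample = shrunk_rate(700, 1400, group_rate, prior_attempts)

print(f"7-of-14 is pulled to {small_sample:.3f}")
print(f"700-of-1400 is pulled to {large_sample:.3f}")
```

The small sample ends up close to the group average, while the large sample stays much closer to the player's own 50%.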

Since it’s model based, one advantage of this type of analysis over regression to the mean is that we can more easily quantify effects of the game, like home court advantage, that we might otherwise have a hard time quantifying.

There are more complex multilevel structures that I hope to understand in the future that will hopefully allow for controlling for other aspects of the game. Although not considered here, we might want to control for quality of opponents when rating individual player ability. This is simply one of many things we might want to control for that a more complex model structure may provide over the model presented here.

The Data Used for This Model

The models presented below were fit to a data set containing all 3pt shots attempted from a reasonable distance during the 02-03 through 08-09 seasons. This data set is grouped by the following categories:

  • Season
  • Player Position: from 1 through 5 to denote PG, SG, SF, PF, and C
  • Player Name
  • Player Age: as of June 1st prior to the start of the upcoming season
  • Home vs Away
  • Corner 3pt shots vs Other 3pt shots: a corner 3pt shot is defined to take place within 10ft of the baseline

Here is a sample from the data set that shows Kobe Bryant’s 3pt shots from the 08-09 season:

Season,Position,Name,Age,Game Location,Shot Location,Makes,Misses
2008,2,Kobe Bryant,30,A,corner3,7,7
2008,2,Kobe Bryant,30,A,other3,74,135
2008,2,Kobe Bryant,30,H,corner3,6,14
2008,2,Kobe Bryant,30,H,other3,59,99
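
As a quick check on how rows in this layout can be read, here is a minimal Python sketch that parses the sample above and totals makes and attempts per split. The function name summarize is hypothetical, and the real raw_3pt.csv is only assumed to share the header shown in the sample.

```python
import csv
import io

SAMPLE = """Season,Position,Name,Age,Game Location,Shot Location,Makes,Misses
2008,2,Kobe Bryant,30,A,corner3,7,7
2008,2,Kobe Bryant,30,A,other3,74,135
2008,2,Kobe Bryant,30,H,corner3,6,14
2008,2,Kobe Bryant,30,H,other3,59,99
"""


def summarize(csv_text):
    """Return {(game location, shot location): (makes, attempts)}."""
    splits = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        makes = int(row["Makes"])
        attempts = makes + int(row["Misses"])
        splits[(row["Game Location"], row["Shot Location"])] = (makes, attempts)
    return splits


splits = summarize(SAMPLE)
total_makes = sum(makes for makes, _ in splits.values())
total_attempts = sum(attempts for _, attempts in splits.values())

for key, (makes, attempts) in splits.items():
    print(key, f"{makes}/{attempts} = {makes / attempts:.3f}")
print(f"overall: {total_makes}/{total_attempts}")
```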

Each player’s position is held constant for each season and taken to be the position listed from the most recent season. The position data comes from doug’s stats and the date of birth data comes from database basketball and NBA.com.

The Basic Model Structure

This model considers 3pt shots that are grouped by player. A separate model was fit for each position. The purpose of this model is to estimate each player’s ability while controlling for things like home court advantage, corner 3pt shots vs other 3pt shots, and age effects.

The models below were fit with R using glmer() from the lme4 package.

The Model Fits

PG: logit^-1( -1.40 + 0.02(home) + 0.11(corner3) + 0.053580(age) - 0.000911(age^2) )

SG: logit^-1( -0.60 + 0.03(home) + 0.11(corner3) )

SF: logit^-1( -0.64 + 0.04(home) + 0.12(corner3) )

PF: logit^-1( -3.12 + 0.03(home) + 0.10(corner3) + 0.170527(age) - 0.002965(age^2) )

The fit for centers does not appear to be that useful. Most centers don’t take that many 3pt shots. It might be better to group centers of interest with power forwards, but for now we’ll ignore these players (sorry Mehmet and Sheed).

All of the coefficients for the fits listed above are significant at the 0.10 level, except for the coefficient for home in the PF fit. It’s about what we would expect it to be at 0.03, so it seems reasonable to leave it in the model.
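
To show how these fits turn into a predicted 3pt FG%, here is a minimal Python sketch that plugs the published coefficients into the inverse logit. It assumes a player effect of zero (an average player at the position) and uses the rounded coefficients as printed; the names FITS and predicted_pct are hypothetical.

```python
import math

# Coefficients copied from the fits above; SG and SF have no age terms.
FITS = {
    "PG": {"intercept": -1.40, "home": 0.02, "corner3": 0.11,
           "age": 0.053580, "age2": -0.000911},
    "SG": {"intercept": -0.60, "home": 0.03, "corner3": 0.11,
           "age": 0.0, "age2": 0.0},
    "SF": {"intercept": -0.64, "home": 0.04, "corner3": 0.12,
           "age": 0.0, "age2": 0.0},
    "PF": {"intercept": -3.12, "home": 0.03, "corner3": 0.10,
           "age": 0.170527, "age2": -0.002965},
}


def inv_logit(x):
    """Map a value on the log-odds scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_pct(position, home, corner3, age):
    """Predicted make probability for an average player (player effect = 0)."""
    c = FITS[position]
    log_odds = (c["intercept"] + c["home"] * home + c["corner3"] * corner3
                + c["age"] * age + c["age2"] * age ** 2)
    return inv_logit(log_odds)


for position in FITS:
    p = predicted_pct(position, home=1, corner3=0, age=29)
    print(f"{position}: {100 * p:.1f}%")
```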

Interpreting These Fits

First I want to note that failure to converge warnings were encountered when including the age effects in the SG and SF fits, which is the primary reason why we are not controlling for those variables for shooting guards and small forwards. I’ve been unable to resolve this issue, so for now we will assume these players do not have an age effect for their 3pt shots.

That said, here is how we might interpret the effects in these fits:

  • Home Court Advantage: We estimate the odds of making a 3pt shot at home are 2% higher than the odds of making a 3pt shot on the road for point guards, controlling for player ability, corner 3pt shots vs other 3pt shots, and age. We estimate this effect to be 3%, 4%, and 3% for shooting guards, small forwards, and power forwards, respectively. From a practical standpoint, we find no evidence that any one position has a higher home court advantage over any other position.
  • Corner 3pt Shots: We estimate the odds of making a 3pt shot from the corner are 11.6% higher than the odds of making all other 3pt shots for point guards, controlling for player ability, home court advantage, and age. We estimate this effect to be 11.6%, 12.8%, and 10.5% for shooting guards, small forwards, and power forwards, respectively. Like home court advantage, there is no evidence that any one position has a higher corner3 effect than any other position.
  • Aging Curve: We estimate that the peak age for 3pt shooting ability for point guards and power forwards is when these players are 29 years of age.
  • Home 3pt FG%: We estimate that the average 29 year old’s non-corner 3pt FG% at home is 35.6% for point guards, 36.1% for shooting guards, 35.4% for small forwards, and 34.6% for power forwards.
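
The odds and peak-age interpretations above follow from two small calculations, sketched here in Python. Exponentiating a logistic coefficient gives an odds ratio, and the peak of a quadratic age curve sits at -b1 / (2 * b2). The function names are hypothetical, and small differences from the figures quoted above can arise because the printed coefficients are rounded.

```python
import math


def odds_increase_pct(coef):
    """Percent change in the odds implied by a logistic coefficient."""
    return (math.exp(coef) - 1.0) * 100.0


def peak_age(age_coef, age2_coef):
    """Age at which the curve age_coef*age + age2_coef*age^2 is highest."""
    return -age_coef / (2.0 * age2_coef)


print(f"home effect, PG (0.02): {odds_increase_pct(0.02):.1f}%")
print(f"corner3 effect, PG (0.11): {odds_increase_pct(0.11):.1f}%")
print(f"corner3 effect, PF (0.10): {odds_increase_pct(0.10):.1f}%")

pg_peak = peak_age(0.053580, -0.000911)
pf_peak = peak_age(0.170527, -0.002965)
print(f"PG peak age: {pg_peak:.1f}")
print(f"PF peak age: {pf_peak:.1f}")
```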

The Player Effects

Of ultimate interest here are the player effects that quantify player ability. Before diving into the numbers, I think it’s worth noting what exactly player ability means in the context of this model. Because we only control for home court advantage, corner 3pt shots vs other 3pt shots, and age, the player ability component is essentially a combination of true player talent, coaching, teammates, opponents, and other things like actual shot selection. A player that suddenly takes nothing but wide open 3pt shots is likely to overperform what we might predict from this model, or underperform if they were to do the opposite and only attempt heavily contested 3pt shots. This shot distribution is sure to be affected by coaching, teammates, and opponents.

With that in mind, the following spreadsheet lists 95% confidence intervals for the 3pt shooting ability of each player at home and at 29 years of age, where the uncertainty comes only from the error in the measured ability of each player. The uncertainty associated with the mean intercept, home court advantage, corner3, and age effects is not taken into account.

Spreadsheet: Multilevel Model: Estimated 3pt Ability

Aging Curve Examples

Rajon Rondo has been in the news a lot lately, so I figure he is as good a player as any to take part in showing the estimated aging effect for point guards.

In the graph below the blue represents the predicted mean 3pt FG% for Rajon Rondo weighted based on his actual shot distribution from all seasons in this data set. The red represents the actual estimated mean 3pt FG% for Rajon from those seasons, also weighted based on his actual shot distribution from all seasons in this data set. The dots represent the median, while the lines illustrate the 95% confidence interval for this mean 3pt FG%.

The uncertainty shown is only for the uncertainty in the actual measured player effect. The uncertainty on the age coefficients gives us fairly wide intervals to work with, so this uncertainty has been removed for clarity in the graph. This highlights the lack of precision for the estimated aging curve, even though the coefficients are statistically significant.

Estimated Aging Curve for Rajon Rondo

For a comparison, here is Steve Nash’s estimated aging curve:

Estimated Aging Curve for Steve Nash

There are two things to take away from this comparison. First, we have more data on Steve Nash, so the uncertainty around his ability is smaller than the uncertainty around Rondo’s ability. This is shown by the smaller bars in Nash’s graph.

Second, the aging effect for point guards is estimated to be fairly small. By that I mean the curve is not very steep, so although the curvature exists, we estimate it to be a fairly flat curve.

Point guards, however, are not the only position we estimated aging effects for. Here are similar graphs comparing a young power forward to a veteran power forward:

Estimated Aging Curve for Yi Jianlian

Estimated Aging Curve for Dirk Nowitzki

As with Rondo, there is more uncertainty around Yi Jianlian’s ability than Nowitzki’s. Another comparison to the point guards comes in the shape of the curve. As the graphs show, the estimated curve for power forwards is steeper than the estimated curve for point guards.

Recreate These Results

To recreate these results with the R and data files listed below, you will need to install the arm package and its associated dependencies. One easy way to do this should be to run the following command from your R console:

install.packages("arm", dependencies=TRUE)

Once you have these packages, you’ll need to download these files:

  • multi_3pt.R: By default, this R script simply fits the models above. You can then use summary(fits[[i]]) to see the results, where i = 1, 2, 3, or 4. Edit the code where you see the first if (0) statement to create the graphs above. Edit the code where you see the second if (0) statement to create the CSV file used to generate the spreadsheet above.
  • raw_3pt.csv: This CSV data file contains the data as defined in the data section at the beginning of this post.

Summary

This is my first attempt at using a multilevel model with NBA data, so there is certainly a mistake or two lying around somewhere. :)

I used the simplest structure possible for this model, so my hope is that future research in this area will allow for different groupings. One such grouping would ideally be at the team (or coach?) level.

Jun 28 2009

The Time Distribution of Events in the NBA

In my quest to create a realistic simulation of the NBA, I’ve come to the point in which I need to answer an important question: how long does it take for an event to occur after the start of a play?

We don’t actually have to care about the time distribution of events to simulate and make inferences about most player versus player aspects of the game. That said, there are some important aspects of the game that are directly tied to time. By using time, we will be able to better examine how the time-to-penalty situation impacts a team’s efficiency. Although my direct focus as of now is on fouls, other aspects, like strategy, have timing implications, too.

Estimating The Distributions

The data used to estimate these time to event distributions was extracted from the play-by-play data for the 06-07 to 08-09 regular seasons. This data is represented as the number of seconds elapsed from the start of the play to the time of the play ending event, all conditional on how the play started.

Thanks to a tip from @revodavid, I used R’s density() function to perform kernel density estimation on the data. I’m certainly no expert with this stuff, but setting adjust to 0.5 (half the default bandwidth) gave results closer to what I was expecting. I don’t want to get too crazy altering the default results, though, as the idea isn’t to follow every little bump in the data, but rather to intelligently smooth the data to provide a good approximation. This isn’t life or death stuff here, so I figure it will be good enough for now.
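
For readers who want to see what the adjust setting does, here is a minimal Python sketch of a Gaussian kernel density estimate. The bandwidth follows the rule of thumb behind R's default (0.9 times the smaller of the standard deviation and IQR/1.34, times n^(-1/5)), without R's fallbacks for degenerate data, and it evaluates the density directly rather than on a binned grid as density() does. The elapsed times are hypothetical, not taken from times.csv.

```python
import math
import statistics


def nrd0_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * len(data) ** -0.2


def gaussian_kde(data, adjust=1.0):
    """Return a density function; adjust scales the bandwidth."""
    h = nrd0_bandwidth(data) * adjust
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density


# Hypothetical seconds elapsed before a shot event.
times = [2, 3, 3, 4, 5, 5, 6, 7, 8, 8, 9, 10, 11, 12, 14, 15, 16, 18, 20, 23]

smooth = gaussian_kde(times)             # default bandwidth
rough = gaussian_kde(times, adjust=0.5)  # half the default bandwidth

for second in (5, 10, 20):
    print(second, round(smooth(second), 4), round(rough(second), 4))
```

A smaller adjust follows the data more closely, which is the trade-off described above between tracking every bump and smoothing toward a usable approximation.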

One thing to point out with the data is that the times aren’t perfectly measured. Time is continuous in nature, yet (prior to the 2009 playoffs, at least) we never see fractional seconds in the play-by-play. The way the data is collected is also inexact. The shot events below show events that last past 24 seconds. Aside from actual errors in the time stamp on each play-by-play event, the shot event time isn’t actually recorded when the shot is taken. Thus we expect to run off more than 24 seconds for some shots.

Time to Shot Events

To illustrate how long events take to occur, I’ve decided to show the estimated probability distributions for the time to shot events. These shot events include all 2pt and 3pt makes, misses, and shooting fouls drawn.

Period Start vs Timeout vs Inbounds After Foul

The graph below shows the probability distribution for the number of seconds that elapse before a shot event for plays that start at the beginning of a period, after a timeout, and inbounds after a foul.

Time to Shot Event (Period Start vs Timeout vs Inbounds After Foul)

Opponent Shot Made vs Live Def Reb vs Live Off Reb

The graph below shows the probability distribution for the number of seconds that elapse before a shot event for plays that start after an opponent’s made shot, a live defensive rebound, and a live offensive rebound.

Time to Shot Event (Opp Shot Made vs Live Def Reb vs Live Off Reb)

Dead Ball Turnover vs Steal

The graph below shows the probability distribution for the number of seconds that elapse before a shot event for plays that start after a dead ball turnover and steals.

Time to Shot Event (Dead Ball Turnovers vs Steals)

Explore These Distributions

The graphs above only illustrate the time to event distributions for shot events. There are other events like personal fouls and turnovers that warrant their own time to event distributions for simulation purposes.

You can use the following files to further examine these and other time to event distributions:

  • times.R - This R script creates the graphs above, and has some code that can be used to examine the distributions for personal fouls and turnovers.
  • times.csv - This CSV data file contains the elapsed times extracted from the play-by-play from the 06-07 to 08-09 regular seasons.

In the times.csv data file you can see the play starting events and the play ending events that you can then examine with times.R.

Summary

By estimating these distributions, we can now get a general idea as to how much time elapses for various NBA events. This will provide a starting point for being able to realistically simulate actual NBA periods versus simply X number of possessions.

One question worth answering is how useful quick shot or drain the clock strategies are. This opens up a lot of other questions such as: what kind of field goal percentages and turnover rates can we realistically expect using these strategies? Hopefully this is a starting point towards moving in that direction.

May 31 2009

Conditioning the Distribution of Play Ending Events Given How the Play Starts

My last post took a very general look at how plays end in the NBA. To better understand how the game works, it’s important to know how these distributions vary based on how the play starts.

How Do Plays Start?

Plays can start in a variety of ways. I’ve broken these ways into the following 20 categories (along with tags I have given these events that you will see again).

  • The start of the game, quarters, and overtime periods: period_start
  • After a timeout: timeout
  • After an opponent made shot, after which the ball is brought in out of bounds: 2fg_make, 3fg_make, ft_make
  • Defensive rebounds under both dead and live ball situations: dreb_2fg_dead, dreb_2fg_live, dreb_3fg_dead, dreb_3fg_live, dreb_ft_dead, dreb_ft_live
  • Offensive rebounds under both dead and live ball situations: oreb_2fg_dead, oreb_2fg_live, oreb_3fg_dead, oreb_3fg_live, oreb_ft_dead, oreb_ft_live
  • After a foul that leads to the ball being taken in from out of bounds rather than resulting in free throws: inbounds_after_foul
  • Opponent turnovers under both dead and live ball situations: tov_dead, tov_steal

The Distributions

By extracting play-by-play data from the 06-07 to 08-09 seasons, I’ve come up with the following spreadsheet that contains the distributions for each season, separated by home and away:

http://spreadsheets.google.com/ccc?key=rFQ-skpTEnN3P7uyLqtJbGQ

For each season, location, and play starting event, the median and 95% credible interval for the true proportion of each play ending event is listed. These proportions are modeled by a Dirichlet distribution using a noninformative prior distribution.
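
Here is a minimal Python sketch of how medians and 95% credible intervals can be drawn from a Dirichlet posterior. It assumes a uniform prior (one pseudo-count per category), which is only one of several priors that could be called noninformative. The counts are hypothetical, and Dirichlet draws are built from normalized gamma draws rather than MCMCpack's rdirichlet().

```python
import random
import statistics


def dirichlet_draw(alphas, rng):
    """One Dirichlet draw, built by normalizing independent gamma draws."""
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def posterior_summary(counts, prior=1.0, draws=4000, seed=1):
    """Return (2.5%, median, 97.5%) for each category's true proportion."""
    rng = random.Random(seed)
    alphas = [count + prior for count in counts]  # prior=1.0 is an assumption
    samples = [dirichlet_draw(alphas, rng) for _ in range(draws)]
    summary = []
    for k in range(len(counts)):
        column = sorted(sample[k] for sample in samples)
        lower = column[int(0.025 * draws)]
        upper = column[int(0.975 * draws) - 1]
        summary.append((lower, statistics.median(column), upper))
    return summary


# Hypothetical play ending counts: foul, shot, timeout, turnover.
labels = ["foul", "shot", "timeout", "turnover"]
counts = [120, 640, 55, 185]

summary = posterior_summary(counts)
for label, (lower, median, upper) in zip(labels, summary):
    print(f"{label}: {median:.3f} ({lower:.3f}, {upper:.3f})")
```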

A Note About Team Offensive Rebounds

While extracting the play-by-play data, I realized that the NBA awards a team offensive rebound when the offensive team takes a shot that misses the rim and the 24 second shot clock expires. Thus I’ve made the decision to ignore these shots and only keep the turnovers in these situations.

This is done so that these turnovers don’t skew the distribution of dead ball offensive rebounds, and are instead attributed to the play starting event that led to the original shot. Understanding the time distribution of shots will come later.

Telling the Full Story

Although these distributions are important, they aren’t something we can easily work with visually or by hand. Thus they don’t paint a very good picture as to what is going on just by looking at them.

To solve this, I’ve created some graphs to help paint this picture. These graphs only include some of the more likely play starting events.

Graph: 2FG% Given Play Start

In order to understand how things like shooting differ between the play starting events, we must condition on knowing that there was a field goal attempt.

The following graph shows this for 2FG% for the 06-07 to 08-09 seasons, where red represents away teams, blue home teams, dots the median, and the lines cover the 95% credible interval:

Graph: 3FG% Given Play Start

The following graph shows how 3FG% varies based on the play starting event, conditional on knowing that a 3pt field goal attempt took place. This graph is also for the 06-07 to 08-09 seasons, where red represents away teams, blue home teams, dots the median, and the lines cover the 95% credible interval:

Graph: eFG% Given Play Start

The following graph shows how eFG% (effective FG%) varies based on the play starting event, conditional on knowing that a field goal attempt took place. This graph is also for the 06-07 to 08-09 seasons, where red represents away teams, blue home teams, dots the median, and the lines cover the 95% credible interval:

The 2FG%, 3FG%, and eFG% graphs are drawn to the same scale so that they can be evaluated side by side. One interesting thing about the 2FG% vs eFG% graphs is that all play starting events have a higher eFG% than 2FG% except for tov_steal. This makes me wonder if teams are using the optimal shot distribution for plays that start with a steal.

Graph: Points per 100 Shot Events Given Play Start

The following graph shows how the expected number of points per 100 shot events varies based on the play starting event, conditional on knowing that a shot event took place. This differs from 2FG%, 3FG%, and eFG% in that it illustrates the impact of free throws associated with shooting fouls. Hence a shot event is defined to be all made and missed shots, regardless of whether a shooting foul took place.

To simplify things, I assume the free throws are made at a rate of 75%. In other words, we expect 0.75 points per free throw attempt. This graph is also for the 06-07 to 08-09 seasons, where red represents away teams, blue home teams, dots the median, and the lines cover the 95% credible interval:
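
As a rough illustration of the calculation, here is a minimal Python sketch that converts counts of shot events into points per 100 shot events, with every free throw worth 0.75 points. The counts are hypothetical, and the free throw allotment (one after a made shot with a foul, two or three after a fouled miss) is my assumption about how each event should be scored.

```python
FT_VALUE = 0.75  # expected points per free throw attempt, as assumed above

# Expected points for each kind of shot event.
POINTS = {
    "2pt_make": 2.0,
    "2pt_miss": 0.0,
    "2pt_make_foul": 2.0 + 1 * FT_VALUE,  # basket plus one free throw
    "2pt_miss_foul": 2 * FT_VALUE,        # two free throws
    "3pt_make": 3.0,
    "3pt_miss": 0.0,
    "3pt_make_foul": 3.0 + 1 * FT_VALUE,  # basket plus one free throw
    "3pt_miss_foul": 3 * FT_VALUE,        # three free throws
}


def points_per_100(counts):
    """Expected points per 100 shot events for a dict of event counts."""
    total_events = sum(counts.values())
    total_points = sum(POINTS[event] * n for event, n in counts.items())
    return 100.0 * total_points / total_events


# Hypothetical counts for one play starting event.
counts = {
    "2pt_make": 300, "2pt_miss": 350, "2pt_make_foul": 25, "2pt_miss_foul": 90,
    "3pt_make": 75, "3pt_miss": 150, "3pt_make_foul": 2, "3pt_miss_foul": 8,
}

result = points_per_100(counts)
print(f"{result:.1f} points per 100 shot events")
```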

Reproduce These Results

There are other ways to work with these distributions than I’ve shown here, so here is how you can reproduce these results and also examine other aspects of the distributions:

  • dist.full.csv - This CSV data file contains the distribution of events as counts that are easily fed as parameters to the Dirichlet distribution.
  • dist.full.R - Using dist.full.csv, this R script: 1) creates a CSV data file containing the medians and 95% credible intervals for the distributions (as seen in the spreadsheet), and 2) generates the graphs above. It requires the R package MCMCpack for rdirichlet().

Summary

Clearly steals rock when we know a shot took place. There are other events like turnovers (that might lead to a steal) and non-shooting fouls that impact the odds of a team winning any given game. So there is certainly much left to explore.

I’m interested in hearing about any results you may find using the data and code listed above.

May 19 2009

The Distribution of Play Ending Events in the NBA

I have come to the realization that I really don’t understand the NBA game all that well.

Sure I have a general knowledge of basketball, but as I work toward building a realistic simulation of the NBA, I realize that I don’t understand the dynamics of the game that impact a team’s chances of scoring points.

By quantifying the distribution of play ending events, I will be taking the first step in the direction of understanding the dynamics of the game.

What is a play?

The terms possession and play get thrown around a lot, so I want to be clear on the definition of a play that I am using here:

  • play - period of play before a play ending event

Ok so that really doesn’t help. The real understanding comes in the definition of a play ending event:

  • play ending event - all shot events, any event that stops play or gives the opponent the ball, and any event that creates a free throw opportunity

In general, play ending events can be broken down into four basic categories: fouls, shots, timeouts, and turnovers.

The General Distribution of Play Ending Events

The general distribution of these four basic categories is as follows:

Season Location Foul% Shot% Timeout% Turnover%
08-09 Away 8.5% 75.4% 5.5% 10.5%
08-09 Home 8.7% 75.5% 5.5% 10.3%
07-08 Away 8.3% 75.6% 5.4% 10.6%
07-08 Home 8.3% 76.0% 5.4% 10.2%
06-07 Away 9.2% 74.2% 5.6% 11.0%
06-07 Home 9.2% 74.1% 5.8% 10.9%

This data was compiled from 137,706, 140,343, and 136,108 away play ending events, and from 136,971, 139,543, and 135,805 home play ending events from the 08-09, 07-08, and 06-07 seasons, respectively.

I believe I need to make it clear that I consider shooting fouls a component of shots, and thus I have grouped them with Shot% and not Foul%. Also, I group offensive foul turnovers with Foul% instead of Turnover%. These distinctions will be made clear below.

One result from this table that interests me is the difference between Foul% and Shot% when comparing the 06-07 season to the other two seasons.

There are enough events to say these are statistically significantly different from each other, so I’m interested to know if 1) some rule change caused this, 2) some other explanatory reason made this happen that I’m missing (such as the distribution of play starting events, which I will cover in the future), 3) this really was just by chance, or 4) I have some perl code not working as desired.
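
One way to check a claim like this is a two-proportion z-test, sketched below in Python for the away Foul% in 06-07 versus 08-09. This is my choice of test, not necessarily the one used for the post. It treats plays as independent and rebuilds approximate counts from the rounded percentages in the table, so the result is only indicative.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """z statistic and two-sided p-value for a difference in proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Away Foul% and away play ending event totals from the table and text above.
z, p_value = two_proportion_z(0.092, 136108, 0.085, 137706)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2g}")
```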

That said, these general categories give us an idea how plays end, but they don’t really tell us how play ending events for home versus away teams differ. Digging into more detail will shed some light onto this.

Distribution of Fouls

The percentages below are on a per play basis. So this means they are not conditional on knowing there was a foul, which is why they do not sum to 1.

SEA LOC CP D3S DP DT FT1 FT2 OFF PF TECH MISC
08-09 A 0.02% 0.30% 0.013% 0.027% 0.025% 0.003% 1.54% 6.22% 0.28% 0.04%
08-09 H 0.03% 0.31% 0.010% 0.029% 0.030% 0.005% 1.49% 6.47% 0.30% 0.03%
07-08 A 0.02% 0.32% 0.011% 0.026% 0.027% 0.001% 1.55% 6.09% 0.25% 0.02%
07-08 H 0.03% 0.30% 0.014% 0.020% 0.033% 0.001% 1.48% 6.17% 0.27% 0.02%
06-07 A 0.03% 0.40% 0.007% 0.028% 0.032% 0.004% 1.91% 6.45% 0.33% 0.02%
06-07 H 0.03% 0.39% 0.005% 0.020% 0.035% 0.003% 1.77% 6.65% 0.31% 0.03%

Abbreviations: SEA: Season; LOC: Team Location, A=Away and H=Home; CP: clear path; D3S: defensive 3 seconds (includes all “illegal defense” events for the 06-07 play-by-play); DP: double personal; DT: double technical; FT1: flagrant type 1; FT2: flagrant type 2; OFF: offensive foul; PF: personal fouls; TECH: technicals; MISC: all other fouls.
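
Because these percentages are per play, dividing each one by the row total gives the share of fouls of each type, conditional on knowing there was a foul. Here is a minimal Python sketch using the 08-09 away row; the variable names are hypothetical.

```python
# Per-play foul percentages for 08-09 away teams, copied from the table above.
per_play = {
    "CP": 0.02, "D3S": 0.30, "DP": 0.013, "DT": 0.027, "FT1": 0.025,
    "FT2": 0.003, "OFF": 1.54, "PF": 6.22, "TECH": 0.28, "MISC": 0.04,
}

# The row total should match the Foul% column of the general table.
foul_pct = sum(per_play.values())

# Share of each foul type, given that the play ended with a foul.
conditional = {foul: pct / foul_pct for foul, pct in per_play.items()}

print(f"total Foul%: {foul_pct:.2f}%")
for foul, share in conditional.items():
    print(f"{foul}: {100 * share:.1f}% of fouls")
```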

Distribution of Shots

The percentages below are also on a per play basis.

2 point shots:

Season Location Make% Miss% Make+SF% Miss+SF% Blocked%
08-09 Away 23.4% 23.3% 1.93% 6.96% 4.26%
08-09 Home 24.4% 23.5% 1.88% 6.65% 3.69%
07-08 Away 23.5% 23.6% 1.96% 7.08% 4.20%
07-08 Home 24.7% 23.7% 1.81% 6.83% 3.58%
06-07 Away 25.3% 28.0% 2.10% 7.10% 4.10%
06-07 Home 26.2% 28.0% 2.03% 6.76% 3.57%

3 point shots:

Season Location Make% Miss% Make+SF% Miss+SF% Blocked%
08-09 Away 5.60% 9.73% 0.025% 0.113% 0.109%
08-09 Home 5.65% 9.56% 0.023% 0.101% 0.092%
07-08 Away 5.46% 9.68% 0.022% 0.101% 0.094%
07-08 Home 5.55% 9.65% 0.029% 0.105% 0.087%
06-07 Away 2.93% 4.57% 0.021% 0.096% 0.051%
06-07 Home 3.02% 4.48% 0.013% 0.099% 0.030%

Distribution of Turnovers

Like the other distributions above, the percentages below are also on a per play basis.

Season Location Steal% Dead Ball%
08-09 Away 6.2% 4.3%
08-09 Home 6.1% 4.1%
07-08 Away 6.2% 4.4%
07-08 Home 6.1% 4.1%
06-07 Away 6.1% 4.8%
06-07 Home 6.1% 4.7%

Summary

The distributions presented above are simply one component of plays in the NBA. The next step is to examine how plays start, as this has a role in how a given play ends.

From there, the ultimate goal is to then quantify the distribution of how plays end based on how they started. This will help answer questions like, “What proportion of plays end with a 2pt FG make + shooting foul when the play starts on a steal?” or “Does the data provide evidence that there is a positive or negative relationship with this proportion and playing at home?”

These are simply a couple of examples of the many questions that I want to be able to answer to help better understand how the game works.

Apr 8 2009

Player Statistics at Home vs Away

It has become apparent to me that we must study the relationship between game situations and the player statistics collected under these game situations before we can fully understand the stats we collect about players. Players enter the game under varying conditions, thus the distribution of game situations is not uniform across all players. Because of this, I feel we can gain insight by studying how these game situations relate to individual player stats.

For example: I’d like to know what sort of relationship garbage time eFG% has to non-garbage time eFG%. This is merely one of many possible questions we can answer by studying game situations.

Home vs Away

The most basic game situation to study is home vs away. We’re all familiar with how much a team’s home court advantage is worth in terms of points or winning percentage, but what about the relationship between a player’s eFG% or turnovers per possession at home vs away?

The Method and Data

Using data collected for the 2006-2007, 2007-2008, and 2008-2009 regular seasons, I calculated the following statistics for each player: FT%, 2pt FG%, 3pt FG%, eFG%, OReb%, DReb%, turnovers per offensive possession, fouls drawn per offensive possession, personal fouls per defensive possession, and steals per defensive possession.

Using R, I calculated correlation coefficients and fit linear models to the data for all players that took part in at least 100 events at home and away in each category (such as 100 FTA, 100 2FGA, 100 3FGA, 100 offensive possessions, etc). See this file for the raw results.
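
As an illustration of the two calculations, here is a minimal Python sketch of a Pearson correlation coefficient and a least-squares line predicting an away stat from a home stat. The player values are hypothetical, and regressing away on home is an assumption about the direction of the fit.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares (slope, intercept) for predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical home and away 2pt FG% for six players.
home = [0.52, 0.48, 0.55, 0.45, 0.50, 0.47]
away = [0.50, 0.47, 0.52, 0.46, 0.49, 0.44]

r = pearson_r(home, away)
slope, intercept = fit_line(home, away)
print(f"r = {r:.3f}")
print(f"away = {intercept:.3f} + {slope:.3f} * home")
```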

The Correlation Coefficients

Year FT% 2FG% 3FG% eFG% OR% DR% TO% Fouled% Steal% Foul%
06-07 0.865 0.653 0.465 0.598 0.908 0.913 0.661 0.870 0.599 0.847
07-08 0.835 0.655 0.151 0.612 0.896 0.933 0.675 0.855 0.586 0.857
08-09 0.816 0.614 0.346 0.540 0.905 0.899 0.670 0.856 0.561 0.798

The Relationships in Visual Form

As much fun as it may be to look at correlation coefficients, graphing the data with the fitted linear models helps paint a better picture. The graphs below illustrate these relationships from the 08-09 regular season:

FT% 2FG%
3FG% eFG%
OR% DR%
TO/Poss Fouled/Poss
Steals/Poss Fouls/Poss

Making Predictions

The whole point of this is to make some sort of prediction about a player’s stats given some information (such as how they’ve performed at home).

Based on the models fit to this data, knowing a player’s stats at home gives us information about that player’s road stats (except, of course, for the models fit to the 3FG% data from the 06-07 and 07-08 seasons).

These results, however, should not surprise anyone. Obviously there is a connection between home vs away stats. Hopefully, however, this helps answer the magnitude of the relationship between a player’s stats at home vs away.

My goal is to use the framework outlined above to quantify the relationship between player stats in other game situations of interest (such as garbage time vs non-garbage time).

Reproduce These Results

To reproduce these results, you’ll need to download the following files:

By running source("home_vs_away.R") in R, a file with the raw results will be created. Also, to plot the graphs, simply uncomment the plot() code in the home_vs_away() function.

Summary

This data allows us to quantify the relationship between various player stats at home vs away. Other than the aforementioned 3FG% models, all of the linear models showed the home stats to be statistically significant for predicting the away stats. To use these models, see the raw results file.

UPDATE: Per Nick’s suggestion, I’ve re-scaled the graphics so that the x-axis and y-axis cover the same distance.
