Mar 17 2009

Effective Rebounding Rates

Earlier in the season I wrote myself the following note while watching a Clippers game:

Is Baron Davis really a “very good” rebounder for a guard? How would we determine this?

(Please continue once you’re done laughing at the fact I was actually watching a Clippers game.)

This note was the result of listening to one of the announcers proclaim that Baron Davis is a “very good” rebounder for a guard. Naturally, I would like to know how we might classify people as “very good” rebounders.

More recently, I’ve heard announcers praise Jason Kidd for being the best rebounding point guard in the NBA. This comment naturally comes from his high rebounding totals, but what do his rebounding rates look like?

Using Rebounding Rates Instead of Rebounding Totals

We all know that rebounding rates, not rebounding totals, give us the best picture with respect to offensive and defensive rebounding. That said, I think we can paint a better picture using more than raw rebounding rates.

Thus the question I asked myself was: “What if we look at rebounding rates based on the shot location?”

My hope is that by using shot location data we can get a sense of how rebounding rates for different positions change based on the shot location. Also, we can use this data to neutralize a player’s rebounding rate based on these shot locations, since players do not face the same shot distributions while on the court.

Shot Locations

Using data from the ’07-’08 season, I collected the number of rebounds each player obtained and missed while on offense and defense based on rebound opportunities coming from the following shot locations:

  • Low Paint – The area in the paint within 6 feet of the hoop
  • Mid-Range – All other 2pt shots
  • 3pt Shots
  • Free Throws

With these rates in hand, I neutralized each player’s offensive and defensive rebounding rates to create an effective rebounding rate metric that weights rebounding rates based on the average distribution of rebounding opportunities from the ’07-’08 season.

This means that a player’s effective rebounding rate is calculated by weighting the player’s low paint rebounding rate by 25.8%, the mid-range rebounding rate by 43.3%, the 3pt rebounding rate by 24%, and the free throw rebounding rate by 6.8%.
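The weighting above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the spreadsheets: the function names and the player counts are made up, and renormalizing the weights (they sum to 99.9% because of rounding) is an assumption of this sketch.

```python
# Weights from the post: league-wide share of rebounding opportunities in '07-'08.
WEIGHTS = {
    "low_paint": 0.258,
    "mid_range": 0.433,
    "three_pt": 0.24,
    "free_throw": 0.068,
}


def rebounding_rate(obtained, missed):
    """Share of rebound opportunities that the player secured."""
    return obtained / (obtained + missed)


def effective_rebounding_rate(rates):
    """Weight per-location rates by the league-average opportunity mix.

    Assumption: the published weights sum to 0.999 (rounding), so this
    sketch divides by their total to renormalize.
    """
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[loc] * rates[loc] for loc in WEIGHTS) / total


# Hypothetical counts for one player: (rebounds obtained, rebounds missed).
counts = {
    "low_paint": (30, 270),
    "mid_range": (60, 340),
    "three_pt": (40, 160),
    "free_throw": (5, 95),
}
rates = {loc: rebounding_rate(got, lost) for loc, (got, lost) in counts.items()}
effective = effective_rebounding_rate(rates)
print(round(100 * effective, 1))
```

Because every player is weighted by the same league-wide mix, two players with identical per-location rates get the same effective rate regardless of the shots they happened to face.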

The Results

The spreadsheets below list the results of performing the calculations listed above. The results are grouped by position, with the players sorted from highest effective rebounding rate to lowest effective rebounding rate.

It is worth noting that players who did not have at least 100 opportunities from each shot location on both offense and defense were removed from this data set.

The following spreadsheet lists the offensive rebounding results:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oquI6XwF2O5IUQ

The following spreadsheet lists the defensive rebounding results:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oqthtSWrpCsFOw

Back to Baron Davis and Jason Kidd

On offense, Baron’s effective rebounding rate ranks him as the #6 offensive rebounding point guard of ’07-’08, while Jason’s effective rebounding rate ranks him as the #4 offensive rebounding point guard of ’07-’08.

On defense, it’s safe to say Kidd was a monster in ’07-’08. His effective rebounding rate ranks him as the #1 defensive rebounding point guard of ’07-’08, while Baron’s effective rebounding rate ranks him as the #10 defensive rebounding point guard of ’07-’08.

In ’07-’08 at least, it’s safe to say these guys were very good rebounders. I was surprised to see just how high Jason Kidd’s defensive rebounding rates were. Also, his ability to obtain that percentage of rebounds on free throws is impressive. Makes me want to watch some tape and see what he’s doing differently than other point guards.

What this doesn’t tell us

Unfortunately this doesn’t give us much insight into what external forces affect the player’s rebounding rates. This measure does not capture important components like teammates, opponents, and coaching philosophies. That said, I believe this is worth looking at to hopefully provide a source of motivation for tackling these issues in the future.

Future Work

One idea I have for the future is to try and relate rebounding rates to shot distance. In the end it might make more sense to use discrete shot locations, but I’m interested in seeing if there are any general curves that form based on the distance of the shot from the basket.

As mentioned in the previous section, work can be done to try and determine how teammates, opponents, and coaching philosophies relate to a player’s individual rebounding rates. Lastly, I’m hoping to see how aging curves fit to these rebounding rates. I’ve got a lot of historical play-by-play to parse before I can do that, though.

Mar 11 2009

Referee Efficiency Ratings

Last Saturday I attended the MIT Sloan Sports Analytics Conference. During the Basketball Analytics panel, this quote from Mark Cuban got me thinking:

There’s not 10 players on the court, there’s 13. And three of them determine about 80 percent of what happens out there.

Along with this excerpt, he mentioned something along the lines of: “so if you’re not looking at the refs then you’re missing out on a lot.”

The point I get is, if we don’t understand refs, then we don’t understand the game.

With this, he makes a very good point. I think Mark gets a bad rap for wanting to talk about refs, when in reality we should be talking about refs. With all of the on/off-court work we’re doing, refs seem to be an obvious component to add. They absolutely are on the court. And they’re on the court a lot.

So why, then, have we not been looking at refs? I can come up with a few quick reasons:

  1. Refs don’t get credit for points in the box score, so ESPN doesn’t highlight their contributions on SportsCenter.
  2. Your favorite team can’t sign and/or trade refs.
  3. We don’t give MVP awards to refs.
  4. Refs don’t dunk or hit the game winning shot.

There are certainly many more, I bet, but that really isn’t the point of this post.

The point of this post is to measure the relationship between referees and efficiency.

The Model

To measure this relationship, I fit the following model to data from the ’07-’08 season:

Efficiency = Intercept + HCA + O1 + … + On + D1 + … + Dn + R1 + … + Rn

Where Efficiency = Points Scored per 100 Possessions, the home court advantage HCA = 1 if the offensive team is at home, O1,…,On = 1 if the player is on offense, D1,…,Dn = 1 if the player is on defense, and R1,…,Rn = 1 if the referee is on the court. These variables are 0 otherwise.
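To make the indicator variables concrete, here is a rough Python sketch of how one row of the model's design matrix could be built. The player and referee names, the coefficient values, and the function names are all hypothetical; the fit in this post was not necessarily set up this way.

```python
def design_row(offense, defense, refs, offense_at_home, players, referees):
    """Build one row of 0/1 indicators for a single stretch of play.

    Column order: intercept, HCA, O_1..O_n, D_1..D_n, R_1..R_m.
    """
    row = [1.0, 1.0 if offense_at_home else 0.0]
    row += [1.0 if p in offense else 0.0 for p in players]
    row += [1.0 if p in defense else 0.0 for p in players]
    row += [1.0 if r in refs else 0.0 for r in referees]
    return row


def predicted_efficiency(row, coefficients):
    """Points per 100 possessions implied by the fitted coefficients."""
    return sum(x * b for x, b in zip(row, coefficients))


# Hypothetical league: 10 players and 4 referees, 3 of whom work this game.
players = ["player_%d" % i for i in range(10)]
referees = ["ref_a", "ref_b", "ref_c", "ref_d"]

row = design_row(
    offense=set(players[:5]),
    defense=set(players[5:]),
    refs={"ref_a", "ref_b", "ref_d"},
    offense_at_home=True,
    players=players,
    referees=referees,
)

# Made-up coefficients: only the intercept and home court term are non-zero.
coefficients = [105.0, 1.5] + [0.0] * (len(row) - 2)
print(predicted_efficiency(row, coefficients))
```

Each referee coefficient then reads as the change in points per 100 possessions associated with that referee being on the court, holding the players and home court constant.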

The Results

The following spreadsheet lists the referee ratings:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oqtkdHwDWJ0CZg

The following spreadsheet lists the offensive ratings:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oquzr1K4x3rbQQ

The following spreadsheet lists the defensive ratings:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oquv4qUgn1Rbbw

Interpreting the Results

To interpret these results, we want to think in terms of holding all of the other variables at some constant value. As an example, for the refs, we’d estimate that Derrick Stafford is associated with an increase of 7.8 points per 100 possessions while he’s on the court.

That being said, what would we want a ref’s measured relationship to be? In my mind we’d want their relationship to be 0, as this would indicate the referees all have the same relationship to efficiency. Based on these results, I think it’s safe to say this is not the case. I’m sure Mark Cuban would agree.

The astute reader will notice all of the refs have standard errors that indicate the coefficients are not statistically significantly different from 0. Thus someone might conclude refs (or coaches, for that matter) don’t have an impact. But given our knowledge of the game and these results, it is safe to say that these refs do not call everything the same way.

A final point to mention is that there are always 3 refs on the court at one time. Thus their combined impact is not likely to be 0. That being said, it might be worth treating each three-ref crew as a single unit to see what sort of relationship ref units have with efficiency.

What Things do Refs do Differently?

Just like player ratings, this model doesn’t tell us why this relationship exists. It doesn’t tell us what refs are doing differently from each other, it’s just telling us there is some difference.

In an attempt to quantify this sort of thing, I’ll be including refs in all of my future on/off-court work. Hopefully this will give further insight into the differences between refs.

Mar 6 2009

Measuring the Relationship Between Players and their Lineup’s Effective FG%

In my last post I presented a method for measuring the relationship between players and their lineup’s shot distribution.

This, however, is only part of the picture. We also need to know the relationship between players and their lineup’s shooting percentages from those locations on the court to determine if a player has a positive or negative relationship with their lineup.

To measure this relationship, we can combine the shooting percentage and shot distribution relationships to calculate the relationship between a player and their lineup’s effective field goal % (eFG%). This measure takes into account the fact that 3pt shots are worth more points, so it has a direct relationship to points per field goal attempt.
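As a quick reminder of why eFG% maps directly to points per field goal attempt, here is a small Python sketch using the standard eFG% definition. The shot totals are made up for illustration.

```python
def efg_pct(fgm, fg3m, fga):
    """Effective FG%: made 3pt shots count as 1.5 made field goals."""
    return (fgm + 0.5 * fg3m) / fga


def points_per_fga(fgm, fg3m, fga):
    """Points scored on field goals per field goal attempt."""
    two_point_makes = fgm - fg3m
    return (2 * two_point_makes + 3 * fg3m) / fga


# Hypothetical lineup: 40 makes (10 of them 3pt shots) on 100 attempts.
efg = efg_pct(fgm=40, fg3m=10, fga=100)
ppa = points_per_fga(fgm=40, fg3m=10, fga=100)
print(efg, ppa)
```

Points per field goal attempt is always exactly twice the eFG%, which is why a difference in eFG% can be translated into points per 100 shots.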

The Shot Distributions

While putting this data together I realized I needed to re-define the areas on the court. This is because some of the areas, like the high paint and the corner 3pt shot, are associated with small samples that do not provide a very good link between lineups when rating FG%. As a result, the FG% ratings using 5 areas are filled with a bunch of noise.

To resolve this, I decided to use these three areas:

  • Low Paint – Shots in the paint within 6 feet of the basket
  • Mid-Range – All other 2pt shots
  • All 3pt Shots

It would be nice to separate the high paint from the mid-range shots and the corner 3pt shots from the other 3pt shots, but it doesn’t appear as if the method for rating FG% would handle these small samples well.

To see the relationship between players and their lineup’s shot distribution with respect to these three areas of the court, see the following spreadsheet:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oquK5TB7tWIDNQ

This should match up well with the results from my last post, as the high paint shots have been merged with mid-range shots, and corner 3pt and other 3pt shots have been combined to simply be all 3pt shots.

As before, a value of 1.5 indicates that the player is associated with a 1.5% increase in shots from this location on the court compared with the league average player at that position.

The Shooting Percentages

I’ve already looked at the low paint, but now I need to present the results from the other two locations.

You can see these results in the following spreadsheet:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oqvrsxuJnpI3Kg

This spreadsheet shows how a player is associated with shooting percentages from the three locations on the court when compared to the league average player at that position.

As an example, here is how you would interpret Chris Paul’s numbers (line 20):

On Offense: Chris Paul is associated with a 0.1% FG% increase from the low paint; a 0.7% FG% increase from mid-range; and a 1.3% FG% increase from 3pt range.

On Defense: Chris Paul is associated with a 2.8% FG% decrease from the low paint; a 2.3% FG% increase from mid-range; and a 11.7% FG% increase from 3pt range.

I think it’s fair to say that this relationship between Chris Paul and defensive 3pt FG% played a major role in his ’07-’08 defensive adjusted +/- rating.

Combining Shot Distributions and Shooting Percentages

Using the shot distribution and shooting percentage data, I can calculate an offensive and defensive eFG% for each player.

These ratings can be found in the following spreadsheet:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oqsrV5hEIH36NA

As an example, here is how you would interpret Manu Ginobili’s rating (line 6):

On Offense: When facing a league average defense and surrounded by an offensive lineup consisting of a league average PG, SF, PF, and C, Manu is associated with a 50.6% offensive eFG%.

On Defense: When facing a league average offense and surrounded by a defensive lineup consisting of a league average PG, SF, PF, and C, Manu is associated with a 43.5% defensive eFG%.

Net eFG% Rating: Subtracting the defensive eFG% from the offensive eFG% provides the net eFG%. Based on this ’07-’08 data, Dwight Howard is rated as the best in net eFG%.
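One plausible way to combine a shot distribution with shooting percentages into an eFG% is sketched below in Python. This is an assumption about the arithmetic, not the exact calculation behind the spreadsheet, and the shares and percentages are made up.

```python
# Made 3pt shots count as 1.5 made field goals in eFG%.
POINT_WEIGHT = {"low_paint": 1.0, "mid_range": 1.0, "three_pt": 1.5}


def lineup_efg(shot_share, fg_pct):
    """eFG% as a shot-share-weighted sum of FG% across the three areas.

    shot_share: fraction of shots taken from each area (should sum to 1).
    fg_pct: FG% from each area, as a fraction.
    """
    return sum(
        shot_share[loc] * fg_pct[loc] * POINT_WEIGHT[loc] for loc in POINT_WEIGHT
    )


# Hypothetical lineup numbers.
shot_share = {"low_paint": 0.40, "mid_range": 0.35, "three_pt": 0.25}
fg_pct = {"low_paint": 0.60, "mid_range": 0.40, "three_pt": 0.36}

offensive_efg = lineup_efg(shot_share, fg_pct)
print(round(100 * offensive_efg, 1))
```

The same function applies on defense by passing in the shot shares and FG% that a lineup allows; the net rating is then the offensive value minus the defensive value.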

Lineup Combinations

In my last post I presented a hypothetical loaning of Yao Ming to the ’07-’08 Celtics. Under this scenario, Yao Ming would replace Kendrick Perkins and would be paired up with Rajon Rondo, Ray Allen, Paul Pierce, and Kevin Garnett.

With Perkins, the ’07-’08 Celtics offensive eFG% would be 54.7%, while the defensive eFG% would be 42.8%. This equates to a difference of 11.9%.

With Ming, we would estimate that this lineup would have an offensive eFG% of 53.7%, while the defensive eFG% would be 43%. This equates to a difference of 10.7%.

So strictly in terms of eFG%, the Celtics would prefer to keep Perkins over Ming. This advantage is very small, though. In terms of points, this equates to an advantage of 2.4 points per 100 shots for the ’07-’08 Celtics lineup with Perkins over Ming.
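The arithmetic behind the 2.4 points figure can be checked with a short Python sketch. The eFG% values are the ones quoted above; the variable names are just for illustration.

```python
def net_efg(offensive_efg, defensive_efg):
    """Net eFG% in percentage points."""
    return offensive_efg - defensive_efg


net_with_perkins = net_efg(54.7, 42.8)  # about 11.9
net_with_ming = net_efg(53.7, 43.0)     # about 10.7

# Advantage of the Perkins lineup, in percentage points of eFG%.
advantage = net_with_perkins - net_with_ming

# One percentage point of eFG% is worth 2 points per 100 shots,
# since points per field goal attempt is twice the eFG%.
points_per_100_shots = 2 * advantage
print(round(advantage, 1), round(points_per_100_shots, 1))
```

A 1.2 percentage point edge in net eFG% therefore works out to roughly 2.4 points per 100 shots.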

Summary

These are fun to look at, and I believe they provide an interesting look at the game that we’ve never had before. That being said, we need to know how these ratings vary from year to year. Can we create aging curves for these offensive and defensive eFG% (or for specific areas of the court)? How well do these predict future results?

In addition, while important, shooting does not explain everything. Turnovers, rebounding, and free throw shooting all play an important role in winning games. So these are the areas for future investigation when comparing players.

Mar 4 2009

Measuring the Relationship Between Players and their Lineup’s Shot Distribution

In my last post I looked at how we might rate a player’s impact on their lineup’s FG% in the low paint. With this came the obvious question of: “What about shot distribution?”

With this question in mind, I’ve finally put some effort into making sense of how players fit together. In this case I’m simply trying to figure out the relationship between players and shot distribution. Coaching certainly matters, but I’ve gotta start somewhere!

The Model

Similar to my last post, I’ve fit an adjusted plus/minus-like logistic regression for approximating the shot distribution for these 5 locations on the court:

  1. Low Paint – The area in the paint within 6 feet of the basket
  2. High Paint – All other shots in the paint
  3. Mid-Range – All 2pt shots outside of the paint
  4. Corner 3s – 3pt shots on the sidelines up to 14 feet from the baseline
  5. Other 3s – All other 3pt shots

Also, like before, I’ve used data from the ’07-’08 season and accounted for all players that took part in at least 1600 shots.

The Results

For this model I feel the best way to present the results is by spreadsheet:

http://spreadsheets.google.com/ccc?key=pLJimPjd7oqtOTygaQH2dWw

In this spreadsheet you will find the relationship between each player and their lineup’s offensive and defensive shot distributions with respect to the average player at each specific position.

Take Steve Nash as an example (line 9).

On offense, Nash is associated with: a 0.5% increase in shots from the low paint; a 0.6% decrease in shots from the high paint; a 5.8% decrease in shots from mid-range; a 1.3% increase in corner 3pt shots; and a 4.5% increase in all other 3pt shots.

On defense, Nash is associated with: a 0.4% increase in shots from the low paint; a 1.4% increase in shots from the high paint; a 1% increase in shots from mid-range; a 1% decrease in corner 3pt shots; and a 1.8% decrease in all other 3pt shots.

Again, these numbers are with respect to the average point guard in this data set.

Combining Players

Based on the construction of this model, we can combine players to get an approximation of what their lineup’s offensive and defensive shot distributions would look like.

Clearly this is not without error. From an offensive standpoint, a coach can in some ways control each individual player’s shot distribution, which affects the lineup’s overall shot distribution. From a defensive standpoint, a commonly held belief is that a “system” can play a big role (see the Celtics last year).

Keeping in mind these obvious realities that the model does not take into account, we can take a peek at what the prediction would be.
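For readers curious how player ratings from separate logistic regressions might be combined into one lineup distribution, here is a rough Python sketch. It is an assumption about the general approach, not the contents of dist.R: the function names, the player effects, and the final renormalization step are all hypothetical. The baseline shares are the league averages reported in this post.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def combine_players(intercepts, player_effects, lineup):
    """Add each player's effect on the log-odds scale, one location at a time.

    Assumption: because each location has its own regression, the raw
    probabilities need not sum to 1, so they are renormalized at the end.
    """
    raw = {}
    for loc, intercept in intercepts.items():
        log_odds = intercept + sum(
            player_effects.get(p, {}).get(loc, 0.0) for p in lineup
        )
        raw[loc] = inv_logit(log_odds)
    total = sum(raw.values())
    return {loc: p / total for loc, p in raw.items()}


# League-average shares reported in this post, used as baseline intercepts.
average_share = {
    "low_paint": 0.323,
    "high_paint": 0.114,
    "mid_range": 0.371,
    "corner_3": 0.045,
    "other_3": 0.145,
}
intercepts = {loc: logit(p) for loc, p in average_share.items()}

lineup = ["pg", "sg", "sf", "pf", "c"]
baseline = combine_players(intercepts, {}, lineup)

# Hypothetical center who lowers the odds of a low paint shot.
effects = {"c": {"low_paint": -0.3}}
combined = combine_players(intercepts, effects, lineup)
print({loc: round(100 * p, 1) for loc, p in combined.items()})
```

With no player effects the sketch simply returns the league-average distribution, which is the behavior described for the real function when it is called without arguments.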

Average Offense and Defense

We can use the average offensive and defensive lineup as a starting point. The average offensive and defensive shot distributions would look something like this:

  • Low Paint – 32.3%
  • High Paint – 11.4%
  • Mid-Range – 37.1%
  • Corner 3s – 4.5%
  • Other 3s – 14.5%

The ’07-’08 Boston Celtics

We’ll first take a look at the ’07-’08 champion Boston Celtics’ most used lineup of Rajon Rondo, Ray Allen, Paul Pierce, Kevin Garnett, and Kendrick Perkins. Against the average lineup from this data set, their offensive shot distribution would look something like:

  • Low Paint – 38%
  • High Paint – 10.5%
  • Mid-Range – 31.9%
  • Corner 3s – 4%
  • Other 3s – 15.5%

Their defensive shot distribution would look something like:

  • Low Paint – 31.2%
  • High Paint – 10.6%
  • Mid-Range – 37.9%
  • Corner 3s – 4.9%
  • Other 3s – 15.3%

Let the Fun Begin

Let’s imagine a world in which the Rockets loaned the Celtics Yao Ming in exchange for some time with Kendrick Perkins. What sort of shot distribution would this lineup of Rajon Rondo, Ray Allen, Paul Pierce, Kevin Garnett, and Yao Ming have?

Based on this model, their offensive shot distribution would look something like:

  • Low Paint – 32%
  • High Paint – 13.5%
  • Mid-Range – 36.4%
  • Corner 3s – 4.6%
  • Other 3s – 13.4%

Their defensive shot distribution would look something like:

  • Low Paint – 24.3%
  • High Paint – 11.9%
  • Mid-Range – 49%
  • Corner 3s – 3.3%
  • Other 3s – 11.5%

Since this model doesn’t account for coaching effects, we’d naturally assume there is some extra error involved with taking a player from another team, in this case Yao Ming, and placing him with this new lineup.

What I find most interesting, however, are the defensive aspects of this. I think it is fair to say that this lineup with Ming would do a better job of keeping shots out of the low paint. This shouldn’t surprise anyone, but it is nice to be able to put some numbers to this.

Shooting Percentages

The shot distribution is just part of the picture. The next step is to look at shooting percentages from all locations on the court. I’ve already looked at the low paint, but by examining the other areas I believe we can come up with a model that would allow us to attach an offensive and defensive eFG% to a given lineup.

If we were able to do this, then we’d have a metric that would allow us to gauge the effectiveness of a lineup with respect to shooting. Getting that far would allow us to look at doing similar things for the other three of the four factors: turnovers, rebounding, and free throws, as controlling the ball, getting boards, and getting to the line and keeping your opponent off of the line are all important parts of the game.

Replicate these Results

First, you’ll need to download the dist.zip archive (4MB).

The first thing you might want to replicate is the regressions. In the *.dist directories you will find an associated R file that will run the logistic regression for that location on the court. Simply source() these files from R and everything should run without issue.

The other area of interest is the lineup combinations. Inside of the dist.results directory, you will find a dist.R file that contains functions to obtain results from the fitted models. The function of most interest will be the dist.combine_players() function. To use this function, you’ll first need to run source("dist.R"). Note: You do not need to run the regressions to use this function.

Without arguments, dist.combine_players() displays results for league average players at each position. This function, however, takes 5 arguments: PG, SG, SF, PF, and C. These arguments allow you to specify players at each position. So to see results for a lineup of Allen Iverson, Dwyane Wade, Paul Pierce, LaMarcus Aldridge, and Tim Duncan, run:

  • dist.combine_players(PG="Allen Iverson", SG="Dwyane Wade", SF="Paul Pierce", PF="LaMarcus Aldridge", C="Tim Duncan")

If you run this, you should get the following offensive shot distribution:

  • Low Paint – 38%
  • High Paint – 23%
  • Mid-Range – 23.8%
  • Corner 3s – 5.5%
  • Other 3s – 9.5%

Also, you should get the following defensive shot distribution:

  • Low Paint – 30.1%
  • High Paint – 10.3%
  • Mid-Range – 42.5%
  • Corner 3s – 4.7%
  • Other 3s – 12.3%

Summary

As mentioned before, the shot distribution is just half the battle. We need to attach shooting percentages to each of these locations, as that will allow us to truly determine the effectiveness of a lineup with respect to shooting.

One area of concern is predictability. Would this model (or one like it) do a better job of fitting players together than a coach or GM? Once I look at shooting percentages, the next step will be to look at data for historical seasons to see what sort of year-to-year relationships exist, and to see how well predictions can be made from one year to the next.

As of now, that’s a complete unknown. But clearly that will determine just how effective this type of model is.

Mar 2 2009

Rating a Player’s Impact on Shooting Percentages in the Low Paint

Studying the relationship between shooting and defensive efficiency has made me wonder what, if anything, we can learn by rating a player’s impact on shooting percentages from various locations on the court.

The Model

Borrowing from the idea of adjusted plus/minus, I ran a logistic regression for data from the ’07-’08 regular season for the field goals made and missed in the low paint for each lineup combination in the data set. (Again, the low paint is defined as an area in the paint within 6 feet of the basket.)

For this fit, I only used data from players that took part in at least 1600 combined offensive and defensive shots in the low paint. I arrived at this number fairly arbitrarily, but it is close to 20 combined offensive and defensive shots per game. Also, I controlled for the home court advantage.
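The cutoff is easy to sanity check with a little Python. The per-game figure assumes an 82-game regular season; the player names and shot counts below are made up, and the filter function is only an illustration of the rule described above.

```python
GAMES_IN_SEASON = 82
MIN_LOW_PAINT_SHOTS = 1600

# Roughly 19.5 combined offensive and defensive low paint shots per game.
shots_per_game = MIN_LOW_PAINT_SHOTS / GAMES_IN_SEASON


def eligible_players(shot_counts, min_shots=MIN_LOW_PAINT_SHOTS):
    """Names of players with at least min_shots combined low paint shots."""
    return sorted(name for name, n in shot_counts.items() if n >= min_shots)


# Hypothetical combined offensive and defensive low paint shot counts.
shot_counts = {"player_a": 2450, "player_b": 1600, "player_c": 1599}

print(round(shots_per_game, 1))
print(eligible_players(shot_counts))
```

Players below the cutoff are not given their own rating in the fit.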

To run this regression yourself, simply download the lp.zip archive. Inside you will find:

  • lp.R: From R, run source("lp.R") to run the regression.
  • lp.csv: A CSV file for the data used in the regression
  • lp.formula: The formula for the regression. You can modify this to add (or remove) players.
  • lp.players: A file listing the players and their IDs. The number of combined offensive and defensive shots is listed in parentheses.
  • lp.results.txt: The results from my run of the regression

The Results

I’m sure a majority of the people reading this post will simply want to know who’s in the top ten and bottom ten on offense and defense. But that’s not really why I’m here, so the important nerd details can be found in the lp.results.txt file. I’ll leave it up to you to dig into the file.

Offense: Top 10 (from best to worst)

  1. Carlos Boozer
  2. Steve Nash
  3. Dwight Howard
  4. Marcus Camby
  5. Thaddeus Young
  6. Dwyane Wade
  7. Kobe Bryant
  8. LeBron James
  9. Steve Blake
  10. Amare Stoudemire

Offense: Bottom 10 (from worst to best)

  1. Allen Iverson
  2. Yi Jianlian
  3. Samuel Dalembert
  4. Chauncey Billups
  5. Beno Udrih
  6. Corey Brewer
  7. Delonte West
  8. Lamar Odom
  9. Ben Gordon
  10. Ben Wallace

Defense: Top 10 (from best to worst)

  1. Zydrunas Ilgauskas
  2. Kevin Garnett
  3. Brendan Haywood
  4. Joakim Noah
  5. Yao Ming
  6. Andris Biedrins
  7. Josh Smith
  8. Joel Przybilla
  9. Lamar Odom
  10. Manu Ginobili

Defense: Bottom 10 (from worst to best)

  1. Craig Smith
  2. Juan Carlos Navarro
  3. Hakim Warrick
  4. Morris Peterson
  5. Jason Williams
  6. Jeff McInnis
  7. Jordan Farmar
  8. Boris Diaw
  9. DeShawn Stevenson
  10. Jose Calderon

With these obligatory lists out of the way, I hope to get to some real substance.

What does this tell us?

The most important question to ask ourselves is: “What exactly is this telling us?” This data is conditional on a lot of stuff. Before I get to that, though, I certainly don’t want to give off the impression that a team’s sole goal is to maximize their probability of making shots in the low paint. Clearly basketball is much more than that, but measuring a player’s impact on all aspects of the game is important. So this is just one small piece of the overall puzzle.

That being said, ideally we could get a context free measure of how a player impacts their lineup’s probability of making a shot in the low paint, but this is far from it. These ratings merely control for home court advantage and the strength of opposing lineups and teammates based on the strategies they’ve chosen to use. In a lot of cases, statistically significant coefficients were not found (again, see the regression results for the details), so even with the conditional aspects of the data to think about, there is still uncertainty with a lot of players’ ratings.

To get an idea of how we might (or might not) use this data, I’ll use the most interesting result from this regression: Allen Iverson’s offense.

Allen Iverson’s Offense

Because of the results in Detroit, Allen Iverson is getting a lot of attention. He seems to be the guy a lot of people love to hate right now. If we were to take these ratings at face value, then we could simply pile on top of Iverson, as he’s got the worst offensive rating. But is this fair?

One reason I don’t believe this is fair has to do with the fact that overall shot distribution is important. According to 82games.com, Iverson took 30% of his shots close to the rim last year, whereas a guy like Steve Nash took only 14% of his shots from the same location. Since it’s fair to say Iverson takes a larger percentage of his lineup’s shots compared to Nash, it’s fair to say that his low paint FG% will have a larger impact on his lineup’s low paint FG% when compared to Nash.

I think it’s safe to say that most every team would rather have Nash over Iverson, but I think it’s worth studying how Iverson’s shot selection hurts (or helps?) his team. Does his athletic ability that allows him to take this high percentage of shots close to the rim help, or is he rather forcing shots that are hurting his team’s efficiency? We can’t say for sure based just on this rating.

Looking at overall shot selection with respect to these ratings might shed some light onto the situation, but I’ll save that for another day.

Summary

I believe there is some value in these results, but like most everything else in basketball, we need to examine other measures to determine how a player will impact any given lineup combination. Hopefully this will prove to be a useful part of that toolkit.

Up Next: My plan is to run this regression with respect to mid-range jump shots.
