Aug 30 2008

Tracking the 2008 NBA Playoffs: What the Data Represents

In my post where I detail my data collection goals for the 2008 NBA playoffs, I spell out the sort of data that I’m tracking and adding to the play-by-play. This post will expand on that and describe exactly what you’ll see in the data.

First, a quick reminder of the four types of events I’m adding data to:

  1. Shots
  2. Turnovers
  3. Rebounds
  4. Fouls

With that in mind, here is what the fields mean for each event type:

Shots

  • assist – I’m not changing much here except that I’m awarding assists even if the shot was missed. Therefore, if the shot was missed, you could say the assist field represents a potential assist. For the curious, I use the 82games definition:

    Basically if the player after receiving the pass pauses or dribbles around for a while before taking action it’s not an assist, but otherwise if the player takes the pass and immediately shoots (catch and shoot), drives to the basket, or has a little pump fake type move to throw off the defense and then goes up for the shot (with perhaps one small dribble even) then you’re talking assist.

  • opponent – This field is used to track contested shots. I have tried to be as consistent as possible while tracking this data, so let me state my goal for tracking defenders: My goal is to understand the difference between contested and uncontested shots. So the rule of thumb is: If the defender appears to contest the shot then they are tracked as the opponent. Understand that on some shots the opponent tries to contest the shot, but in reality they are too far away and/or come in at a bad angle to get in the shooter’s way. These examples are not counted as contested shots. This field can also be made up of multiple players, in which case the players’ names are separated by the pipe ‘|’ character.
  • pick_assist – This is similar to an assist except this player screened an opponent to allow the shooter to obtain spacing, an uncontested shot, or a better matchup. If the screen leads to a shot then that player is credited here, in very much the same way an assist is defined. It is rare, but in the case where two teammates are setting screens, both players’ names are recorded and are separated by the pipe ‘|’ character.
  • x and y – These integers represent the shot location in feet. Although I’m not actually tracking this (as that was already done by the great folks that score these games), I have merged the location of the shot into the play-by-play. The translation into the court is simple: If you are standing behind the offensive team’s hoop, then x goes from left to right, and y starts at the baseline behind the hoop all the way up to the baseline behind the opponent’s hoop.
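
To make the pipe convention concrete, here is a minimal Python sketch of how the opponent and pick_assist fields could be read. The dictionary layout, the helper names split_players and is_contested, and the player names are all assumptions for illustration; this is not the actual file format or the tooling used to build the data.

```python
def split_players(field):
    """Split a pipe-separated player field into a list of names.

    An empty or missing field means nobody was recorded.
    """
    if not field:
        return []
    return [name.strip() for name in field.split("|") if name.strip()]


def is_contested(shot):
    """Treat a shot as contested when at least one opponent is recorded."""
    return len(split_players(shot.get("opponent"))) > 0


# Hypothetical rows: field names follow the post, values are made up.
shots = [
    {"assist": "Player A", "opponent": "Player X|Player Y", "pick_assist": "", "x": 25, "y": 6},
    {"assist": "", "opponent": "", "pick_assist": "Player B", "x": 10, "y": 20},
]

contested_flags = [is_contested(s) for s in shots]
```

Under these assumptions the first shot is contested by two defenders and the second is uncontested. The same split_players helper would apply to any other field that uses the pipe separator.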

Turnovers

  • x and y – These (x,y) coordinates represent the location where possession was gained. These values are translated in the same way as shot location (x,y) coordinates. Note, however, that the team gaining possession is defined as the defensive team. As such, the team losing possession is the offensive team. So keep this in mind when working with these coordinates.
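
If you would rather view a turnover location from the perspective of the team that gained possession, one way to do it is to rotate the point to the other end of the floor. The sketch below assumes the coordinates span a standard 50 by 94 foot NBA court with the origin in a corner behind the offensive team's hoop. That origin and range are my assumption here, so check them against the data before relying on this. The function name to_other_team_view is hypothetical.

```python
COURT_WIDTH_FT = 50   # assumed range of x (sideline to sideline)
COURT_LENGTH_FT = 94  # assumed range of y (baseline to baseline)


def to_other_team_view(x, y):
    """Re-express a point as seen from behind the other team's hoop.

    Standing behind the opposite hoop flips both left/right and
    near/far, which is a 180 degree rotation of the court.
    """
    return COURT_WIDTH_FT - x, COURT_LENGTH_FT - y


# A turnover recorded at (10, 80) in the offensive team's frame ...
gained_at = to_other_team_view(10, 80)
# ... becomes (40, 14) in the frame of the team that gained possession.
```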

Rebounds

First I want to say that this could be done better (or at least done in a way that will help achieve a specific goal). One of my goals is to try and understand the probability of gaining a rebound when two opponents battle for it. I’m not so sure I’m meeting this goal.

The other goal I have, however, is to understand the likelihood a shot will be rebounded in a specific floor location given the location the shot was taken from. I’m more confident in reaching this goal (or at least understanding it better).

  • opponent – This field holds the opponent(s) (separated by the pipe ‘|’ character) that directly contested the rebounding player. I could go into all sorts of examples, but basically you have to either attempt to get the rebound (say jump in the air and put your arm in the area) or be actively blocked out by the rebounder when the rebound heads in your direction. Opponents just standing around the rebounder are not tracked.
  • teammate – This field holds the teammate(s) (separated by the pipe ‘|’ character) that tried to get the rebound when instead their teammate gained possession. See the definition above for how players get put into this field.
  • x and y – These (x,y) coordinates represent the location where the ball was rebounded. It’s common for a ball to bounce around while the players try to get the rebound, so understand that this coordinate represents the location where the ball was finally controlled, not where it first landed (or first made contact with a player). Also, for coordinate translations, the shooting team is always considered the offensive team.
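
As a rough illustration of the second goal (where a shot from a given spot tends to be rebounded), the sketch below buckets the floor into square zones and tallies rebound zones per shot zone. It assumes each missed shot has already been paired with its rebound and that both locations are in the shooting team's frame. The 10 foot zone size and the function names zone and rebound_distribution are arbitrary choices for the example, not part of the data.

```python
from collections import Counter, defaultdict


def zone(x, y, size=10):
    """Map a location in feet to a coarse square zone index."""
    return (x // size, y // size)


def rebound_distribution(pairs, size=10):
    """Estimate, per shot zone, the share of rebounds landing in each zone.

    `pairs` is an iterable of ((shot_x, shot_y), (reb_x, reb_y)) tuples.
    """
    counts = defaultdict(Counter)
    for (sx, sy), (rx, ry) in pairs:
        counts[zone(sx, sy, size)][zone(rx, ry, size)] += 1

    result = {}
    for shot_zone, reb_counts in counts.items():
        total = sum(reb_counts.values())
        result[shot_zone] = {z: n / total for z, n in reb_counts.items()}
    return result


# Made-up sample: three misses from the same zone and where they were rebounded.
sample = [
    ((25, 25), (24, 5)),
    ((26, 27), (22, 8)),
    ((22, 21), (12, 3)),
]
dist = rebound_distribution(sample)
```

With real data you would want far more than a few shots per zone before reading anything into the shares, and a finer or basketball-specific zone layout may be more useful than a plain grid.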

Fouls

Besides the (x,y) coordinates, the only other data tracked for fouls is shot-related information on shooting fouls (such as assists and pick assists).

  • x and y – These (x,y) coordinates represent the location where the foul took place.

Summary

The types above represent what I am tracking. There is a lot more data in the standard play-by-play than the list above, but hopefully that data is straightforward (everyone should know what a block is, for example).

At some point I will create a definition for each field, but for now you will at least understand the intent behind the data I’m tracking.

Please use the comments below to help clarify any questions you might have about the data I’m tracking.

4 Comments on this post

Trackbacks

  1. Getting Defensive for the 2008-2009 Regular Season wrote:

    […] the 2008 playoffs, I experimented with tracking a lot of data. Thanks in large part to the theoretical model and useful input from the analytic community, I will […]

    September 30th, 2008 at 12:11 am

  1. Mountain said:

    I concur of pick assists.

    On contested shots does a flatfooted contest have near the same impact as a well airborne effort?

    On fouls, this is subjective, but was there a clear path to the basket or a good shot or the likelihood it would develop? That would affect the wisdom of the foul in addition to the simple location data.

    August 30th, 2008 at 2:44 am
  2. Mountain said:

    Other areas that might be worthwhile (though I can understand you might have to set limits) would be passing turnover responsibility split between passer and receiver or lack of box out that led to an uncontested rebound.

    This latest attempt to move beyond boxscore promises to have discoveries and have past community tracking efforts but to go comprehensive would be a large hard to achieve and sustain enterprise. Not sure of all that Synergy (a fairly large enterprise) currently tracks (especially on defense) but perhaps some of your tracking choices that they do not currently track will get picked up. Especially if team consultants recommend that they be picked up, using their fees.

    August 30th, 2008 at 3:03 am
  3. Ryan said:

    “On contested shots does a flatfooted contest have near the same impact as a well airborne effort?”

    I have no idea what the impact of any contested shot is. The goal is to track that it was contested. I’m not trying to specify if it was merely a hand in the face or a full blown jump in the air (as that can also depend on the matchup–some guys don’t need to jump, or don’t want to so that they don’t get a needless foul). Ideally the data would give you an idea as to who the “poor” defenders are, and then people in the know can look at tape and try to identify fixable issues with technique.

    The point you bring up about fouls is an interesting one. There are clearly times where a player is fouled to prevent a high percentage shot. Knowing how often this occurs might prove worthwhile.

    Hopefully as we move forward this will help create more discussion about turnovers and rebounds, as the cases you suggest are real. A lot of what happens (especially on rebounds) is a function of what the coaches tell the players to do, but anything we can identify that could help us better understand what a player is supposed to do would be ideal. A problem with existing stats is we tend to blame the player more so than what the player is actually being instructed to do.

    I’m hoping that I can provide tools that make it very easy to track new data. If you can record a game then you should be able to track something, even if it is just one small piece of the puzzle. The idea to do this for the 2008 playoffs is so that we can better define what the most important pieces of that puzzle are.

    August 30th, 2008 at 1:45 pm