Aug 18 2008

Tracking Data from the 2008 NBA Playoffs

After writing roughly 4,000 lines of Perl code to get the existing play-by-play data into the format I want to work with, I’m finally ready to track data from the 2008 NBA playoffs. This data collection project (along with the motivation to do the same on a larger scale for the regular season) has been fueled by my desire to build better offensive and defensive models of basketball.

The Goal

My end goal is to use these models as inputs to a simulation engine to attempt to understand how a given lineup will play together. Therefore, it is necessary to understand as much context as possible around the events that take place on the basketball court. In addition to what is already in the play-by-play data, I will be specifically tracking the following data:

  • For Shots: Defenders, Pick Assists, Potential Assists, and Potential Pick Assists.
  • For Turnovers: Defender(s) who forced the turnover, along with the (X,Y) location of the turnover.
  • For Rebounds: Opponent(s) and teammate(s) near the rebounding player, along with the (X,Y) location of the rebound.
  • For Fouls: The (X,Y) location of the foul, along with shot data (like assists, etc.) for shooting fouls.
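
To make the list above concrete, here is a minimal sketch of how these tracked fields might be laid out as records. It is written in Python purely for illustration (the actual processing code is in Perl), and every class name, field name, and coordinate convention below is an assumption rather than the format of the published data files.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (X, Y) court location. The units and origin are assumptions for this sketch.
Location = Tuple[float, float]


@dataclass
class ShotTracking:
    """Extra context tracked for a shot (hypothetical field names)."""
    defenders: List[str] = field(default_factory=list)
    pick_assists: List[str] = field(default_factory=list)
    potential_assists: List[str] = field(default_factory=list)
    potential_pick_assists: List[str] = field(default_factory=list)


@dataclass
class TurnoverTracking:
    """Defender(s) who forced the turnover, plus where it happened."""
    forced_by: List[str] = field(default_factory=list)
    location: Optional[Location] = None


@dataclass
class ReboundTracking:
    """Players near the rebounder, plus where the rebound was collected."""
    nearby_opponents: List[str] = field(default_factory=list)
    nearby_teammates: List[str] = field(default_factory=list)
    location: Optional[Location] = None


@dataclass
class FoulTracking:
    """Foul location; shot context is filled in only for shooting fouls."""
    location: Optional[Location] = None
    shot: Optional[ShotTracking] = None

    @property
    def is_shooting_foul(self) -> bool:
        return self.shot is not None


# Example: a shooting foul with one tracked defender (placeholder values).
example_foul = FoulTracking(
    location=(12.0, 5.5),
    shot=ShotTracking(defenders=["Defender A"]),
)
```

Keeping the shot context as an optional nested record on the foul mirrors the last bullet: a non-shooting foul carries only a location, while a shooting foul reuses the same fields tracked for ordinary shots.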

I will be collecting this data for 80 of the 2008 NBA playoff games. The remaining games have incomplete play-by-play data that I will need to fix using another play-by-play source. I’m hoping 80 games will be plenty, but I am fairly obsessive-compulsive, so if I finish the data collection early enough before the start of the regular season, I will try to get data for the rest of the games.

Data Availability

As I described in my welcome post, the data will be open for all to use. To achieve this, I have set up a Google site called NBA Game Tracking, where I will upload the data. The site is open, so anyone can view the content published there. Once the regular season rolls around, I’m hoping trackers will contribute their own work to the site. More on that when the time comes. For now, just know that this is where I’ll be uploading the data files as I complete them.

That’s all for now. I’m hoping my setup will let me complete this data collection efficiently, so that I’ll have a solid set of game data to work with a week from today.


3 Comments on this post

Trackbacks

  1. Data Pet Peeve #1: Loose Ball Fouls & Rebounds wrote:

    […] fouls that lead to rebounds are recorded in play-by-play data. I’m on game #5 of 80 in the 2008 NBA playoffs tracking project, and I already cringe every time I see a loose ball foul that results in the opponent being […]

    August 21st, 2008 at 12:14 am
  2. Tracking the 2008 NBA Playoffs: What the Data Represents wrote:

    […] my post where I detail my data collection goals for the 2008 NBA playoffs, I spell out the sort of data that I’m tracking and adding to the […]

    August 30th, 2008 at 12:27 am
  3. What I’ve Learned Over the Past Year wrote:

    […] started off on a kick wanting to collect new data. New data is vital for further understanding, but when you don’t have a good […]

    March 7th, 2014 at 7:36 pm