Mar 29 2009

2006-2007 Regular Season Play-By-Play Data Now Available

I’ve had to focus on other things over the past couple of weeks, so I figured now was as good a time as any to start putting together data sets for past seasons.

’06-’07 Data Set Stats

This regular season data set has 1,149 games, which is 81 short of the full 1,230-game season (about 6.6% of games are missing). As I mentioned with the ’07-’08 data set, I hope to integrate some of these games in the future. That said, this should be a good start.
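For anyone who wants to check the completeness figures above, here is a quick sketch of the arithmetic in Perl. It assumes the standard schedule of 30 teams playing 82 games each; the variable names are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: 30 teams each play 82 games, and every game involves two teams.
my $teams          = 30;
my $games_per_team = 82;
my $full_season    = $teams * $games_per_team / 2;

my $games_in_set = 1149;                          # games in this data set
my $missing      = $full_season - $games_in_set;  # games not yet included
my $pct_missing  = 100 * $missing / $full_season; # share of the season missing

printf "%d of %d games (%d missing, %.1f%%)\n",
    $games_in_set, $full_season, $missing, $pct_missing;
```

The same calculation can be reused for other seasons by changing `$games_in_set`.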

Downloading the Data

To download the data, visit the data page or use this direct download link.

Why We Need More Data

One key area to study is how things change from season to season, and putting this data together will allow us to do that. With this data, the data from the ’07-’08 season, and data now coming in from the ’08-’09 season, we’ve got three seasons of data to work with.

My current focus will be on putting a few more seasons together so that I’ve got a nice database to work with once the semester is over and I have some free time on my hands.

Lastly, please let me know if you find any problems with the download. Also, don’t forget about basketballvalue.com if you don’t see a game you’re looking for.

8 Comments on this post

  1. Brock said:

    Do the positions in pbp files (a1-a5, h1-h5) correspond to basketball positions? As you can see from my question, I am a casual bball fan (I think of positions as positions and not numbers), but I am a sports play-by-play fanatic. Please keep up the great work! You’re doing a great thing keeping your data and work “open”, contrary to other NBA stat “blogs.”

    May 28th, 2009 at 10:04 pm
  2. Ryan said:

    The a1-a5 and h1-h5 elements are simply an alphabetized list (by first name) of the players. They do not correspond to a basketball position.

    I’m glad you’re finding the data useful! Let me know if you have any suggestions for improving it.

    May 28th, 2009 at 11:29 pm
  3. Brock said:

    Thanks! Before I go bonkers and come up with a way to tag each player with a position for a file, is there an easy way to do this, or is it necessary to get an external file and name-match your PBP to the external file? Obviously this is easy in a DB environment, but I didn’t want to create it if there was an easier way. I figure you know more than I on this one!

    Thanks again.

    May 29th, 2009 at 9:26 pm
  4. Ryan said:

    Well it’s not “easy”, per se, but it can be done using data from Doug Stats (like this file).

    Here is some Perl code that you can use. It uses some UNIX commands to extract positions from YEAR.doug files (taken from Doug Stats) and reads a YEAR.players file (which contains a list of the players you want positions for). The code then prints out each player and position. It’ll be a place to start, at least.

    Also, I came up with some code that you can use to assign some naive probabilities to a player’s position (once you’ve gotten that position data using the below code). You can find it in this post at the APBRmetrics forum.

    #!/usr/bin/perl
    
    my %stats;
    my %pos;
    
    my $year = 2007;
    
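    # read one player name per line from the YEAR.players file
    open(FILE, "< $year.players") or die "Cannot open $year.players: $!";
    while (<FILE>) {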
      chomp;
    
      my ($player) = $_;
    
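      # if an earlier lookup for this player already failed, don't bother
      if (defined $pos{$player} && $pos{$player} eq "-1") { next; }
    
      # no cached position yet, so look the player up in the YEAR.doug file
      if (!defined $pos{$player}) {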
        my $np = $player; $np =~ s/\ //g;
        my $extra = 0;
        if (length($np) > 15) {
          $extra = length($np) - 15;
        }
    
        my @cs = split(/\ /, $player);
        my $cmd = "cat $year.doug";
    
        foreach $c (@cs) {
          my @chars = split(//, $c);
          for ($i = 0; $i < $extra; $i++) {
            # remove extra
            pop(@chars);
          }
          $c = join('', @chars);
          $cmd .= "|grep -i \"$c\"";
        }
    
        my $res = `$cmd|awk '{print \$3}'|head -1`;chomp($res);
        if ($res eq "") {
          print STDERR "Unable to find position data for $player ($year)\n";
          $pos{"$player"} = -1;
          next;
        }
        $pos{"$player"} = $res;
      }
    
      print "$player,".$pos{"$player"}."\n";
    }
    close(FILE);
    

    I hope this helps!

    May 29th, 2009 at 9:50 pm
  5. Brock said:

    THAT IS FANTASTIC! many thanks!

    May 31st, 2009 at 11:25 am
  6. Ryan said:

    I’m glad I can help!

    May 31st, 2009 at 4:33 pm
  7. TJ said:

    Can you explain the xy coordinates? For example, when x=0 and y=0 is that the left corner? right corner?

    August 19th, 2009 at 10:11 am
  8. Ryan said:

    They are always in terms of the offensive team, so if you’re standing behind the offensive team’s hoop, then the x-axis runs from left to right, and the y-axis runs from bottom up. Thus the far left corner on the baseline is (0,0).

    August 19th, 2009 at 10:28 am
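
To make the coordinate convention from the last comment concrete, here is a small Perl sketch. The helper name `side_of_court` is made up for illustration, and the default width of 50 is an assumption based on a 50-foot-wide court; the post does not state the units, so pass in whatever the largest x value in the data actually is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: which side of the court a location falls on, as seen
# from behind the offensive team's hoop (x runs left to right, and (0,0) is
# the far left corner on the baseline).
# Assumption: $width is the largest x value on the court. The default of 50
# is a guess based on a 50-foot-wide court, not something the data confirms.
sub side_of_court {
    my ($x, $width) = @_;
    $width = 50 unless defined $width;
    my $middle = $width / 2;
    return "left"  if $x < $middle;
    return "right" if $x > $middle;
    return "center";
}

print side_of_court(0),  "\n";   # x = 0 is on the left edge
print side_of_court(25), "\n";   # halfway across under the assumed width
print side_of_court(40), "\n";   # past the middle, so on the right
```

Because the coordinates are always given from the offensive team's point of view, the same helper applies to both the home and away teams without flipping the axis.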