Aug 6 2009

What I’ve Learned Over the Past Year

One year ago today I published my first post, so I wanted to go through the past year and see what stuff sticks out, for better or worse. Regardless of the end result of a lot of my work, I know I sure have learned a lot along the way.

If you’ve been reading for some time, then I hope you’ve enjoyed the journey with me so far. If you’re new, then I hope you enjoy this tour through the past year.

• I started off on a kick wanting to collect new data. New data is vital for further understanding, but when you don’t have a good foundation to work with then new data is more noise than help. I realized this, which is why I’ve not been collecting much new data, even as important as that is.
• Speaking of new data, I’ve been slowly making play-by-play data available in a CSV format so that it is easy to work with. The files also include information, like shot locations, that has been hard to get in the past.
• The rough outline of a theoretical model for the probability of winning a basketball game at the team, unit, and individual level gave me a lot to think about in terms of how a team comes together to score points.
• Trying to figure out substitution patterns is an interesting problem, and one step in this direction is looking at the average number of starters in the game based on time and by lead. The joint problem of time and lead is still left to be solved.
• In November I was inspired to put together some bridge jumper projections for the rest of the NBA season, and looking back I don’t really see how they’re very useful. They were fun to put together, but I don’t think I’ll be doing anything like this in the future.
• Figuring out how one unit of players will fare against another is important, and I took a theoretical and practical look at trying to assign efficiency ratings between two competing units.
• Adjusted plus/minus took me a long time to fully wrap my head around, and my naive points added contribution helped me figure it out. At that time, working with things like grouped data was not second nature to me; now it is, and my modeling skills have come a long way since. I think we can throw points added in the trash, but it was an important step toward understanding adjusted plus/minus.
• Using numerical methods to adjust statistics is controversial (in the NBA, at least), but these methods help in college sports, where there is a large disparity in level of competition. I applied some methods to 3pt shooting, efficiency ratings, and rebounding rates to show how we might use these methods in the NBA. You don’t seem to get much at the team level, but I think there is more to be explored at the unit and player level.
• Basketball on Paper’s skill curves are an informative look at how a player’s efficiency relates to their role in the offense, and you can create them yourself (even though the exact way Dean created the curves is still locked away in his vault).
• I would like to understand defense better (who wouldn’t?), and my posts on the defensive four factors and on shooting versus defensive efficiency were basic steps toward that. This isn’t surprising, but shooting is the most important predictor of defensive efficiency.
• We want to know how players fit together and what their production would look like as a unit. I took a look at shot distributions and shooting percentages, but there is much more to do in this area.
• My post on referee efficiency ratings turned out to be one of the best learning experiences all year. Refs are always a touchy subject, and I didn’t provide evidence to back up some of the statements I made. As one reader pointed out, referee stuff is “very hard to do right”. He’s right, and the discussion that followed this post was very informative. I feel that I’ve yet to acquire the tools needed to do a ref analysis real justice, so I’ve left it alone for now.
• Plays are the basic building blocks of basketball, and there is much to explore: how plays end, how a play is likely to end based on how it started, and how long it takes for plays to end are just a starting point.
• Multilevel and hierarchical models seem to me to be very applicable to problems we deal with in analyzing sports data. We tend to know something more about these players, units, or teams that we analyze other than simply the sample of data we observe. I’ve finally started to piece these types of models together and have applied them to a couple of NBA data sets: 3pt shooting statistics, offensive rebounding rates, and defensive fouls drawn and committed.
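As a rough illustration of the multilevel idea in the last item above, here is a minimal sketch of partial pooling for 3pt shooting. It is not the model from those posts: it is a simple shrinkage estimate that pulls each player's raw percentage toward the pooled rate, with less pull as attempts grow. The function name `shrink_rates`, the `prior_strength` value, and the player numbers are all made up for the example.

```python
def shrink_rates(makes, attempts, prior_strength=50.0):
    """Pull each player's raw rate toward the pooled rate.

    prior_strength is an assumed tuning value: it acts like that many
    extra attempts taken at the pooled rate. A full multilevel model
    would estimate this from the data instead of fixing it.
    """
    pooled_rate = sum(makes) / sum(attempts)
    return [
        (m + prior_strength * pooled_rate) / (a + prior_strength)
        for m, a in zip(makes, attempts)
    ]


# Hypothetical players: 3pt makes and attempts (not real data).
makes = [4, 40, 150]
attempts = [5, 100, 400]

estimates = shrink_rates(makes, attempts)
for m, a, est in zip(makes, attempts, estimates):
    print(f"raw {m / a:.3f} on {a:3d} attempts -> shrunk {est:.3f}")
```

The player with only 5 attempts moves most of the way to the pooled rate, while the player with 400 attempts barely moves, which is the "we know something more than the sample" intuition in its simplest form.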

So that’s my best summary of the past year. It’s not a complete rundown of every post, so make sure you check the archives if you’re thirsty for more.

My Future Focus

While working on a presentation for an upcoming talk about sports rating, I realized that I’ve yet to really try to predict anything. When we talk about why one model might be better than another, one aspect has to do with predictive capabilities.

Even though fitting data helps answer some questions, my focus over the immediate future will be on measuring prediction capability, as I want to get an idea of what models of past data tell us about the future. I have a lot of ideas that I hope I will get to sooner rather than later. 🙂