Dec 9 2009

Estimating the Impact of Aging on 3FG%

The impact of aging on player performance interests me, but I haven’t really done anything useful in this area. Baseball’s recent debate (really fantastic stuff) on the topic of player aging curves has motivated me to take a closer look at the effect of aging on 3pt shooting.

Ed Küpfer’s three-year-old post on aging is the best resource I know of on the topic, but I’m sure he’s got something more useful than what’s in that post. That said, I’d like to specifically estimate an aging curve for the 3pt shooting abilities of NBA players.

The Data

Using Basketball-Reference.com’s excellent Player Season Finder, I collected 3pt shooting data for players who attempted at least two 3pt shots during any season from 1999-00 to 2008-09. This resulted in a data set containing a little over 3,300 player seasons.

The Model

Individual 3pt shooting performance is more than a function of player ability and age, but in this case I’m specifically interested in estimating the effect of age. Further, I would like to estimate an aging curve for each player under the assumption that each player has their own unique aging curve.

To do this, I’ve chosen to use a multilevel model that allows the intercept and slope to vary by player, as this model allows for the estimation of a unique aging curve for each player. The aging curve is assumed to be of the form \alpha + \beta_{0}(age) + \beta_{1}(age^{2}).
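Written out, one plausible reading of this model is sketched below. The inverse-logit link and the normal distribution for the player-level terms come from a follow-up comment by the author further down the page; the indices (i for a player season, j for a player), the symbols \mu and \Sigma, and the choice of a joint multivariate normal are notation assumed here for illustration, not details confirmed in the post.

```latex
\begin{aligned}
y_{ij} &\sim \mathrm{Binomial}(n_{ij},\, p_{ij}) \\
\mathrm{logit}(p_{ij}) &= \alpha_{j} + \beta_{0j}\,(age_{ij}) + \beta_{1j}\,(age_{ij}^{2}) \\
(\alpha_{j},\, \beta_{0j},\, \beta_{1j}) &\sim \mathrm{N}(\mu,\, \Sigma)
\end{aligned}
```

Here y_{ij} is the number of made 3pt shots and n_{ij} the number of attempts for player j in season i, so each player gets their own intercept and their own coefficients on age and age^{2}.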

The Results

The model fit estimates that the average player’s peak age in 3pt shooting ability is 29, and the p-values for the \beta_{0} and \beta_{1} coefficients in the model form listed above are both less than 0.01.
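For reference, the peak age implied by a curve of this form follows from setting its derivative with respect to age to zero. The short derivation below assumes \beta_{1} < 0 (otherwise the curve has no maximum). The numeric check uses the rounded population-level coefficients from the expanded 1989-90 to 2008-09 fit reported in the comments, so it is not expected to reproduce the 29 quoted above exactly.

```latex
\frac{d}{d(age)}\left[\alpha + \beta_{0}(age) + \beta_{1}(age^{2})\right]
  = \beta_{0} + 2\beta_{1}(age) = 0
\quad\Longrightarrow\quad
age_{peak} = -\frac{\beta_{0}}{2\beta_{1}}

% With \beta_{0} \approx 0.12 and \beta_{1} \approx -0.002:
age_{peak} \approx -\frac{0.12}{2(-0.002)} = 30
```

If the curve is fit on the logit scale, the same peak age applies to the estimated 3FG% itself, since the inverse logit is strictly increasing.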

The estimates for each player are listed in the following spreadsheet: 3FG% Aging Estimates

In this spreadsheet you will find each player’s estimated peak age along with 3FG% estimates for each player from age 20 to 40.
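As a rough illustration of how a row of such estimates could be produced, the sketch below evaluates a quadratic aging curve on the logit scale at each age from 20 to 40. The function names are hypothetical, and the default coefficients are the rounded population-level values from the expanded fit reported in the comments, not any individual player’s estimates from the spreadsheet.

```python
import math

# Rounded population-level coefficients from the expanded 1989-90 to 2008-09
# fit reported in the comments (assumed here for illustration only).
ALPHA, BETA_0, BETA_1 = -2.58, 0.12, -0.002


def inv_logit(x):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def estimated_3fg(age, alpha=ALPHA, beta_0=BETA_0, beta_1=BETA_1):
    """Estimated probability of making a 3pt shot at the given age."""
    return inv_logit(alpha + beta_0 * age + beta_1 * age ** 2)


def aging_table(start_age=20, end_age=40, **coefs):
    """Return {age: estimated 3FG%} for each whole age in the range."""
    return {age: estimated_3fg(age, **coefs) for age in range(start_age, end_age + 1)}


table = aging_table()
peak_age = max(table, key=table.get)  # age with the highest estimate on this grid
for age in (20, 25, 30, 35, 40):
    print(f"age {age}: {table[age]:.3f}")
print("peak age on this grid:", peak_age)
```

A player-specific row would pass that player’s own intercept and coefficients instead of the defaults, for example `aging_table(alpha=..., beta_0=..., beta_1=...)` with values taken from the fitted model.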

Most peak ages are plausible, with Andre Miller as a notable exception. For a player like Miller, the model suggests his 3pt shooting ability has been declining since the start of his NBA career.

Individual Players

Below you will find graphs of estimated aging curves for some players’ 3FG%. In addition to the estimated aging curve, you will see points plotted that represent the actual 3FG% for the player at the specified age.

[Figures: estimated 3FG% aging curves for LeBron James, Kobe Bryant, Steve Nash, and Vince Carter]

There is Uncertainty…

These are just estimates, and it is important to point out that uncertainty exists around these estimates. I’ve tried to simulate confidence bounds for this model without success, but even though I’m not representing the uncertainty, keep in mind that it does exist.
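One source of uncertainty that is easy to quantify, separately from the model, is the sampling noise in the observed 3FG% points on the graphs. The sketch below uses the normal approximation to the binomial. It is not a confidence bound for the fitted curves, the function name is hypothetical, and the 200-attempt season at 38% is a made-up example.

```python
import math


def proportion_interval(makes, attempts, z=1.96):
    """Approximate 95% interval for a shooting percentage (normal approximation)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = makes / attempts
    se = math.sqrt(p * (1.0 - p) / attempts)  # binomial standard error
    # Clip to [0, 1] since a shooting percentage cannot leave that range.
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical season: 76 makes on 200 attempts (38%).
p, low, high = proportion_interval(76, 200)
print(f"observed {p:.3f}, approx. 95% interval {low:.3f} to {high:.3f}")
```

Even at 200 attempts the interval spans several percentage points in each direction, which is worth keeping in mind when comparing a single plotted point to the estimated curve.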

Improving this Model

The debate in the baseball community suggests it’s incorrect to simply assume players have this type of aging curve. Although this type of curve appears to be a decent approximation, perhaps there are better ways to estimate the impact of a player’s age on 3FG%. Other than trying to control for other explanatory variables, trying to fit other model types might prove worthwhile.


17 Comments on this post

Trackbacks

  1. The Tradeoff Between Usage and 3FG% wrote:

    [...] last post on the relationship between age and 3FG%, Brian Tung brought up a topic that I’ve wanted to look at: … I’m not even sure that [...]

    December 10th, 2009 at 6:28 pm
  2. The Relationship Between Age, Usage, and 3FG% wrote:

    [...] my last two posts I have taken a look at modeling a player’s 3FG% based on their age, and I’ve also estimated the relationship between usage and 3FG%. In this post I would like [...]

    December 11th, 2009 at 6:33 pm
  3. The Mid-Afternoon Milk Mustache, featuring Yao’s words of wisdom | Stacheketball, an NBA Blog wrote:

    [...] Pyramidal: Ryan responds to a comment on his previous post and takes a look at the relationship between usage rates and 3 point percentage. [Basketball [...]

    December 11th, 2009 at 6:43 pm

  1. Brian Tung said:

    I gotta say, I’m with those who say that there’s no rigorous justification for assuming that 3FG% is quadratic with age. I’m not even sure that history is necessarily a good guide here, as players shoot those treys under fairly different circumstances at different stages of their careers. Who’s to say that Ariza, for example, is going to get the same looks this year as one of the primary options as opposed to last year as one of the secondary options?

    Might be more interesting to look at FT% as a function of age. One could make a plausible case that history is a better guide there.

    December 9th, 2009 at 4:16 pm
  2. Ryan said:

    You bring up very good points, and that’s why I wanted to start with this quadratic assumption. This certainly isn’t perfect, but one thing I want to know is how good of an approximation this is.

    From there I want to think about how to best incorporate things that help estimate a player’s 3FG%. We know that 3FG% is not merely a function of ability and age, as your example with Ariza shows. The stuff you point out is what I’d like to answer, such as, “How can we estimate this?”

    December 9th, 2009 at 4:30 pm
  3. Brian Tung said:

    The problem is, I’m not sure I could even come up with a workable methodology for saying how good an approximation (a better way to say it would be “how good a model”) a quadratic dependence is. Precisely because the actual measured percentage depends on so many things besides some notional Platonic “three-point shooting accuracy,” validating such a model is fraught with significant peril.

    Let’s think about what makes the three-point shooting percentage rise at the start of a career: better shot selection, improved shot faking, increased array of moves with which to set up the three-point shot as a decoy, etc. And, conversely, what makes the percentage drop toward the end of a career: decreased elevation on the jump, decreased quickness, that sort of thing.

    In short, we see tactical improvements on the front end, and physical degradation on the back end. I see no reason at all why those would vary quadratically, or even be symmetric (as a quadratic model must be, centered on the peak). My intuition suggests trying to model the rise and fall separately. For instance, one might try something along the lines of

    (a + bt) exp (- ct²)

    Of course, you still have doubts as to whether you have enough players to rigorously test that, especially as you’d now have three degrees of freedom. And you still have the problem of trying to derive the “inherent” shooting ability of the player from his measured accuracy. One thing you might try is to first calculate the three-point shooting percentage of the teammates of players, and use that to normalize the shooting percentage of the player you’re interested in. In other words, how much better does the player shoot compared to other shooters playing with the same (or similar) teammates? I’m not sure that players mix freely enough to make this workable, but it’s a possibility.

    December 9th, 2009 at 5:47 pm
  4. Ryan said:

    Good stuff Brian.

    If we can agree that players peak at some point, what is your intuition on trying to fit some sort of spline to the data, where the peak is where the two curves are connected?

    December 9th, 2009 at 6:03 pm
  5. Brian Tung said:

    My intuition is that still leaves you with at least three degrees of freedom in any particular model: the initial 3FG%, the career age (years in league) at which peak 3FG% is reached, and the peak 3FG% itself. And there are also lots of different curves you could splice together at the peak; the number of models is the product of the numbers of the two halves. Trying to validate any individual one would be a complete nightmare, because there are so many possibilities that one of them is bound to fit well, regardless of whether it has any inherent validity.

    That said, I’d go ahead and try to fit something simple and specific. Characteristics for a candidate model would be:

    1. Concave downward initially, positive slope.
    2. Reaches peak, still concave downward.
    3. Reaches inflection point, then concave upward subsequently.

    Pick something you like and try it out. But try it out on complete careers first. LeBron in particular seems like a suboptimal choice. That small a career trajectory could be anything.

    December 9th, 2009 at 6:43 pm
  6. Ryan said:

    That makes sense. Another reason I chose this basic model is because I could fit it in multilevel form.

    As per your suggestion: my calculus isn’t too fresh, but wouldn’t we also want the function to be concave downward at the maximum?

    I also think I need to grab more data, since I have less than 10 years to work with. That should allow for some full player careers.

    December 9th, 2009 at 6:58 pm
  7. Jon said:

    Nash’s 3-ball longevity is superior to that of Kobe, LeBron, and Vince because he doesn’t use much body spring (read, spring off the butt) to launch his J. He uses a push technique that involves a lot more arm push and wrist snap. His body tension (body spring) is vastly inferior to those of the other 3, which makes launching long shots much more problematic. He (and many other less athletic players) has learned to compensate by using a strong arm push and wrist snap. Try out his shooting style at the local gym. You will see how accurate it can be even at long distances. But you should also notice that Nash cannot really shoot the 3 ball unless he is almost entirely open (this is because his shooting style depends on a forward lean), while LeBron, Vince, and Kobe regularly hit the long ones with defenders draped all over them. This ability stems from their superior body tension/athleticism. However, as the more athletic player ages, the legs begin to go, and with them goes the body tension that makes for their shooting accuracy. Nash can continue shooting (wide open) 3s with great accuracy until he retires. This kind of statistical analysis is quite interesting. But the differences in the four players’ 3-ball accuracy curves are best understood by reference to specific shooting styles and the biodynamics that support them.

    December 9th, 2009 at 8:34 pm
  8. tyler whitehouse said:

    this is stupid. why do you use a quadratic model? you have nowhere near enough data to fit any kind of a model.

    you are an idiot.

    December 9th, 2009 at 9:26 pm
  9. Ryan said:

    Tyler, thanks for the insight. Do you have any references that would help me learn more about the topic?

    When you say I have “nowhere near enough data to fit any kind of a model”, what exactly do you mean? Are you specifically referring to a model with a quadratic, or are you suggesting no sort of regression should be fit with this amount of data?

    You’re clearly a smart guy, so I’d like to learn more if you’re willing to provide some references. That said, a little expansion of your reasoning behind why this is stupid would help. :)

    December 9th, 2009 at 10:30 pm
  10. Malcolm said:

    Taking just these 4 examples, there doesn’t seem to be much support for the idea that 3pt shooting is quadratic in age. Kobe’s shooting appears to be slightly linearly rising with time, with a couple of early outliers, VC’s is showing a slight linear decline, Nash’s was showing a linear rise before the last data point, which could be an outlier, and LeBron doesn’t have enough data points to conclude much of anything, but before the most recent data point his shooting was on a strong downward linear trajectory. Linear regression seems like it accounts for these data as well as quadratic.

    December 9th, 2009 at 11:26 pm
  11. tyler whitehouse said:

    first of all, vince carter’s looks periodic. the only one which even has any type of consistent structure if you disregard the outliers is kobe’s.

    what i mean by you don’t have enough data, is that well frankly, i don’t think there is any point trying to fit a quadratic curve to five or ten points, which is basically the size of the data set you have for an individual player.

    what you ought to do instead, is to aggregate the data for multiple players somehow, and then try and come up with a model to depict the average behavior of the players. because you are really trying to make a statement about a population. so, you should try to model the population rather than modeling the individuals.

    coming up with a curve for modeling the behavior of an individual player is futile. just look at the types of errors and the percentage of outliers for your model for the given examples. it appears that for each of the players, there is a tendency for the data to oscillate, i.e. have alternating trends. there is absolutely nothing to suggest that the data is quadratic, any sort of heuristic or intuitive reasoning about the phenomena aside.

    are the years when carter’s percentages drop the years after big contract signings? this would not surprise me. all his numbers tend to drop in such situations. or at least that is what i’ve read places. trying to establish a theoretical basis for a model with concavity etc, really leaves quite a bit out in terms of player motivation yadda yadda yadda. it would be better to let the data speak, and again, let the actual population speak rather than try and extrapolate the numbers for a given player.

    December 10th, 2009 at 12:29 am
  12. tyler whitehouse said:

    ps sorry about the idiot comment, that was rude. i am an unhappy person ;_+)

    December 10th, 2009 at 12:31 am
  13. Ryan said:

    Just to be clear on how I’m fitting this data, I’m using the makes and misses for each player, and using the age and age^2 as the predictors. So Vince Carter, for example, has 1,173 makes on 3,102 attempts. The points in the graphs are merely the sample proportions for the player at each age to illustrate what the model would suggest compared to what actually happened. I too would agree that it would be pointless to try and fit a model to these sample proportions.

    I’ve now compiled the data from 1989-90 to 2008-09, and this produces the following multilevel model fit:

    Pr(Make) = logit^{-1}(-2.58 + 0.12(age) - 0.002(age^2))

    Where the individual player intercepts and coefficients on age and age^2 are assumed to come from some normal distribution.

    This allows me to look at other players that have more data points to work with:

    I hope this better illustrates exactly what I’m doing.

    December 10th, 2009 at 12:47 am
  14. Brian Tung said:

    @Ryan: Only if it’s twice differentiable. Otherwise, it might just be flat, for instance. But yeah, mostly that was just reinforcing the point.

    December 11th, 2009 at 6:03 pm