Is One Lineup Better Than Another?
One part of Wayne Winston’s new book Mathletics that I didn’t really like was the way he compared raw lineup data to determine if one lineup is better than another. After thinking about it more, I think the real reason I don’t like his method is that he compares the lineups’ net points per minute (pts/min).
I’m a big proponent of points per possession (pts/poss), so I wanted to look at how we could compare lineups using raw pts/poss data. I think it is important to put emphasis on raw, as we’re not trying to control for strength of opponent, home court advantage, etc. Although these things have a real impact on the way our data is generated, the goal is to maintain simplicity while still being useful, just like the method Wayne proposes in his book.
Calculating the Difference Between Two Lineups’ Net Pts/Poss
As long as you have the data, comparing the difference between two lineups’ mean net pts/poss isn’t hard (I’m using hard here relatively, of course; for my mom, this would be really hard). That said, it helps to have something that makes the calculations easy. Therefore, I have created an R function compare_lineups() that you can find in compare_lineups.R.
This function takes four arguments:
- l1.o: vector of points scored on each offensive possession by lineup #1
- l1.d: vector of points allowed on each defensive possession by lineup #1
- l2.o: vector of points scored on each offensive possession by lineup #2
- l2.d: vector of points allowed on each defensive possession by lineup #2
Using this data, the compare_lineups() function calculates and reports the following:
- The mean and standard error for the difference in the lineups’ mean net pts/poss
- A 95% confidence interval for this difference
- The z-score of the difference and the estimated probability that lineup #1’s mean net pts/poss is greater than lineup #2’s mean net pts/poss
This function also returns these statistics in the form of a list. This allows you to do cool stuff like graph the plausible values for the difference between each lineup’s mean net pts/poss.
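The contents of compare_lineups.R aren't shown here, so the following is only a rough sketch of how a function like this could be put together, not the actual implementation. It assumes the four samples are independent, that a normal approximation is reasonable, and that results are reported per 100 possessions. The name compare_lineups_sketch and the names of the list elements are made up for illustration.

```r
# Hypothetical sketch; the real compare_lineups() in compare_lineups.R may differ.
compare_lineups_sketch <- function(l1.o, l1.d, l2.o, l2.d, conf = 0.95) {
  # Net pts/poss for each lineup: mean points scored minus mean points allowed
  net1 <- mean(l1.o) - mean(l1.d)
  net2 <- mean(l2.o) - mean(l2.d)

  # Difference in net pts/poss, scaled to pts/100 poss
  diff.mean <- 100 * (net1 - net2)

  # Standard error, assuming the four samples are independent,
  # so the variances of the four sample means simply add
  diff.se <- 100 * sqrt(var(l1.o) / length(l1.o) + var(l1.d) / length(l1.d) +
                        var(l2.o) / length(l2.o) + var(l2.d) / length(l2.d))

  # Normal-approximation confidence interval, z-score, and Pr(L1 > L2)
  crit <- qnorm(1 - (1 - conf) / 2)
  ci <- c(diff.mean - crit * diff.se, diff.mean + crit * diff.se)
  z <- diff.mean / diff.se
  p <- pnorm(z)

  # Report the results
  cat("For Lineup 1 - Lineup 2:\n")
  cat(sprintf("--> Mean: %.1f\n", diff.mean))
  cat(sprintf("--> Std Err: %.1f\n", diff.se))
  cat(sprintf("--> %.0f%% CI: (%.1f, %.1f)\n", 100 * conf, ci[1], ci[2]))
  cat(sprintf("--> Z-score: %.2f\n", z))
  cat(sprintf("--> Pr(L1 > L2): %.4f\n", p))

  # Return the statistics as a list so they can be reused (e.g. for graphing)
  invisible(list(mean = diff.mean, se = diff.se, ci = ci, z = z, p = p))
}
```

Treating offense and defense as separate, independent samples keeps the standard error calculation simple, which fits the goal of working with raw data rather than a full model.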
Application to the Example From Mathletics
To illustrate his method, Wayne gives an example of how he compares lineups on page 225 of Mathletics. In his example, he compares two Cavs lineups from the 2006-2007 season. The end result of his method is that we estimate there to be a >99% chance that the superior lineup has a higher mean net pts/48 minutes than the inferior lineup.
With compare_lineups(), we can now compare the lineups’ mean net pts/poss. To do this, you’ll first need to load the code in compare_lineups.R. Once loaded, you can run the following command to compare the lineups:
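res <- compare_lineups(
  c(rep(0,39+45), rep(1,2+5),  rep(2,29+37), rep(3,2+5)),   # l1.o: lineup #1 offense
  c(rep(0,52+42), rep(1,3+2),  rep(2,31+26), rep(3,6+1)),   # l1.d: lineup #1 defense
  c(rep(0,87+30), rep(1,15+4), rep(2,66+23), rep(3,9+2)),   # l2.o: lineup #2 offense
  c(rep(0,29+80), rep(1,3+4),  rep(2,18+70), rep(3,9+18)))  # l2.d: lineup #2 defense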
Running this code will produce the following output:
For Lineup 1 - Lineup 2:
--> Mean: 28.5
--> Std Err: 15.3
--> 95% CI: (-1.5, 58.5)
--> Z-score: 1.86
--> Pr(L1 > L2): 0.9686
This output shows us that the estimated difference between the lineups’ mean net pts/poss is 28.5 pts/100 poss with a standard error of 15.3 pts/100 poss. A 95% confidence interval for the mean difference is (-1.5, 58.5) pts/100 poss, which means we have 95% confidence that the true mean difference is somewhere in this interval.
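As a quick check, the interval can be rebuilt from the reported mean and standard error, assuming the usual normal-approximation interval. The variable names here are only for illustration.

```r
# Reported values from the output above (pts/100 poss)
diff.mean <- 28.5
diff.se   <- 15.3

# Critical value for a two-sided 95% interval (about 1.96)
crit <- qnorm(0.975)

# Interval: mean plus or minus crit * standard error
ci <- diff.mean + c(-1, 1) * crit * diff.se
print(round(ci, 1))
```

Because the inputs are already rounded to one decimal place, the result should only be expected to match the reported interval to about that precision.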
We’re specifically interested in the probability that the mean net pts/poss of lineup #1 is greater than that of lineup #2 (aka a one-tailed test, for that inner stat nerd deep inside of you), so the z-score of 1.86 allows us to estimate that this probability is 0.97. In other words, we conclude that lineup #1’s mean net pts/poss is significantly greater than lineup #2’s at the 5% level in a one-tailed test, even though the two-sided 95% confidence interval narrowly includes zero.
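The step from z-score to probability can be sketched in a few lines, assuming a standard normal reference distribution. The variable names are illustrative, and since the inputs are the rounded values from the output, the result will differ slightly from the reported 0.9686.

```r
# Reported values from the output (pts/100 poss)
diff.mean <- 28.5
diff.se   <- 15.3

# z-score: how many standard errors the difference is from zero (about 1.86)
z <- diff.mean / diff.se

# One-tailed probability that lineup #1's mean net pts/poss is greater (roughly 0.97)
p.greater <- pnorm(z)

# Corresponding one-tailed p-value, compared against the 5% level
p.one.tailed <- 1 - p.greater
print(c(z = z, p.greater = p.greater, p.one.tailed = p.one.tailed))
```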
We can also see this visually with a graph, generated in R, of the difference in the lineups’ mean net pts/100 poss:
[Graph: plausible values for the difference in the lineups’ mean net pts/100 poss]
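One way to draw such a graph, as a rough sketch that assumes a normal approximation and plugs in the mean and standard error reported in the output (this is not necessarily the command used for the original graph, and the variable names are illustrative):

```r
# Reported values from the output (pts/100 poss)
diff.mean <- 28.5
diff.se   <- 15.3

# Grid of plausible values, four standard errors either side of the mean
x <- seq(diff.mean - 4 * diff.se, diff.mean + 4 * diff.se, length.out = 401)

# Normal density for the difference at each grid point
dens <- dnorm(x, mean = diff.mean, sd = diff.se)

plot(x, dens, type = "l",
     xlab = "Lineup 1 - Lineup 2 (net pts/100 poss)",
     ylab = "Density",
     main = "Plausible values for the difference")

# Dashed reference line at zero (no difference between the lineups)
abline(v = 0, lty = 2)

# Dotted lines at the 95% confidence interval bounds
abline(v = diff.mean + c(-1, 1) * qnorm(0.975) * diff.se, lty = 3)
```

Most of the area under the curve sits to the right of the zero line, which is the visual counterpart of the estimated probability that lineup #1 is better.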
In this case we arrive at the same conclusion that lineup #1 is better than lineup #2. Depending on the question you’re trying to answer, like determining which lineup you should play in a given situation, I think this really just provides us with a good starting point. It is important to try to understand why this is showing up in the data, as we want to ensure that, for example, the lineup with the better data isn’t showing this advantage simply because it played inferior opponents.
My hope is that this code will make it easier to compare lineups on a per-possession basis, although you still have to go through and extract the data to use it. This is still just a starting point for comparing lineups, but it does give us some evidence to guide where we dig deeper to understand what makes one lineup better than another in a given situation.