You’ve heard it before. When Cliff Lee has a rough outing, either T-Mac, Wheels, or Sarge will probably make a comment along the lines of, “You know, sometimes he can go out there and throw too many strikes, and he gets hurt. The hitters attacked him today.” While I recognize the guys up in the booth aren’t exactly known for astute analytical commentary, I’d like to put the numbers to the test on this one: is it possible for Lee to throw too many strikes?

Utilizing Baseball Reference’s play index, I gathered game-by-game data for each of the past four years of Lee’s career (before 2008, he wasn’t exactly the Cliff Lee we know and love now). This gives us a good-sized sample of 124 games. In each of them, I recorded innings pitched, earned runs allowed, pitches, and strikes.

In general, throwing strikes is a good thing. Pitchers who can stay around the plate typically see greater success than those who can’t. That’s why we expect to see an inverse relationship between strikes thrown and earned runs allowed: as a pitcher throws more strikes, we’d usually expect him to give up fewer runs. In Lee’s case, the scatterplot looks like this:

The black line running through this data is an approximation of the relationship between these two variables, and it seems to confirm our most basic assumption that strikes are typically good. While the correlation coefficient of -0.45 tells us that this least squares regression line fits the data moderately well, we have a problem here. The total number of pitches thrown affects the number of strikes, and total innings pitched affects earned runs allowed. These outside variables can be controlled for by a) converting strikes thrown to strike % (strikes divided by total pitches) and b) converting earned runs into in-game ERA (earned runs per nine innings for that start). Yet when we do this, the relationship gets significantly weaker:

Our correlation coefficient drops to -0.15, which is pretty much useless. Where can we go from here?
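
For anyone who wants to try this step themselves, here is a minimal Python sketch of the two conversions and the correlation. The function names and the handful of game lines are made up for illustration; they are not Lee's actual game logs. It also assumes innings come in the usual box-score notation, where "6.2" means six and two-thirds innings.

```python
from math import sqrt


def innings_from_box(ip):
    """Convert box-score innings like "6.2" (6 innings, 2 outs) to a decimal."""
    whole, _, outs = ip.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3.0


def strike_pct(strikes, pitches):
    """Share of pitches that were strikes, as a percentage."""
    return 100.0 * strikes / pitches


def game_era(earned_runs, innings):
    """Earned runs per nine innings for one start (undefined for 0 innings)."""
    return 9.0 * earned_runs / innings


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical game lines: (innings, earned runs, pitches, strikes).
sample_games = [
    ("9.0", 0, 110, 82),
    ("7.0", 2, 105, 72),
    ("6.1", 4, 98, 64),
    ("8.0", 1, 112, 80),
    ("5.2", 5, 90, 57),
]

pcts = [strike_pct(s, p) for _, _, p, s in sample_games]
eras = [game_era(er, innings_from_box(ip)) for ip, er, _, _ in sample_games]
print("r between strike % and in-game ERA:", pearson_r(pcts, eras))
```

One thing to keep in mind is that in-game ERA is very noisy for short outings, since a couple of runs in two innings turns into a huge number. That alone could account for some of the drop in correlation.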

The “too many strikes” theory seems to suggest a non-linear pattern. That is, more and more strikes are good until a certain point. At that point, additional strikes begin to hurt the pitcher and we expect to see more earned runs allowed. This can be modeled by a polynomial regression equation. Using in-game ERA and/or strike % didn’t yield any significant results, but using the raw totals of earned runs and strikes thrown produced something of note. Fitting a second-order (quadratic) polynomial to those totals, we get an estimated equation of:

f(x) = 0.004x^2 - 0.6744x + 30.206

where x is strikes and f(x) is earned runs allowed. The R-squared value is 0.24, meaning that 24% of the variability in earned runs is explained by strikes thrown. The graph of this function looks like this:
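
A quadratic fit like the one described above can be sketched in a few lines of plain Python. This is only an illustration of ordinary least squares on a second-order polynomial, not necessarily the exact tool used for the numbers in this post. The function names are my own, and the strike and earned-run pairs at the bottom are invented.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c. Returns (a, b, c)."""
    # Center x first so the normal equations are numerically well behaved.
    m = sum(xs) / len(xs)
    rows = [[(x - m) ** 2, (x - m), 1.0] for x in xs]
    # Augmented 3x4 matrix for the normal equations (X^T X) beta = X^T y.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, 3):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, 4):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution gives the coefficients in the centered variable.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(aug[i][k] * beta[k] for k in range(i + 1, 3))
        beta[i] = (aug[i][3] - rest) / aug[i][i]
    a, b_c, c_c = beta
    # Expand a*(x-m)^2 + b_c*(x-m) + c_c back into powers of x.
    return a, b_c - 2 * a * m, a * m * m - b_c * m + c_c


def r_squared(xs, ys, coeffs):
    """Share of the variance in ys explained by the fitted curve."""
    a, b, c = coeffs
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Invented (strikes, earned runs) pairs, NOT Lee's real game logs.
strikes = [55, 62, 68, 71, 75, 79, 83, 88, 92]
earned_runs = [6, 4, 3, 3, 2, 1, 2, 2, 3]
coeffs = fit_quadratic(strikes, earned_runs)
print("a, b, c:", coeffs)
print("R-squared:", r_squared(strikes, earned_runs, coeffs))
```

Feeding in the real 124 games of strikes and earned runs should produce coefficients close to the equation above, assuming the same data and an ordinary least-squares fit.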

We can see that somewhere in the mid-80s, the function predicts that additional strikes will result in Lee allowing more earned runs, on average. Through basic differentiation, we can find the exact value where this change occurs. Differentiating the function gives us the marginal impact of an additional strike thrown:

f’(x) = 0.008x - 0.6744

By substituting 0 in for f’(x) and solving for x, we get 84.3. That is, the model predicts that each strike beyond 84.3 in a given game will result in more earned runs allowed, on average.
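
The arithmetic is easy to check. The short Python sketch below plugs in the fitted coefficients from the equation above; the function names are just labels I picked for this example.

```python
# Coefficients of the fitted quadratic f(x) = A*x^2 + B*x + C from the post.
A, B, C = 0.004, -0.6744, 30.206


def predicted_er(strikes):
    """Earned runs the quadratic model predicts for a given strike total."""
    return A * strikes ** 2 + B * strikes + C


def marginal_er(strikes):
    """Derivative f'(x) = 2*A*x + B: predicted change in ER per extra strike."""
    return 2 * A * strikes + B


# Setting f'(x) = 0 and solving gives x = -B / (2*A).
turning_point = -B / (2 * A)

print("turning point (strikes):", round(turning_point, 1))
print("predicted ER at turning point:", round(predicted_er(turning_point), 2))
print("marginal ER at 70 strikes:", round(marginal_er(70), 4))
print("marginal ER at 100 strikes:", round(marginal_er(100), 4))
```

Below the turning point the marginal value is negative, so an extra strike lowers predicted earned runs. Above it the sign flips. At the turning point itself the model predicts about 1.78 earned runs, which is the lowest value the curve reaches.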

While these results are certainly interesting, it would be foolish to read too much into them, as the polynomial model suffers from some pretty serious limitations. Neither innings pitched nor total pitches thrown is controlled for here. The R-squared value, while not awful, isn’t so great either. And most importantly, Lee has thrown more than 84.3 strikes on only 12 separate occasions in the last four years. Drawing conclusions from so few games at the extreme end of the data is dangerous in regression analysis.

So, I believe it’s safe to say that in practical terms, it’s impossible for Lee to throw too many strikes. Of course, throwing strikes at the wrong time or leaving a curveball hanging over the plate will certainly hurt him, and I don’t think that can be disputed. But over the course of a game, it’s hard to make a rational case that Lee could throw too many strikes—the evidence simply isn’t there.