How to Use Pitch Mix Data to Project Pitching and Hitting Performance

Let’s set the stage. It was Game 3 of the 2022 World Series, with two outs and one on in the bottom of a scoreless first. Bryce Harper stepped to the plate in front of the electric Philly crowd. Lance McCullers Jr., who refuses to throw fastballs to lefties, was on the mound, and John Smoltz claimed, “I’ll be kind of shocked if Bryce is not going to the plate sitting on a breaking ball… If he stays on the breaking ball and gets the one in the middle of the plate… loud noise.” Loud noise was right.

The question remains — do we need to be a Hall of Fame-caliber pitcher to understand how pitch mix dynamics can improve pitching and hitting forecasts or can we effectively perform this type of analysis ourselves?

Specifically, we seek to answer four questions:

1. Does pitch mix data improve long-term forecasts for hitters?

All else equal, is it better for a hitter’s future outlook if he produces against fastballs or off-speed pitches?

2. Can pitch mix data improve our forecasts for specific hitters against specific pitchers?

If so, this would be a DFS gold mine.

3. Does pitch mix data improve long-term forecasts for pitchers?

All else equal, is it better for a pitcher’s outlook if he succeeds with one pitch over another?

4. Can pitch mix data improve our forecasts for specific pitchers against specific teams/lineups?

Once again, this would be a DFS gold mine.

The Data

For this study, we’ll be using FanGraphs’ Pitch Value metric. It’s therefore important to note immediately that the findings in this study could be a reflection of the math behind the metric as much as they are a reflection of the effect it’s trying to capture. Further analysis will be needed on other pitch mix data, like K rate or CSW on certain pitches, ISO against certain pitches, etc.

FanGraphs’ metric estimates runs gained or lost against a pitch per 100 pitches for hitters (positive numbers are desired for the hitter). For pitchers, it measures runs gained (read: saved or prevented) per 100 pitches for each pitch type, so that once again, positive numbers are desired for the pitcher.
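To make the "per 100 pitches" scaling concrete, here is a minimal Python sketch. The function name and the sample numbers are made up for illustration; FanGraphs computes the underlying run values itself, and this only shows the rate conversion.

```python
def runs_per_100(total_runs: float, pitches: int) -> float:
    """Scale a total run value for one pitch type to a per-100-pitches rate."""
    if pitches <= 0:
        raise ValueError("pitches must be positive")
    return 100.0 * total_runs / pitches


# Made-up example: a hitter worth +6.0 runs over 800 fastballs seen.
hitter_fastball_rate = runs_per_100(6.0, 800)   # 0.75 runs per 100 fastballs

# Made-up example: a pitcher whose slider cost him 3.0 runs over 600 sliders.
pitcher_slider_rate = runs_per_100(-3.0, 600)   # -0.5 runs per 100 sliders
```

Because the value is a rate, a hitter who saw 300 changeups and one who saw 900 can be compared on the same scale, although the smaller sample will be noisier.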

In any case, the conclusions we draw from this study will inform, beyond a shadow of a doubt, the way in which we use this pitch mix data from FanGraphs.

Using Pitch Mix Data to Help Project Hitters in the Long Run

Imagine two players, Player A and Player B, who both had a 110 wRC+ in 2022. The question we’re hoping to answer here is whether we can project one over the other in 2023 by looking at their pitch mix data. For example, perhaps Player A was great against fastballs but struggled against off-speed pitches while Player B was the exact opposite.

At the heart of this question is the concept of stability. The stability of a metric refers to the likelihood of performance in a metric continuing in the future as it has in the past. Take performance against fastballs as the example. If FanGraphs’ pitch value metric for fastballs (for hitters) is stable, then we can say that players who have performed well against fastballs in the past are expected to continue to do so. If the metric is unstable, then that statement can’t be made.
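One common way to put a number on stability is the year-to-year r^2 of a metric across the same group of players. The sketch below assumes that definition and uses made-up values for five hypothetical hitters; it is not the exact code behind this study.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r^2 is undefined when a season has no variation")
    return cov * cov / (var_x * var_y)


# Made-up runs per 100 fastballs for five hitters, same player order in both seasons.
vs_fastballs_2021 = [1.2, -0.4, 0.8, 0.1, -1.0]
vs_fastballs_2022 = [0.9, -0.1, 0.5, 0.3, -0.7]

stability = r_squared(vs_fastballs_2021, vs_fastballs_2022)
print(f"year-to-year r^2: {stability:.2f}")
```

An r^2 near 1 means last year's value tells you most of what you need to know about this year's; an r^2 near 0 means it tells you almost nothing.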

You can see here how it’s possible for pitch mix data to improve long-term forecasts. If performance against fastballs is stable and performance against off-speed pitches is unstable, then we could project Player A ahead of Player B in 2023 since his success would be more likely to continue while his struggles would be less likely to continue.
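The Player A versus Player B logic can be sketched with simple regression to the mean: shrink each pitch-type value toward league average in proportion to how reliable it is. The reliabilities and player values below are hypothetical and chosen only to match the scenario just described. The shrinkage formula is a standard simplification, not the model fitted in this study.

```python
def regress_to_mean(observed, reliability, league_mean=0.0):
    """Shrink an observed value toward the league mean.

    reliability is a number in [0, 1]: 1 keeps the observation as-is,
    0 replaces it with the league mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return league_mean + reliability * (observed - league_mean)


# Hypothetical reliabilities for the scenario in the text:
# fastball performance stable, off-speed performance unstable.
RELIABILITY = {"fastball": 0.6, "offspeed": 0.2}

# Made-up runs per 100 pitches; positive is good for the hitter.
player_a = {"fastball": 2.0, "offspeed": -2.0}   # great against fastballs
player_b = {"fastball": -2.0, "offspeed": 2.0}   # great against off-speed


def expected_next_year(player):
    """Apply the per-pitch shrinkage to every pitch group for one player."""
    return {pitch: regress_to_mean(value, RELIABILITY[pitch])
            for pitch, value in player.items()}


proj_a = expected_next_year(player_a)   # fastball 1.2, offspeed -0.4
proj_b = expected_next_year(player_b)   # fastball -1.2, offspeed 0.4
```

Under these assumed numbers, Player A keeps most of his strength and sheds most of his weakness, while Player B does the reverse. Swap the two reliabilities and the conclusion flips.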

There’s good news and bad news in the results. Good news first — we do see a difference in the relative stabilities. It’s actually performance against off-speed pitches that is nearly twice as stable as performance against fastballs. I don’t know about you, but I find this to be counterintuitive. Counterintuitive findings are the best findings!

However, this difference in stability isn’t large enough to meaningfully improve future projections (it’s OK if you don’t understand the R output; I’ll explain the findings):

[Figure: R regression output for next-season wRC+]

Notes: I’m using wRC+ for this study so that I don’t have to account for potentially changing park factors. Also, wRC+ is more in line with the FanGraphs metric.

The best model for future wRC+, with the potential inputs being wRC+ from the previous season as well as performance against fastballs and off-speed pitches from the previous season, doesn’t include past performance against fastballs. You could even make the case that performance against off-speed pitches is unnecessary. This result does make some intuitive sense, since we know wRC+ stabilizes between 300 and 400 ABs, which all hitters in the sample reached or exceeded in both 2021 and 2022.
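For readers who want to try this kind of model comparison themselves, here is a small pure-Python sketch. It fits ordinary least squares with and without the pitch value inputs and compares adjusted R^2. The data are made up, and adjusted R^2 is my assumption for the comparison; the R output above may have used a different selection criterion.

```python
def ols_fit(rows, y):
    """Ordinary least squares via the normal equations (intercept added here).

    rows: one list of predictor values per player. Returns (coefficients, r2).
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Made-up rows: (prior wRC+, prior fastball value, prior off-speed value, next wRC+).
players = [
    (135, 1.8, 0.9, 128), (120, 0.5, 1.2, 118), (110, 1.1, -0.6, 104),
    (105, -0.2, 0.4, 107), (98, 0.3, -0.5, 99), (92, -0.8, 0.1, 96),
    (85, -0.4, -1.1, 88), (78, -1.5, -0.3, 84),
]
y = [nxt for *_, nxt in players]
n = len(y)

_, r2_base = ols_fit([[prev] for prev, _, _, _ in players], y)
_, r2_full = ols_fit([[prev, fb, off] for prev, fb, off, _ in players], y)

print("wRC+ only:", round(adjusted_r2(r2_base, n, 1), 3))
print("wRC+ plus pitch values:", round(adjusted_r2(r2_full, n, 3), 3))
```

Plain R^2 can never go down when you add inputs, which is why the sketch uses the adjusted version: it only rewards an extra input if that input explains more than chance would.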

OK, so unfortunately pitch mix data can’t meaningfully help our long-term hitter forecasts, but can it help us evaluate individual matchups?

Using Pitch Mix Data to Analyze Individual Hitter vs. Pitcher Matchups

Intuitively, it makes sense that someone like Player A from earlier (great performance against fastballs, poor against off-speed) would project better against a pitcher who throws a ton of fastballs than one who throws mostly off-speed pitches, right? Well, this is only true if the metric is stable. So, let’s check the stabilities of performance against each pitch.

[Table: year-to-year r^2 of hitter performance by pitch type]

Higher r^2 values mean the metric is more stable from year to year. There are a number of interesting, actionable findings here. First of all, hitting performance against traditional fastballs (4-seamers and 2-seamers) is stable, but performance against sinkers and cutters is completely unstable. Shockingly, performance against changeups is twice as stable as performance against sliders despite the fact that players in the sample saw about twice the number of sliders as changeups. In other words, performance against changeups isn’t just more stable year to year, but it also stabilizes faster than performance against sliders.

Finally, the coolest of all seems to be that performance against the collection of all off-speed pitches (sliders, curves, changeups, splitters) is extremely stable even though performance against the collection of fastballs (traditional fastballs, cutters, sinkers) is less stable than performance against traditional fastballs.

Here’s my first promise to you all — we will have pitch mix data at FTN this season including performance against the collection of off-speed pitches. You may as well sign up for the Diamond Subscription right now for 20% off with promo code BLICK.

We can summarize these actionable findings like so:

Target hitters who thrive against traditional fastballs against pitchers who throw a lot of them, but ignore the pitch mix data against pitchers who use non-traditional fastballs like sinkers and cutters.
Target hitters who thrive against changeups against pitchers who utilize changeups.
Above all else, look more at how a hitter does against the collection of off-speed pitches than how he does against any individual pitch. There’s likely a lot of signal here about his approach (does he sit fastball and react to off-speed or vice versa?).
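One rough way to put these three rules into practice is a usage-weighted matchup score that only counts pitch types whose hitter-side values we trust. The sketch below is a hypothetical scoring scheme, not the method FTN uses. The pitch keys, the trusted set, and every number in it are assumptions for illustration.

```python
# Pitch groups whose hitter-side values the findings above suggest are worth using.
# Sinkers and cutters are deliberately left out.
TRUSTED_PITCHES = frozenset({"fastball", "changeup"})


def matchup_edge(hitter_values, pitcher_usage, trusted=TRUSTED_PITCHES):
    """Usage-weighted hitter run value per 100 pitches, trusted pitch types only.

    hitter_values: runs per 100 pitches by pitch type (positive favors the hitter).
    pitcher_usage: share of the pitcher's pitches by pitch type (shares sum to about 1).
    """
    edge = 0.0
    for pitch, share in pitcher_usage.items():
        if pitch in trusted:
            edge += share * hitter_values.get(pitch, 0.0)
    return edge


# Made-up hitter: strong against traditional fastballs, weak against changeups.
hitter = {"fastball": 1.5, "changeup": -0.5, "sinker": 2.0, "cutter": -1.0}

# Made-up pitchers with different mixes.
fastball_heavy = {"fastball": 0.6, "changeup": 0.2, "cutter": 0.2}
sinker_cutter = {"sinker": 0.7, "cutter": 0.3}

edge_vs_fastball_heavy = matchup_edge(hitter, fastball_heavy)   # 0.6*1.5 + 0.2*(-0.5)
edge_vs_sinker_cutter = matchup_edge(hitter, sinker_cutter)     # no trusted pitches
```

Against the sinker and cutter pitcher, the score stays at zero no matter how the hitter has done against those pitches. That is the "ignore the pitch mix data" rule in code form.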

One final note here is that it’s significantly easier to perform this analysis on hitters because they all see a similar distribution of pitch types. Therefore, they’re all seeing a similar sample size of each pitch type and sample size can’t skew the data.

Using Pitch Mix Data to Help Project Pitchers in the Long Run

You know the drill by now. Let’s check how well we can predict future SIERA (the best predictor of future ERA) using pitch mix data and past SIERA:

[Figure: regression output for next-season SIERA]

There’s a comical finding here right off the bat (no pun intended). The estimated coefficient for fastball performance is positive, which means that if a pitcher’s fastball was effective in the year prior, he’s actually expected to have a higher SIERA in the following year. This is because for pitchers, traditional fastball performance is extremely unstable. Another counterintuitive finding!

Note, however, that this is a pretty meaningless finding for long-term projections: the effect is just not strong enough to change the projection much. The model would be equally powerful if we dropped past fastball performance. Like wRC+, SIERA has already stabilized once we get to this sample size.

Using Pitch Mix Data to Analyze Individual Pitcher vs. Hitter/Team Matchups

There’s a clear trend in the stabilities of performance with certain pitches — what was most predictive for hitters is typically least predictive for pitchers. Traditional fastballs are the obvious example, but let’s take a look at everything from the pitcher’s perspective:

[Table: year-to-year r^2 of pitcher performance by pitch type]

Note: for the individual regressions, I removed every instance of a pitcher throwing a given pitch less than 5% of the time.
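That usage filter is simple to reproduce. Here is a minimal sketch, assuming one record per pitcher and pitch type with a usage share between 0 and 1. The field names and the sample rows are hypothetical.

```python
MIN_USAGE = 0.05  # the 5% cutoff described above


def filter_low_usage(pitch_rows, min_usage=MIN_USAGE):
    """Keep only rows where the pitcher threw that pitch at least min_usage of the time."""
    return [row for row in pitch_rows if row["usage"] >= min_usage]


# Made-up rows: one per (pitcher, pitch type), values in runs per 100 pitches.
rows = [
    {"pitcher": "Pitcher X", "pitch": "slider", "usage": 0.31, "value": 1.4},
    {"pitcher": "Pitcher X", "pitch": "changeup", "usage": 0.03, "value": -4.2},
    {"pitcher": "Pitcher Y", "pitch": "curveball", "usage": 0.05, "value": 0.6},
]

kept = filter_low_usage(rows)   # drops Pitcher X's rarely thrown changeup
```

The point of the cutoff is that a per-100-pitches value built on a handful of pitches can be extreme for no real reason, and those outliers would drag down the measured stability of that pitch type.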

Fastballs and sinkers flip-flopped. Changeups and sliders flip-flopped in a major way. The collection of all off-speed pitches is now less predictive than performance with curves and sliders. Why is this?

Josh Hermsmeyer popularized air yards analysis in the NFL, and one of his great findings was that average depth of target “belongs” more to the receiver than to the quarterback. This is similar to what we’re finding here.

Slider performance belongs to the pitcher, while traditional fastball performance belongs to the hitter. Put another way, the hitter is more in control of the outcome of a fastball while the pitcher is more in control of the outcome of a slider. As someone who threw 100% sliders in full counts my senior year of high school with great success, I feel vindicated! How cool is this?

We can summarize these actionable findings like so:

Target pitchers who can choose between fastball types (like Aaron Nola). The traditional fastball will play well against hitters who struggle against the pitch, and these pitchers can avoid throwing it to hitters who excel against it.
Target pitchers with great breaking balls against hitters who struggle with off-speed pitches.
Target hitters who excel against off-speed pitches against pitchers who either struggle with breaking balls or have succeeded in the past with unstable off-speed pitches like changeups and splits.

One final word of caution here is that the sample size of pitchers who throw a lot of splitters is so small that we can’t reliably conclude much about the stability of split performance. Kevin Gausman, for example, has shown an ability to consistently perform well with the pitch.

Pitch Mix Findings

We originally sought to answer four questions. Let’s review how we did with each of them:

1. Does pitch mix data improve long-term forecasts for hitters?

Not really. As long as the hitter already has a reasonable sample size, pitch mix data adds little, if anything, to the projections.

2. Can pitch mix data improve our forecasts for specific hitters against specific pitchers?

Yes! We now have a great understanding of who the performance of a pitch “belongs to,” so we can use pitch mix data to maximize predictive power of individual matchups.

3. Does pitch mix data improve long-term forecasts for pitchers?

No. Once again, the best predictors of pitching performance have already stabilized by the time we have adequate sample sizes of the pitch mix data.