This offseason, I am beginning a series I am calling ‘Data Monster Says’. This series will cover a variety of topics and will essentially be a catch-all for my Data Monster-related musings. The Data Monster is my brainchild and can be found here. The DM (as I’ll call it going forward) has a series of metrics for both pitchers and hitters that try to break whiffs, swings, and balls-in-play results into two components: location and player impact. If you want a more in-depth breakdown of everything included there, check out my Intro To The Data Monster piece. Note that the screenshots in that piece are a bit outdated, as I am constantly reworking the display and have added a few new features this offseason. There is much more to come on this front, which we will reveal over the offseason.
Why Does It Matter?
Whenever you debut new data, the first question you are often asked is ‘Why should I care?’. This is an extremely reasonable and fair question, and one that I have slowly been trying to tackle since developing this data back in 2019. Thus far I have mostly looked at everything at the individual player level; you can check out my GPS Location Reports if you want to see more of this (here is one I wrote on Logan Webb). However, my goal with this piece is to look at the data from a league trend standpoint.
Mainly, I wanted to look into what predictive capabilities the DM has, and how we can use it to decide which players to target or avoid in 2022. To do this I devised a simple study*. I took all pitchers who threw at least 100 pitches in a season since 2015 and then went on to throw at least 50 innings as a starter in the following season. Then, using standings gain points (SGP) numbers from NFBC leagues, I calculated a “Value” for each player. This value was calculated using only three metrics: ERA, WHIP, and Ks. I removed wins from this calculation because they are too heavily correlated with team strength, which cannot be discerned for the following season, and I strictly wanted to look at DM metrics. Additionally, since I wanted to take innings pitched (IP) out of the mix, I prorated every pitcher season to 150 innings.
*NOTE: There are a ton of different ways to design this study, but this is the way I chose.
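To make the study design concrete, here is a minimal sketch of the sample filter and the Value calculation. The baselines and SGP denominators below are hypothetical placeholders, not the NFBC numbers used in the study, and the way ERA and WHIP are converted to SGP is a simplification of my own choosing (a full SGP treatment of rate stats would also account for team innings). The names `PitcherSeason`, `is_eligible`, and `season_value` are illustrative only.

```python
from dataclasses import dataclass

PRORATED_IP = 150.0  # every pitcher season is scaled to 150 innings

# Hypothetical baselines and SGP denominators, for illustration only.
# The article does not publish the actual NFBC-derived numbers.
BASELINE = {"era": 4.00, "whip": 1.25, "k": 140.0}
SGP_DENOM = {"era": 0.10, "whip": 0.02, "k": 20.0}


@dataclass
class PitcherSeason:
    ip: float  # innings as a true decimal (50 1/3 innings = 50.333...)
    earned_runs: int
    hits: int
    walks: int
    strikeouts: int


def is_eligible(pitches_year_n: int, starter_ip_next_year: float) -> bool:
    """Sample filter: 100+ pitches in year N, 50+ starter innings in year N+1."""
    return pitches_year_n >= 100 and starter_ip_next_year >= 50.0


def season_value(s: PitcherSeason) -> float:
    """Sum of simplified SGP for ERA, WHIP, and Ks on a 150-inning basis."""
    if s.ip <= 0:
        raise ValueError("innings pitched must be positive")
    era = 9.0 * s.earned_runs / s.ip
    whip = (s.hits + s.walks) / s.ip
    # Rate stats are unchanged by prorating; only the counting stat scales.
    k_prorated = s.strikeouts * PRORATED_IP / s.ip
    return (
        (BASELINE["era"] - era) / SGP_DENOM["era"]      # lower ERA is better
        + (BASELINE["whip"] - whip) / SGP_DENOM["whip"]  # lower WHIP is better
        + (k_prorated - BASELINE["k"]) / SGP_DENOM["k"]  # more Ks is better
    )
```

Wins are deliberately absent, matching the study, and the prorating step only changes strikeouts because ERA and WHIP are already per-inning rates.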
So now that I had my sample, I could look into what the prior season’s DM metrics told me about a pitcher’s fantasy value for the year ahead.
The most common way to measure “impact” or predictiveness is to look at the individual correlations between each metric and the Value metric from my study. The chart below shows the correlation between each DM metric and next season’s value. To better understand the stat acronyms in the chart, you can refer to the links above. We’ll wait while you refresh yourself on those stats.
|Correlations By Stat|
|To Next Season’s Value|
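For readers who want to reproduce this kind of chart on their own data, the per-metric correlations can be sketched as follows. This assumes a plain Pearson correlation (the article does not say which coefficient was used) and a hypothetical row layout in which each pitcher is a dict holding the prior season's metrics plus a `next_value` field.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlations_to_value(rows, metrics, value_key="next_value"):
    """Correlate each prior-season metric with next season's value.

    `rows` is a list of dicts, one per pitcher season (hypothetical layout).
    """
    values = [row[value_key] for row in rows]
    return {m: pearson([row[m] for row in rows], values) for m in metrics}
```

A call such as `correlations_to_value(rows, ["In_Whiff", "IZ"])` returns one coefficient per metric, where the sign gives the direction and the magnitude gives the strength.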
What we can see from the correlations above is that most of the DM metrics do not have strong correlations with next season value. However, I do think we can still glean a few interesting concepts from this.
Looking strictly at the whiff metrics (xWhiff and In_Whiff) we can see that location does not matter nearly as much for next season’s value as ‘stuff’ does. This is not all that surprising based on similar research in the fantasy baseball community, but it does tell us that the best strength a pitcher could possibly have is to generate whiffs.
As for the remainder of the metrics, all of the correlations seem to move in the direction I would expect. For example, IZ, which is essentially a measure of generating extra called strikes, has a negative correlation with the next season’s value. Some other mild surprises are the relative lack of importance of rfCommand as well as both of the wOBA metrics.
However, there is one notable exception to the “expected” results and that is the In-Zone xSwing Rate. It’s actually one of the more heavily correlated metrics to value, but I would have expected the direction to track with IZ. Based on my prior assumptions, I viewed a low expected swing rate on pitches in the zone as a measure of how well a pitcher attacked the corners. Similar to the above point about IZ, a low expected swing rate should lead to lots of called strikes. Yet, as the expected swing rate rises, so does next season’s value. So the question is why?
This question is a huge part of why I love data: what you are measuring is not always what it seems. It appears that In-Zone xSwing is actually a measurement of control*.
*Quick aside on Command vs Control: Control is the ability to throw pitches in the strike zone/limit walks/etc. Command is the ability to throw pitches to locations with intent.
So how exactly is In-Zone xSwing a measure of control? Well, let’s take a look at some splits for xSwing by count.
|xSwing By Count|
While the changes may not seem massive, with the exception of 3-0 counts, xSwing drops a lot when the pitcher falls behind in the count and rises significantly in two-strike counts. So pitchers who consistently put themselves into plus counts tend to have high xSwings. In turn, these pitchers tend to have lower walk rates, which helps to lower their WHIP and raise their fantasy value.
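The split by count described above can be sketched as a simple group-and-average over pitch-level data. The field names `balls`, `strikes`, and `xswing` are assumptions about how the pitch records might be laid out, not the DM's actual schema.

```python
from collections import defaultdict


def mean_xswing_by_count(pitches):
    """Average expected swing rate for each (balls, strikes) count.

    `pitches` is an iterable of dicts with hypothetical keys
    'balls', 'strikes', and 'xswing' (a probability between 0 and 1).
    """
    totals = defaultdict(lambda: [0.0, 0])  # count -> [sum of xswing, n pitches]
    for pitch in pitches:
        key = (pitch["balls"], pitch["strikes"])
        totals[key][0] += pitch["xswing"]
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in sorted(totals.items())}
```

Because the result is keyed by count, a pitcher who throws a larger share of his pitches in two-strike counts will carry a higher overall xSwing even if his location within each count is unremarkable, which is the control effect described above.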
While I have been able to find metrics that do have some predictive value, I think their usefulness is better highlighted by looking at things in a slightly different way.
After evaluating the data, I began to realize that correlation was not the best way to look at this. In fact, the raw magnitude did not matter nearly as much as I would have believed. Instead, I decided to look at percentile splits: for each metric, I compared the next season’s fantasy value for pitchers who were at the 50th percentile or better against those who fell in the bottom half of the metric. Below are the results for the splits.
|Next Season Value By Split|
|Upper 50%-tile or Better|
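A percentile split of this kind can be sketched in a few lines. This assumes each pitcher row already carries the metric's percentile ranking on a 0 to 100 scale plus a `next_value` field; both the row layout and the function name are hypothetical.

```python
def value_by_percentile_split(rows, pct_key, value_key="next_value", cutoff=50.0):
    """Compare mean next-season value above and below a percentile cutoff.

    Rows at or above `cutoff` go in the upper group, matching the
    '50th percentile or better' definition used in the article.
    """
    upper = [row[value_key] for row in rows if row[pct_key] >= cutoff]
    lower = [row[value_key] for row in rows if row[pct_key] < cutoff]
    if not upper or not lower:
        raise ValueError("both halves of the split need at least one pitcher")
    upper_mean = sum(upper) / len(upper)
    lower_mean = sum(lower) / len(lower)
    return {"upper": upper_mean, "lower": lower_mean, "diff": upper_mean - lower_mean}
```

A positive `diff` means the upper half of the metric went on to be more valuable; a negative one means the metric's percentile direction runs opposite to fantasy value.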
What we see is that a couple of interesting trends begin to emerge. As in the correlations, In_Whiff, In Zone xSwing, and Stuff ERA create the largest differences. As we saw above, a lower percentile ranking for In Zone xSwing is better for pitchers. When building the percentile values, I operated with the assumption that a lower xSwing was better. However, as we saw earlier in the piece, this is not necessarily the case for fantasy value.
Several other metrics like rfCommand, xlwoba, and In_woba do not seem to be massively useful for determining pitchers to target. However, we begin to see a bit more value with the two out-of-zone metrics as well as xWhiff and IZ. The extremely interesting part of this to me is that for IZ (which measures the ability to get hitters to take pitches in the zone), higher percentile rankings (more takes) are better for future value even though we have seen the opposite from the In-zone expected swing rate metrics. This is yet another example of why we cannot make assumptions that what a given metric was intended to measure is actually what it is measuring.
So Now What?
So the logical question from here is: how exactly do I use this data going forward? Now that we have some additional information about which metrics are impactful, you can use the DM and all of its various metrics to help you decide between two pitchers in the same tier. For a quick-and-dirty check to compare two pitchers, use the In_Whiff, StuffERA, and IZ.xSwing percentiles. Additionally, by combining these metrics and the different splits with value, we can build out groupings of pitchers who we would expect to outperform their non-group counterparts. In fact, going forward, I will be doing just that in this article series: combining splits, evaluating which pitchers they would have told you to target and avoid in 2021, and evaluating whether they are useful for the 2022 season. However, first I will be writing a similar article to this one evaluating the DM metrics for hitters to see which ones best predict next season’s fantasy value.
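As one possible way to run that quick check, the sketch below averages the three percentiles into a single number. The equal weighting is purely my assumption for illustration, not a formula from the DM. It also assumes higher is better for the In_Whiff and StuffERA percentiles, and it flips IZ.xSwing because the splits above suggest a lower percentile there is better for fantasy value.

```python
def quick_check_score(pcts):
    """Equal-weight blend of three DM percentiles (0-100 scale).

    Hypothetical scoring: In_Whiff and StuffERA count as-is, while
    IZ.xSwing is flipped because a lower percentile was better for value.
    """
    return (pcts["In_Whiff"] + pcts["StuffERA"] + (100.0 - pcts["IZ.xSwing"])) / 3.0


def prefer(name_a, pcts_a, name_b, pcts_b):
    """Return the name of the pitcher with the higher quick-check score."""
    score_a = quick_check_score(pcts_a)
    score_b = quick_check_score(pcts_b)
    if score_a == score_b:
        return None  # no preference from this check
    return name_a if score_a > score_b else name_b
```

This is only a tiebreaker between two pitchers in the same tier; it says nothing about how far apart two tiers should be.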