As a former college pitcher who was banished to the bullpen partway through his career, I have long had a fascination with relief pitching. As my focus shifted from player to analyst, I realized that this was a truly misunderstood portion of the game. On the scouting report for almost any high-end SP prospect, you will see a comment along the lines of “Could be a high impact RP”. This comes up for almost any pitcher with an elite fastball and at least one good off-speed pitch.
However, life does not always work out like that. Every season, a new random reliever pops up and becomes an elite closing option or a former journeyman becomes a star overnight. For every Josh Hader, there are countless pitchers who go the route of a Kirby Yates. They toil in anonymity for years until they are given their shot at the big time. However, just as quickly these pitchers fall off and once again find themselves on the precipice of being forced into early retirement.
A few years ago I set out to better understand relievers, what makes them successful, and whether I could predict which pitchers would be successful in a high-leverage role.
For me, the first step in model building is answering one simple question: What am I trying to predict or explain? This can be a complex question, but in simple terms the answer is the dependent variable. In a model trying to predict power-hitting potential, for example, this could be home run total or ISO. So this meant I needed to develop a measure of reliever value.
There were a couple of ways that I could have gone with this, but I decided to create my own. The most likely existing options were WPA and WAR. WAR for relievers is incredibly flawed, mainly because it is a context-neutral stat. Relief pitching value is heavily contextualized and, as a result, WAR would not work for me. WPA, or win probability added, does a good job of factoring in the context of a situation. However, I felt that it added too much value to closers and was weighting that portion of the situation too heavily. So this led me to my own metric: ADREIP.
ADREIP is a long name that stands for Adjusted RE24 per Inning Pitched. It is composed of two components: RE24 and gmLI. RE24 is a metric that uses the run expectancy of each base-out state to credit a pitcher for his contribution. For example, if a pitcher inherits runners on first and second with no outs, we would expect 1.373 runs to score in that scenario. If he gets the next hitter out, the expectation drops to 0.908. The pitcher would get credit for the difference of 0.465 runs. To read more about it check out this link. gmLI is the average leverage index at the moment the pitcher is brought into the game. I like to use this because a pitcher’s overall average leverage index can be inflated by his own doing. I wanted to work with a number that showed a manager’s trust level. One final adjustment is made relative to the league average for relievers. Each season the average gmLI for relievers is a little over 1, so I wanted to adjust for this context. A gmLI value of 1.2 for my purposes means that a given reliever entered situations where the leverage was 20% greater than for the average reliever. ADREIP is the combination of these two metrics: the RE24 value for a given pitcher multiplied by this adjusted gmLI value, then divided by innings pitched. This produces an adjusted per-inning value for all relievers.
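The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the exact production calculation: the function names are made up for this example, the league adjustment is assumed to be a simple ratio (pitcher gmLI divided by league-average reliever gmLI), and the season line at the bottom is hypothetical rather than real data.

```python
def re24_credit(runs_expected_before: float, runs_expected_after: float,
                runs_scored: float = 0.0) -> float:
    """Run-expectancy credit to the pitcher for a single plate appearance."""
    return runs_expected_before - runs_expected_after - runs_scored


def adreip(re24: float, gmli: float, league_gmli: float, innings: float) -> float:
    """Adjusted RE24 per inning pitched.

    Assumption: the league adjustment is gmli / league_gmli, so 1.2 means
    the reliever entered at 20% higher leverage than the average reliever.
    innings must be true innings (60 1/3 is 60.333..., not the box-score 60.1).
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    adjusted_gmli = gmli / league_gmli
    return re24 * adjusted_gmli / innings


# Example from the text: first and second, no outs (1.373), then one out (0.908).
credit = re24_credit(1.373, 0.908)
print(round(credit, 3))

# Hypothetical season line, not a real pitcher.
value = adreip(re24=15.0, gmli=1.5, league_gmli=1.25, innings=60.0)
print(round(value, 3))
```

Because the leverage term multiplies the RE24 total, a pitcher with a negative RE24 is penalized more heavily the more his manager trusted him, which matches the idea that value is contextual.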
Below are the top 10 pitchers with at least 40 innings in 2019.
Looking over the top names on this list, there are not many major surprises among the most valuable pitchers in 2019. Yates and Hader are two of the game’s best closers, and Workman was a savior at the back of the Boston bullpen. There is also some love thrown toward two breakout pitchers, Taylor Rogers and Liam Hendriks. Aaron Bummer is one of the best pitchers in all of baseball that no one talks about.
Building The Model
The next step in my process is determining what factors lead to success for relievers. Typically, this is a bit of trial and error but it begins with a list of traits that I believe are important for a reliever. Here are the five different items that I think lead to a successful reliever.
- Swing and Miss: Often relievers enter the game with runners in scoring position, and being able to generate a strikeout is vital to keeping that runner from scoring
- Limit Free Passes: Self-explanatory but it is difficult to string hits together so not giving away free bases is massive
- Limit Extra-Base Hits: Again, it is difficult to string multiple hits together, so limiting the hits you do allow to singles is extremely important and makes it harder to score
- Generate Groundballs: Ground ball double plays can immediately erase a hit and make life on the pitcher much easier. Also, it is harder to hit grounders for extra bases, which ties into the previous point
- Platoon Neutrality: In later innings, teams are much more willing to use a pinch hitter and their bench in general, meaning it is likely the reliever faces a number of opposite-handed hitters
The first four items are extremely easy to measure via existing stats and are all included in the model. Swinging Strike Rate, BB%, ISO-Allowed, and GB/FB are the stats used to measure them. The final point, Platoon Neutrality, uses a custom metric I have devised that compares a given pitcher’s platoon splits to the league averages, while also adjusting for the ratio of same-handed versus opposite-handed hitters faced. This helps to identify pitchers who are not only strong against hitters from both sides of the plate but are also trusted by their managers to face both left-handed and right-handed hitters.
xADREIP: Version 1
Now that we have a list of inputs that are significantly correlated with ADREIP, we can use those inputs to build a linear regression model. Since the response variable is a contextualized stat, we needed to make sure that our sample factored in some form of context. The goal of the model is to identify relievers who would be successful in high-leverage roles. So to define high-leverage, I used all pitchers with at least 40 innings pitched and a gmLI of at least 1.20. For each season in my sample, this gave me around 60 pitchers, or about two pitchers per team. This fits pretty well with my interpretation of high-leverage.
Without going too in-depth on the model-building process, I used a technique called principal component analysis (PCA). This is a way of handling collinearity in a model, which arises when input measurements are correlated with one another and can cause overfitting. After performing the PCA and extracting the relevant components, I fit a model on my overall sample. The model R-squared was around 0.5, meaning that about 50% of the variation in reliever “skill” is explained by the inputs in my model. For a real-world application, this is an extremely successful model. Below are the top 10 relievers (minimum 20 innings pitched) by this metric.
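For readers who want to see the general shape of this step, here is a minimal sketch of principal component regression with NumPy. It is not the actual xADREIP code: the helper names are invented for this example, the number of retained components (3) is an arbitrary assumption, and the data is synthetic noise with one pair of inputs made collinear on purpose.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Principal component regression: standardize, project, then least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    components = vt[:n_components]
    scores = Z @ components.T
    design = np.column_stack([np.ones(len(y)), scores])  # intercept + PC scores
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "components": components, "coef": coef}


def predict_pcr(model, X):
    """Apply the stored standardization and projection, then the linear fit."""
    Z = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    scores = Z @ model["components"].T
    return model["coef"][0] + scores @ model["coef"][1:]


def r_squared(y, y_hat):
    """Share of the variance in y explained by the predictions."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data only: 60 fake pitcher-seasons with five inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]  # two inputs made collinear on purpose
y = 0.1 * X[:, 0] - 0.05 * X[:, 2] + rng.normal(scale=0.1, size=60)

model = fit_pcr(X, y, n_components=3)
r2 = r_squared(y, predict_pcr(model, X))
print(f"in-sample R^2: {r2:.2f}")
```

Regressing on a few uncorrelated component scores rather than on the raw, correlated inputs is what keeps the coefficients stable. Note that the R-squared here is measured in-sample, so it says nothing about how well the fit would carry over to a new season.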
As you can see there is some cross-over between the two lists but there are a couple of really interesting names that stand out on this list. First, we have the dominance that is Josh Hader near the top but he was actually bested by an under-appreciated name in Ryan Pressly. The Astros have an elite bullpen combining both Pressly and Roberto Osuna as well as the criminally under-appreciated Will Harris. Despite being a few years removed from arguably the best reliever season ever, Zack Britton still finds himself among the best pitchers in baseball. He should be extremely successful as a fill-in for Aroldis Chapman.
Another version of the model focuses on using the idea of comps, which will be familiar to those of you who know my prospect model or the old KATOH model from FanGraphs. This version takes the inputs described above and finds the most similar seasons. This is done using a distance formula called Mahalanobis distance and a weighting system for the different inputs. Once I have determined the most similar seasons, I weight them based on their relative probabilities to develop a single number. While this number does have some value in a standalone context, the best use of it is for the range of outcomes charts.
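A rough sketch of the comps idea, assuming NumPy, might look like the following. Several pieces here are assumptions made for illustration rather than details of the actual model: the Gaussian-style conversion from distance to weight, the choice of 10 comps, and the omission of the per-input weighting system. The "history" is synthetic, not real pitcher-seasons.

```python
import numpy as np


def mahalanobis_distances(history, target):
    """Distance from target to every row of history, using history's covariance."""
    history = np.asarray(history, dtype=float)
    diff = history - np.asarray(target, dtype=float)
    cov = np.cov(history, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse tolerates near-collinear inputs
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def comp_estimate(history, outcomes, target, k=10):
    """Weighted average outcome of the k most similar historical seasons."""
    d = mahalanobis_distances(history, target)
    nearest = np.argsort(d)[:k]
    d2 = d[nearest] ** 2
    # Assumed weighting: closer comps count more. Subtracting the minimum
    # avoids underflow and does not change the normalized weights.
    weights = np.exp(-0.5 * (d2 - d2.min()))
    weights = weights / weights.sum()
    estimate = float(weights @ np.asarray(outcomes, dtype=float)[nearest])
    return estimate, nearest, weights


# Synthetic history: 200 fake seasons, five inputs, one fake outcome column.
rng = np.random.default_rng(1)
history = rng.normal(size=(200, 5))
outcomes = 0.1 * history[:, 0] + rng.normal(scale=0.05, size=200)
target = history.mean(axis=0)  # a hypothetical "average" pitcher

estimate, comp_idx, comp_weights = comp_estimate(history, outcomes, target, k=10)
print(round(estimate, 3), comp_idx[:3])
```

Mahalanobis distance is a natural fit for this job because it rescales each input by its variance and accounts for correlation between inputs, so two correlated stats do not get double-counted when judging similarity.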
The chart below compares the 2019 season results of Brad Hand and James Karinchak, the potential Indians closer in waiting. The output is built entirely off this comparison-based model. Karinchak showed some electric skills in his brief MLB debut, but the emphasis should be on brief. The young right-hander had video game numbers in the minors last season and has the look of a dominant future closer.
The numbers and the model seem to agree. Karinchak seems to have sky-high upside with a smaller chance of a truly awful season than Hand. The best way to interpret this is that the bars show the probability that a pitcher with similar underlying metrics had an ADREIP in that window. So in the 0.25 to 0.3 window, similar seasons ended up here about 15% of the time for Karinchak and about 10% of the time for Hand. As you can see Hand has a higher rate in the lower ranges while Karinchak is better at the high end.
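The bars in a chart like this can be computed as a weighted histogram over the comps' outcomes. The sketch below assumes NumPy and uses made-up comp outcomes and weights (not the real comps for Karinchak or Hand); the function name and the 0.05-wide bins are also assumptions for illustration.

```python
import numpy as np


def outcome_distribution(comp_outcomes, comp_weights, bin_edges):
    """Weighted share of comps whose outcome lands in each bin.

    Outcomes outside the outermost edges are ignored by np.histogram.
    Returns a list of (low, high, probability) tuples.
    """
    counts, edges = np.histogram(comp_outcomes, bins=bin_edges, weights=comp_weights)
    total = counts.sum()
    probs = counts / total if total > 0 else counts
    return list(zip(edges[:-1], edges[1:], probs))


# Hypothetical comps: ADREIP outcomes of similar seasons and their weights.
comp_outcomes = [-0.12, 0.03, 0.08, 0.27, 0.29, 0.41]
comp_weights = [0.10, 0.20, 0.25, 0.10, 0.05, 0.30]
bin_edges = np.linspace(-0.2, 0.5, 15)  # 14 bins, each 0.05 wide

distribution = outcome_distribution(comp_outcomes, comp_weights, bin_edges)
for low, high, prob in distribution:
    if prob > 0:
        print(f"{low:+.2f} to {high:+.2f}: {prob:.0%}")
```

Reading the result works the same way as reading the chart: each bin's probability is the weighted fraction of similar seasons that finished with an ADREIP in that window.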
Typically, I do not like to mention pitchers with samples as small as Karinchak but he is a really fun example and I got to play against him several times in college as he pitched in my conference. It was fun watching him pitch then and it is even more fun that I get to write about him now.
This was a longer post than I typically like to write, but it has been a long time since I really fleshed out the details of my relief pitcher model. Overall, the model uses underlying skills that have proven to be successful in high-leverage roles to identify pitchers who should have success. The issue is that a lot of these skills do not translate well from year to year, especially in the extremely small samples we get with relievers. The best way to use this is to describe what should have happened. However, xADREIP does correlate better with next-season ADREIP than actual ADREIP does. There is value in using it when looking deep for those closer buys or for evaluating who may be the next man up for a given roster. Still, it is important to remember that real-life decisions are made by managers who will factor in things that are beyond our ability to predict.
If you liked what you saw in the piece and want to look at how your own favorite pitcher grades, check out this link. I am in the process of making small backend changes that will not impact the numbers so bear with me as I go. Additionally, the name is way too cumbersome and confusing so I want to hear some name suggestions. Hit me up on Twitter with your suggestions and I will buy a Rotofanatic shirt for the person who suggests the name I ultimately choose.