When News Is Noise: Explaining Baseball’s Biggest Breakouts


Two months ago, Cleveland slugger Carlos Santana’s slump was a story. Santana, a .254/.367/.446 career hitter entering the season, was batting .159/.327/.301 as of May 25, and his 43 walks, while useful, couldn’t camouflage that eye-catching average.

On May 23, the Lake County, Ohio, News-Herald’s Jim Ingraham had recommended that the Indians send Santana to the minors: “Maybe a couple of weeks in Triple-A Columbus, out of the spotlight, would allow Santana to regroup, catch his breath, and find his swing,” Ingraham wrote. A few days later, Beyond the Box Score’s Michael Nestel dug into Santana’s batted-ball breakdowns and pitch distribution and came to a similarly sobering conclusion: “Carlos Santana does not stand a chance against any major league pitcher who can effectively change speeds within the strike zone,” Nestel wrote. “Expect more of the same going forward.” The Elyria Chronicle-Telegram’s Chris Assenheimer wondered whether Santana had really been any good to begin with. Sports Illustrated’s Jay Jaffe and the Athens Messenger’s Tyler Buchanan speculated that Santana’s part-time shift to third base (a position he’d played in the Dominican Winter League but had otherwise avoided after A-ball) had been a distraction. Santana himself said that he was “still tinkering with his swing,” which was “different than it was last season.”

On May 22, Indians blogger Evan Vogel asked me to explain Santana’s struggles. I mentioned that Santana had probably been hurt by the shift, but noted that without access to the HITf/x data that the Indians and other teams can consult, we’d have a hard time determining whether he’d been making weaker contact, which could indicate a screwed-up swing or an injury. However, I also noted that based on what we did know, Santana’s plate discipline seemed as strong as ever, and his BABIP hinted at lousy luck. My big, caveat-filled finish: “In the absence of [HITf/x] information (or maybe some in-depth swing/pitch-level analysis), the best bet is that his results will start looking more like the old Santana’s soon.”

Three days after those comments, a foul tip struck Santana’s mask, and two days after that, the resulting concussion sent him to the seven-day disabled list. Since returning from the DL on June 6, Santana has hit .311/.421/.622, including blasting five home runs against the Royals in Cleveland’s last three games. After chasing two ice-cold months with two torrid ones, his OPS+ on the season is 135, exactly what it was in 2013.

So, how do I project players so well? Unfortunately, I don’t, or haven’t demonstrated that I do. I’ve made my share of forecasting flubs, and I had plenty of company aboard the Santana bounce-back train. (Even in this instance, I didn’t do that well; I predicted that Santana would hit like his old self from that day forward, not that he’d suddenly start mashing like SUPER-SANTANA until his full-season stats returned to their typical level.) In Santana’s case, my best asset may have been the opposite of analysis: not knowing very much. If I’d seen Santana struggle day in and day out or done a deep dive into his stats, I almost certainly would have talked myself into identifying some underlying evil behind his lack of success, which might have made me more pessimistic about his future. As it was, I’d watched only snippets of his season and taken only a cursory look at his stats. Not having seen much of Santana probably saved me from saying something that would have sounded silly in retrospect.

On last week’s episode of the Jonah Keri Podcast, Jonah borrowed my podcast partner, Baseball Prospectus editor-in-chief Sam Miller, to discuss whether fans and analysts can do a better job than projection systems can of distinguishing between players whose fast or slow starts are statistical noise and those whose Santana-like slumps or hot streaks reflect a real change in true talent. “There are certain things that we with our eyes ought to be able to pick up on that computers and algorithms and one-size-fits-all formulas can’t,” Sam said. However, he continued, “We are all probably guilty of overvaluing and overrating how smart we are and thinking that we can find that factor in every player who’s doing something unexpected. If I [took 50 players who are dramatically overperforming their projections], I could probably find something for all 50. I could find some quote from a batting coach, or something in his PITCHf/x profile or in his spray charts or in something that he said or his diet the previous winter.”

In light of Santana’s seesaw season, let’s find out if Sam’s right! Below are two tables of hitters (minimum 200 plate appearances) and pitchers (minimum 75 innings pitched) who have exceeded their preseason PECOTA projections by the widest margins in 2014. (We could come up with explanations for the failures of players who’ve fallen furthest short of their projections, too, but we’ll accentuate the positive and stick with guys who are having good years.) We’re going to cap this exercise at 10 hitters and 10 pitchers, but if you want to accept Sam’s challenge and go 50 deep, you can find a full list of PECOTA over- and underperformers here.

For hitters, we’ll compare projected and actual performance using True Average, a league-, era-, and park-adjusted Baseball Prospectus stat that puts a player’s overall offensive production on the traditional batting average scale (where .260 is average and .300 is excellent). For pitchers, we’ll rely on ERA. Players are ordered by the percentage difference between their preseason projections and their actual performance.
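As a rough illustration of that ordering, the sketch below divides actual performance by the preseason projection and expresses the result as a percentage, using three lines from the hitters table. The function name and data layout are illustrative assumptions, not part of PECOTA or any Baseball Prospectus tool.

```python
def pct_of_projection(actual, projected):
    """Return actual performance as a percentage of the preseason projection."""
    return round(actual / projected * 100, 1)


# (name, actual TAv, preseason projected TAv), copied from the hitters table
hitters = [
    ("J.D. Martinez", 0.330, 0.256),
    ("Devin Mesoraco", 0.361, 0.254),
    ("Seth Smith", 0.355, 0.272),
]

# Biggest overperformers first: a higher percentage means a bigger surprise.
ranked = sorted(hitters, key=lambda h: pct_of_projection(h[1], h[2]), reverse=True)

for name, actual, projected in ranked:
    print(f"{name}: {pct_of_projection(actual, projected)}")
```

For pitchers the same ratio applies to ERA, where lower is better, so a figure well below 100 marks the overperformer.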

Hitters

| Name | PA | Actual TAv | Preseason TAv Projection | Actual as % of Projection | Rest-of-Season TAv Projection |
| --- | --- | --- | --- | --- | --- |
| Devin Mesoraco | 262 | .361 | .254 | 142.1 | .270 |
| Seth Smith | 338 | .355 | .272 | 130.5 | .279 |
| J.D. Martinez | 236 | .330 | .256 | 128.9 | .270 |
| Carlos Gomez | 434 | .316 | .256 | 123.4 | .263 |
| Scooter Gennett | 326 | .298 | .243 | 122.6 | .251 |
| Corey Dickerson | 270 | .328 | .270 | 121.5 | .279 |
| Jonathan Lucroy | 423 | .320 | .265 | 120.8 | .273 |
| Russell Martin | 270 | .316 | .261 | 121.1 | .265 |
| Dee Gordon | 435 | .289 | .241 | 119.8 | .247 |
| Steve Pearce | 238 | .319 | .271 | 117.7 | .278 |

Now that we know the surprise players’ names, we just need to scout the Internet for things they’ve done differently:


Devin Mesoraco, Reds: gained confidence because of his new everyday role; adjusted his mechanics with hitting coach Don Long; stopped slouching at the plate thanks to Double-A manager Delino DeShields’s counsel

Seth Smith, Padres: started being able to see pitches on the outside part of the plate (even at night!) thanks to a second LASIK surgery

J.D. Martinez, Tigers: changed “everything” about his old swing and replaced it with a copy of Miguel Cabrera’s; started swinging harder

Carlos Gomez, Brewers: built on his 2012 swing change by raising his already-extreme first-pitch swing rate

Scooter Gennett, Brewers: learned the importance of laying off unhittable pitches thanks to Ron Roenicke’s tutoring

Corey Dickerson, Rockies: played with less fear of failure; prepared better for opposing pitchers; improved his swing rates against tough pitch types

Jonathan Lucroy, Brewers: figured out how to make time to work on his offense as well as his responsibilities behind the plate

Russell Martin, Pirates: moved beyond a right shoulder injury that limited his “ability to swing freely” last season

Dee Gordon, Dodgers: started hitting fewer fly balls; began standing closer to home plate; found a defensive home that translated into increased comfort at the plate

Steve Pearce, Orioles: shortened his stride, which shortened his swing, which keeps his bat “in [the] hitting area longer”

Pitchers

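| Name | IP | Actual ERA | Preseason ERA Projection | Actual as % of Projection | Rest-of-Season ERA Projection |
| --- | --- | --- | --- | --- | --- |
| Scott Kazmir | 129.1 | 2.37 | 4.57 | 51.9 | 4.08 |
| Jake Arrieta | 91.0 | 2.18 | 4.14 | 52.6 | 3.91 |
| Danny Duffy | 98.1 | 2.47 | 4.39 | 56.3 | 3.98 |
| Corey Kluber | 149.1 | 2.77 | 4.66 | 59.5 | 4.42 |
| Garrett Richards | 137.1 | 2.62 | 4.40 | 59.6 | 4.08 |
| Chris Young | 124.1 | 3.04 | 4.96 | 61.3 | 4.34 |
| Johnny Cueto | 155.2 | 2.08 | 3.39 | 61.4 | 3.20 |
| Tanner Roark | 127.2 | 2.82 | 4.36 | 64.7 | 4.07 |
| Henderson Alvarez | 130.2 | 2.62 | 3.98 | 65.8 | 3.90 |
| Dallas Keuchel | 127.1 | 3.11 | 4.68 | 66.5 | 4.07 |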


Scott Kazmir, Athletics: completed a mechanical resurrection prior to 2013; began throwing fewer four-seamers and more pitches down in the zone

Jake Arrieta, Cubs: began throwing a nastier slider; found better balance and delivery repetition; refined his fastball command

Danny Duffy, Royals: found an increased willingness to attack the zone and a greater ability to repeat his mechanics and harness his emotions

Corey Kluber, Indians: retooled his delivery and gained velocity; refined his sinker; began throwing a mysterious four-seamer

Garrett Richards, Angels: experienced a velocity boost; shifted toward the first-base side of the rubber; replaced fastballs with sliders

Chris Young, Mariners: traded fastballs for sliders

Johnny Cueto, Reds: changed locations and pitch types on two-strike counts to avoid empty pitches

Tanner Roark, Nationals: improved his mechanics, two-seamer command and movement, and mental fortitude

Henderson Alvarez, Marlins: improved his changeup and started throwing it much more often

Dallas Keuchel, Astros: started throwing more sinkers in the strike zone; developed a new breaking ball

Success! We’ve tied a neat bow on each breakout or bounce-back. On a case-by-case basis, each adjustment sounds convincing. Martin isn’t hurt anymore. Martinez overhauled his whole swing, modeling his new look on one of baseball’s best hitters. Richards throws harder. These are real-sounding reasons for newfound success that seem to promise more of the same.

And yet if the past several seasons are any guide, we know that the hitters and pitchers above are about to go back to being exactly what we’d expect them to be if we’d seen their stats but knew nothing of their shortened swings and streamlined motions on the mound. In a two-part series published earlier this year, Mitchel Lichtman, coauthor of The Book and a former sabermetric consultant, studied the rest-of-season performance of the hitters and pitchers who had most defied their preseason projections through various points in the 2007-13 seasons. These are the players whose data from previous seasons fans are most tempted to disregard or downplay; after all, those stats were compiled by the old version of the player, not the new-and-improved model with the fancy new swing. Yet Lichtman found that even after five months of the regular season, the players in the “hot” and “cold” groups performed much like the projection systems (which slightly adjusted their expectations up or down based on in-season stats, but continued to place a heavy weight on past performance) said they would over the final month of the year. Aside from their slight impact on the updated projections, current-season stats offered little to no predictive value, even when Lichtman accounted for pitcher velocity changes, which PECOTA’s projections ignore.

Take the collective stats of the two 10-player groups in the tables above:

Hitters

| PA | Preseason TAv Projection | Actual 2014 TAv | Rest-of-Season TAv Projection |
| --- | --- | --- | --- |
| 3,232 | .258 | .321 | .266 |

Pitchers

| IP | Preseason ERA Projection | Actual 2014 ERA | Rest-of-Season ERA Projection |
| --- | --- | --- | --- |
| 1,271.0 | 4.34 | 2.61 | 4.03 |

In both buckets, the players’ actual 2014 performances have significantly exceeded their preseason projections, yet their rest-of-season projections have improved only slightly. It’s tempting to dismiss that eight-point projected-TAv shift and 0.31 projected-ERA reduction as a stats-only system underselling soft factors (hello, PECOTA, Seth Smith can see now), but Lichtman’s research suggests that in the aggregate, the hard factors alone do a damn good job.
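One way to put a number on “only slightly” is to ask how much of the gap between the preseason projection and the actual 2014 line shows up in the rest-of-season projection. The sketch below does that with the aggregate figures above. The “share of gap” framing and the function name are assumptions made for illustration; this is not a description of how PECOTA weights in-season data.

```python
def share_of_gap_closed(preseason, actual, rest_of_season):
    """Fraction of the preseason-vs-actual gap reflected in the updated projection."""
    return (rest_of_season - preseason) / (actual - preseason)


# Aggregate figures for the two 10-player groups
hitters_share = share_of_gap_closed(preseason=0.258, actual=0.321, rest_of_season=0.266)
pitchers_share = share_of_gap_closed(preseason=4.34, actual=2.61, rest_of_season=4.03)

print(f"Hitters:  {hitters_share:.1%} of the gap")
print(f"Pitchers: {pitchers_share:.1%} of the gap")
```

By this measure the updated projections have moved about 13 percent of the way toward the hitters’ actual performance and about 18 percent of the way toward the pitchers’.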

On June 18, Sam and I decided to put Lichtman’s findings to an unscientific test by pitting ourselves against PECOTA’s rest-of-season projections for a group of 40 hitters and pitchers, half of whom had significantly underperformed their projections and half of whom had exceeded them. So far, Sam has hit on 15 out of 40, a success rate of 37.5 percent. I’ve picked correctly on 19 out of 40, or 47.5 percent — almost a coin flip, but not quite. And Sam and I are supposed to know things about baseball! The sample is much too small to determine our true talents as projection-system beaters, but the early returns aren’t encouraging. That doesn’t mean no one can distinguish between fleeting or immaterial changes and meaningful ones that are likely to last — scouts are paid to do that, while I’m paid to write about how hard it is to do that — but it’s fair to wonder how many amateur talent evaluators can convincingly claim to be able to beat the projections, and what the burden of proof should be.
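To see why 40 picks say so little, one can compare those records with what a pure coin flipper would manage. The sketch below is a back-of-the-envelope check that assumes each pick is an independent 50-50 call, which is a simplification rather than a claim about how the contest was scored; the helper name is hypothetical.

```python
from math import comb


def prob_at_most(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer correct picks."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_players = 40
picks = {"Sam": 15, "Ben": 19}  # correct picks so far, per the paragraph above

for name, correct in picks.items():
    rate = correct / n_players
    tail = prob_at_most(correct, n_players)
    print(f"{name}: {rate:.1%} correct; a coin flipper does this badly or worse "
          f"{tail:.1%} of the time")
```

If the tail probabilities are not tiny, the records are the kind a coin flipper could plausibly post, which fits the point that the sample can’t yet separate skill from luck.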

This leaves us in a pickle that even Josh Harrison would have a hard time escaping. The players who’ve had the most surprising seasons are the ones we most want to write about, read about, and attempt to decipher, but they’re also the ones who are most likely to deviate from whatever they’ve done recently. The problem is that players are constantly tinkering. Some of their adjustments are merely cosmetic. Others are significant but not sustainable, difficult for the player to maintain or thwarted by opposing players’ counteradjustments; improved mechanics today might be a mess tomorrow, and a new pitch might stop working so well once the surprise wears off. If a player produces, we’re more likely to hear about (and buy into) whatever mechanical change he made, which means more potential for post hoc analysis. As Sam put it on the podcast, we have to wonder “whether we can trust ourselves to not dramatically over-find changes.”

Maybe early-season Santana’s swing was off. Maybe his position switch really was messing with his mind. (He hasn’t played third since May 22, although that has more to do with defense and Lonnie Chisenhall’s bat than with Santana’s psyche.) Or maybe all he had to do was wait for better bounces. If you can tell which explanation best suits Santana, and if you can consistently do the same with other slumping or streaking players, you might have a front-office future.


Ben Lindbergh is a staff writer at Grantland.
