BOOKS AND ARTS SEPTEMBER 1, 2003
Moneyball: The Art of Winning an Unfair Game
By Michael Lewis (W. W. Norton, 288 pp., $24.95)
Michael Lewis’s new book is a sensation. It treats a topic that would seem to interest only sports fans: how Billy Beane, the charismatic general manager of the Oakland Athletics, turned his baseball team around using, of all things, statistics. What next—an inspirational tale about superior database management? But there are some broader lessons in Lewis’s book that make it worth the attention also of people who do not know the difference between a slider and a screwball. Those lessons have to do, above all, with the limits of human rationality and the efficiency of labor markets. If Lewis is right about the blunders and the confusions of those who run baseball teams, then his tale has a lot to tell us about blunders and confusions in many other domains.
Lewis focuses on the extraordinary success of Beane, who has produced a terrific baseball team despite one of the lower payrolls in baseball. Since 1999, when Beane took over, the Athletics have compiled an amazing record. Consider a few numbers. In 1999, the Athletics ranked eleventh (out of fourteen teams) in the American League in payroll and fifth in wins. In 2000, the Athletics ranked twelfth in payroll and second in wins, a feat that they duplicated in 2001. In 2002, they ranked twelfth in payroll again—and first in wins.
How did Beane pull this off? He did it largely by ignoring or defying baseball’s conventional wisdom, otherwise known in baseball lingo as The Book. (As in, “The Book says that you should bunt in this situation.”) It turns out that many chapters of The Book are simply wrong. Sacrifice bunts are rarely a good strategy, and steals are vastly overrated. (Unless a base stealer succeeds at least three-quarters of the time, his running efforts reduce runs scored rather than increase them.) The portion of The Book that was most in need of revision, and the most important edge that Beane was able to exploit, was in player evaluation. Here he tried to figure out, scientifically, how much a player was likely to contribute to his team’s chances. He relied on objective evidence, explicitly ignoring anything that could be dismissed as “subjective.”
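The three-quarters threshold for base stealers comes out of a simple break-even calculation. The sketch below is illustrative only: the two run values (what a successful steal adds to a team's expected runs, and what a caught stealing costs) are hypothetical numbers chosen for the example, not figures taken from Lewis's book, and the function name is our own.

```python
def break_even_steal_rate(gain_if_safe: float, loss_if_caught: float) -> float:
    """Success rate at which a steal attempt has zero expected value.

    Expected change in runs per attempt is
        p * gain_if_safe - (1 - p) * loss_if_caught.
    Setting that to zero and solving for p gives
        p = loss_if_caught / (gain_if_safe + loss_if_caught).
    """
    if gain_if_safe <= 0 or loss_if_caught <= 0:
        raise ValueError("both run values must be positive")
    return loss_if_caught / (gain_if_safe + loss_if_caught)


# Hypothetical run values, assumed purely for illustration:
# a successful steal adds 0.2 expected runs, a failed one costs 0.6.
GAIN_IF_SAFE = 0.2
LOSS_IF_CAUGHT = 0.6

rate = break_even_steal_rate(GAIN_IF_SAFE, LOSS_IF_CAUGHT)
print(f"break-even success rate: {rate:.2f}")
```

Because an out costs more than an extra base gains, the break-even point sits well above one-half. With the assumed values, a runner must succeed three times out of four just to avoid hurting his team.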
Beane found that, as a statistical regularity, players drafted out of high school are much less likely to succeed than players drafted out of college. And so he drafted no high school players, regardless of how highly they were touted. He hired a young assistant named Paul DePodesta, a Harvard economics graduate, who relied on his computer to project players’ performances, without so much as ever seeing a player swing a bat. Much of the tension, and the comedy, of Lewis’s book comes from the conflict between Beane’s and DePodesta’s statistical methods of evaluation and the well-established strategies of experts who have scouted, played, and breathed baseball for decades. The verdict? Statistical methods outperform experts. It’s not even close.
As Lewis tells the tale, Beane’s particular approach has intensely personal foundations. Beane himself was a top high school prospect, one of the most sought-after in the nation. He was fast, he was tall, he was strong, he could hit the ball a mile. The baseball scouts loved him. As one of them admitted, “I never looked at a single statistic of Billy’s. It couldn’t have crossed my mind.... He had it all.” According to another, “The boy had a body you could dream on. Ramrod-straight and lean but not so lean you couldn’t imagine him filling out. And that face!” Beane was selected in the first round of the draft, with the highest of expectations. He was destined to be a star.
There was just one problem: Beane did not play baseball very well. He thought too much. He was too emotional. His failures notwithstanding, baseball people saw his body, and his face, and his raw talent, and concluded that he was bound to succeed. “Teammates would look at Billy and see the future of the New York Mets. Scouts would look at him and see what they had always seen.... The body. The Good Face.” He certainly had talent, and once in a while he would do something truly sensational. But after several years in major league baseball, his performance was woefully bad. With only 301 at-bats, he hit .219; more embarrassingly, he had 80 strikeouts and only 11 walks. Abruptly, he quit the game. While playing for Oakland, he told the team’s general manager that he no longer wanted to be a player, and would prefer the job of advance scout, an employee who travels ahead of the team to analyze future opponents. The team’s general manager was stunned: “Nobody does that. Nobody says, I quit as a player. I want to be an advance scout.”
Beane was a much better baseball analyst than baseball player, and he quickly moved up the Oakland club’s hierarchy. He became interested in a simple question: what is the most efficient way to spend money on baseball players? The origins of Beane’s iconoclastic answers can be found in the writings of Bill James, a once obscure but now legendary baseball writer-statistician. While working as a night watchman for a pork-and-beans factory, James decided that he wanted to write about baseball in a way that would illuminate what really happened and why. In his view, conventional statistics were insufficiently helpful and sometimes downright misleading. Consider the area of defensive play. When a player mishandles a ball or makes a bad throw, he can be assigned an “error.” A player who accumulates a lot of errors seems like a bad fielder, whereas one with few errors seems really good. The problem is that a player may accumulate errors in part because he is unusually good at getting to the ball. If you do not get to the ball, you do not get an error (according to the chapter on scoring in The Book). So errors are a crude measure of fielding ability.
Or consider walks. Since the late nineteenth century, walks have been treated, in official statistics, as neutral—neither good nor bad. According to a nineteenth-century expert whose advice is followed to the present day, “There is but one true criterion of skill at the bat, and that is the number of times bases are made on clean hits.” Of course, many people realized that a walk is a positive event for the hitting team and a negative event for the team in the field, but this commonsense notion was not incorporated into baseball’s most common measure of batting skill, the batting average, which leaves walks out. James found this preposterous, and he pushed for the use of the “on-base percentage” as an improvement. James also criticized a standard measure of a hitter’s value, the notion of “runs batted in.” James pointed out that some players are in a position to bat in a lot of runs, because they are lucky or because they play on good teams. Other players bat in fewer runs, but only because they do not have the opportunities of their apparent superiors: “There is a huge element of luck in even having the opportunity, and what wasn’t luck was, partly, the achievement of others.”
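A short sketch can make the gap between the two measures concrete. The formulas are the standard definitions of batting average and on-base percentage. The first batting line uses the figures reported above for Beane's playing career (301 at-bats, a .219 average, 11 walks), with the hit total of 66 inferred from the average; the second hitter is hypothetical. Hit-by-pitch and sacrifice-fly counts are assumed to be zero because the review does not give them.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Hits per at-bat; walks do not appear anywhere in this number."""
    return hits / at_bats


def on_base_percentage(hits: int, walks: int, at_bats: int,
                       hit_by_pitch: int = 0, sacrifice_flies: int = 0) -> float:
    """Times on base divided by the plate appearances that count toward OBP."""
    times_on_base = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / chances


# Beane's line as reported in the review; 66 hits is inferred from .219 * 301.
beane = {"hits": 66, "walks": 11, "at_bats": 301}
# Hypothetical hitter: same hits and at-bats, but far more walks (assumed).
patient = {"hits": 66, "walks": 50, "at_bats": 301}

for name, line in (("Beane", beane), ("patient hitter", patient)):
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(line["hits"], line["walks"], line["at_bats"])
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

The two hitters have identical batting averages, so the traditional statistic cannot tell them apart. Only on-base percentage registers that the second one reaches base far more often.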
Eventually James punctured countless myths about what was important to winning in baseball. And he had a positive agenda, too. He devised a formula to measure “runs created”—a formula that predicted, from just a few aspects of a player’s performance, how many runs he would produce for an average team. James’s formula had explosive implications. It suggested that professional baseball experts, those who ran the teams, were placing far too much emphasis on batting averages and stolen bases, and far too little on walks and extra-base hits. After a slow start, James was widely read; his books became best-sellers, and he became a kind of cult figure among certain baseball fans. But baseball’s experts and executives treated James’s work as irrelevant. He had no effect on what they did. And with a few exceptions, the tried-but-not-so-true baseball statistics such as batting average and RBIs remain the only ones reported.
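The review does not spell out the formula, but the basic version of James's runs created is usually written as (hits + walks) × total bases ÷ (at-bats + walks). The sketch below applies that basic version, which is only the simplest of the variants James published, to two hypothetical hitters whose season lines are invented for illustration.

```python
def runs_created(hits: int, walks: int, total_bases: int, at_bats: int) -> float:
    """Basic runs created: (H + BB) * TB / (AB + BB).

    The first factor measures getting on base, the second measures
    advancing runners, and the denominator scales by opportunities.
    """
    return (hits + walks) * total_bases / (at_bats + walks)


# Hypothetical season lines, invented for illustration only.
# A high-average singles hitter who rarely walks (.300 average).
contact_hitter = runs_created(hits=180, walks=20, total_bases=220, at_bats=600)
# A lower-average hitter with many walks and extra-base hits (.255 average).
patient_slugger = runs_created(hits=140, walks=90, total_bases=260, at_bats=550)

print(f"contact hitter:  {contact_hitter:.1f} runs created")
print(f"patient slugger: {patient_slugger:.1f} runs created")
```

On these assumed numbers the hitter with the lower batting average creates more runs, which is the kind of reversal that made the formula so uncomfortable for traditional evaluators.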
So Billy Beane, the “can’t miss” prospect who missed, became an avid Bill James reader. As Lewis writes, “James had something to say specifically to Billy: you were on the receiving end of a false idea of what makes a successful baseball player. James also had something general to say to Billy, or any other general manager of a baseball team who had the guts, or the need, to listen: if you challenge the conventional wisdom, you will find ways to do things much better than they are currently done.” Wanting to ensure that statistical analyses were done, and done right, Beane hired DePodesta to study player performances with the aid of a computer. Some of Lewis’s most hilarious passages illustrate the debate between old baseball wisdom and statistical knowledge:
“The guy’s an athlete, Billy,” the old scout says. “There’s a lot of upside there.”
“He can’t hit,” says Billy.
“He’s not that bad a hitter,” says the old scout.
“Yeah, what happens when he doesn’t know a fastball is coming?” says Billy.
“He’s a tools guy,” says the old scout....
“But can he hit?” asks Billy.
“He can hit,” says the old scout, unconvincingly.
Paul reads the player’s college batting statistics. They contain a conspicuous lack of extra base hits and walks. “My only question is,” says Billy, “if he’s that good a hitter why doesn’t he hit better?”....
Over and over the old scouts will say, “The guy has a great body,” or, “This guy may be the best body in the draft.” And every time they do, Billy will say, “We’re not selling jeans here,” and deposit yet another highly touted player, beloved by the scouts, onto his shit list.
Beane ends up seeking, and getting, young players that other teams simply do not want. These were people largely ignored by the professional scouts, typically because they had something wrong with them—they did not match up with the scouts’ mental prototype of a successful ballplayer. While scouts on other teams were still searching for young players who looked like Beane did in high school, DePodesta was busy surfing the Internet. “The evaluation of young baseball players,” Lewis writes, “had been taken out of the hands of old baseball men and placed in the hands of people who had what Billy valued most (and what Billy didn’t have), a degree in something other than baseball.”
The statistical method was the only way for Beane to solve a serious problem: obtaining first-rate talent without a lot of money. After all, the New York Yankees had three times the budget of the Oakland Athletics. And if Beane did find good players, and they performed well, they would be bid away by richer teams. Owing to his low payroll, he would be forced to replace his own greatest successes. In 2001, Oakland won 102 games in the regular season, the second-highest total in baseball. They lost three players widely regarded as their best, and they were expected by many to have a catastrophic fall. Instead they used statistical methods to try to replace the lost players with new ones who would provide statistical equivalents—and they ended up winning 103 games, the most in baseball. Their payroll for that year was $34 million, less than half that of their division rivals the Seattle Mariners. In Lewis’s account, Beane was able to succeed because “the market for baseball players was so inefficient, and the general grasp of sound baseball strategy so weak, that superior management could still run circles around taller piles of cash.”
Lewis has a wonderful story to tell, and he tells it wonderfully. His account of Beane’s success is punctuated by descriptions of numerous colorful characters, among them a promising fat catcher dumbfounded by Beane’s interest in him, an excellent pitcher whose fastball is extremely slow, and of course Beane himself. Lewis also raises some serious puzzles that he does not resolve, and his account has some large and perhaps profound implications that he does not much explore.
Why do professional baseball executives, many of whom have spent their lives in the game, make so many colossal mistakes? They are paid well, and they are specialists. They have every incentive to evaluate talent correctly. So why do they blunder? In an intriguing passage, Lewis offers three clues. First, those who played the game seem to overgeneralize from personal experience: “People always thought their own experience was typical when it wasn’t.” Second, the professionals were unduly affected by how a player had performed most recently, even though recent performance is not always a good guide. Third, people were biased by what they saw, or thought they saw, with their own eyes. This is a real problem, because the human mind plays tricks, and because there is “a lot you couldn’t see when you watched a baseball game.”
Lewis is actually speaking here of a central finding in cognitive psychology. In making judgments, people tend to use the “availability heuristic.” As Daniel Kahneman and Amos Tversky have shown, people often assess the probability of an event by asking whether relevant examples are cognitively “available.” Thus, people are likely to think that more words, on a random page, end with the letters “ing” than have “n” as their next-to-last letter—even though a moment’s reflection will show that this could not possibly be the case, since every word that ends in “ing” also has “n” as its next-to-last letter. Now, it is not exactly dumb to use the availability heuristic. Sometimes it is the best guide that we possess. Yet reliable statistical evidence will outperform the availability heuristic every time. In using data rather than professional intuitions, Beane confirmed this point.
There is an even larger puzzle. Why didn’t someone like Beane come along sooner? Why didn’t baseball executives start using statistics a decade, or two decades, or three decades, earlier? Why have falsehoods and mistakes persisted? The economic stakes are extremely high, after all, and if Lewis is correct, the management of most baseball teams could have saved many millions of dollars simply by making more rational personnel decisions. Nor was the important information hard to find. James’s arguments have been around for nearly two decades. In a market as competitive as major league baseball, surely the information should have been used, and fast. What went wrong?
The problem is not that baseball professionals are stupid; it is that they are human. Like most people, including experts, they tend to rely on simple rules of thumb, on traditions, on habits, on what other experts seem to believe. Even when the stakes are high, rational behavior does not always emerge. It takes time and effort to switch from simple intuitions to careful assessments of evidence. This point helps to explain why baseball owners have been slow to copy Beane’s approach. But at least they are starting. The Toronto Blue Jays and the Boston Red Sox have recently hired general managers who follow Beane. And, in one of the longest-overdue moves in baseball, the new general manager of the Red Sox, just twenty-eight years old, has hired James as a consultant.
There are many lessons from this book that apply in domains far from baseball. One involves the harmful repercussions of using bad statistics. Consider the case of the “save” statistic. A save is awarded to a relief pitcher who comes in near the end of a close game with his team ahead and “saves” the win for his team. Most thoughtful observers realized long ago that this is a really dumb statistic. Why should pitching the last inning of a game have any special significance? A pitcher who comes in for the sixth inning of a tied game and pitches three scoreless innings has done something much more important than one who just pitches the ninth inning, protecting a three-run lead. Now, a dumb statistic could be harmless, but in this case, as is often true, the very fact that the number is collected and tabulated ends up influencing behavior. The existence of the save statistic (and the muddled thinking that goes along with it, namely the idea that runs scored at the end of the game count more) seems to have altered the way teams use their relief pitchers. Statistics can do a lot better than intuitions; and relevant statistics can do a lot better than irrelevant ones, which tend to take on lives of their own.
In the past twenty years, most teams have settled on using their best relief pitcher in a specific role that has acquired a name: the closer. The closer comes in just to pitch the ninth inning in close games with his team ahead. Though this strategy is nearly universally used, it is clearly stupid. A team that is ahead by three runs going into the ninth inning has a 97 percent chance of winning the game (a percentage that has not changed since the advent of the closer). As James has argued for years, it makes more sense to use your best relief pitcher in more crucial situations, such as a tied game. Saving a three-run lead is much easier than protecting a tie, since you can give up two runs and still win the game.
What should a team do if it figures this out? One strategy would be to take your best reliever and use him more strategically, sometimes in the seventh inning of a close game with the best players of the other team due to bat, other times with a one-run lead in the ninth. This strategy would win more games, but it would not create many “saves” for the ace. Beane did this, but he also did something deviously clever: he created closers in order to sell them. As Lewis puts it: “Established closers were systematically overpriced, in large part because of the statistic by which closers were judged in the marketplace: ‘saves.’ The very word made the guy who achieved them seem vitally important.... You could take a slightly above average pitcher and drop him into the closer’s role, let him accumulate some gaudy number of saves, and then sell him off. You could, in essence, buy a stock, pump it up with false publicity, and sell it off for much more than you paid for it.”
Beane’s use of relief pitchers, and his performance in general, bears on a more general problem. To what extent are the top managers in an organization—here the owners and the general managers—able to push a rational but radical change down through an organization? Beane has an owner who is sympathetic to his philosophy, but if he wants to try something new, such as using the relief ace flexibly, he has to convince the field manager to implement his strategy. He also has to avoid a rebellion by the players. To get his manager to use the player Beane thinks is his most effective pitcher in tight situations, Beane tells the manager to think of him as “the closer before the ninth inning.” A relief ace would likely complain about being used in the optimal manner, because he would accumulate fewer saves and thus would be worth less on the open market. Similarly, suppose a player takes more pitches in an attempt to draw more walks and as a result increases his on-base percentage at the cost of lowering his batting average. His team might like this trade-off, but if it lowered his value to other teams, then the player might suffer in the free-agent market.
Finally, there is the impact of the media and the fans. When James was hired by the Red Sox last winter, there was great anticipation about how the team would deal with relief pitchers. The rational strategy of using pitchers to maximize the chance of winning the game was quickly dubbed “bullpen by committee” by Boston sportswriters, who knew (from The Book, naturally) that this was a terrible idea. When Red Sox relievers lost the opening-day game to the woeful Tampa Bay Devil Rays and suffered through an awful opening month, James was viewed as the villain. Of course, James does not advocate bad pitching, and, presumably with his help, the team has acquired three new relief pitchers. But, interestingly, they also seem to have designated one as their closer, perhaps deciding to let this particular battle wait for another day.
The difficulty of achieving sensible change in organizations is hardly special to baseball. If a new CEO comes in and wants to change how things are done, a selling job must be performed all the way down the organizational ladder. Every institution has organizational norms, ways of doing things (the in-house version of The Book) that are hard to overcome. The new guy is told, well, we don’t do things that way here. If the CEO forces his views down the organizational ladder and his methods are unconventional, then lower-level workers may face the same dilemma as the player trying to get more walks. Do they maximize their value to the current CEO or to the outside world?
Lewis’s account bears more generally on the performance of markets. He often refers to the inefficiency of the market for player talent, as the example of the closer illustrates. If it were not for this inefficiency, Beane would not be able to procure a winning team while spending one-third as much as the Yankees and one-half as much as his division rivals. How can this market be so inefficient? Simply put, the baseball owners seem to have evolved into a “bad equilibrium.” Teams have thought the same way for years, The Book has never been revised, and so there are massive inefficiencies that have been relatively easy for Beane to exploit. You do not need a Harvard economics graduate to realize that “a walk is as good as a hit,” an expression that was around well before James started writing. Beane’s key insight was exactly that this market was inefficient and could therefore be exploited. Now that Beane has succeeded, it is likely that the market for baseball-player talent will get more competitive. Still, it is embarrassing that it took so long for this to happen, especially for those who think that competitive markets always lead to a rational allocation of resources.
What does this tell us about other markets? Lewis poses this question: “If professional baseball players could be over- or undervalued, who couldn’t? Bad as they may have been, the statistics used to evaluate baseball players were probably more accurate than anything used to measure the value of people who didn’t play baseball for a living.” Right! On the basis of first principles, the market for baseball players should be one of the most efficient labor markets on earth. It is hard to think of any high-paid profession in which performance is measured so precisely—and is publicly available to every other potential employer. Compare the market for baseball players with the market for corporate executives. A company looking for a new director of human resource management would be hard-pressed to get any objective data on the past performance of job candidates. Instead, such a company would be forced to make choices based on interviews with the candidates—a process that is even less accurate than the one the old scouts use to size up a high school player. Interviews are notoriously bad predictors of future job performance. In most contexts their predictive value is essentially zero.
The biases caused by labor markets using subjective evaluations instead of objective measures of output are potentially huge. We can glimpse the scope of the problem by studying one subjective variable that most people would say should be irrelevant to accurate job evaluations in most positions: physical beauty. Away from places such as the back lots of Hollywood and the runways of Milan, most of us who do not look like Ben Affleck or Jennifer Lopez would like to think that successful people get where they do because of their accomplishments, not their attractiveness. Alas, it ain’t so. To take one simple example, height matters. In the United States, the taller presidential candidate has usually won. And beauty matters in domains where we might not expect it to matter: law and business. In both fields, and for both sexes, career success and earnings are correlated with good looks.
And consider another problem. We have said that interviews are not useful predictors of job performance—except that once you have conducted an interview, it is almost impossible to avoid the conclusion that you have learned a lot. But the facts are clear: interviews are nearly useless at predicting anything except whether the interviewer will subsequently like the interviewee. So a rational Beane-like strategy would be to interview only employees who will work directly with you and otherwise make all decisions based on objective criteria and statistical models. Rather than conduct interviews, firms should give tests and pay more attention to grades in school (a good predictor of both intelligence and diligence, admirable traits in nearly any job). Procedures such as these also prevent discrimination, inadvertent or otherwise, based on factors such as race, gender, or beauty. Major symphony orchestras have found that if they conduct auditions with the candidates hidden behind a screen, more women are hired. And when employees are hired, firms should seek objective measures of performance, which distinguish luck from skill.
Some enlightened firms are already taking steps along these lines. Many more should be doing so. We suspect that countless areas of enterprise, both private and governmental, would benefit from their own Billy Beanes and Paul DePodestas, challenging widespread intuitions, or what “everyone knows,” with statistical information about what works and what does not, and with performance measures that more accurately reflect the true contribution to organizational success. Baseball is not the only realm for which The Book is in need of revision.
Richard H. Thaler teaches at the University of Chicago Graduate School of Business and is the author of The Winner’s Curse (Princeton University Press). Cass R. Sunstein teaches at the University of Chicago Law School. His new book, Why Societies Need Dissent, will be published by Harvard University Press this fall. This article appeared in the September 1, 2003, issue of the magazine.