Size Matters, a Faculty Essay, by Stephen T. Ziliak

Stephen T. Ziliak has been an associate professor of economics at Roosevelt University since 2003. He is the author of award-winning

"Statistical significance" is, you once learned, crucial for getting a scientific result. Now it’s time to unlearn it. Because it’s not.

Suppose you want to help your mother lose weight, and are looking at two diet pills that are nearly identical in price, side effects, and style. The only difference between them is in the amount of probable weight loss. Pill "Oomph" will on average take off 20 pounds, but it’s a little shaky, at plus or minus 14 pounds. Not bad. Alternatively, the pill "Precision" will take off only five pounds on average, but it’s much more certain in its effect: Precision brings a probable error of plus or minus 0.5 pound. Sweet!

Scientists say the "signal-to-noise ratio" of diet pill Oomph is
1.43-to-1, because the predicted effect of 20 pounds divided by the probable error of 14 pounds is 1.43. It’s error-ridden. But the ratio for pill Precision is higher, 10-to-1. Error-ridden, yes, but much more precise, you see.

Which pill for Mom? "Well," say our scientific colleagues, "the one with the highest ‘signal-to-noise ratio’ is Precision. So Precision, right?"

Wrong. Yet a distressingly large number of scientists in fields from agronomy to zoology choose Precision over Oomph. They decide whether

The phrase for this singular pursuit of precision is "statistical significance." Interestingly, it’s almost never pursued by atomic physicists, say, or by cell biologists. Wildlife biologists are a lot more confused. Economists still more so. The very worst are medical scientists and epidemiologists: they take precision over Oomph, then equate them, nearly every time, as if inference were possible relative to no currency. Soon-to-be-dead sperm and minke whales of Antarctica, and the makers and users of Vioxx®, are only the most recent victims of this strange ritual.

In "The Standard Error of Regressions" (1996) I showed
with Deirdre McCloskey how significance testing was used during the 1980s in the leading journal of mainstream economics, the American Economic Review. Of 182 papers published in the Review, 70 percent did not distinguish statistical significance from policy or substantive significance, that is, from what we call "economic significance." And fully 96 percent misused a statistical test in some (shall I say) significant way or another. Of the 70 percent that flatly mistook statistical significance for economic significance,

Proof that this mistaken use of chance is causing a loss of jobs and justice can be found in a September 1987 study of the State of Illinois unemployment insurance system. The authors estimate benefit-cost ratios for the State of Illinois from a pilot experiment. The intent of the experiment was to find a cash bonus that would reduce the duration of insurance claims. One group of unemployed workers was given a cash bonus for getting a job quickly and keeping the job for several months. In the other group the employers were given the bonus if claimants got a job with them and kept it for several months.

Like Mom and other users of Oomph, taxpayers want to know the economic significance of the experiment. But the authors focused instead on the statistical significance. They argued that the benefit-cost ratio of the employer-based subsidy, which was $4.29 of unemployment benefit for each $1 of bonus paid out to employers, was "not statistically different from zero." Strange. Illinois was ahead of the game by $3.29 for every dollar spent, and the workers furthermore got employed. The authors didn’t buy it. They said the noise was higher than one normally reports in the scientific journals, so they ignored the benefit-cost ratio. Bad idea. The signal was ample, actually (they could reject the hypothesis of zero effect with 88 percent assurance), but anyway the "zero" talk is the crazy part. The authors found and then ignored an efficient government program, a diet pill that’s easy to swallow.

Some of my economist colleagues said, "Fine. But
you bring old news. After the 1980s, best practice improved." So in a new paper, "Size Matters," I applied the same analysis to all the papers of the next decade, the 1990s. Unhappily, statistical practice is not getting better. It’s getting worse. Of 137 papers in the 1990s, 82 percent mistook a statistically significant finding for an economically significant finding. Precision conquered Oomph. Market forecast? A dismal science as barbaric as medical science. Little wonder that students have dubbed

How do scientists manage to get something so simple so
wrong? I’m not too sure, though I have hunches, and have said so field by field, back to 1885, in a forthcoming book, Size Matters: How Some Sciences Lost Interest in Magnitude, and What to Do About It (The MIT Press, 2006).

The temptation is certainly there. Think of the O-rings of the space shuttle Challenger, and the "scientific" cover-up. Sir Francis Galton said if "the Greeks" had known about the bell curve they would have "personified" and "deified" it. Apparently we’re all idol-worshippers now. One can only hope that scientists will abandon their little deity and embrace again the real prime mover of science: size matters. "No size," we should say, as noisily as possible, "no significance." Or, Precision is nice but Oomph is the bomb.
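
For readers who want to check the diet-pill arithmetic from the opening example, here is a minimal Python sketch. The numbers are the ones given in the essay; the function name `signal_to_noise`, the `pills` dictionary, and the idea of printing a plus-or-minus range are illustrative assumptions made for this example, not part of any study cited here.

```python
# Illustrative sketch only: the names and structure are assumptions for this example.

def signal_to_noise(effect, probable_error):
    """Return the estimated effect divided by its probable error."""
    return effect / probable_error

# Figures from the diet-pill example: (average pounds lost, plus-or-minus pounds).
pills = {
    "Oomph": (20.0, 14.0),
    "Precision": (5.0, 0.5),
}

for name, (effect, error) in pills.items():
    ratio = signal_to_noise(effect, error)
    low, high = effect - error, effect + error
    print(f"{name}: ratio {ratio:.2f}-to-1, range {low:g} to {high:g} pounds")

# Ranking by the ratio rewards precision; ranking by effect size rewards oomph.
most_precise = max(pills, key=lambda n: signal_to_noise(*pills[n]))
most_oomph = max(pills, key=lambda n: pills[n][0])
print("Highest ratio:", most_precise)
print("Largest effect:", most_oomph)
```

Read this way, the plus-or-minus range for Oomph runs from 6 to 34 pounds and the range for Precision from 4.5 to 5.5 pounds, so choosing by the ratio alone picks the pill with the smaller effect.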
© 2006, Roosevelt University, All Rights Reserved