This August, Science magazine published a scandalous article. The subject was the practice of behavioral psychology.
Over 270 researchers, working as the Reproducibility Project, had gathered 100 studies from three of the most prestigious journals in the field of social psychology. Then they set about redoing the experiments to see whether they could get the same results, mostly using the materials and methods the original researchers had used. Direct replications are seldom attempted in the social sciences, even though the ability to repeat an experiment and get the same findings is supposed to be a cornerstone of scientific knowledge. It’s the way to separate real information from flukes and anomalies.
These 100 studies had cleared the highest hurdles that social science puts up. They had been edited, revised, reviewed by panels of peers, revised again, published, widely read, and taken by other social scientists as the starting point for further experiments. Except . . .
The researchers “found something very disappointing. Nearly two-thirds of the experiments did not replicate, meaning that scientists repeated these studies but could not obtain the results that were found by the original research team.”
“Disappointing” was a word commonly heard that morning and over the following several days, as the full impact of the project’s findings began to register in the world of social science. Describing the Reproducibility Project’s report, other social psychologists, bloggers, and science writers tried out “alarming,” “shocking,” “devastating,” and “depressing.”
But in the end most of them rallied. They settled for just “surprised.” Everybody was surprised that two out of three experiments in behavioral psychology have a fair chance of being worthless.
The most surprising thing about the Reproducibility Project, however—the most alarming, shocking, devastating, and depressing thing—is that anybody at all was surprised. The warning bells about the feebleness of behavioral science have been clanging for many years.
For one thing, the “reproducibility crisis” is not unique to the social sciences, and it should be no surprise that it has touched social psychology too. The widespread failure to replicate findings has afflicted physics, chemistry, geology, and other real sciences. Ten years ago a Stanford researcher named John Ioannidis published a paper called “Why Most Published Research Findings Are False.”
“For most study designs and settings,” Ioannidis wrote, “it is more likely for a research claim to be false than true.” He used medical research as an example, and since then most systematic efforts at replication in his field have borne him out. His main criticism involved the misuse of statistics: He pointed out that almost any pile of data, if sifted carefully, could be manipulated to show a result that is “statistically significant.”
Statistical significance is the holy grail of social science research, the sign that an effect in an experiment is real and not an accident. It has its uses. It is indispensable in opinion polling, where a randomly selected sample of people can be statistically enhanced and then assumed to represent a much larger population.
But the participants in behavioral science experiments are almost never randomly selected, and the samples are often quite small. Even the wizardry of statistical significance cannot show them to be representative of any people other than themselves.
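Ioannidis’s point about sifting data can be made concrete. The following is an illustrative sketch (not from the article, and not a reconstruction of any actual study): it simulates hypothetical “studies” in which two small groups are drawn from the same population, so there is no real effect to find, yet each study compares the groups on 20 unrelated outcome measures. The measure counts, sample sizes, and thresholds are assumptions chosen only to show the arithmetic.

```python
# Illustrative simulation: how sifting a pile of null data for
# "statistically significant" results manufactures false positives.
# All parameters below are hypothetical, for demonstration only.
import random
import statistics

random.seed(42)

N_MEASURES = 20    # outcome measures sifted per study (assumed)
N_PER_GROUP = 30   # small samples, as in many psych-lab experiments

def fake_study() -> bool:
    """One 'study': two groups drawn from the SAME distribution
    (no true effect), compared on N_MEASURES unrelated measures.
    Returns True if any comparison looks 'significant'."""
    for _ in range(N_MEASURES):
        a = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        b = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
        # Welch-style t statistic; |t| > 2.0 roughly approximates
        # p < .05 at these sample sizes.
        se = (statistics.variance(a) / N_PER_GROUP
              + statistics.variance(b) / N_PER_GROUP) ** 0.5
        t = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(t) > 2.0:
            return True  # the sifting turned up a publishable "effect"
    return False

n_studies = 1000
hits = sum(fake_study() for _ in range(n_studies))
rate = hits / n_studies
print(f"Null studies yielding a 'significant' finding: {rate:.0%}")
```

With 20 independent tests each run at a 5 percent threshold, theory predicts that about 1 − 0.95²⁰, or roughly 64 percent, of such effect-free studies will still produce at least one "significant" result, which is the heart of Ioannidis's complaint.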
Ferguson notes that the defenses offered by social scientists inadvertently underscore the inherent dubiousness of the enterprise:
Defenders cited a host of biases to which the original researchers might have succumbed, especially publication bias and selective data bias. And besides, the defenders pointed out, a failed replication doesn’t tell us much: The original study might be wrong, the replication might be wrong, they might both be wrong or both be right. Small changes in methodology might influence the results; so might the pool of people the samples are drawn from, depending on age, nation of origin, education level, and a long list of other factors. The original study might be reproducible in certain environments and not in others. The skill of the individual researcher might enter in as well.
All true! Rarely do social scientists concede so much about the limitations of their trade; the humility is as welcome as it is unexpected. But these are not so much defenses of social psychology as explanations for why it isn’t really science. If the point is to discover universal tendencies that help us predict how human beings will behave, then the fragility of its experimental findings renders them nearly useless. The chasm that separates the psych lab from everyday life looks unbridgeable. And the premise of behavioral science—that the rest of us are victims of unconscious forces that only social scientists can detect—looks to be not merely absurd but pernicious.
For even as it endows social scientists with bogus authority—making them the go-to guys for marketers, ideologues, policymakers, and anyone else who strives to manipulate the public—it dehumanizes the rest of us. The historian and humanist Jacques Barzun noticed this problem 50 years ago in his great book Science: The Glorious Entertainment. Social psychology proceeds by assuming that the objects (a revealing word) of its study lack the capacity to know and explain themselves accurately. This is the capacity that makes us uniquely human and makes self-government plausible. We should know enough to be wary of any enterprise built on its repudiation.
Those who have grown weary of reading about the Milgram experiment, the Zimbardo prison study, and the Mischel marshmallow test will find Ferguson’s article a useful reinforcement to both their native skepticism about the applicability of “psych lab” findings to real life and their nagging suspicion that such findings at best do little but cloak common wisdom—“people obey authority,” “power corrupts,” “self-control is valuable,” etc.—in the regalia of science. Those who take social science more seriously, on the other hand, may still find Ferguson’s criticisms worth pondering, if only in order to refute them—assuming, that is, that they can be refuted.