I am grateful to Catherine Turco and Ezra Zuckerman for writing such a thoughtful critique of my article, “Common Sense and Sociological Explanations” (AJS 120:313–51). There is much of value in their discussion (including, not least, pointers to some interesting sociological research), and I encourage everyone to read it. As I will explain, however, I think there is less daylight between my own position and theirs than they contend. I suspect, in fact, that much (although not all) of their objection to my argument rests on a misunderstanding of my claim that the interpretability of sociological explanations sits in tension with their scientific (i.e., causal) validity. That said, it is a misunderstanding that, on reflection, is easily made, so I welcome the opportunity to clarify my initial claim.
The misunderstanding appears to derive from a sentence (p. 315) in which I argue that “if sociologists want their explanations to be causal, they must place less emphasis on understandability (i.e., sense making) and more on their ability to make predictions.” From this sentence, Turco and Zuckerman infer that “the strong implication is that the pursuit of verstehen is a diversion—a waste of sociological time and energy” (2017, p. 1273). Actually, I didn’t intend to imply any such thing. Quite to the contrary, I later argue that neither social life nor sociology would be possible without the ability of humans to put themselves in the place of others via some process of mental simulation (pp. 326–27). I am certainly not advocating, as Turco and Zuckerman seem to infer, for some form of verstehen-free sociology; in fact, I can’t even imagine what that would look like.
On reflection, what that sentence should have said is “if sociologists want their explanations to be causal, then when evaluating them they must place less emphasis on understandability.” This phrase is just four extra words, but with it I would have more clearly articulated my actual point, which is somewhat different than the one that Turco and Zuckerman spend much of their commentary rebutting. As far as I’m concerned, sociological explanations can be generated in many ways: from data mining, from mathematical models, from the historical record, from ethnographic observations, from survey results, from everyday anecdotal experience, or simply from sitting and thinking about why people do what they do. All these modes of inquiry are useful in their own way and verstehen plays a role in all of them, as Turco and Zuckerman persuasively argue.1
Their objection, however, misconstrues my own argument, which was not about how we generate explanations at all, but rather about how we evaluate them. And it is in the evaluation phase of sociological explanation that I was arguing interpretability is problematic—again, not because it is bad per se, but rather because interpretability, valued for its own sake, diverts attention from scientific validity.
This is a subtle but important distinction, so I will say it again. My point was not that interpretable explanations are, all else equal, less likely to be valid than uninterpretable explanations. If two explanations have passed the same test of causal validity and have the same predictive power, and one is more interpretable, then I am all in favor of preferring that one.
Rather, my point was that in practice all else is not equal. Because sociologists, just like nonsociologists, value interpretability as an end in itself, interpretable explanations are in practice treated as causal explanations without having to pass the appropriate test of scientific validity. The implication is that explanations that have been selected on the grounds of their interpretative appeal are less likely to be scientifically valid than if they had been selected for validity alone.
AN ILLUSTRATIVE EXAMPLE
To illustrate, imagine that we are trying to explain some outcome of interest, say the success of the Beatles. If my colleagues at Microsoft Research were to focus on such a question, they would probably download the number of albums sold by every band ever created along with every conceivable feature of these bands (number and gender of members, instruments played, year and city of formation, and so on) from whatever archives are available online; then they would train some machine-learning model on the data and evaluate its predictive accuracy on a held-out sample. Probably, they would find something like the following: the best performing model has an R² in the range 0.3–0.4, and the most predictive feature is the presence of a single “star” member who is typically the lead singer, which of course is not the case for the Beatles. In other words, with all the data in the world, all they could say about the success of the Beatles is that they are part of the two-thirds of unexplained variance. Technically there’s nothing wrong with this approach: it is data driven, rigorous, quantitative, and evaluated in terms of the predictions it can make. Unfortunately it’s also deeply unsatisfying: not only does it not “explain” the success of the Beatles in the empathetic (i.e., sense making, interpretable, verstehen) sense of the word; it does not really explain much of anything.
Now compare this explanation with the sort that we would get from a cultural historian (Inglis 2000). Such an explanation might invoke the band members’ unusual musical background in skiffle groups in Liverpool; or their rejection of the conventional wisdom in favor of groups with a single identifiable star; or the newly affluent population of baby boomers coming of age in the United States during the Beatles’ emergence; or John, Paul, Ringo, and George’s evident sexual appeal; or even the assassination of President John F. Kennedy, the resulting pall it cast on the country, and the country’s need for a psychic lift from a mop-topped foursome from the United Kingdom. There’s really no contest here: the rich, detailed, historically informed explanation beats the dry, shallow statistical analysis every time. We know that bands are made up of people who have particular backgrounds, skills, and desires; that success depends on these people taking advantage of the particular opportunities that present themselves; and that which opportunities arise in turn depends on the particular social and cultural circumstances that pertained at the time. And so it makes perfect sense that some particular combination of these individual intentions, beliefs, circumstances, and opportunities is what caused this particular outcome. Of course that’s why they succeeded! It all makes sense. In contrast with the dry statistical model, it does explain the outcome we care about in the verstehen sense of understanding. Hence we are naturally drawn to it.
The only problem with this deeply satisfying interpretable explanation is that it cannot be causal. The reason is that causal explanations are valid only inasmuch as they account for the counterfactual: what would have happened to Y (the band’s success) had X (some combination of social, cultural, and personal factors) been absent. And for a unique event like the success of the Beatles there is no counterfactual. Explanations of this sort, in other words, should immediately be rejected as invalid; yet they are extremely common (Mitchell 2004). How can this be so? The reason is that they are not subjected to any test of causal validity in the first place. Instead, the “test” is itself provided by verstehen; that is, by imagining the counterfactual: what the last 50 years of music might have looked like had the baby boomers not come of age at that moment, or had JFK not been assassinated, or had John not met Paul. Because these imagined counterfactuals are generated by the same faculty for mental simulation that generated the explanation in the first place, they strike us as entirely plausible. And so we treat them as if they are legitimate counterfactuals. But as I argued in my article, the combination of the frame problem, the indeterminacy problem, and the outcome problem renders imagined counterfactuals deeply suspect. The result is that “explanations” generated in this manner are in fact not explanations at all; they are just stories that are dressed up to look like explanations, what Mitchell (2004) calls “causal stories.”
EVALUATING EXPLANATIONS WITH PREDICTIONS
Causal stories are not the only problem, of course. In a sense they are simply an extreme case of “overfitting,” a well-known problem in statistics and data mining in which too many “features” are invoked to explain too few “outcomes” (Provost and Fawcett 2013). In the case of a causal story, there can be a very large number of features (i.e., all the interesting details) and there is only one outcome, hence the extreme case. Stories are also especially problematic in practice because their sheer plausibility distracts us so effectively from the question of scientific validity that we don’t even apply the concept of overfitting; so we typically don’t even realize that it’s a problem to begin with. But as Turco and Zuckerman correctly point out, overfitting is a potential problem for all forms of explanations, including statistical models, whether they are interpretable or not.
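As a toy illustration of overfitting (not part of the original article), the sketch below fits two models to six hypothetical cases: a straight line with two parameters, and a polynomial with one parameter per case. The data, function names, and choice of models are all assumptions made for illustration. The polynomial reproduces every observed outcome exactly, much as a sufficiently detailed story can account for every known fact, yet it misses the held-out case by a wide margin.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns a prediction function."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_exact_polynomial(xs, ys):
    """Lagrange interpolation: one free parameter per observation, so the
    fitted curve passes through every training outcome exactly."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict

def mean_abs_error(model, xs, ys):
    """Average absolute gap between predicted and observed outcomes."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

# Hypothetical data: the outcome roughly equals the feature, plus small noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 0.6, 2.3, 2.4, 4.4, 4.8]
test_x, test_y = [6], [6.1]  # a held-out case, not used for fitting

line = fit_line(train_x, train_y)
poly = fit_exact_polynomial(train_x, train_y)

print("in-sample error:     line", mean_abs_error(line, train_x, train_y),
      "| polynomial", mean_abs_error(poly, train_x, train_y))
print("out-of-sample error: line", mean_abs_error(line, test_x, test_y),
      "| polynomial", mean_abs_error(poly, test_x, test_y))
```

Judged only on the cases used to build it, the polynomial looks like the better account; the ranking reverses once the held-out case is consulted.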
What’s the solution? In “Common Sense and Sociological Explanations,” I suggest forcing explanations to make predictions, where “prediction” is very broadly construed as testing an explanation “out of sample.” Specifically, I suggest the following simple and quite general procedure: “(1) construct a ‘model’ based on analysis of cases A, B, C, . . . (2) deploy the model to make a prediction about case X, which is in the same class as A, B, C, . . . but was not used to inform the model itself; (3) check the prediction” (p. 340). One advantage of this procedure is that it can be applied to explanations of many different sorts, including unique historical events (e.g., What does your explanation of this event predict about other comparable events?), but also explanations based on stylized mathematical models (e.g., How would the distribution of outcomes change in the presence of social influence vs. its absence?), explanations based on ethnographic observations (e.g., What do your observations about this specific environment predict about other similar environments?), and of course explanations based on formal statistical models (e.g., What is your model’s out-of-sample performance?).
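The three-step procedure quoted above can be written down schematically. The sketch below is only one possible rendering, under stated assumptions: the function names, the (label, outcome) representation of a case, and the deliberately trivial "model" that predicts the average outcome are all hypothetical and do not come from the article.

```python
from statistics import mean

def mean_outcome_model(cases):
    """A deliberately simple stand-in 'model': predict the average outcome
    of the cases it was shown, whatever the new case looks like."""
    average = mean(outcome for _, outcome in cases)
    return lambda description: average

def out_of_sample_check(build_model, cases, held_out):
    """Apply the three steps to one held-out case."""
    model = build_model(cases)            # (1) construct a model from cases A, B, C, ...
    description, actual = held_out
    predicted = model(description)        # (2) predict case X, which the model never saw
    return {                              # (3) check the prediction
        "predicted": predicted,
        "actual": actual,
        "error": abs(predicted - actual),
    }

# Hypothetical (label, outcome) pairs.
cases = [("A", 3.0), ("B", 5.0), ("C", 4.0)]
result = out_of_sample_check(mean_outcome_model, cases, ("X", 6.0))
print(result)
```

Because `build_model` is passed in as an argument, the same check could in principle wrap a regression, a simulation, or a forecast written down by hand before case X is examined; what matters is that X plays no part in step 1.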
Once the notion of prediction is sufficiently broadly construed (as out-of-sample testing, allowing both for probabilistic predictions and for predictions about stylized facts or patterns of outcomes), my claim that purportedly causal explanations should at a minimum be required to make testable predictions does not seem so controversial, nor do Turco and Zuckerman seem to have a problem with it. What they do have a problem with is my further claim that applying such a test to sociological explanations would result in less satisfying explanations. To the contrary, they argue that interpretatively satisfying explanations are actually more likely to be predictive. It’s an appealing thought, but I don’t think it’s right, for two reasons. First, absent rigorous testing, our instinct for interpretability almost always biases us toward explanations that suffer from overfitting. And second, the more data that we have to test our explanations against, the more difficult it will be to avoid such tests. Putting these two observations together, I predict that as sociologists are increasingly required to guard against overfitting, interpretatively satisfying explanations will increasingly be rejected; hence the less we will be able to “explain” in the verstehen (i.e., empathetic) sense. Indeed, I predict that if we are being honest with ourselves we will increasingly have to concede that many why questions of interest (Why did the Beatles succeed? Why did Enron fail? What caused the global financial crisis?) are not directly answerable at all (see also Gelman and Imbens 2013). Nothing could be less satisfying than that.
VERSTEHEN IS AN ESSENTIAL BUT EASILY MISUSED TOOL
None of this is to say that verstehen is a “fruitless diversion.” As I mentioned at the outset (and in the article under discussion here), I don’t see any way of doing sociology that does not rely on some form of verstehen as a means of generating hypotheses. To illustrate, let me take the example that Turco and Zuckerman bring up from my own work on small-world networks, which they criticize for lacking verstehen. While not disputing for a moment that the particular model that Steve Strogatz and I proposed has serious limitations as a representation of real social networks (in part because it was never intended to be a literal representation of real social networks; see, e.g., Watts 2003, pp. 84–91), I would point out that the motivation for the model did draw on verstehen. For example, the idea of shortcuts came not from reading Granovetter’s masterpiece on weak ties but from contemplating my own experience in moving to Cornell for graduate school, and the effect it had on the network distance between my “old” friends in Australia and my “new” friends in Ithaca. Likewise, the idea, embodied in subsequent generations of small-world models (Watts, Dodds, and Newman 2002), that networks could have short path lengths even in the absence of shortcuts came from thinking about how individuals could themselves act as bridges between otherwise distant groups: by, for example, being “close” to one group geographically while close to another professionally. It is not much of an exaggeration, in fact, to say that the whole modeling process was an extended exercise in converting verstehen into mathematics.
Once each model was specified, however, the exercise of exploring its properties was essentially a mathematical one. There is nothing particularly intuitive about the result that, as the number of individuals N in a network becomes increasingly large, an ever smaller fraction of shortcuts can cause the path length to transform from a linear function of N to a logarithmic function (Newman and Watts 1999). There is no verstehen here, just a mathematical result. Likewise, you might wonder—as Peter Dodds and I did (Watts 2002; Dodds and Watts 2004, 2005)—how things might spread differently if the effect of one contact with an “infected” other depended on the presence or absence of previous interactions. Once again, verstehen helps generate the hypothesis, but once the difference between what Centola and Macy (2007) subsequently called “simple” and “complex” contagion is formalized mathematically—the former assumes independence between infection probabilities whereas the latter assumes a strong dependency—the consequences can be understood only through mathematical modeling and simulation.
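The effect of shortcuts on path length can be explored numerically. The sketch below is a simplified stand-in rather than the published model: it builds a ring in which each node is tied only to its two nearest neighbors, adds a handful of random shortcuts, and measures the average shortest path length by breadth-first search. The network size, number of shortcuts, random seed, and function names are all assumptions chosen for illustration.

```python
import random
from collections import deque

def ring_lattice(n):
    """Adjacency sets for a ring in which each node is tied to its two neighbors."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def add_shortcuts(adj, count, rng):
    """Add `count` random ties between pairs of nodes that are not yet connected."""
    n = len(adj)
    added = 0
    while added < count:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b and b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj

def average_path_length(adj):
    """Mean shortest-path distance over all ordered pairs of distinct nodes,
    computed by breadth-first search (assumes a connected network)."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
    return total / (n * (n - 1))

n = 200
plain = average_path_length(ring_lattice(n))
shortcut = average_path_length(add_shortcuts(ring_lattice(n), 10, random.Random(0)))
print("ring only:        ", plain)
print("with 10 shortcuts:", shortcut)
```

Rerunning with larger `n` and different shortcut counts gives a feel for how path length responds, although a simulation of this kind only illustrates the scaling result cited above and does not establish it.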
A similar point can be made about Turco and Zuckerman’s claim that verstehen is the key to deciding when to apply Granovetter’s threshold model of collective action versus Lieberson’s “taste for popularity” (TFP) model. Their argument starts from the observation that, whereas in Granovetter’s model the probability of adoption is a monotone increasing function of the number of other adopters, in Lieberson’s model it is nonmonotonic, first increasing (when the object is relatively unpopular) and then decreasing (once it has become “too popular”). Their question, then, is under what circumstances should each model apply? To answer the question they simply “apply verstehen and place ourselves in the position of the individuals whose collective behavior we are trying to model.” Without further effort they conclude that TFP is appropriate within the domain of cultural expression, whereas the threshold model should be applied in the context of public goods games.
If only this trick worked, then sociology would be much easier; but it does not. First, some relatively simple mathematical analysis quickly demonstrates that public goods games can have nonmonotonic influence functions: depending on the particular shape of the production function, in fact, they can exhibit up-thresholds, down-thresholds, or up-and-down thresholds (Lopez-Pintado and Watts 2008). Naturally, choices in cultural markets depend on different underlying assumptions about the decision process, but here also one can show that all three types of thresholds can arise, depending on the details (Lopez-Pintado and Watts 2008). Mathematically, these are not particularly deep results, but they flatly contradict the conclusion that Turco and Zuckerman casually intuit using their verstehen. Second, having determined (from the math) that the monotonicity of the threshold function depends on certain details of the underlying decision calculus (e.g., the shape of the production function), the question of which model to apply in a particular setting reduces to the question of estimating these details. But this is now an empirical question, and again not one that can be resolved with unaided intuition. Unfortunately, the empirical question has not yet been answered, so we do not yet know which model to apply in which circumstances. What we do know (or should know) by now, however, is that this is precisely not the sort of question that can be answered just by imagining what it would be like to be there.
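To make the monotone versus nonmonotonic distinction concrete, the sketch below defines two stylized influence functions and a crude numerical check of monotonicity. The functional forms (a step function and a tent function), their parameters, and all names are hypothetical illustrations; they are not the specifications used by Granovetter, Lieberson, or Lopez-Pintado and Watts.

```python
def threshold_adoption(fraction_adopted, threshold=0.3):
    """Stylized 'up-threshold': adopt once enough others have adopted."""
    return 1.0 if fraction_adopted >= threshold else 0.0

def taste_for_popularity(fraction_adopted, peak=0.4):
    """Stylized 'up-and-down' response: appeal rises until the object reaches
    `peak` popularity, then falls once it has become too popular."""
    if fraction_adopted <= peak:
        return fraction_adopted / peak
    return (1.0 - fraction_adopted) / (1.0 - peak)

def is_monotone_increasing(influence, steps=100):
    """Check on a grid whether the response never decreases as adoption grows."""
    values = [influence(i / steps) for i in range(steps + 1)]
    return all(later >= earlier for earlier, later in zip(values, values[1:]))

print(is_monotone_increasing(threshold_adoption))    # True
print(is_monotone_increasing(taste_for_popularity))  # False
```

Which shape applies in a given setting is the empirical question raised above: the check classifies a function once its form is known, but it cannot say which form describes real decisions.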
I could make a similar argument about MusicLab, but I have belabored the point enough: verstehen, or whatever you want to call it (intuition, experience, thinking about why people do what they do), is an indispensable tool for hypothesizing about social action, processes, and outcomes. Regardless of whether the hypothesizing in question manifests itself as formulating assumptions in mathematical models, or as specifying the sign and significance of regression coefficients, or even as deciding what to look and listen for in an ethnographic study, verstehen clearly plays a useful and indeed, I would argue, unavoidable role. Where it gets us into trouble is when it is also used to establish the validity of the hypotheses that it generates; for that, something else is needed. In “Common Sense and Sociological Explanations,” I advocated for out-of-sample testing because I think it is conceptually simple and has very general applicability. But other methods of validating one’s hypotheses (whether experiments or natural experiments or simulations or follow-up field visits) could also help. Regardless, the point is that plausibility itself is not a reliable metric of scientific validity.
In fairness, Turco and Zuckerman at times come close to making the same point—for example, when they say that “the verstehen mode of sociological inquiry can be complementary with the causal mode of inquiry.” Moreover, some of the studies they cite do indeed attempt to evaluate hypotheses in an out-of-sample manner. So, to repeat my opening assertion, I think there is less disagreement between us than they believe. Certainly I could not agree more with their final call to action: we must struggle to “figure out ‘what the devil is going on,’ no matter how hard it is to answer that short, deceptively simple question.” Where I think we disagree is that I believe it is even harder to answer than they do. As some of their own attempts to demonstrate the efficacy of verstehen unintentionally illustrate, in the absence of some other test of validity it is all too easy to persuade oneself that one’s plausible answer is also correct.
My prediction, therefore, is that as sociology becomes increasingly data rich, and as sociologists become increasingly familiar with the methods of causal inference and out-of-sample testing, it will become increasingly clear that many interpretatively satisfying explanations are really just stories. Conversely it will become clear that the explanations that survive rigorous testing are not as satisfying as the stories to which we have become accustomed. And at that point, I predict that sociologists will have to choose between story telling and science. Turco and Zuckerman, in contrast, predict that no such choice will be forced upon us; that in fact the most rigorous explanations will also be the most interpretatively satisfying. I think they are wrong about that, but that is the nice thing about predictions: we shall see.
1 Turco and Zuckerman note that I don’t discuss ethnography, and infer from the absence that I think ethnography has no place in sociological inquiry. In fact, my failure to mention ethnography implies nothing of the sort. Done well, I think ethnography is just as capable as any other method of generating data and insight. Like other methods, it is better suited to some applications than to others, and like other methods, when done poorly it can easily degenerate into story telling dressed up as explanation. But in principle I don’t think ethnography is a better or worse method than mathematical modeling or big data analysis or lab experiments. I didn’t mention it because it didn’t seem relevant to the argument I was making.
Centola, D., and M. Macy. 2007. “Complex Contagions and the Weakness of Long Ties.” American Journal of Sociology 113 (3): 702–34.
Dodds, P. S., and D. J. Watts. 2004. “Universal Behavior in a Generalized Model of Contagion.” Physical Review Letters 92:218701.
———. 2005. “A Generalized Model of Social and Biological Contagion.” Journal of Theoretical Biology 232 (4): 587–604.
Gelman, Andrew, and Guido Imbens. 2013. “Why Ask Why? Forward Causal Inference and Reverse Causal Questions.” National Bureau of Economic Research.
Inglis, Ian. 2000. “‘The Beatles Are Coming!’ Conjecture and Conviction in the Myth of Kennedy, America, and the Beatles.” Popular Music and Society 24 (2): 93–108.
Lopez-Pintado, Dunia, and Duncan J. Watts. 2008. “Social Influence, Binary Decisions and Collective Dynamics.” Rationality and Society 20 (4): 399–443.
Mitchell, Gregory. 2004. “Case Studies, Counterfactuals, and Causal Explanations.” University of Pennsylvania Law Review 152 (5): 1517–1608.
Newman, M. E. J., and D. J. Watts. 1999. “Scaling and Percolation in the Small-World Network Model.” Physical Review E 60 (6): 7332–42.
Provost, Foster, and Tom Fawcett. 2013. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. New York: O’Reilly Media.
Turco, Catherine, and Ezra Zuckerman. 2017. “Verstehen for Sociology: Comment on Watts.” American Journal of Sociology 122 (4): 1272–91.
Watts, D. J., P. S. Dodds, and M. E. J. Newman. 2002. “Identity and Search in Social Networks.” Science 296 (5571): 1302–5.
Watts, Duncan J. 2002. “A Simple Model of Information Cascades on Random Networks.” Proceedings of the National Academy of Sciences, USA 99:5766–71.
———. 2003. Six Degrees: The Science of a Connected Age. New York: W. W. Norton.