a2 Department of Psychology, University of British Columbia, Vancouver V6T 1Z4, Canada. email@example.com
a3 Department of Psychology, University of British Columbia, Vancouver V6T 1Z4, Canada. firstname.lastname@example.org
Behavioral scientists routinely publish broad claims about human psychology and behavior in the world's top journals based on samples drawn entirely from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Researchers – often implicitly – assume that either there is little variation across human populations, or that these “standard subjects” are as representative of the species as any other population. Are these assumptions justified? Here, our review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species – frequent outliers. The domains reviewed include visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, reasoning styles, self-concepts and related motivations, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans. Many of these findings involve domains that are associated with fundamental aspects of psychology, motivation, and behavior – hence, there are no obvious a priori grounds for claiming that a particular behavioral phenomenon is universal based on sampling from a single subpopulation. Overall, these empirical patterns suggest that we need to be less cavalier in addressing questions of human nature on the basis of data drawn from this particularly thin, and rather unusual, slice of humanity. We close by proposing ways to structurally re-organize the behavioral sciences to best tackle these challenges.
Joseph Henrich holds the Canada Research Chair in Culture, Cognition, and Evolution at the University of British Columbia, where he is appointed Professor in both Economics and Psychology. His theoretical work focuses on how natural selection has shaped human learning and how this in turn influences cultural evolution, and culture-gene coevolution. Methodologically, his research synthesizes experimental and analytical tools drawn from behavioural economics and psychology with in-depth quantitative ethnography, and he has performed long-term fieldwork in the Peruvian Amazon, rural Chile, and in Fiji. Trained in anthropology, Dr. Henrich's work has been published in the top journals in biology, anthropology, and economics. In 2004 he was awarded the Presidential Early Career Award, the highest award bestowed by the United States upon scientists early in their careers. In 2007 he co-authored Why Humans Cooperate. In 2009 the Human Behavior and Evolution Society awarded him their Early Career Award for Distinguished Scientific Contributions.
Ara Norenzayan is an Associate Professor of Psychology at the University of British Columbia, Vancouver. He received his Ph.D. from the University of Michigan in 1999, was a postdoctoral fellow at the Ecole Polytechnique, Paris, and served on the faculty of the University of Illinois, Urbana-Champaign before his appointment at UBC. His most recent work addresses the evolution of religious beliefs and behaviors.
Steven J. Heine is Professor of Psychology and Distinguished University Scholar at the University of British Columbia. Much of his work has focused on how culture shapes people's self-concepts, particularly their motivations for self-esteem. Dr. Heine has received the Early Career Award from the International Society of Self and Identity and the Distinguished Scientist Early Career Award for Social Psychology from the American Psychological Association. He is the author of a textbook entitled Cultural Psychology, published in 2008.
List of Figures and Tables
Figure 1. The Müller-Lyer illusion. The lines labeled “a” and “b” are the same length. Many subjects perceive line “b” as longer than line “a”.
Figure 2. Müller-Lyer results for Segall et al.'s (1966) cross-cultural project. PSE (point of subjective equality) is the percentage that segment a must be longer than b before subjects perceived the segments as equal in length. Children were sampled in the 5-to-11 age range.
Figure 3. Behavioral measures of fairness and punishment from the Dictator and Ultimatum Games for 15 societies (Phase II). Figures 3A and 3B show mean offers for each society in the Dictator and Ultimatum Games, respectively. Figure 3C gives the income-maximizing offer (IMO) for each society.
Figure 4. Mean punishment expenditures from each sample for a given deviation from the punisher's contribution to the public good. The deviations of the punished subject's contribution from the punisher's contribution are grouped into five intervals, where [-20,-11] indicates that the punished subjects contributed between 11 and 20 less than the punishing subject; [0] indicates that the punished subject contributed exactly the same amount as the punishing subject; and [1,10] ([11,20]) indicates that the punished subject contributed between 1 and 10 (11 and 20) more than the punishing subject. Adapted from Herrmann et al. (2008).
Figure 5. Relative dominance of rule-based versus family resemblance–based judgments of categories for the same cognitive task. European-American, Asian-American, and East Asian university students were tested by Norenzayan et al. (2002b); the herders, fishermen, and farmers of Turkey's Black Sea coast were tested by Uskul et al. (2008). Positive scores indicate a relative bias towards rule-based judgments, whereas negative scores indicate a relative bias towards family resemblance–based judgments. It can be seen that European-American students show the most pronounced bias toward rule-based judgments, and they are outliers in terms of absolute deviation from zero. Adapted from Norenzayan et al. (2002b) and Uskul et al. (2008).
In the tropical forests of New Guinea, the Etoro believe that for a boy to achieve manhood he must ingest the semen of his elders. This is accomplished through ritualized rites of passage that require young male initiates to fellate a senior member (Herdt 1984/1993; Kelley 1980). In contrast, the nearby Kaluli maintain that male initiation is only properly done by ritually delivering the semen through the initiate's anus, not his mouth. The Etoro revile these Kaluli practices, finding them disgusting. To become a man in these societies, and eventually take a wife, every boy undergoes these initiations. Such boy-inseminating practices, which are enmeshed in rich systems of meaning and imbued with local cultural values, were not uncommon among the traditional societies of Melanesia and Aboriginal Australia (Herdt 1984/1993), as well as in Ancient Greece and Tokugawa Japan.
Such in-depth studies of seemingly “exotic” societies, historically the province of anthropology, are crucial for understanding human behavioral and psychological variation. However, this target article is not about these peoples. It is about a truly unusual group: people from Western, Educated, Industrialized, Rich, and Democratic (WEIRD)1 societies. In particular, it is about the Western, and more specifically American, undergraduates who form the bulk of the database in the experimental branches of psychology, cognitive science, and economics, as well as allied fields (hereafter collectively labeled the “behavioral sciences”). Given that scientific knowledge about human psychology is largely based on findings from this subpopulation, we ask just how representative are these typical subjects in light of the available comparative database. How justified are researchers in assuming a species-level generality for their findings? Here, we review the evidence regarding how WEIRD people compare with other populations.
We pursued this question by constructing an empirical review of studies involving large-scale comparative experimentation on important psychological or behavioral variables. Although such larger-scale studies are highly informative, they are rather rare, especially when compared to the frequency of species-generalizing claims. When such comparative projects were absent, we relied on large assemblies of studies comparing two or three populations, and, when available, on meta-analyses.
Of course, researchers do not implicitly assume psychological or motivational universality with everything they study. The present review does not address those phenomena assessed by individual difference measures for which the guiding assumption is variability among populations. Phenomena such as personal values, emotional expressiveness, and personality traits are expected a priori to vary across individuals, and by extension, societies. Indeed, the goal of much research on these topics is to identify the ways that people and societies differ from one another. For example, a number of large projects have sought to map out the world on dimensions such as values (Hofstede 2001; Inglehart et al. 1998; Schwartz & Bilsky 1990), personality traits (e.g., McCrae et al. 2005; Schmitt et al. 2007), and levels of happiness (e.g., Diener et al. 1995). Similarly, we avoid the vast psychopathology literature, which finds much evidence for both variability and universality in psychological pathologies (Kleinman 1988; Tseng 2001), because this work focuses on individual-level (and unusual) variations in psychological functioning. Instead, we restrict our exploration to those domains which have largely been assumed, at least until recently, to be de facto psychological universals.
Finally, we also do not address societal-level behavioral universals, or claims thereof, related to phenomena such as dancing, fire making, cooking, kinship systems, body adornment, play, trade, and grammar, for two reasons. First, at this surface level alone, such phenomena do not make specific claims about universal underlying psychological or motivational processes. Second, systematic, quantitative, comparative data based on individual-level measures are typically lacking for these domains.
Our examination of the representativeness of WEIRD subjects is necessarily restricted to the rather limited database currently available. We have organized our presentation into a series of telescoping contrasts showing, at each level of contrast, how WEIRD people measure up relative to the available reference populations. Our first contrast compares people from modern industrialized societies with those from small-scale societies. Our second telescoping stage contrasts people from Western societies with those from non-Western industrialized societies. Next, we contrast Americans with people from other Western societies. Finally, we contrast university-educated Americans with non–university-educated Americans, or university students with non-student adults, depending on the available data. At each level we discuss behavioral and psychological phenomena for which there are available comparative data, and we assess how WEIRD people compare with other samples.
We emphasize that our presentation of telescoping contrasts is only a rhetorical approach guided by the nature of the available data. It should not be taken as capturing any unidimensional continuum, or suggesting any single theoretical explanation for the variation. Throughout this article we take no position regarding the substantive origins of the observed differences between populations. While many of the differences are probably cultural in nature in that they were socially transmitted (Boyd & Richerson 1985; Nisbett et al. 2001), other differences are likely environmental and represent some form of non-cultural phenotypic plasticity, which may be developmental or facultative, as well as either adaptive or maladaptive (Gangestad et al. 2006; Tooby & Cosmides 1992). Other population differences could arise from genetic variation, as observed for lactose processing (Beja-Pereira et al. 2003). Regardless of the reasons underlying these population differences, our concern is whether researchers can reasonably generalize from WEIRD samples to humanity at large.
Many radical versions of interpretivism and cultural relativity deny any shared commonalities in human psychologies across populations (e.g., Gergen 1973; see critique and discussion in Slingerland 2008, Ch. 2). To the contrary, we expect humans from all societies to share, and probably share substantially, basic aspects of cognition, motivation, and behavior. As researchers who see great value in applying evolutionary thinking to psychology and behavior, we have little doubt that if a full accounting were taken across all domains among peoples past and present, the number of similarities would indeed be large, as much ethnographic work suggests (e.g., Brown 1991) – ultimately, of course, this is an empirical question. Thus, our thesis is not that humans share few basic psychological properties or processes; rather, we question our current ability to distinguish these reliably developing aspects of human psychology from more developmentally, culturally, or environmentally contingent aspects of our psychology given the disproportionate reliance on WEIRD subjects. Our aim here, then, is to inspire efforts to place knowledge of such universal features of psychology on a firmer footing by empirically addressing, rather than a priori dismissing or ignoring, questions of population variability.
Before commencing with our telescoping contrasts, we first discuss two observations regarding the existing literature: (1) The database in the behavioral sciences is drawn from an extremely narrow slice of human diversity; and (2) behavioral scientists routinely assume, at least implicitly, that their findings from this narrow slice generalize to the species.
Who are the people studied in behavioral science research? A recent analysis of the top journals in six subdisciplines of psychology from 2003 to 2007 revealed that 68% of subjects came from the United States, and a full 96% of subjects were from Western industrialized countries, specifically those in North America and Europe, as well as Australia and Israel (Arnett 2008). The make-up of these samples appears to largely reflect the country of residence of the authors, as 73% of first authors were at American universities, and 99% were at universities in Western countries. This means that 96% of psychological samples come from countries with only 12% of the world's population.
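The imbalance in those two percentages can be made concrete with a little arithmetic. The sketch below uses only the 96% and 12% figures quoted above; the function name and the notion of a "representation ratio" (sample share divided by population share) are illustrative choices, not a measure taken from Arnett (2008).

```python
# Shares quoted in the paragraph above (Arnett 2008).
western_sample_share = 0.96      # share of subjects from Western industrialized countries
western_population_share = 0.12  # share of the world's population living in those countries


def representation_ratio(sample_share, population_share):
    """Sample share divided by population share; 1.0 would mean proportional sampling.

    Hypothetical helper introduced for illustration only.
    """
    return sample_share / population_share


# Over-representation of the West, under-representation of everyone else.
west = representation_ratio(western_sample_share, western_population_share)
rest = representation_ratio(1 - western_sample_share, 1 - western_population_share)

# How many times more heavily the West is sampled, per capita, than the rest.
relative = west / rest

print(round(west, 1), round(rest, 3), round(relative))
```

On these figures, Western countries are sampled at about 8 times their population share and the rest of the world at under 5% of its share, a per-capita gap of roughly 176 to 1. This is a coarse country-group comparison and says nothing about which people within each group are sampled.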
Even within the West, however, the typical sampling method for experimental studies is far from representative. In the Journal of Personality and Social Psychology, the premier journal in social psychology – the subdiscipline of psychology that should (arguably) be the most attentive to questions about the subjects' backgrounds – 67% of the American samples (and 80% of the samples from other countries) were composed solely of undergraduates in psychology courses (Arnett 2008). In other words, a randomly selected American undergraduate is more than 4,000 times more likely to be a research participant than is a randomly selected person from outside of the West. Furthermore, this tendency to rely on undergraduate samples has not decreased over time (Peterson 2001; Wintre et al. 2001). Such studies are therefore sampling from a rather limited subpopulation within each country (see Rozin 2001).
It is possible that the dominance of American authors in psychology publications just reflects that American universities have the resources to attract the best international researchers, and that similar tendencies exist in other fields. However, psychology is a distinct outlier here: 70% of all psychology citations come from the United States – a larger percentage than any of the other 19 sciences that were compared in one extensive international survey (see May 1997). In chemistry, by contrast, the percentage of citations that come from the United States is only 37%. It seems problematic that the discipline in which there are the strongest theoretical reasons to anticipate population-level variation is precisely the discipline in which the American bias for research is most extreme.
Beyond psychology and cognitive science, the subject pools of experimental economics and decision science are not much more diverse – still largely dominated by Westerners, and specifically Western undergraduates. However, to give credit where it is due, the nascent field of experimental economics has begun taking steps to address the problem of narrow samples.2
In sum, the available database does not reflect the full breadth of human diversity. Rather, we have largely been studying the nature of WEIRD people, a certainly narrow and potentially peculiar subpopulation.
Sampling from a thin slice of humanity would be less problematic if researchers confined their interpretations to the populations from which they sampled. However, despite their narrow samples, behavioral scientists often are interested in drawing inferences about the human mind and human behavior. This inferential step is rarely challenged or defended – with important exceptions (e.g., Medin & Atran 2004; Rozin 2001; Triandis 1994; Witkin & Berry 1975) – despite the lack of any general effort to assess how well results from WEIRD samples generalize to the species. This lack of epistemic vigilance underscores the prevalent, though implicit, assumption that the findings one derives from a particular sample will generalize broadly; one adult human sample is pretty much the same as the next.
Leading scientific journals and university textbooks routinely publish research findings claiming to generalize to “humans” or “people” based on research done entirely with WEIRD undergraduates. In top journals such as Nature and Science, researchers frequently extend their findings from undergraduates to the species – often declaring this generalization in their titles. These contributions typically lack even a cautionary footnote about these inferential extensions.
In psychology, much of this generalization is implicit. A typical article does not claim to be discussing “humans” but will rather simply describe a decision bias, psychological process, set of correlations, and so on, without addressing issues of generalizability, although findings are often linked to “people.” Commonly, there is no demographic information about the participants, aside from their age and gender. In recent years there is a trend to qualify some findings with disclaimers such as “at least within Western culture,” though there remains a robust tendency to generalize to the species. Arnett (2008) notes that psychologists would surely bristle if journals were renamed to more accurately reflect the nature of their samples (e.g., Journal of Personality and Social Psychology of American Undergraduate Psychology Students). They would bristle, presumably, because they believe that their findings generalize much beyond this sample. Of course, there are important exceptions to this general tendency, as some researchers have assembled a broad database to provide evidence for universality (Buss 1989; Daly & Wilson 1988; Ekman 1999b; Elfenbein & Ambady 2002; Kenrick & Keefe 1992a; Tracy & Matsumoto 2008).
When is it safe to generalize from a narrow sample to the species? First, if one had good empirical reasons to believe that little variability existed across diverse populations in a particular domain, it would be reasonable to tentatively infer universal processes from a single subpopulation. Second, one could make an argument that as long as one's samples were drawn from near the center of the human distribution, then it would not be overly problematic to generalize across the distribution more broadly – at least the inferred pattern would be in the vicinity of the central tendency of our species. In the following, with these assumptions in mind, we review the evidence for the representativeness of findings from WEIRD people.
Our theoretical perspective, which is informed by evolutionary thinking, leads us to suspect that many aspects of people's psychological repertoire are universal. However, the current empirical foundations for our suspicions are rather weak because the database of comparative studies that include small-scale societies is scant, despite the obvious importance of such societies in understanding both the evolutionary history of our species and the potential impact of diverse environments on our psychology. Here we first discuss the evidence for differences between populations drawn from industrialized and small-scale societies in some seemingly basic psychological domains, and follow this with research indicating universal patterns across this divide.
Many readers may suspect that tasks involving “low-level” or “basic” cognitive processes such as vision will not vary much across the human spectrum (Fodor 1983). However, in the 1960s an interdisciplinary team of anthropologists and psychologists systematically gathered data on the susceptibility of both children and adults from a wide range of human societies to five “standard illusions” (Segall et al. 1966). Here we highlight the comparative findings on the famed Müller-Lyer illusion, because of this illusion's importance in textbooks, and its prominent role as Fodor's indisputable example of “cognitive impenetrability” in debates about the modularity of cognition (McCauley & Henrich 2006). Note, however, that population-level variability in illusion susceptibility is not limited to the Müller-Lyer illusion; it was also found for the Sander-Parallelogram and both Horizontal-Vertical illusions.
Segall et al. (1966) manipulated the length of the two lines in the Müller-Lyer illusion (Fig. 1) and estimated the magnitude of the illusion by determining the approximate point at which the two lines were perceived as being of the same length. Figure 2 shows the results from 16 societies, including 14 small-scale societies. The vertical axis gives the “point of subjective equality” (PSE), which measures the extent to which segment “a” must be longer than segment “b” before the two segments are judged equal in length. PSE measures the strength of the illusion.
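As a minimal sketch of the PSE definition just given, the function below expresses how much longer segment “a” must be than segment “b”, as a percentage of “b”, at the point where a subject judges them equal. The function name and the example lengths are hypothetical and are not taken from Segall et al.'s materials.

```python
def pse_percent(length_a, length_b):
    """Percent by which segment a exceeds segment b at the point where
    a subject judges the two segments to be equal in length.

    Hypothetical helper; lengths can be in any unit, as long as both match.
    """
    if length_b <= 0:
        raise ValueError("length_b must be positive")
    return 100.0 * (length_a - length_b) / length_b


# Illustrative (made-up) lengths: a subject who needs a = 120 to match b = 100
# shows a strong illusion; a subject who matches at a = 100 shows none.
strong_illusion = pse_percent(120, 100)
no_illusion = pse_percent(100, 100)
print(strong_illusion, no_illusion)
```

A PSE near zero means the subject sees the segments roughly as they are; larger values mean a stronger illusion.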
The results show substantial differences among populations, with American undergraduates anchoring the extreme end of the distribution, followed by the South African-European sample from Johannesburg. On average, the undergraduates required that line “a” be about a fifth longer than line “b” before the two segments were perceived as equal. At the other end, the San foragers of the Kalahari were unaffected by the so-called illusion (it is not an illusion for them). While the San's PSE value cannot be distinguished from zero, the American undergraduates' PSE value is significantly different from all the other societies studied.
As discussed by Segall et al., these findings suggest that visual exposure during ontogeny to factors such as the “carpentered corners” of modern environments may favor certain optical calibrations and visual habits that create and perpetuate this illusion. That is, the visual system ontogenetically adapts to the presence of recurrent features in the local visual environment. Because elements such as carpentered corners are products of particular cultural evolutionary trajectories, and were not part of most environments for most of human history, the Müller-Lyer illusion is a kind of culturally evolved by-product (Henrich 2008).
These findings highlight three important considerations. First, this work suggests that even a process as apparently basic as visual perception can show substantial variation across populations. If visual perception can vary, what kind of psychological processes can we be sure will not vary? It is not merely that the strength of the illusory effect varies across populations – the effect cannot be detected in two populations. Second, both American undergraduates and children are at the extreme end of the distribution, showing significant differences from all other populations studied, whereas many of the other populations cannot be distinguished from one another. Since children already show large population-level differences, it is not obvious that developmental work can substitute for research across diverse human populations. Children likely have different developmental trajectories in different societies. Finally, this provides an example of how population-level variation can be useful for illuminating the nature of a psychological process, which would not be as evident in the absence of comparative work.
By the mid-1990s, researchers were arguing that a set of robust experimental findings from behavioral economics were evidence for a set of evolved universal motivations (Fehr & Gächter 1998; Hoffman et al. 1998). Foremost among these experiments, the Ultimatum Game provides a pair of anonymous subjects with a sum of real money for a one-shot interaction. One of the pair – the proposer – can offer a portion of this sum to the second subject, the responder. Responders must decide whether to accept or reject the offer. If a responder accepts, she gets the amount of the offer and the proposer takes the remainder; if she rejects, both players get zero. If subjects are motivated purely by self-interest, responders should always accept any positive offer; knowing this, a self-interested proposer should offer the smallest non-zero amount. Among subjects from industrialized populations – mostly undergraduates from the United States, Europe, and Asia – proposers typically offer an amount between 40% and 50% of the total, with a modal offer of 50% (Camerer 2003). Offers below about 30% are often rejected.
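The payoff rules and the pure self-interest prediction described above can be sketched in a few lines. This is an illustrative model only: the function names, the stake of 100, and the offer increment of 10 are assumptions made for the example, not parameters from any of the cited experiments.

```python
def ultimatum_payoffs(stake, offer, accepted):
    """Return (proposer_payoff, responder_payoff) for a one-shot Ultimatum Game."""
    if not 0 <= offer <= stake:
        raise ValueError("offer must lie between 0 and the stake")
    if accepted:
        # Responder receives the offer; proposer keeps the remainder.
        return stake - offer, offer
    # A rejection leaves both players with nothing.
    return 0, 0


def self_interested_responder(offer):
    """Accept any positive offer, as pure self-interest predicts."""
    return offer > 0


def best_offer_against(responder, stake, unit):
    """Offer (in multiples of `unit`) that maximizes the proposer's payoff
    against a given responder rule. Ties go to the smallest offer."""
    offers = range(0, stake + 1, unit)
    return max(offers, key=lambda o: ultimatum_payoffs(stake, o, responder(o))[0])


# Hypothetical stake of 100, offered in steps of 10.
predicted = best_offer_against(self_interested_responder, stake=100, unit=10)
print(predicted, ultimatum_payoffs(100, predicted, True))
```

Under these assumptions the self-interested proposer offers the smallest non-zero amount (10 of 100), far below the 40% to 50% typically observed among subjects from industrialized populations.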
With this seemingly robust empirical finding in their sights, Nowak et al. (2000) constructed an evolutionary analysis of the Ultimatum Game. When they modeled the Ultimatum Game exactly as played, they did not get results matching the undergraduate findings. However, if they added reputational information, such that players could know what their partners did with others on previous rounds of play, the analysis predicted offers and rejections in the range of typical undergraduate responses. They concluded that the Ultimatum Game reveals humans' species-specific evolved capacity for fair and punishing behavior in situations with substantial reputational influence. But, since the Ultimatum Game is typically played one-shot without reputational information, Nowak et al. argued that people make fair offers and reject unfair offers because their motivations evolved in a world where such interactions were not fitness relevant – thus, we are not evolved to fully incorporate the possibility of non-reputational action in our decision-making, at least in such artificial experimental contexts.
Recent comparative work has dramatically altered this initial picture. Two unified projects (which we call Phase I and Phase II) have deployed the Ultimatum Game and other related experimental tools across thousands of subjects randomly sampled from 23 small-scale human societies, including foragers, horticulturalists, pastoralists, and subsistence farmers, drawn from Africa, Amazonia, Oceania, Siberia, and New Guinea (Henrich et al. 2005; 2006; 2010). Three different experimental measures show that people in industrialized societies consistently occupy the extreme end of the human distribution. Notably, people in some of the smallest-scale societies, where real life is principally face-to-face, behaved in a manner reminiscent of Nowak et al.'s analysis before they added the reputational information. That is, these populations made low offers and did not reject.
To concisely present these diverse empirical findings, we show results only from the Ultimatum and Dictator Games in Phase II. The Dictator Game is the same as the Ultimatum Game except that the second player cannot reject the offer. If subjects are motivated purely by self-interest, they would offer zero in the Dictator Game. Thus, Dictator Game offers yield a measure of “fairness” (equal divisions) among two anonymous people. By contrast, Ultimatum Game offers yield a measure of fairness combined with an assessment of the likelihood of rejection (punishment). Rejections of offers in the Ultimatum Game provide a measure of people's willingness to punish unfairness.
Using aggregate measures, Figure 3 shows that the behavior of the U.S. adult (non-student) sample occupies the extreme end of the distribution in each case. For Dictator Game offers, Figure 3A shows that the U.S. sample has the highest mean offer, followed by the Sanquianga from Colombia, who are renowned for their prosociality (Kraul 2008). The U.S. offers are nearly double those of the Hadza, foragers from Tanzania, and the Tsimane, forager-horticulturalists from the Bolivian Amazon. Figure 3B shows that for Ultimatum Game offers, the United States has the second highest mean offer, behind the Sursurunga from Papua New Guinea. On the punishment side in the Ultimatum Game, Figure 3C shows the income-maximizing offer (IMO) for each population, which is a measure of the population's willingness to punish inequitable offers. IMO is the offer that an income-maximizing proposer would make if he knew the probability of rejection for each of the possible offer amounts. The U.S. sample is tied with the Sursurunga. These two groups have an IMO five times higher than 70% of the other societies. While none of these measures indicates that people from industrialized societies are entirely unique vis-à-vis other populations, they do show that people from industrialized societies consistently occupy the extreme end of the human distribution.
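The IMO definition above amounts to maximizing the proposer's expected income over the possible offers, given each offer's rejection probability. The sketch below illustrates that calculation under stated assumptions: the rejection probabilities are invented for the example and are not data from Henrich et al., and breaking ties toward the smallest offer is an arbitrary choice.

```python
def income_maximizing_offer(stake, rejection_prob):
    """Offer that maximizes the proposer's expected income.

    rejection_prob maps each possible offer to the probability that a
    responder in the population rejects it. Ties go to the smallest offer.
    """
    def expected_income(offer):
        # The proposer keeps (stake - offer) only if the offer is accepted.
        return (stake - offer) * (1.0 - rejection_prob[offer])

    best = max(expected_income(o) for o in rejection_prob)
    return min(o for o in rejection_prob if expected_income(o) == best)


# Hypothetical populations, stake of 100. Values are made up for illustration.
punishing = {0: 1.0, 10: 0.8, 20: 0.5, 30: 0.2, 40: 0.05, 50: 0.0}
tolerant = {0: 1.0, 10: 0.05, 20: 0.0, 30: 0.0, 40: 0.0, 50: 0.0}

imo_punishing = income_maximizing_offer(100, punishing)
imo_tolerant = income_maximizing_offer(100, tolerant)
print(imo_punishing, imo_tolerant)
```

In this toy example, frequent rejection of low offers pushes the IMO up to 40, while a population that rarely rejects yields an IMO of 10. This is why a higher IMO can be read as a greater willingness to punish inequitable offers.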
Analyses of these data show that a population's degree of market integration and its participation in a world religion both independently predict higher offers, and account for much of the variation between populations. Community size positively predicts greater punishment (Henrich et al. 2010). The authors suggest that norms and institutions for exchange in ephemeral interactions culturally coevolved with markets and expanding larger-scale sedentary populations. In some cases, at least in their most efficient forms, neither markets nor large populations were feasible before such norms and institutions emerged. That is, it may be that what behavioral economists have been measuring among undergraduates in such games is a specific set of social norms, culturally evolved for dealing with money and strangers, that have emerged since the origins of agriculture and the rise of complex societies.
In addition to differences in populations' willingness to reject offers that are too low, the evidence also indicates a willingness to reject offers that are too high in about half the societies studied. This tendency to reject so-called hyper-fair offers rises as offers increase from 60% to 100% of the stake (Henrich et al. 2006). This phenomenon, which is not observed in typical undergraduate subjects (who essentially never reject offers greater than half), has now emerged among populations in Russia (Bahry & Wilson 2006) and China (Hennig-Schmidt et al. 2008), as well as (to a lesser degree) among non-student adults in Sweden (Wallace et al. 2007), Germany (Guth et al. 2003), and the Netherlands (Bellemare et al. 2008). Attempts to explain away this phenomenon as a consequence of confusion or misunderstanding have not found support despite substantial efforts.
Suppose that Nowak and his coauthors were Tsimane, and that the numerous empirical findings they had on hand were all from Tsimane villages. If this were the case, presumably these researchers would have simulated the Ultimatum Game and found that there was no need to add reputation to their model. This unadorned evolutionary solution would have worked fine until they realized that the Tsimane are not representative of humanity. According to the above data, the Tsimane are about as representative of the species as are Americans, but at the opposite end of the spectrum. If the database of the behavioral sciences consisted entirely of Tsimane subjects, researchers would likely be quite concerned about generalizability.
Recent work in small-scale societies suggests that some of the central conclusions regarding the development and operation of human folkbiological categorization, reasoning, and induction are limited to urban subpopulations of non-experts in industrialized societies. Although much more work needs to be done, it appears that typical subjects (children of WEIRD parents) develop their folkbiological reasoning in a culturally and experientially impoverished environment, by contrast to those of small-scale societies (and of our evolutionary past), distorting both the species-typical pattern of cognitive development and the patterns of reasoning in WEIRD adults.
Cognitive scientists using (as subjects) children drawn from U.S. urban centers – often those surrounding universities – have constructed an influential, though actively debated, developmental theory in which folkbiological reasoning emerges from folkpsychological reasoning. Before age 7, urban children reason about biological phenomena by analogy to, and by extension from, humans. Between ages 7 and 10, urban children undergo a conceptual shift to the adult pattern of viewing humans as one animal among many. These conclusions are underpinned by three robust findings from urban children: (1) Inferential projections of properties from humans are stronger than projections from other living kinds; (2) inferences from humans to mammals emerge as stronger than inferences from mammals to humans; and (3) children's inferences violate their own similarity judgments by, for example, providing stronger inference from humans to bugs than from bugs to bees (Carey 1985; 1995).
However, when the folkbiological reasoning of children in rural Native American communities in Wisconsin and Yukatek Maya communities in Mexico was investigated (Atran et al. 2001; Ross et al. 2003; Waxman & Medin 2007) none of these three empirical patterns emerged. Among the American urban children, the human category appears to be incorporated into folkbiological induction relatively late compared to these other populations. The results indicate that some background knowledge of the relevant species is crucial for the application and induction across a hierarchical taxonomy (Atran et al. 2001). In rural environments, both exposure to and interest in the natural world are commonplace, unavoidable, and an inevitable part of the enculturation process. This suggests that the anthropocentric patterns seen in U.S. urban children result from insufficient cultural input and a lack of exposure to the natural world. The only real animal that most urban children know much about is Homo sapiens, so it is not surprising that this species dominates their inferential patterns. Since such urban environments are highly “unnatural” from the perspective of human evolutionary history, any conclusions drawn from subjects reared in such informationally impoverished environments must remain rather tentative. Indeed, studying the cognitive development of folkbiology in urban children would seem the equivalent of studying “normal” physical growth in malnourished children.
This deficiency of input likely underpins the fact that the basic-level folkbiological categories for WEIRD adults are life-form categories (e.g., bird, fish, and mammal), and these are also the first categories learned by WEIRD children – for example, if one says “What's that?” (pointing at a maple tree), their common answer is “tree.” However, in all small-scale societies studied, the generic species (e.g., maple, crow, trout, and fox) is the basic-level category and the first learned by children (Atran 1993; Berlin 1992).
Impoverished interactions with the natural world may also distort assessments of the typicality of natural kinds in categorization. The standard conclusion from American undergraduate samples has been that goodness of example, or typicality, is driven by similarity relations. A robin is a typical bird because this species shares many of the perceptual features that are commonly found in the category BIRD. In the absence of close familiarity with natural kinds, this is the default strategy of American undergraduates, and psychology has assumed it is the universal pattern. However, in samples which interact with the natural world regularly, such as Itza Maya villagers, typicality is based not on similarity but on knowledge of cultural ideals, reflecting the symbolic or material significance of the species in that culture. For the Itza, the wild turkey is a typical bird because of its rich cultural significance, even though it is in no way most similar to other birds. The same pattern holds for similarity effects in inductive reasoning – WEIRD people make strong inferences from computations of similarity, whereas populations with greater familiarity with the natural world, despite their capacity for similarity-based inductions, prefer to make strong inferences from folkbiological knowledge that takes into account ecological context and relationships among species (Atran et al. 2005). In general, research suggests that what people think about can affect how they think (Bang et al. 2007). To the extent that there is population-level variability in the content of folkbiological beliefs, such variability affects cognitive processing in this domain as well.
So far we have emphasized differences in folkbiological cognition uncovered by comparative research. This same work has also uncovered reliably developing aspects of human folkbiological cognition that do not vary, such as categorizing plants and animals in a hierarchical taxonomy, or that the generic species level has the strongest inductive potential, despite the fact that this level is not always the basic level across populations, as discussed above. Our goal in emphasizing the differences here is to show (1) how peculiar industrialized (urban, in this case) samples are, given the unprecedented environment they grow up in; and (2) how difficult it is to conclude a priori what aspects will be reliably developing and robust across diverse slices of humanity if research is largely conducted with WEIRD samples.
Human societies vary in their linguistic tools for, and cultural practices associated with, representing and communicating (1) directions in physical space, (2) the color spectrum, and (3) integer amounts. There is some evidence that each of these differences in cultural content may influence some aspects of nonlinguistic cognitive processes (D'Andrade 1995; Gordon 2004; Kay 2005; Levinson 2003; Roberson et al. 2000). Here we focus on spatial cognition, for which the evidence is most provocative. As above, it appears that industrialized societies are at the extreme end of the continuum in spatial cognition. Human populations show differences in how they think about spatial orientation and deal with directions, and these differences may be influenced by linguistically based spatial reference systems.
Speakers of English and other Indo-European languages favor the use of an egocentric (relative) system to represent the location of objects – that is, relative to the self (e.g., “the man is on the right side of the flagpole”). In contrast, many if not most languages favor an allocentric frame, which comes in two flavors. Some allocentric languages such as Guugu Yimithirr (an Australian language) and Tzeltal (a Mayan language) favor a geocentric system in which absolute reference is based on cardinal directions (“the man is west of the house”). The other allocentric frame is an object-centered (intrinsic) approach that locates objects in space, relative to some coordinate system anchored to the object (“the man is behind the house”). When languages possess systems for encoding all of these spatial reference frames, they often privilege one at the expense of the others. However, the fact that some languages lack one or more of the reference systems suggests that the accretion of all three systems into most contemporary languages may be a product of long-term cumulative cultural evolution.
In data on spatial reference systems from 20 languages drawn from diverse societies – including foragers, horticulturalists, agriculturalists, and industrialized populations – only three languages relied on egocentric frames as their single preferred system of reference. All three were from industrialized populations: Japanese, English, and Dutch (Majid et al. 2004).
The presence of, or emphasis on, different reference systems may influence nonlinguistic spatial reasoning (Levinson 2003). In one study, Dutch and Tzeltal speakers were seated at a table and shown an arrow pointing either to the right (north) or the left (south). They were then rotated 180 degrees to a second table where they saw two arrows: one pointing to the left (north) and the other one pointing to the right (south). Participants were asked which arrow on the second table was like the one they saw before. Consistent with the spatial-marking system of their languages, Dutch speakers chose the relative solution, whereas the Tzeltal speakers chose the absolute solution. Several other comparative experiments testing spatial memory and reasoning are consistent with this pattern, although lively debates about interpretation persist (Levinson et al. 2002; Li & Gleitman 2002).
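The logic of the rotation task can be sketched in a few lines. The sketch below assumes, as the description implies, that a participant for whom "right" is north must initially be facing west; the function names and the four-point compass encoding are illustrative choices, not part of the original procedure.

```python
COMPASS = ["north", "east", "south", "west"]  # listed in clockwise order


def side_to_compass(facing, side):
    """Compass direction of an arrow pointing to the viewer's 'left' or 'right'."""
    step = 1 if side == "right" else -1  # right is 90 degrees clockwise of facing
    return COMPASS[(COMPASS.index(facing) + step) % 4]


def rotate_180(facing):
    """Direction the participant faces after being turned around."""
    return COMPASS[(COMPASS.index(facing) + 2) % 4]


def matching_arrow(facing_before, stimulus_side, frame):
    """Return (side, compass direction) of the arrow chosen at the second table.

    frame is 'relative' (preserve left/right with respect to the body) or
    'absolute' (preserve the compass direction of the original arrow).
    """
    facing_after = rotate_180(facing_before)
    if frame == "relative":
        chosen_side = stimulus_side
    elif frame == "absolute":
        target = side_to_compass(facing_before, stimulus_side)
        chosen_side = next(side for side in ("left", "right")
                           if side_to_compass(facing_after, side) == target)
    else:
        raise ValueError("frame must be 'relative' or 'absolute'")
    return chosen_side, side_to_compass(facing_after, chosen_side)


# First table: participant faces west, so an arrow to the right points north.
print(matching_arrow("west", "right", "relative"))
print(matching_arrow("west", "right", "absolute"))
```

Under these assumptions the relative strategy picks the arrow that still points to the participant's right (now south), whereas the absolute strategy picks the arrow that still points north (now on the participant's left). The two frames therefore yield opposite answers to the same question.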
Extending the above exploration, Haun and colleagues (Haun et al. 2006a; 2006b) examined performance on a spatial reasoning task similar to the one described above, using children and adults from different societies and great apes. In the first step, Dutch-speaking adults and 8-year-olds (speakers of an egocentric language) showed the typical egocentric bias, whereas Hai//om-speaking adults and 8-year-olds (a Namibian foraging population who speak an allocentric language) showed a typical allocentric bias. In the second step, 4-year-old German-speaking children, gorillas, orangutans, chimpanzees, and bonobos were tested on a simplified version of the same task. All showed a marked preference for allocentric reasoning. These results suggest that children share with other great apes an innate preference for allocentric spatial reasoning, but that this bias can be overridden by input from language and cultural routines.
If one were to work on spatial cognition exclusively with WEIRD subjects (say, using subjects from the United States and Europe), one might conclude that children start off with an allocentric bias but naturally shift to an egocentric bias with maturation. The problem with this conclusion is that it would not apply to many human populations, and it may be the consequence of studying subjects from peculiar cultural environments. The next telescoping contrast highlights some additional evidence suggesting that WEIRD people may even be unusual in their egocentric bias vis-à-vis most other industrialized populations.
We have discussed several lines of data suggesting not only population-level variation, but that industrialized populations are consistently unusual compared to small-scale societies. There are also numerous studies that have found differences between much smaller numbers of samples (usually two samples). In these studies it is impossible to discern who is unusual, the small-scale society or the WEIRD population. For example, one study found that both samples from two different industrialized populations were risk-averse decision makers when facing monetary gambles involving gains (Henrich & McElreath 2002), whereas both samples from small-scale societies were risk-prone. Risk-aversion for monetary gains may be a recent, local phenomenon. Similarly, extensive inter-temporal choice experiments using a panel method of data collection indicate that the Tsimane, an Amazonian population of forager-horticulturalists, discount the future 10 times more steeply than do WEIRD people (Godoy et al. 2004). In Uganda, a study of individual decision-making among small-scale farmers showed qualitatively different deviations from expected utility maximization than is typically found among undergraduates. For example, rather than the inverse S-shape for probabilities in Prospect Theory, a regular S-shape was found.3
Some larger-scale comparative projects show universal patterns in human psychology. Here we list some noteworthy examples:
There are also numerous studies involving dyadic comparisons between a single small-scale society and a Western population (or a pattern of Western results) in which cross-population similarities have been found. Examples are numerous but include the development of an understanding of death (Barrett & Behne 2005), shame (Fessler 2004),5 and cheater detection (Sugiyama et al. 2002). Finding evidence for similarities across two such disparate populations is an important step towards providing evidence for universality (Norenzayan & Heine 2005); however, the case would be considerably stronger if it were found across a larger number of diverse populations.6
Although there are several domains in which the data from small-scale societies appear similar to those from industrialized societies, comparative projects involving visual illusions, social motivations (fairness), folkbiological cognition, and spatial cognition all show industrialized populations as outliers. Given all this, it seems problematic to generalize from industrialized populations to humans more broadly, in the absence of supportive empirical evidence.
4. Contrast 2: Western7 versus non-Western societies
For our second contrast, we review evidence comparing Western with non-Western populations. Here we examine four of the most studied domains: social decision making (fairness, cooperation, and punishment), independent versus interdependent self-concepts (and associated motivations), analytic versus holistic reasoning, and moral reasoning. We also briefly return to spatial cognition.
In the previous contrast, we reviewed social decision-making experiments showing that industrialized populations occupy the extreme end of the behavioral distribution vis-à-vis a broad swath of smaller-scale societies. Here we show that even among industrialized populations, Westerners are again clumped at the extreme end of the behavioral distribution. Notably, the behaviors measured in the experiments discussed below are strongly correlated with the strength of formal institutions, norms of civic cooperation, and Gross Domestic Product (GDP) per capita.
In 2002, Fehr and Gächter published their classic paper, “Altruistic Punishment in Humans,” in Nature, based on Public Goods Games with and without punishment, conducted with undergraduates at the University of Zurich. The paper demonstrated that adding the possibility of punishment to a cooperative dilemma dramatically altered the outcome, from a gradual slide towards little cooperation (and rampant free-riding), to a steady increase towards stable cooperation. Enough subjects were willing to punish non-cooperators at a cost to themselves to shift the balance from free-riding to cooperation. In stable groups this cooperation-punishment combination dramatically increases long-run gains (Gächter et al. 2008).
To examine the generalizability of these results, which many took to be a feature of our species, Herrmann, Thöni, and Gächter conducted systematically comparable experiments among undergraduates from a diverse swath of industrialized populations (Herrmann et al. 2008). In these Public Goods Games, subjects played with the same four partners for 10 rounds and could contribute during each round to a group project. All contributions to the group project were multiplied by 1.6 and distributed equally among all partners. Players could also pay to punish other players by taking money away from them.
In addition to finding population-level differences in the subjects' initial willingness to cooperate, Gächter's team unearthed in about half of these samples a phenomenon that is not observed beyond a trivial degree among typical undergraduate subjects (see our Fig. 4): Many subjects engaged in anti-social punishment; that is, they paid to reduce the earnings of “overly” cooperative individuals (those who contributed more than the punisher did). The effect of this behavior on levels of cooperation was dramatic, completely compensating for the cooperation-inducing effects of punishment in the Zurich experiment. Possibilities for altruistic punishment do not generate high levels of cooperation in these populations. Meanwhile, participants from a number of Western countries, such as the United States, the United Kingdom, and Australia, behaved like the original Zurich students. Thus, it appears that the Zurich sample works well for generalizing to the patterns of other Western samples (as well as the Chinese sample), but such findings cannot be readily extended beyond this.
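The payoff structure of such a Public Goods Game with punishment can be illustrated with a minimal sketch. Only the multiplier of 1.6 and the equal division of the group project come from the description above. The endowment of 20 units per round, the cost of 1 unit to the punisher per punishment point, and the deduction of 3 units from the target per point are assumptions made for illustration, as are all function names.

```python
MULTIPLIER = 1.6     # from the description above
ENDOWMENT = 20       # assumption: units each player can contribute per round
PUNISH_COST = 1      # assumption: cost to the punisher per punishment point
PUNISH_IMPACT = 3    # assumption: deduction from the target per punishment point


def contribution_stage(contributions):
    """Payoff of each player after contributions are pooled, multiplied, and shared."""
    share = MULTIPLIER * sum(contributions) / len(contributions)
    return [ENDOWMENT - c + share for c in contributions]


def punishment_stage(payoffs, points):
    """Apply punishment; points[i][j] is what player i assigns to player j."""
    final = list(payoffs)
    for i, row in enumerate(points):
        for j, assigned in enumerate(row):
            if i == j:
                continue  # players cannot punish themselves
            final[i] -= PUNISH_COST * assigned
            final[j] -= PUNISH_IMPACT * assigned
    return final


def classify_punishment(punisher_contribution, target_contribution):
    """Label a punishment act by the target's contribution relative to the punisher's."""
    if target_contribution > punisher_contribution:
        return "anti-social"            # target was more cooperative than the punisher
    if target_contribution < punisher_contribution:
        return "of lower contributor"   # target contributed less than the punisher
    return "of equal contributor"


# Three full contributors and one free-rider (hypothetical round).
contributions = [20, 20, 20, 0]
payoffs = contribution_stage(contributions)

# Player 0 assigns 2 punishment points to the free-rider (player 3).
points = [[0, 0, 0, 2], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(payoffs)
print(punishment_stage(payoffs, points))
print(classify_punishment(contributions[0], contributions[3]))
```

Without punishment, the free-rider in this hypothetical round earns more than the contributors, which is the incentive that drives cooperation downward. Punishment aimed at low contributors reduces that advantage, whereas anti-social punishment, as classified by the last function, penalizes precisely those who contributed more than the punisher.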
Mean punishment expenditures from each sample for a given deviation from the punisher's contribution to the public good. The deviations of the punished subject's contribution from the punisher's contribution are grouped into five intervals, where [-20,-11] indicates that the punished subjects contributed between 11 and 20 less than the punishing subject; [0] indicates that the punished subject contributed exactly the same amount as the punishing subject; and [1,10] ([11,20]) indicates that the punished subject contributed between 1 and 10 (11 and 20) more than the punishing subject. Adapted from Herrmann et al. (2008).
Much psychological research has explored the nature of people's self-concepts. Self-concepts are important, as they organize the information that people have about themselves, direct attention to information that is perceived to be relevant, shape motivations, influence how people appraise situations that influence their emotional experiences, and guide their choices of relationship partners. Markus and Kitayama (1991) posited that self-concepts can take on a continuum of forms stretching between two poles, termed independent and interdependent self-views, which relate to the individualism-collectivism construct (Triandis 1989; 1994). Do people conceive of themselves primarily as self-contained individuals, understanding themselves as autonomous agents who consist largely of component parts, such as attitudes, personality traits, and abilities? Or do they conceive of themselves as interpersonal beings intertwined with one another in social webs, with incumbent role-based obligations towards others within those networks? The extent to which people perceive themselves in ways similar to these independent or interdependent poles has significant consequences for a variety of emotions, cognitions, and motivations.
Much research has underscored how Westerners have more independent views of self than non-Westerners. For example, research using the Twenty Statements Test (Kuhn & McPartland 1954) reveals that people from Western populations (e.g., Australians, Americans, Canadians, Swedes) are far more likely to understand their selves in terms of internal psychological characteristics, such as their personality traits and attitudes, and are less likely to understand them in terms of roles and relationships, than are people from non-Western populations, such as Native Americans, Cook Islanders, Maasai and Samburu (both African pastoralists), Malaysians, and East Asians (for a review, see Heine 2008). Studies using other measures (Hofstede 1980; Morling & Lamoreaux 2008; Oyserman et al. 2002; Triandis et al. 1990) provide convergent evidence that Westerners tend to have more independent, and less interdependent, self-concepts than those of other populations. These data converge with much ethnographic observation, in particular Geertz's (1975, p. 48) claim that the Western self is “a rather peculiar idea within the context of the world's cultures.”
There are numerous psychological patterns associated with self-concepts. For example, people with independent self-concepts are more likely to demonstrate (1) positively biased views of themselves; (2) a heightened valuation of personal choice; and (3) an increased motivation to “stand out” rather than to “fit in.” Each of these represents a significant research enterprise, and we discuss them in turn.
The most widely endorsed assumption regarding the self is that people are motivated to view themselves positively. Roger Brown (1986) famously declared this motivation to maintain high self-esteem an “urge so deeply human, we can hardly imagine its absence” (p. 534). The strength of this motivation has been perhaps most clearly documented by assessing the ways that people go about exaggerating their self-views by engaging in self-serving biases, in which people view themselves more positively than objective benchmarks would justify. For example, in one study, 94% of American professors rated themselves as better than the average American professor (Cross 1977). However, meta-analyses reveal that these self-serving biases tend to be more pronounced in Western populations than in non-Western ones (Heine & Hamamura 2007; Mezulis et al. 2004) – for example, Mexicans (Tropp & Wright 2003), Native Americans (Fryberg & Markus 2003), Chileans (Heine & Raineri 2009), and Fijians (Rennie & Dunne 1994) score much lower on various measures of positive self-views than do Westerners (although there are some exceptions to this general pattern; see Harrington & Liu 2002). Indeed, in some cultural contexts, most notably East Asian ones, evidence for self-serving biases tends to be null, or in some cases, shows significant reversals, with East Asians demonstrating self-effacing biases (Heine & Hamamura 2007). At best, the sharp self-enhancing biases of Westerners are less pronounced in much of the rest of the world, although self-enhancement has long been discussed as if it were a fundamental aspect of human psychology (e.g., Rogers 1951; Tesser 1988).
Psychology has long been fascinated with how people assert agency by making choices (Bandura 1982; Kahneman & Tversky 2000; Schwartz 2004), and has explored the efforts that people go through to ensure that their actions feel freely chosen and that their choices are sensible. However, there is considerable variation across populations in the extent to which people value choice and in the range of behaviors over which they feel that they are making choices. For example, one study found that European-American children preferred working on a task, worked on it longer, and performed better on it, if they had made some superficial choices regarding the task than if others made the same choices for them. In contrast, Asian-American children were equally motivated by the task if a trusted other made the same choices for them (Iyengar & Lepper 1999). Another two sets of studies found that Indians were slower at making choices, were less likely to make choices consistent with their personal preferences, and were less likely to view their actions as expressions of choice, than were Americans (Savani et al. 2008; in press). Likewise, the extent to which people feel that they have much choice in their lives varies across populations. Surveys conducted at bank branches in Argentina, Brazil, Mexico, the Philippines, Singapore, Taiwan, and the United States found that Americans were more likely to perceive having more choice at their jobs than were subjects from the other countries (Iyengar & DeVoe 2003). Another survey administered in more than 40 countries found, in general, that feelings of free choice in one's life were considerably higher in Western nations (e.g., Finland, the United States, and Northern Ireland) than in various non-Western nations (e.g., Turkey, Japan, and Belarus: Inglehart et al. 1998). This research reveals that perceptions of choice are experienced less often, and are a lesser concern, among those from non-Western populations.
Many studies have explored whether motivations to conform are similar across populations by employing a standard experimental procedure (Asch 1951; 1952). In these studies, which were initially conducted with Americans, participants first hear a number of confederates making a perceptual judgment that is obviously incorrect, and then participants are given the opportunity to state their own judgment. A majority of American participants were found to go along with the majority's incorrect judgment at least once. This research sparked much interest, apparently because Westerners typically feel that they are acting on their own independent resolve and are not conforming. A meta-analysis of studies performed in 17 societies (Bond & Smith 1996), including subjects from Oceania, the Middle East, South America, Africa, East Asia, Europe, and the United States, found that motivations for conformity are weaker in Western societies than elsewhere. Other research converges with this conclusion. For example, Kim and Markus (1999) found that Koreans preferred objects that were more common, whereas Americans showed a greater preference for objects that were more unusual.
Variation in favored modes of reasoning has been compared across several populations. Most of the research has contrasted Western (American, Canadian, Western European) with East Asian (Chinese, Japanese, Korean) populations with regard to their relative reliance on what is known as “holistic” versus “analytic” reasoning (Nisbett 2003; Peng & Nisbett 1999). However, growing evidence from other non-Western populations points to a divide between Western nations and most everyone else, including groups as diverse as Arabs, Malaysians, and Russians (see Norenzayan et al. for a review), as well as subsistence farmers in Africa and South America and sedentary foragers (Norenzayan et al., n.d.; Witkin & Berry 1975), rather than an East-West divide.
Holistic thought involves an orientation to the context or field as a whole, including attention to relationships between a focal object and the field, and a preference for explaining and predicting events on the basis of such relationships. Analytic thought involves a detachment of objects from contexts, a tendency to focus on objects' attributes, and a preference for using categorical rules to explain and predict behavior. This distinction between habits of thought rests on a theoretical partition between two reasoning systems. One system is associative, and its computations reflect similarity and contiguity (i.e., whether two stimuli share perceptual resemblances and co-occur in time); the other system relies on abstract, symbolic representational systems, and its computations reflect a rule-based structure (e.g., Neisser 1963; Sloman 1996).
Although both cognitive systems are available in all normal adults, different environments, experiences, and cultural routines may encourage reliance on one system at the expense of the other, giving rise to population-level differences in the use of these different cognitive strategies to solve identical problems. There is growing evidence that a key factor influencing the prominence of analytic versus holistic cognition is the different self-construals prevalent across populations. First, independent self-construal primes facilitate analytic processing, whereas interdependent primes facilitate holistic processing (Oyserman & Lee 2008). Second, geographic regions with greater prevalence of interdependent self-construals show more holistic processing, as can be seen in comparisons of Northern and Southern Italians, Hokkaido and mainland Japanese, and Western and Eastern Europeans (Varnum et al. 2008).
Furthermore, the analytic approach is culturally more valued in Western contexts, whereas the holistic approach is more valued in East Asian contexts, leading to normative judgments about cognitive strategies that differ across the respective populations (Buchtel & Norenzayan 2008). Below we highlight some findings from this research showing that, compared to diverse populations of non-Westerners, Westerners (1) attend more to objects than fields; (2) explain behavior in more decontextualized terms; and (3) rely more on rules over similarity relations to classify objects (for further discussion of the cross-cultural evidence, see Nisbett 2003; Norenzayan et al. 2007).
Relative dominance of rule-based versus family resemblance–based judgments of categories for the same cognitive task. European-American, Asian-American, and East Asian university students were tested by Norenzayan et al. (2002b); the herders, fishermen, and farmers of Turkey's Black Sea coast were tested by Uskul et al. (2008). Positive scores indicate a relative bias towards rule-based judgments, whereas negative scores indicate a relative bias towards family resemblance–based judgments. It can be seen that European-American students show the most pronounced bias toward rule-based judgments, and they are outliers in terms of absolute deviation from zero. Adapted from Norenzayan et al. (2002b) and Uskul et al. (2008).
In summary, although analytic and holistic cognitive systems are available to all normal adults, a large body of evidence shows that the habitual use of what are considered “basic” cognitive processes, including those involved in attention, perception, categorization, deductive reasoning, and social inference, varies systematically across populations in predictable ways, highlighting the difference between the West and the rest. Several biases and patterns are not merely differences in strength or tendency, but show reversals of Western patterns. We emphasize, however, that Westerners are not unique in their cognitive styles (Uskul et al. 2008; Witkin & Berry 1975), but they do occupy the extreme end of the distribution.
A central concern in the developmental literature has been the way people acquire the cognitive foundations of moral reasoning. The most influential approach to the development of moral reasoning has been Kohlberg's (1971; 1976; 1981), in which people's abilities to reason morally are seen to hinge on cognitive abilities that develop over maturation. Kohlberg proposed that people progressed through the same three levels: (1) Children start out at a pre-conventional level, viewing right and wrong as based on internal standards regarding the physical or hedonistic consequences of actions; (2) then they progress to a conventional level, where morality is based on external standards, such as that which maintains the social order of their group; and finally (3) some progress further to a post-conventional level, where they no longer rely on external standards for evaluating right and wrong, but instead do so on the basis of abstract ethical principles regarding justice and individual rights – the moral code inherent in most Western constitutions.
While all of Kohlberg's levels are commonly found in WEIRD populations, much subsequent research has revealed scant evidence for post-conventional moral reasoning in other populations. One meta-analysis carried out with data from 27 countries found consistent evidence for post-conventional moral reasoning in all the Western urbanized samples, yet found no evidence for this type of reasoning in small-scale societies (Snarey 1985). Furthermore, it is not just that formal education is necessary to achieve Kohlberg's post-conventional level. Some highly educated non-Western populations do not show this post-conventional reasoning. At Kuwait University, for example, faculty members scored lower on Kohlberg's schemes than the typical norms for Western adults, and the elder faculty there scored no higher than the younger ones, contrary to Western patterns (Al-Shehab 2002; Miller et al. 1990).
Research in moral psychology indicates that typical Western subjects rely principally on justice- and harm/care-based principles in judging morality. However, recent work indicates that non-Western adults and Western religious conservatives rely on a wider range of moral principles than these two dimensions of morality (e.g., Baek 2002; Haidt & Graham 2007; Haidt et al. 1993; Miller & Bersoff 1992). Shweder et al. (1997) proposed that in addition to a dominant justice-based morality, which they termed an “ethic of autonomy,” there are two other ethics that are commonly found outside the West: an ethic of community, in which morality derives from the fulfillment of interpersonal obligations that are tied to an individual's role within the social order, and an ethic of divinity, in which people are perceived to be bearers of something holy or god-like, and have moral obligations to not act in ways that are degrading to or incommensurate with that holiness. The ethic of divinity requires that people treat their bodies as temples, not as playgrounds, and so personal choices that seem to harm nobody else (e.g., about food, sex, and hygiene) are sometimes moralized (for a further elaboration of moral foundations, see Haidt & Graham 2007). In sum, the high-socioeconomic status (SES), secular Western populations that have been the primary target of study thus far appear unusual in a global context, based on their peculiarly narrow reliance, relative to the rest of humanity, on a single foundation for moral reasoning (based on justice, individual rights, and the avoidance of harm to others; cf. Haidt & Graham 2007).
There are many other psychological phenomena in which Western samples differ from non-Western ones; however, at present there are insufficient data in these domains derived from diverse populations to assess where Westerners reside in the human spectrum. For example, compared with Westerners, some non-Westerners (1) have less dynamic social networks, in which people work to avoid negative interactions among their existing networks rather than seeking new relations (Adams 2005); (2) prefer lower to higher arousal-positive affective states (Tsai 2007); (3) are less egocentric when they try to take the perspective of others (Cohen et al. 2007; Wu & Keysar 2007); (4) have weaker motivations for consistency (Kanagawa et al. 2001; Suh 2002); (5) are less prone to “social-loafing” (i.e., reducing efforts on group tasks when individual contributions are not being monitored) (Earley 1993); (6) associate fewer benefits with a person's physical attractiveness (Anderson et al. 2008); and (7) have more pronounced motivations to avoid negative outcomes relative to their motivations to approach positive outcomes (Elliot et al. 2001; Lee et al. 2000).
With reference to the spatial reasoning patterns discussed earlier, emerging evidence suggests that a geocentric bias (i.e., a landscape- or earth-fixed spatial coordinate system) may be much more widespread than previously thought – indeed, it may be the common pattern outside of the West, even among non-Western speakers of languages that make regular use of egocentric linguistic markers. Comparative research contrasting children and adults in Geneva with samples in Indonesia, Nepal, and rural and urban India has found the typical geocentric reasoning pattern in all of these populations, except for the Geneva samples (Dasen et al. 2006). Although many of these population-level differences are pronounced, more research is needed before we can assess whether the geocentric pattern is common across a broader swath of humanity.
We expect that as more large-scale comparative studies of Western and non-Western populations are conducted, they will reveal substantial similarities in psychological processes. However, despite the relative ease of conducting such studies (as compared to working in small-scale societies), few comparative programs have put universality claims to the test. Here we highlight three examples of larger-scale comparative projects that show broad and important similarities across populations.
Although robust patterns have emerged among people from industrialized societies, Westerners emerge as unusual – frequent global outliers – on several key dimensions. The experiments reviewed are numerous, arise from different disciplines, use diverse methods, and are often part of systematically comparable data sets created by unified projects. Many of these differences are not merely differences in the magnitude of effects but often show qualitative differences, involving effect reversals or novel phenomena such as allocentric spatial reasoning and antisocial punishment.
Above we compared WEIRD populations to non-Western populations. However, given the dominance of American research within psychology (see May 1997) and the behavioral sciences, it is important to assess the similarity of American data with that from Westerners more generally. Is it reasonable to generalize from Americans to the rest of the West? Americans are, of course, people too, so they will share many psychological characteristics with other Homo sapiens. At present, we could find no systematic research program to compare Americans with other Westerners, so the evidence presented is assembled from many sources.
Americans stand out relative to other Westerners on phenomena that are associated with independent self-concepts and individualism. A number of analyses, using a diverse range of methods, reveal that Americans are, on average, the most individualistic people in the world (e.g., Hofstede 1980; Lipset 1996; Morling & Lamoreaux 2008; Oyserman et al. 2002). The observation that the United States is especially individualistic is not new and dates at least as far back as de Tocqueville (1835). The unusually individualistic nature of Americans may be caused by, or reflect, an ideology that particularly stresses the importance of freedom and self-sufficiency, as well as various practices in education and childrearing that may help to inculcate this sense of autonomy. American parents, for example, were the only ones in a survey of 100 societies who created a separate room for their baby to sleep (Burton & Whiting 1961; also see Lewis 1995), reflecting that from the time they are born, Americans are raised in an environment that emphasizes their independence (on the unusual nature of American childrearing, see Lancy 2008; Rogoff 2003).
The extreme individualism of Americans is evident on many demographic and political measures. In American Exceptionalism, sociologist Seymour Martin Lipset (1996) documents a long list of the ways that Americans are unique in the Western world. At the time of Lipset's surveys, compared with other Western industrialized societies, Americans were found to be the most patriotic, litigious, philanthropic, and populist (they have the most positions for elections and the most frequent elections, although they have among the lowest voter turnout rates). They were also among the most optimistic, and the least class-conscious. They were the most churchgoing in Protestantism, and the most fundamentalist in Christendom, and were more likely than others from Western industrialized countries to see the world in absolute moral terms. In contrast to other large Western industrialized societies, the United States had the highest crime rate, the longest working hours, the highest divorce rate, the highest rate of volunteerism, the highest percentage of citizens with a post-secondary education, the highest productivity rate, the highest GDP, the highest poverty rate, and the highest income-inequality rate; and Americans were the least supportive of various governmental interventions. The United States is the only industrialized society that never had a viable socialist movement; it was the last country to get a national pension plan, unemployment insurance, and accident insurance; and, at the time of writing, it remains the only industrialized nation that does not have a general allowance for families or a national health insurance plan. In sum, there is some reason to suspect that Americans might be different from other Westerners, as de Tocqueville noted.
Given the centrality of self-concept to so many psychological processes, it follows that the unusual emphasis in America on individualism and independence would be reflected in a wide spectrum of self-related phenomena. For example, self-concepts are implicated when people make choices (e.g., Vohs et al. 2008). While Westerners in general tend to value choices more than non-Westerners do (e.g., Iyengar & DeVoe 2003), Americans value choices more still, and prefer more opportunities, than do Westerners from elsewhere (Savani et al. 2008). For example, in a survey of people from six Western countries, only Americans preferred a choice from 50 different ice cream flavors compared with 10 flavors. Likewise, Americans (and Britons) prefer to have more choices on menus in upscale restaurants than do people from other European countries (Rozin et al. 2006). The array of choices available, and people's motivation to make such choices, is even more extreme in the United States compared to the rest of the West.
Likewise, because cultural differences in analytic and holistic reasoning styles appear to be influenced by whether one views the social world as a collection of discrete individuals or as a set of interconnected relationships (Nisbett 2003), it follows that exceptionally individualistic Americans should be exceptionally analytic as well. One recent study suggests that this might indeed be the case: Americans showed significantly more focused attention in the Framed Line Task than did people from European countries (Britain and Germany) as well as from Japan (Kitayama et al. 2009). Although more research is needed, Americans may see the world in more analytic terms than the rest of the West.
Terror management theory maintains that because humans possess the conscious awareness that they will someday die, they cope with the associated existential anxiety by making efforts to align themselves with their cultural worldviews (Greenberg et al. 1997). The theory is explicit that the existential problem of death is a human universal, and indeed posits that an awareness of death preceded the evolution of cultural meaning systems in humans (Becker 1973). In support of this argument of universality, the tendency to defend one's cultural worldview following thoughts about death has been found in every one of the more than a dozen diverse populations studied thus far. However, there is also significant cross-population diversity in the magnitude of these effects. A recent meta-analysis of all terror management studies reveals that the effect sizes for cultural worldview defense in the face of thoughts of death are significantly more pronounced among American samples (r=0.37) than among other Western (r=0.30) or non-Western samples (r=0.26: Burke et al. 2010). Curiously, Americans respond more defensively to death thoughts than do those from other countries.
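To give a rough sense of scale for these correlations, the following minimal Python sketch converts each reported r into the proportion of variance it explains (r squared) and into Fisher's z, the scale on which correlations are conventionally compared. Only the three r values come from the meta-analysis cited above; the function and variable names are illustrative assumptions, and no significance test is attempted because the per-group sample sizes are not given here.

```python
import math

# Effect sizes (r) for worldview defense as reported in the text (Burke et al. 2010).
REPORTED_R = {
    "American": 0.37,
    "Other Western": 0.30,
    "Non-Western": 0.26,
}


def variance_explained(r):
    """Proportion of variance accounted for by a correlation r."""
    return r ** 2


def fisher_z(r):
    """Fisher's z-transform of a correlation (the inverse hyperbolic tangent)."""
    return math.atanh(r)


def summarize(effects):
    """Return {group: (r, r squared, Fisher z)} for each reported effect size."""
    return {
        group: (r, variance_explained(r), fisher_z(r))
        for group, r in effects.items()
    }


if __name__ == "__main__":
    for group, (r, r2, z) in summarize(REPORTED_R).items():
        print(f"{group:14s} r={r:.2f}  r^2={r2:.3f}  z={z:.3f}")
```

On these figures, worldview defense accounts for roughly 14% of the variance in the American samples, against about 9% and 7% in the other Western and non-Western samples, respectively.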
In the previous section, we discussed Herrmann et al.'s (2008) work showing substantial qualitative differences in punishment between Western and non-Western societies. While Western countries all clump at one end of Figure 4, the Americans anchor the extreme end of the West's distribution. Perhaps it is this extreme tendency for Americans to punish free-riders, while not punishing cooperators, that contributes to Americans having the world's highest worker productivity. American society is also anomalous, even relative to other Western societies, in its low relational focus in work settings, which is reflected in practices such as the encouragement of an impersonal work style, direct (rather than indirect) communication, the clear separation of the work domain from the non-work, and discouragement of friendships at work (Sanchez-Burks 2005).
We are unable to locate any research program (other than the ones reviewed in the first two telescoping contrasts) that has demonstrated that American psychological and behavioral patterns are similar to the patterns of other Westerners. We reason that there should be many similarities between the United States and the rest of the West, and we assume that many researchers share our impression. Perhaps this is why we are not able to find studies that have been conducted to explicitly establish these similarities – many researchers likely would not see such studies as worth the effort. In the absence of comparative evidence for a given phenomenon, it might not be unreasonable to assume that the Americans would look similar to the rest of the West. However, the above findings provide a hint that, at least along some key dimensions, Americans are extreme.
There are few research programs that have explicitly sought to contrast Americans with other Westerners on psychological or behavioral measures. However, those phenomena for which sufficient data are available to make cross-population comparisons reveal that American participants are exceptional even within the unusual population of Westerners – outliers among outliers.
The previous contrasts have revealed that WEIRD populations frequently occupy the tail-ends of distributions of psychological and behavioral phenomena. However, it is important to recognize, as a number of researchers have (e.g., Arnett 2008; Medin & Atran 2004; Sears 1986), that the majority of behavioral research on non-clinical populations within North America is conducted with undergraduates (Peterson 2001; Wintre et al. 2001). Further, within psychology, the subjects are usually psychology majors, or at least taking introductory psychology courses. In the case of child participants, they are often the progeny of high-SES people. Thus, there are numerous social, economic, and demographic dimensions that tentatively suggest that these subjects might be unusual. But, are they?
Highly educated Americans differ from other Americans in many important respects. In the following subsections, we first highlight findings from social psychology and then from behavioral economics.
For a number of the phenomena reviewed above in which Americans were identified as global outliers, highly educated Americans occupy an even more extreme position than less-educated Americans. Here we itemize eight examples.
More broadly, a second-order meta-analysis (N>650,000, Number of studies>7,000) of studies that included either college student samples or non-student adult samples revealed that the two groups differed either directionally or in magnitude for approximately half of the phenomena studied (e.g., attitudes, gender perceptions, social desirability: Peterson 2001). However, no clear pattern regarding the factors that accounted for the differences emerged. Other research has found that American undergraduates have higher degrees of self-monitoring (Reifman et al. 1989), are more susceptible to attitude change (Krosnick & Alwin 1989), and are more susceptible to social influence (Pasupathi 1999) compared to non-student adults.
Consistent and non-trivial differences between undergraduates and fully-fledged adults are emerging in behavioral economics as well. When compared with diverse and sometimes representative adult samples, undergraduate subjects consistently set the lower bound for prosociality in experimental measures of trust, fairness, cooperation, and punishment of unfairness or free-riding. For example, in both the Ultimatum and Dictator Games, non-student Americans (both rural and urban participants) make significantly higher offers than do undergraduate subjects (Henrich & Henrich 2007). The difference is most pronounced in Dictator Games in which samples of non-student American adults from Missouri (urban and rural Missouri did not differ) offered a mean 47% of the total stake while undergraduate freshmen gave 32%, well within the typical range for undergraduates in this game (Camerer 2003; Ensminger & Cook, under review; Henrich & Henrich, under review). These seemingly high offers among non-students in the Dictator Game are similar to those found in other non-student samples in the United States (Carpenter et al. 2005; Henrich & Henrich 2007). It is the student results that are anomalous. Similarly, more recent research comparing students with both representative and selectively diverse samples of adults using the Trust Game, Ultimatum Game, and Public Goods Game shows that undergraduates ride the lower bound on prosociality measures (Bellemare & Kröger 2007; Bellemare et al. 2008; Carpenter et al. 2008; Fehr & List 2004). In fact, “being an undergraduate” (or being young and educated) is one of the few demographic variables that seems to matter in explaining within-country variability.
Behavioral economics research also indicates that developmental or acculturative changes to some motivations and preferences are still occurring within the age range of undergraduates (Henrich 2008). For example, Ultimatum Game offers continue to change over the university years, with freshmen making lower offers than seniors (Carter & Irons 1991). Other work shows that offers do not hit their adult plateau in behavioral games until around age 24 (Carpenter et al. 2005), after which time offers do not change with age until people reach old age. In the Trust Game, measures of trust and trustworthiness increase with age, until they reach a plateau close to age 30 (Sutter & Kocher 2007a).
Such research may explain why treatment effects also depend on the subject pool used, with students being the most sensitive. For example, Dictator Game treatments involving double-blind setups, such that the experimenter cannot know how much a subject contributes, have dramatically smaller effects on offers among non-student adults, and sometimes no effect at all in adult populations outside the United States (Lesorogol & Ensminger, under review). Similarly, unconscious religious primes increased Dictator Game offers in a Canadian student sample of religious and nonreligious participants alike, but when non-student adults were sampled, no significant effect emerged for the nonreligious adults (Shariff & Norenzayan 2007).
For several of these economics measures, such as public good contributions (Egas & Riedl 2008), undergraduate behavior is qualitatively similar to fully-fledged adult behaviors, just less prosocial. However, in at least one area (so far), it appears that a particularly interesting phenomenon is qualitatively absent in undergraduates by comparison with fully-fledged adults from the same populations: As discussed earlier for small-scale societies, researchers using the Ultimatum Game have found systematic, non-trivial tendencies in many populations to reject offers greater than 50% of the stake, a phenomenon neither previously observed in students nor intuited by researchers. Recent work using representative adult samples has revealed this tendency for “hyper-fair rejections” among non-student adults in Western populations, though it is substantially weaker than in many of the non-Western populations discussed above (Bellemare et al. 2008; Guth et al. 2003; Wallace et al. 2007).
Although studying young children is one important strategy for discerning universals, it does not completely avoid these challenges, as developmental studies are frequently biased toward middle- and upper-class American children. Recent evidence indicates that something as seemingly basic as the differences in spatial reasoning between males and females (Hyde 1981; Mann et al. 1990; Voyer et al. 1995) does not generalize well to poor American children. On two different spatial tasks, repeated four times over two years with 547 second- and third-graders, low-SES children did not show the sex differences observed in middle- and high-SES children from Chicago (Levine et al. 2005). Such findings, when combined with other research indicating no sex differences on spatial tasks among migratory foragers (Berry 1966), suggest that a proper theory of the origins of sex differences in spatial abilities needs to explain why both poor Chicago children and foragers do not show any sex differences.
Research on IQ using analytical tools from behavioral genetics has long shown that IQ is highly heritable, and not strongly influenced by shared family environment (Bouchard 2004). However, research using 7-year-old twins drawn from a wide range of socioeconomic statuses shows that contributions of genetic variation and shared environment vary dramatically from low- to high-SES children (Turkheimer et al. 2003). For high-SES children, where environmental variability is negligible, genetic differences account for 70–80% of the variation, with shared environment contributing less than 10%. For low-SES children, where there is far more variability in environmental contributions to intelligence, genetic differences account for 0–10% of the variance, with shared environment contributing about 60%. This raises the specter that much of what we think we have learned from behavioral genetics may be misleading, as the data are disproportionately influenced by WEIRD people and their children (Nisbett 2009).
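For readers unfamiliar with how such variance shares are obtained from twin data, the sketch below applies Falconer's classical approximations to identical (MZ) and fraternal (DZ) twin correlations. This is a deliberately simplified stand-in, not the modeling approach of the study cited above, and the twin correlations are hypothetical values chosen only so that the output falls within the ranges quoted in the text.

```python
def falconer_decomposition(r_mz, r_dz):
    """Split trait variance into additive genetic (A), shared environment (C),
    and non-shared environment (E) shares from twin correlations.

    Falconer's approximations:
        A = 2 * (r_mz - r_dz)
        C = 2 * r_dz - r_mz
        E = 1 - r_mz
    The three shares sum to 1 by construction.
    """
    a = 2.0 * (r_mz - r_dz)
    c = 2.0 * r_dz - r_mz
    e = 1.0 - r_mz
    return {"A": a, "C": c, "E": e}


# HYPOTHETICAL twin correlations for illustration only; they are not taken
# from Turkheimer et al. (2003) or any other data set.
HYPOTHETICAL_TWIN_R = {
    "high-SES": {"r_mz": 0.80, "r_dz": 0.42},
    "low-SES": {"r_mz": 0.68, "r_dz": 0.65},
}

if __name__ == "__main__":
    for group, rs in HYPOTHETICAL_TWIN_R.items():
        parts = falconer_decomposition(**rs)
        shares = ", ".join(f"{name}={value:.0%}" for name, value in parts.items())
        print(f"{group}: {shares}")
```

The point of the sketch is that the genetic share depends on the gap between MZ and DZ correlations: when fraternal twins resemble each other almost as much as identical twins do, as in the hypothetical low-SES row, nearly all of the familial resemblance is attributed to shared environment.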
A similar problem of generalizing from narrow samples exists for genetics research more broadly. Genetic findings obtained with one sample frequently do not replicate in a second sample, to the point that Nature Genetics now requires all empirical papers to include data from two independent samples. There are at least two ways in which geographically limited samples may give rise to spurious genotype-phenotype associations. First, the proportions of various polymorphisms vary across different regions of the world due to different migratory patterns and histories of selection (e.g., Cavalli-Sforza et al. 1994). A genetic association identified in a sample obtained from one region may not replicate in a sample from another region because it involves interactions with other genetic variants that are not equally distributed across regions. Second, the same gene may be expressed differently across populations. For example, Kim et al. (in press) found that a particular serotonin receptor polymorphism (5-HTR1A) was associated with increased attention to focal objects among Americans, but that the same allele was associated with decreased attention to focal objects among Koreans. Researchers would draw different conclusions regarding the function of this polymorphism depending upon the location of their sample. A more complete investigation of heritability and genetic associations demands a comparison of measures across diverse environments and populations.
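The consequence of an allele having opposite associations in two populations can be illustrated with a toy calculation. In the sketch below, the scores are entirely hypothetical and are not modeled on the data of Kim et al.; the names `SCORES_POP_1` and `SCORES_POP_2` are placeholders. The sketch simply shows that a strong positive association in one sample and a strong negative one in another can cancel completely when the samples are pooled.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# TOY DATA (hypothetical): allele counts (0, 1, or 2 copies) and an arbitrary
# "attention to focal objects" score for six people in each of two populations.
ALLELE_COUNTS = [0, 0, 1, 1, 2, 2]
SCORES_POP_1 = [1, 2, 3, 4, 5, 6]  # score rises with allele count
SCORES_POP_2 = [5, 6, 3, 4, 1, 2]  # score falls with allele count


def association_by_sample():
    """Correlation between allele count and score within each sample and pooled."""
    pooled_alleles = ALLELE_COUNTS + ALLELE_COUNTS
    pooled_scores = SCORES_POP_1 + SCORES_POP_2
    return {
        "population 1": pearson_r(ALLELE_COUNTS, SCORES_POP_1),
        "population 2": pearson_r(ALLELE_COUNTS, SCORES_POP_2),
        "pooled": pearson_r(pooled_alleles, pooled_scores),
    }


if __name__ == "__main__":
    for sample, r in association_by_sample().items():
        print(f"{sample:13s} r = {r:+.2f}")
```

A researcher with access to only one of the two toy samples would infer a strong effect of the allele, in opposite directions depending on the sample, while a researcher who pooled them would infer no effect at all.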
Contemporary Americans may also be psychologically unusual compared to their forebears 50 or 100 years ago. Some documented changes among Americans over the past few decades include increasing individualism, as indicated by increasingly solitary lifestyles dominated by individual-centered activities and a decrease in group participation (Putnam 2000), increasingly positive self-esteem (Twenge & Campbell 2001), and a lower need for social approval (Twenge & Im 2007). These findings suggest that the unusual nature of Americans in these domains, as we reviewed earlier, may be a relatively recent phenomenon. For example, Rozin (2003) found that attitudes towards tradition are more similar between Indian college students and American grandparents than they are between Indian and American college students. Although more research is needed to reach firm conclusions, these initial findings raise doubts as to whether research on contemporary American students (and WEIRD people more generally) is even extendable to American students of previous decades.
The evidence of temporal change is probably best for IQ. Research by Flynn (1987; 2007) shows that IQ scores increased over the last half century by an average of 18 points across all industrialized nations for which there were adequate data. Moreover, this rise was driven primarily by increasing scores on the analytic subtests. This is a striking finding considering recent work showing how unusual Westerners are in their analytic reasoning styles. Given such findings, it seems plausible that Americans of only 50 or 100 years ago were reasoning in ways much more similar to the rest of the non-Western world than Americans of today.
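The size of this gain is easier to appreciate in standard-deviation units. The sketch below assumes, as is conventional for IQ scales but not stated in the text, that scores are normally distributed with a standard deviation of 15; the 18-point figure is the one reported above, and the function names are illustrative.

```python
import math

GAIN_POINTS = 18.0  # average gain over the last half century, as reported in the text
ASSUMED_SD = 15.0   # ASSUMPTION: conventional IQ standard deviation


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gain_in_sd_units(gain, sd):
    """Express a raw score gain in standard-deviation units."""
    return gain / sd


def percentile_in_earlier_cohort(gain, sd):
    """Fraction of the earlier cohort scoring below the later cohort's mean,
    assuming both cohorts are normal with the same standard deviation."""
    return normal_cdf(gain_in_sd_units(gain, sd))


if __name__ == "__main__":
    z = gain_in_sd_units(GAIN_POINTS, ASSUMED_SD)
    p = percentile_in_earlier_cohort(GAIN_POINTS, ASSUMED_SD)
    print(f"Gain: {z:.1f} SD; later-cohort mean sits at the {p:.1%} mark of the earlier cohort")
```

Under these assumptions, an 18-point gain is 1.2 standard deviations, which would place an average scorer today near the 88th percentile of the cohort tested half a century earlier.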
We expect that typical American subjects are very similar to other Americans in myriad ways. The problem with this expectation, however, is that it is not immediately apparent in which domains they should be similar. We think that there are enough differences between these two groups to raise concerns about speaking incautiously on the thoughts and behaviors of Americans, in general. There have been rather few studies that have explicitly examined whether undergraduates or college-educated Americans differ in various psychological measures from those who are not currently students, or who were never college-educated. There are numerous meta-analyses that include data from both college student and non-student samples that speak partially to this issue. Although the meta-analyses do not specify the national origin of the participants, we assume that most of the subjects were American. Some of these analyses indicate considerable similarity between student and non-student samples. For example, the aforementioned second-order meta-analysis (Peterson 2001) revealed similarities between students and non-student samples for about half of the phenomena. Similarly, the relation between attribution styles and depression (Sweeney et al. 1986), and the relations among intentions, attitudes, and norms (Farley et al. 1981) do not show any appreciable differences between student and non-student samples. In these instances, there do not appear to be any problems in generalizing from student to non-student samples, which may suggest that college education, and SES more generally, is not related to these phenomena.
Numerous findings from multiple disciplines indicate that, in addition to many similarities, there are differences between typical subjects and the rest of the American population in unexpected domains. In some of these domains (e.g., individualism, moral reasoning, worldview defense in response to death thoughts, and perceptions of choice), the data from American undergraduates represent even more dramatic departures from the patterns identified in non-Western samples. Further, contemporary American college students appear further removed along some of these dimensions than did their predecessors a few decades earlier. Typical subjects may be outliers within an outlier population.
As the four contrasts summarized above reveal, WEIRD subjects are unusual in the context of the world in some key ways. In this section, we first discuss the main conclusions and implications of our empirical review. We then address two common challenges to our claim that WEIRD subjects are frequent outliers. Finally, we offer some recommendations for how the behavioral sciences may address these challenges.
There are now enough sources of experimental evidence, using widely differing methods from diverse disciplines, to indicate that there is substantial psychological and behavioral variation among human populations. As we have seen, some of this variability involves differences in the magnitude of effects, motivations, or biases. There is also considerable variability both in whether certain effects or biases exist in some populations (as with antisocial punishment and the Müller-Lyer illusion) and in which direction they go (as with preferences for analytic versus holistic reasoning). The causal origins of such population-level variation may be manifold, including behavioral plasticity in response to different environments, epigenetic effects, divergent trajectories of cultural evolution, and even the differential distribution of genes across groups in response to divergent evolutionary histories. With all these causal possibilities on the table, we think the existence of this population-level variation alone should suffice to energize course corrections in our research directions.
We have also identified many domains in which there are striking similarities across populations. These similarities could indicate reliably developing adaptations (e.g., theory of mind), by-products of innate adaptations (such as some aspects of religious cognition), or independent inventions or diffusions of learned responses that have universal utility (such as counting systems, dance, cooking practices, or techniques for making fire). We have no doubt that there are many more pan-human similarities than we have mentioned (e.g., movement perception, taste for sugar, chunking, habituation, and depth computation); however, thus far there are few databases with individual-level measures sufficient to evaluate the similarities or differences across populations.
Many of the processes identified above that vary dramatically across populations would seem to be “basic” psychological processes. The reviewed findings identified variation in aspects of visual perception, memory, attention, fairness motivations, categorization, induction, spatial cognition, self-enhancement, moral reasoning, defensive responses to thoughts about death, and heritability estimates of IQ. These domains are not unique to the social world – they span social as well as nonsocial aspects of the environment, and do not appear to be any less “fundamental” than those domains for which much similarity has been identified. At this point, we know of no strong grounds to make a priori claims to the “fundamentalness” or the likely universality of a given psychological process.
The application of evolutionary theory does not provide grounds for such a priori claims of “fundamental” or “basic” processes, at least in general. Evolutionary theory is a powerful tool for generating and eliminating hypotheses. However, despite its power (or perhaps because of it), it is often overly fecund, as it generates multiple competing hypotheses, with predictions sometimes dependent on unknown or at least debatable aspects of ancestral environments. Hence, adjudicating among alternative evolutionary hypotheses often requires comparative work. Moreover, theoretical work is increasingly recognizing that natural selection has favored ontogenetic adaptations that allow humans, and other species, to adapt non-genetically to local environments (Henrich 2008).
Although we do not yet know of a principled way to predict whether a given psychological process or behavioral pattern will be similar across populations in the absence of comparative empirical research, it would surely be of much value to the field if there were a set of criteria that could be used to anticipate universality (Norenzayan 2006; Norenzayan & Heine 2005). Here we discuss some possible criteria that might be considered.
First, perhaps there are some domains in which researchers could expect phenomena to be more universal than they are in other domains. We believe that the degree of universality does likely vary across domains, although this has yet to be demonstrated. Many researchers (including us) have the intuition that there are cognitive domains related to attention, memory, and perception in which inter-population variability is likely to be low. Our review of the data, however, does not bolster this intuition. Second, it might be reasonable to assume that some phenomena are more fundamental to the extent that they are measured at a physiological or genetic level, such as genotype-phenotype relations or neural activity. However, recall that the same genes can be expressed differently across populations (e.g., Kim et al., in press), and the same cognitive task may be associated with different neural activations across populations (e.g., Hedden et al. 2008). Third, there may be criteria by which one could confidently make generalizations from one well-studied universal phenomenon to another similar phenomenon; for example, because pride displays are highly similar across populations (e.g., Tracy & Matsumoto 2008), it might follow that the conceptually related shame display should also be similar across populations (Fessler 1999).
Fourth, it would seem that demonstrating a process or effect in other species, such as rats or pigeons, would indicate human universality (and more). Although this may generally be true, several researchers have argued that culture-gene coevolution has dramatically shaped human evolution in a manner uncharacteristic of other species (Richerson & Boyd 2005). Part of this process may involve the off-loading of previously genetically encoded preferences and abilities into culture (e.g., tastes for spices). Fifth, phenomena which are evident among infants might be reasonably assumed to be more universal than phenomena identified in older children or adults. We suspect this is the case, but it is possible that early biases can be reversed by later ontogeny. Showing parallel findings or effects in both adults and infants from the same population is powerful, and it raises the likelihood of universality; but quite different environments might still shape adult psychologies away from infant patterns (consider the spatial cognition finding with apes, children, and adults). Finally, perhaps particular brain regions are less responsive to experience, such that if a given phenomenon was localized to those regions one could anticipate more universality.
Whatever the relevant principles, it is an important goal to develop theories that predict which elements of our psychological processes are reliably developing across normal human environments and which are locally variable (focusing on the how and why of that variability: Barrett 2006). We note that behavioral scientists have typically been overly confident regarding the universality of what they study, and as this review reveals, our intuitions for what is universal do not have a particularly good track record. We also think this article explains why those intuitions are so poor: Most scientists are WEIRD, or were trained in WEIRD subcultures. Hence, any set of criteria by which universality can be successfully predicted must be grounded in substantial empirical data. We look forward to seeing data that can help to identify criteria to anticipate universality in future research.
The empirical foundation of the behavioral sciences comes principally from experiments with American undergraduates. The patterns we have identified in the available (albeit limited) data indicate that this sub-subpopulation is highly unusual along many important psychological and behavioral dimensions. It is not merely that researchers frequently make generalizations from a narrow subpopulation. The concern is that this particular subpopulation is highly unrepresentative of the species. The fact that WEIRD people are the outliers in so many key domains of the behavioral sciences may render them one of the worst subpopulations one could study for generalizing about Homo sapiens.
To many anthropologically savvy researchers it is not surprising that Americans, and people from modern industrialized societies more generally, appear unusual vis-à-vis the rest of the species. For the vast majority of their evolutionary history, humans have lived in small-scale societies without formal schools, governments, hospitals, police, complex divisions of labor, markets, militaries, formal laws, or mechanized transportation. Every household provisioned much or all of its own food; made its own clothes, tools, and shelters; and – aside from sexual divisions of labor – most everyone had to master the same skills and domains of knowledge. Children typically did not grow up in small, monogamous nuclear families with few kin around, nor were they away from their families at school for much of the day.
Rather, through the course of this history, and in some contemporary societies still, children have typically grown up in mixed-age playgroups, where they received little active instruction or exposure to books or TV (Fiske 1998; Lancy 1996; 2008); they learned largely by observation and imitation; received more directives, more physical punishment, and less praise; and were less likely to be engaged in conversation by adults (and there's no “why” phase). By age 10, children in some foraging societies obtain sufficient calories to feed themselves, and routinely kill and butcher animals. Adolescent females in particular take on most of the work-related responsibilities of adult women. People in small-scale societies tend to have less reliable nutrition, greater exposure to hunger, pain, chronic diseases, and lethal dangers, and more frequently experience the death of family members. WEIRD people, from this perspective, grow up in, and adapt to, a rather atypical environment vis-à-vis that of most of human history. It should not be surprising that their psychological world is unusual as well.
Relying on WEIRD populations may cause researchers to miss important dimensions of variation, and devote undue attention to behavioral tendencies that are unusual in a global context. There are good arguments for choosing topics that are of primary interest to the readers of the literature (i.e., largely WEIRD people); however, if the goal of the research program is to shed light on the human condition, then this narrow, unrepresentative sample may lead to an uneven and incomplete understanding. We suspect that some topics such as self-enhancement, cognitive dissonance, fairness, and analytic reasoning might not have been sufficiently interesting to justify in-depth investigation for most humans at most times throughout history. Conversely, the behavioral sciences have shown a rather limited interest in such topics as kinship, food, ethnicity (not race), religion, sacred values, polygamy, animal behavior, and rituals (for further critiques on this point, see Rozin 2001; Rozin et al. 2006). Had the behavioral sciences developed elsewhere, important theoretical foci and central lines of research might well look very different (Medin & Bang 2008). Moreover, it may be unnecessarily difficult to study psychological phenomena in populations where the phenomena are unusually weak, as is the case for conformity or shame among Americans (see Fessler 2004).
Working with children and nonhuman primates is essential for understanding human psychology. However, it is important to note that despite its great utility and intuitive appeal, such research does not fully obviate these challenges. In the case of primate research, discovering parallel results in great apes and in one human population is an important step, but it doesn't tell us how reliably a particular aspect of psychology develops. As the spatial cognition work indicates, because language and cultural practices can – but need not – influence the cognition humans acquired from their phylogenetic history as apes, establishing the same patterns of cognition in apes and Westerners is insufficient to make any strong claims about universality. Suppose most psychologists were Hai||om speakers (instead of Indo-European speakers); they might have studied only Hai||om-speaking children and adults, as well as nonhuman apes, and concluded (incorrectly) that allocentric spatial reasoning was universal. Similarly, imagine if Tsimane economists compared Ultimatum Game results for Tsimane adults to those for chimpanzees (Gurven 2004; Henrich & Smith 2001; Jensen et al. 2007). These researchers would have found the same results for both species, and concluded that standard game theoretic models (assuming pure self-interest) and evolutionary analyses (Nowak et al. 2000) were fairly accurate predictors of Ultimatum Game behavior for both chimpanzees and humans – a very tidy finding. In both of these cases, the conclusions would be opposite to those drawn from studies with WEIRD populations.14
Studying children is crucial for developing universal theories. However, evidence suggests that psychological differences among populations can emerge relatively early in children (as with folkbiological reasoning), and sometimes differences are even larger in children than in adults, as with the Müller-Lyer illusion. Moreover, developmental patterns may be different in different populations, as with sex differences in spatial cognition between low-income versus middle- and high-income subpopulations in the United States, or with performance in the false belief task. This suggests a need for converging lines of research. The most compelling conclusions regarding universality would derive from comparative work among diverse human populations done with both adults and children, including infants if possible. Human work can then be properly compared with work among nonhuman species (including but not limited to primates), based on a combination of field and laboratory work.
Evolution has equipped humans with ontogenetic programs, including cultural learning, that help us adapt our bodies and brains to the local physical and social environment. Over the course of human history, convergent forms of cultural evolution have effectively altered (1) our physical environments with tools, technology, and knowledge; (2) our cognitive environments with counting systems, color terms, written symbols, novel grammatical structures, categories, and heuristics; and (3) our social environments with norms, institutions, laws, and punishments. Broad patterns of psychology may be – in part – a product of our genetic program's common response to culturally constructed environments that have emerged and converged over thousands of years. This means that the odd results from small-scale societies, instead of being dismissed as unusual exceptions, ought to be considered as crucial data points that help us understand the ontogenetic processes that build our psychologies in locally adaptive and context-specific ways.
Based on this and the previous point, it seems clear that comparative developmental studies involving diverse human societies combined with parallel studies of nonhuman primates (and other relevant species) provide an approach to understanding human psychology and behavior that can allow us to go well beyond merely establishing universality or variability. Such a systematic, multi-pronged approach can allow us to test a richer array of hypotheses about the processes by which both the reliable universal patterns and the diversity of psychological and behavioral variation emerge.
Our argument should not be construed to suggest that the exclusive use of WEIRD samples should always be avoided. There are cases where the exclusive use of these samples would be legitimate to the extent that generalizability is not a relevant goal of the research, at least initially (Mook 1983). Research programs that are seeking existential proofs for psychological or behavioral phenomena, such as in the case of altruistic punishment discussed earlier (e.g., Fehr & Gächter 2002), could certainly start with WEIRD samples. That is, if the question is whether a certain phenomenon can be found in humans at all, reliance on any slice of humanity would be a legitimate sampling strategy. For another example, Tversky, Kahneman, and their colleagues sought to demonstrate the existence of systematic biases in decision-making that violate the basic principles of rationality (Gilovich et al. 2002). Most of their work was done with WEIRD samples. Counterexamples to standard rationality predictions could come from any sample in the world.16 Furthermore, existential proof for a psychological phenomenon in WEIRD samples can be especially compelling when such a finding is theoretically unexpected. For example, Rozin and Nemeroff (1990) found (surprisingly, to many) that even elite U.S. university students show some magical thinking. Nevertheless, even in such cases, learning about the extent to which population variability affects such phenomena is a necessary subsequent phase of the enterprise, since any theory of human behavior ultimately has to account for such variability (if it exists).
We have encountered two quite different sets of concerns about our argument. Those with the first set of concerns, elaborated below, worry that our findings are exaggerated because (a) we may have cherry-picked only the most extreme cases that fit our argument, and have thus exaggerated the degree to which WEIRD people are outliers, and/or (b) the observed variation across populations may be due to various methodological artifacts that arise from translating experiments across contexts. The second set of concerns is quite the opposite: Some researchers dismissively claim that we are making an obvious point which everyone already recognizes. Perhaps the most productive thing we offer is for these two groups of readers to confront each other.
We preface our response to the first set of concerns with an admonition: Of course, many patterns and processes of human behavior and psychology will be generally shared across the species. We recognize that human thought and behavior are importantly tethered to our common biology and our common experiences. Given this, the real challenge is to design a research program that can explain the manifest patterns of similarity and variation by clarifying the underlying evolutionary and developmental processes.
We offer three general responses to the concern that our review presents a biased picture. To begin, we constructed our empirical review by targeting studies involving important psychological or behavioral concepts which were, or still are, considered to be universal, and which have been tested across diverse populations. We also listed and discussed major comparative studies that have identified important cross-population similarities. Since we have surely overlooked relevant material, we invite commentators to add to our efforts in identifying phenomena which have been widely tested across diverse subpopulations.
Second, we acknowledge that because proper comparative data are lacking for most studied phenomena, we cannot accurately evaluate the full extent of how unusual WEIRD people are. This is, however, precisely the point. We hope research teams will be inspired to span the globe and prove our claims of non-representativeness wrong. The problem is that we simply do not know how well many key phenomena generalize beyond the extant database of WEIRD people. The evidence we present aims only to challenge (provoke?) those who assume that undergraduates are sufficient to make claims about human psychology and behavior.
Third, to address the concern that the observed population-level differences originate from the methodological challenges of working across diverse contexts, we emphasize that the evidence in our article derives from diverse disciplines, theoretical approaches, and methodological techniques. They include experiments involving (1) incentivized economic decisions; (2) perceptual judgments; (3) deceptive experimental practices that prevented subjects from knowing what was being measured; and (4) children, who are less likely than adults to have motivations to shape their responses in ways that they perceive as desirable (or undesirable) to the experimenter. The findings, often published in the best journals of their respective fields, hinged on the researchers making a compelling case that their methodology was comparably meaningful across the populations being studied.
Furthermore, the same methods that have yielded population differences in one domain have demonstrated similarities in other domains (Atran 2005; Haun et al. 2006b; Henrich et al. 2006; Herrmann et al. 2008; Medin & Atran 2004; Segall et al. 1966). If one wants to highlight the demonstrated similarities, one cannot then ignore the demonstrated differences which relied on the same or similar methodologies.
Note also that few of the findings that we reviewed involve comparing means across subjective self-report measures, for which there are well-known challenges in making cross-population comparisons (Chen et al. 1995; Hamamura et al. 2008; Heine et al. 2002; Norenzayan et al. 2002b; Peng et al. 1997). Therefore, while methodological challenges may certainly be an issue in some specific cases, we think it strains credulity to suggest that such issues invalidate the thrust of our argument, and thus eliminate concerns about the non-representativeness of typical subjects.
Our experience is that many researchers who work exclusively with WEIRD subjects would like to establish the broad generalizability of their findings. Even if they strongly suspect that their findings will generalize across the species, most agree that it would be better to have comparative data across diverse populations. The problem, then, is not exclusively a scientific or epistemological disagreement, but one of institutionalized incentives as well. Hence, addressing this issue will require adjusting the existing incentive structures for researchers. The central focus of these adjustments should be that in presenting our research designs to granting agencies, or our empirical findings in journals, we must explicitly address questions of generalizability and representativeness. With this in mind, we offer the following recommendations.
Journal editors and reviewers should press authors to both explicitly discuss and defend the generalizability of their findings. Claims and confidence regarding generalizability must scale with the strength of the empirical defense. If a result is novel, being explicitly uncertain about generalizability should be fine, but one should not imply universality without an empirically grounded argument.
This does not imply that all experimentalists need to shift to performing comparative work across diverse subject pools! As comparative evidence accumulates in different domains, researchers will be able to assess the growing body of comparative research and thus be able to calibrate their confidence in the generalizability of their findings. The widespread practice of subtly implying universality by using statements such as “people's reasoning is biased…” should be avoided. “Which people?” should be a primary question asked by reviewers. We think this practice alone will energize more comparative work (Rozin 2009).
The experience of evolutionarily-oriented researchers attests to the power of such incentives. More than other researchers in the social sciences, evolutionary researchers have led the way in performing systematic comparative work, drawing data from diverse societies. This is not because they are interested in variation per se (though some are), but because they are compelled, through some combination of their scientific drive and the enthusiasm of their critics, to test their hypotheses in diverse populations (e.g., Billing & Sherman 1998; Buss 1989; Daly & Wilson 1988; Fessler et al. 2005; Gangestad et al. 2006; Henrich et al. 2005; Kenrick & Keefe 1992a; 1992b; Low 2000; Medin & Atran 2004; Schaller & Murray 2008; Schmitt 2005; Sugiyama et al. 2002; Tracy & Robins 2008).
Meta-analyses are often compromised because many studies provide little background information about the subjects. Journal editors should require explicit and detailed information on subject-pool composition (see Rozin 2001). Some granting agencies already require this. Comparative efforts would also be greatly facilitated if researchers would make their data readily available to any who asked; or, better yet, data files should be made available online. Sadly, a recent investigation found that only 27% of authors in psychology journals shared their data when an explicit request was made to them to do so in accordance with APA guidelines (Wicherts et al. 2006). Tests of generalizability require broad access to published data.
Given the general state of ignorance with regard to the generalizability of so many findings, we think granting agencies, reviewers, and editors would be wise to give researchers credit for tapping and comparing diverse subject pools. Work with undergraduates and the children who live around universities is much easier than going out into the world to find subjects. As things stand, researchers suffer a competitive disadvantage when seeking a more diverse sampling of subjects. Because many of the best journals routinely require that papers include several studies to address concerns about internal validity (Carver 2004), the current incentives greatly favor targeting the easiest subject pool to access. There is an often unrecognized tradeoff between the experimental rigor of using multiple studies and the concomitant lack of generalizability that easy-to-run subject pools entail (Rozin 2009). If the incentive structure came to favor non-student subject pools, we anticipate that researchers could also be more persuasive in encouraging their universities and departments to invest in building non-student subject pools – for example, by setting up permanent psychological and behavioral testing facilities in bus terminals, Fijian villages, rail stations, airports, and anywhere diverse subjects might find themselves with extra time.
Beyond this, departments and universities should build research links to diverse subject pools. There are literally untapped billions of people around the world who would be willing to participate in research projects, as both paid subjects and research assistants. The amounts of money necessary to pay people who might normally make less than $12 per day are trivial vis-à-vis the average research grant. Development economists, anthropologists, and public health researchers already do extensive research among diverse populations, and therefore already possess the contacts and collaborations. Experimentalists merely need to work on building the networks.
Funding agencies, departments, and universities can encourage and facilitate both professors and graduate students to work on expanding sample diversity. Research partnerships with non-WEIRD institutions can be established to further the goal of expanding and diversifying the empirical base of the behavioral sciences. By supplying research leaves, adjusted expectations of student progress, special funding sources, and institutionalized relationships to populations outside the university as well as to non-WEIRD universities, these organizations can make an important contribution to building a more complete understanding of human nature.
Although we are certainly not the first to worry about the representativeness of prevalent undergraduate samples in the behavioral sciences (Gergen 1973; Medin & Atran 2004; Norenzayan & Heine 2005; Rozin 2001; 2009; Sears 1986; Sue 1999), our efforts to compile an empirical case have revealed an even more alarming situation than previously recognized. The sample of contemporary Western undergraduates that so overwhelms our database is not just an extraordinarily restricted sample of humanity; it is frequently a distinct outlier vis-à-vis other global samples. It may represent the worst population on which to base our understanding of Homo sapiens. Behavioral scientists now face a choice – they can either acknowledge that their findings in many domains cannot be generalized beyond this unusual subpopulation (and leave it at that), or they can begin to take the difficult steps to building a broader, richer, and better-grounded understanding of our species.
We thank several anonymous reviewers and the following colleagues for their very helpful comments on earlier versions of this manuscript: Nicholas Epley, Alan Fiske, Simon Gächter, Jonathan Haidt, Shinobu Kitayama, Shaun Nichols, Richard Nisbett, Paul Rozin, Mark Schaller, Natalie Henrich, Daniel Fessler, Michael Gurven, Clark Barrett, Ted Slingerland, Rick Shweder, Mark Collard, Paul Bloom, Scott Atran, Doug Medin, Tage Rai, Ayse Uskul, Colin Camerer, Karen Wynn, Tim Wilson, and Stephen Stich.
1. We also use the term “WEIRD” throughout this paper to refer to the exceptional nature of this sample, and do not intend any negative connotations or moral judgments by the acronym.
2. Key steps include: (1) establishing nationally representative experimental samples in Europe (Fehr et al. 2002; Guth et al. 2003); (2) applying experimental methods in developing countries (Cardenas & Carpenter 2008; Tanaka et al., forthcoming); (3) creating university-wide subject recruiting rather than discipline-specific subject pools (most economic experiments); and (4) targeting specific samples of non-student subjects (Bellemare et al. 2008; Bellemare & Kröger 2007; Harrison et al. 2002; List 2004).
3. Comparative studies of individual decision-making processes using samples from small-scale and WEIRD populations, including explorations of risk aversion, prospect theory, and inter-temporal choice, yield mixed results. Sometimes similarities, both qualitative and quantitative, are found. Other times differences emerge (Cardenas & Carpenter 2008; Henrich & McElreath 2002; Hsu et al. 2009; Humphrey & Verschoor 2004a; 2004b; Kirby et al. 2002; Tanaka et al., forthcoming). So far, we do not see how to figure out which features will vary and which will not.
4. Rivers, for instance, found that cultures with a single color term for blue and green could still tell the difference between a blue and a green thread (see Rivers 1901a).
5. Fessler also emphasizes important differences in shame and guilt between Americans and Indonesians.
6. To illustrate the limits of inferring universality from two-population comparisons, we note the finding that field independence on the Rod & Frame test is shown for both migratory foragers and Americans (Witkin & Berry 1975), yet East Asians and sedentary foragers show evidence for field dependence (Ji et al. 2000).
7. We are using “Western” to refer to those countries clustered in the northwest of Europe (the United Kingdom, France, Germany, Switzerland, the Netherlands, etc.), and British-descent societies such as the United States, Canada, New Zealand, and Australia. In particular, we are concerned about those populations from which most subjects in behavioral and psychological experiments are drawn. We recognize that there are important limitations and problems with this label, but we use it for convenience.
10. Efforts to replicate these findings in various small-scale societies have all failed (Marlowe & Wetsman 2001; Sugiyama 2004; Yu & Shepard 1998). These failures suggest a more complicated and context-specific set of evolutionary hypotheses (Marlowe et al. 2005; Swami & Tovée 2007).
11. The factor structure was less evident in a number of developing populations (e.g., Botswana, Ethiopia, Lebanon, Malaysia, Puerto Rico, Uganda), where independent assessments revealed that the data quality was poor. Future efforts to obtain better-quality data from these countries are important for demonstrating the universality of the Five Factor Model.
12. The robustness of the Five Factor Model is considerably weaker when it is derived from indigenous personality traits from other languages, although some of the five traits do still emerge (Benet-Martinez & Waller 1995; Cheung et al. 1996; Saucier et al. 2005).
13. As American and Canadian researchers at a Canadian university, we note that Canada is also a highly unusual population along the same lines as the United States, although perhaps not quite as pronounced as the United States, at least in terms of individualism (Hofstede 1980).
14. These examples illustrate a parallel problem for those interested in the differences between human and nonhuman cognition. Since most ape-human comparisons involve WEIRD people (or their children) as subjects, some seeming ape-human differences may not represent real species-level contrasts, but may instead reflect the psychological peculiarities of WEIRD people (Boesch 2007).
15. Thanks to Shaun Nichols for pointing this out.
16. We note that the heuristics and biases derived from this empirical work were, however, readily extended to “people” without hesitation (Kahneman et al. 1982).