In 1973 there were 12,763 applicants for graduate study at the University of California, Berkeley, of whom 8442 were male and 4321 were female. Of the male applicants, 3738 were admitted (44%), whereas of the female applicants only 1494 were admitted (35%). This suggests that, at the university level, being admitted to Berkeley is positively correlated with being male. Assuming that male and female applicants are equally well qualified, this seems unfair: it suggests that there may be sex discrimination in the selection procedure.
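As a quick check of the percentages, the following Python sketch recomputes the two admission rates from the counts quoted above. The dictionary layout is just one convenient way to hold the numbers.

```python
# Berkeley 1973 graduate admissions, pooled over all departments
# (counts as quoted in the text).
applied = {"male": 8442, "female": 4321}
admitted = {"male": 3738, "female": 1494}

# Admission rate = number admitted / number who applied, for each sex.
rates = {sex: admitted[sex] / applied[sex] for sex in applied}

for sex, rate in rates.items():
    print(f"{sex}: {rate:.1%}")
```

The two rates come out at roughly 44% and 35%, matching the rounded figures in the text.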

Could the difference in admission rates between male and female applicants have occurred purely by chance? This is a question for hypothesis testing: under the null hypothesis that male and female applicants have an equal probability of being admitted, what is the probability of getting a difference in admission rates at least as large as the one observed? Because the numbers of applicants are large, this probability turns out to be extremely small. At any reasonable significance level, it seems that one should reject the null hypothesis that male and female applicants have an equal chance of admission, and accept instead the hypothesis that admission is easier for men.
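The text does not say which test was used. As one possibility, the sketch below applies a standard two-sided two-proportion z-test (normal approximation with a pooled variance estimate) to the university-wide figures, treating each applicant as an independent trial; that independence is an assumption of the sketch. The function name `two_proportion_z_test` is our own. For a 2×2 table, z² equals Pearson's chi-squared statistic without continuity correction, so a chi-squared test would lead to the same conclusion.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2 (normal approximation).

    x1, x2 are success counts; n1, n2 are group sizes.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one admission rate, estimated by pooling.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Men: 3738 admitted of 8442; women: 1494 admitted of 4321.
z, p = two_proportion_z_test(3738, 8442, 1494, 4321)
print(f"z = {z:.1f}, p = {p:.1e}")
```

The statistic is above 10, far beyond the roughly 1.96 needed for significance at the 0.05 level, so the p-value is vanishingly small.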

In such a case, the next step is to find out who is to blame for the discrimination. Are some departments particularly culpable? Strangely, an analysis of the admission figures for the individual departments showed that most of them had roughly equal admission rates for men and women. Out of 101 departments, only ten had a difference in admission rates between men and women that was significant at the 0.05 level (that is, a difference large enough that it would occur less than 5% of the time by chance alone if men and women had the same probability of admission). What's more, of these ten potentially discriminatory departments, four had a higher admission rate for men and six had a higher admission rate for women.

How is it possible for there to be such strong evidence of a correlation between sex and admission at the university level, and yet little or no evidence of any correlation at the level of the individual departments? This puzzle is an example of Simpson's paradox, a form of spurious correlation. The apparent correlation at the university level is spurious: the different overall admission rates are not the result of different probabilities of admission for men and women, but of men and women choosing to apply to different departments.

To see how this works, consider a simple hypothetical example. Suppose that department A receives 150 applications, 100 from men and 50 from women. Department B also receives 150 applications, but here 50 are from men and 100 are from women. Suppose that department A has an acceptance rate (for men and women) of 80%, and department B has an acceptance rate (for men and women) of 20%. Then department A admits 80 men and 40 women, whereas department B admits 10 men and 20 women. Overall, 90 men and 60 women are admitted, giving an acceptance rate of 60% for men and 40% for women.
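The arithmetic of this hypothetical example can be checked with a short Python sketch; the dictionary below simply encodes the numbers given above.

```python
# Hypothetical two-department example from the text:
# department -> (male applicants, female applicants, acceptance rate).
departments = {
    "A": (100, 50, 0.80),
    "B": (50, 100, 0.20),
}

men_applied = women_applied = 0
men_admitted = women_admitted = 0
for name, (men, women, rate) in departments.items():
    # Each department applies the same acceptance rate to men and women.
    men_in, women_in = round(men * rate), round(women * rate)
    print(f"Dept {name}: admits {men_in} men, {women_in} women")
    men_applied += men
    women_applied += women
    men_admitted += men_in
    women_admitted += women_in

# Pool the two departments and recompute the overall rates.
print(f"Overall: men {men_admitted / men_applied:.0%}, "
      f"women {women_admitted / women_applied:.0%}")
```

Both departments treat men and women identically, yet the pooled rates come out at 60% and 40%, because two thirds of the women's applications go to the department with the 20% acceptance rate.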

In this simple example, even though both departments are entirely fair in their admissions, the overall admission rate is much higher for men. The reason is that department B is harder to get into, and more women apply to department B. In the real-life case at Berkeley, the investigators reached a similar conclusion: women tended to apply in larger numbers to departments that were harder to get into. The investigators decided that, despite first appearances, there was no evidence of sex discrimination in graduate admissions. The fault (if any) seemed to lie earlier in the education process: the departments that were easier to get into tended to teach more mathematical subjects, and female students tended to receive less preparation in mathematics at the primary and secondary school level.

Simpson's paradox can occur whenever data from different sources are pooled. In this case, admissions data from different departments were pooled to produce the admissions data for the university. Simpson's paradox occurs when a correlation appears in the pooled data that was absent from each of the separate data sets before pooling; in such cases, the correlation in the pooled data is spurious. In some cases the direction can even reverse: a negative correlation between two quantities before pooling can appear as a positive correlation after pooling, and vice versa.
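To illustrate the reversal, the sketch below uses hypothetical counts (invented for this illustration, not taken from the Berkeley data) in which women have the higher admission rate in each of two departments but the lower rate once the departments are pooled. Exact fractions are used so that the comparisons are free of floating-point rounding.

```python
from fractions import Fraction

# Hypothetical counts, invented for illustration only:
# department -> {sex: (admitted, applied)}.
data = {
    "A": {"men": (80, 100), "women": (45, 50)},
    "B": {"men": (5, 50), "women": (15, 100)},
}


def rate(admitted, applied):
    """Admission rate as an exact fraction."""
    return Fraction(admitted, applied)


# Before pooling: compare the two sexes within each department.
for dept, groups in data.items():
    m, w = rate(*groups["men"]), rate(*groups["women"])
    print(f"Dept {dept}: men {float(m):.0%}, women {float(w):.0%}")

# After pooling: total admitted over total applied, across departments.
pooled = {
    sex: rate(sum(data[d][sex][0] for d in data),
              sum(data[d][sex][1] for d in data))
    for sex in ("men", "women")
}
print(f"Pooled: men {float(pooled['men']):.0%}, "
      f"women {float(pooled['women']):.0%}")
```

Women do better in both departments (90% against 80% in A, 15% against 10% in B), but pooling gives men about 57% and women 40%, because most of the women applied to the much harder department B.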

The source of the data used in this section is P. J. Bickel, E. A. Hammel and J. W. O'Connell (1975), "Sex bias in graduate admissions: data from Berkeley", Science 187: 398-404.

The following table shows (fictional) data for travel by bus and by taxi between two points. The "Pass." columns show the mean number of passengers per day travelling between those points, and the "Time" columns show the mean travel time in minutes.

        Bus            Taxi           Total
        Pass.  Time    Pass.  Time    Pass.  Time
Sun.    1524   38      231    23      1755
Mon.    246    41      386    27      632

Calculate the overall mean travel time for Sunday and for Monday. Is the overall mean travel time correlated with the day of the week? Is this correlation spurious?