# [T10] Hypothesis testing

So far, we have been looking at using samples to estimate a particular value, such as the mean mercury level in a population of fish. Another important use of samples is in hypothesis testing. For example, suppose you think that exactly half the fish in a given population are male. In your random sample of 100 fish, you find that only 45 are male. Does this give you a good reason to think that fewer than half the fish in the population are male?

We call the hypothesis that exactly half the fish are male the null hypothesis. The hypothesis that fewer than half the fish are male is called the alternative hypothesis. In order to decide whether or not to reject the null hypothesis given our evidence, we calculate the probability that we would get such a small number of male fish in our sample purely by chance, if the null hypothesis is true.

If the null hypothesis is true, you should expect, on average, to find 50 male fish in the sample. But of course, since the sample is random, you won't get exactly 50 male fish every time. In fact, it is easy to calculate the standard deviation for the number of male fish in the sample; it is given by $\sqrt{np(1-p)}$, where n is the sample size and p is the proportion of male fish in the population. Since our sample size is n=100 and our null hypothesis says that p=1/2, the standard deviation is $\sqrt{100 \times 1/2 \times 1/2} = 5$.
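The arithmetic can be checked with a short Python sketch. It assumes the standard formula for the standard deviation of a count in a random sample, the square root of n × p × (1 − p); the function name `binomial_sd` is made up for this illustration.

```python
import math

def binomial_sd(n, p):
    """Standard deviation of the number of 'successes' in a random sample
    of size n, when each member is a success with probability p."""
    return math.sqrt(n * p * (1 - p))

n = 100   # sample size
p = 0.5   # proportion of males under the null hypothesis

expected_males = n * p       # mean number of males in the sample
sd = binomial_sd(n, p)       # spread around that mean

print(expected_males, sd)    # 50.0 5.0
```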

We saw previously that, for many kinds of quantity, there is a probability of about 2/3 that the quantity will lie within one standard deviation of the mean, and a probability of about 0.95 that it will lie within two standard deviations of the mean. The actual number of male fish in our sample (45) is exactly one standard deviation from the hypothesized mean (50), so if the null hypothesis is true, the probability of getting a result at least this far from the mean purely by chance is about 1/3. This is quite a big chance; it would be jumping to conclusions to reject the null hypothesis on the basis of this evidence alone.
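The figure of 1/3 is a rule of thumb. As a rough check, the sketch below adds up the exact binomial probabilities of every outcome at least as far from 50 as our observed 45, assuming each fish in the sample is male with probability 1/2 independently of the others. The helper names `binom_pmf` and `two_sided_tail` are invented for this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_tail(observed, n, p):
    """Probability of a count at least as far from the mean n*p
    as the observed count, in either direction."""
    mean = n * p
    distance = abs(observed - mean)
    return sum(binom_pmf(k, n, p)
               for k in range(n + 1)
               if abs(k - mean) >= distance)

p_value = two_sided_tail(45, 100, 0.5)
print(round(p_value, 3))
```

The exact value comes out a little above 1/3 (roughly 0.37), which agrees with the rule of thumb well enough for the argument in the text.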

So what evidence would it take for it to be reasonable to reject the null hypothesis? That depends on the situation. In some cases, you want it to be extremely unlikely that the result was due to chance before you reject the null hypothesis; in other cases, it doesn't matter so much. The relevant measure is called the significance level of the test. For example, if the test has a significance level of 0.1, this means that if the null hypothesis is true, there's a probability of 0.1 that you will get results at least this extreme by chance alone. In other words, such a test has a 1 in 10 chance of rejecting the null hypothesis erroneously when it is in fact true.

In most scientific contexts, a significance level of 0.1 is not considered good enough; the chance of erroneously rejecting the null hypothesis when it is true is too high. A significance level of 0.05 is frequently used in the social sciences, and often only a significance level of 0.01 or less is acceptable in the physical sciences. Note that the smaller the significance level, the more stringent the test.

We can answer the question of when to reject the hypothesis that exactly half the fish are male by choosing a significance level to use. Suppose we choose a significance level of 0.05. This means that we reject the null hypothesis if we obtain a result that has a probability of 0.05 or less of occurring by chance if the null hypothesis is true. Remembering that there is a probability of 0.95 that our result will lie within two standard deviations of the mean, and that the standard deviation here is 5, we can conclude that there is a probability of about 0.05 of getting a result at least 10 fish away from the mean. Adopting a significance level of 0.05, then, means deciding to reject the null hypothesis if our sample of 100 fish contains fewer than 40 males (or more than 60 males).
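The decision rule just described can be sketched in Python as follows. This is only an illustration of the two-standard-deviation rule of thumb, not a general-purpose test; the function names are hypothetical, and the last function uses the exact binomial distribution to see how often the rule would reject a true null hypothesis.

```python
import math
from math import comb

def rejection_cutoffs(n, p, num_sd=2):
    """Counts lying num_sd standard deviations below and above the mean."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean - num_sd * sd, mean + num_sd * sd

def reject_null(observed, n, p, num_sd=2):
    """Reject the null hypothesis if the observed count falls
    strictly outside the cutoffs."""
    lower, upper = rejection_cutoffs(n, p, num_sd)
    return observed < lower or observed > upper

def region_probability(n, p, num_sd=2):
    """Exact probability, under the null hypothesis, of a count
    landing in the rejection region."""
    lower, upper = rejection_cutoffs(n, p, num_sd)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1)
               if k < lower or k > upper)

lower, upper = rejection_cutoffs(100, 0.5)
print(lower, upper)                  # 40.0 60.0
print(reject_null(45, 100, 0.5))     # False: 45 males is not extreme enough
print(reject_null(38, 100, 0.5))     # True: 38 males is fewer than 40
print(region_probability(100, 0.5))  # chance of wrongly rejecting a true null
```

Because the two-standard-deviation rule is an approximation, the exact probability of landing in the rejection region is somewhat below 0.05, so the rule is slightly more stringent than the stated significance level.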

In a famous 1968 trial, U.S. pediatrician and anti-war activist Dr. Benjamin Spock was accused of conspiracy to violate the Selective Service Act, the mechanism by which Americans were drafted to fight in the Vietnam War. Spock's defence lawyers challenged the jury selection procedure, on the grounds that none of the 12 jurors were women. (Women were perceived as being more favourably disposed towards Spock.) Suppose that 50% of eligible jurors are women, and that jurors are chosen at random. What is the probability of an all-male jury occurring purely by chance? What does this tell us about the hypothesis that jury selection is random?
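One way to approach the first question is sketched below. It assumes the pool of eligible jurors is large enough that each of the 12 selections can be treated as an independent draw with a 1/2 chance of picking a man; the function name `prob_all_male` is made up for this example.

```python
from fractions import Fraction

def prob_all_male(jury_size, proportion_women):
    """Probability that every juror is male, treating each selection
    as an independent random draw from a large pool."""
    return (1 - proportion_women) ** jury_size

p = prob_all_male(12, Fraction(1, 2))
print(p, float(p))   # 1/4096 0.000244140625
```

Under these assumptions the probability is far smaller than any of the significance levels mentioned above, which is the comparison the second question asks you to think about.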