
OpenCourseWare on critical thinking, logic, and creativity



Bias

We study samples to learn about a population. In this context, the population is just the set of things we are interested in; they could be people, but they could also be companies or fish or door-handles. The sample is the subset of the population that we actually investigate. We collect data from the sample, and calculate the figure we are interested in--say, the mean number of employees in a sample of Hong Kong companies, or the mean mercury level in a sample of locally caught fish. We want to conclude that the figure applies also to the population as a whole--that we now know something about the mean number of employees in Hong Kong companies generally, or the mean mercury level in locally caught fish generally. Under what circumstances are such inferences justified?

Part of the answer is that the sample must be random. (For the rest of the answer, you'll have to wait until the next section.) This doesn't mean that the sample is chosen in a haphazard way; often it takes a lot of care to make sure that a sample is random. What it means is that each item in the population has an equal probability of being included in the sample. (A random sample is sometimes also called a representative sample, although this name is somewhat misleading, since a random sample can fail to accurately represent the population, as we will see in the next section.)
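The "equal probability" condition can be checked by simulation. The following is only a minimal sketch: the population of 20 labelled items, the sample size of 5, and the number of trials are all invented for illustration. It draws many samples with Python's standard `random.sample` and counts how often each item is included.

```python
import random
from collections import Counter

rng = random.Random(0)          # fixed seed so the run is repeatable

population = list(range(20))    # hypothetical population of 20 labelled items
sample_size = 5
trials = 20_000

# Count how often each item ends up in a sample.
counts = Counter()
for _ in range(trials):
    counts.update(rng.sample(population, sample_size))

# Fraction of samples that included each item.
inclusion_rates = {item: counts[item] / trials for item in population}
expected_rate = sample_size / len(population)

print(f"expected inclusion rate: {expected_rate:.3f}")
print(f"lowest observed rate:    {min(inclusion_rates.values()):.3f}")
print(f"highest observed rate:   {max(inclusion_rates.values()):.3f}")
```

If the sampling is random in the sense described above, every item's observed inclusion rate should be close to the expected rate, with no item systematically favoured.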

A sample which is not random is called biased. In a biased sample, some members of the population have a greater chance of being included in the sample than others. Because of this, any figures calculated on the basis of the sample may not be applicable to the population as a whole. For example, suppose we collect a sample of 100 fish from the waters next to an industrial area, and calculate their mean mercury level. Clearly such a sample is biased, since all the fish living away from the industrial area have zero chance of being included in the sample. One might well expect that the mercury level of fish living close to the industrial area will be higher, on average, than that of fish living elsewhere. This may or may not in fact be the case, but since it could be true, the mercury level we have calculated is not necessarily a good guide to the mean mercury level of the population as a whole.
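To see how far a biased sample can mislead, here is a rough simulation of the fish example. Every number in it is an assumption made up for illustration: the population sizes, the mercury levels (in arbitrary units), and above all the premise that fish near the industrial area really do carry more mercury, which may or may not hold in practice.

```python
import random
import statistics

rng = random.Random(42)         # fixed seed so the run is repeatable

# Hypothetical population: 1,000 fish near the industrial area, 9,000 elsewhere.
# Mercury levels are invented; we ASSUME they are higher near the industrial area.
near_industry = [rng.gauss(0.8, 0.1) for _ in range(1_000)]
elsewhere = [rng.gauss(0.3, 0.1) for _ in range(9_000)]
population = near_industry + elsewhere

population_mean = statistics.mean(population)

# Biased sample: 100 fish, all taken from next to the industrial area.
biased_mean = statistics.mean(rng.sample(near_industry, 100))

# Random sample: 100 fish, each fish in the population equally likely.
random_mean = statistics.mean(rng.sample(population, 100))

print(f"population mean:    {population_mean:.3f}")
print(f"biased sample mean: {biased_mean:.3f}")
print(f"random sample mean: {random_mean:.3f}")
```

Under these assumptions the biased sample mean lands far from the population mean, while the random sample mean lands close to it. If mercury levels were in fact the same everywhere, the biased sample would do no harm, but we could not know that from the biased sample alone.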

Suppose instead that you take one netful of fish from every square mile of the relevant area. Is your sample random now? Suppose your net has 3-inch holes. Then you won't catch any fish less than 3 inches long, and fish less than 3 inches thick have a smaller chance of being caught than fish more than 3 inches thick. So strictly speaking, your sample of fish still isn't random, since some fish have a higher chance of being caught than others. Is this a problem? You may have no reason to think that small fish have different mercury levels than big fish, but of course, it's possible. The beauty of a truly random sample is that it doesn't matter what other factors might be correlated with mercury level in fish, since each fish has an equal chance of being caught.
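The net example can be sketched in the same way. This is a simplified illustration, not a model of real fishing: the net is treated as catching every fish at least 3 inches long and none shorter (ignoring thickness), and the fish lengths, mercury levels, and the relationship between them are all invented.

```python
import random
import statistics

rng = random.Random(7)          # fixed seed so the run is repeatable

# Hypothetical fish lengths in inches, spread evenly between 1 and 10.
lengths = [rng.uniform(1, 10) for _ in range(10_000)]

def mean_mercury_gap(mercury):
    """Mean mercury of fish the net can catch, minus the mean of all fish."""
    netted = [m for length, m in zip(lengths, mercury) if length >= 3.0]
    return statistics.mean(netted) - statistics.mean(mercury)

# Scenario A (assumed): mercury level has nothing to do with size.
unrelated = [rng.gauss(0.5, 0.1) for _ in lengths]

# Scenario B (assumed): mercury level rises with size.
related = [0.05 * length + rng.gauss(0.0, 0.1) for length in lengths]

gap_unrelated = mean_mercury_gap(unrelated)
gap_related = mean_mercury_gap(related)

print(f"gap when mercury is unrelated to size: {gap_unrelated:+.4f}")
print(f"gap when mercury rises with size:      {gap_related:+.4f}")
```

In scenario A the net's bias towards big fish makes almost no difference to the result; in scenario B it pushes the estimate upwards. The difficulty is that, from the netted fish alone, we cannot tell which scenario we are in, which is why a truly random sample is preferable.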

As this example illustrates, getting a truly random sample is often very difficult. Not all sources of bias are equally serious, but it is always best to obtain a sample that is as random as possible, as only then are the statistical results of the next section fully justified.

  • Self-test questions:
Identify possible sources of bias in each of the following examples:
  1. The Excite website invites visitors to participate in a poll on some item of current interest.
  2. A political poll is conducted by calling numbers picked at random from the telephone directory.
  3. A survey on views about redevelopment in a particular residential area is conducted by knocking on doors of a random sample of homes in that area.
  4. Tests to determine the incidence of hepatitis in the population are conducted on a random sample of people attending a blood donation centre.



