Two things are correlated if the presence of one thing
makes the other thing more likely, or less likely. For
example, in humans, being female is correlated with having a
long life (say, over 80 years). A woman is more likely to
live over 80 years than a man. This is a positive
correlation, since the first property (being female) makes the
second property (living over 80 years) more likely.
There is a negative correlation between smoking and
long life; if you smoke, you are less likely to live
over 80 years than if you don't smoke.
If two things are uncorrelated, the presence or
absence of the first thing has no effect on the probability of
the second thing. For example, if I say that the day of the
week is uncorrelated with the weather, I am saying that the
probability of rain is unaffected by what day it is; the
probability of rain is the same on a Sunday as on a Wednesday.
To say that two things are uncorrelated is the same as saying
that they are independent. Recall from before that if A
and B are independent, the conditional probabilities
P(A|B) and
P(A|not B) are the same: they are both equal to P(A).
That is, the probability of A given the presence of B
is just the same as the probability of A given the absence of
B. On the other hand, if A is positively (or negatively)
correlated with B, the probability of A given the presence of
B will be greater than (or less than) the probability of A
given the absence of B. We can use these facts to construct a
precise definition of correlation.
A and B are uncorrelated if P(A|B) = P(A|not B),
or equivalently, if P(A|B) = P(A).
They are positively correlated if P(A|B) > P(A|not B),
or equivalently, if P(A|B) > P(A).
They are negatively correlated if P(A|B) < P(A|not B),
or equivalently, if P(A|B) < P(A).
(These definitions assume that 0 < P(B) < 1, so that both
conditional probabilities are defined.)
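The definition can be turned into a short calculation. The
following Python sketch is only an illustration: the function
name classify_correlation and the numbers in the usage example
are made up for this purpose, and the function assumes that
0 < P(B) < 1. It takes P(A and B), P(A and not B) and P(B),
forms the two conditional probabilities, and compares them.

```python
from fractions import Fraction


def classify_correlation(p_a_and_b, p_a_and_not_b, p_b):
    """Classify events A and B by comparing P(A|B) with P(A|not B).

    Assumes 0 < p_b < 1, so that both conditional probabilities exist.
    """
    p_a_given_b = p_a_and_b / p_b                 # P(A|B)
    p_a_given_not_b = p_a_and_not_b / (1 - p_b)   # P(A|not B)
    if p_a_given_b > p_a_given_not_b:
        return "positively correlated"
    if p_a_given_b < p_a_given_not_b:
        return "negatively correlated"
    return "uncorrelated"


# Made-up numbers: P(A and B) = 3/10, P(A and not B) = 2/10, P(B) = 1/2.
print(classify_correlation(Fraction(3, 10), Fraction(2, 10), Fraction(1, 2)))
```

With these made-up numbers, P(A|B) = 3/5 and P(A|not B) = 2/5,
so the sketch reports a positive correlation. Exact fractions
are used so that the equality test for "uncorrelated" is not
disturbed by rounding.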
You throw two dice, one after the other, and you want
the sum to be 7. Is this outcome correlated with whether the
first die shows a 3?
Let A stand for "the dice sum to 7" and let B stand for
"the first die shows a 3". Since there are 6 possible outcomes
in which the two dice sum to 7, and since each outcome has a
probability of 1/36, the probability that the dice sum to 7 is
P(A) = 6 x 1/36 = 1/6. Now suppose the first die shows
a 3. Given this outcome, the second die must show a 4 if the
dice are to sum to 7. The probability of this is 1/6, so the
probability of getting a 7 given that the first die shows a 3
is P(A|B) = 1/6. Since P(A|B) = P(A), the two events are
not correlated; whether the dice sum to 7 is independent of
whether the first die shows a 3.
What if you want the sum to be 4?
Now let A stand for "the dice sum to 4", with B as
before. There are 3 outcomes in which the dice sum to 4, so
P(A) = 3 x 1/36 = 1/12. If the first die shows a 3,
then the second must show a 1 if the dice are to sum to 4.
Since the probability of this is 1/6, P(A|B) = 1/6. In this
case, then, P(A|B) > P(A). The two events are positively correlated;
getting a 3 on the first throw increases the chances that the
dice will sum to 4.
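Both dice calculations can be checked by listing all 36
outcomes and counting. The Python sketch below does this; the
function name dice_probabilities is hypothetical, and the dice
are assumed to be fair. For each target sum it returns P(A),
P(A|B) and P(A|not B), where B is "the first die shows a 3".

```python
from fractions import Fraction
from itertools import product


def dice_probabilities(target, first=3):
    """Return (P(A), P(A|B), P(A|not B)) for two fair dice, where
    A = "the dice sum to `target`" and B = "the first die shows `first`"."""
    outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs
    b = [o for o in outcomes if o[0] == first]
    not_b = [o for o in outcomes if o[0] != first]
    a = [o for o in outcomes if sum(o) == target]
    a_and_b = [o for o in b if sum(o) == target]
    a_and_not_b = [o for o in not_b if sum(o) == target]
    p_a = Fraction(len(a), len(outcomes))
    p_a_given_b = Fraction(len(a_and_b), len(b))
    p_a_given_not_b = Fraction(len(a_and_not_b), len(not_b))
    return p_a, p_a_given_b, p_a_given_not_b


for target in (7, 4):
    p_a, p_a_given_b, p_a_given_not_b = dice_probabilities(target)
    print(target, p_a, p_a_given_b, p_a_given_not_b)
# prints:
# 7 1/6 1/6 1/6
# 4 1/12 1/6 1/15
```

For a sum of 7 all three probabilities agree, so the events are
uncorrelated. For a sum of 4, P(A|B) = 1/6 exceeds both
P(A) = 1/12 and P(A|not B) = 1/15, which matches the positive
correlation found above.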
Suppose we collect data for 10 weeks, and we find that
it rains on 5 out of 10 Sundays and 3 out of 10 Wednesdays.
Does it follow that the day of the week is correlated with the
weather?
Not necessarily; even under the hypothesis that day of
the week and rainfall are uncorrelated, there is a reasonably
high probability of getting a difference this big purely by
chance. As we saw in the previous section, to properly test
the hypothesis that day of the week and rainfall are
uncorrelated, we need to use a test which has a low
probability of rejecting the hypothesis erroneously. We would
need to collect more data before we would be entitled to
conclude that rainfall really is correlated with the day of
the week. (Incidentally, recent research at Arizona State
University suggests that in parts of the United States rainfall
is correlated with the day of the week, with more rain falling
on the weekends.)
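To get a feel for how likely a split such as 5 rainy Sundays
against 3 rainy Wednesdays is when there is no correlation, we
can compute the chance of a gap of 2 or more directly. The
Python sketch below is an illustration under stated assumptions,
not necessarily the test used in the previous section: it
assumes that rain falls independently from day to day, and that
the probability of rain on any given day is 8/20, the overall
rate in the sample. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k rainy days out of n, each rainy with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def prob_difference_at_least(gap, n, p):
    """Probability that two independent counts out of n differ by at least `gap`."""
    total = 0
    for x in range(n + 1):          # number of rainy Sundays
        for y in range(n + 1):      # number of rainy Wednesdays
            if abs(x - y) >= gap:
                total += binomial_pmf(x, n, p) * binomial_pmf(y, n, p)
    return total


p_null = Fraction(8, 20)  # assumed common rain probability (pooled sample rate)
prob = prob_difference_at_least(2, 10, p_null)
print(float(prob))
```

Under these assumptions the probability comes out at roughly one
half, far larger than the small error probability we would
demand of a test, so a gap of this size is unremarkable.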