6. General Conditional Probability Rule

In the last lecture we looked at the general conjunction rule, which involves the use of conditional probabilities. Now if you look at this expression for conditional probability:

P(A|B) = P(A and B)/P(B)

which I’m calling the “general conditional probability” rule, you might notice that it’s exactly the same formula as the general conjunction rule, P(A and B) = P(A|B) × P(B), just rearranged. This is, in fact, how conditional probability is defined in standard probability theory, with the proviso that P(B) must be greater than zero, since we can’t divide by zero. In this video I want to explore what this formula means from the sample space perspective, in the hope that it’ll help you develop some intuition about why it works.
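To make the “just rearranged” claim concrete, here is a minimal Python sketch using exact fractions. The helper names `conjunction` and `conditional` are hypothetical, introduced only for illustration, and the sketch assumes P(B) > 0, raising an error otherwise.

```python
from fractions import Fraction

def conjunction(p_a_given_b: Fraction, p_b: Fraction) -> Fraction:
    """General conjunction rule: P(A and B) = P(A|B) * P(B)."""
    return p_a_given_b * p_b

def conditional(p_a_and_b: Fraction, p_b: Fraction) -> Fraction:
    """General conditional probability rule: P(A|B) = P(A and B) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be positive for P(A|B) to be defined")
    return p_a_and_b / p_b

# Applying one rule and then the other recovers the value we started with.
p_a_given_b = Fraction(1, 3)
p_b = Fraction(1, 2)
p_a_and_b = conjunction(p_a_given_b, p_b)
print(p_a_and_b, conditional(p_a_and_b, p_b))  # prints: 1/6 1/3
```

Exact fractions are used instead of floats so the round trip compares equal without any rounding error.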

Let me say up front that I’m not going to talk about the Bayes’ Rule formulation of conditional probability here; I’m saving that for another video.

An Example

Let’s start off with a simple example to refresh our memories. Consider two events. The first event is rolling a 2 on a dice roll; we’ll label this event with the number 2. The second event is rolling an even number on a dice roll; we’ll label this event “E”. Thus,

2 = the dice roll is 2
E = the dice roll is even

We’ll call a probability “categorical” if it’s the probability of an event that isn’t conditional on any other event (textbooks usually call this an unconditional probability). We’re simply asking for the probability that A occurs, not the probability that A occurs given that some other event occurs. We write this as P(A).

We contrast categorical probabilities with “conditional” probabilities. Here we’re asking for the probability of A, given that some other event, B, has occurred, written P(A|B). In other words, we’re asking for the probability of A on the condition that B occurs.

So, with these two events, what are the categorical probabilities?

That’s easy: the probability of rolling a 2 is 1/6, and the probability of rolling an even number is 3/6, or 1/2.

P(2) = 1/6
P(E) = 1/2
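These categorical probabilities can be checked by counting outcomes. A minimal Python sketch, assuming a fair die so that all six outcomes are equally likely; the names `omega`, `prob`, `two` and `even` are illustrative choices, not anything defined in the lecture.

```python
from fractions import Fraction

# Sample space for one fair six-sided die: six equally likely outcomes.
omega = {1, 2, 3, 4, 5, 6}

def prob(event: set) -> Fraction:
    """Probability of an event (a subset of omega) under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

two = {2}                                  # event "2": the roll is 2
even = {x for x in omega if x % 2 == 0}    # event "E": the roll is even

print(prob(two), prob(even))  # prints: 1/6 1/2
```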

But of course we’re interested in conditional probabilities, so a natural question to ask is: what is the probability of rolling a 2, given that the roll is even?

P(2|E) = ?

We know the answer to this just by inspection: if the dice roll is a 2, 4 or 6, each equally likely, then the probability that it’s a 2 is one third. Thus,

P(2|E) = 1/3
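The “by inspection” reasoning amounts to counting: keep only the even outcomes and ask what fraction of them are 2s. A small Python sketch, again assuming a fair die; the variable names are illustrative.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
even = {x for x in omega if x % 2 == 0}   # {2, 4, 6}
two = {2}

# Of the three equally likely even rolls, exactly one is a 2.
p_two_given_even = Fraction(len(two & even), len(even))
print(p_two_given_even)  # prints: 1/3
```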

Now let’s see how this answer squares with our definition of conditional probability.

Here’s our definition in terms of general events A and B:

P(A|B) = P(A and B) / P(B)

The probability of A given B is equal to the probability of the conjunction of A AND B, divided by the unconditional probability of B all by itself.

If we substitute our events for this example it looks like this:

P(2|E) = P(2 and E) / P(E)

The probability of rolling a 2, given that it’s even, equals the probability of rolling both a 2 and an even number, divided by the probability of just rolling an even number.

We know the value of the denominator: P(E) = 1/2. The numerator is the only tricky part. It’s a conjunction, and in the last video we covered the general conjunction rule; in fact, this formula is just another way of writing that rule. The question we want to ask is this:

How many possible dice rolls are there, where the dice roll is both a 2 and even?

Answer: Just one. Rolling a 2 is the only dice roll that is both a 2 and even. And the probability of rolling a 2 is just 1/6, right? So now we have our numbers:

P(2|E) = P(2 and E) / P(E)
= (1/6) / (1/2)
= 2/6
= 1/3

And this is, indeed, the answer that we figured out just by inspection.
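The same calculation can be reproduced with the formula itself, computing the numerator and denominator as separate probabilities. This is a sketch assuming a fair die; `prob` is a hypothetical helper that counts outcomes.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event: set) -> Fraction:
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

two = {2}
even = {2, 4, 6}

numerator = prob(two & even)    # P(2 and E): only the roll 2 is in both events, so 1/6
denominator = prob(even)        # P(E) = 1/2
print(numerator, denominator, numerator / denominator)  # prints: 1/6 1/2 1/3
```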

So the formula works. But I know for a fact that a lot of students don’t have a good intuitive sense of why it works. They’re not sure exactly why the conjunction is relevant, or why we divide by the probability of the conditioning event. To see why the formula makes sense, it helps to look at the situation from the sample space perspective.

In the diagram, the grey square, labeled “omega” (Ω), represents the set of all possible outcomes of a probabilistic trial, which we’ve been calling the “sample space”. Events are represented by subsets of this sample space, and the probability of an event is represented by the area of the subset associated with it.

So in this example the ovals labeled A and B are events, and their areas are proportional to the probabilities of A and B occurring.

We assign the area of the whole sample space, omega, a probability of 1, which means that the outcome has to land somewhere inside this area.

If we think of a probabilistic trial by analogy with throwing a dart at a board, then we’re saying that the dart is guaranteed to land somewhere inside the grey square.

Now, in this diagram A and B overlap. The overlap represents outcomes in which A and B both occur at the same time, indicating that they aren’t mutually exclusive events. In previous lectures we used the example of drawing a playing card that is both a face card and a spade; in this video we used the example of a dice roll that is both an even number and a 2. The area of the overlap region represents the probability that both A and B occur.

Now let’s ask the question: how is conditional probability represented on this diagram? Let’s think about what’s going on when we ask “what is the probability of A, given B?”

What we’re saying is that in this case, we know some additional information that we didn’t know before. We know that event B occurred. This is like saying that we know that our dart landed somewhere inside B.

So we’re asking: given that we know the dart landed in B, what is the chance that it also landed in A? In other words, given that the dart landed inside B, what is the chance that it landed in the overlap region between A and B?

The overlap region makes up some fraction of B, and that’s precisely the fraction we compute with the conditional probability rule: the ratio of the area of the overlap region to the area of B.

We can say the same thing with a slightly different emphasis. When we know that B is true, or that B occurred, what we’re saying is that we’re no longer dealing with the whole sample space. We’re dealing with a reduced sample space, and treating this as our new “omega”.

The conditional probability of A given B is the area of the overlap region, the events where A and B both occur, divided not by the area of the original sample space, but by the area of the reduced sample space, B.
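The dart analogy can be simulated directly. This is a sketch under stated assumptions: omega is the unit square, darts land uniformly at random, and the ovals A and B are replaced by hypothetical rectangles (`in_a`, `in_b`) so the exact areas are easy to check by hand. It estimates P(A|B) two ways: as the fraction of darts in B that also landed in A (the reduced-sample-space view), and as P(A and B) divided by P(B), each measured against all darts (the formula view).

```python
import random

def in_a(x: float, y: float) -> bool:
    # Hypothetical event A: a rectangle standing in for the oval in the diagram.
    return 0.2 <= x <= 0.6 and 0.2 <= y <= 0.6

def in_b(x: float, y: float) -> bool:
    # Hypothetical event B: another rectangle that partly overlaps A.
    return 0.4 <= x <= 0.9 and 0.3 <= y <= 0.8

def estimate(n_darts: int = 200_000, seed: int = 0):
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits_b = hits_ab = 0
    for _ in range(n_darts):
        x, y = rng.random(), rng.random()  # dart lands uniformly in the unit square
        if in_b(x, y):
            hits_b += 1
            if in_a(x, y):
                hits_ab += 1
    # Reduced-sample-space view: among darts in B, the fraction also in A.
    reduced = hits_ab / hits_b
    # Formula view: P(A and B) / P(B), each estimated over all darts.
    formula = (hits_ab / n_darts) / (hits_b / n_darts)
    return reduced, formula

reduced, formula = estimate()
exact = (0.2 * 0.3) / (0.5 * 0.5)  # overlap area / area of B
print(reduced, formula, exact)
```

The two estimates agree because the total number of darts cancels out of the formula view: dividing by P(B) is exactly the step that shrinks the sample space from omega down to B. Both should land close to the exact ratio of 0.24.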

Now I think it’s much easier to visualize what the numerator and the denominator represent in the general rule for conditional probability.

The Sample Space View and the Dice Problem

This discussion is consistent with what we did when we solved the dice problem.

Omega is equal to the set of equally probable outcomes 1 through 6. We imagine assigning this set an area equal to 1.

The event of rolling a 2 is a subset of this sample space. In this case the 2 takes up exactly one sixth of the total area, so the probability is 1/6.

The event of rolling an even number is a different, larger subset. It takes up exactly one half of the total area, so the probability is 1/2.

Now, if we’re considering the probability of rolling a 2, given that the roll is even, we’re dealing with a reduced sample space. We treat the evens as the new sample space and look at the outcomes corresponding to the number 2 as a fraction of this new sample space. That gives the answer, 1/3.
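Treating the evens as a “new omega” means keeping only the even outcomes and rescaling their probabilities so they sum to 1 again. The rescaling factor is exactly P(E), which is why the formula divides by it. A minimal sketch assuming a fair die; `new_omega` is an illustrative name.

```python
from fractions import Fraction

# Original sample space: each face of a fair die has probability 1/6.
omega = {face: Fraction(1, 6) for face in range(1, 7)}

# Conditioning on "even": keep only the even outcomes...
evens = {face: p for face, p in omega.items() if face % 2 == 0}
# ...and rescale by P(E) so the reduced sample space has total probability 1.
p_even = sum(evens.values())  # 1/2
new_omega = {face: p / p_even for face, p in evens.items()}

print(sum(new_omega.values()))  # prints: 1
print(new_omega[2])             # prints: 1/3
```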

Note that in this case the overlap is complete: the 2 lies entirely within the evens. On the diagram, the 2 region would sit wholly inside the E region, making 2 a proper subset of the evens. But the formula works all the same.

If A is contained in B, then the intersection of A and B is just A, so its area equals the area of A. In this case, the intersection of 2 and E is just the event 2. This simplifies the calculation.

The numerator is just the area of 2, and the denominator is just the area of E. Plugging in the probabilities we get the answer, 1/3.
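When A is a subset of B, the rule reduces to P(A|B) = P(A) / P(B). The sketch below checks that shortcut on the dice example; `cond_prob_subset` is a hypothetical helper that refuses to run unless A really is contained in B.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event: set) -> Fraction:
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

def cond_prob_subset(a: set, b: set) -> Fraction:
    """Shortcut for P(A|B) when A is contained in B: P(A) / P(B)."""
    assert a <= b, "shortcut only applies when A is a subset of B"
    assert a & b == a  # the intersection of A and B is just A
    return prob(a) / prob(b)

two = {2}
even = {2, 4, 6}
print(cond_prob_subset(two, even))  # prints: 1/3
```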

Now, for the sake of completeness, let’s work it the other way. What’s the probability of the dice landing even, given that it’s a 2?

We already know the answer: it has to be 1, since 2 is an even number. The calculation gets it right too.

We use the fact, once again, that the area of overlap is just the area of 2, so P(E and 2) = P(2) = 1/6. This time the denominator is P(2), not P(E), so the conditional probability is (1/6) / (1/6), which is equal to 1. Since the overlap is the whole of the 2 region, it’s like asking: what’s the probability that the dart landed on the 2, given that it landed on the 2? That’s a sure bet!
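Running the general rule in both directions shows that P(2|E) and P(E|2) are different numbers, even though they share the same numerator. A sketch assuming a fair die; `cond_prob` is a hypothetical helper that assumes P(B) > 0.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def prob(event: set) -> Fraction:
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a: set, b: set) -> Fraction:
    """P(A|B) = P(A and B) / P(B), assuming P(B) > 0."""
    return prob(a & b) / prob(b)

two = {2}
even = {2, 4, 6}

print(cond_prob(two, even))  # P(2|E): prints 1/3
print(cond_prob(even, two))  # P(E|2): prints 1
```

Only the denominator changes between the two calls: P(E) in the first, P(2) in the second.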

Okay, that wraps up this introduction to the general conditional probability rule in probability theory. I hope this lecture gives you a better sense of why the rule has the form it does and why it works.