What is Probability?

Introduction

In the last two lectures we’ve looked at the classical and the logical interpretations of probability. Now let’s turn to one of the most widely used interpretations of probability in science, the **“frequency”** interpretation.

The frequency interpretation has a long history. It goes back to Aristotle, who said that “the probable is that which happens often”. It was elaborated with greater precision by the British logician and philosopher John Venn (1834-1923) in his 1866 book *The Logic of Chance*, and many important 20th century figures have elaborated or endorsed some version of the frequency interpretation (e.g. Jerzy Neyman, Egon Pearson, Ronald Fisher, and Richard von Mises).

The basic idea behind the frequency approach, as always, is pretty straightforward. Let’s turn once again to coin tosses. **How does the frequency interpretation define the probability of a coin landing heads on a coin toss?**

Well, let’s start flipping the coin, and let’s record the sequence of outcomes. And for each sequence we’ll write down the number of heads divided by the total number of tosses.

First toss, heads. So that’s 1/1.

Second toss is a tail. So that’s 1/2.

Third toss is a head, so now we’ve got 2/3.

Fourth is a tail, so now it’s 2/4.

Fifth is a tail, so now it’s 2/5.

Let’s cycle through the next five tosses quickly and see the result after ten tosses.

These ratios on the bottom are called **“relative frequencies”**, and a sequence like this is called a **relative frequency sequence**.
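The construction above can be sketched in a few lines of code. Note that only the first five tosses (H, T, H, T, T) come from the lecture; the last five here are a hypothetical continuation for illustration.

```python
def relative_frequencies(tosses):
    """Return the running relative frequency of heads after each toss."""
    heads = 0
    freqs = []
    for n, outcome in enumerate(tosses, start=1):
        heads += (outcome == "H")
        freqs.append(heads / n)
    return freqs

# First five outcomes as in the lecture; the rest are a hypothetical continuation.
tosses = ["H", "T", "H", "T", "T", "H", "T", "H", "T", "H"]
print(relative_frequencies(tosses))
```

Each entry of the returned list is the ratio “number of heads so far over number of tosses so far” — exactly the sequence 1/1, 1/2, 2/3, 2/4, 2/5, … described above.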

There are a few obvious observations we can make about this sequence. First, we see that it jumps around the value of 1/2, sometimes exactly 1/2, sometimes higher, sometimes lower.

It might be easier to look at the sequence in decimal notation to see this more clearly. And to make it even clearer, let’s graph the sequence.

It’s more obvious now that the sequence bounces around the 0.5 mark. Three times it’s exactly 0.5, but we know it can’t stay at 0.5, since the next toss will move the ratio either above or below 0.5.

What’s also obvious, I think, is that *the range of variation gets smaller as the number of tosses increases*, and if we were to continue tossing this coin and recording the relative frequency of heads, we would expect that this number would get closer and closer to 0.5 the more tosses we added.

Now, none of this is surprising, but **what does it have to do with the definition of probability**?

Everyone agrees that there are important relationships between probability and the relative frequencies of events. This is exactly the sort of behavior you’d expect if this was a fair coin that was tossed in an unbiased manner. We assume the probability of landing heads is 1/2, so the fact that the relative frequency approaches 1/2 isn’t surprising.

But what’s distinctive about frequency interpretations of probability is that they want to IDENTIFY probabilities WITH relative frequencies. On this interpretation, to say that the probability of landing heads is 1/2 IS JUST TO SAY that if you were to toss it, it would generate a sequence of relative frequencies like this one. Not exactly like this one, but similar.

For a case like this one, **the frequency interpretation will DEFINE the probability of landing heads as the relative frequency of heads that you would observe in the long run, as you kept tossing the coin**.

To be even more explicit, this long-run frequency is defined as the **limit** that the sequence of relative frequencies approaches, as the number of tosses goes to infinity. In this case it’s intuitive that the sequence will converge on 1/2 in the limit. And if that’s the case, then, according to this approach, we’re justified in saying that the probability of landing heads is exactly 1/2.
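We can’t run infinitely many tosses, but a simulation gives a feel for this limiting behavior: as the number of simulated tosses grows, the relative frequency settles down near the underlying probability. This is a minimal sketch, assuming a fair coin modeled by a pseudorandom generator (the function name and seed are my own choices, not the lecture’s).

```python
import random

def long_run_frequency(n_tosses, p_heads=0.5, seed=0):
    """Simulate n_tosses of a coin and return the relative frequency of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < p_heads for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 1_000, 100_000):
    print(n, long_run_frequency(n))
```

With small n the frequency can stray noticeably from 0.5; with 100,000 tosses it is typically within a few tenths of a percent — a simulation-level picture of the limit the definition appeals to.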

Actually, what we’ve done here is introduce two different relative frequency definitions:

We can talk about probabilities in terms of **finite relative frequencies**, where we’re only dealing with an actual finite number of observed trials; or we can talk about probabilities in terms of **limiting relative frequencies**, where we’re asked to consider what the relative frequency would converge to in the long run as the number of trials approaches infinity.

Some cases are more suited to one definition than the other.

Batting averages in baseball, for example, are based on actual numbers of hits over actual numbers of times at bat. It doesn’t make much sense to ask what Ty Cobb’s batting average would be if he had kept playing forever, since (a) we’d expect his performance to degrade as he got older, and (b) in the long run, to quote John Maynard Keynes, we’re all dead!

Coin tosses, on the other hand (and other games of chance) look like suitable candidates for a limiting frequency analysis. But it’s clear that more work needs to be done to specify just what the criteria are and what cases lend themselves to a limiting frequency treatment, and this is something that mathematicians and philosophers have worked on and debated over the years.

I’ve said a couple of times that frequency interpretations are widely used in science, and I’d like to add a few words now to help explain this statement. There’s a version of the frequency approach that shows up in ordinary statistical analysis, and it’s arguably the one most of us are more familiar with. It’s based on the fact that sequences of random trials are formally related to proportions in a random sampling of populations.

Just to make the point obvious, when it comes to relative frequencies, there’s **no real difference** between **flipping a single coin ten times in a row** and **flipping ten coins all at once**. In either case some fraction of the tosses will come up heads.

In the **single coin case**, as you keep tossing the coin, we expect the relative frequency of heads to converge on 1/2.

In the **multiple coin case**, as you increase the number of coins that you toss at once — from ten to twenty to a hundred to a thousand — we expect the ratio of heads to number of coins to converge on 1/2.
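A quick simulation illustrates the equivalence of the two setups. Everything here (the seed, the choice of 10,000 tosses) is my own illustrative assumption; the point is only that the two relative frequencies behave the same way.

```python
import random

rng = random.Random(42)
N = 10_000

# One coin tossed N times in a row...
sequential = sum(rng.random() < 0.5 for _ in range(N)) / N

# ...versus N coins all tossed at once: formally the same experiment,
# so the relative frequency of heads behaves the same way.
batch = sum(rng.random() < 0.5 for _ in range(N)) / N

print(sequential, batch)
```

Both proportions land close to 0.5, and the gap between them shrinks as N grows.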

This fact leads to an obvious connection between relative frequency approaches and standard statistical sampling theory, such as what pollsters use when they try to figure out the odds that a particular candidate will win an election. You survey a representative sampling of the population, record proportions of “Yes” or “No” votes, and these become the basis for an inference about the proportions one would expect to see if you surveyed the whole population.
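The polling inference can be sketched the same way: draw a random sample from a population and use the sample proportion as an estimate of the population proportion. The population here is entirely hypothetical (a made-up electorate where 60% would vote “Yes”), chosen just to show the mechanics.

```python
import random

def sample_proportion(population, sample_size, seed=1):
    """Estimate the population 'Yes' proportion from a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical electorate: 1 = 'Yes', 0 = 'No'; the true proportion is 0.6.
population = [1] * 6_000 + [0] * 4_000
print(sample_proportion(population, 1_000))
```

A sample of 1,000 typically lands within a few percentage points of the true 0.6 — which is exactly the frequency-style reasoning behind a pollster’s margin of error.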

All I’m drawing attention to here is the fact that frequency approaches to probability are quite commonly used in standard statistical inference and hypothesis testing.

Let’s move on to some possible objections to the frequency interpretation of probability. Let me reiterate that my interest here is not to give a comprehensive tutorial on the philosophy of probability. My goal, as always, is nothing more than **probability literacy** — we should all understand that probability concepts can be used and interpreted in different ways, and some contexts lend themselves to one interpretation better than another. These objections lead some to believe that the frequency interpretation just won’t cut it as a general theory of probability, but for my purposes I’m more concerned about developing critical judgment, knowing when a particular interpretation is appropriate and when it isn’t.

Let’s start with this objection: if probabilities are limiting frequencies, then how do we know what the limiting frequencies are going to be? The problem arises from the fact that these limiting behaviors are supposed to be inferred from the patterns observed in actual, finite sequences; they’re not defined beforehand, like a mathematical function. So we can’t deductively PROVE that the relative frequencies of a coin toss will converge on 0.5. Maybe the coin is biased, and it’s going to converge on something else. Or suppose we now get a series of ten heads in a row. Does that indicate that the coin is biased and that it won’t converge on 0.5? But isn’t a series of ten heads in a row still consistent with it being a fair coin, since if you tossed the coin long enough you’d eventually get ten in a row just by chance?

I’m not saying these questions can’t be worked out in a satisfying way, I’m just pointing out one of the ways that the application of the limiting frequency approach to concrete cases can be difficult, or can be challenged.

Let’s move on to another objection, which is sometimes called the “reference class problem”. And this one applies both to finite and limiting frequency views.

Let’s say I want to know the probability that I, 46 years old at the time of writing this, will live to reach 80 years old. One way to approach this is to use historical data to see what proportion of people who are alive at 46 also survive to 80. The question is, how do we select the group of people from which to measure this proportion? A random sample of people will include men and women, smokers and non-smokers, people with histories of heart disease and people without, people of different ethnicities, and so on. Presumably the relative frequency of those who live to age 80 will vary across most of these reference classes. Smokers as a group are less likely to survive than non-smokers, all other things being equal, right?

The problem for the frequency interpretation is that it doesn’t seem to give a single answer to the question “What is the probability that I will live to 80?”. Instead, what it’ll give me is **a set of answers relative to a particular reference class** — my probability *as a male*, my probability *as a non-smoker*, my probability *as a male non-smoker*, and so on.
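The reference class problem is easy to see in code. Using a tiny set of hypothetical survival records (invented purely for illustration), the same question gets a different relative-frequency answer depending on which class we condition on.

```python
# Hypothetical records: (sex, is_smoker, survived_to_80).
records = [
    ("M", True, False), ("M", False, True), ("F", False, True),
    ("M", False, True), ("F", True, False), ("M", True, True),
    ("F", False, True), ("M", False, False),
]

def survival_rate(records, in_class):
    """Relative frequency of surviving to 80 within a given reference class."""
    matching = [r for r in records if in_class(r)]
    return sum(r[2] for r in matching) / len(matching)

print(survival_rate(records, lambda r: r[0] == "M"))               # as a male
print(survival_rate(records, lambda r: not r[1]))                  # as a non-smoker
print(survival_rate(records, lambda r: r[0] == "M" and not r[1]))  # as a male non-smoker
```

Each call is a perfectly good relative frequency, but they disagree — and nothing in the frequency definition itself tells us which reference class gives *my* probability.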

To zero in on a probability specific to me, it seems like you need to define a reference class that is so specific that it may only apply to a single person, me. But then you don’t have a relative frequency anymore, what you’ve got is a “single-case” probability.

Single-case probabilities are another category of objection to frequency interpretations.

When I toss a coin, it doesn’t seem completely crazy to think that for this one, single coin toss, there’s an associated probability of that toss landing heads. But frequency interpretations have a hard time justifying this intuition. This is important to see: **on the frequency interpretation, probabilities aren’t assigned to single trials, they’re assigned to actual or hypothetical sequences of trials**. For a strict frequentist, it doesn’t make any sense to ask, “What is the probability of a single-case event?” But a lot of people think this concept should make sense, and so they reject frequency interpretations in favor of interpretations that do make sense of single-case probabilities.

* * *

So, for these and other reasons, many believe that the frequency interpretation just can’t function as a truly general interpretation of probability.

In the next two lectures we’ll look at interpretations of probability that, as we’ll see, are much better at handling single-case probabilities. These are the **subjective** interpretations and the **propensity** interpretations, respectively.