Bayes’ Theorem

The formula

\[P\left(S_i | O_j\right) = \frac{ P\left(O_j | S_i\right) P\left(S_i\right) }{ \displaystyle{\sum_{k=1}^{n} { P\left(O_j | S_k\right) P\left(S_k\right)} }}\]

In words: the probability of State i, given that we observe Outcome j, is a fraction. The numerator is the probability of Outcome j given State i, multiplied by the probability of State i. The denominator is the sum, over all n states, of the probability of Outcome j given each state multiplied by the probability of that state.
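
The formula translates almost term for term into code. The sketch below is one possible Python rendering and is not part of the original text: the function name `posterior`, its argument names, and the zero-based state index `i` are assumptions made for illustration, as are the numbers in the two-state usage line.

```python
from typing import Sequence


def posterior(priors: Sequence[float], likelihoods: Sequence[float], i: int) -> float:
    """Return P(S_i | O_j), given P(S_k) and P(O_j | S_k) for every state k.

    The index i is zero-based (an assumption of this sketch).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need exactly one likelihood per state")
    # Numerator terms: P(O_j | S_k) * P(S_k) for each state k.
    joints = [p * lik for p, lik in zip(priors, likelihoods)]
    # Denominator: the sum of those terms over all states.
    total = sum(joints)
    if total == 0:
        raise ValueError("the observed outcome has probability zero")
    return joints[i] / total


# Hypothetical two-state illustration; the numbers are invented for this sketch.
example = posterior(priors=[0.5, 0.5], likelihoods=[0.8, 0.2], i=0)
print(example)
```

Passing the full lists of priors and likelihoods, rather than a single pair, keeps the denominator's sum over all states explicit.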

Link to a probability tree

In setting up a problem for which we apply Bayes’ Theorem, we might depict the scenario using a probability tree. The first branching of the tree is for the different possible states. The second set of branching is for the outcomes that can be observed for each of the states.

Probability tree with states at the first branching, and outcomes for each state

The question is then posed as, “Given we do observe an outcome, what is the probability of a specified state?”

If you can formulate the problem as a probability tree as described, then the desired probability is simply the joint probability of the specified state and the observed outcome, divided by the sum (over all states) of the joint probabilities involving the observed outcome.
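
Written with the same symbols as the formula, the tree-based recipe might be expressed as follows. The \(\cap\) notation for a joint probability is an assumption of this sketch; each joint probability is the product of the two branch probabilities along one path of the tree.

\[P\left(S_i | O_j\right) = \frac{ P\left(S_i \cap O_j\right) }{ \displaystyle{\sum_{k=1}^{n} { P\left(S_k \cap O_j\right)} }}, \qquad P\left(S_k \cap O_j\right) = P\left(O_j | S_k\right) P\left(S_k\right)\]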

An example

Let’s use a fairly topical issue (at the time of writing anyway). Imagine you are required to be tested for COVID-19. You are told that the test you are about to be given has a 0.999 probability of correctly saying an infected person does have COVID-19. The false positive rate is a little high though: a positive test result occurs for an uninfected person with probability 0.03. Each of these is a conditional probability, conditional on the state of the person being tested, which we do not know. The estimates for these probabilities must have come from all the evidence gathered to date on the efficacy of the test being used. (My numbers are fictitious and only relevant for this example; please do not quote them as fact.)

Let’s now say that the rate of infection in the wider community is 0.02; again, this probability is subjective and very reliant on the efficacy of testing regimes being used.

If you got a positive result, what is the probability you are actually infected?

To answer this, we need to think about all the possible outcomes for our scenario; there are four, being the combinations of infected or not, and testing positive or not. Only two of these combinations are of interest to us.

The probability of a random person being infected and testing positive is \(0.02\times0.999=0.01998\) whereas the probability of a random person being clear and still testing positive is \(0.98\times0.03=0.0294\).

The probability that the random person (who has now tested positive) actually is infected is 0.01998 divided by the total probability of testing positive, 0.01998+0.0294. It is therefore \(\frac{0.01998}{0.01998+0.0294}\) = 0.4046173.
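
The arithmetic for the positive result can be checked with a short Python sketch. The probabilities are the fictitious ones from the example; the variable names are assumptions chosen for readability.

```python
# Figures from the example in the text (the author notes they are fictitious).
p_infected = 0.02              # community infection rate (the prior)
p_pos_given_infected = 0.999   # test correctly flags an infected person
p_pos_given_clear = 0.03       # false positive rate

# The two paths through the probability tree that end in a positive test.
joint_infected_pos = p_infected * p_pos_given_infected
joint_clear_pos = (1 - p_infected) * p_pos_given_clear

# Total probability of testing positive, then the probability of infection
# given the positive result.
p_pos = joint_infected_pos + joint_clear_pos
p_infected_given_pos = joint_infected_pos / p_pos

print(round(joint_infected_pos, 5), round(joint_clear_pos, 4))
print(round(p_infected_given_pos, 7))
```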

This probability might not seem all that high, and certainty would be more useful for deciding what action to take, even when we don’t like the situation. Still, we need to put the value of the test in context. If you did not have the test, the probability that you have COVID-19 is just the population rate, 0.02. We’ve seen what the probability is if you test positive, but the other possible result, a negative test, is also of interest.

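The probability that you are not infected when you return a negative test is the probability of being both uninfected and negative, \(0.98\times0.97=0.9506\), divided by the total probability of testing negative. That total also includes infected people who test negative, \(0.02\times0.001=0.00002\), giving \(0.9506+0.00002=0.95062\). The result is \(\frac{0.9506}{0.95062}\) = 0.999979.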
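
The negative-test calculation follows the same pattern, using the other two paths through the tree. This is again only a sketch with the example's fictitious numbers; the variable names are assumptions.

```python
# Figures from the example in the text (the author notes they are fictitious).
p_infected = 0.02
p_neg_given_infected = 1 - 0.999   # infected person wrongly tests negative
p_neg_given_clear = 1 - 0.03       # uninfected person correctly tests negative

# The two paths through the probability tree that end in a negative test.
joint_clear_neg = (1 - p_infected) * p_neg_given_clear
joint_infected_neg = p_infected * p_neg_given_infected

# Total probability of testing negative, then the probability of being
# uninfected given the negative result.
p_neg = joint_clear_neg + joint_infected_neg
p_clear_given_neg = joint_clear_neg / p_neg

print(round(p_neg, 5))
print(round(p_clear_given_neg, 6))
```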

Being tested is a good idea, even if the testing method is far from perfect. If you test negative, you can be pretty sure you are in fact not infected. If you test positive though, your probability of actually being infected has jumped from 0.02 to about 0.40, and you ought to consider isolation so that you do not contribute to the spread of COVID-19.

A. Jonathan R. Godfrey

last updated 17 Jan 2022
