The short answer

The expression $p(a|b)$, read as the probability of “$a$ given $b$”, or as the conditional probability of event $a$ with respect to event $b$, cannot in general be obtained as the probability of an event “$a$ given $b$” under the same probability function that assigns probabilities to $a$ and $b$. This is because, in general, no such event exists in the same space as $a$ and $b$. In fact, $p(a|b)$ is just notation for a ratio of two probabilities. Our brain, which toggles back and forth between natural language and mathematics and needs an $x$ in order to compute or interpret “the probability of $x$”, therefore likely struggles to interpret $p(a|b)$, which makes it appear more abstract.

An explanation

This seems to be what happened to Erdős (one of the most outstanding mathematicians of the 20th century) when he tried to solve the Monty Hall problem, which has a straightforward solution using Bayes' theorem¹. He reportedly did not accept the analytic solution until a computer simulation demonstrated the correct answer.

The point I want to clarify is the meaning of “the event of ‘$a$ given $b$’”. Loosely speaking, it is akin to the imaginary unit $i$, which does not exist on the real line. Nonetheless, as early as the 16th century, mathematicians were using expressions such as $\sqrt{-1}$ in practical computations to solve real-number equations.

Let’s delve into the details. When we talk about a probability, we mean a pair $(A,p)$, where $A$ (the set of events) is a set of subsets of some other set $X$, and $p$ (the probability distribution) is a function $p: A \to [0,1]$. For example, let $X=\{1,2,3\}$, and let $A$ be the set of all subsets of $X$, i.e., $A=\{\emptyset, X, \{1\}, \{2\}, \{3\}, \{1,2\}, \{2,3\}, \{1,3\}\}$. We can define $p$ as follows:

$$p(a)=\frac{|a|}{3},$$

where $|a|$ is the number of elements in the subset $a \in A$. For a general $A$ to be workable, we require that $A$ be at least a Boolean algebra. This means that $\emptyset, X \in A$ and that for any $a, b$ in $A$, the sets $a \cap b$, $a \cup b$, and $X \setminus a$ are in $A$. In our terms, if $a$ and $b$ are events, then ($a$ and $b$), ($a$ or $b$), and (not $a$) are events too, all residing again in $A$. We can further combine the operators and, or, not to construct the event $(a \Rightarrow b)$, called “$a$ implies $b$”, defined as ((not $a$) or $b$), that is, $(X \setminus a) \cup b$, again in $A$. We expect $p$ to respect the Boolean algebra structure of $A$, i.e., to satisfy

$$p(a \cup b) = p(a) + p(b)$$

whenever $a \cap b = \emptyset$; in addition to the rules $p(\emptyset) = 0$ and $p(X) = 1$. The pair $(A,p)$ is called a probability space. One can easily verify that the above example satisfies all these requirements.
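The verification is mechanical, so here is a minimal Python sketch of it for the example above. The helper names (`powerset`, `is_boolean_algebra`, `is_finitely_additive`, `implies`) are my own choices for illustration and are not part of any library. Exact fractions are used so that equality checks are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

X = frozenset({1, 2, 3})


def powerset(s):
    """Return all subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


A = powerset(X)  # the set of events: all 8 subsets of X


def p(a):
    """Probability of event a under p(a) = |a| / 3."""
    return Fraction(len(a), len(X))


def is_boolean_algebra(events, universe):
    """Check that events contains the empty set and the universe and is
    closed under intersection, union, and complement."""
    ev = set(events)
    if frozenset() not in ev or universe not in ev:
        return False
    return all((a & b) in ev and (a | b) in ev and (universe - a) in ev
               for a in ev for b in ev)


def is_finitely_additive(events, prob):
    """Check prob(a | b) == prob(a) + prob(b) for every disjoint pair."""
    return all(prob(a | b) == prob(a) + prob(b)
               for a in events for b in events if not (a & b))


def implies(a, b, universe=X):
    """The event (a => b), defined as (not a) or b."""
    return (universe - a) | b


print(is_boolean_algebra(A, X))
print(is_finitely_additive(A, p))
print(p(frozenset()), p(X))
```

Note that `implies` always returns a subset of $X$, so the implication event stays inside $A$, as claimed.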

The conditional probability of $a$ given $b$, denoted $p(a|b)$ and defined whenever $p(b) > 0$, is then the number

$$p(a \cap b) / p(b).$$

Now, suppose we seek an operation, like the ones we have previously considered (or, $\Rightarrow$, and, not), which assigns to every pair $(a, b)$ of events, an event $c_{a, b} \in A$ such that

$$p(a|b) = p(c_{a, b}).$$

Consider the above example. Take $a = \{1,2\}$ and $b = \{2,3\}$. Then

$$
\begin{aligned}
p(a|b) &= p(a \cap b) / p(b), \\
p(a \cap b) &= p(\{2\}) = 1/3, \\
p(b) &= 2/3, \\
p(a|b) &= 1/2.
\end{aligned}
$$

If such an operator existed, we would have an event $c_{a, b} \in A$ with $p(c_{a, b}) = 1/2$. But every event in $A$ has probability $0$, $1/3$, $2/3$, or $1$, so no event in $A$ has probability $1/2$. Therefore, the required operator does not exist (as a function $A \times A \to A$)².
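The non-existence argument can also be checked by brute force. The following is a small Python sketch, assuming the example space with $X = \{1,2,3\}$ and $p(a) = |a|/3$; the names `cond`, `attained`, and `witnesses` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import combinations

X = frozenset({1, 2, 3})
# All 8 subsets of X, i.e. the event set A of the example.
A = [frozenset(c)
     for r in range(len(X) + 1)
     for c in combinations(sorted(X), r)]


def p(a):
    """Probability of event a under p(a) = |a| / 3."""
    return Fraction(len(a), len(X))


def cond(a, b):
    """p(a|b) = p(a & b) / p(b); defined only when p(b) > 0."""
    if p(b) == 0:
        raise ValueError("p(a|b) is undefined when p(b) = 0")
    return p(a & b) / p(b)


a = frozenset({1, 2})
b = frozenset({2, 3})
target = cond(a, b)

# Every probability value that some event of A actually attains.
attained = sorted({p(e) for e in A})

# Candidate events c with p(c) == p(a|b); the list is empty here.
witnesses = [e for e in A if p(e) == target]

print(target)
print(attained)
print(witnesses)
```

Since the search runs over all of $A$, an empty `witnesses` list confirms that no choice of $c_{a,b} \in A$ can work for this pair.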

One can accept this and keep computing, as most people do and as mathematicians formerly did with $\sqrt{-1}$. Nonetheless, a discerning mathematician would ask for a meaningful interpretation of the notation “$a|b$”, that is, for a way to realize it as an event. As often happens, what is misleadingly termed abstract mathematics provides the desired concretization:

Given a probability space $(A,p)$, there is a probability space $(\hat{A}, \hat{p})$ such that $A \subseteq \hat{A}$ and $\hat{p}(a) = p(a)$ for all $a \in A$. Moreover, for every $a, b \in A$ with $p(b) > 0$, there is an event $c_{a, b} \in \hat{A}$ such that $p(a|b) = \hat{p}(c_{a, b})$.
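To make the statement tangible for the three-element example, here is a toy Python sketch of one possible extension. It is an assumption of mine for illustration only: it is not the construction discussed in the literature, it merely picks some event with the right probability, and it makes no attempt to give the assignment $(a,b) \mapsto c_{a,b}$ any algebraic meaning. The idea is to add an auxiliary fair coin, so that $\hat{X} = X \times \{0,1\}$ with the uniform distribution, and to identify each $a \in A$ with $a \times \{0,1\}$. The names `COIN`, `embed`, and `find_event` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

X = frozenset({1, 2, 3})
COIN = (0, 1)  # hypothetical auxiliary coordinate, not in the original text
X_hat = frozenset((x, c) for x in X for c in COIN)


def powerset(s):
    """Return all subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


A = powerset(X)          # 8 events
A_hat = powerset(X_hat)  # 64 events


def p(a):
    """Original probability: p(a) = |a| / 3."""
    return Fraction(len(a), len(X))


def p_hat(e):
    """Extended probability: uniform on the 6 points of X_hat."""
    return Fraction(len(e), len(X_hat))


def embed(a):
    """Identify the event a of A with the event a x COIN of A_hat."""
    return frozenset((x, c) for x in a for c in COIN)


def cond(a, b):
    """p(a|b) = p(a & b) / p(b), for p(b) > 0."""
    return p(a & b) / p(b)


def find_event(a, b):
    """Return some event of A_hat whose probability equals p(a|b)."""
    target = cond(a, b)
    return next(e for e in A_hat if p_hat(e) == target)


c = find_event(frozenset({1, 2}), frozenset({2, 3}))
print(sorted(c), p_hat(c))
```

In this toy space the probabilities $k/6$ are all attained, which covers every value $p(a|b)$ can take in the example ($0$, $1/3$, $1/2$, $2/3$, $1$).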

While there are several options for $\hat{A}$, one of the most natural for me was the Goodman–Nguyen–Van Fraassen algebra, introduced following the ideas of philosopher-scientist Van Fraassen in the 1970s (see his related work here). Even a serious undergraduate student can understand its construction. What I want to emphasize here is that its very existence already clarifies the concept of conditional probability, a natural occurrence in mathematics.

For those interested, a wide range of papers on the algebra of conditional events exist. I would like to cite a recent one by Vera Koponen that caught my attention, as I am originally a logician: link. It provides deeper insight into so-called (lifted-) Bayesian Networks.

Lastly, with the IT revolution and the industrialization of higher education, we increasingly neglect “pure mathematics” in favor of industrial mathematics. For example, several linear algebra courses are now limited to matrix computations, quickly advancing to AI applications. It would not surprise me to see the neglect of mathematics from the last two centuries lead to rediscoveries in the next fifty years. I urge you to look at the campaign «Protect Pure Maths».


  1. The fact that the identity $p(a|b) = p(b|a)p(a) / p(b)$, an immediate consequence of the definition, is called a theorem indicates the conceptual complexity of conditional probability.

  2. Be cautious here: given an event $b$ with a non-zero probability $p(b)$, the conditional probability map $p_b: a \mapsto p(a \cap b) / p(b)$ is well defined, and by definition, $p(a|b) = p_b(a \cap b)$. One might argue that the event we seek is just $(a \cap b)$, and that it resides in the same space as $a$ and $b$. However, recall that by a probability space we mean a fixed pair $(A,p)$, and $p_b$ is in general a different function from $p$.