One-time pad and perfect secrecy
The one-time pad is one of the simplest and well known secure encryption schemes. Suppose we have a message \(m\), which is a sequence of \(L\) bits. The idea of the one-time pad is to take an additional \(L\) bits selected at random, called the key, and then create the ciphertext \(c\) by xor’ing the key and message together:
$$ c = k \oplus m. $$
To decrypt this ciphertext, we simply apply the key again:
$$ m = k \oplus m. $$
To understand why this is secure, we need to think like an attacker. Suppose we had access to \(c\). What can we say about \(m\)? We might try a few possible keys and see whether we recover anything interesting, but the a problem is immediate: with a length \(L\) message, there are \(2^L\) keys. And since the key is selected at random, we have no reason to suspect one more than other. This is like finding a needle in the haystack.
Intuitively we know there is no way we can expect to glean any information about the message from the ciphertext. But how can we formalize this? Claude Shannon captured this idea with the notion of perfect secrecy. An attacker might start with some initial impressions of what messages are likely or unlikely. Perfect secrecy demands that they have no reason to change these impressions based on intercepting a ciphertext alone. Mathematically, we might write this like
$$ P(m) = P(m \mid c), $$
where \(P(m)\) is the prior probability of a given message, and \(P(m \mid c)\) is the conditional probability of a message given the ciphertext.
We can use Bayes’ theorem to find an equivalent reformulation. Recall that this tells us that
$$ P(m \mid c) = \frac{P(c \mid m) P(m)}{P(c)}. $$
Combining the two equations we see that perfect secrecy is equivalent to
$$ P(c) = P(c \mid m), $$
or in English: the ciphertext must be independent of the message.
It’s easy to verify this for the one-time pad. Recalling that we choose the key at random, we have
$$ P(c \mid m) = P( c = m \oplus k) = \frac{1}{2^L}. $$
Here we see that we have more than independence: the conditional probability is uniform. Thus, we’ve shown that the one-time pad has perfect secrecy.
If you look at a book like Boneh and Shoup you’ll see this idea of perfect secrecy formulated in a slightly more general context. Here they say a Shannon cipher involves a key space \(\mathcal{K}\), a message space \(\mathcal{M}\), and a cihpertext space \(\mathcal{C}\), together with an encoder \(E : \mathcal{K} \times \mathcal{M} \to \mathcal{C}\) and decoder \(D : \mathcal{K} \times C \to \mathcal{M}\). To say that decoding “works”, mathematically, is to say that for all \(k \in \mathcal{K}, m \in \mathcal{M}\), we have
$$ D(k, E(k, m)) = m. $$
In this context, the requirement of perfect security demands that, for all \(m_0, m_1 \in \mathcal{M}\) and \(c \in \mathcal{C}\), if \(k\) is uniformly distributed over \(\mathcal{K}\), then
$$ P[ E(k, m_0) = c] = P[ E(k, m_1) = c]. $$
This is just a generalization of what we saw with the one-time pad!
Perfect secrecy is a lot to ask for. Although the one-time pad has this property, it comes at a significant cost: we need a key that is as large as our messages. This is typically not tractable, but the scheme is still important as a conceptual “gold standard”.