Site's logo

KL, MY

7:30:00 AM

A birthday cake by Stephanie McCabe on Unsplash

Birthday Attacks: Why intuitive probabilities lie

Why do 23 people have a 50% chance of a shared birthday? This paradox explains the "birthday attack," a crypto exploit that finds hash collisions fast.

October 19, 2025

8 min read

I recently wrote a simple C program to “simulate birthdays” and found myself staring at an unintuitive result: with only 23 people in a room there’s already a ~50% chance two people share a birthday.

That number feels wrong until you realize we’re not asking whether a particular person matches a specific date — we’re asking whether any pair matches. This same combinatorial surprise supports the cryptographic “birthday attack”, where an adversary search for collisions in hash outputs.

Birthday Simulation

2025 May

A simple command-line application demonstrating the surprising likelihood of birthday collisions.

C
Cryptography
Algorithms

Before that, Birthday Paradox

Birthday paradox is the probability theory, that in a set of n randomly selected people, at least two people share the same birthday.

Given that:

  • There are 365 days in a year (ignoring leap years)
  • Each person’s birthday is equally likely to fall on any of the 365 days (no patterns to consider)

The probability P(n) that at least two people share a birthday can be calculated using the complementary probability that no two people share a birthday:

P(n)=1365365×364365×...×365n+1365P(n) = 1 - \frac{365}{365} \times \frac{364}{365} \times ... \times \frac{365 - n + 1}{365}

Let’s take 23 people (n=23) as an example:

P(n)=1365365×364365××343365=1k=0342365k3650.507350.73%\begin{aligned} P(n) &= 1 - \frac{365}{365} \times \frac{364}{365} \times \cdots \times \frac{343}{365} \\[6pt] &= 1 - \prod_{k=0}^{342} \frac{365 - k}{365} \\[6pt] &\approx 0.5073 \\[6pt] &\approx 50.73\% \end{aligned}

This means that in a group of 23 people, there’s about a 50.73% chance that at least two people share the same birthday.

This does not feels right…

That is the point! Intuitively, one might think that with 365 days in a year, you’d need a much larger group to have a significant chance of shared birthdays. However, the key is that we’re considering all possible pairs of people, not a specific person’s birthday.

The chart below illustrates how the probability increases with the number of people in the group:

The Quadratic2 Growth of Possible Pairings
Probability that at least two people share a birthday as group size increases
50% Probability23 people
99% Probability57 people

The birthday paradox shows that in a group of just 23 people, there's a 50% chance that two people share the same birthday. With 70 people, the probability exceeds 99.9%.

The formula considers the number of unique pairs that can be formed from n people, which is given by the combination formula:

Pairs=n×(n1)2Pairs = \frac{n \times (n - 1)}{2}

So if we look at the previous example of 23 people, there are actually:

23×(231)2=253\frac{23 \times (23 - 1)}{2} = 253

possible pairs of people. Each of these pairs has a chance of sharing a birthday, which is why the overall probability becomes surprisingly high even with a relatively small group.

So what is a birthday attack?

Birthday attack is a method of brute force attack against cryptographic hash functions to find two different inputs that produce the same hash output (a collision) that exploits the mathematics behind the birthday paradox.

BY using birthday attack, an attacker can significantly reduce the number of attempts needed to find a collision compared to a naive brute-force approach.

For example, for a hash function with an n-bit output, a naive brute-force attack would require about 2n attempts to find a collision. However, using the birthday attack, an attacker can find a collision in approximately 2n/2 attempts.

This means that a hash function like SHA-1, which produces a 160-bit hash (digest), now the security property is effectively reduced to 80 bits against birthday attacks.

Hash FunctionOutput Size (n)Brute-Force Strength (Pre-image)Birthday Strength (Collision)Status
MD5128-bit2128264Broken
SHA-1160-bit2160280Broken
SHA-256256-bit22562128Secure

How an Attacker Uses a Collision

Finding a collision (two inputs, one hash) is bad, but the real danger is a chosen-prefix collision attack. An attacker can:

  1. Create two versions of a document: one “good” (e.g., a legitimate software update) and one “malicious” (e.g., malware).
  2. Use a birthday-style attack to find a small, invisible “padding” to add to both documents so that they produce the exact same hash value.
  3. The attacker gets the “good” document signed by a trusted authority (like a developer signing their software).
  4. The attacker then attaches that valid signature to their “malicious” document.

Because the hash values match, the signature appears valid on the malware, tricking systems into trusting and running it.

How it works?

The easier implementation of a birthday attack can be described in the following pseudocode:

function HASH(input) -> hash_value
function birthday_collision_search(max_iterations):
hashmap = hashmap_create(initial_capacity = max_iterations * 1.5)
iterations = 0
while iterations < max_iterations:
candidate = generate_random_input() // e.g., random byte array or random string
digest = HASH(candidate)
existing = hashmap_get(hashmap, digest)
if existing != NULL:
// Verify inputs are actually different (to avoid same-input duplicates)
if existing != candidate:
// Found collision: digest collides for existing and candidate
return (existing, candidate, digest)
else:
// same input seen again (rare with good randomness) — skip
continue
// Not seen: store mapping hv -> candidate
hashmap_put(hashmap, digest, candidate)
iterations = iterations + 1
// If reached here, no collision found within the budget
return NULL

This pseudocode basically generates random inputs, computes their hash values, and stores them in a hashmap. If a hash value is found that already exists in the hashmap, a collision is detected.

Features:

  • Memory Usage: Since each unique hash value needs to be stored, it has 0(n) space complexity.
  • Randomness Quality: The effectiveness of the attack relies on the quality of the random input generator. Poor randomness can lead to suboptimal results.
  • Time: O(n) hash computations and O(1) average lookup per sample → O(n).

Pollard’s Rho implementation

This is a more sophisticated and memory-efficient variation of the birthday attack. Unlike the basic approach, which requires storing every computed hash in a large hash map, Pollard’s Rho finds collisions using minimal memory.

Features:

  • Memory Efficiency: Uses O(1) space, making it suitable for environments with limited memory.
  • Time Complexity: About O(2n/2) operations

Cryptocurrency and Birthday Attacks

So, how does this affect cryptocurrencies?

Consider Bitcoin as an example. Bitcoin uses the SHA-256 hash function for its proof-of-work mechanism. If an attacker could find collisions in SHA-256 using a birthday attack, they could potentially create two different blocks with the same hash.

However, a 2128 attack is computationally infeasible as it is still a very large number. So while birthday attacks are a theoretical concern, they do not currently pose a practical threat to Bitcoin’s security. The same thing applies to other cryptocurrencies like Ethereum, which uses the Keccak-256 hash function (similar to SHA-3).

The Takeaway

The Birthday Paradox is a perfect example of how our intuition can fail us when dealing with probabilities. What starts as a fun party trick (finding shared birthdays) scales up to become a fundamental concept in digital security.

It teaches us that “very unlikely” events become “very likely” given enough attempts. This is why modern cryptographic standards like SHA-256 don’t just aim to be “hard” to break; they aim for a security margin so vast (like 2128) that even these clever “halving” attacks remain firmly in the realm of science fiction.

Reference