Chan Alston | Full stack developer

I recently wrote a simple C program to “simulate birthdays” and found myself staring at an unintuitive result: with only 23 people in a room there’s already a ~50% chance two people share a birthday.

That number feels wrong until you realize we’re not asking whether a particular person matches a specific date — we’re asking whether any pair matches. This same combinatorial surprise supports the cryptographic “birthday attack”, where an adversary search for collisions in hash outputs.

Birthday Simulation

2025 May

A simple command-line application demonstrating the surprising likelihood of birthday collisions.

Cryptography

Algorithms

Before that, Birthday Paradox

Birthday paradox is the probability theory, that in a set of n randomly selected people, at least two people share the same birthday.

Given that:

There are 365 days in a year (ignoring leap years)
Each person’s birthday is equally likely to fall on any of the 365 days (no patterns to consider)

The probability P(n) that at least two people share a birthday can be calculated using the complementary probability that no two people share a birthday:

P(n) = 1 - \frac{365}{365} \times \frac{364}{365} \times ... \times \frac{365 - n + 1}{365}

Let’s take 23 people (n=23) as an example:

\begin{aligned} P(n) &= 1 - \frac{365}{365} \times \frac{364}{365} \times \cdots \times \frac{343}{365} \\[6pt] &= 1 - \prod_{k=0}^{342} \frac{365 - k}{365} \\[6pt] &\approx 0.5073 \\[6pt] &\approx 50.73\% \end{aligned}

This means that in a group of 23 people, there’s about a 50.73% chance that at least two people share the same birthday.

This does not feels right…

That is the point! Intuitively, one might think that with 365 days in a year, you’d need a much larger group to have a significant chance of shared birthdays. However, the key is that we’re considering all possible pairs of people, not a specific person’s birthday.

The chart below illustrates how the probability increases with the number of people in the group:

The Quadratic² Growth of Possible Pairings

Probability that at least two people share a birthday as group size increases

50% Probability23 people

99% Probability57 people

The birthday paradox shows that in a group of just 23 people, there's a 50% chance that two people share the same birthday. With 70 people, the probability exceeds 99.9%.

The formula considers the number of unique pairs that can be formed from n people, which is given by the combination formula:

Pairs = \frac{n \times (n - 1)}{2}

So if we look at the previous example of 23 people, there are actually:

\frac{23 \times (23 - 1)}{2} = 253

possible pairs of people. Each of these pairs has a chance of sharing a birthday, which is why the overall probability becomes surprisingly high even with a relatively small group.

So what is a birthday attack?

Birthday attack is a method of brute force attack against cryptographic hash functions to find two different inputs that produce the same hash output (a collision) that exploits the mathematics behind the birthday paradox.

BY using birthday attack, an attacker can significantly reduce the number of attempts needed to find a collision compared to a naive brute-force approach.

For example, for a hash function with an n-bit output, a naive brute-force attack would require about 2ⁿ attempts to find a collision. However, using the birthday attack, an attacker can find a collision in approximately 2^n/2 attempts.

This means that a hash function like SHA-1, which produces a 160-bit hash (digest), now the security property is effectively reduced to 80 bits against birthday attacks.

Hash Function	Output Size (n)	Brute-Force Strength (Pre-image)	Birthday Strength (Collision)	Status
MD5	128-bit	2¹²⁸	2⁶⁴	Broken
SHA-1	160-bit	2¹⁶⁰	2⁸⁰	Broken
SHA-256	256-bit	2²⁵⁶	2¹²⁸	Secure

How an Attacker Uses a Collision

Finding a collision (two inputs, one hash) is bad, but the real danger is a chosen-prefix collision attack. An attacker can:

Create two versions of a document: one “good” (e.g., a legitimate software update) and one “malicious” (e.g., malware).
Use a birthday-style attack to find a small, invisible “padding” to add to both documents so that they produce the exact same hash value.
The attacker gets the “good” document signed by a trusted authority (like a developer signing their software).
The attacker then attaches that valid signature to their “malicious” document.

Because the hash values match, the signature appears valid on the malware, tricking systems into trusting and running it.

How it works?

The easier implementation of a birthday attack can be described in the following pseudocode:

function HASH(input) -> hash_value

function birthday_collision_search(max_iterations):
    hashmap = hashmap_create(initial_capacity = max_iterations * 1.5)
    iterations = 0

    while iterations < max_iterations:
        candidate = generate_random_input()   // e.g., random byte array or random string

        digest = HASH(candidate)

        existing = hashmap_get(hashmap, digest)
        if existing != NULL:
            // Verify inputs are actually different (to avoid same-input duplicates)
            if existing != candidate:
                // Found collision: digest collides for existing and candidate
                return (existing, candidate, digest)
            else:
                // same input seen again (rare with good randomness) — skip
                continue

        // Not seen: store mapping hv -> candidate
        hashmap_put(hashmap, digest, candidate)

        iterations = iterations + 1

    // If reached here, no collision found within the budget
    return NULL

This pseudocode basically generates random inputs, computes their hash values, and stores them in a hashmap. If a hash value is found that already exists in the hashmap, a collision is detected.

Features:

Memory Usage: Since each unique hash value needs to be stored, it has 0(n) space complexity.
Randomness Quality: The effectiveness of the attack relies on the quality of the random input generator. Poor randomness can lead to suboptimal results.
Time: O(n) hash computations and O(1) average lookup per sample → O(n).

Pollard’s Rho implementation

This is a more sophisticated and memory-efficient variation of the birthday attack. Unlike the basic approach, which requires storing every computed hash in a large hash map, Pollard’s Rho finds collisions using minimal memory.

Features:

Memory Efficiency: Uses O(1) space, making it suitable for environments with limited memory.
Time Complexity: About O(2^n/2) operations

Cryptocurrency and Birthday Attacks

So, how does this affect cryptocurrencies?

Consider Bitcoin as an example. Bitcoin uses the SHA-256 hash function for its proof-of-work mechanism. If an attacker could find collisions in SHA-256 using a birthday attack, they could potentially create two different blocks with the same hash.

However, a 2¹²⁸ attack is computationally infeasible as it is still a very large number. So while birthday attacks are a theoretical concern, they do not currently pose a practical threat to Bitcoin’s security. The same thing applies to other cryptocurrencies like Ethereum, which uses the Keccak-256 hash function (similar to SHA-3).

The Takeaway

The Birthday Paradox is a perfect example of how our intuition can fail us when dealing with probabilities. What starts as a fun party trick (finding shared birthdays) scales up to become a fundamental concept in digital security.

It teaches us that “very unlikely” events become “very likely” given enough attempts. This is why modern cryptographic standards like SHA-256 don’t just aim to be “hard” to break; they aim for a security margin so vast (like 2128) that even these clever “halving” attacks remain firmly in the realm of science fiction.

Reference

Corn, P., Habib, M., & Suresh, T. (2019). Birthday Problem | Brilliant Math & Science Wiki. Brilliant.org. https://brilliant.org/wiki/birthday-paradox/
Avoiding collisions Cryptographic hash functions. (2016). Avoiding collisions Cryptographic hash functions. https://cs.wellesley.edu/~cs310/lectures/15_hash_functions_slides_handouts.pdf
Bellare, M., & Rogaway, P. (2005). Introduction to Modern Cryptography. https://web.cs.ucdavis.edu/~rogaway/classes/227/spring05/book/main.pdf
in. (2022). Why are hashes in the bitcoin protocol typically computed twice (double computed)?. Bitcoin Stack Exchange. https://bitcoin.stackexchange.com/questions/6037/why-are-hashes-in-the-bitcoin-protocol-typically-computed-twice-double-computed
Hash Function | Fingerprints for Data. (2025, January 16). Learnmeabitcoin.com. https://learnmeabitcoin.com/technical/cryptography/hash-function/
Stevens, M., Bursztein, E., Karpman, P., Albertini, A., & Markov, Y. (2017). The first collision for full SHA-1. https://shattered.io/static/shattered.pdf
Shirey, R. (2007). Internet Security Glossary, Version 2. https://doi.org/10.17487/rfc4949
Computerphile. (2025, May 13). Hash Collisions & The Birthday Paradox - Computerphile. YouTube. https://www.youtube.com/watch?v=jsraR-el8_o
ethereum.org. (2025). Ethereum Glossary. Ethereum.org. https://ethereum.org/glossary/#keccak-256