difficulty levels — click green to claim

> SHANNON ENTROPY

shannon-entropy difficulty: 2–5 also known as: information entropy, Boltzmann-Gibbs entropy, H-function

The idea in plain English: How much surprise is in a message? If I tell you "the sun will rise tomorrow" — that's zero surprise, zero information. If I tell you the winning lottery numbers — that's a lot of information, because it was unpredictable. Shannon entropy quantifies this mathematically. A message with high entropy is random and unpredictable. A message with low entropy is repetitive and predictable. The unit is the bit — each bit represents one yes/no question you'd need to ask to determine the message.

Why this really exists: Claude Shannon published "A Mathematical Theory of Communication" in 1948 while working at Bell Labs. It was the founding document of information theory — one of the most important intellectual achievements of the 20th century. Shannon showed that information is not a vague concept but a precise, measurable quantity. His work made possible everything from modems and hard drives to the internet itself. The puzzle asks you to compute the Shannon entropy of short strings — a direct application of his famous formula.

▸ The Formula

H = - Σ pᵢ · log₂(pᵢ)

Where:
• H = entropy in bits
• pᵢ = probability (frequency) of symbol i in the message
• Σ = sum over all unique symbols
• log₂ = logarithm base 2

For a string of length N with character counts cᵢ, the probability of character i is pᵢ = cᵢ / N. Plug each pᵢ into the formula, sum them up, and flip the sign. The result is the average number of bits per symbol required to encode the message optimally.

▸ Worked Example: "HELLO"

Let's compute the Shannon entropy of the string "HELLO" step by step.

String: HELLO (length N = 5)

Character frequencies:
H: 1 → p = 1/5 = 0.2
E: 1 → p = 1/5 = 0.2
L: 2 → p = 2/5 = 0.4
O: 1 → p = 1/5 = 0.2

Compute each term:
H: -0.2 · log₂(0.2) = -0.2 · (-2.322) = 0.464
E: -0.2 · log₂(0.2) = -0.2 · (-2.322) = 0.464
L: -0.4 · log₂(0.4) = -0.4 · (-1.322) = 0.529
O: -0.2 · log₂(0.2) = -0.2 · (-2.322) = 0.464

H = 0.464 + 0.464 + 0.529 + 0.464 = 2.049 bits

Normalized per symbol contributions:
H → 0.117 bits, E → 0.117 bits, L → 0.314 bits, O → 0.117 bits
Maximum possible for 5 characters (uniform distribution) ≈ 2.322 bits

The entropy of 2.049 bits per character means that on average, each character in "HELLO" carries about 2 bits of information. The total information content is 5 × 2.049 = 10.245 bits. This is the minimum number of bits needed to encode the string optimally — no compression algorithm can do better without losing information.

▸ How to Compute Entropy (Python)

import math

def shannon_entropy(s):
    if not s:
        return 0.0
    freq = {}
    for ch in s:
        freq[ch] = freq.get(ch, 0) + 1
    N = len(s)
    H = 0.0
    for count in freq.values():
        p = count / N
        H -= p * math.log2(p)
    return H

# Examples:
# >>> shannon_entropy("AAAA")
# 0.0 (all same character = zero entropy)
# >>> shannon_entropy("HELLO")
# 2.048795... (matches our calculation)
# >>> shannon_entropy("ABCD")
# 2.0 (4 uniform symbols = exactly 2 bits each)

💡 A fair coin flip (H/T) gives exactly 1 bit of entropy. A loaded coin that comes up heads 90% of the time gives about 0.469 bits — less surprise, less information.

▸ Difficulty & Puzzle Patterns

Difficulty	What to Expect	Example String	H (bits)
2	Short string, few unique chars, easy mental math	"AAB"	0.918
3	Longer string, more symbols, requires calculator/code	"HELLO WORLD"	≈3.022
4	Mixed-case alphanumeric, varied frequencies	"Node734_!"	≈3.125
5	Long string with many unique characters, precise float output	random 50-char string	≈5.200–5.700

⚠️ The puzzle expects the entropy value rounded to a specific precision (typically 6 decimal places). Use round(H, 6) or the specified precision in the puzzle prompt.

▸ Real-World Connections

Data compression (ZIP, gzip, PNG): Entropy sets the theoretical limit on lossless compression. Huffman coding, arithmetic coding, and LZW all try to reach Shannon's bound. ZIP files work by removing the redundancy that low-entropy data contains.
Cryptography & key randomness: A cryptographic key must have maximum entropy — every bit should be equally likely to be 0 or 1. Low-entropy keys are easily guessed (password crackers exploit this). Entropy measures the strength of a random number generator.
Machine learning (cross-entropy loss): Neural networks minimize cross-entropy between predicted and actual distributions during training. Classification tasks use cross-entropy as the loss function — it's Shannon entropy plus the Kullback-Leibler divergence.
Decision trees (ID3, C4.5): Algorithms like ID3 use information gain (reduction in entropy) to decide which feature to split on at each node. The feature that most reduces entropy is chosen first — this is how many decision tree classifiers work.
Natural language processing: Language has low entropy per character (≈1.0–1.5 bits for English) because of redundancy — that's why you can read txt wth mssng vwls. NLP models use entropy to evaluate language model quality.
Network monitoring & anomaly detection: Unexpected changes in entropy of network traffic patterns can signal intrusions, DDoS attacks, or malware communication.

← Back to all ciphers