> SHANNON ENTROPY
The idea in plain English: How much surprise is in a message? If I tell you "the sun will rise tomorrow" — that's zero surprise, zero information. If I tell you the winning lottery numbers — that's a lot of information, because it was unpredictable. Shannon entropy quantifies this mathematically. A message with high entropy is random and unpredictable. A message with low entropy is repetitive and predictable. The unit is the bit — each bit represents one yes/no question you'd need to ask to determine the message.
Why this really exists: Claude Shannon published "A Mathematical Theory of Communication" in 1948 while working at Bell Labs. It was the founding document of information theory — one of the most important intellectual achievements of the 20th century. Shannon showed that information is not a vague concept but a precise, measurable quantity. His work made possible everything from modems and hard drives to the internet itself. The puzzle asks you to compute the Shannon entropy of short strings — a direct application of his famous formula.
▸ The Formula
Where:
• H = entropy in bits
• pᵢ = probability (frequency) of symbol i in the message
• Σ = sum over all unique symbols
• log₂ = logarithm base 2
For a string of length N with character counts cᵢ, the probability of character i is pᵢ = cᵢ / N. Plug each pᵢ into the formula, sum them up, and flip the sign. The result is the average number of bits per symbol required to encode the message optimally.
▸ Worked Example: "HELLO"
Let's compute the Shannon entropy of the string "HELLO" step by step.
Character frequencies:
H: 1 → p = 1/5 = 0.2
E: 1 → p = 1/5 = 0.2
L: 2 → p = 2/5 = 0.4
O: 1 → p = 1/5 = 0.2
Compute each term:
H: -0.2 · log₂(0.2) = -0.2 · (-2.322) = 0.464
E: -0.2 · log₂(0.2) = -0.2 · (-2.322) = 0.464
L: -0.4 · log₂(0.4) = -0.4 · (-1.322) = 0.529
O: -0.2 · log₂(0.2) = -0.2 · (-2.322) = 0.464
H = 0.464 + 0.464 + 0.529 + 0.464 = 2.049 bits
Normalized per symbol contributions:
H → 0.117 bits, E → 0.117 bits, L → 0.314 bits, O → 0.117 bits
Maximum possible for 5 characters (uniform distribution) ≈ 2.322 bits
The entropy of 2.049 bits per character means that on average, each character in "HELLO" carries about 2 bits of information. The total information content is 5 × 2.049 = 10.245 bits. This is the minimum number of bits needed to encode the string optimally — no compression algorithm can do better without losing information.
▸ How to Compute Entropy (Python)
def shannon_entropy(s):
if not s:
return 0.0
freq = {}
for ch in s:
freq[ch] = freq.get(ch, 0) + 1
N = len(s)
H = 0.0
for count in freq.values():
p = count / N
H -= p * math.log2(p)
return H
# Examples:
# >>> shannon_entropy("AAAA")
# 0.0 (all same character = zero entropy)
# >>> shannon_entropy("HELLO")
# 2.048795... (matches our calculation)
# >>> shannon_entropy("ABCD")
# 2.0 (4 uniform symbols = exactly 2 bits each)
💡 A fair coin flip (H/T) gives exactly 1 bit of entropy. A loaded coin that comes up heads 90% of the time gives about 0.469 bits — less surprise, less information.
▸ Difficulty & Puzzle Patterns
| Difficulty | What to Expect | Example String | H (bits) |
|---|---|---|---|
| 2 | Short string, few unique chars, easy mental math | "AAB" | 0.918 |
| 3 | Longer string, more symbols, requires calculator/code | "HELLO WORLD" | ≈3.022 |
| 4 | Mixed-case alphanumeric, varied frequencies | "Node734_!" | ≈3.125 |
| 5 | Long string with many unique characters, precise float output | random 50-char string | ≈5.200–5.700 |
⚠️ The puzzle expects the entropy value rounded to a specific precision
(typically 6 decimal places). Use round(H, 6) or the specified precision in the
puzzle prompt.
▸ Real-World Connections
- Data compression (ZIP, gzip, PNG): Entropy sets the theoretical limit on lossless compression. Huffman coding, arithmetic coding, and LZW all try to reach Shannon's bound. ZIP files work by removing the redundancy that low-entropy data contains.
- Cryptography & key randomness: A cryptographic key must have maximum entropy — every bit should be equally likely to be 0 or 1. Low-entropy keys are easily guessed (password crackers exploit this). Entropy measures the strength of a random number generator.
- Machine learning (cross-entropy loss): Neural networks minimize cross-entropy between predicted and actual distributions during training. Classification tasks use cross-entropy as the loss function — it's Shannon entropy plus the Kullback-Leibler divergence.
- Decision trees (ID3, C4.5): Algorithms like ID3 use information gain (reduction in entropy) to decide which feature to split on at each node. The feature that most reduces entropy is chosen first — this is how many decision tree classifiers work.
- Natural language processing: Language has low entropy per character (≈1.0–1.5 bits for English) because of redundancy — that's why you can read txt wth mssng vwls. NLP models use entropy to evaluate language model quality.
- Network monitoring & anomaly detection: Unexpected changes in entropy of network traffic patterns can signal intrusions, DDoS attacks, or malware communication.