Reed-Solomon Error Correction
How QR codes and Voyager survive cosmic radiation — with the same math
What is it?
Imagine you write a secret message on a whiteboard, but someone erases part of it. You still have enough pieces to reconstruct the whole thing. That's Reed-Solomon error correction.
In this cipher, a message is encoded as a polynomial — a mathematical curve. The curve's shape is determined by the message's ASCII codes. Then shares (points on the curve) are given to you. You only need a few of them to reconstruct the entire curve and read the message.
Concrete Example
Say the answer is HELLO. The ASCII codes are H=72, E=69, L=76, L=76, O=79. These become the coefficients of a degree-4 polynomial:
P(x) = 72 + 69·x + 76·x² + 76·x³ + 79·x⁴
Ten points on this curve are generated — each is an (x, y) coordinate pair. You're given the points. With any 5 of them (the minimum needed for degree 4), you can reconstruct the polynomial using Lagrange interpolation.
How It Works
- The answer word's ASCII codes become polynomial coefficients: c₀, c₁, c₂, ..., cₖ₋₁
- Shares are generated at x=1, 2, 3, ..., n: share(x) = (x, P(x))
- Higher difficulties add small noise to some shares (simulating bit errors)
- You pick any k shares and run Lagrange interpolation to recover all coefficients
- Convert each coefficient to its ASCII character → the answer
Step-by-Step Solving
import json
def lagrange(shares, mod):
"""Reconstruct polynomial from shares over GF(mod)."""
k = len(shares)
coeffs = [0] * k
for i in range(k):
xi, yi = shares[i]
terms = [1]
for j in range(k):
if i == j: continue
xj = shares[j][0]
# Multiply by (x - xj) / (xi - xj) in modular arithmetic
new_terms = [0] * (len(terms) + 1)
for p, val in enumerate(terms):
new_terms[p] = (new_terms[p] + val * (-xj)) % mod
new_terms[p + 1] = (new_terms[p + 1] + val) % mod
# Divide by (xi - xj)
inv = pow((xi - xj) % mod, -1, mod)
terms = [(v * inv) % mod for v in new_terms]
for p, val in enumerate(terms):
coeffs[p] = (coeffs[p] + yi * val) % mod
return coeffs
shares = [[1, 45], [2, 12], [3, 37], [4, 11], [5, 23]]
k = 5
mod = 97
coeffs = lagrange(shares[:k], mod)
answer = ''.join(chr(c) for c in coeffs)
print(answer) # The decoded word
Difficulty Table
| Level | What Happens | Time Estimate |
|---|---|---|
| 1-2 | 5 shares, degree 3-4, clean | <10s |
| 3-5 | Noise added to some shares (minor errors) | 20-60s |
| 6-7 | More shares, more noise | 1-3m |
Real-World Applications
- QR codes & barcodes: A smudged QR code still scans because Reed-Solomon corrects up to 30% damage. Same math powers Data Matrix, Aztec codes, and even classic 1D barcodes (some use CRC, others full Reed-Solomon).
- Voyager spacecraft: 12+ billion km away, signals are incredibly faint — Reed-Solomon lets NASA recover the original image from garbled transmission
- CDs and DVDs: Scratches and dust are corrected by cross-interleaved Reed-Solomon (CIRC)
- Satellite TV: Rain fade and interference are corrected before you see pixelation
- Disk storage (RAID 6): Uses Reed-Solomon to recover data when up to two drives fail
This is the thread connecting barcodes, QR codes, satellite TV, and this puzzle — they all rely on the same insight: a little redundant information lets machines reconstruct what noise or damage took away. They encode meaning for machine eyes, not human ones, which is exactly the spirit of this website.