You download a 2 GB Linux ISO. The download page shows a SHA-256 hash. You run a command to verify it. All good — but what did you just do, and why does it matter? Checksums are a foundational concept in data integrity, and they show up in more places than most developers notice.
What a Checksum Is
A checksum is a small value computed from a block of data. If the data changes, even by one bit, a well-designed checksum will almost certainly change too. Compare the checksum computed before transmission or storage with the one computed afterward and you can detect corruption.
The simplest possible checksum is a parity bit. Add up all the bits in a message; if the total is even, the parity bit is 0; if odd, it's 1. Append it to your message. On the receiving end, repeat the calculation and compare. If they differ, something changed.
Parity is too weak for real use — it misses any even number of bit flips — but the concept generalizes into much more powerful algorithms.
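As a minimal sketch of the idea (the function name `parity_bit` and the sample message are illustrative, not from any particular protocol), the following computes an even-parity bit and shows both a detected single-bit error and an undetected double-bit error:

```python
def parity_bit(data: bytes) -> int:
    """Return the even-parity bit: 0 if the count of 1 bits is even, else 1."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


message = b"hi"
sent_parity = parity_bit(message)

# Flip one bit: the parity changes, so the error is detected.
one_flip = bytes([message[0] ^ 0b00000001]) + message[1:]
print(parity_bit(one_flip) != sent_parity)   # True

# Flip two bits: the parity is unchanged, so the error slips through.
two_flips = bytes([message[0] ^ 0b00000011]) + message[1:]
print(parity_bit(two_flips) != sent_parity)  # False
```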
The Luhn Algorithm: Checksums in Your Wallet
Credit card numbers use the Luhn algorithm to catch typos. The check digit (the last digit of your card number) is computed from the others in a specific way. When you type a card number online, the browser can validate it instantly without a server round trip, just by recomputing the check digit.
Card: 4532 0151 2345 6789
                        ^ check digit
Algorithm (simplified):
1. Starting from the second-to-last digit, double every second digit going right to left
2. If doubling produces > 9, subtract 9
3. Sum all digits (doubled and undoubled)
4. Valid if sum % 10 == 0
This catches any single-digit error and most adjacent transpositions — the most common human input mistakes. It's not a cryptographic algorithm — you can compute a valid Luhn number trivially — but it's a great example of using checksums for error detection rather than security.
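The four steps can be sketched in a few lines of Python. The function names here are illustrative choices, and the sample numbers are common test values rather than real cards:

```python
def luhn_checksum(number: str) -> int:
    """Return the Luhn sum modulo 10 for a string of digits (spaces allowed)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk right to left; the rightmost digit is position 0 and is not doubled.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10


def is_luhn_valid(number: str) -> bool:
    """A number is valid if it has digits and its Luhn sum is a multiple of 10."""
    return any(ch.isdigit() for ch in number) and luhn_checksum(number) == 0


def luhn_check_digit(payload: str) -> int:
    """Compute the digit to append to `payload` so the result is Luhn-valid."""
    # Appending "0" puts every payload digit in its final position.
    return (10 - luhn_checksum(payload + "0")) % 10


print(is_luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
print(is_luhn_valid("4111 1111 1111 1112"))  # False (single-digit typo)
print(luhn_check_digit("7992739871"))        # 3
```

Because `luhn_check_digit` needs no secret, anyone can produce valid-looking numbers, which is exactly why Luhn is an error check and not a security measure.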
CRC: The Error Detection Standard
Cyclic Redundancy Check (CRC) is the checksum algorithm in Ethernet frames, ZIP files, PNG images, and dozens of other formats. CRC32 produces a 32-bit checksum; CRC16 produces 16 bits.
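The intuition: treat your data as one very long binary number and divide it by a fixed, predefined value (the generator polynomial), using carry-less arithmetic in which addition and subtraction are both XOR. The remainder of that division is the CRC, and the sender appends it to the data. On the receiving end, either recompute the CRC and compare, or divide the data together with its appended CRC: in the textbook form, a remainder of zero means the data is intact (practical variants such as CRC-32 check for a fixed constant instead).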
CRC is efficient to compute in hardware, which is why it dominates at the network and storage layer. But it's not designed to be tamper-resistant — a determined attacker can modify data and adjust the CRC to match. CRC is for detecting accidental corruption, not deliberate tampering.
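To make the division concrete, here is a bit-at-a-time sketch of the CRC-32 variant used by ZIP and PNG, checked against Python's built-in `zlib.crc32`. The function name `crc32_bitwise` is illustrative, and real implementations use lookup tables or hardware instructions for speed:

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (reflected form, polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF                      # initial value: all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; if it was 1, XOR in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final XOR


payload = b"123456789"
print(hex(crc32_bitwise(payload)))                      # 0xcbf43926
print(crc32_bitwise(payload) == zlib.crc32(payload))    # True

corrupted = b"123456780"                                # last byte changed
print(crc32_bitwise(corrupted) == zlib.crc32(payload))  # False
```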
MD5 and SHA Checksums for File Downloads
When a site publishes a SHA-256 hash alongside a download, they're giving you a way to verify the file arrived intact and matches what they published:
# Compute the SHA-256 hash of the downloaded file
# Linux (GNU coreutils):
sha256sum ubuntu-24.04-desktop.iso
# macOS:
shasum -a 256 ubuntu-24.04-desktop.iso
Compare the output against the hash on the download page. A match means the file is byte-for-byte identical to what was published.
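If you would rather do the same check from a script, a minimal Python sketch might look like this. The helper names `sha256_of_file` and `matches_published` are illustrative; the file is read in chunks so that a multi-gigabyte ISO never has to fit in memory:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare against a published hash, ignoring case and stray whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Calling `matches_published("ubuntu-24.04-desktop.iso", hash_from_page)` would then return `True` only when the two values agree, where `hash_from_page` stands for the string copied from the download page.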
Important caveat: this verifies integrity but not authenticity. If an attacker controls the download server and the page showing the hash, they can substitute both. A checksum on the same server as the file is weak protection against a compromised server. For real authenticity guarantees, you need a cryptographic signature — the publisher signs the hash with their private key, and you verify with their public key.
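MD5 checksums are still widely published for downloads. MD5 can still reveal accidental corruption, but it is cryptographically broken (practical collision attacks exist) and shouldn't be used for security purposes. For integrity checking of files, SHA-256 is the current standard. For digital signatures and similar security-sensitive uses, use SHA-256 or better. Passwords are a separate case: store them with a deliberately slow password-hashing function such as bcrypt or Argon2, not a bare SHA-256.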
The Difference Between a Checksum and a Hash
The terms are often used interchangeably, but there's a real distinction:
- A checksum is any value computed from data to detect errors. It may be non-cryptographic (CRC, Luhn).
- A cryptographic hash is a checksum with additional security properties: collision resistance (hard to find two inputs with the same hash), preimage resistance (hard to reverse), and avalanche effect (tiny input change → completely different output).
All cryptographic hashes are checksums, but not all checksums are cryptographic hashes. When security matters (digital signatures, certificate fingerprints, message authentication), you need a cryptographic hash.
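The difference is easy to observe. The sketch below (helper names and sample inputs are illustrative) flips one input bit and counts how many SHA-256 output bits change, then shows that CRC-32 has a predictable XOR structure that a cryptographic hash is designed not to have:

```python
import hashlib
import zlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def xor_bytes(*parts: bytes) -> bytes:
    """XOR several equal-length byte strings together."""
    out = bytearray(len(parts[0]))
    for part in parts:
        for i, value in enumerate(part):
            out[i] ^= value
    return bytes(out)


# Avalanche effect: '9' (0x39) and '8' (0x38) differ by a single bit.
original = b"amount=99.99"
altered = b"amount=99.98"
h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(altered).digest()
print(bit_difference(original, altered))  # 1
print(bit_difference(h1, h2))             # typically around half of the 256 bits

# CRC-32 is predictable: for three equal-length inputs, the CRC of their XOR
# equals the XOR of their CRCs. This is what lets an attacker patch a CRC.
a, b, c = b"alpha", b"bravo", b"gamma"
print(zlib.crc32(xor_bytes(a, b, c)) ==
      zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c))  # True
```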
See Hashing Algorithms Guide for a deeper look at SHA-256, SHA-3, bcrypt, and when to use each one.
HMAC: Adding Authentication to Checksums
A plain hash verifies integrity but not authenticity. HMAC (Hash-based Message Authentication Code) adds a secret key to the hash:
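HMAC(key, message) = hash((key' ⊕ opad) || hash((key' ⊕ ipad) || message))
Here key' is the key zero-padded to the hash's block size (a key longer than one block is hashed first), and ipad and opad are fixed constants: the bytes 0x36 and 0x5c repeated to fill one block.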
The result: only someone who knows the key can compute or verify the HMAC. This is how APIs sign requests (AWS Signature V4 uses HMAC-SHA256), how JWT HMAC tokens work (HS256), and how cookie signing works in most web frameworks.
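import hashlib
import hmac

key = b'secret-key'
message = b'order_id=12345&amount=99.99'

# Compute the MAC over the message using the shared secret key
mac = hmac.new(key, message, hashlib.sha256).hexdigest()
# mac is a 64-character hex string (32 bytes of SHA-256 output)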
If the message is tampered with (someone changes amount=99.99 to amount=0.01), the MAC won't match and you reject it.
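Here is a sketch of the receiving side, together with a from-scratch construction of HMAC-SHA256 for comparison with the standard library. The names `hmac_sha256_manual` and `verify` are illustrative, and the manual version is for understanding only; production code should call `hmac.new` and compare with `hmac.compare_digest`, which avoids leaking information through timing:

```python
import hashlib
import hmac


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 built directly from its definition, for illustration only."""
    block_size = 64                             # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()      # long keys are hashed first
    key = key.ljust(block_size, b"\x00")        # then zero-padded to one block
    inner_key = bytes(b ^ 0x36 for b in key)    # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)    # key XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()


def verify(key: bytes, message: bytes, received_mac_hex: str) -> bool:
    """Recompute the MAC and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_mac_hex)


key = b"secret-key"
message = b"order_id=12345&amount=99.99"
mac = hmac.new(key, message, hashlib.sha256).hexdigest()

print(hmac_sha256_manual(key, message).hex() == mac)    # True
print(verify(key, message, mac))                        # True
print(verify(key, b"order_id=12345&amount=0.01", mac))  # False
```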
Checksums in TCP/IP
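Every IPv4 packet carries a header checksum (IPv6 dropped it and relies on the layers above and below). TCP and UDP segments carry a checksum that covers the header, the payload, and a pseudo-header of IP addresses; it is mandatory for TCP and optional for UDP over IPv4. These are 16-bit ones' complement checksums: not cryptographic, but cheap to compute and able to catch most of the random bit errors that occur in network transmission.
The network stack verifies these automatically. A TCP segment with a bad checksum is discarded, and the sender retransmits it when no acknowledgment arrives; a UDP datagram with a bad checksum is silently dropped. By the time data reaches your application code, most network-level corruption has already been caught, though a 16-bit checksum is weak enough that protocols moving large or critical data usually add a stronger check of their own.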
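The algorithm itself is short. The following is a sketch in the style of RFC 1071 (the function name is illustrative, and a real stack also includes the pseudo-header fields and runs in optimized or offloaded code):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


words = bytes.fromhex("0001f203f4f5f6f7")         # sample data from RFC 1071
checksum = internet_checksum(words)
print(hex(checksum))                              # 0x220d

# A receiver sums the data together with the checksum; 0 means it checks out.
print(internet_checksum(words + checksum.to_bytes(2, "big")))  # 0

# Weakness: swapping two 16-bit words leaves the checksum unchanged.
swapped = words[2:4] + words[0:2] + words[4:]
print(internet_checksum(swapped) == checksum)     # True
```

The last line shows why this checksum catches random bit errors well but cannot be treated as a strong integrity guarantee.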
Verifying a Download: A Practical Example
Here's a complete workflow for verifying a downloaded file:
# 1. Download file and checksum
curl -O https://example.com/release-v2.0.tar.gz
curl -O https://example.com/release-v2.0.tar.gz.sha256
# 2. Verify (macOS)
shasum -a 256 -c release-v2.0.tar.gz.sha256
# 3. Verify (Linux)
sha256sum -c release-v2.0.tar.gz.sha256
# Output if valid:
# release-v2.0.tar.gz: OK
For even stronger guarantees, check whether the project publishes GPG signatures. Then you're not just verifying the file hash — you're verifying that someone with the project's private key signed it.
You can quickly generate and compare SHA-256 and MD5 hashes using the Hash Generator tool without installing anything. For data that needs to be transmitted in text form, the Base64 Encoder handles encoding binary output into ASCII-safe strings — which is how hash values often get embedded in HTTP headers and tokens.
Also worth reading: Encoding vs. Encryption vs. Hashing — the distinctions matter when you're choosing which tool to reach for in a given security scenario.
Checksums are one of those foundational mechanisms you rely on constantly without thinking about it. Every file you download, every network packet you receive, every credit card number you type — all of them are quietly verified by checksum logic running underneath.