How Snowflake IDs Power Twitter, Discord, and Distributed Systems

In 2010, Twitter had a problem. Their MySQL-backed auto-increment tweet IDs were buckling under load, the team was sharding the database to survive, and they needed a single 64-bit integer for every new tweet that was unique, roughly time-ordered, and could be generated without coordinating with any central server. Out of that came Snowflake — a tiny, beautifully simple ID scheme that has since shown up inside Discord, Instagram, Sony, and dozens of fintech and gaming backends. If you've ever copied a Discord message link and noticed that giant 18-19 digit number at the end, you've already used a Snowflake.

This post unpacks how those IDs work, why distributed systems keep reinventing them, and the gotchas — clock skew, sequence overflow, epoch choices — that bite teams the first time they roll their own.

Why Auto-Increment Breaks at Scale

Single-database auto-increment is wonderful until it isn't. Every INSERT has to round-trip to the primary, take a lock on the counter, and only then can the row land. That's fine at hundreds of writes per second. At hundreds of thousands, the counter itself becomes the bottleneck — and worse, if you shard the database across N machines, you no longer have one counter to increment. You have N, and they conflict.

Teams usually try a few stopgaps first:

  • UUID v4: random 128 bits, no coordination needed, but unsortable, awful for B-tree index locality, and twice the storage of a 64-bit int.
  • Database sequences with offsets: shard A starts at 1, increments by 10; shard B starts at 2, increments by 10. Works until you reshard.
  • Centralized ticket server: one process hands out batches of IDs. Now you have a single point of failure, and every service has to call it.
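
The offset trick from the second bullet is easy to sketch. The `makeShardSequence` helper below is hypothetical, a stand-in for whatever your database's sequence settings do, and it assumes a stride of 10 as in the example above:

```javascript
// Hypothetical helper: each shard owns the IDs congruent to its offset modulo the stride.
function makeShardSequence(offset, stride) {
  let next = offset;
  return () => {
    const id = next;
    next += stride;
    return id;
  };
}

const shardA = makeShardSequence(1, 10);
const shardB = makeShardSequence(2, 10);

const idsFromA = [shardA(), shardA(), shardA()]; // [1, 11, 21]
const idsFromB = [shardB(), shardB(), shardB()]; // [2, 12, 22]
```

The two streams never overlap, but the stride caps the shard count at 10. Adding an eleventh shard means picking a new stride and reconciling every ID already issued, which is the resharding pain the bullet refers to.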

Snowflake's pitch is: pack a millisecond timestamp, a machine ID, and a per-millisecond sequence into 64 bits. Each node generates its own IDs locally, no coordination required, and the result still sorts roughly by time when you slap it into an index.

The 64-Bit Layout

The classic Twitter layout splits 64 bits like this:

 bit  63           62 ─ 22 (41 bits)              21 ─ 12 (10)   11 ─ 0 (12)
     ┌──┬───────────────────────────────────────┬────────────┬──────────────┐
     │ 0│      timestamp (ms since epoch)       │ machine ID │  sequence    │
     │  │      69 years from chosen epoch       │  1024 IDs  │  4096/ms     │
     └──┴───────────────────────────────────────┴────────────┴──────────────┘
       sign         most-significant bits                       least-sig
       (always 0)

Figure: the classic Twitter layout, 1 + 41 + 10 + 12 bits, packed into a positive signed int64.

Bit budget:

  • 1 sign bit, always 0. Keeps the ID a positive signed 64-bit integer so every language with a long/bigint type can read it natively.
  • 41 bits of timestamp in milliseconds. That's 2^41 / (1000 * 60 * 60 * 24 * 365) ≈ 69 years of capacity from your chosen epoch.
  • 10 bits of machine ID. That's 1,024 nodes, usually split into 5 bits of datacenter and 5 bits of worker.
  • 12 bits of sequence. Per-millisecond counter, resets to 0 each millisecond. 2^12 = 4,096 IDs per node per millisecond, or roughly 4 million per second per node.

Multiply that out: 4,096 IDs/ms × 1,024 nodes ≈ 4.2 million unique IDs per millisecond, or about 4.2 billion per second, cluster-wide. Twitter has never come close to that.
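
If you want to check the bit budget yourself, the arithmetic fits in a few lines. This is just a sanity check of the numbers above; the variable names are illustrative:

```javascript
const TIMESTAMP_BITS = 41n;
const MACHINE_ID_BITS = 10n;
const SEQUENCE_BITS = 12n;

// Timestamp capacity in whole years (BigInt division truncates).
const msPerYear = 1000n * 60n * 60n * 24n * 365n;
const yearsOfCapacity = Number((1n << TIMESTAMP_BITS) / msPerYear); // 69

const nodes = Number(1n << MACHINE_ID_BITS);          // 1024
const idsPerMsPerNode = Number(1n << SEQUENCE_BITS);  // 4096
const idsPerSecondPerNode = idsPerMsPerNode * 1000;   // 4,096,000
const idsPerMsCluster = idsPerMsPerNode * nodes;      // 4,194,304
const idsPerSecondCluster = idsPerMsCluster * 1000;   // 4,194,304,000
```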

Generating One in Code

flowchart TD
  Start([nextId called]) --> Now[now = ms since epoch]
  Now --> Cmp{now < lastTs?}
  Cmp -- yes --> Crash[/Throw "clock moved backwards"/]
  Cmp -- no --> Same{now == lastTs?}
  Same -- yes --> Inc[seq = seq + 1 & 0xFFF]
  Inc --> Wrap{seq == 0?}
  Wrap -- yes --> Spin[busy-wait until next ms]
  Spin --> NewMs[now = next ms]
  NewMs --> Reset
  Wrap -- no --> Pack
  Same -- no --> Reset[seq = 0]
  Reset --> Pack[shift &amp; OR:<br/>ts &lt;&lt; 22<br/>| machine &lt;&lt; 12<br/>| seq]
  Pack --> Out([return id])

Here's a minimal generator. Real implementations, including Twitter's original Scala code and its ports to other languages, add monitoring and thread-safe state handling, but the core is this:

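const EPOCH = 1288834974657n; // Twitter's chosen epoch: Nov 4, 2010
const MACHINE_ID_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_MACHINE_ID = (1n << MACHINE_ID_BITS) - 1n; // 1023
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

let lastTimestamp = -1n; // ms timestamp of the most recently issued ID
let sequence = 0n;       // counter within the current millisecond

function nextId(machineId) {
  const machine = BigInt(machineId);
  if (machine < 0n || machine > MAX_MACHINE_ID) {
    // An oversized machine ID would spill into the timestamp bits
    throw new RangeError(`machineId must be between 0 and ${MAX_MACHINE_ID}`);
  }

  let now = BigInt(Date.now());

  if (now < lastTimestamp) {
    throw new Error(`Clock moved backwards by ${lastTimestamp - now}ms`);
  }

  if (now === lastTimestamp) {
    sequence = (sequence + 1n) & MAX_SEQUENCE;
    if (sequence === 0n) {
      // Sequence exhausted this ms: spin until the next ms
      while (now <= lastTimestamp) now = BigInt(Date.now());
    }
  } else {
    sequence = 0n;
  }

  lastTimestamp = now;

  // Layout, high bits to low: timestamp | machine ID | sequence
  return ((now - EPOCH) << (MACHINE_ID_BITS + SEQUENCE_BITS))
       | (machine << SEQUENCE_BITS)
       | sequence;
}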

A few things worth noticing. The shift assembles bits left-to-right: timestamp, then machine, then sequence. The sequence wraps with & MAX_SEQUENCE, and when it wraps to 0 within the same millisecond, the generator busy-waits for the clock to tick. And critically — the function refuses to issue an ID if the clock just moved backwards.
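
Decoding is the same bit arithmetic in reverse. Here's a minimal sketch, assuming the classic 41/10/12 layout and Twitter's epoch; `decodeSnowflake` is an illustrative name, and the constants are repeated so the snippet runs on its own:

```javascript
const EPOCH = 1288834974657n; // Twitter's epoch; pass a different one for other systems
const MACHINE_ID_BITS = 10n;
const SEQUENCE_BITS = 12n;

function decodeSnowflake(id, epoch = EPOCH) {
  const value = BigInt(id);
  // Mask off the low 12 bits, then the next 10; what remains is the timestamp.
  const sequence = value & ((1n << SEQUENCE_BITS) - 1n);
  const machineId = (value >> SEQUENCE_BITS) & ((1n << MACHINE_ID_BITS) - 1n);
  const timestampMs = (value >> (MACHINE_ID_BITS + SEQUENCE_BITS)) + epoch;
  return {
    timestampMs,
    date: new Date(Number(timestampMs)),
    machineId: Number(machineId),
    sequence: Number(sequence),
  };
}
```

Pass IDs as strings or BigInts rather than plain numbers. A JavaScript `Number` can't represent most 64-bit values exactly, so the low bits would be lost before decoding even starts.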

Want to play with this without writing code? The Snowflake ID Generator lets you generate IDs and decode existing ones into their timestamp, machine ID, and sequence components. It's the fastest way to develop intuition for the bit layout.

Clock Skew Will Bite You

Snowflake assumes the system clock moves forward monotonically. NTP can step the clock backward when correcting drift, and a backwards step plus naive "use current time" logic produces duplicate IDs — a catastrophic bug, because your database might happily insert two rows with the same primary key on different shards before anyone notices.

Production-grade implementations handle this in one of three ways:

  1. Reject and crash if now < lastTimestamp. Loud, safe, requires the operator to investigate. Twitter's choice.
  2. Use a monotonic clock source (e.g. CLOCK_MONOTONIC on Linux, process.hrtime.bigint() in Node, or System.nanoTime() in Java) to derive timestamps. It never goes backward, but it doesn't correspond to wall time either, so you have to anchor it to a wall-clock reading at startup.
  3. Wait it out if the skew is small (< 5ms). Reasonable for occasional NTP nudges, dangerous if the clock leaps by minutes.
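
Options 2 and 3 can be sketched in Node roughly as follows. This is one possible shape, not how any particular library does it: `monotonicNowMs` and `waitOutSmallSkew` are hypothetical names, and the 5 ms threshold is simply the figure from the list above.

```javascript
// Option 2: anchor the wall clock once at startup, then advance it with the
// monotonic clock so later NTP steps cannot pull timestamps backward.
const wallAtStartMs = BigInt(Date.now());
const monoAtStartNs = process.hrtime.bigint();

function monotonicNowMs() {
  const elapsedMs = (process.hrtime.bigint() - monoAtStartNs) / 1_000_000n;
  return wallAtStartMs + elapsedMs;
}

// Option 3: tolerate a small backward step by waiting, fail loudly on a big one.
const MAX_TOLERATED_SKEW_MS = 5n; // assumption: threshold taken from the list above

function waitOutSmallSkew(lastTimestamp, readClock = () => BigInt(Date.now())) {
  let now = readClock();
  if (lastTimestamp - now > MAX_TOLERATED_SKEW_MS) {
    throw new Error(`Clock moved backwards by ${lastTimestamp - now}ms`);
  }
  while (now < lastTimestamp) now = readClock(); // spin until the clock catches up
  return now;
}
```

The anchored clock only protects a running process. If the wall clock was stepped back before a restart, the new anchor can still land earlier than the last ID issued, so some deployments also persist the last timestamp and refuse to start behind it.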

Discord generates billions of message IDs per day with its own Snowflake variant, and derives a time bucket from each message ID's timestamp to use as part of the partition key when storing messages. Clock skew on a Discord ID generator would corrupt the very mechanism that finds the data.

If you're decoding raw timestamps from Snowflakes (theirs or yours) for debugging, the Unix Time Converter handles the conversion between epoch milliseconds and human-readable dates.

Choosing an Epoch

Twitter's epoch is November 4, 2010. Discord's is January 1, 2015. Yours should be the day you ship — not the Unix epoch (1970).

Why? Because 41 bits gives you 69 years of capacity from whatever you pick, and burning most of those years on history that predates your company is wasteful. Pick a recent epoch, document it in your codebase, and treat it as an immutable constant. Changing the epoch later makes every ID you've already issued decode to the wrong time, and new IDs can sort before, or even collide with, old ones.

// Don't do this: it wastes the decades since 1970
// const EPOCH = 0n;

// Do this: record exactly when, and never change it
const EPOCH = 1704067200000n; // 2024-01-01 00:00:00 UTC
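
To see what an epoch choice costs, compute the last instant a 41-bit millisecond timestamp can represent. A small sketch, with `exhaustionDate` as an illustrative helper name:

```javascript
const TIMESTAMP_BITS = 41n;

// Last instant a 41-bit millisecond timestamp can represent for a given epoch.
function exhaustionDate(epochMs) {
  const lastMs = BigInt(epochMs) + (1n << TIMESTAMP_BITS) - 1n;
  return new Date(Number(lastMs));
}

const unixEpochEnd = exhaustionDate(0n);               // 2039-09-07T15:47:35.551Z
const twitterEpochEnd = exhaustionDate(1288834974657n); // falls in 2080
const recentEpochEnd = exhaustionDate(1704067200000n);  // falls in 2093
```

With the Unix epoch, the timestamp field runs out in 2039. The 2024 epoch from the example above pushes that to 2093.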

Cousins: ULID, KSUID, ObjectId

Figure: bit layouts compared.

  Snowflake (64 bits)    timestamp 41 | machine ID 10 | sequence 12
  UUID v7   (128 bits)   timestamp 48 | random 74 (+ version 4, variant 2)
  ULID      (128 bits)   timestamp 48 | random 80
  ObjectId  (96 bits)    timestamp 32 | random 40 | counter 24

All four schemes share a "timestamp prefix + entropy or routing tail" shape. They differ mainly in width and how the tail is split.

Snowflake is one point in a design space, not the only answer. Each cousin makes a different trade-off:

  • UUID v7 is the modern UUID variant: 48-bit Unix-ms timestamp + 74 bits of randomness, formatted as the familiar 36-char hyphenated string. Time-sortable like Snowflake, but 128 bits and no machine ID. Generate one with our UUID Generator.
  • ULID is 128 bits encoded as Crockford base32 (01ARZ3NDEKTSV4RRFFQ69G5FAV). 48-bit timestamp + 80 bits randomness. URL-safe, lexicographically sortable, and human-typeable in a way that UUIDs aren't. The ULID Generator shows the layout.
  • MongoDB ObjectId predates Snowflake by years: 32-bit timestamp + 5 random bytes (replacing the original machine ID + PID) + 3-byte counter, totaling 96 bits as 24 hex chars. Same idea, slightly different bit budget. The MongoDB ObjectId Generator shows how it decomposes.
  • Sonyflake trades sequence space for a 39-bit timestamp in centiseconds and 16 bits of machine ID — fits more nodes, generates fewer IDs per node per second.
  • Instagram's scheme described in their sharding-IDs post packs a logical shard ID instead of a physical machine ID — so the same ID tells you which Postgres database holds the row.

The pattern is identical across all of them: timestamp prefix for sortability, plus enough entropy or routing info to dodge collisions.
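
To make the shared shape concrete, here is a rough sketch of the UUID v7 layout in Node: a 48-bit millisecond timestamp up front, random bits behind it, with the version and variant bits stamped in. `uuidV7Sketch` is an illustrative name, and this is not a substitute for a vetted library (it skips the optional monotonic counter the RFC allows, for one).

```javascript
const { randomBytes } = require("node:crypto");

function uuidV7Sketch(nowMs = Date.now()) {
  const bytes = randomBytes(16); // start fully random, then overwrite the prefix
  const ts = BigInt(nowMs);
  // Bytes 0..5: 48-bit big-endian Unix timestamp in milliseconds.
  for (let i = 0; i < 6; i++) {
    bytes[i] = Number((ts >> BigInt(8 * (5 - i))) & 0xffn);
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version 7 in the high nibble
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits 10
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```

Because the timestamp occupies the leading bytes, plain string comparison orders two IDs from different milliseconds by time, the same property Snowflake gets from putting the timestamp in the high bits.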

When Snowflake Is the Wrong Tool

Snowflake assumes you have a controlled fleet of generators with assigned machine IDs. That doesn't fit every problem.

  • Browsers and untrusted clients can't get a unique machine ID safely. Use UUID v4 or v7 from the client, optionally rewrap server-side.
  • Single-tenant apps with one writer don't need 1,024 machine slots. A 64-bit auto-increment is simpler.
  • Truly random IDs (session tokens, API keys, password reset codes) should not encode a timestamp — that leaks creation time and helps attackers narrow brute force ranges. Use a CSPRNG.
  • Strict secondary sort requirements: Snowflake IDs sort by time, but two IDs created in the same millisecond on different machines are ordered by machine ID, not creation order. If you need true total ordering across a cluster, you need a Lamport clock or a sequencer service, not Snowflake.
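
The last point is easy to demonstrate. In this sketch, `pack` is a hypothetical helper using the classic layout, with timestamps given as milliseconds since the epoch:

```javascript
// Hypothetical helper: pack (ms since epoch, machine ID, sequence) into one ID.
function pack(timestampMs, machineId, sequence) {
  return (BigInt(timestampMs) << 22n) | (BigInt(machineId) << 12n) | BigInt(sequence);
}

// Same millisecond: machine 7 generates its ID first, machine 2 a moment later.
const createdFirst = pack(1000, 7, 0);
const createdSecond = pack(1000, 2, 0);
const laterSortsFirst = createdSecond < createdFirst; // true: machine ID decides the order

// One millisecond later, the timestamp dominates again.
const nextMillisecond = pack(1001, 2, 0);
const timeOrderRestored = nextMillisecond > createdFirst; // true
```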

Practical Takeaways

If you're picking an ID scheme today, use this checklist:

  1. Need it to fit in a 64-bit int and sort by time? Snowflake or Sonyflake.
  2. Want a string ID that's URL-safe and time-sortable? ULID or UUID v7.
  3. Already on MongoDB? Use ObjectId — it's free.
  4. Single writer, no sharding plan? Auto-increment is fine, don't over-engineer.
  5. Need cryptographic unpredictability? Generate random bytes, don't bolt randomness onto a timestamped scheme.

If you do build your own Snowflake generator, three rules will save you a 3 a.m. page:

  • Validate now >= lastTimestamp on every call. Crash loud rather than emit a duplicate.
  • Pick a recent epoch and never change it. Document the exact millisecond.
  • Assign machine IDs out-of-band (config server, environment variable, Kubernetes pod annotation). Two nodes with the same machine ID will silently collide.
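
For the third rule, it helps to fail at startup if the assigned machine ID is missing or out of range. A minimal sketch, assuming the ID arrives in an environment variable; the name `SNOWFLAKE_MACHINE_ID` is made up for illustration:

```javascript
const MAX_MACHINE_ID = 1023; // 10 bits

// Parse and validate a machine ID supplied as a string (for example, from the environment).
function parseMachineId(raw) {
  if (raw === undefined || !/^\d+$/.test(raw)) {
    throw new Error("machine ID must be set to a non-negative integer");
  }
  const id = Number(raw);
  if (id > MAX_MACHINE_ID) {
    throw new RangeError(`machine ID ${raw} exceeds ${MAX_MACHINE_ID}`);
  }
  return id;
}

// Usage at startup (hypothetical variable name):
// const machineId = parseMachineId(process.env.SNOWFLAKE_MACHINE_ID);
```

This catches a missing or malformed value, but not two nodes configured with the same value. Uniqueness still has to be enforced by whatever hands the IDs out.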

Want to compare the wire formats side by side, or pull the timestamp out of an existing Discord/Twitter ID? The Snowflake ID Generator decodes any 64-bit Snowflake into its component bits, and the related ID generators (UUID, ULID, ObjectId) make it easy to feel the differences before committing to one in production.

FAQ

Should I use Snowflake or UUID v7 for new projects?

UUID v7 unless you have specific reasons for Snowflake. UUID v7 is standardized (RFC 9562, May 2024), 128 bits with built-in collision resistance, time-sortable, and storable in any database with a UUID or 128-bit binary column type. Snowflake's win is fitting in a 64-bit int (smaller storage, faster index lookups), but most modern systems can absorb the extra 64 bits. Pick UUID v7 first; reach for Snowflake when 64 bits matters.

How many IDs can a single Snowflake node generate per second?

4,096 per millisecond, or roughly 4.1 million per second per node. With 1,024 machine IDs (10 bits), that's ~4.2 billion IDs per second cluster-wide. Twitter has never come close to that limit. If you exceed 4,096 in a millisecond, the generator busy-waits for the next ms, which adds latency but not duplicates.

What happens if my server's clock moves backwards?

If you naively trust the wall clock, you'll generate duplicate IDs — a catastrophic bug. The right responses: (1) detect and crash loudly (Twitter's choice), (2) use a monotonic clock source like CLOCK_MONOTONIC instead of wall clock, or (3) busy-wait if the skew is small (< 5ms). Never silently issue an ID with a backwards-moving timestamp.

Why pick a custom epoch instead of Unix epoch?

41 timestamp bits give you 69 years from your chosen epoch. If you use Unix epoch (1970), more than 50 of those years are already burned on history that predates your service, and the timestamp field runs out in 2039. Pick a recent date (the day you ship) and you get the full 69-year window. Twitter's epoch is 2010-11-04; Discord's is 2015-01-01.

How do I assign machine IDs without a central coordinator?

Common patterns: an environment variable injected by your orchestrator (for example, from a Kubernetes pod annotation), Zookeeper/etcd ephemeral nodes that auto-release on disconnect, or a dedicated machine-ID service that just hands out ranges. Be wary of shortcuts like AWS instance ID modulo 1024: two instances can land on the same slot, so you still need a uniqueness check. Whatever you do, never let two running instances share a machine ID, because silent collision is the worst failure mode.

Are Snowflake IDs sortable across multiple machines?

Approximately, not strictly. IDs sort primarily by timestamp, but two IDs generated in the same millisecond on different machines are ordered by machine ID, not creation order. For most uses (chronological feeds, indexed queries) this is fine. For strict total ordering across a cluster, you need a Lamport clock or sequencer service, not Snowflake.

Can I use Snowflake IDs in a browser or mobile client?

Not safely — clients can't be assigned reliable machine IDs without a server round-trip, and you'd need to trust client clocks. For client-generated IDs, use UUID v4 (random) or UUID v7 (timestamp + random). If you really need Snowflake-shaped IDs from clients, generate them server-side via an API call.

Are Snowflake IDs predictable enough to be a security risk?

Yes — they encode timestamps and machine IDs visibly. An attacker can predict the next ID in a sequence within a few thousand candidates, and can determine creation time precisely. Don't use Snowflake for session tokens, password reset URLs, API keys, or anything where unpredictability matters. For those, generate random bytes from a CSPRNG.
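
For those cases, a few lines of Node are enough. A minimal sketch using the built-in crypto module; `newSecretToken` is an illustrative name and 32 bytes is an assumed, fairly common token size:

```javascript
const { randomBytes } = require("node:crypto");

// 32 random bytes from the OS CSPRNG, encoded URL-safe without padding.
function newSecretToken(byteLength = 32) {
  return randomBytes(byteLength).toString("base64url");
}

const resetToken = newSecretToken(); // 43 URL-safe characters
```

Nothing in the token reveals when or where it was created, which is exactly what a Snowflake can't offer.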