Today I learned about Snowflake, an algorithm for generating globally unique, roughly chronological IDs in a distributed system. It was designed at Twitter.

The ID generated by Snowflake is 64 bits long. Strictly speaking, the original implementation uses only 63 of them: the most significant bit is always 0, so the ID stays positive when stored as a signed 64-bit integer. The remaining bits are split into a few sections:

  • Timestamp — 41 bits — the number of milliseconds since a selected epoch
  • Node number — 10 bits — identifies the machine (up to 1,024 nodes), so IDs generated on different machines in the same millisecond don't collide
  • Sequence number — 12 bits — a per-machine counter that allows creating multiple IDs (up to 4,096) within one millisecond; it's reset to zero every millisecond.
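
To make the layout concrete, here is a minimal sketch in Python of how the three fields could be packed into one integer and how a generator might hand out IDs. It is not Twitter's code: the names `compose`, `decompose`, `SnowflakeGenerator` and the `EPOCH_MS` value are my own assumptions for illustration. The sequence counter restarts whenever the millisecond changes, which is how the original implementation behaves.

```python
import threading
import time

TIMESTAMP_BITS = 41
NODE_BITS = 10
SEQUENCE_BITS = 12

MAX_NODE = (1 << NODE_BITS) - 1          # largest node number: 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # largest sequence number: 4095

# Hypothetical custom epoch: 2020-01-01T00:00:00Z, in milliseconds.
EPOCH_MS = 1_577_836_800_000


def compose(timestamp_ms, node, sequence):
    """Pack the fields as timestamp | node | sequence; the top bit stays 0."""
    return (timestamp_ms << (NODE_BITS + SEQUENCE_BITS)) | (node << SEQUENCE_BITS) | sequence


def decompose(snowflake_id):
    """Split an ID back into (timestamp_ms, node, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    node = (snowflake_id >> SEQUENCE_BITS) & MAX_NODE
    timestamp_ms = snowflake_id >> (NODE_BITS + SEQUENCE_BITS)
    return timestamp_ms, node, sequence


class SnowflakeGenerator:
    def __init__(self, node, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= node <= MAX_NODE:
            raise ValueError("node must fit in 10 bits")
        self.node = node
        self.clock = clock      # returns Unix time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock() - EPOCH_MS
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                # Same millisecond: bump the per-machine counter.
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 4096 values used up; wait for the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock() - EPOCH_MS
            else:
                # New millisecond: the counter starts again from zero.
                self.sequence = 0
            self.last_ms = now
            return compose(now, self.node, self.sequence)
```

Calling `SnowflakeGenerator(node=5).next_id()` repeatedly yields increasing integers, and `decompose` recovers the timestamp, node, and sequence from each of them. A real deployment would also need a way to assign node numbers without overlap, which this sketch leaves out.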

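The epoch can be selected by the developer. It's an important choice because the 41-bit timestamp covers only 2^41 milliseconds, roughly 69.7 years, so the epoch determines how long unique IDs can be generated before the timestamp overflows. Because the timestamp occupies the most significant bits, the IDs sort in chronological order.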
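
To get a feel for what the epoch choice means in practice, here is a rough Python sketch of the arithmetic: a 41-bit millisecond counter runs out 2^41 ms after the epoch. The epoch used here and the helper name `last_usable_moment` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 41


def lifespan_years(bits=TIMESTAMP_BITS):
    """How many years a millisecond counter with this many bits can cover."""
    ms_per_year = 1000 * 60 * 60 * 24 * 365.25
    return (1 << bits) / ms_per_year


def last_usable_moment(epoch, bits=TIMESTAMP_BITS):
    """The last instant that still fits in the timestamp field."""
    return epoch + timedelta(milliseconds=(1 << bits) - 1)


# Hypothetical epoch picked by the developer.
epoch = datetime(2020, 1, 1, tzinfo=timezone.utc)

print(f"lifespan: {lifespan_years():.1f} years")
print(f"timestamps run out on: {last_usable_moment(epoch).date()}")
```

Picking an epoch close to the launch date of the system makes the most of that range, while reusing the Unix epoch (1970) would already have spent most of it.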