ToolsOps

UUID v4 and v7: when to use each and how to validate

What a UUID is, how v4 (random) differs from v7 (time-ordered), when to use each as a primary key, how to validate and normalize UUIDs, and common pitfalls when handling identifiers.

What a UUID is

A UUID (Universally Unique Identifier) is a 128-bit identifier designed to be unique without coordination between machines. It is usually represented as 32 hexadecimal characters in 8-4-4-4-12 groups separated by dashes (for example, 550e8400-e29b-41d4-a716-446655440000). The current standard is RFC 9562, published in 2024 as the successor to RFC 4122; it stays compatible with all existing UUIDs and adds the new versions v6, v7 and v8.

Of the eight defined versions, two cover essentially all modern usage: v4 (random) and v7 (timestamp + random, time-orderable). The rest target specific cases: v1 embeds a timestamp and the issuer's node field, v2 is a legacy DCE Security variant, v3 and v5 derive a deterministic UUID from a namespace and a name, v6 reorders the v1 timestamp fields so that values sort by time (RFC 9562 recommends v7 instead unless you need v1 compatibility), and v8 is reserved for custom layouts.

UUID v4: random and widely supported

v4 is the universal default. It offers 122 bits of randomness (the other 6 encode version and variant), a recognizable canonical format and native support on virtually every platform (Postgres has gen_random_uuid(), Java has UUID.randomUUID(), Node has crypto.randomUUID()). If you only need a unique identifier and order does not matter, v4 is the lowest-friction option.
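To make the "122 random bits plus 6 fixed bits" layout concrete, here is a minimal sketch that turns 16 random bytes into a v4 UUID. The function names `formatUuid` and `uuidV4FromBytes` are hypothetical, chosen for illustration; in real code the built-in crypto.randomUUID() already does this for you.

```javascript
// Hypothetical helper: format 16 bytes as a canonical 8-4-4-4-12 string.
function formatUuid(bytes) {
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Hypothetical helper: build a v4 UUID from 16 random bytes by
// overwriting the 6 fixed bits (4 version bits + 2 variant bits).
function uuidV4FromBytes(randomBytes) {
  if (randomBytes.length !== 16) throw new RangeError("expected 16 bytes");
  const bytes = Uint8Array.from(randomBytes);
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version nibble = 0100 (4)
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits = 10xx
  return formatUuid(bytes);
}

// Usage, assuming a runtime that exposes Web Crypto as a global:
// uuidV4FromBytes(crypto.getRandomValues(new Uint8Array(16)));
```

Passing the bytes in as an argument keeps the bit manipulation testable; the randomness itself should still come from a cryptographic source.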

The entropy is astronomically high. The possible value space is 2^122, about 5.3 × 10^36. For the collision probability to reach even 0.0001% (one in a million), you would have to generate about 3.3 × 10^15 UUIDs, which is a billion per second for over a month; reaching a 50% chance takes a billion per second for roughly 86 years. In practice, as long as the generator uses a cryptographic source (Web Crypto, /dev/urandom, BCryptGenRandom), collisions are not a real risk.
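The collision figures can be checked with the birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the size of the value space and p the target collision probability. The sketch below assumes a generation rate of one billion UUIDs per second; the function name and that rate are illustrative choices, not part of any standard.

```javascript
const SPACE = 2 ** 122; // distinct v4 values, about 5.3e36

// Approximate number of UUIDs to generate before the probability of
// at least one collision reaches p (birthday approximation).
function uuidsForCollisionProbability(p) {
  return Math.sqrt(2 * SPACE * -Math.log1p(-p));
}

const RATE = 1e9; // assumed rate: one billion UUIDs per second
const SECONDS_PER_DAY = 86400;
const SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY;

const nMillionth = uuidsForCollisionProbability(1e-6); // about 3.3e15
const nHalf = uuidsForCollisionProbability(0.5); // about 2.7e18

// Time needed at the assumed rate: about 38 days and about 86 years.
console.log((nMillionth / RATE / SECONDS_PER_DAY).toFixed(0), "days");
console.log((nHalf / RATE / SECONDS_PER_YEAR).toFixed(0), "years");
```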

UUID v7: time-orderable

v7 reserves the 48 most significant bits for a big-endian Unix timestamp in milliseconds. The result is a lexicographically orderable UUID: if you generate two v7 UUIDs seconds apart, the more recent one always sorts later, provided the clock has not moved backwards. The remainder of the value (the 74 bits that are not timestamp, version or variant) is random by default, so two UUIDs generated within the same millisecond are still distinct.
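The layout described above can be sketched in a few lines. This is an illustrative generator, not the ToolsOps implementation: the name `uuidV7` and its two injectable parameters are assumptions made so the example can be tested, and it fills all 74 non-fixed bits randomly rather than using any counter.

```javascript
// Sketch of a v7 generator. Both parameters are injectable for testing;
// by default they come from Date.now() and Web Crypto.
function uuidV7(nowMs = Date.now(), randomBytes = crypto.getRandomValues(new Uint8Array(10))) {
  if (randomBytes.length !== 10) throw new RangeError("expected 10 random bytes");
  const bytes = new Uint8Array(16);

  // Bytes 0-5: 48-bit Unix timestamp in milliseconds, big-endian.
  let ts = Math.floor(nowMs);
  for (let i = 5; i >= 0; i--) {
    bytes[i] = ts % 256;
    ts = Math.floor(ts / 256);
  }

  // Bytes 6-15: random, then overwrite the 6 fixed bits.
  bytes.set(randomBytes, 6);
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version nibble = 0111 (7)
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits = 10xx

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}
```

Because the timestamp occupies the leading hex digits, plain string comparison of two outputs agrees with their creation order whenever the timestamps differ.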

The main motivation is database primary keys. Postgres, MySQL, SQLite and other relational engines use B-tree indexes for PKs. With v4, each insert lands at a random position in the index, producing frequent page splits and fragmentation. With v7, inserts are near-sequential: new rows almost always land at the end of the index, improving cache locality, reducing splits and lowering I/O cost on large tables. It is the pattern ULID popularized and the UUID standard adopted in RFC 9562.

UUID v4 vs UUID v7: comparison

| Aspect | v4 | v7 |
| --- | --- | --- |
| Structure | 128 random bits (122 effective) | 48 ms timestamp bits + 74 random bits |
| Time-orderable | No | Yes (lexicographically) |
| B-tree index locality | Poor (random insertion) | Good (near-sequential insertion) |
| Reveals creation time | No | Yes (ms precision) |
| Native support in libs/DBs | Universal | Growing since 2024 |
| RFC | RFC 4122 / RFC 9562 §5.4 | RFC 9562 §5.7 |

When to use UUID in databases

The choice between an autoincrement integer and a UUID as primary key depends on three axes:

  • Distributed systems: if you need to generate IDs on multiple servers or on the client without coordination, UUID is the answer. An autoincrement integer requires a server round-trip or coordinated sequences.
  • Public exposure of the ID: an autoincrement integer leaks order and volume (a customer with ID 1234 implies there are about 1234 customers). A UUID does not.
  • Storage and index cost: a UUID takes 16 bytes (Postgres stores it binary, not as a string), versus 4-8 bytes for an integer. In tables with billions of rows the difference matters. v4 also fragments the index; v7 less so.
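The storage point in the last bullet is easy to see in code: the canonical text form is 36 characters, while the value itself is 16 bytes. A minimal sketch of the conversion, with the hypothetical name `uuidToBytes`, for drivers or columns that expect raw binary:

```javascript
// Convert a UUID string (with or without dashes) into its 16 raw bytes.
// Note: this sketch strips every dash and does not check their positions.
function uuidToBytes(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) throw new TypeError("not a UUID: " + uuid);
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

const text = "550e8400-e29b-41d4-a716-446655440000";
console.log(text.length, "characters as text"); // 36
console.log(uuidToBytes(text).length, "bytes as binary"); // 16
```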

Pragmatic recommendation: use v7 as PK when the system needs distributable IDs and you do not want to pay the v4 fragmentation tax. Keep autoincrement integers when they already work and you do not have the distributed-generation problem.

Validate and normalize UUIDs

The RFC 9562 canonical form is lowercase with dashes in the 8-4-4-4-12 layout. Any serious system should normalize external UUIDs to that form on receipt: when comparing UUIDs in a table, in a cookie or in a URL, a casing mismatch can produce false negatives. The ToolsOps UUID generator ships a Validate mode that accepts input with or without dashes, in uppercase or lowercase, and returns the version, the variant and the normalized canonical.

The version is read from the high nibble of the 7th byte (character index 14 of the dashed string, counting from 0); the variant is read from the high bits of the 9th byte (character index 19). The subset actually used in production is the RFC 4122 variant (bits 10xx), which covers every version a modern system emits or consumes.
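The validation and normalization steps above can be sketched as a single function. This is an illustration under stated assumptions, not the code behind the ToolsOps Validate mode: the name `inspectUuid`, the returned object shape and the variant labels are all hypothetical.

```javascript
const DASHED = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/;
const BARE = /^[0-9a-f]{32}$/;

// Returns { canonical, version, variant } or null if the input is not a UUID.
function inspectUuid(input) {
  const lower = input.trim().toLowerCase();
  // Accept either the dashed 8-4-4-4-12 layout or 32 bare hex digits.
  if (!DASHED.test(lower) && !BARE.test(lower)) return null;

  const hex = lower.replace(/-/g, "");
  const canonical = [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");

  const version = parseInt(hex[12], 16); // high nibble of the 7th byte
  const variantNibble = parseInt(hex[16], 16); // high nibble of the 9th byte

  let variant;
  if ((variantNibble & 0b1000) === 0) variant = "NCS (reserved)"; // 0xxx
  else if ((variantNibble & 0b0100) === 0) variant = "RFC 4122/9562"; // 10xx
  else if ((variantNibble & 0b0010) === 0) variant = "Microsoft (reserved)"; // 110x
  else variant = "future (reserved)"; // 111x

  return { canonical, version, variant };
}
```

Comparing the `canonical` field instead of the raw input avoids the false negatives caused by case or dash differences.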

Common pitfalls

  • Using v4 as the primary key of a large table and finding out late about index fragmentation. If you discover the problem late, consider time-partitioned tables before regenerating IDs.
  • Treating a UUID as a secret when its format is only structural. v4 has enough entropy but its shape is recognizable; for secrets prefer crypto.getRandomValues(new Uint8Array(32)) in base64url.
  • Generating UUIDs with Math.random(). Without a cryptographic source real collisions are possible and the IDs are predictable. Always use Web Crypto or the runtime equivalent.
  • Accepting v1 as input without knowing what it does with the node field. If your system processes external UUIDs, it is safer to deliberately restrict input to v4 and v7.
  • Assuming a v7 has sub-millisecond precision. Only the high 48 bits are timestamp; unless the generator implements one of the optional monotonicity methods in RFC 9562 §6.2, two UUIDs generated in the same ms differ only in the random bits and their relative order is arbitrary.
  • Persisting UUIDs as strings in a large table without converting to binary. Postgres has a native uuid type stored as 16 bytes; MySQL has no dedicated type, so use BINARY(16) with UUID_TO_BIN() and BIN_TO_UUID(). Either way the value takes less than half the space of a 36-character string and is faster to index.
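For the secrets pitfall above, the suggested alternative (32 random bytes in base64url) can be sketched as follows. The names `toBase64Url` and `newOpaqueToken` are hypothetical, and the sketch assumes a runtime where `btoa` and Web Crypto are available as globals (browsers and recent Node versions).

```javascript
// Encode bytes as base64url (RFC 4648 §5) without padding.
function toBase64Url(bytes) {
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Hypothetical helper: a 256-bit opaque token with no recognizable shape.
function newOpaqueToken() {
  return toBase64Url(crypto.getRandomValues(new Uint8Array(32)));
}
```

A token produced this way is 43 characters long and carries 256 bits of entropy, compared with 122 bits in a v4 UUID.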

How to use the ToolsOps generator

The ToolsOps UUID generator and decoder has three modes. Generate lets you choose v4 or v7 and a count between 1 and 100, with options for uppercase and for stripping dashes for non-canonical formats. Validate accepts a UUID and reports version and variant. Decode accepts a v7 and shows the embedded timestamp as ISO UTC, local time and relative age.
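What a Decode step does with a v7 can be sketched in a few lines: take the first 12 hex digits (48 bits) and read them as Unix milliseconds. This is a minimal illustration with a hypothetical function name, not the tool's actual code.

```javascript
// Extract the embedded creation time from a v7 UUID as a Date.
function decodeV7Timestamp(uuid) {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex)) throw new TypeError("not a UUID");
  if (hex[12] !== "7") throw new TypeError("not a v7 UUID");
  // 48 bits fit comfortably within a double's 53-bit integer range.
  const ms = parseInt(hex.slice(0, 12), 16);
  return new Date(ms);
}

// Example value taken from RFC 9562, Appendix A.
const created = decodeV7Timestamp("017F22E2-79B0-7CC3-98C4-DC0C0C07398F");
console.log(created.toISOString()); // 2022-02-22T19:22:22.000Z
console.log(Math.round((Date.now() - created.getTime()) / 86400000), "days ago");
```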

Generation runs 100% in the browser via the Web Crypto API. Neither the produced UUIDs nor the input you paste in Validate or Decode is sent to a server. To validate signatures or other authenticated content, the sibling tool is the JWT decoder and verifier. To verify the integrity of a downloaded file, use the hash and checksum calculator.

Frequently asked questions

Why is v7 said to be better than v4 as a primary key?
Relational databases like Postgres and MySQL use B-tree indexes for primary keys. v4 inserts rows at random positions inside the index, causing fragmentation, frequent page splits and poor cache locality. v7 inserts near-sequentially (the first 48 bits are a timestamp), so new rows almost always go to the end of the index. In large tables this translates to less I/O and less fragmentation.
Can a v4 UUID repeat?
Theoretically yes, practically no. 122 effective bits give about 5.3 × 10^36 possible values. For the collision probability to reach even 0.0001% (one in a million), you would have to generate about 3.3 × 10^15 UUIDs, which is a billion per second for over a month; reaching a 50% chance takes a billion per second for roughly 86 years. If your generator uses a cryptographic source (Web Crypto, /dev/urandom, BCryptGenRandom), collisions are not a real risk.
Is it safe to use a v4 UUID as a session token?
It has enough entropy (122 bits), but the canonical 8-4-4-4-12 lowercase-with-dashes format is very recognizable. If it shows up in a log or URL, it screams its origin. For session secrets prefer an opaque value generated with `crypto.getRandomValues(new Uint8Array(32))` encoded in base64url (256 bits, no recognizable shape). For row or record identifiers, v4 is fine.
Why does this tool not generate v1?
v1 encodes the generation timestamp and the node field, which originally carried the issuer's MAC. Even though RFC 9562 §5.1 now allows randomizing the node field, the data remains ambiguous: receiving a v1 from outside, you do not know if it reveals the server's MAC or not. v7 covers the real use case (time-ordered UUID) without that ambiguity, so I would rather not promote v1 adoption.
What if I receive a UUID in uppercase or without dashes?
RFC 9562 defines the canonical form as lowercase with dashes but accepts uppercase as input. The reasonable move is to normalize external UUIDs to canonical lowercase on receipt: when comparing UUIDs (in a table, in a cookie), a casing mismatch can produce false negatives. The tool always normalizes in Validate mode.
How does UUID v7 compare with ULID?
ULID is an identifier similar to v7 with a different layout (48 bits ms timestamp + 80 random bits) and Crockford Base32 encoding instead of hex. The concept is practically identical: time-orderable, unique, no personal data. v7 has the advantage of being a standard UUID, so it fits any native uuid column type and any library that already handles UUIDs; ULID requires storage as a string or an adapter. If you are starting a new project and want time-ordered IDs, v7 is the lower-friction choice.
Do I have to migrate my existing v4 UUIDs to v7?
No. Mixing v4 and v7 in the same column works; only future inserts will be more localized. If your table is small or fragmentation is not a bottleneck, the cost of migrating (regenerating IDs, updating foreign keys) does not pay off. If you are starting a project or rewriting the data model, v7 is the reasonable default.
Does the library I use already emit v7?
Depends on the version. Some reference points: `uuid` on npm added v7 in version 10. Python added `uuid.uuid7()` to the standard library in 3.14; earlier versions, including 3.11, need a third-party package such as `uuid7` from pip. PostgreSQL ships `gen_random_uuid()` for v4 and, from version 18, a built-in `uuidv7()`; older versions require an extension or custom function. Java has `UUID.randomUUID()` (v4); for v7 you need a third-party library. Before pinning v7 as a contract, verify your stack emits and reads it correctly.