Semantic IDs

How Residual Quantization breaks giant embedding tables into compact, hierarchical codebooks — dramatically reducing parameters while preserving representation quality.

The Problem: Giant Embedding Tables

In recommendation and search systems, every item (video, product, document) is assigned a unique embedding vector. The model learns one dense vector per item:

Item 1:  [ 0.12, -0.34,  0.56,  0.91, -0.23, ...,  0.78]
Item 2:  [-0.23,  0.45,  0.67, -0.18,  0.52, ..., -0.89]
Item 3:  [ 0.34, -0.56,  0.78,  0.41, -0.67, ...,  0.12]
...
Item N:  [-0.45,  0.67, -0.89,  0.15,  0.33, ...,  0.34]

Parameters = N × D = 1,000,000 × 256 = 256,000,000
With 1 million items and 256-dimensional embeddings, the ID embedding table alone contains 256 million parameters — often the single largest component of the model. This scales linearly with the catalog size and cannot generalize to new or unseen items.
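The arithmetic is worth making concrete (a toy calculation, with N and D as in the example above):

```python
# Per-item ID embedding table: one D-dimensional vector per item.
N, D = 1_000_000, 256
table_params = N * D
print(table_params)  # 256,000,000 parameters for the ID table alone

# The cost scales linearly with the catalog: doubling N doubles the table.
assert 2 * N * D == 2 * table_params
```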

The Idea: Hierarchical Codebooks

Residual Quantization (RQ)

Instead of storing a unique vector for every item, we approximate each embedding as a sum of shared codebook vectors, selected level by level:

x ≈ c1 + c2 + c3 + c4, where each cl is drawn from one of L = 4 shared codebooks:

C1 (K entries, coarse) + C2 (K entries, medium) + C3 (K entries, fine) + C4 (K entries, very fine)

The process works by quantizing the residual error at each level:

1. Quantize x to the nearest entry in C1 → c1,   residual r1 = x − c1
2. Quantize r1 to the nearest entry in C2 → c2,   residual r2 = r1 − c2
3. Quantize r2 to the nearest entry in C3 → c3,   residual r3 = r2 − c3
4. Quantize r3 to the nearest entry in C4 → c4,   residual r4 = r3 − c4

Reconstruction: c1 + c2 + c3 + c4 ≈ x
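The four steps above can be sketched in a few lines of NumPy. This is an illustrative toy: random codebooks stand in for learned ones (in practice they come from k-means or end-to-end training), and D is kept small for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, D = 4, 256, 8                       # levels, entries per level, toy dimension
codebooks = rng.normal(size=(L, K, D))    # hypothetical codebooks; learned in practice

def rq_encode(x, codebooks):
    """Greedily quantize the residual at each level; return index tuple and reconstruction."""
    residual = x.copy()
    indices = []
    recon = np.zeros_like(x)
    for C in codebooks:
        # pick the codebook entry closest to the current residual
        idx = int(np.argmin(np.linalg.norm(C - residual, axis=1)))
        indices.append(idx)
        recon += C[idx]
        residual -= C[idx]
    return tuple(indices), recon

x = rng.normal(size=D)
semantic_id, x_hat = rq_encode(x, codebooks)
print(semantic_id)                        # a tuple of 4 indices, each in [0, K)
print(float(np.linalg.norm(x - x_hat)))  # remaining quantization error
```

The returned tuple is exactly the Semantic ID described below: four small integers that, together with the shared codebooks, reconstruct an approximation of x.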

Semantic ID (a tuple of codebook indices): ID(x) = (idx1, idx2, idx3, idx4)

Traditional:  N × D = 1,000,000 × 256 = 256,000,000 parameters
Semantic ID:  L × K × D = 4 × 256 × 256 = 262,144 parameters

≈ 977× fewer parameters — shared codebooks replace per-item storage.
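The parameter comparison checks out numerically (a quick calculation with the values above):

```python
# Shared codebooks: L levels, each with K entries of dimension D.
levels, entries, dim = 4, 256, 256
codebook_params = levels * entries * dim    # parameters shared by all items
table_params = 1_000_000 * 256              # per-item embedding table

print(codebook_params)                            # 262,144
print(round(table_params / codebook_params))      # ~977x savings
```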

Interactive: Step-by-Step Residual Quantization

Watch how each codebook level refines the approximation in 2D. Choose a target vector and step through the quantization levels.

[Interactive demo: displays the current level (1–4), the Semantic ID tuple filling in as ( _ , _ , _ , _ ), the running reconstruction, and the remaining error ‖r‖.]

Why It Works: Residual Decomposition

Vector Stacking: Each Level Adds Finer Detail

The reconstruction is built by summing the codebook vectors chosen at each level. Each successive vector is smaller, capturing finer residual details.

Error Reduction by Level

Each codebook level reduces the remaining quantization error, with the coarsest level capturing the most information.
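A toy experiment illustrates this decay. The codebooks here are random, and giving each level a zero entry is an illustrative assumption that guarantees the greedy step can never increase the error (choosing the zero entry leaves the residual unchanged):

```python
import numpy as np

rng = np.random.default_rng(42)
L, K, D = 4, 256, 8
codebooks = rng.normal(size=(L, K, D))
codebooks[:, 0, :] = 0.0   # zero entry: skipping a level is always an option

x = rng.normal(size=D)
residual = x.copy()
errors = [float(np.linalg.norm(residual))]
for C in codebooks:
    # greedy step: subtract the entry closest to the current residual
    idx = int(np.argmin(np.linalg.norm(C - residual, axis=1)))
    residual -= C[idx]
    errors.append(float(np.linalg.norm(residual)))

print(errors)  # non-increasing: each level can only shrink the remaining error
```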

Summary

                    Traditional Embedding                 Semantic ID (RQ)
Representation      One unique D-dim vector per item      Tuple of L codebook indices per item
Parameters          N × D (scales with catalog)           L × K × D (fixed, shared)
Example             1M × 256 = 256M params                4 × 256 × 256 = 262K params
New items           Requires retraining                   Encode with existing codebooks
Structure           Flat, no hierarchy                    Hierarchical: coarse → fine
Storage per item    D floats (256 × 4 = 1024 bytes)       L integers (4 × 1 = 4 bytes)
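The storage row can be checked directly (assuming float32 embeddings, and byte-sized indices since K = 256 values fit in a uint8):

```python
import numpy as np

D, L = 256, 4
dense_embedding = np.zeros(D, dtype=np.float32)   # traditional per-item vector
semantic_id = np.zeros(L, dtype=np.uint8)          # K = 256 entries fit in one byte each

print(dense_embedding.nbytes)  # 1024 bytes per item
print(semantic_id.nbytes)      # 4 bytes per item
```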
Key insight: Semantic IDs trade a small amount of reconstruction error for massive parameter savings and a meaningful hierarchical structure. The first codebook index captures coarse similarity (genre, category), while deeper levels encode finer distinctions.