How Residual Quantization breaks giant embedding tables into compact, hierarchical codebooks — dramatically reducing parameters while preserving representation quality.
In recommendation and search systems, every item (video, product, document) is assigned a unique embedding: the model learns one dense vector per item.
Instead of storing a unique vector for every item, we approximate each embedding as a sum of shared codebook vectors, selected level by level: `x ≈ c1[i1] + c2[i2] + ... + cL[iL]`, one index per level.
The process works by quantizing the residual error at each level: pick the codeword nearest the current residual, subtract it, and pass the remaining error to the next level.
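A minimal sketch of that loop. Everything here is illustrative: real systems learn the codebooks (e.g. by running k-means on the residuals at each level), whereas this block uses random codebooks with a decaying per-level scale to mimic the coarse-to-fine structure, and `rq_encode` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, D = 4, 256, 2   # levels, codewords per level, embedding dim (2D for illustration)

# Stand-in for learned codebooks: each level is scaled down so later
# levels model the (smaller) leftover error.
codebooks = [0.1**l * rng.normal(size=(K, D)) for l in range(L)]

def rq_encode(x, codebooks):
    """Quantize the residual at each level; return one index per level."""
    residual = x.astype(float).copy()
    indices = []
    for level in codebooks:
        # Nearest codeword to what is still unexplained.
        idx = int(np.argmin(np.linalg.norm(level - residual, axis=1)))
        indices.append(idx)
        residual = residual - level[idx]  # pass remaining error to next level
    return indices

x = rng.normal(size=D)           # a target embedding
codes = rq_encode(x, codebooks)  # L integers: the item's Semantic ID
```

The tuple of indices is all that needs to be stored per item; the codebooks themselves are shared across the whole catalog.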
977× fewer parameters — shared codebooks replace per-item storage.
Watch how each codebook level refines the approximation in 2D. Choose a target vector and step through the quantization levels.
The reconstruction is built by summing the codebook vectors chosen at each level. Each successive vector is smaller, capturing finer residual details.
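Decoding is just a sum. A self-contained sketch, again with scaled random codebooks standing in for learned ones (the scale factor is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, D = 4, 256, 2
# Hypothetical codebooks; the 0.1**l scale mimics the coarse-to-fine
# structure that learned codebooks develop.
codebooks = [0.1**l * rng.normal(size=(K, D)) for l in range(L)]

def rq_decode(indices, codebooks):
    """Reconstruction = sum of the chosen codeword at every level."""
    return sum(cb[i] for cb, i in zip(codebooks, indices))

approx = rq_decode([12, 7, 200, 3], codebooks)  # any 4-index Semantic ID
```

Because of the per-level scaling, the vector contributed by the last level is far smaller than the one contributed by the first, which is the "finer residual details" behavior described above.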
Each codebook level reduces the remaining quantization error, with the coarsest level capturing the most information.
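One way to see this numerically is to track the residual norm level by level. A sketch under the same illustrative assumptions as before (random, scale-decayed codebooks in place of learned ones):

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, D = 4, 256, 2
codebooks = [0.1**l * rng.normal(size=(K, D)) for l in range(L)]  # stand-in for learned codebooks

x = rng.normal(size=D)
residual = x.copy()
errors = [np.linalg.norm(residual)]   # error before any quantization
for level in codebooks:
    idx = int(np.argmin(np.linalg.norm(level - residual, axis=1)))
    residual = residual - level[idx]
    errors.append(np.linalg.norm(residual))
# errors[0] is the raw norm of x; each later entry is the leftover error
# after one more level, with the coarsest level removing the largest share.
```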
| | Traditional Embedding | Semantic ID (RQ) |
|---|---|---|
| Representation | One unique D-dim vector per item | Tuple of L codebook indices per item |
| Parameters | N × D (scales with catalog) | L × K × D (fixed, shared) |
| Example | 1M × 256 = 256M params | 4 × 256 × 256 = 262K params |
| New items | Requires retraining | Encode with existing codebooks |
| Structure | Flat, no hierarchy | Hierarchical: coarse → fine |
| Storage per item | D float32s (256 × 4 B = 1024 bytes) | L byte-sized indices (4 × 1 B = 4 bytes) |
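The table's arithmetic, spelled out with the example numbers (float32 embeddings and one byte per index, since K = 256 indices fit in a byte):

```python
N, D = 1_000_000, 256   # catalog size, embedding dimension
L, K = 4, 256           # RQ levels, codewords per level

trad_params = N * D     # per-item table: 256,000,000 parameters
rq_params = L * K * D   # shared codebooks: 262,144 parameters
ratio = trad_params / rq_params

trad_bytes_per_item = D * 4   # 256 float32 values -> 1024 bytes
rq_bytes_per_item = L * 1     # 4 indices, one byte each -> 4 bytes

print(f"{ratio:.0f}x fewer parameters")
```

Note that the codebook cost is fixed: growing the catalog from 1M to 10M items adds nothing to `rq_params`, only 4 bytes of indices per new item.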