Normalization

Normalization is the craft of deciding what goes in which table. The rules have intimidating names — first, second, third normal form — but they all chase one goal: store every fact exactly once. The fastest way to understand why is to work with a table that gets it wrong.

The seed gives you orders_flat, a webshop's entire order history crammed into a single wide table. Each row is one line item, with the customer and product details copied onto it.

sql

SELECT * FROM orders_flat ORDER BY order_id, product_name;

Twelve rows, and the same names, emails, and prices over and over. It looks harmless. It isn't.

Redundancy, measured

Count how many times each customer's details are repeated:

sql

SELECT customer_email, customer_name, count(*) AS copies
FROM orders_flat
GROUP BY customer_email, customer_name
ORDER BY copies DESC;

Ada's name and email exist in five separate rows. That's five chances for them to disagree — and the same goes for every product's price. The textbook name for what happens next is an anomaly, and there are three kinds.
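Before moving on to the anomalies, the price redundancy can be measured the same way as the customer redundancy. This is a minimal sketch, assuming product details live in the product_name and product_price columns that appear elsewhere in this lesson:

```sql
-- How many rows carry a copy of each product's name and price?
SELECT product_name, product_price, count(*) AS copies
FROM orders_flat
GROUP BY product_name, product_price
ORDER BY copies DESC;
```

Any product with more than one copy has its price stored more than once, so a price change would have to reach every one of those rows.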

Update anomalies

Ada changes her email address. An update has to touch every copy — and if it misses some, the database now quietly holds two "truths". Simulate a buggy update that only fixes order 101:

sql

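-- Deliberately buggy: the order_id filter limits the fix to order 101,
-- so Ada's rows on other orders keep the old address.
UPDATE orders_flat
SET customer_email = 'ada.lovelace@example.com'
WHERE customer_email = 'ada@example.com' AND order_id = 101;

-- List every distinct email now stored for Ada.
SELECT DISTINCT customer_name, customer_email
FROM orders_flat
WHERE customer_name = 'Ada Lovelace';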

Two emails for one person, and no error anywhere. Which one is right? The table can't tell you. That's an update anomaly: redundant copies that can drift apart.
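One way to look for this kind of drift across the whole table, not only for Ada, is to count distinct emails per name. This is a sketch, and it assumes customer_name identifies a person, which the flat table does not guarantee (two different customers could share a name):

```sql
-- Names that currently map to more than one email address.
SELECT customer_name,
       count(DISTINCT customer_email) AS distinct_emails
FROM orders_flat
GROUP BY customer_name
HAVING count(DISTINCT customer_email) > 1;
```

Run right after the buggy update, this should list Ada. Once every copy of her email agrees again, she drops out of the result.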

Put it back before we move on — note that the fix itself has to touch all five rows:

sql

UPDATE orders_flat
SET customer_email = 'ada@example.com'
WHERE customer_name = 'Ada Lovelace';

Insert and delete anomalies

Suppose the shop adds a webcam to the catalog. Where does it go? There's no row to put it in until somebody orders one:

sql

INSERT INTO orders_flat (product_name, product_price)
VALUES ('Webcam', 59.00);

Rejected — a product can't exist here without dragging a fake order along. That's an insert anomaly: you can't record one kind of fact without inventing another.
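For contrast, here is a rough sketch of a table that describes products and nothing else. The table name product_sketch and the column types are assumptions made for illustration, not the schema this lesson builds, and the types may need adjusting for your database:

```sql
-- Hypothetical scratch table: one row per product, no order required.
CREATE TABLE product_sketch (
  product_name  text PRIMARY KEY,
  product_price numeric(10, 2) NOT NULL
);

-- The webcam can now be recorded before anyone orders it.
INSERT INTO product_sketch (product_name, product_price)
VALUES ('Webcam', 59.00);

SELECT product_name, product_price FROM product_sketch;

-- Clean up the scratch table so it does not linger.
DROP TABLE product_sketch;
```

The insert goes through here because the product is the only kind of fact the table records, so nothing else has to be invented alongside it.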

The mirror image is the delete anomaly. Look at Marie's orders:
