## Curation Note
Database schema design determines application performance and scalability from day one. This skill synthesizes patterns from database engineering communities and real-world scaling experiences. The emphasis on proper normalization with denormalization guidance addresses the common mistake of over-engineering or under-engineering data models. The migration strategy section is crucial because schema changes are the riskiest deployments.
## Schema Design Process
### Step 1: Requirements Analysis
```markdown
## Data Requirements
### Entities
- What things need to be stored?
- What are their attributes?
### Relationships
- How do entities relate?
- Cardinality (1:1, 1:N, N:M)?
### Access Patterns
- What queries will be common?
- Read-heavy or write-heavy?
- Need for real-time or batch?
### Scale Expectations
- Current data volume?
- Growth rate?
- Concurrent users?
```
### Step 2: Normalization
```sql
-- 1NF: Atomic values, no repeating groups
-- Bad
CREATE TABLE orders (
id INT,
items VARCHAR(255) -- "item1,item2,item3"
);
-- Good
CREATE TABLE orders (
id INT PRIMARY KEY
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT REFERENCES orders(id),
item_name VARCHAR(255)
);
```
```sql
-- 2NF: No partial dependencies
-- 3NF: No transitive dependencies
-- Example: E-commerce schema
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id) ON DELETE CASCADE,
street VARCHAR(255),
city VARCHAR(100),
state VARCHAR(50),
country VARCHAR(50),
postal_code VARCHAR(20),
is_default BOOLEAN DEFAULT FALSE
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(50) UNIQUE NOT NULL,
name VARCHAR(255) NOT NULL,
price DECIMAL(10,2) NOT NULL,
inventory_count INT DEFAULT 0
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
shipping_address_id INT REFERENCES addresses(id),
status VARCHAR(50) DEFAULT 'pending',
total DECIMAL(10,2),
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(id) ON DELETE CASCADE,
product_id INT REFERENCES products(id),
quantity INT NOT NULL,
unit_price DECIMAL(10,2) NOT NULL
);
```
### Step 3: Indexing Strategy
```sql
-- Primary access patterns
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created ON orders(created_at DESC);
-- Composite indexes for common queries
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- Partial indexes for filtered queries
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';
-- Covering indexes for read-heavy queries
CREATE INDEX idx_products_listing ON products(id, name, price);
```
### Step 4: Strategic Denormalization
```sql
-- Denormalize when:
-- 1. Read frequency >> Write frequency
-- 2. Join complexity hurts performance
-- 3. Data rarely changes
-- Example: Store total on parent (updated via trigger)
CREATE OR REPLACE FUNCTION update_order_total()
RETURNS TRIGGER AS $$
BEGIN
UPDATE orders
SET total = (
SELECT SUM(quantity * unit_price)
FROM order_items
WHERE order_id = NEW.order_id
)
WHERE id = NEW.order_id;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER order_items_total_trigger
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW EXECUTE FUNCTION update_order_total();
```
## Migration Strategy
```sql
-- Safe migration pattern
-- 1. Add new column as nullable
ALTER TABLE users ADD COLUMN new_email VARCHAR(255);
-- 2. Backfill data
UPDATE users SET new_email = email;
-- 3. Add constraints
ALTER TABLE users ALTER COLUMN new_email SET NOT NULL;
CREATE UNIQUE INDEX CONCURRENTLY idx_users_new_email ON users(new_email);
-- 4. Swap columns in application code
-- 5. Drop old column (after verification)
ALTER TABLE users DROP COLUMN email;
ALTER TABLE users RENAME COLUMN new_email TO email;
```
## Best Practices
1. **Always use foreign keys** - Enforce referential integrity
2. **Index foreign keys** - Required for JOIN performance
3. **Use appropriate types** - BIGINT for IDs, TIMESTAMPTZ for times
4. **Add created_at/updated_at** - Essential for debugging
5. **Consider soft deletes** - Add `deleted_at` column
6. **Plan for migrations** - Never make breaking changes
7. **Document schema** - Use comments on tables/columns
## Related Resources
- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
- [Use The Index, Luke](https://use-the-index-luke.com/)