lesson

Data Modelling

Star schema, snowflake schema, normalisation, slowly changing dimensions, and wide tables.

Data Modelling

Dimensional Modelling

Kimball-style dimensional modelling organises data for analytics — optimised for reads, not writes.

Fact Tables

Store measurable events or metrics. Rows are thin — mostly foreign keys and measures.

fact_orders (order_id, date_key, customer_key, product_key, quantity, revenue, cost)

Dimension Tables

Descriptive context for facts. Wide and denormalised for easy filtering.

dim_customer (customer_key, name, email, city, country, segment, created_at)

Star Schema vs Snowflake Schema

Star	Snowflake
Dimensions	Denormalised (flat)	Normalised (split into sub-dims)
Query complexity	Simple JOINs	More JOINs
Storage	More redundancy	Less redundancy
BI tool performance	Faster (fewer joins)	Slower
Recommended	Most analytics cases	When dimension tables are very large

Prefer star schema for most data warehouse use cases — simpler queries, better BI tool performance.

Slowly Changing Dimensions (SCD)

How do you handle dimension changes over time? (e.g. customer changes city)

Type	Strategy	Use when
SCD Type 1	Overwrite old value	History not needed
SCD Type 2	Add new row with valid_from/valid_to	Full history needed
SCD Type 3	Add "previous_value" column	Only current + one prior value needed
SCD Type 4	Separate history table	High-change dimensions

SCD Type 2 example:

dim_customer (
  customer_key   SERIAL PRIMARY KEY,  -- surrogate key
  customer_id    INT,                  -- natural/business key
  name           TEXT,
  city           TEXT,
  valid_from     DATE NOT NULL,
  valid_to       DATE,                 -- NULL = current record
  is_current     BOOLEAN DEFAULT TRUE
)

Wide Tables (One Big Table)

Modern cloud warehouses (Snowflake, BigQuery, Redshift) handle wide tables well. Flattening joins into one wide table:

Eliminates join cost at query time

Works well with columnar storage (only touched columns are scanned)

Popular in dbt with final or mart layer

Tradeoff: harder to maintain, larger storage, schema changes require full refresh.

Normalisation

Normal Form	Rule
1NF	Atomic values, no repeating groups
2NF	1NF + no partial dependency on composite PK
3NF	2NF + no transitive dependency
BCNF	Stricter 3NF — every determinant is a candidate key

OLTP systems typically normalise to 3NF/BCNF to minimise update anomalies. OLAP systems intentionally denormalise for read performance.

Data Modelling

Dimensional Modelling

Fact Tables

Dimension Tables

Star Schema vs Snowflake Schema

Slowly Changing Dimensions (SCD)

Wide Tables (One Big Table)

Normalisation

Resources